# dev
a
Question... Planning a Spark DataSource (v2...) to read Rocks metadata. It will run on a JVM. Is there a particular community preference for language (Java or Scala) in terms of readability? In terms of hackability? In any other terms?
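For concreteness, the DataSource V2 surface this would implement is fairly small. A minimal sketch, assuming Spark 3.x (the class name, table name, and key/value schema below are made up for illustration, and it would look nearly identical in Java):

```scala
import java.util

import org.apache.spark.sql.connector.catalog.{SupportsRead, Table, TableCapability, TableProvider}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.connector.read.ScanBuilder
import org.apache.spark.sql.types.{BinaryType, StructType}
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Hypothetical entry point that Spark loads by class name (or via DataSourceRegister).
class RocksMetadataSource extends TableProvider {
  override def inferSchema(options: CaseInsensitiveStringMap): StructType =
    // Illustrative fixed schema; a real source would derive it from the metadata files.
    new StructType().add("key", BinaryType).add("value", BinaryType)

  override def getTable(schema: StructType,
                        partitioning: Array[Transform],
                        properties: util.Map[String, String]): Table =
    new RocksMetadataTable(schema)
}

// Table that only advertises batch reads; the ScanBuilder / PartitionReader
// plumbing, where the actual metadata parsing would live, is omitted here.
class RocksMetadataTable(tableSchema: StructType) extends Table with SupportsRead {
  override def name(): String = "rocks-metadata"
  override def schema(): StructType = tableSchema
  override def capabilities(): util.Set[TableCapability] =
    util.EnumSet.of(TableCapability.BATCH_READ)
  override def newScanBuilder(options: CaseInsensitiveStringMap): ScanBuilder =
    throw new UnsupportedOperationException("scan plumbing not sketched here")
}
```

Either language ends up implementing exactly these JVM interfaces, so the choice is mostly about ergonomics rather than capability.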
o
I think Java is still the lowest common denominator. Most people who speak Scala also know Java, but not necessarily the other way around.
a
OK. Most documentation I could find seems to assume Scala, though.
o
What are Delta, Hudi, and Iceberg written in?
a
Delta Lake: Scala
Hudi: Java with a sprinkling of Scala
Iceberg: Java
So Java leads by a small margin among these projects. Spark itself is mostly Scala, which may be why so many presentations about it use Scala. So... they're neck-and-neck.
o
So assuming we ever decide on supporting other Hadoop-ecosystem tools like Hive or Presto, Java code will probably be reusable, while Scala code would introduce a huge dependency? (asking, not declaring)
a
Not sure why: I would expect both to compile to (roughly) the same form of JVM bytecode.
BTW Spark docs are funny. E.g. https://spark.apache.org/docs/latest/api/java/index.html gives examples in Scala.
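And from the caller's side it really is just bytecode on the classpath. E.g. in spark-shell (names hypothetical, and the connector jar plus its dependencies assumed to be on the classpath), the same call works whether the connector was compiled from Java or Scala:

```scala
// Load through the hypothetical connector sketched above. Spark resolves the
// format string to a class (or a short name via DataSourceRegister); it never
// cares which source language produced that class.
val df = spark.read
  .format("RocksMetadataSource")                   // illustrative class name
  .option("path", "s3://example-bucket/metadata/") // hypothetical option
  .load()

df.printSchema()
df.show(10)
```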
o
So there's no strong reason to go with either?
a
I don't see a strong technical reason. I'd go with one of them if many users say: 1. they will look at it, and 2. they prefer a particular language. Otherwise, the advantage of Java is familiarity on the home team, and of Scala is feeling more Spark-y.