Vaibhav Kumar
05/19/2023, 7:06 PM
spark-shell --conf spark.hadoop.fs.s3a.access.key=minioadmin \
--conf spark.hadoop.fs.s3a.secret.key=minioadmin \
--conf spark.hadoop.fs.s3a.endpoint=http://127.0.0.1:9090 \
--conf spark.hadoop.fs.lakefs.impl=io.lakefs.LakeFSFileSystem \
--conf spark.hadoop.fs.lakefs.access.key=AKIAIOSFODNN7EXAMPLE \
--conf spark.hadoop.fs.lakefs.secret.key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY' \
--conf spark.hadoop.fs.lakefs.endpoint=http://localhost:8000/api/v1 \
--jars /Users/simar/lakeFS/clients/hadoopfs/target/hadoop-lakefs-0.1.0.jar \
io.lakefs.LakeFSFileSystem
While reading it I am getting the below error:
scala> val df = spark.read.parquet("lakefs://example/main/sample1.json")
23/05/20 00:24:42 WARN FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: lakefs://example/main/sample1.json.
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class io.lakefs.LakeFSFileSystem not found
Does anyone know what could be causing this?
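(A quick way to diagnose a ClassNotFoundException like this is to list the jar's contents and confirm the class is actually packaged in it; a diagnostic sketch, using the jar path from the command above:

jar tf /Users/simar/lakeFS/clients/hadoopfs/target/hadoop-lakefs-0.1.0.jar | grep LakeFSFileSystem

If grep prints nothing, the class is not in that jar and Spark cannot load it.)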
Iddo Avneri
05/19/2023, 7:35 PM

Barak Amar

Vaibhav Kumar
05/20/2023, 6:31 AM
I tried --packages instead of the --jars in spark shell and got some error around the object store:
23/05/20 11:58:02 WARN FileSystem: Failed to initialize fileystem lakefs://example/main/sample1.json: java.io.IOException: Failed to get lakeFS blockstore type
java.io.IOException: Failed to get lakeFS blockstore type
I then went back to passing the jar in the --jars. After correcting it, now I am getting a different error.
scala> val df = spark.read.json("lakefs://example/main/sample1.json")
java.lang.NoClassDefFoundError: io/lakefs/clients/api/ApiException
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:496)
at java.base/java.lang.Class.forName(Class.java:475)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2625)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2590)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2686)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:53)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:361)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:340)
... 43 elided
Caused by: java.lang.ClassNotFoundException: io.lakefs.clients.api.ApiException
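(The NoClassDefFoundError points at the plain hadoop-lakefs jar missing its dependencies: it does not bundle the lakeFS API client, so io.lakefs.clients.api.ApiException is absent at runtime. A minimal sketch of the usual fix, assuming the shaded assembly artifact is available on Maven Central under the coordinates used later in this thread, is to let Spark resolve it via --packages:

spark-shell --packages io.lakefs:hadoop-lakefs-assembly:0.1.14 \
--conf spark.hadoop.fs.lakefs.impl=io.lakefs.LakeFSFileSystem \
--conf spark.hadoop.fs.lakefs.access.key=AKIAIOSFODNN7EXAMPLE \
--conf spark.hadoop.fs.lakefs.secret.key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY' \
--conf spark.hadoop.fs.lakefs.endpoint=http://localhost:8000/api/v1

The assembly jar shades the client under io.lakefs.hadoop.shade, which is why the later stack traces in this thread reference io.lakefs.hadoop.shade.api instead of io.lakefs.clients.api.)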
Ariel Shaqed (Scolnicov)
05/20/2023, 7:44 AM

Vaibhav Kumar
05/20/2023, 7:55 AM

Ariel Shaqed (Scolnicov)
05/20/2023, 8:47 AM

Vaibhav Kumar
05/20/2023, 10:42 AM
I ran mvn package under the hadoopfs client. I can only see the hadoop-lakefs-0.1.0.jar. Please refer to the screenshot. I can't see any assembly jar.

Ariel Shaqed (Scolnicov)
05/20/2023, 10:47 AM

Vaibhav Kumar
05/20/2023, 10:59 AM
Results :
Tests in error:
testExists_NotExistsNoPrefix(io.lakefs.LakeFSFileSystemPresignedModeTest): Unable to execute HTTP request: Read timed out
Tests run: 105, Failures: 0, Errors: 1, Skipped: 0
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 08:08 min
[INFO] Finished at: 2023-05-20T16:26:36+05:30
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.4:test (default-test) on project hadoop-lakefs-assembly: There are test failures.
[ERROR]
[ERROR] Please refer to /Users/simar/lakeFS/clients/hadoopfs/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
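(The single failing test, testExists_NotExistsNoPrefix with a read timeout, looks environmental rather than a code problem, so a standard Maven workaround is to package without running the tests:

mvn package -DskipTests

-DskipTests still compiles the test sources but skips the surefire run, so the assembly jar should be produced under target/ despite the flaky test.)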
Ariel Shaqed (Scolnicov)
05/20/2023, 12:50 PM

Vaibhav Kumar
05/20/2023, 1:17 PM
I used --packages io.lakefs:hadoop-lakefs-assembly:0.1.14. I am not sure why I am getting the blockstore type error considering I have already passed the params through my spark shell.
Spark shell command:
spark-shell --conf spark.hadoop.fs.s3a.access.key=minioadmin --conf spark.hadoop.fs.s3a.secret.key=minioadmin --conf spark.hadoop.fs.s3a.endpoint=http://127.0.0.1:9090 --conf spark.hadoop.fs.lakefs.impl=io.lakefs.LakeFSFileSystem --conf spark.hadoop.fs.lakefs.access.key=AKIAIOSFODNN7EXAMPLE --conf spark.hadoop.fs.lakefs.secret.key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY' --conf spark.hadoop.fs.lakefs.endpoint=http://localhost:8000/api/v1 --jars /Users/simar/lakeFS/clients/hadoopfs/target/hadoop-lakefs-assembly-0.1.0.jar --class io.lakefs.LakeFSFileSystem
Error while reading:
scala> val df = spark.read.json("lakefs://example/main/sample1.json")
23/05/20 18:42:18 WARN FileSystem: Failed to initialize fileystem lakefs://example/main/sample1.json: java.io.IOException: Failed to get lakeFS blockstore type
23/05/20 18:42:18 WARN FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: lakefs://example/main/sample1.json.
java.io.IOException: Failed to get lakeFS blockstore type
at io.lakefs.LakeFSFileSystem.initializeWithClientFactory(LakeFSFileSystem.java:134)
at io.lakefs.LakeFSFileSystem.initialize(LakeFSFileSystem.java:110)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:53)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:361)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:340)
at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:22)
at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:26)
at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:28)
at $line14.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:30)
at $line14.$read$$iw$$iw$$iw$$iw.<init>(<console>:32)
at $line14.$read$$iw$$iw$$iw.<init>(<console>:34)
at $line14.$read$$iw$$iw.<init>(<console>:36)
at $line14.$read$$iw.<init>(<console>:38)
at $line14.$read.<init>(<console>:40)
at $line14.$read$.<init>(<console>:44)
at $line14.$read$.<clinit>(<console>)
at $line14.$eval$.$print$lzycompute(<console>:7)
at $line14.$eval$.$print(<console>:6)
at $line14.$eval.$print(<console>)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:578)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:747)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1020)
at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:568)
at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:36)
at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:116)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:567)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:594)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:564)
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:865)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:733)
at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:435)
at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:456)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:239)
at org.apache.spark.repl.Main$.doMain(Main.scala:78)
at org.apache.spark.repl.Main$.main(Main.scala:58)
at org.apache.spark.repl.Main.main(Main.scala)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:578)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: io.lakefs.hadoop.shade.api.ApiException: Unauthorized
at io.lakefs.hadoop.shade.api.ApiClient.handleResponse(ApiClient.java:1031)
at io.lakefs.hadoop.shade.api.ApiClient.execute(ApiClient.java:944)
at io.lakefs.hadoop.shade.api.ConfigApi.getStorageConfigWithHttpInfo(ConfigApi.java:466)
at io.lakefs.hadoop.shade.api.ConfigApi.getStorageConfig(ConfigApi.java:447)
at io.lakefs.LakeFSFileSystem.initializeWithClientFactory(LakeFSFileSystem.java:130)
... 58 more
23/05/20 18:42:18 WARN FileSystem: Failed to initialize fileystem lakefs://example/main/sample1.json: java.io.IOException: Failed to get lakeFS blockstore type
java.io.IOException: Failed to get lakeFS blockstore type
at io.lakefs.LakeFSFileSystem.initializeWithClientFactory(LakeFSFileSystem.java:134)
at io.lakefs.LakeFSFileSystem.initialize(LakeFSFileSystem.java:110)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$1(DataSource.scala:752)
at scala.collection.immutable.List.map(List.scala:293)
at org.apache.spark.sql.execution.datasources.DataSource$.checkAndGlobPathIfNecessary(DataSource.scala:750)
at org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:579)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:408)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:361)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:340)
... 43 elided
Caused by: io.lakefs.hadoop.shade.api.ApiException: Unauthorized
at io.lakefs.hadoop.shade.api.ApiClient.handleResponse(ApiClient.java:1031)
at io.lakefs.hadoop.shade.api.ApiClient.execute(ApiClient.java:944)
at io.lakefs.hadoop.shade.api.ConfigApi.getStorageConfigWithHttpInfo(ConfigApi.java:466)
at io.lakefs.hadoop.shade.api.ConfigApi.getStorageConfig(ConfigApi.java:447)
at io.lakefs.LakeFSFileSystem.initializeWithClientFactory(LakeFSFileSystem.java:130)
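(Since the failure is Caused by: ApiException: Unauthorized on the getStorageConfig call, the lakeFS credentials in fs.lakefs.access.key / fs.lakefs.secret.key are being rejected by the server before the blockstore type can ever be read. A quick check outside Spark, a sketch assuming the endpoint path matches the ConfigApi call in the trace and that the server accepts basic auth with the access key pair:

curl -u AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY http://localhost:8000/api/v1/config/storage

A 200 with a JSON body means the keys are valid and the problem lies elsewhere; a 401 means these keys were never created on this server, for example if lakefs setup ran with different credentials.)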
Ariel Shaqed (Scolnicov)
05/20/2023, 2:22 PM

Vaibhav Kumar
05/20/2023, 5:39 PM

Ariel Shaqed (Scolnicov)
05/22/2023, 5:42 AM
Please share the server logs from docker logs lakefs.

Vaibhav Kumar
05/22/2023, 5:58 AM

Jonathan Rosenberg
05/22/2023, 8:42 AM
You'd have to run docker build and then set it as the docker image that docker compose uses to run lakeFS.
You can instead, for the meantime, use the following image as the lakeFS server:
treeverse/experimental-lakefs:v0.100.0-23-g8b29-rc-rbac3
Vaibhav Kumar
05/22/2023, 8:59 AM
treeverse/experimental-lakefs:v0.100.0-23-g8b29-rc-rbac3
When you say latest image, I referred to this as the latest image: treeverse/lakefs:latest. I believe this would also pull the latest lakeFS only? Correct me if I am wrong. @Jonathan Rosenberg

Jonathan Rosenberg
05/22/2023, 9:23 AM
image: treeverse/experimental-lakefs:v0.100.0-23-g8b29-rc-rbac3

Vaibhav Kumar
05/22/2023, 10:26 AM
version: "3.5"
services:
  lakefs:
    image: treeverse/experimental-lakefs:v0.100.0-23-g8b29-rc-rbac3
    container_name: lakefs
    depends_on:
      - minio-setup
    ports:
      - "8000:8000"
    environment:
      - LAKEFS_DATABASE_TYPE=local
      - LAKEFS_BLOCKSTORE_TYPE=s3
      - LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE=true
      - LAKEFS_BLOCKSTORE_S3_ENDPOINT=http://minio:9000
      - LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=minioadmin
      - LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY=minioadmin
      - LAKEFS_AUTH_ENCRYPT_SECRET_KEY=some random secret string
      - LAKEFS_STATS_ENABLED
      - LAKEFS_LOGGING_LEVEL
      - LAKECTL_CREDENTIALS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
      - LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
      - LAKECTL_SERVER_ENDPOINT_URL=http://localhost:8000
    entrypoint: ["/bin/sh", "-c"]
    command:
      - |
        lakefs setup --local-settings --user-name docker --access-key-id AKIAIOSFODNN7EXAMPLE --secret-access-key wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY || true
        lakefs run --local-settings &
        wait-for -t 60 lakefs:8000 -- lakectl repo create lakefs://example s3://example || true
        wait
  minio-setup:
    image: minio/mc
    container_name: minio-setup
    environment:
      - MC_HOST_lakefs=http://minioadmin:minioadmin@minio:9000
    depends_on:
      - minio
    command: ["mb", "lakefs/example"]
  minio:
    image: minio/minio
    container_name: minio
    ports:
      - "9000:9000"
      - "9001:9001"
    entrypoint: ["minio", "server", "/data", "--console-address", ":9001"]
networks:
  default:
    name: bagel
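(For reference, bringing this stack up and grabbing the server logs Ariel asked about uses only standard Docker commands:

docker compose up -d
docker logs -f lakefs

The container_name: lakefs in the compose file is what makes docker logs lakefs work.)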
Jonathan Rosenberg
05/22/2023, 10:30 AM

Vaibhav Kumar
05/22/2023, 12:26 PM
scala> val df = spark.read.json("lakefs://example/main/sample1.json")
23/05/22 15:52:12 WARN FileSystem: Failed to initialize fileystem lakefs://example/main/sample1.json: java.io.IOException: Failed to get lakeFS blockstore type
java.io.IOException: Failed to get lakeFS blockstore type
at io.lakefs.LakeFSFileSystem.initializeWithClientFactory(LakeFSFileSystem.java:136)
at io.lakefs.LakeFSFileSystem.initialize(LakeFSFileSystem.java:112)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$1(DataSource.scala:752)
at scala.collection.immutable.List.map(List.scala:293)
at org.apache.spark.sql.execution.datasources.DataSource$.checkAndGlobPathIfNecessary(DataSource.scala:750)
at org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:579)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:408)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:361)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:340)
... 43 elided
Caused by: io.lakefs.hadoop.shade.api.ApiException: Unauthorized
at io.lakefs.hadoop.shade.api.ApiClient.handleResponse(ApiClient.java:1031)
at io.lakefs.hadoop.shade.api.ApiClient.execute(ApiClient.java:944)
at io.lakefs.hadoop.shade.api.ConfigApi.getStorageConfigWithHttpInfo(ConfigApi.java:466)
at io.lakefs.hadoop.shade.api.ConfigApi.getStorageConfig(ConfigApi.java:447)
at io.lakefs.LakeFSFileSystem.initializeWithClientFactory(LakeFSFileSystem.java:132)
... 61 more
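(One more way to isolate the Unauthorized error, a diagnostic sketch relying on the LAKECTL_* variables the compose file already sets inside the container: run lakectl from within the lakefs container with the same key pair Spark is presenting:

docker exec lakefs lakectl repo list

If this also fails to authenticate, the lakefs setup step in the compose command probably never created the AKIAIOSFODNN7EXAMPLE key on this server; note the || true, which hides setup failures.)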
Ariel Shaqed (Scolnicov)
05/23/2023, 8:48 AM

Vaibhav Kumar
05/23/2023, 6:18 PM