Vaibhav Kumar
04/01/2023, 11:40 AMspark.read.parquet("<lakefs://repo-that-doesnt-exist/main/path/to/data>")
for the control to come to HadoopFS client
the lakeFS should be running somewhere right and within that I will have to put the debugger.
Does that make sense?Yoni Augarten
04/01/2023, 11:45 AMVaibhav Kumar
04/01/2023, 11:51 AMspark.read.parquet("<lakefs://repo-that-doesnt-exist/main/path/to/data>") -
-> I know i can put a debugger on spark app side
2. Hadoopfs The server side I am not sure how to use it in debugger mode. And how things will land here?Yoni Augarten
04/01/2023, 11:55 AM
3. Run make all.
4. Import the project to your IDE.
5. Run cmd/lakefs/cmd/main.go with the run --local-settings arguments. (You will later change these to connect your installation to the storage.)
6. Come back here and let me know how it works 🙂
Vaibhav Kumar
04/01/2023, 12:00 PM
I ran make all and got the below error:
go: downloading golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1
go: downloading golang.org/x/sys v0.0.0-20210510120138-977fb7262007
Makefile:103: *** "Missing dependency - no docker in PATH". Stop.
Yoni Augarten
04/01/2023, 12:06 PM
You can run make gen instead - but you may still need docker.
Vaibhav Kumar
04/01/2023, 12:57 PM
go run main.go run --local-settings
main.go is a Go file and our HadoopFS client is in Java, so how do I link these things together? Eventually I have to put breakpoints in the HadoopFS client code.
Yoni Augarten
04/01/2023, 1:17 PM
1. Open clients/hadoopfs as a separate IDE project.
2. Write a main method to call the code that you want to test. You will set the fs.lakefs.* configuration to point to your local instance of lakeFS.
This is simpler than running a Spark program and debugging it - although that's also possible. Let me know if that makes sense.
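(For reference, a minimal sketch of such a harness; the endpoint, keys and repo below are placeholders, and the fs.lakefs.* property names are the ones that come up later in this thread:)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DebugMain {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at a local lakeFS instance (placeholder values).
        // The underlying storage (e.g. fs.s3a.*) may also need configuring,
        // as this thread discovers later.
        conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem");
        conf.set("fs.lakefs.endpoint", "http://localhost:8000/api/v1");
        conf.set("fs.lakefs.access.key", "<access-key>");
        conf.set("fs.lakefs.secret.key", "<secret-key>");
        Path p = new Path("lakefs://my-repo/main/1.txt");
        // Breakpoints inside LakeFSFileSystem will be hit from here on.
        FileSystem fs = FileSystem.get(p.toUri(), conf);
        System.out.println(fs.getFileStatus(p));
    }
}
Vaibhav Kumar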
04/01/2023, 1:31 PM
getFileStatus (line 749) is the first function the call goes to. This function expects path as a param, so I hope this is the same Spark path which we pass from spark.read.parquet("lakefs://repo-that-doesnt-exist/main/path/to/data").
Kindly confirm if my observation is correct.
Trace from the issue 2801
java.io.IOException: listObjects
at io.lakefs.LakeFSFileSystem$ListingIterator.readNextChunk(LakeFSFileSystem.java:901)
at io.lakefs.LakeFSFileSystem$ListingIterator.hasNext(LakeFSFileSystem.java:881)
at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:707)
at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:40)
at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1439)
Yoni Augarten
04/01/2023, 1:38 PM
Use lakefs:// paths when calling getFileStatus on the LakeFSFileSystem.
Vaibhav Kumar
04/01/2023, 1:59 PMYoni Augarten
04/01/2023, 2:04 PMVaibhav Kumar
04/01/2023, 7:05 PM
Caused by: java.lang.IllegalArgumentException: Unsupported class file major version 63
at net.bytebuddy.jar.asm.ClassReader.<init>(ClassReader.java:196)
at net.bytebuddy.jar.asm.ClassReader.<init>(ClassReader.java:177)
at net.bytebuddy.jar.asm.ClassReader.<init>(ClassReader.java:163)
at net.bytebuddy.utility.OpenedClassReader.of(OpenedClassReader.java:86)
at net.bytebuddy.dynamic.scaffold.TypeWriter$Default$ForInlining.create(TypeWriter.java:3889)
at net.bytebuddy.dynamic.scaffold.TypeWriter$Default.make(TypeWriter.java:2166)
at net.bytebuddy.dynamic.scaffold.inline.RedefinitionDynamicTypeBuilder.make(RedefinitionDynamicTypeBuilder.java:224)
at net.bytebuddy.dynamic.scaffold.inline.AbstractInliningDynamicTypeBuilder.make(AbstractInliningDynamicTypeBuilder.java:123)
at net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase.make(DynamicType.java:3659)
at org.mockito.internal.creation.bytebuddy.InlineBytecodeGenerator.transform(InlineBytecodeGenerator.java:391)
at java.instrument/java.lang.instrument.ClassFileTransformer.transform(ClassFileTransformer.java:244)
at java.instrument/sun.instrument.TransformerManager.transform(TransformerManager.java:188)
at java.instrument/sun.instrument.InstrumentationImpl.transform(InstrumentationImpl.java:541)
at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native Method)
at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:169)
at org.mockito.internal.creation.bytebuddy.InlineBytecodeGenerator.triggerRetransformation(InlineBytecodeGenerator.java:276)
... 46 more
Running io.lakefs.FSConfigurationTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec
Results :
Tests in error:
testGetFileStatus_ExistingFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_ExistsAsDirectoryInSecondList(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_NotExistsNoPrefix(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingDirToExistingFileName(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDeleteDirectoryRecursiveBatch120(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDeleteDirectoryRecursiveBatch123(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testCreateExistingDirectory(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_ExistsAsDirectoryContents(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_srcEqualsDst(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingDirToNonExistingDirWithParent(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
getUri(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testOpen(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDelete_FileNotExists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingDirToExistingNonEmptyDirName(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingFileToExistingDirName(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testListStatusDirectory(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_NotExistsPrefixWithNoSlashTwoLists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_nonExistingSrcFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_NotExistsPrefixWithNoSlash(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testListStatusNotFound(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_srcAndDstOnDifferentBranch(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDelete_NotExistsRecursive(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDelete_FileExists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDelete_EmptyDirectoryExists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_ExistsAsObject(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDeleteDirectoryRecursiveBatch1(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDeleteDirectoryRecursiveBatch2(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDeleteDirectoryRecursiveBatch3(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDeleteDirectoryRecursiveBatch5(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testGetFileStatus_NoFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testCreateExistingFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingDirToNonExistingDirWithoutParent(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testListStatusFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testAppend(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testCreate(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_fallbackStageAPI(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testOpen_NotExists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testMkdirs(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testGetFileStatus_DirectoryMarker(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testGlobStatus_SingleFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingFileToExistingFileName(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDelete_DirectoryWithFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDelete_DirectoryWithFileRecursive(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_ExistsAsDirectoryMarker(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingFileToNonExistingDst(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testGetFileStatus_ExistingFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_ExistsAsDirectoryInSecondList(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_NotExistsNoPrefix(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingDirToExistingFileName(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDeleteDirectoryRecursiveBatch120(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDeleteDirectoryRecursiveBatch123(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testCreateExistingDirectory(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_ExistsAsDirectoryContents(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_srcEqualsDst(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingDirToNonExistingDirWithParent(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
getUri(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testOpen(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDelete_FileNotExists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingDirToExistingNonEmptyDirName(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingFileToExistingDirName(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testListStatusDirectory(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_NotExistsPrefixWithNoSlashTwoLists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_nonExistingSrcFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_NotExistsPrefixWithNoSlash(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testListStatusNotFound(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_srcAndDstOnDifferentBranch(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDelete_NotExistsRecursive(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDelete_FileExists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDelete_EmptyDirectoryExists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_ExistsAsObject(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDeleteDirectoryRecursiveBatch1(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDeleteDirectoryRecursiveBatch2(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDeleteDirectoryRecursiveBatch3(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDeleteDirectoryRecursiveBatch5(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testGetFileStatus_NoFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testCreateExistingFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingDirToNonExistingDirWithoutParent(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testListStatusFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testAppend(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testCreate(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_fallbackStageAPI(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testOpen_NotExists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testMkdirs(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testGetFileStatus_DirectoryMarker(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testGlobStatus_SingleFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingFileToExistingFileName(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDelete_DirectoryWithFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDelete_DirectoryWithFileRecursive(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_ExistsAsDirectoryMarker(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingFileToNonExistingDst(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
Tests run: 105, Failures: 0, Errors: 90, Skipped: 0
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 05:05 min
[INFO] Finished at: 2023-04-02T00:19:55+05:30
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.4:test (default-test) on project hadoop-lakefs: There are test failures.
[ERROR]
[ERROR] Please refer to /Users/simar/lakeFS/clients/hadoopfs/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
Yoni Augarten
04/01/2023, 7:06 PMVaibhav Kumar
04/01/2023, 7:10 PM
[INFO] Building jar: /Users/simar/lakeFS/clients/hadoopfs/target/hadoop-lakefs-0.1.0.jar
[INFO]
[INFO] --- gpg:1.5:sign (sign-artifacts) @ hadoop-lakefs ---
Downloading from central: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.15/plexus-utils-3.0.15.pom
Downloaded from central: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.15/plexus-utils-3.0.15.pom (3.1 kB at 51 kB/s)
Downloading from central: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.15/plexus-utils-3.0.15.jar
Downloaded from central: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.15/plexus-utils-3.0.15.jar (239 kB at 4.1 MB/s)
/bin/sh: gpg: command not found
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:15 min
[INFO] Finished at: 2023-04-02T20:55:03+05:30
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-gpg-plugin:1.5:sign (sign-artifacts) on project hadoop-lakefs: Exit code: 127 -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
Yoni Augarten
04/02/2023, 3:55 PM
Add -P'!treeverse-signing' to the Maven command.
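For example (the exact goals are an assumption - append the profile flag to whatever command you were running):
mvn clean install -P'!treeverse-signing'
Vaibhav Kumar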
04/02/2023, 4:39 PM
LakeFSFileStatus - I am not sure how to use it.
Yoni Augarten
04/02/2023, 4:42 PM
Vaibhav Kumar
04/02/2023, 4:44 PM
I will call getFileStatus and will pass the lakeFS file path here to read it.
Yoni Augarten
04/02/2023, 4:46 PM
Path p = new Path("lakefs://my-repo/main/1.txt");
LakeFSFileSystem fs = FileSystem.get(hadoopConf, p);
fs.getFileStatus(p);
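(Note: Hadoop's FileSystem.get takes a URI and a Configuration, so in practice this would be FileSystem.get(p.toUri(), hadoopConf) or p.getFileSystem(hadoopConf) - Vaibhav observes exactly this below.)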
Vaibhav Kumar
04/02/2023, 4:54 PM
go run main.go run --local-settings
Yoni Augarten
04/02/2023, 4:58 PMVaibhav Kumar
04/02/2023, 6:14 PM
I used spark.sparkContext.hadoopConfiguration.set to set up the lakeFS client, but it seems the syntax is not working. In the doc it was mentioned to use it the below way:
def main(args : Array[String]) {
val spark = SparkSession.builder()
.master("local[1]")
.appName("SparkByExample")
.getOrCreate();
val path = new Path("<S3a://test-repo/main/sample.json>")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "AKIAJLDV6JMK2R5TRQSQ")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "oOal7VcsJQQnGoPcM9AEYXCe1Q76PHMpX4R1+Ai+")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", "http:localhost:8000")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.path.style.access", "true")
val y = new LakeFSFileSystem()
y.getFileStatus(path)
Error
Exception in thread "main" java.lang.NullPointerException: Cannot invoke "io.lakefs.LakeFSClient.getObjectsApi()" because "this.lfsClient" is null
Yoni Augarten
04/03/2023, 9:04 AM
Use FileSystem.get like I mentioned above.
Vaibhav Kumar
04/03/2023, 10:01 AMYoni Augarten
04/03/2023, 10:02 AM
The fs.s3a configuration will not use LakeFSFileSystem, so it's not relevant for solving this issue.
Vaibhav Kumar
04/03/2023, 11:50 AM
Path p = new Path("lakefs://my-repo/main/1.txt");
LakeFSFileSystem fs = FileSystem.get(hadoopConf, p);
fs.getFileStatus(p);
In the above function, shall I pass hadoopConf as a map of fs.* params?
Yoni Augarten
04/03/2023, 11:52 AMVaibhav Kumar
04/07/2023, 2:39 PM
I have used it with s3a://. Please confirm if I have used it the correct way.
One observation: FileSystem.get is working with the URI option, not directly with the path variable.
def main(args : Array[String]) {
val conf = new Configuration()
conf.set("fs.s3a.access.key", "AKIAJLDV6JMK2R5TRQSQ")
conf.set("fs.s3a.secret.key", "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")
conf.set("fs.s3a.endpoint", "http://localhost:8000")
conf.set("fs.s3a.path.style.access", "true")
val uri = "s3a://test-repo/main/sample.json"
val path = new Path("s3a://test-repo/main/sample1.json")
URI.create(uri)
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
Elad Lachmi
04/07/2023, 2:53 PM
The issue is with the lakefs:// uri, as you mentioned.
Vaibhav Kumar
04/07/2023, 2:57 PMElad Lachmi
04/07/2023, 3:01 PMVaibhav Kumar
04/07/2023, 3:05 PM
I called fs.getFileStatus(path) from Hadoop's FileSystem above, but according to the issue stack trace, shouldn't the function call be from LakeFSFileSystem?
Elad Lachmi
04/07/2023, 3:10 PM
The lakefs:// uri tells the FileSystem instance to use lakeFSFS; that's why you need the lakefs:// uri.
Vaibhav Kumar
04/07/2023, 5:48 PM
With lakefs:// it doesn't look like it is referring to the HadoopFS client from lakeFS. The trace below still shows Hadoop's FileSystem error.
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "lakefs"
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3443)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.example.SparkSessionTest$.main(SparkSessionTest.scala:27)
at org.example.SparkSessionTest.main(SparkSessionTest.scala)
Elad Lachmi
04/07/2023, 6:04 PM
Have you set fs.lakefs.impl? That's what registers a handler for the lakefs:// URIs.
Vaibhav Kumar
04/08/2023, 9:15 AM
I tried both S3 and lakefs under uri and path. Below is my code:
def main(args : Array[String]) {
val conf = new Configuration()
conf.set("fs.s3a.access.key", "0yfZnzCeJdB9Y2i1")
conf.set("fs.s3a.secret.key", "uhYMtk6s97qLKs8jnJhrIMLfBs3uGkv6")
conf.set("fs.lakefs.access.key", "AKIAJLDV6JMK2R5TRQSQ")
conf.set("fs.lakefs.secret.key", "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")
conf.set("fs.lakefs.endpoint", "<http://localhost:8000/api/v1>")
conf.set("fs.s3a.endpoint", "<http://127.0.0.1:9090>")
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
val uri = "<lakefs://s3bucket/sample1.json>"
val path = new Path("<lakefs://s3bucket/sample1.json>")
val fs = FileSystem._get_(URI._create_(uri), conf)
fs.getFileStatus(path)
Error
805 [main] WARN org.apache.hadoop.fs.FileSystem - Failed to initialize fileystem lakefs://test-repo/main/sample1.json: java.lang.RuntimeException: lakeFS blockstore type local unsupported by this FileSystem
Exception in thread "main" java.lang.RuntimeException: lakeFS blockstore type local unsupported by this FileSystem
at io.lakefs.storage.PhysicalAddressTranslator.translate(PhysicalAddressTranslator.java:29)
at io.lakefs.LakeFSFileSystem.initializeWithClientFactory(LakeFSFileSystem.java:153)
at io.lakefs.LakeFSFileSystem.initialize(LakeFSFileSystem.java:110)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.example.SparkSessionTest$.main(SparkSessionTest.scala:29)
at org.example.SparkSessionTest.main(SparkSessionTest.scala)
Ariel Shaqed (Scolnicov)
04/08/2023, 9:40 AMVaibhav Kumar
04/08/2023, 12:59 PMElad Lachmi
04/08/2023, 1:13 PMVaibhav Kumar
04/08/2023, 1:14 PMElad Lachmi
04/08/2023, 1:16 PMAriel Shaqed (Scolnicov)
04/08/2023, 2:20 PMVaibhav Kumar
04/08/2023, 4:36 PMAriel Shaqed (Scolnicov)
04/08/2023, 5:57 PMVaibhav Kumar
04/09/2023, 6:01 AMobject SparkSessionTest {
def main(args : Array[String]) {
val conf = new Configuration()
conf.set("fs.s3a.access.key", "minioadmin")
conf.set("fs.s3a.secret.key", "minioadmin")
conf.set("fs.lakefs.access.key", "AKIAIOSFODNN7EXAMPLE")
conf.set("fs.lakefs.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
conf.set("fs.lakefs.endpoint", "<http://localhost:8000/api/v1>")
conf.set("fs.s3a.endpoint", "<http://127.0.0.1:9090>")
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
val uri = "<lakefs://example/main/sample1.json>"
val path = new Path("<lakefs://example/main/sample1.json>")
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
}}
Error
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
682 [main] WARN org.apache.hadoop.metrics2.impl.MetricsConfig - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
754 [main] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - Scheduled Metric snapshot period at 10 second(s).
754 [main] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - s3a-file-system metrics system started
1200 [shutdown-hook-0] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - Stopping s3a-file-system metrics system...
1200 [shutdown-hook-0] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - s3a-file-system metrics system stopped.
1200 [shutdown-hook-0] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - s3a-file-system metrics system shutdown complete.
1201 [Thread-2] WARN org.apache.hadoop.util.ShutdownHookManager - ShutdownHook 'ClientFinalizer' failed, java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)
at org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:124)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95)
Caused by: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
at org.apache.hadoop.fs.s3a.S3AFileSystem.close(S3AFileSystem.java:3963)
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:3678)
at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:3695)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1623)
Ariel Shaqed (Scolnicov)
04/09/2023, 7:00 AMjava.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
make me suspect that the lakefs-spark-client assembly version might not match the Spark version that you are using, or possibly the Hadoop version.
• Which Spark version are you using? Is this what you get from the Everything Bagel docker-compose, or something else?
• Which lakeFS client are you using? That is, I need either the full Maven coordinates (it will look something like "io.lakefs:lakefs-spark-client-301_2.12:0.6.5", and we need to look at the entire name) or the name of the jar (it will look something like ".../lakefs-spark-client-312-hadoop3-assembly-0.6.5.jar", and we need to look at the entire name).
THANKS!
Vaibhav Kumar
04/09/2023, 7:03 AM
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId>
<artifactId>test-scala</artifactId>
<version>1.0-SNAPSHOT</version>
<name>${project.artifactId}</name>
<description>My wonderfull scala app</description>
<inceptionYear>2010</inceptionYear>
<licenses>
<license>
<name>My License</name>
<url>http://....</url>
<distribution>repo</distribution>
</license>
</licenses>
<properties>
<maven.compiler.source>1.5</maven.compiler.source>
<maven.compiler.target>1.5</maven.compiler.target>
<encoding>UTF-8</encoding>
<scala.version>2.13.0</scala.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.13</artifactId>
<version>3.2.1</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.13</artifactId>
<version>3.2.1</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>io.lakefs</groupId>
<artifactId>hadoop-lakefs</artifactId>
<version>0.1.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.3.5</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>3.3.5</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<plugins>
</plugins>
</build>
</project>
Ariel Shaqed (Scolnicov)
04/09/2023, 8:35 AM
<dependency>
<groupId>io.lakefs</groupId>
<artifactId>hadoop-lakefs-assembly</artifactId>
<version>0.1.13</version>
</dependency>
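(hadoop-lakefs-assembly is the shaded, self-contained build of the client that bundles its dependencies, as opposed to the bare hadoop-lakefs artifact used above.)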
Vaibhav Kumar
04/09/2023, 12:45 PM
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
489 [main] INFO org.apache.commons.beanutils.FluentPropertyBeanIntrospector - Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property.
498 [main] WARN org.apache.hadoop.metrics2.impl.MetricsConfig - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
588 [main] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - Scheduled Metric snapshot period at 10 second(s).
588 [main] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - s3a-file-system metrics system started
Revised POM
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId>
<artifactId>test-scala</artifactId>
<version>1.0-SNAPSHOT</version>
<name>${project.artifactId}</name>
<description>My wonderfull scala app</description>
<inceptionYear>2010</inceptionYear>
<licenses>
<license>
<name>My License</name>
<url>http://....</url>
<distribution>repo</distribution>
</license>
</licenses>
<properties>
<maven.compiler.source>1.5</maven.compiler.source>
<maven.compiler.target>1.5</maven.compiler.target>
<encoding>UTF-8</encoding>
<scala.version>2.13.0</scala.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.1.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.1.2</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>io.lakefs</groupId>
<artifactId>hadoop-lakefs-assembly</artifactId>
<version>0.1.13</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.1.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>3.1.2</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<plugins>
</plugins>
</build>
</project>
Ariel Shaqed (Scolnicov)
04/09/2023, 1:05 PMVaibhav Kumar
04/09/2023, 3:36 PMAriel Shaqed (Scolnicov)
04/09/2023, 3:46 PMVaibhav Kumar
04/09/2023, 3:48 PM
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem") is doing something fishy; it is not letting io.lakefs do things.
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class io.lakefs.LakeFSFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at com.sparkbyexamples.spark.SparkSessionTest$.main(SparkSessionTest.scala:34)
at com.sparkbyexamples.spark.SparkSessionTest.main(SparkSessionTest.scala)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:578)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Code
val spark = SparkSession.builder()
.master("local[1]")
.appName("SparkByExample")
.getOrCreate();
println("First SparkContext:")
println("APP Name :"+spark.sparkContext.appName);
println("Deploy Mode :"+spark.sparkContext.deployMode);
println("Master :"+spark.sparkContext.master);
val conf = new Configuration()
conf.set("fs.s3a.access.key", "minioadmin")
conf.set("fs.s3a.secret.key", "minioadmin")
conf.set("fs.lakefs.access.key", "AKIAIOSFODNN7EXAMPLE")
conf.set("fs.lakefs.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
conf.set("fs.lakefs.endpoint", "<http://localhost:8000/api/v1>")
conf.set("fs.s3a.endpoint", "<http://127.0.0.1:9090>")
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
val uri = "<lakefs://example/main/sample1.json>"
val path = new Path("<lakefs://example/main/sample1.json>")
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
Pom
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<groupId>com.sparkbyexamples</groupId>
<modelVersion>4.0.0</modelVersion>
<artifactId>spark-scala-examples</artifactId>
<version>1.0-SNAPSHOT</version>
<inceptionYear>2008</inceptionYear>
<packaging>jar</packaging>
<properties>
<scala.version>2.12.12</scala.version>
<spark.version>3.0.0</spark.version>
</properties>
<repositories>
<repository>
<id>scala-tools.org</id>
<name>Scala-Tools Maven2 Repository</name>
<url>http://scala-tools.org/repo-releases</url>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>scala-tools.org</id>
<name>Scala-Tools Maven2 Repository</name>
<url>http://scala-tools.org/repo-releases</url>
</pluginRepository>
</pluginRepositories>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.specs</groupId>
<artifactId>specs</artifactId>
<version>1.2.5</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.thoughtworks.xstream</groupId>
<artifactId>xstream</artifactId>
<version>1.4.11</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-xml_2.11</artifactId>
<version>0.4.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-avro_2.12</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>io.lakefs</groupId>
<artifactId>hadoop-lakefs-assembly</artifactId>
<version>0.1.13</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.0.0</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<resources><resource><directory>src/main/resources</directory></resource></resources>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
<args>
<arg>-target:jvm-1.5</arg>
</args>
</configuration>
</plugin>
</plugins>
</build>
<reporting>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
</configuration>
</plugin>
</plugins>
</reporting>
</project>
Yoni Augarten
04/10/2023, 9:16 AM
mvn package exec:java -Dexec.mainClass=Main
(Assuming your code is in a main method in an object called Main)
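(If you do want to stay with spark-submit, the lakeFS client jar has to reach the driver's classpath there too, e.g. with --jars pointing at the hadoop-lakefs-assembly jar or --packages io.lakefs:hadoop-lakefs-assembly:0.1.13 - a sketch of the standard Spark mechanism, not a command from the thread.)
Vaibhav Kumar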
04/10/2023, 9:31 AM
spark-submit --class com.sparkbyexamples.spark.SparkSessionTest target/spark-scala-examples-1.0-SNAPSHOT.jar
Below is my whole code
package com.sparkbyexamples.spark
import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import java.net.URI
import org.apache.spark.sql.SparkSession
import io.lakefs
object SparkSessionTest {
def main(args:Array[String]): Unit ={
val spark = SparkSession.builder()
.master("local[1]")
.appName("SparkByExample")
.getOrCreate();
println("First SparkContext:")
println("APP Name :"+spark.sparkContext.appName);
println("Deploy Mode :"+spark.sparkContext.deployMode);
println("Master :"+spark.sparkContext.master);
val conf = new Configuration()
conf.set("fs.s3a.access.key", "minioadmin")
conf.set("fs.s3a.secret.key", "minioadmin")
conf.set("fs.lakefs.access.key", "AKIAIOSFODNN7EXAMPLE")
conf.set("fs.lakefs.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
conf.set("fs.lakefs.endpoint", "<http://localhost:8000/api/v1>")
conf.set("fs.s3a.endpoint", "<http://127.0.0.1:9090>")
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
val uri = "<lakefs://example/main/sample1.json>"
val path = new Path("<lakefs://example/main/sample1.json>")
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
}
}
Yoni Augarten
04/10/2023, 9:40 AMAriel Shaqed (Scolnicov)
04/10/2023, 9:44 AMVaibhav Kumar
04/10/2023, 9:47 AMYoni Augarten
04/10/2023, 9:50 AMVaibhav Kumar
04/10/2023, 9:54 AMYoni Augarten
04/10/2023, 9:55 AMVaibhav Kumar
04/10/2023, 10:03 AM
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2479)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3254)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3286)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3337)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3305)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at io.lakefs.LakeFSFileSystem.initializeWithClientFactory(LakeFSFileSystem.java:154)
at io.lakefs.LakeFSFileSystem.initialize(LakeFSFileSystem.java:110)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3288)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3337)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3305)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
at com.sparkbyexamples.spark.SparkSessionTest$.main(SparkSessionTest.scala:34)
at com.sparkbyexamples.spark.SparkSessionTest.main(SparkSessionTest.scala)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2383)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2477)
... 16 more
Yoni Augarten
04/10/2023, 10:05 AMVaibhav Kumar
04/10/2023, 10:06 AM
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>3.0.0</version>
</dependency>
request failed: parameter "path" in query has an error: value is required but missing: value is required but missing
Ariel Shaqed (Scolnicov)
04/10/2023, 10:24 AMVaibhav Kumar
04/10/2023, 10:27 AMAriel Shaqed (Scolnicov)
04/10/2023, 10:38 AMVaibhav Kumar
04/10/2023, 11:18 AM
Exception in thread "main" org.apache.hadoop.fs.s3a.AWSClientIOException: doesBucketExist on example: com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to 127.0.0.1:9090 [/127.0.0.1] failed: Connection refused (Connection refused): Unable to execute HTTP request: Connect to 127.0.0.1:9090 [/127.0.0.1] failed: Connection refused (Connection refused)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:144)
at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:332)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:275)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3288)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3337)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3305)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at io.lakefs.LakeFSFileSystem.initializeWithClientFactory(LakeFSFileSystem.java:154)
at io.lakefs.LakeFSFileSystem.initialize(LakeFSFileSystem.java:110)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3288)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3337)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3305)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
at com.sparkbyexamples.spark.SparkSessionTest$.main(SparkSessionTest.scala:34)
at com.sparkbyexamples.spark.SparkSessionTest.main(SparkSessionTest.scala)
Ariel Shaqed (Scolnicov)
04/10/2023, 11:31 AM
"Connect to 127.0.0.1:9090 [/127.0.0.1] failed: Connection refused (Connection refused)" - and it looks to be using another port.
Vaibhav Kumar
04/10/2023, 11:55 AM
Exception in thread "main" java.io.FileNotFoundException: lakefs://example/main/sample2.json not found
at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:784)
at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:74)
at com.sparkbyexamples.spark.SparkSessionTest$.main(SparkSessionTest.scala:35)
at com.sparkbyexamples.spark.SparkSessionTest.main(SparkSessionTest.scala)
Ariel Shaqed (Scolnicov)
04/10/2023, 12:11 PMVaibhav Kumar
04/10/2023, 12:17 PMAriel Shaqed (Scolnicov)
04/10/2023, 12:30 PMVaibhav Kumar
04/10/2023, 1:00 PMAriel Shaqed (Scolnicov)
04/10/2023, 1:14 PMVaibhav Kumar
05/10/2023, 5:28 PM
I am working on readNextChunk() in LakeFSFileSystem.java. I am not getting how I shall get the response code in LakeFSFileSystem.java.
Elad Lachmi
05/10/2023, 5:33 PMVaibhav Kumar
05/10/2023, 5:37 PMElad Lachmi
05/10/2023, 5:44 PM
If the client throws an error for any non-200 HTTP response status, then you'll need to handle it in the catch block.
If it throws only for e.g. 5xx, then it might require checking the response status on the resp.
👆🏻 Looking through some of the existing code, it looks like this is the case - it throws for any non-200 status, so handle it in the catch block.
Vaibhav Kumar
05/10/2023, 5:49 PMElad Lachmi
05/10/2023, 5:51 PM
Use e - I believe resp is out of scope when you're in the catch block.
Vaibhav Kumar
05/10/2023, 5:53 PM
e as well
Elad Lachmi
05/10/2023, 5:56 PM
ApiException has a getCode() method and a few more you might want to use for this purpose.
Vaibhav Kumar
05/10/2023, 6:06 PMElad Lachmi
05/10/2023, 6:08 PM
The ApiException class is here:
clients/java/src/main/java/io/lakefs/clients/api/ApiException.java
You can look at it and see which methods are available.
Vaibhav Kumar
05/10/2023, 6:12 PM
getCode() is there. So now all I have to do is e.getCode() in the catch block, right?
Elad Lachmi
05/10/2023, 6:24 PM
There's an HttpStatus enum to compare to for different HTTP statuses.
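(A hedged sketch of what that could look like - ApiException and getCode() are from the generated client above; HttpStatus is assumed here to be the Apache HttpComponents constants class:)
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.http.HttpStatus;
import io.lakefs.clients.api.ApiException;

class ApiErrorHandlingSketch {
    interface ApiCall<T> {
        T run() throws ApiException;
    }

    // Wrap a lakeFS API call and translate HTTP status codes into IOExceptions.
    static <T> T callApi(ApiCall<T> call, Path path) throws IOException {
        try {
            return call.run();
        } catch (ApiException e) {
            if (e.getCode() == HttpStatus.SC_NOT_FOUND) {
                throw new FileNotFoundException(path + " not found");
            }
            throw new IOException("lakeFS API call failed (HTTP " + e.getCode() + ")", e);
        }
    }
}
Vaibhav Kumar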
05/10/2023, 6:29 PM
def main(args : Array[String]) {
// val spark = SparkSession.builder.master("local[1]").appName("SparkByExample").getOrCreate
val conf = new Configuration()
conf.set("fs.s3a.access.key", "minioadmin")
conf.set("fs.s3a.secret.key", "minioadmin")
conf.set("fs.lakefs.access.key", "AKIAIOSFODNN7EXAMPLE")
conf.set("fs.lakefs.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
conf.set("fs.lakefs.endpoint", "http://localhost:8000/api/v1")
conf.set("fs.s3a.endpoint", "http://127.0.0.1:9090")
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
val uri = "lakefs://test1/main/sample1.json"
val path = new Path("lakefs://test1/main/sample1.json")
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
Elad Lachmi
05/10/2023, 6:30 PMVaibhav Kumar
05/10/2023, 6:31 PMElad Lachmi
05/10/2023, 6:32 PMVaibhav Kumar
05/13/2023, 6:38 PM
I have made changes in LakeFSFileSystem.java and run it from the go command: lakefs % go run main.go run --local-settings
I have created a Hadoop client on my local machine and am trying to test the changes (response code in the exception) to my lakeFS code.
The changes I have made are not reflected in the logs when I run my client, shown below.
Do you know what could be the issue here?
package org.example
import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import java.net.URI
object SparkSessionTest {
def main(args : Array[String]) {
val conf = new Configuration()
conf.set("fs.s3a.access.key", "minioadmin")
conf.set("fs.s3a.secret.key", "minioadmin")
conf.set("fs.lakefs.access.key", "AKIAJLDV6JMK2R5TRQSQ")
conf.set("fs.lakefs.secret.key", "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")
conf.set("fs.lakefs.endpoint", "<http://localhost:8000/api/v1>")
conf.set("fs.s3a.endpoint", "<http://127.0.0.1:9090>")
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
val uri = "<lakefs://test1/main/sample1.json>"
val path = new Path("<lakefs://test1/main/sample1.json>")
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
Isan Rivkin
05/14/2023, 8:28 AMVaibhav Kumar
05/14/2023, 8:37 AM
1. I have made the changes in LakeFSFileSystem.java. Let me know if that is incorrect.
2. I have the server running from this changed code, and the client running separately. This I have checked.
lakeFS blockstore type local unsupported by this FileSystem
Isan Rivkin
05/14/2023, 9:04 AMVaibhav Kumar
05/14/2023, 10:16 AM
I tried LOG.debug as well, but it is not working.
Can someone from lakeFS connect with me for some time on this? It is not a really complex PR, but setting things up for testing took me a lot of time.
Isan Rivkin
05/14/2023, 10:54 AMVaibhav Kumar
05/14/2023, 11:37 AM
I added them in the getFileStatus function. I am trying to hit lakeFS from outside by creating the Hadoop client, and then checking if I get those logs or not.
Isan Rivkin
05/14/2023, 12:01 PMVaibhav Kumar
05/14/2023, 12:28 PMIsan Rivkin
05/14/2023, 5:24 PMVaibhav Kumar
05/14/2023, 6:18 PM
I am running main.go on my terminal.
Now, to narrow down the problem: how shall I set the blockstore to MinIO when I run using the go run command? In other words, what are the arguments required to set the below properties in go run main.go run --local-settings?
## Commands from docker compose file to set blockstore to MINIO
- LAKEFS_DATABASE_TYPE=local
- LAKEFS_BLOCKSTORE_TYPE=s3
- LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE=true
- LAKEFS_BLOCKSTORE_S3_ENDPOINT=http://minio:9000
- LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=minioadmin
- LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY=minioadmin
- LAKEFS_AUTH_ENCRYPT_SECRET_KEY=some random secret string
- LAKEFS_STATS_ENABLED
- LAKEFS_LOGGING_LEVEL
- LAKECTL_CREDENTIALS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
- LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
- LAKECTL_SERVER_ENDPOINT_URL=http://localhost:8000
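(One route, assuming lakeFS reads the same LAKEFS_* variables from the environment when launched with go run, and noting that --local-settings pins local settings and may override them:
LAKEFS_DATABASE_TYPE=local LAKEFS_BLOCKSTORE_TYPE=s3 LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE=true LAKEFS_BLOCKSTORE_S3_ENDPOINT=<your-minio-endpoint> LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=minioadmin LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY=minioadmin go run main.go run
The endpoint is a placeholder for wherever your MinIO is listening.)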
Isan Rivkin
05/14/2023, 6:20 PMVaibhav Kumar
05/14/2023, 6:26 PMAriel Shaqed (Scolnicov)
05/14/2023, 6:27 PM
If you make build-docker and rm the lakeFS container and start a new one, you should be able to see your code.
Alternatively, you might be able to docker cp a lakefs executable into the container at /app/lakefs. Restart that container and your code should run. (This one can be trickier if your container runs a sufficiently different version of Linux than your physical machine.)
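For example (a sketch; the binary path and container name are assumptions about this setup):
docker cp ./lakefs <your-lakefs-container>:/app/lakefs
docker restart <your-lakefs-container>
Vaibhav Kumar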
05/14/2023, 6:47 PM
I have set database.type="local"; let me know if I am going in the right direction.
Sorry, but I didn't follow you exactly on the Docker-related stuff that you suggested. I think you are saying to build the local image using pom.xml? Can you share some doc for me to take a look into that as well?
I am trying the below .lakefs.yaml but I am getting some syntax issues 🫣
database.type="local"
blockstore.type="s3"
blockstore.s3.force_path_style=true
blockstore.s3.endpoint="http://minio:9000"
blockstore.s3.credentials.access_key_id="minioadmin"
blockstore.s3.credentials.secret_access_key="minioadmin"
Ariel Shaqed (Scolnicov)
05/15/2023, 5:29 AMVaibhav Kumar
05/15/2023, 5:57 AM