# help
v
Hi Team, I want to use lakeFS in debugger mode. Can you please tell me how to set one up? I want to debug the Hadoop filesystem.
l
Hi @Vaibhav Kumar, you want to debug the Hadoop filesystem code? Run the lakeFS server with the debugger?
v
yes @Lynn Rozen
l
Yes on both?(:
v
Actually I am working on this issue: https://github.com/treeverse/lakeFS/issues/2801. So I need to put a debugger in place, and I am not sure yet whether I need both or not.
l
There are different ways to debug the HadoopFS client. For example, I'm using a local Spark job with IntelliJ. You can read more here: https://sparkbyexamples.com/spark/how-to-debug-spark-application-locally-or-remote/
v
I checked it, but that is the way to put a debugger on the Spark app. If we want to put a debugger on the HadoopFS client, don't you think we need to run lakeFS in some mode (I'm thinking binary mode)? When we call spark.read.parquet("lakefs://repo-that-doesnt-exist/main/path/to/data"), for control to reach the HadoopFS client, lakeFS should be running somewhere, and that is where I will have to attach the debugger. Does that make sense?
y
Hey @Vaibhav Kumar, if I understand correctly, you would like to run the lakeFS Golang server in debug mode. That is, while using/developing the Hadoop FileSystem, you want to be able to put breakpoints in the server-side code. Is that correct?
v
Yes, correct @Yoni Augarten. As I am working on this issue, I feel there are two parts to it: 1. spark.read.parquet("lakefs://repo-that-doesnt-exist/main/path/to/data") - I know I can put a debugger on the Spark app side. 2. The HadoopFS / server side - I am not sure how to run it in debugger mode, and how the call will land there.
y
Great! I'm sorry that we don't have an up-to-date guide on this one, but here is the gist of it:
1. Clone the lakeFS project.
2. Make sure you have Golang installed.
3. Run make all.
4. Import the project into your IDE.
5. Run cmd/lakefs/cmd/main.go with the run --local-settings arguments (you will later change these to connect your installation to the storage).
6. Come back here and let me know how it works 🙂
v
Sure, thanks.
During make all I got the below error:
go: downloading golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1
go: downloading golang.org/x/sys v0.0.0-20210510120138-977fb7262007
Makefile:103: *** "Missing dependency - no docker in PATH". Stop.
y
Yep - so you also need Docker for the tests. For now maybe try make gen instead - but you may still need Docker.
v
@Yoni Augarten I was able to run the lakeFS server using go run main.go run --local-settings. Since main.go is a Go file and our HadoopFS client is in Java, how do I link these things together? Eventually I have to put breakpoints in the HadoopFS client code.
y
Great progress! So the client is a little trickier to debug. The simplest way is to:
1. Import clients/hadoopfs as a separate IDE project.
2. Write a main method to call the code that you want to test, setting the fs.lakefs.* configuration to point to your local instance of lakeFS (a minimal sketch follows below).
This is simpler than running a Spark program and debugging it - although that's also possible. Let me know if that makes sense.
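(For reference, a minimal sketch of such a main method - the class name is hypothetical, the value are placeholders, and only fs.lakefs.impl is shown; the remaining fs.lakefs.* keys come up later in this thread.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LakeFSFSDebugMain {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Register the lakeFS FileSystem for lakefs:// URIs; the remaining fs.lakefs.*
        // properties (endpoint, access.key, secret.key) point it at the local lakeFS instance.
        conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem");

        Path p = new Path("lakefs://my-repo/main/1.txt");  // placeholder repo/branch/object
        FileSystem fs = FileSystem.get(p.toUri(), conf);   // resolves to LakeFSFileSystem for lakefs:// URIs
        FileStatus status = fs.getFileStatus(p);           // breakpoints inside the client code are hit from here
        System.out.println(status);
    }
}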
v
The way you suggested seems better to me as well; using Spark complicates things. Now, reading the trace below from the issue, I see that getFileStatus (line 749) is the first function the call goes to. This function expects a path as a param, so I assume this is the same path we pass from spark.read.parquet("lakefs://repo-that-doesnt-exist/main/path/to/data"). Kindly confirm if my observation is correct. Trace from issue 2801:
java.io.IOException: listObjects
at io.lakefs.LakeFSFileSystem$ListingIterator.readNextChunk(LakeFSFileSystem.java:901)
	at io.lakefs.LakeFSFileSystem$ListingIterator.hasNext(LakeFSFileSystem.java:881)
	at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:707)
	at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:40)
	at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1439)
y
You are correct: you would use lakefs:// paths when calling getFileStatus on the LakeFSFileSystem.
v
Thank you for confirming. I will start my work from here and will let you know if I see any issues.
y
Sure thing, we're lucky to have you as a contributor! 🙂 Let me know if there's anything else.
v
To start using the HadoopFS client as a JAR in another project, I tried building the JAR from this POM. I am getting this weird error; it seems like something is wrong with the Java version. Do you know what is causing it?
Caused by: java.lang.IllegalArgumentException: Unsupported class file major version 63
        at net.bytebuddy.jar.asm.ClassReader.<init>(ClassReader.java:196)
        at net.bytebuddy.jar.asm.ClassReader.<init>(ClassReader.java:177)
        at net.bytebuddy.jar.asm.ClassReader.<init>(ClassReader.java:163)
        at net.bytebuddy.utility.OpenedClassReader.of(OpenedClassReader.java:86)
        at net.bytebuddy.dynamic.scaffold.TypeWriter$Default$ForInlining.create(TypeWriter.java:3889)
        at net.bytebuddy.dynamic.scaffold.TypeWriter$Default.make(TypeWriter.java:2166)
        at net.bytebuddy.dynamic.scaffold.inline.RedefinitionDynamicTypeBuilder.make(RedefinitionDynamicTypeBuilder.java:224)
        at net.bytebuddy.dynamic.scaffold.inline.AbstractInliningDynamicTypeBuilder.make(AbstractInliningDynamicTypeBuilder.java:123)
        at net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase.make(DynamicType.java:3659)
        at org.mockito.internal.creation.bytebuddy.InlineBytecodeGenerator.transform(InlineBytecodeGenerator.java:391)
        at java.instrument/java.lang.instrument.ClassFileTransformer.transform(ClassFileTransformer.java:244)
        at java.instrument/sun.instrument.TransformerManager.transform(TransformerManager.java:188)
        at java.instrument/sun.instrument.InstrumentationImpl.transform(InstrumentationImpl.java:541)
        at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native Method)
        at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:169)
        at org.mockito.internal.creation.bytebuddy.InlineBytecodeGenerator.triggerRetransformation(InlineBytecodeGenerator.java:276)
        ... 46 more

Running io.lakefs.FSConfigurationTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec

Results :

Tests in error: 
  testGetFileStatus_ExistingFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testExists_ExistsAsDirectoryInSecondList(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testExists_NotExistsNoPrefix(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testRename_existingDirToExistingFileName(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testDeleteDirectoryRecursiveBatch120(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testDeleteDirectoryRecursiveBatch123(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testCreateExistingDirectory(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testExists_ExistsAsDirectoryContents(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testRename_srcEqualsDst(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testRename_existingDirToNonExistingDirWithParent(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  getUri(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testOpen(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testDelete_FileNotExists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testRename_existingDirToExistingNonEmptyDirName(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testRename_existingFileToExistingDirName(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testListStatusDirectory(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testExists_NotExistsPrefixWithNoSlashTwoLists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testRename_nonExistingSrcFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testExists_NotExistsPrefixWithNoSlash(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testListStatusNotFound(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testRename_srcAndDstOnDifferentBranch(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testDelete_NotExistsRecursive(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testDelete_FileExists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testDelete_EmptyDirectoryExists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testExists_ExistsAsObject(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testDeleteDirectoryRecursiveBatch1(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testDeleteDirectoryRecursiveBatch2(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testDeleteDirectoryRecursiveBatch3(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testDeleteDirectoryRecursiveBatch5(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testGetFileStatus_NoFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testCreateExistingFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testRename_existingDirToNonExistingDirWithoutParent(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testListStatusFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testAppend(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testCreate(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testRename_fallbackStageAPI(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testOpen_NotExists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testMkdirs(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testGetFileStatus_DirectoryMarker(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testGlobStatus_SingleFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testRename_existingFileToExistingFileName(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testDelete_DirectoryWithFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testDelete_DirectoryWithFileRecursive(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testExists_ExistsAsDirectoryMarker(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testRename_existingFileToNonExistingDst(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
  testGetFileStatus_ExistingFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testExists_ExistsAsDirectoryInSecondList(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testExists_NotExistsNoPrefix(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testRename_existingDirToExistingFileName(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testDeleteDirectoryRecursiveBatch120(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testDeleteDirectoryRecursiveBatch123(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testCreateExistingDirectory(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testExists_ExistsAsDirectoryContents(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testRename_srcEqualsDst(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testRename_existingDirToNonExistingDirWithParent(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  getUri(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testOpen(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testDelete_FileNotExists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testRename_existingDirToExistingNonEmptyDirName(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testRename_existingFileToExistingDirName(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testListStatusDirectory(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testExists_NotExistsPrefixWithNoSlashTwoLists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testRename_nonExistingSrcFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testExists_NotExistsPrefixWithNoSlash(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testListStatusNotFound(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testRename_srcAndDstOnDifferentBranch(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testDelete_NotExistsRecursive(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testDelete_FileExists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testDelete_EmptyDirectoryExists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testExists_ExistsAsObject(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testDeleteDirectoryRecursiveBatch1(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testDeleteDirectoryRecursiveBatch2(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testDeleteDirectoryRecursiveBatch3(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testDeleteDirectoryRecursiveBatch5(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testGetFileStatus_NoFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testCreateExistingFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testRename_existingDirToNonExistingDirWithoutParent(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testListStatusFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testAppend(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testCreate(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testRename_fallbackStageAPI(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testOpen_NotExists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testMkdirs(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testGetFileStatus_DirectoryMarker(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testGlobStatus_SingleFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testRename_existingFileToExistingFileName(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testDelete_DirectoryWithFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testDelete_DirectoryWithFileRecursive(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testExists_ExistsAsDirectoryMarker(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
  testRename_existingFileToNonExistingDst(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)

Tests run: 105, Failures: 0, Errors: 90, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  05:05 min
[INFO] Finished at: 2023-04-02T00:19:55+05:30
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.4:test (default-test) on project hadoop-lakefs: There are test failures.
[ERROR] 
[ERROR] Please refer to /Users/simar/lakeFS/clients/hadoopfs/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
y
@Vaibhav Kumar, the FileSystem is compiled and tested with Java 8, since it needs to be compatible with legacy Hadoop versions. You should have Java 8 installed, and set your JAVA_HOME to it when running the tests.
v
Ok sure
I tried building it with Java 8. The build still failed, but with a different error:
[INFO] Building jar: /Users/simar/lakeFS/clients/hadoopfs/target/hadoop-lakefs-0.1.0.jar
[INFO] 
[INFO] --- gpg:1.5:sign (sign-artifacts) @ hadoop-lakefs ---
Downloading from central: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.15/plexus-utils-3.0.15.pom
Downloaded from central: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.15/plexus-utils-3.0.15.pom (3.1 kB at 51 kB/s)
Downloading from central: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.15/plexus-utils-3.0.15.jar
Downloaded from central: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.15/plexus-utils-3.0.15.jar (239 kB at 4.1 MB/s)
/bin/sh: gpg: command not found
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  03:15 min
[INFO] Finished at: 2023-04-02T20:55:03+05:30
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-gpg-plugin:1.5:sign (sign-artifacts) on project hadoop-lakefs: Exit code: 127 -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
y
There is a signing step there that you should skip by adding -P'!treeverse-signing' to the Maven command.
v
Done.
I tried importing the JAR into my test project, but I see some builder function in LakeFSFileStatus and I am not sure how to use it.
y
What are you trying to achieve?
v
As we discussed above, I am trying to call getFileStatus and pass it the lakeFS file path to read.
y
getFileStatus is a method on the LakeFSFileSystem. So you want to have an instance of the LakeFSFileSystem.
FileSystem.get is your friend here
In pseudo code, you want to do something like:
Path p = new Path("lakefs://my-repo/main/1.txt");
LakeFSFileSystem fs = FileSystem.get(hadoopConf, p);
fs.getFileStatus(p);
You don't necessarily need a Spark session around
v
My bad, I was calling the wrong file above. 😶 Apart from this, I should just have my lakeFS server running like this, right: go run main.go run --local-settings?
y
Yes. Take a look at the LakeFSFileSystem to learn how to connect it to lakeFS
v
I am facing some issue while creating the lakeFS client. I had expected spark.sparkContext.hadoopConfiguration.set to set up the lakeFS client, but it seems the syntax is not working. In the doc it was mentioned to use it the below way:
def main(args : Array[String]) {
  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExample")
    .getOrCreate();
  val path = new Path("<S3a://test-repo/main/sample.json>")
  spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "AKIAJLDV6JMK2R5TRQSQ")
  spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "oOal7VcsJQQnGoPcM9AEYXCe1Q76PHMpX4R1+Ai+")
  spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", "http:localhost:8000")
  spark.sparkContext.hadoopConfiguration.set("fs.s3a.path.style.access", "true")

  val y = new LakeFSFileSystem()
  y.getFileStatus(path)
Error
Exception in thread "main" java.lang.NullPointerException: Cannot invoke "io.lakefs.LakeFSClient.getObjectsApi()" because "this.lfsClient" is null
y
@Vaibhav Kumar, please read about the distinction between using the S3-compatible API and using the LakeFSFileSystem. Your configuration is for the former, while it should be for the latter. Also, you shouldn't simply instantiate a FileSystem instance; you should use FileSystem.get like I mentioned above.
v
Yes, I used the s3 one because I have exhausted the AWS account free credits. Please confirm if this test will work on local storage as well.
y
LakeFSFileSystem will only work with S3 as the backing storage.
Working with the fs.s3a configuration will not use LakeFSFileSystem, so it's not relevant for solving this issue.
You can try to use MinIO or another S3-compatible storage instead of S3.
Again, please read the attached doc to understand how LakeFSFileSystem works.
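(A sketch of that distinction in Hadoop configuration terms, assuming a local lakeFS at localhost:8000 backed by an S3-compatible store such as MinIO; the key names are the ones used elsewhere in this thread, the values and host/port are placeholders.)

import org.apache.hadoop.conf.Configuration;

final class LakeFSConfSketch {
    static Configuration lakeFSConf() {
        Configuration conf = new Configuration();

        // fs.lakefs.* configures the LakeFSFileSystem itself: the handler for lakefs:// URIs,
        // the lakeFS API endpoint, and the lakeFS credentials.
        conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem");
        conf.set("fs.lakefs.endpoint", "http://localhost:8000/api/v1");
        conf.set("fs.lakefs.access.key", "<lakefs-access-key-id>");      // placeholder
        conf.set("fs.lakefs.secret.key", "<lakefs-secret-access-key>");  // placeholder

        // fs.s3a.* configures the underlying S3-compatible store that lakeFSFS
        // reads and writes the actual data through (e.g. MinIO).
        conf.set("fs.s3a.endpoint", "http://<minio-host>:<s3-api-port>"); // the S3 API port, not the console port
        conf.set("fs.s3a.access.key", "<store-access-key>");              // placeholder
        conf.set("fs.s3a.secret.key", "<store-secret-key>");              // placeholder
        conf.set("fs.s3a.path.style.access", "true");

        return conf;
    }
}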
v
Yes, I have gone through the doc; I was just confirming it with you 🙂
Path p = new Path("lakefs://my-repo/main/1.txt");
LakeFSFileSystem fs = FileSystem.get(hadoopConf, p);
fs.getFileStatus(p);
In the above function, shall I pass hadoopConf as a map of fs.* params?
v
I have tried the implementation the below way. Now I just need to make it work with the lakefs:// prefix instead of s3a://. Please confirm if I have used it the correct way. One observation: FileSystem.get works with the URI option, not directly with the path variable.
def main(args : Array[String]) {
  val conf = new Configuration()
  conf.set("fs.s3a.access.key", "AKIAJLDV6JMK2R5TRQSQ")
  conf.set("fs.s3a.secret.key", "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")
  conf.set("fs.s3a.endpoint", "http://localhost:8000")
  conf.set("fs.s3a.path.style.access", "true")
  val uri = "s3a://test-repo/main/sample.json"
  val path = new Path("s3a://test-repo/main/sample1.json")
  val fs = FileSystem.get(URI.create(uri), conf)
  fs.getFileStatus(path)
e
Hi @Vaibhav Kumar, yes, that looks about right. You'll need to use a lakefs:// URI, as you mentioned.
v
Ok, and the rest looks good, right - the way I have used path and URI? This is in reference to solving issue 2801, as mentioned above.
e
Yes, looks about right
v
One more thing: I have used fs.getFileStatus(path) from Hadoop's FileSystem above, but according to the issue's stack trace the function call should come from LakeFSFileSystem?
e
The lakefs:// URI tells the FileSystem instance to use lakeFSFS - that's why you need the lakefs:// URI.
v
After changing it to lakefs://, it doesn't look like it is referring to the HadoopFS client from lakeFS. The trace below still shows Hadoop's FileSystem error.
0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "lakefs"
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3443)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.example.SparkSessionTest$.main(SparkSessionTest.scala:27)
at org.example.SparkSessionTest.main(SparkSessionTest.scala)
e
Did you set fs.lakefs.impl? That's what registers a handler for lakefs:// URIs.
v
I have set that. I am using MinIO as storage. Shall I now read the file from S3 or lakefs in the uri and path? Below is my code:
def main(args : Array[String]) {
  val conf = new Configuration()
  conf.set("fs.s3a.access.key", "0yfZnzCeJdB9Y2i1")
  conf.set("fs.s3a.secret.key", "uhYMtk6s97qLKs8jnJhrIMLfBs3uGkv6")
  conf.set("fs.lakefs.access.key", "AKIAJLDV6JMK2R5TRQSQ")
  conf.set("fs.lakefs.secret.key", "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")
  conf.set("fs.lakefs.endpoint", "http://localhost:8000/api/v1")
  conf.set("fs.s3a.endpoint", "http://127.0.0.1:9090")
  conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
  val uri = "lakefs://s3bucket/sample1.json"
  val path = new Path("lakefs://s3bucket/sample1.json")
  val fs = FileSystem.get(URI.create(uri), conf)
  fs.getFileStatus(path)
Error
805  [main] WARN  org.apache.hadoop.fs.FileSystem  - Failed to initialize fileystem lakefs://test-repo/main/sample1.json: java.lang.RuntimeException: lakeFS blockstore type local unsupported by this FileSystem
Exception in thread "main" java.lang.RuntimeException: lakeFS blockstore type local unsupported by this FileSystem
	at io.lakefs.storage.PhysicalAddressTranslator.translate(PhysicalAddressTranslator.java:29)
	at io.lakefs.LakeFSFileSystem.initializeWithClientFactory(LakeFSFileSystem.java:153)
	at io.lakefs.LakeFSFileSystem.initialize(LakeFSFileSystem.java:110)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
	at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
	at org.example.SparkSessionTest$.main(SparkSessionTest.scala:29)
	at org.example.SparkSessionTest.main(SparkSessionTest.scala)
a
Sorry to hear it's not working. I suspect we still don't have a correct configuration. According to the error message your lakeFS blockstore type is "local". If you configured lakeFS to use an s3 compatible blockstore, that type should be "s3". I can offer 2 options... 1. If you can send your lakeFS configuration I can take a look. But please make sure to remove all secret information before sending! 2. This old guide explains how to configure lakeFS to work with minio. I went over it and it seems still relevant. (Supporting MinIO in production is a roadmap item; afair mostly we're missing continuous integration).
v
I have lakeFS running on Docker and the storage namespace was local. Is this what you mean by incorrectly configured?
e
lakeFSFS only works with S3-compatible block store adaptors
v
Ok, so what shall I change in the lakeFS config before restarting the container?
e
Please take a look at Ariel's last message - those are the options. If you'd like to work locally, you'll need to set up MinIO.
a
If you enjoy using Docker, another easy way to spin something up is to load the Everything Bagel docker-compose. That's a docker-compose file that's in the lakeFS repo that brings up lakeFS with MinIO and a bunch of popular data science tools. You might bring it up, then go over the Docker compose file and get rid of all containers except lakefs, minio, and minio-setup. It's probably a question of what's most convenient for you, mostly in terms of how you like to build and develop.
v
Thanks @Ariel Shaqed (Scolnicov) @Elad Lachmi, this was helpful. I am, however, facing an issue uploading a file to lakeFS. The 'choose file' button seems not to be working.
a
That's odd! Can you try to enter a destination path? Does it still stay grayed out? Or, can you try to use "lakectl fs upload"? I'll only be able to take a look tomorrow morning, sorry.
v
@Ariel Shaqed (Scolnicov) it seems like something is wrong with my system; I am facing this in other apps as well. Never mind - the file has been uploaded to the lakeFS file system backed by MinIO. Now when I run it I am getting the below error. Code:
object SparkSessionTest {
  def main(args : Array[String]) {
    val conf = new Configuration()
        conf.set("fs.s3a.access.key", "minioadmin")
        conf.set("fs.s3a.secret.key", "minioadmin")
        conf.set("fs.lakefs.access.key", "AKIAIOSFODNN7EXAMPLE")
        conf.set("fs.lakefs.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
        conf.set("fs.lakefs.endpoint", "<http://localhost:8000/api/v1>")
        conf.set("fs.s3a.endpoint", "<http://127.0.0.1:9090>")
        conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")

    val uri = "<lakefs://example/main/sample1.json>"
    val path = new Path("<lakefs://example/main/sample1.json>")
    val  fs =  FileSystem.get(URI.create(uri), conf)
    fs.getFileStatus(path)
 }}
Error
0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
682  [main] WARN  org.apache.hadoop.metrics2.impl.MetricsConfig  - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
754  [main] INFO  org.apache.hadoop.metrics2.impl.MetricsSystemImpl  - Scheduled Metric snapshot period at 10 second(s).
754  [main] INFO  org.apache.hadoop.metrics2.impl.MetricsSystemImpl  - s3a-file-system metrics system started
1200 [shutdown-hook-0] INFO  org.apache.hadoop.metrics2.impl.MetricsSystemImpl  - Stopping s3a-file-system metrics system...
1200 [shutdown-hook-0] INFO  org.apache.hadoop.metrics2.impl.MetricsSystemImpl  - s3a-file-system metrics system stopped.
1200 [shutdown-hook-0] INFO  org.apache.hadoop.metrics2.impl.MetricsSystemImpl  - s3a-file-system metrics system shutdown complete.
1201 [Thread-2] WARN  org.apache.hadoop.util.ShutdownHookManager  - ShutdownHook 'ClientFinalizer' failed, java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)
	at org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:124)
	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95)
Caused by: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
	at org.apache.hadoop.fs.s3a.S3AFileSystem.close(S3AFileSystem.java:3963)
	at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:3678)
	at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:3695)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1623)
a
Glad to hear things are starting to improve! These lines
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
make me suspect that the lakefs-spark-client assembly version might not match the Spark version that you are using, or possibly the Hadoop version.
• Which Spark version are you using? Is this what you get from the Everything Bagel docker-compose, or something else?
• Which lakeFS client are you using? That is, I need either the full Maven coordinates (they will look something like "io.lakefs:lakefs-spark-client-301_2.12:0.6.5", and we need to look at the entire name) or the name of the jar (it will look something like ".../lakefs-spark-client-312-hadoop3-assembly-0.6.5.jar", and we need to look at the entire name). THANKS!
v
@Ariel Shaqed (Scolnicov) is it possible for us to connect sometime on this issue and resolve things in one go? I feel it would also cut down on the iterations. What do you suggest 🙂?
I am using the below POM.xml, which shows my Spark, Hadoop and lakeFS versions. Within the Bagel I removed everything else and just kept minio, minio-setup and lakefs.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>test-scala</artifactId>
  <version>1.0-SNAPSHOT</version>
  <name>${project.artifactId}</name>
  <description>My wonderfull scala app</description>
  <inceptionYear>2010</inceptionYear>
  <licenses>
    <license>
      <name>My License</name>
      <url>http://....</url>
      <distribution>repo</distribution>
    </license>
  </licenses>

  <properties>
    <maven.compiler.source>1.5</maven.compiler.source>
    <maven.compiler.target>1.5</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.13.0</scala.version>
  </properties>


  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.13</artifactId>
      <version>3.2.1</version>
      <scope>compile</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.13</artifactId>
      <version>3.2.1</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>io.lakefs</groupId>
      <artifactId>hadoop-lakefs</artifactId>
      <version>0.1.0</version>
    </dependency>
    <!-- <https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common> -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>3.3.5</version>
    </dependency>
    <!-- <https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws> -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-aws</artifactId>
      <version>3.3.5</version>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <plugins>
    </plugins>
  </build>
</project>
a
There are some mismatched versions in there 😞
• All Spark versions should be the same, and compatible with what you use for the Jar. I would consider using spark-3.1.2 for everything.
• The FS version here seems very odd! Our guide recommends using version 0.1.13, so something like:
<dependency>
      <groupId>io.lakefs</groupId>
      <artifactId>hadoop-lakefs-assembly</artifactId>
      <version>0.1.13</version>
    </dependency>
v
I changed to the versions you suggested. Now my program hangs:
0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
489  [main] INFO  org.apache.commons.beanutils.FluentPropertyBeanIntrospector  - Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property.
498  [main] WARN  org.apache.hadoop.metrics2.impl.MetricsConfig  - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
588  [main] INFO  org.apache.hadoop.metrics2.impl.MetricsSystemImpl  - Scheduled Metric snapshot period at 10 second(s).
588  [main] INFO  org.apache.hadoop.metrics2.impl.MetricsSystemImpl  - s3a-file-system metrics system started
Revised POM
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>test-scala</artifactId>
  <version>1.0-SNAPSHOT</version>
  <name>${project.artifactId}</name>
  <description>My wonderfull scala app</description>
  <inceptionYear>2010</inceptionYear>
  <licenses>
    <license>
      <name>My License</name>
      <url>http://....</url>
      <distribution>repo</distribution>
    </license>
  </licenses>

  <properties>
    <maven.compiler.source>1.5</maven.compiler.source>
    <maven.compiler.target>1.5</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.13.0</scala.version>
  </properties>


  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.12</artifactId>
      <version>3.1.2</version>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>3.1.2</version>
      <scope>provided</scope>
    </dependency>


    <dependency>
      <groupId>io.lakefs</groupId>
      <artifactId>hadoop-lakefs-assembly</artifactId>
      <version>0.1.13</version>
    </dependency>
    <!-- <https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common> -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>3.1.2</version>
    </dependency>
    <!-- <https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws> -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-aws</artifactId>
      <version>3.1.2</version>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <plugins>
    </plugins>
  </build>
</project>
a
The fact that it doesn't do anything is worrying 😞 Spark is running, no worrying messages 🙂 I am really not sure where to go from here -- I would expect to see some outputs from spark-submit, somewhere. Does Docker see any interesting log lines from any of the Spark containers?
v
I know it is a bit frustrating - I have spent more than a week setting this up and am still not able to replicate it 😑 I am using IntelliJ and running the app directly. Do I need to use spark-submit?
a
I'll find someone who can advise with running in IntelliJ. I literally never run IntelliJ (or Spark in the debugger).
v
I have a Spark container running as well. If you help me with how to run the JAR in that container, maybe I can give it another try. It is running Spark 3.0 there, not 3.1.2.
Some good news: I tried spark-submit and got a different error. I hope the below trace helps, @Ariel Shaqed (Scolnicov). I feel conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem") is doing something fishy - it is not letting io.lakefs do things.
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class io.lakefs.LakeFSFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
        at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
        at com.sparkbyexamples.spark.SparkSessionTest$.main(SparkSessionTest.scala:34)
        at com.sparkbyexamples.spark.SparkSessionTest.main(SparkSessionTest.scala)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:578)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Code
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExample")
  .getOrCreate();

println("First SparkContext:")
println("APP Name :"+spark.sparkContext.appName);
println("Deploy Mode :"+spark.sparkContext.deployMode);
println("Master :"+spark.sparkContext.master);

val conf = new Configuration()
conf.set("fs.s3a.access.key", "minioadmin")
conf.set("fs.s3a.secret.key", "minioadmin")
conf.set("fs.lakefs.access.key", "AKIAIOSFODNN7EXAMPLE")
conf.set("fs.lakefs.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
conf.set("fs.lakefs.endpoint", "<http://localhost:8000/api/v1>")
conf.set("fs.s3a.endpoint", "<http://127.0.0.1:9090>")
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")

val uri = "<lakefs://example/main/sample1.json>"
val path = new Path("<lakefs://example/main/sample1.json>")
val  fs =  FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
Pom
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">


    <groupId>com.sparkbyexamples</groupId>
    <modelVersion>4.0.0</modelVersion>
    <artifactId>spark-scala-examples</artifactId>

    <version>1.0-SNAPSHOT</version>
    <inceptionYear>2008</inceptionYear>
    <packaging>jar</packaging>
    <properties>
        <scala.version>2.12.12</scala.version>
        <spark.version>3.0.0</spark.version>
    </properties>

    <repositories>
        <repository>
            <id>scala-tools.org</id>
            <name>Scala-Tools Maven2 Repository</name>
            <url>http://scala-tools.org/repo-releases</url>
        </repository>
    </repositories>

    <pluginRepositories>
        <pluginRepository>
            <id>scala-tools.org</id>
            <name>Scala-Tools Maven2 Repository</name>
            <url>http://scala-tools.org/repo-releases</url>
        </pluginRepository>
    </pluginRepositories>

    <dependencies>

        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.specs</groupId>
            <artifactId>specs</artifactId>
            <version>1.2.5</version>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>com.thoughtworks.xstream</groupId>
            <artifactId>xstream</artifactId>
            <version>1.4.11</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>${spark.version}</version>
            <scope>compile</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>${spark.version}</version>
         <scope>compile</scope>
        </dependency>

        <dependency>
           <groupId>com.databricks</groupId>
           <artifactId>spark-xml_2.11</artifactId>
           <version>0.4.1</version>
       </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-avro_2.12</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>io.lakefs</groupId>
            <artifactId>hadoop-lakefs-assembly</artifactId>
            <version>0.1.13</version>
        </dependency>
        <!-- <https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common> -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.0.0</version>
        </dependency>


    </dependencies>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <resources><resource><directory>src/main/resources</directory></resource></resources>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <scalaVersion>${scala.version}</scalaVersion>
                    <args>
                        <arg>-target:jvm-1.5</arg>
                    </args>
                </configuration>
            </plugin>

        </plugins>
    </build>
    <reporting>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <configuration>
                    <scalaVersion>${scala.version}</scalaVersion>
                </configuration>
            </plugin>
        </plugins>
    </reporting>
</project>
y
Hey @Vaibhav Kumar, how are you running your code? I was able to use your pom file to successfully run the attached code. I used the following command:
mvn package exec:java -Dexec.mainClass=Main
(Assuming your code is in a main method in an object called Main)
v
@Yoni Augarten I am using spark-submit --class com.sparkbyexamples.spark.SparkSessionTest target/spark-scala-examples-1.0-SNAPSHOT.jar. Below is my whole code:
package com.sparkbyexamples.spark
import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import java.net.URI
import org.apache.spark.sql.SparkSession
import io.lakefs
object SparkSessionTest {

  def main(args:Array[String]): Unit ={


    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExample")
      .getOrCreate();
    
    println("First SparkContext:")
    println("APP Name :"+spark.sparkContext.appName);
    println("Deploy Mode :"+spark.sparkContext.deployMode);
    println("Master :"+spark.sparkContext.master);

    val conf = new Configuration()
    conf.set("fs.s3a.access.key", "minioadmin")
    conf.set("fs.s3a.secret.key", "minioadmin")
    conf.set("fs.lakefs.access.key", "AKIAIOSFODNN7EXAMPLE")
    conf.set("fs.lakefs.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
    conf.set("fs.lakefs.endpoint", "<http://localhost:8000/api/v1>")
    conf.set("fs.s3a.endpoint", "<http://127.0.0.1:9090>")
    conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")

    val uri = "<lakefs://example/main/sample1.json>"
    val path = new Path("<lakefs://example/main/sample1.json>")
    val  fs =  FileSystem.get(URI.create(uri), conf)
    fs.getFileStatus(path)

  }
}
y
I don't think your Jar contains any of your dependencies, that's why LakeFSFileSystem cannot be found. Make sure you are running your program with a classpath that includes all of the Maven dependencies. This can be done either using Maven like I did above, or by creating an "uber-jar", or by using an IDE to manage dependencies, or by some other way.
a
Thanks, Yoni, that really sounds like a Spark issue! Vaibhav, are you using a downloaded jar or did you build it yourself? You'll want to run "mvn package" to assemble a jar that brings in all its dependencies.
v
I can see the jar under External Libraries .PFB the screenshots.
y
Then if you run it from your IDE it should work. When you simply spark-submit a jar, these external dependencies are not included.
v
But when we package a project into a JAR, all code and dependencies come together in a single place, right - the JAR? Another point: if the lib was not found, my import statements would have failed, but those are passing. Let me know if my understanding is correct.
y
In what way are they "passing"?
Import statements in Java are not executed in the same way code is
Regarding the JAR, it depends how you created it. Since your pom doesn't include any plugin that knows to create an uber-jar, your jar probably only includes your compiled classes, and not other dependencies.
v
@Yoni Augarten I think you are right - when I run from the IDE directly it now throws a different error:
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2479)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3254)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3286)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3337)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3305)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
	at io.lakefs.LakeFSFileSystem.initializeWithClientFactory(LakeFSFileSystem.java:154)
	at io.lakefs.LakeFSFileSystem.initialize(LakeFSFileSystem.java:110)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3288)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3337)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3305)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
	at com.sparkbyexamples.spark.SparkSessionTest$.main(SparkSessionTest.scala:34)
	at com.sparkbyexamples.spark.SparkSessionTest.main(SparkSessionTest.scala)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2383)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2477)
	... 16 more
aws-sdk is missing, I think.
y
@Vaibhav Kumar - thanks for updating. Developing Spark applications does have its learning curve when trying to work with external libraries. Some are easier to import using pom files, while with others there is no other choice but downloading the Jar from the internet. Unfortunately there is no one-size-fits-all approach as there are just too many moving parts. It's a matter of trial and error in the end.
v
No problem, in the end we are all learning and growing 🙂
I reran after importing the below module and now my program is in a hung state:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aws</artifactId>
    <version>3.0.0</version>
</dependency>
While copying, I get the below error most of the time:
lakeFS % ./lakectl fs upload -s /Users/simar/Downloads/sample1.json -d lakefs://example/main/
request failed: parameter "path" in query has an error: value is required but missing: value is required but missing
a
Can you try adding a path at the end of the destination URL?
v
Path as in? lakefs://example/main/ is the path where I want to add the file.
a
v
@Ariel Shaqed (Scolnicov) @Yoni Augarten I saw a different error this time. This time there is some connection issue with MinIO. Please see the log below:
Exception in thread "main" org.apache.hadoop.fs.s3a.AWSClientIOException: doesBucketExist on example: com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to 127.0.0.1:9090 [/127.0.0.1] failed: Connection refused (Connection refused): Unable to execute HTTP request: Connect to 127.0.0.1:9090 [/127.0.0.1] failed: Connection refused (Connection refused)
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:144)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:332)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:275)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3288)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3337)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3305)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
	at io.lakefs.LakeFSFileSystem.initializeWithClientFactory(LakeFSFileSystem.java:154)
	at io.lakefs.LakeFSFileSystem.initialize(LakeFSFileSystem.java:110)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3288)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3337)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3305)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
	at com.sparkbyexamples.spark.SparkSessionTest$.main(SparkSessionTest.scala:34)
	at com.sparkbyexamples.spark.SparkSessionTest.main(SparkSessionTest.scala)
Minio is running already at http://127.0.0.1:9001/ in my browser
a
I think you have a configuration issue with your S3 endpoint somewhere: it says
Connect to 127.0.0.1:9090 [/127.0.0.1] failed: Connection refused (Connection refused):
and it looks to be using another port.
v
@Ariel Shaqed (Scolnicov) it has worked - I changed the port to the API one.
But how do I print anything out of it? I mean, when the file was present it worked without any error, but when the file is not present it didn't give me exactly the error shown in the issue's stack trace.
Exception in thread "main" java.io.FileNotFoundException: lakefs://example/main/sample2.json not found
	at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:784)
	at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:74)
	at com.sparkbyexamples.spark.SparkSessionTest$.main(SparkSessionTest.scala:35)
	at com.sparkbyexamples.spark.SparkSessionTest.main(SparkSessionTest.scala)
a
Yay, now we're cooking! Yeah, that's the good flow. Now try a repo that doesn't exist, or a branch that doesn't exist.
v
Yup, it worked fully.
Now, moving on to solving the problem, I need to gather the response from the lakeFS server that is received in LakeFSFileSystem, right?
a
Yeah, swagger.yml can give all possible return status codes and messages. This issue is to translate these exceptions into e.g. FileNotFoundException, which is more informative to the user.
v
a
Yup! lakeFSFS performs a bunch of calls, each of which can fail. I would start with the open and create methods; they hold many of the most common failures. After that, list, getFileStatus and listStatus might also have similar failure modes.
v
@Ariel Shaqed (Scolnicov) I went through the swagger doc and the responses we are getting using Postman. I think I have figured out where we need to add the code: it should be somewhere in readNextChunk() in LakeFSFileSystem.java. What I am not getting is how to get the response code in LakeFSFileSystem.java.
e
Hi @Vaibhav Kumar, To make sure I understand what you’re trying to do… The idea is to handle errors better instead of simply throwing them?
v
I want to get the response directly from the lakeFS server for clearer messages. If you look at the issue I am working on, the error thrown is not very clear, while if you hit the listObjects API you get a proper error like "repository not found". I already have a setup ready to reproduce the same error, but I am not sure how to get those response codes @Elad Lachmi.
e
It depends on how the client is implemented. If the client throws an error for any non-200 HTTP response status, then you'll need to handle it in the catch block. If it throws only for e.g. 5xx, then it might require checking the response status on the resp.
If the client throws an error for any non-200 HTTP response status, then you'll need to handle it in the catch block
👆🏻 Looking through some of the existing code, it looks like this is the case.
v
Yes, you are correct, but I am not getting any suggestions around the response code in the below screenshot.
e
You need to look at e. I believe resp is out of scope when you're in the catch block.
v
Same with e as well.
e
ApiException has a getCode() method and a few more you might want to use for this purpose.
v
Ok, can you please share some document for my reference? I am not getting how I will use it.
e
You can find the ApiException class here:
clients/java/src/main/java/io/lakefs/clients/api/ApiException.java
You can look at it and see which methods are available.
v
Yes, I can see now that getCode() is there. So all I have to do is call e.getCode() in the catch block, right?
e
Yes, and you have the HttpStatus enum to compare against for different HTTP statuses.
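(A sketch of the kind of translation discussed here, assuming the generated io.lakefs.clients.api.ApiException from the path above. The helper name and functional interface are hypothetical - the real change would live inside the existing catch blocks of LakeFSFileSystem methods such as getFileStatus, open and create - and the JDK's HttpURLConnection.HTTP_NOT_FOUND stands in for whichever HttpStatus constant the project uses.)

import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.HttpURLConnection;

import org.apache.hadoop.fs.Path;
import io.lakefs.clients.api.ApiException;

final class ApiErrorTranslation {

    // Hypothetical wrapper for a single lakeFS API call.
    interface ApiCall<T> {
        T run() throws ApiException;
    }

    static <T> T withTranslatedErrors(Path path, ApiCall<T> call) throws IOException {
        try {
            return call.run();
        } catch (ApiException e) {
            // getCode() exposes the HTTP status returned by the lakeFS server.
            if (e.getCode() == HttpURLConnection.HTTP_NOT_FOUND) {
                // Repository, branch or object does not exist - surface a clearer error.
                throw new FileNotFoundException(path + ": " + e.getMessage());
            }
            // Anything else stays a generic IOException, keeping the status code and cause.
            throw new IOException("lakeFS API call failed with HTTP " + e.getCode(), e);
        }
    }

    private ApiErrorTranslation() {}
}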
v
Ok sure. So after making the changes in this code, I can just run the lakeFS Go server and test my changes, right? To replicate the error until now I was using the Hadoop client (pasted below) from my IntelliJ; it talks to my lakeFS Docker container and MinIO as of now.
def main(args : Array[String]) {
//    val spark = SparkSession.builder.master("local[1]").appName("SparkByExample").getOrCreate
    val conf = new Configuration()
        conf.set("fs.s3a.access.key", "minioadmin")
        conf.set("fs.s3a.secret.key", "minioadmin")
        conf.set("fs.lakefs.access.key", "AKIAIOSFODNN7EXAMPLE")
        conf.set("fs.lakefs.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
        conf.set("fs.lakefs.endpoint", "<http://localhost:8000/api/v1>")
        conf.set("fs.s3a.endpoint", "<http://127.0.0.1:9090>")
        conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")


    val uri = "<lakefs://test1/main/sample1.json>"
    val path = new Path("<lakefs://test1/main/sample1.json>")
    val  fs =  FileSystem.get(URI.create(uri), conf)
    fs.getFileStatus(path)
e
Sounds like a good plan. Good luck!
v
Thank you @Elad Lachmi, I appreciate the help. Will let you know my findings.
e
Sure, happy to help 🙂
v
I have made the changes to my
LakeFSFileSystem.java
and I am running the server with the Go command
lakefs % go run main.go run --local-settings
I have created a Hadoop client locally and I am trying to test my changes (the response code in the exception) to the lakeFS code. However, I cannot see my changes reflected in the logs when I run the client shown below. Do you know what could be the issue here?
Copy code
package org.example

import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import java.net.URI

object SparkSessionTest {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("fs.s3a.access.key", "minioadmin")
    conf.set("fs.s3a.secret.key", "minioadmin")
    conf.set("fs.lakefs.access.key", "AKIAJLDV6JMK2R5TRQSQ")
    conf.set("fs.lakefs.secret.key", "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")
    conf.set("fs.lakefs.endpoint", "http://localhost:8000/api/v1")
    conf.set("fs.s3a.endpoint", "http://127.0.0.1:9090")
    conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")

    val uri = "lakefs://test1/main/sample1.json"
    val path = new Path(uri)
    val fs = FileSystem.get(URI.create(uri), conf)
    fs.getFileStatus(path)
  }
}
i
Hi, I would first make sure that I’m setting the correct log level; second, I would make sure that the code changes you made are reflected in the code that is actually running.
v
@Isan Rivkin 1. I am using LOG.info in
LakeFSFileSystem.java
. Let me know if that is incorrect. 2. I have the server running from this changed code, and I have the client running separately. This I have checked.
I have observed something unusual. Earlier, when I was running lakeFS and MinIO in a docker setup (the Everything Bagel), as soon as I uploaded a file to a lakeFS repo it got reflected in MinIO as well. Now that I am running lakeFS from the terminal using the go command, with MinIO still in docker, I cannot see the same behaviour; the files are not syncing. Instead I am getting an error like
lakeFS blockstore type local unsupported by this FileSystem
i
I can’t tell if the INFO log level is the right one; it all depends on the logs you are trying to see. If you’re logging at DEBUG then you should probably use that. Regarding your error: looking at the PhysicalAddressTranslator.java class, the Hadoop client supports s3 and does not support local storage.
v
Yes, but I am using MinIO storage and I have set the properties accordingly in the client. I have tried with
LOG.debug
as well, but it is not working. Can someone from lakeFS connect with me for some time on this? It is not a really complex PR, but setting things up for testing has taken me a lot of time.
i
It would be more beneficial for the community to be able to search different questions. I’m not sure what issue you are currently facing, outside of the PR you are trying to do. What is the issue you are asking about? This thread contains a lot of conversations. Is your current problem that you run lakeFS and expect to see the logs you added, but don’t actually see them?
v
I am working on this issue. Upon further investigation I learned that I have to work on catching some server responses in this file. So now I am trying to add some debug statements in the same file, under the
getFileStatus
function. I am trying to hit lakeFS from outside by creating the Hadoop client and then checking whether I get those logs or not.
i
Thanks for the context. To solve the logs issue, I would suggest adding a simple log statement at the very start of the Java code you are running and seeing whether it logs; if not, debug it from there.
v
I have tried almost everything around logs, but it is not working.
i
If you see other logs, then I guess either the Java code you are running is not the code you think is running (and therefore your logs are not appearing), or something overrides the log level. Try making a change that’s not related to logs and see if the behavior changes at all.
v
@Isan Rivkin I think I found the issue. I am running this compose file, which obviously pulls the latest lakeFS image and not my changed code. When I added some logs to the main.go file and ran it locally, I could see the logs from
main.go
on my terminal. Now, to narrow down the problem: how should I set the blockstore to MinIO when I run using the go run command? In other words, what arguments are required to set the below properties when running
go run main.go run --local-settings
Copy code
## Environment variables from the docker compose file that set the blockstore to MinIO
- LAKEFS_DATABASE_TYPE=local
- LAKEFS_BLOCKSTORE_TYPE=s3
- LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE=true
- LAKEFS_BLOCKSTORE_S3_ENDPOINT=http://minio:9000
- LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=minioadmin
- LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY=minioadmin
- LAKEFS_AUTH_ENCRYPT_SECRET_KEY=some random secret string
- LAKEFS_STATS_ENABLED
- LAKEFS_LOGGING_LEVEL
- LAKECTL_CREDENTIALS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
- LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
- LAKECTL_SERVER_ENDPOINT_URL=http://localhost:8000
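For reference, one hedged option is to export the same LAKEFS_-prefixed variables in the shell that runs the server, since that is how the compose file passes them. The variable names below are copied from the compose snippet above; whether --local-settings must be dropped once explicit blockstore settings are supplied is an assumption.
Copy code
# Hedged sketch: reuse the compose file's environment variables for a locally
# run server. Adjust the MinIO endpoint to one reachable from the host.
export LAKEFS_DATABASE_TYPE=local
export LAKEFS_BLOCKSTORE_TYPE=s3
export LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE=true
export LAKEFS_BLOCKSTORE_S3_ENDPOINT=http://127.0.0.1:9000
export LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=minioadmin
export LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY=minioadmin
export LAKEFS_AUTH_ENCRYPT_SECRET_KEY="some random secret string"
# --local-settings applies local defaults, so it may need to be dropped once
# explicit configuration is provided (assumption)
go run main.go run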
i
Glad the logs issue is solved.
v
yes, but I still have the other part left to be solved 😄
a
I believe you can change your compose file to run treeverse/lakefs:dev. Now if you
make build-docker
and rm the lakeFS container and start a new one, you should be able to see your code. Alternatively, you might be able to
docker cp
a lakefs executable into the container on /app/lakefs. Restart that container and your code should run. (This one can be trickier if your container runs a sufficiently different version of Linux than your physical machine.)
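A rough sketch of that second option, assuming the server binary builds with a plain go build and that the running container is named lakefs (both the build path and the container name are placeholders):
Copy code
# Hedged sketch: build the lakefs binary locally, copy it over the executable
# inside the running container, and restart the container.
go build -o lakefs ./cmd/lakefs        # build path is an assumption
docker cp ./lakefs lakefs:/app/lakefs  # overwrite the executable inside the container
docker restart lakefs                  # restart so the new binary is picked up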
v
As of now I have found this config doc, so I was creating a `.lakefs.yaml` file and adding configs this way:
database.type="local"
Let me know if I am going in the right direction. Sorry, but I didn't exactly follow the docker-related stuff you suggested. I think you are saying to build the local image using pom.xml? Can you share some doc for me to look into as well?
I have set the config the below way in
.lakefs.yaml
but I am getting some syntax issues 🫣
Copy code
database.type="local"
blockstore.type="s3"
blockstore.s3.force_path_style=true
blockstore.s3.endpoint="http://minio:9000"
blockstore.s3.credentials.access_key_id="minioadmin"
blockstore.s3.credentials.secret_access_key="minioadmin"
a
That's not YAML. I think you are having all of these configuration issues that have nothing to do with the issue you are trying to resolve. You might consider taking some time off from fixing the issue to read up on how lakeFS configuration works. We have some examples on the configuration page (here's one). We try to make our configuration very similar to that of other systems, so hopefully it will also be helpful for configuring other systems.
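For illustration only, the same settings written as nested YAML might look roughly like the sketch below, assuming each dotted key maps to a nested section; verify the exact key names against the configuration reference.
Copy code
# Hedged sketch of a .lakefs.yaml; key names follow the dotted keys above,
# mapped to nested YAML sections (verify against the configuration reference).
database:
  type: local
blockstore:
  type: s3
  s3:
    force_path_style: true
    endpoint: http://127.0.0.1:9000   # a MinIO address reachable from the host
    credentials:
      access_key_id: minioadmin
      secret_access_key: minioadmin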
v
Yes, I think I am not using the configs the right way, so the links between the lakeFS server, MinIO, and the Hadoop client are broken.