blockers_for_windward
  • a

    Ariel Shaqed (Scolnicov)

    12/22/2021, 12:20 PM
    https://github.com/treeverse/lakeFS/issues/2810#issuecomment-999534511 has a reproduction of the content-length issue without loading any lakeFS libs into Spark. I know that yesterday we saw something else; perhaps some libraries are different inside your Spark cluster? @Daniel Satubi I'd be grateful if you could take a look; I give MD5s (and versions...) for the exact JARs used in the reproduction, and would be happy for you to sanity-check them. Thanks!
    d
    3 replies · 2 participants
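    For reference, a quick way to do the JAR sanity check Ariel is asking for is to hash the JARs actually present on the cluster and compare them against the MD5s listed in the issue. The paths below are only examples and will differ per installation:
    # Compare the JARs on the Spark cluster against the MD5s given in the GitHub issue.
    # Paths are placeholders; adjust to your Spark/Hadoop installation.
    md5sum "$SPARK_HOME"/jars/hadoop-*.jar "$SPARK_HOME"/jars/aws-java-sdk*.jar
    # JARs resolved at runtime via --packages land in the Ivy cache:
    find ~/.ivy2/jars -name '*.jar' -exec md5sum {} +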
  • l

    Lior Resisi

    12/22/2021, 5:00 PM
    @Barak Amar and @Daniel Satubi - could you please work together tomorrow with an open Zoom so you'll be able to make progress on the other 2 issues?
    👍 1
    b
    d
    7 replies · 3 participants
  • d

    Daniel Satubi

    12/23/2021, 12:41 PM
    Hi, just wanted to update on our status: we’re currently stabilising our environment so we won’t be blocked by the current limitations. Just to verify: the next step on our side is to switch from using “hadoop-lakefs” as part of our pom dependencies to providing “hadoop-lakefs-assembly” as a package for the spark-submit command.
    b
    1 reply · 2 participants
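    As a rough illustration of the switch described above, a spark-submit invocation that provides the lakeFS Hadoop filesystem as a package (instead of bundling hadoop-lakefs via the pom) might look like the sketch below. The version, endpoint, keys and application class are placeholders, not values taken from this thread:
    # Hypothetical sketch - provide hadoop-lakefs-assembly at submit time via --packages.
    # All <...> values and the application class/jar are placeholders.
    spark-submit \
      --packages io.lakefs:hadoop-lakefs-assembly:<version> \
      --conf spark.hadoop.fs.lakefs.impl=io.lakefs.LakeFSFileSystem \
      --conf spark.hadoop.fs.lakefs.access.key=<LAKEFS_ACCESS_KEY> \
      --conf spark.hadoop.fs.lakefs.secret.key=<LAKEFS_SECRET_KEY> \
      --conf spark.hadoop.fs.lakefs.endpoint=https://<lakefs-host>/api/v1 \
      --class com.example.MyApp \
      my-app.jar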
  • d

    Daniel Satubi

    02/23/2022, 8:55 AM
    Hi, good morning Tal!
    • Since we fail the diff we are not doing the merge, but previously when we received those timeouts it happened in both merge and diff.
    • We increased the timeframe of the app from 3 years to 10 years, 1 file per day in both cases.
    • The call we make is:
    refsApi.diffRefs(repositoryName, branchName, sourceBranchName, offset, batchSize, prefix, null, null, null);
    ◦ branchName is the current app branch
    ◦ sourceBranchName is “main”
    ◦ batchSize = 1000
    ◦ prefix = ww-production3/merged-monthly-vesselstories/ww_partition_date=20
    • We branch from main when we start an app, write the new data (overriding previous data), commit it to the branch, and then diff our branch with main to see if someone else wrote to the files we changed (the prefix).
    How can we verify we are using the 100GB cache? What is the measure for long operations (number of files, size of files)? Thanks 🙂
    👀 1
    1 reply · 1 participant
  • d

    Daniel Satubi

    02/24/2022, 12:57 PM
    here’s the request url:
    api/v1/repositories/windward/refs/main/diff/production3-vs-monthly-merger-app-2022-02-24_07-44-44-295?after=&amount=100&delimiter=/&prefix=
    o
    3 replies · 2 participants
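    For completeness, the same request can be re-issued from the command line, which is sometimes handy for timing the diff in isolation. This is a hypothetical sketch - host and credentials are placeholders - that reproduces the URL above with URL-encoded query parameters:
    # Re-issue the diff request shown above against the lakeFS REST API.
    # <lakefs-host> and the credentials are placeholders.
    curl -s -G -u "$LAKEFS_ACCESS_KEY:$LAKEFS_SECRET_KEY" \
      "https://<lakefs-host>/api/v1/repositories/windward/refs/main/diff/production3-vs-monthly-merger-app-2022-02-24_07-44-44-295" \
      --data-urlencode "after=" \
      --data-urlencode "amount=100" \
      --data-urlencode "delimiter=/" \
      --data-urlencode "prefix="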
  • a

    Ariel Shaqed (Scolnicov)

    04/27/2022, 7:00 AM
    Good morning Daniel, Lior, We are reviewing this top-level design for the proposed LakeFSOutputCommitter. If you can read it I'd love to hear your comments (before or after the merge, of course). So @Tal Sofer and I would like to meet you next week to go over our next steps. When would be convenient for you?
    l
    d
    5 replies · 3 participants
  • d

    Daniel Satubi

    10/02/2022, 11:48 AM
    Hi guys! It’s been a while… We had a problem and collected some logs from around that time. We’re not sure we understand what happened there - could you help us investigate further? Thanks 🙂
  • b

    Barak Amar

    10/02/2022, 12:21 PM
    Hi @Daniel Satubi, sure
  • d

    Daniel Satubi

    01/01/2023, 12:54 PM
    Hi guys! It’s been a while… 🙂 We’re trying to run the Garbage Collector and ran into some dependency issues. We tried running it (the sweep stage - the mark stage finished successfully) on Spark 2.4.8 (2.4.7 with --packages does not work anymore because of the Bintray deprecation). When specifying
    hadoop-aws 2.7.7
    in packages, it installs a dependency of
    aws-java-sdk 1.7.4
    and we get a NoClassDefFoundError (the class definition is found in newer
    aws-java-sdk
    versions, but not in 1.7.4):
    Caused by: java.lang.NoClassDefFoundError: com/amazonaws/auth/AWSStaticCredentialsProvider
    	at io.treeverse.clients.conditional.S3ClientBuilder$.build(S3ClientBuilder.scala:26)
    	at io.treeverse.clients.BulkRemoverFactory$S3BulkRemover.getS3Client(BulkRemoverFactory.scala:69)
    	at io.treeverse.clients.BulkRemoverFactory$S3BulkRemover.deleteObjects(BulkRemoverFactory.scala:58)
    could you help us figure out how to run the sweep? Thanks :)
    o
    t
    38 replies · 3 participants
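    For context, a GC run of that era was submitted roughly as sketched below; this is not the exact command from the thread. The class name and config keys follow the lakeFS GC documentation of that period and should be checked against the client version in use, and the jar name, endpoint, keys, repository and region are placeholders:
    # Hypothetical sketch of a mark+sweep GC run on Spark 2.4.x with hadoop-aws 2.7.7.
    spark-submit \
      --class io.treeverse.clients.GarbageCollector \
      --packages org.apache.hadoop:hadoop-aws:2.7.7 \
      --conf spark.hadoop.lakefs.api.url=https://<lakefs-host>/api/v1 \
      --conf spark.hadoop.lakefs.api.access_key=<LAKEFS_ACCESS_KEY> \
      --conf spark.hadoop.lakefs.api.secret_key=<LAKEFS_SECRET_KEY> \
      --conf spark.hadoop.fs.s3a.access.key=<AWS_ACCESS_KEY> \
      --conf spark.hadoop.fs.s3a.secret.key=<AWS_SECRET_KEY> \
      lakefs-spark-client-247-assembly-<version>.jar \
      <repository> <region>
    # (AWS credentials can instead come from the instance profile / default provider chain.)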
o

Oz Katz

01/01/2023, 1:13 PM
Hey @Daniel Satubi! Sure 🙂 looping in @Tal Sofer
t

Tal Sofer

01/01/2023, 1:20 PM
Thanks @Oz Katz! and hi @Daniel Satubi 🙂 looking into it
d

Daniel Satubi

01/01/2023, 1:47 PM
Thanks, I’ll add the full stack trace with the submit command
t

Tal Sofer

01/01/2023, 1:47 PM
Thank you!
Hi @Daniel Satubi, I managed to reproduce the issue and we already have a draft PR that fixes it. We would like to run more tests tomorrow to validate the solution and then we will get the fix in. I will update you tomorrow on the progress and on when we are expecting to release the fix (I expect it to be a matter of 1-3 days). Thanks for reporting the issue and have a great evening!
d

Daniel Satubi

01/02/2023, 8:54 AM
Thanks! 🙂 let me know if I can help test it somehow
🙏 1
t

Tal Sofer

01/02/2023, 4:03 PM
Hi @Daniel Satubi 🙂 I’m getting back to you with updates. We ran some tests on the solution in the PR above and are planning to run more validations. So far, things look good, so if everything works well we will release the fix by the end of this week, hopefully before Thursday. I will of course continue to share updates with you.
d

Daniel Satubi

01/03/2023, 9:52 AM
Thanks! 🙂 I read some of the discussion in the PR - are you testing with 2.4.7 or 2.4.8? Which Hadoop version is in the Spark jars?
t

Tal Sofer

01/03/2023, 10:38 AM
i’m testing with 2.4.7, hadoop version 2.7.7
d

Daniel Satubi

01/03/2023, 2:35 PM
we have a 2.4.7 & 2.4.8 cluster with Hadoop 2.7.3 - I’m not sure it’ll work; I’ll see what we can do…
t

Tal Sofer

01/04/2023, 1:21 PM
Hi @Daniel Satubi! I’m considering an alternative solution that may be independent of your Hadoop version. I will keep you updated.
d

Daniel Satubi

01/04/2023, 1:53 PM
Thanks! looking forward… 🥳
:heart_lakefs: 2
t

Tal Sofer

01/05/2023, 2:29 PM
That’s perfect! 🤗 thanks for letting us know, and for offering to test things out! Sharing updates on our current status: we tested the solution in https://github.com/treeverse/lakeFS/pull/4920; while it works with Hadoop 2, it doesn’t work with Hadoop 3, which broke our Spark3-Hadoop3 client. We are now testing another solution that limits this change to our Hadoop 2 builds only. For that reason we will not be releasing the fix this week, but during next week after completing validations 🙂
I will update you early next week on when we expect to release. Once the client is out, we would appreciate your help in testing it out.
d

Daniel Satubi

01/05/2023, 3:56 PM
Great! thank you very much 🙂 have a nice weekend
:heart_lakefs: 1
Hi, good morning ☀️ Any news? Sorry for pinging about this so much - we saw our storage costs go up linearly with our usage (a branch per application run), but we don’t access the data after the application run (only main & ongoing apps’ branches are relevant), so we’d really like to clean up our S3 🙂 🙏
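Since the goal here is cleaning up objects from short-lived per-run branches while keeping main, the relevant knob is the repository’s GC retention rules. A sketch of what such a policy could look like is below; the day values are purely illustrative, and the exact way to apply the rules (lakectl or the API) should be taken from the garbage collection docs for your lakeFS version:
# Hypothetical GC retention policy: keep main for a long period, expire everything
# else (e.g. per-application-run branches) quickly. Values are illustrative only.
cat > gc-rules.json <<'EOF'
{
  "default_retention_days": 7,
  "branches": [
    { "branch_id": "main", "retention_days": 365 }
  ]
}
EOF
# Apply gc-rules.json with lakectl or the lakeFS API, per the GC documentation.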
t

Tal Sofer

01/09/2023, 8:57 AM
Hi Daniel, good morning!
I was planning to write to you during the next few hours but you are faster! :lakefs: All testing went successfully, but we added an integration test that simulates a run of Spark 3 with Hadoop 2.7.4, and this test fails, probably for Hadoop reasons (unrelated to lakeFS). Manual testing on an environment matching your spec worked, but I’m hoping to understand the impact on your env before we release this. We will decide today how to proceed and when to release, and I will write back to you by EOD today. I hope this helps!
d

Daniel Satubi

01/09/2023, 9:17 AM
Thanks! if you’d like to send us the artifact for testing before releasing we’d be more than happy 🙂
t

Tal Sofer

01/09/2023, 9:35 AM
Thanks for the kind offer! I will actually be happy to do so. What email can I send the artifact to?
d

Daniel Satubi

01/09/2023, 9:37 AM
daniels@windward.ai or even here if it works (not sure if there’s a size limit in slack)
t

Tal Sofer

01/09/2023, 9:37 AM
thanks, will let you know once I’ve sent it 🙂 I’m in a meeting, will do it right after
🙏 1
@Daniel Satubi I sent you the client via email, let me know if you have any issues with it. Looking forward to hearing how testing goes!
d

Daniel Satubi

01/09/2023, 10:47 AM
Thanks! Should I run it with the same command as in the docs?
--packages
with the same dependency?
t

Tal Sofer

01/09/2023, 10:48 AM
yes,
--packages org.apache.hadoop:hadoop-aws:2.7.7
it is
d

Daniel Satubi

01/09/2023, 10:58 AM
Thanks! will run and report back 🙂
🤗 1
It finished successfully but I don’t think the files we expected were deleted…
I’m re-running both mark & sweep together now instead of separate.
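For reference, the mark-only / sweep-only split mentioned here is controlled by Spark conf flags on the GC job. The flag names below are a best-effort recollection of that era’s GC docs and are assumptions to verify against the documentation for the client version in use:
# Assumed flags (verify against your lakeFS GC docs) for splitting mark and sweep:
# 1) Mark only - compute expired addresses without deleting:
#      --conf spark.hadoop.lakefs.gc.do_sweep=false
# 2) Sweep only - delete addresses produced by a previous mark run:
#      --conf spark.hadoop.lakefs.gc.do_mark=false \
#      --conf spark.hadoop.lakefs.gc.mark_id=<MARK_ID>
# 3) Mark + sweep in one run (the default): pass neither flag.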
t

Tal Sofer

01/09/2023, 11:58 AM
Thanks for reporting back 🙂
It finished successfully but I don’t think the files we expected were deleted…
For example, what did you expect to happen but didn’t happen? Do I understand correctly that you ran a sweep-only job on previously marked addresses?
I’m re-running both mark & sweep together now instead of separate.
Cool, let me know how this goes
d

Daniel Satubi

01/09/2023, 12:05 PM
Previously we deleted all stale branches and ran the mark job. We have a 300TB+ bucket for lakeFS with about 20TB of actively accessed data; we also see the data growing linearly, so we assume 20TB is roughly the size of “main”, give or take, and the ~300TB is data from the branches. We expected to see some changes after the sweep but none took effect… Can we set the job to log more info?
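On the “log more info” question, one generic option (independent of lakeFS) is to raise Spark’s own log level for the GC job by shipping a custom log4j configuration with the submit command. A hypothetical sketch, reusing the placeholder GC command from above:
# Bump driver/executor logging for the GC run via a custom log4j config (Spark 2.x / log4j 1.x).
cat > log4j-debug.properties <<'EOF'
log4j.rootCategory=INFO, console
log4j.logger.io.treeverse=DEBUG
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
EOF
spark-submit \
  --class io.treeverse.clients.GarbageCollector \
  --packages org.apache.hadoop:hadoop-aws:2.7.7 \
  --files log4j-debug.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j-debug.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j-debug.properties" \
  --conf spark.hadoop.lakefs.api.url=https://<lakefs-host>/api/v1 \
  --conf spark.hadoop.lakefs.api.access_key=<LAKEFS_ACCESS_KEY> \
  --conf spark.hadoop.lakefs.api.secret_key=<LAKEFS_SECRET_KEY> \
  lakefs-spark-client-247-assembly-<version>.jar \
  <repository> <region>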
t

Tal Sofer

01/09/2023, 1:05 PM
Really happy that the artifact we shared with you fixes the dependency issue, thanks for testing it out! :heart_lakefs: As for your question, since it is related to the way GC works, can I ask you to move it to our #help channel? I’m sure other community members can benefit from it. When you move it there, please add more details about your GC setup: for example, what retention policy you have configured, how you ran the job (mark and then sweep, or mark+sweep together), and attach any logs you feel comfortable sharing. If you think there is a bug, you are also welcome to open a GitHub issue describing it.
:lakefs: 1
d

Daniel Satubi

01/09/2023, 1:29 PM
Thanks! I’ll wait for the new run to finish, and if there’s any problem I’ll write in the #help channel. We’re very grateful for the quick replies and all the help 🙂
t

Tal Sofer

01/09/2023, 2:39 PM
Our pleasure! thank you!
Hi @Daniel Satubi, I’m getting back to you with an update on the client release - I merged the fix you’ve tested today; note that it’ll only work for a Hadoop 2.7.7 setup. Given that you already have the build, we will be releasing it by the end of the week. Does this work for you? Our integration test setup was not accurate and therefore failed; we will enable it in another PR.
d

Daniel Satubi

01/09/2023, 3:57 PM
still running…I’ll report back tomorrow with all the results and our cluster versions
t

Tal Sofer

01/09/2023, 3:58 PM
Cool! thank you!