Powered by Linen
general
  • g

    Giorgio Zoppi

    08/25/2020, 7:18 AM
    we did this maybe can be useful https://github.com/RstorLabs/rs-benchmark
    👍 1
    a
    1 reply · 2 participants
  • a

    Andor Markus

    08/26/2020, 1:45 PM
    Hi Itai, I'm looking for the recommended instance size on AWS. Can it run on RISC, or does it need x86?
    a
    3 replies · 2 participants
  • y

    Yael Rivkind

    08/27/2020, 3:46 PM
    Welcome @Joydeep Banik Roy!
    👋🏻 1
    j
    1 reply · 2 participants
  • j

    Joydeep Banik Roy

    09/13/2020, 8:43 AM
    If you guys want me to specifically look into some issues or point me to some doc
    a
    1 reply · 2 participants
  • o

    Oz Katz

    09/13/2020, 1:08 PM
    I'll be giving a talk this Tuesday on Data Versioning - use cases, tool comparison and an overview of lakeFS - feel free to join here: https://www.meetup.com/Big-things-are-happening-here/events/272943689/
    👍 1
    a
    2 replies · 2 participants
  • a

    Ariel Shaqed (Scolnicov)

    09/24/2020, 6:39 AM
    Welcome, Devarsh! You'll probably want to read https://hacktoberfest.digitalocean.com/details/. But tl;dr:
    • Optional: Find an issue https://github.com/treeverse/lakeFS/issues?q=is%3Aissue+is%3Aopen+label%3Ahacktoberfest that isn't assigned to anyone
    • Optional: Ping us to assign you to that issue
    • Please read https://docs.lakefs.io/contributing (it's short and simple!)
    • Send us a PR during October
    We're always happy to accept PRs for stuff that isn't on our list! Please do ask any questions here or on #dev, or just DM me. Looking forward to seeing you on GitHub!
    d
    p
    4 replies · 3 participants
  • o

    Oz Katz

    10/05/2020, 10:47 AM
    Hey @ABC 🙂 - in short: DVC is more focused on ML pipeline management including versioning of sample data, model and metrics. lakeFS focuses on Data Lake development life cycle - making changes to very large scales of data safely by allowing isolation, revert and CI/CD capabilities on the data itself. For a more thorough overview, you can check out this talk I gave recently comparing the different data versioning tools:

    https://www.youtube.com/watch?v=wMwhVeU36bc

    (including DVC)
    💯 2
    a
    1 reply · 2 participants
  • f

    Furkan demir

    10/09/2020, 1:13 PM
    hi, everyone
    👋 2
    a
    1 reply · 2 participants
  • p

    Priyanshu Gaikwad

    10/12/2020, 10:48 AM
    Hey!
    👋 5
    b
    a
    2 replies · 3 participants
  • e

    einat.orr

    11/06/2020, 3:04 PM
    Are you currently using an object storage?
    n
    2 replies · 2 participants
  • t

    T. P.

    11/13/2020, 3:51 PM
    i have a working pg, minio, lakefs within eks in case anyone needs help in future, please ping me
    🎊 1
    o
    5 replies · 2 participants
  • t

    T. P.

    11/13/2020, 3:57 PM
    @Barak Amar really appreciate your two points above as you mentioned, they are not in docs 🙂 those were the key, again, thanks
    👍 1
    b
    2 replies · 2 participants
  • t

    T. P.

    11/13/2020, 9:18 PM
    is lakefs k8s operator in development ?
    o
    1 reply · 2 participants
  • g

    Giorgio Zoppi

    11/16/2020, 4:43 PM
    what are you planning for testing
    i
    13 replies · 2 participants
  • d

    Daniel Shuy

    11/18/2020, 7:28 AM
    hello, will there be additional swag for Hacktoberfest contributors? :P
    y
    s
    2 replies · 3 participants
  • m

    murat migdisoglu

    11/27/2020, 10:07 AM
    Hi, I'm a newbie on lakefs and congrats to the team contributing to it. It might be me but I couldn't find the supported operations on S3. I've set up a docker swarm env using minio and lakefs (for testing purposes). When I try to revert a branch to a certain commit id, I get the following error:
    lakectl branch revert lakefs://example-repo@master --commit ~KJ8Wd1Rs96Z
    Error executing command: feature not supported
    Is that expected?
    o
    3 replies · 2 participants
  • o

    Oz Katz

    12/02/2020, 11:51 AM
    we support Python using both boto - https://docs.lakefs.io/using/boto.html and our full api - https://docs.lakefs.io/using/python.html
    s
    1 reply · 2 participants
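A minimal sketch of the boto route mentioned above, assuming lakeFS is reachable through its S3-compatible endpoint, with the repository acting as the bucket and the branch as the first segment of the object key. The endpoint URL, the credentials, and the helper names `lakefs_key`, `make_client`, and `upload` are hypothetical and only illustrate the idea; the linked boto page is the authoritative reference.

```python
# Sketch: talking to lakeFS with boto3 through its S3-compatible endpoint.
# Assumed values: the endpoint URL and any repository/branch names passed in.

LAKEFS_ENDPOINT = "http://localhost:8000"  # hypothetical lakeFS address


def lakefs_key(branch: str, path: str) -> str:
    """Build the object key, with the branch as the first path segment."""
    return f"{branch.strip('/')}/{path.lstrip('/')}"


def make_client(access_key: str, secret_key: str):
    """Create a boto3 S3 client pointed at lakeFS instead of AWS."""
    import boto3  # imported here so lakefs_key() works without boto3 installed

    return boto3.client(
        "s3",
        endpoint_url=LAKEFS_ENDPOINT,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )


def upload(client, repo: str, branch: str, path: str, body: bytes) -> None:
    """Write an object to a branch; the repository plays the role of the bucket."""
    client.put_object(Bucket=repo, Key=lakefs_key(branch, path), Body=body)
```

With a running server, something like `upload(make_client(key_id, secret), "example-repo", "master", "data/file.csv", b"a,b\n")` would write to the `master` branch of `example-repo`; changing the branch argument is what targets a different branch.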
  • k

    Kyle Bader

    12/03/2020, 3:32 PM
    where iceberg is limited to tabular data, this sort of design would be good for other types as well
    t
    1 reply · 2 participants
  • k

    Kyle Bader

    12/10/2020, 3:48 PM
    I think time travel is sufficient, and probably easier to reason about than a DAG for tabular data. If I can point a workload to a point in time, I have reproducibility in terms of interacting with the same data
    a
    1 reply · 2 participants
  • s

    sys

    01/11/2021, 2:07 PM
    Hello there, How does the integration of lakefs with Airflow work? Thanks
    o
    3 replies · 2 participants
  • e

    einat.orr

    01/29/2021, 7:57 AM
    Main two things to note are they only support Iceberg and potentially Delta, and their architecture is less cost effective (dynamo).
    k
    j
    +1
    8 replies · 4 participants
  • a

    Ariel Shaqed (Scolnicov)

    02/09/2021, 1:07 PM
    Hi, tl;dr: Please fill out https://forms.gle/cLAzDMpTB9vpndR6A if you can answer some questions about a Spark client for reading lakeFS metadata. We're designing a thick client for lakeFS metadata. It will allow writing Spark jobs to perform operations on metadata (inventory of files in a commit) on lakeFS. I'm interested in what features would be most useful to you. If you could fill out https://forms.gle/cLAzDMpTB9vpndR6A (should take 5 minutes) it would be great! I shall summarize results here. If you like, fill out your email on the form and I'll be happy to email you the results. Thanks!
    👍 6
    1 reply · 1 participant
  • e

    einat.orr

    02/19/2021, 5:33 PM
    Hi Barrett, yes, a client is on our roadmap, to be released in a few weeks. Please share your use case; it is important for our learning of the needs.
    b
    o
    7 replies · 3 participants
  • b

    Barrett Strausser

    02/22/2021, 1:23 AM
    Following up on that previous question, I have another. I think I know the answer based on my reading of the docs, but it's worth an ask. What is the assumption of ownership of the underlying object on the part of lakeFS?
    o
    8 replies · 2 participants
  • e

    einat.orr

    02/23/2021, 6:47 AM
    Hi Nishant, yes, you can. Here's how: https://lakefs.io/git-like-operations-over-minio-with-lakefs/
    j
    1 reply · 2 participants
  • e

    einat.orr

    02/23/2021, 7:02 AM
    You can use lakeFS with Hadoop, Spark and Hive. See the section "use lakeFS with" in our documentation for how. However, lakeFS doesn't support HDFS as a storage layer. For storage you need to use an object storage such as minIO.
    n
    10 replies · 2 participants
  • a

    Adly Mousa

    03/15/2021, 10:12 AM
    Hello guys, I found out about this project yesterday and it's amazing what you're doing here, I am really impressed. I need to start navigating the tool but I wanna make sure if it integrates with AWS Glue or not. If so, could u please send me some reference on how to do that? I googled a bit but it seems there aren't many resources.
    y
    4 replies · 2 participants
  • y

    Yoni Augarten

    04/01/2021, 7:49 AM
    Hi @Jai Jagani, welcome! This post explains how to use lakeFS over MinIO: https://lakefs.io/git-like-operations-over-minio-with-lakefs/ The example there starts lakeFS using docker-compose, with postgres also running as a docker container.
    j
    18 replies · 2 participants
  • j

    Jai Jagani

    04/06/2021, 4:25 AM
    Can you please let me know how to install lakectl ?
    o
    2 replies · 2 participants
  • d

    Dinakar Chennubotla

    04/06/2021, 8:23 AM
    I am working on a POC, i.e. I need to send data from Kafka to lakeFS and then finally to MinIO. Can anyone help with docs or a knowledge base? Please do the needful.
    a
    8 replies · 2 participants
a

Ariel Shaqed (Scolnicov)

04/06/2021, 8:53 AM
Hi Dinakar! I'd start with the quick-start section of the docs at https://docs.lakefs.io/quickstart/. The left-hand sidebar on all docs pages has links to all sections, including detailed deployment instructions and full reference.
When you get going and have some specific questions... It is usually best to ask questions on #help, as we monitor that channel most closely and will be able to respond sooner. Good luck!
d

Dinakar Chennubotla

04/06/2021, 9:02 AM
great and thank you
My question:
=============
1. How can I integrate lakeFS with MinIO? I need your help with the config files and what I should tweak.
a

Ariel Shaqed (Scolnicov)

04/06/2021, 9:17 AM
Our @Yoni Augarten wrote a great blog post https://lakefs.io/git-like-operations-over-minio-with-lakefs/ that does exactly that!
d

Dinakar Chennubotla

04/06/2021, 10:05 AM
cool, it is very well documented.
thanks a lot
will continue my work