# help
r
Hey everyone, I've been trying to read lakeFS repository data from Python but haven't been able to get it working. Can somebody guide me or suggest an approach? In lakeFS the data is stored like this: Lakefs://tab/main/kpi/matic/year=22/month=10/day=10/file.parquet So each day's data sits under this year/month/day partition pattern. My requirement is to read the whole matic table's data at once.
a
Hi Rahul, it is usually easiest to read data using the S3 gateway. If you set the endpoint URL to point at the lakeFS server, your Python (or any other language) programs can read and write objects on lakeFS exactly as they would on S3. Examples of how to do this with Boto (the S3 client for Python) appear in our integrations section. This also works with other programs that have S3 clients, including Spark and most other data processing programs.

The short version: set up your Boto client to read from lakeFS according to the Boto (Python) integration guide, and tell it to read from an S3-style URL. In your case it will look like s3://tab/main/kpi/matic/ (the repository tab acts as the bucket, and the branch main is the first part of the key), with whatever partition arguments you need.

The advantage is that if you can do it with Python on S3, you already know how to do it with Python on lakeFS. Does this help?
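To make the short version concrete, here is a minimal sketch of reading every Parquet file under the matic prefix through the S3 gateway. It is one possible approach, not an official lakeFS recipe. It assumes boto3 and pandas (with pyarrow or fastparquet) are installed; the endpoint URL and credentials in the usage comments are placeholders, and the helper names (make_client, list_parquet_keys, parse_partitions, read_table) are made up for illustration. Only the repository tab, the branch main, and the path kpi/matic/ come from the question.

```python
import io


def make_client(endpoint_url, access_key, secret_key):
    """Build an S3 client that talks to the lakeFS S3 gateway instead of AWS."""
    import boto3  # imported here so the pure helpers below work without it

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,  # assumption: URL of your lakeFS server
        aws_access_key_id=access_key,  # lakeFS access key ID
        aws_secret_access_key=secret_key,  # lakeFS secret access key
    )


def list_parquet_keys(client, repo, prefix):
    """Return all keys ending in .parquet under prefix, following pagination."""
    paginator = client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=repo, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".parquet"):
                keys.append(obj["Key"])
    return keys


def parse_partitions(key):
    """Extract name=value directory segments, e.g. year=22, from an object key."""
    partitions = {}
    for segment in key.split("/")[:-1]:  # skip the file name itself
        if "=" in segment:
            name, value = segment.split("=", 1)
            partitions[name] = value
    return partitions


def read_table(client, repo, prefix):
    """Download each Parquet file and concatenate them into one DataFrame."""
    import pandas as pd

    frames = []
    for key in list_parquet_keys(client, repo, prefix):
        body = client.get_object(Bucket=repo, Key=key)["Body"].read()
        frame = pd.read_parquet(io.BytesIO(body))
        # Partition values live in the path, not the file, so add them as
        # string columns.
        frames.append(frame.assign(**parse_partitions(key)))
    if not frames:
        raise FileNotFoundError(f"no Parquet files under s3://{repo}/{prefix}")
    return pd.concat(frames, ignore_index=True)


# Usage (placeholder endpoint and credentials, replace with your own):
# client = make_client("https://lakefs.example.com", "<access-key-id>", "<secret-key>")
# df = read_table(client, repo="tab", prefix="main/kpi/matic/")
```

Narrowing the prefix, for example to main/kpi/matic/year=22/month=10/, would read a single month instead of the whole table. This version loads every file into memory, so for a large table an engine such as Spark pointed at the same endpoint may be a better fit.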