Temilola Onaneye
03/08/2023, 7:05 PM
Guy Hardonag
03/08/2023, 7:13 PM
Temilola Onaneye
03/08/2023, 7:16 PM
Amit Kesarwani
03/08/2023, 7:24 PM
Temilola Onaneye
03/08/2023, 7:44 PM
Guy Hardonag
03/08/2023, 8:12 PM
repo and branch when accessing files
Assuming your lakeFS is running on localhost:8000 and you want to access a file that exists at:
Repo: your-repo
Branch: your-branch
File path: path/to/file.csv
It should look something like this:
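import pandas as pd

# The credential names below are placeholders; when talking to lakeFS they hold your lakeFS credentials
df = pd.read_csv(
    "s3://your-repo/your-branch/path/to/file.csv",
    storage_options={
        "key": AWS_ACCESS_KEY_ID,
        "secret": AWS_SECRET_ACCESS_KEY,
        "token": AWS_SESSION_TOKEN,
        # The endpoint must be a full URL, including the http:// scheme
        "client_kwargs": {"endpoint_url": "http://localhost:8000"},
    },
)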
Temilola Onaneye
03/08/2023, 8:56 PM
Barak Amar
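import pandas as pd

# The repository acts as the bucket; the branch is the first part of the key
df = pd.read_csv(
    "s3://repo/main/data.csv",
    storage_options={
        "key": "AKIAIOSFDNN7EXAMPLEQ",
        "secret": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "client_kwargs": {
            # Send requests to the local lakeFS instance instead of AWS
            "endpoint_url": "http://localhost:8000",
        },
    },
)
print(df.to_string())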
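A minimal sketch of how the path and options in these examples fit together, assuming lakeFS's S3-compatible endpoint is at http://localhost:8000. The helper names lakefs_s3_uri and lakefs_storage_options are hypothetical, not part of pandas or lakeFS, and the credential strings are placeholders. The scheme check is there because the underlying S3 client expects a full URL such as http://localhost:8000, not a bare host:port.

```python
from urllib.parse import urlparse


def lakefs_s3_uri(repo: str, branch: str, path: str) -> str:
    """Build s3://<repo>/<branch>/<path>.

    The repository plays the role of the bucket, and the branch
    is the first segment of the object key.
    """
    return f"s3://{repo}/{branch}/{path.lstrip('/')}"


def lakefs_storage_options(key: str, secret: str, endpoint_url: str) -> dict:
    """Build the storage_options dict that pandas passes on to s3fs."""
    parsed = urlparse(endpoint_url)
    if parsed.scheme not in ("http", "https"):
        # Reject values like "localhost:8000" early, with a clear message
        raise ValueError(
            f"endpoint_url needs an http:// or https:// scheme: {endpoint_url!r}"
        )
    return {
        "key": key,
        "secret": secret,
        "client_kwargs": {"endpoint_url": endpoint_url},
    }


# Placeholder values; substitute your own repo, branch, and credentials
uri = lakefs_s3_uri("your-repo", "your-branch", "path/to/file.csv")
options = lakefs_storage_options(
    "<access-key>", "<secret-key>", "http://localhost:8000"
)
print(uri)
```

The results would then be passed along as pd.read_csv(uri, storage_options=options).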
As @Guy Hardonag mentioned, the read_csv call here goes to lakeFS; the example runs against a local instance.
Temilola Onaneye
03/08/2023, 9:17 PM
Barak Amar
boto package or s3fs (like the code above) - both use lakeFS's S3-compatible API.
lakeFS also provides an API and a Python SDK, the lakefs_client package - https://pydocs.lakefs.io/.
Temilola Onaneye
03/09/2023, 11:09 AM
Barak Amar
Temilola Onaneye
03/09/2023, 11:11 AM
Barak Amar
Temilola Onaneye
03/09/2023, 11:12 AM
Barak Amar
import boto3

# Point the S3 client at the local lakeFS instance instead of AWS
session = boto3.session.Session()
s3_client = session.client(
    service_name='s3',
    aws_access_key_id='AKIAIOSFDNN7EXAMPLEQ',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
    endpoint_url='http://localhost:8000',
)
# Bucket is the repository; the key starts with the branch name
s3_client.download_file('repo', 'main/userdata1.parquet', 'userdata1.parquet')
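A small sketch of how an s3:// path like the one used with pandas maps onto the bucket and key arguments in the boto3 call above. The helper name split_lakefs_uri is hypothetical and only does string handling; it does not contact lakeFS.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_lakefs_uri(uri: str) -> Tuple[str, str]:
    """Split s3://<repo>/<branch>/<path> into (bucket, key) for boto3.

    The bucket is the repository name; the key is "<branch>/<path>".
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3://<repo>/<branch>/<path> URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_lakefs_uri("s3://repo/main/userdata1.parquet")
print(bucket, key)
```

With these values the download would read s3_client.download_file(bucket, key, 'userdata1.parquet').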
Temilola Onaneye
03/09/2023, 11:21 AM
Barak Amar
import lakefs_client
from lakefs_client.client import LakeFSClient

# Authenticate with the lakeFS access key (username) and secret key (password)
configuration = lakefs_client.Configuration()
configuration.username = 'AKIAIOSFDNN7EXAMPLEQ'
configuration.password = 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
configuration.host = 'http://localhost:8000'
client = LakeFSClient(configuration)

# Parenthesized context managers require Python 3.10+
with (
    client.objects.get_object(repository='ugc', ref='main', path='userdata1.parquet') as f,
    open('userdata1.parquet', "wb") as o
):
    # Copy the object from lakeFS to a local file
    o.write(f.read())
Temilola Onaneye
03/09/2023, 11:35 AM