Nethsara Siyum
03/05/2024, 6:58 PM
partitioning = ['cust_id', 'file_name', 'added_year', 'added_month', 'added_date']
loop = asyncio.get_event_loop()
s3_path = f"s3://{AWS_BUCKET_NAME}/parquet_data"
await loop.run_in_executor(None, lambda: wr.s3.to_parquet(
    df=batch.to_pandas(),
    path=s3_path,
    dataset=True,
    max_rows_by_file=MAX_ROWS_PER_FILE,
    use_threads=True,
    partition_cols=partitioning,
    mode='append',
    boto3_session=s3_session,
    filename_prefix=basename_template
))
Like this.
Niro
03/05/2024, 8:12 PM
Nethsara Siyum
03/05/2024, 8:33 PM
s3_path = "s3://customer-data/parquet_data"
I created the boto3 client and session like this:
s3 = boto3.client('s3',
    endpoint_url=lakefsEndPoint,
    aws_access_key_id=lakefsAccessKey,
    aws_secret_access_key=lakefsSecretKey
)
s3_session = boto3.Session(
    aws_access_key_id=lakefsAccessKey,
    aws_secret_access_key=lakefsSecretKey
)
The problem is that the write fails with this error:
ERROR:app.config:Error occurred during writing to Parquet file: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records.
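A possible reading of this error, offered as an assumption and not a confirmed diagnosis: `boto3.Session` accepts credentials but no `endpoint_url`, so a session passed to `wr.s3.to_parquet` may send `PutObject` to AWS S3 itself, where the lakeFS access key does not exist. The minimal sketch below assumes awswrangler's global `wr.config.s3_endpoint_url` setting is used to redirect S3 calls; `session_credentials` and `configure_for_lakefs` are hypothetical helper names, not part of either library.

```python
def session_credentials(access_key: str, secret_key: str) -> dict:
    """Keyword arguments for boto3.Session; a Session takes no endpoint_url."""
    return {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


def configure_for_lakefs(endpoint_url: str, access_key: str, secret_key: str):
    """Hypothetical helper: point awswrangler's S3 calls at a lakeFS endpoint."""
    # Imported lazily so the pure helper above works without these packages.
    import awswrangler as wr
    import boto3

    # Assumption: this global setting makes awswrangler's S3 clients use the
    # lakeFS S3 gateway instead of the default AWS endpoint.
    wr.config.s3_endpoint_url = endpoint_url
    return boto3.Session(**session_credentials(access_key, secret_key))
```

Under these assumptions, `s3_session = configure_for_lakefs(lakefsEndPoint, lakefsAccessKey, lakefsSecretKey)` would replace the bare `boto3.Session(...)` call, and the session would be passed to `wr.s3.to_parquet` as before.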
Niro
03/05/2024, 8:35 PM
Nethsara Siyum
03/05/2024, 8:39 PM
Niro
03/05/2024, 8:42 PM
s3://customer-data/main/parquet_data
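The path above follows the lakeFS S3 gateway layout, where the repository takes the place of the bucket and the branch is the first component of the key. A small sketch of building such a path, assuming `customer-data` is the repository and `main` the branch; `lakefs_uri` is a hypothetical helper name:

```python
def lakefs_uri(repository: str, branch: str, path: str = "") -> str:
    """Build an s3:// URI of the form s3://<repository>/<branch>/<path>."""
    parts = [repository.strip("/"), branch.strip("/")]
    # Only append the object path when one is given.
    if path.strip("/"):
        parts.append(path.strip("/"))
    return "s3://" + "/".join(parts)


s3_path = lakefs_uri("customer-data", "main", "parquet_data")
print(s3_path)  # s3://customer-data/main/parquet_data
```

Keeping the branch as a separate argument makes it harder to drop it by accident, as in the earlier `s3://customer-data/parquet_data` path.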
Nethsara Siyum
03/05/2024, 8:43 PM
ERROR:app.config:Error occurred during writing to Parquet file: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records.
Niro
03/05/2024, 8:44 PM
Nethsara Siyum
03/05/2024, 8:44 PM
lakefsEndPoint = 'http://localhost:8000'
lakefsAccessKey = 'AKIAJxxMVQ'
lakefsSecretKey = 'I1V0Mp26GxxtJYUYF4TL'
Niro
03/05/2024, 8:45 PM
Nethsara Siyum
03/05/2024, 8:46 PM
Niro
03/05/2024, 8:52 PM
Nethsara Siyum
03/06/2024, 9:51 AM