# help
Hi, another question. This may be a bug, not sure. I did some file importing into an ingestion branch, merged it to another branch, partitioned the merged-in data into another folder, committed that, and then merged that branch to main.
• The data I "imported" to the first branch shows its physical location in S3, in a separate bucket from the repo, which is correct.
• The partitioned data that I generated shows as internal to lakeFS, which is also correct (the files are on S3 under the repo's folder).
After that, I exported the main branch to a location on S3. I used spark-submit for this, for the time being, with this command:
spark-submit --conf "spark.hadoop.lakefs.api.url=http://<host>:8000/api/v1" \
  --conf "spark.hadoop.lakefs.api.access_key=<access-key>" \
  --conf "spark.hadoop.lakefs.api.secret_key=<secret-key>" \
  --packages io.lakefs:lakefs-spark-client_2.12:0.13.0 \
  --class io.treeverse.clients.Main export-app ct-raw "s3://<bucket>/ct-raw-export" --branch=main
The export successfully wrote out the partitioned data, but the other files that I imported did not get written out (this may be intentional), and I got a file in the root of the ct-raw-export folder that I can't identify: it's a binary file, so I can't see what's in it.
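For reference, a rough lakectl sketch of the flow I described above, plus how I'm checking where an object physically lives. This is only a sketch: the repo name ct-raw and branch main come from my setup, but the branch names ingestion and partitioned, the source bucket, and the object paths are made-up placeholders, and it assumes lakectl is already configured against the same lakeFS endpoint:

# Create an ingestion branch off main (branch names here are hypothetical).
lakectl branch create lakefs://ct-raw/ingestion --source lakefs://ct-raw/main

# Import objects from an external bucket; imported objects keep their
# original S3 location as their physical address.
lakectl import \
  --from s3://<source-bucket>/raw/ \
  --to lakefs://ct-raw/ingestion/raw/

# Merge the ingestion branch, write the partitioned output (via Spark),
# commit it, and merge that branch to main.
lakectl merge lakefs://ct-raw/ingestion lakefs://ct-raw/partitioned
lakectl commit lakefs://ct-raw/partitioned -m "partition imported data"
lakectl merge lakefs://ct-raw/partitioned lakefs://ct-raw/main

# Check an object's physical address: imported objects show the external
# bucket, objects written through lakeFS show the repo's storage namespace.
lakectl fs stat lakefs://ct-raw/main/partitioned/part-00000.parquet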
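This is how I've been poking at the unknown file from the S3 side, in case it helps; the bucket and prefix are the ones from the export command above, and <unknown-file> is just a placeholder since I haven't pasted the actual key here:

# List everything the exporter wrote, including the file at the root.
aws s3 ls --recursive s3://<bucket>/ct-raw-export/

# Dump the first bytes of the unknown object as hex to get a hint of its format.
aws s3 cp s3://<bucket>/ct-raw-export/<unknown-file> - | head -c 256 | xxd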