# help
Hi, another question. This may be a bug, not sure. I did some file importing into an ingestion branch, merged it to another branch, partitioned the merged-in data into another folder, committed that, and then merged that branch to main.
• The data I "imported" to the first branch shows its physical location in S3, in a separate bucket from the repo, which is correct.
• The partitioned data that I generated shows as internal to lakeFS, which is also correct (the files are on S3 under the repo's folder).
After that, I exported the main branch to a location on S3. I used spark-submit for this, for the time being, with this command:
spark-submit --conf "spark.hadoop.lakefs.api.url=http://<host>:8000/api/v1" \
  --conf "spark.hadoop.lakefs.api.access_key=<access-key>" \
  --conf "spark.hadoop.lakefs.api.secret_key=<secret-key>" \
  --packages io.lakefs:lakefs-spark-client_2.12:0.13.0 \
  --class io.treeverse.clients.Main export-app ct-raw "s3://<bucket>/ct-raw-export" --branch=main
The export successfully wrote out the partitioned data, but the other files that I imported did not get written out (this may be intentional), and I got a file in the root of the ct-raw-export folder that I can't identify: it's a binary file, so I can't see what's in it.
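For reference, a rough lakectl sketch of the flow I described above, plus how I'm checking where an object physically lives. This is only a sketch: the repo name ct-raw and branch main come from my setup, but the branch names ingestion and partitioned, the source bucket, and the object paths are made-up placeholders, and it assumes lakectl is already configured against the same lakeFS endpoint:

# Create an ingestion branch off main (branch names here are hypothetical).
lakectl branch create lakefs://ct-raw/ingestion --source lakefs://ct-raw/main

# Import objects from an external bucket; imported objects keep their
# original S3 location as their physical address.
lakectl import \
  --from s3://<source-bucket>/raw/ \
  --to lakefs://ct-raw/ingestion/raw/

# Merge the ingestion branch, write the partitioned output (via Spark),
# commit it, and merge that branch to main.
lakectl merge lakefs://ct-raw/ingestion lakefs://ct-raw/partitioned
lakectl commit lakefs://ct-raw/partitioned -m "partition imported data"
lakectl merge lakefs://ct-raw/partitioned lakefs://ct-raw/main

# Check an object's physical address: imported objects show the external
# bucket, objects written through lakeFS show the repo's storage namespace.
lakectl fs stat lakefs://ct-raw/main/partitioned/part-00000.parquet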
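This is how I've been poking at the unknown file from the S3 side, in case it helps; the bucket and prefix are the ones from the export command above, and <unknown-file> is just a placeholder since I haven't pasted the actual key here:

# List everything the exporter wrote, including the file at the root.
aws s3 ls --recursive s3://<bucket>/ct-raw-export/

# Dump the first bytes of the unknown object as hex to get a hint of its format.
aws s3 cp s3://<bucket>/ct-raw-export/<unknown-file> - | head -c 256 | xxd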