Aris Kalgreadis
04/18/2023, 11:07 AM
Or Tzabary
04/18/2023, 11:11 AM
Aris Kalgreadis
04/18/2023, 11:39 AM
23/04/18 13:35:09 WARN FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: gs://<MY_BUCKET>/_lakefs/retention/gc/commits/run_id=73c64576-6970-48e1-b013-83ba69f98fe6/commits.csv.
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "gs"
Or Tzabary
04/18/2023, 11:42 AM
Aris Kalgreadis
04/18/2023, 11:43 AM
Or Tzabary
04/18/2023, 12:47 PM
Aris Kalgreadis
04/18/2023, 1:21 PM
spark-submit --class io.treeverse.clients.GarbageCollector \
--packages org.apache.hadoop:hadoop-aws:3.3.2 \
--packages com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.11 \
--packages com.google.guava:guava:31.1-jre \
--conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
--conf spark.hadoop.fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS \
--conf spark.hadoop.fs.gs.project.id=<PROJECT_ID> \
--conf spark.hadoop.google.cloud.auth.service.account.enable=true \
--conf spark.hadoop.google.cloud.auth.service.account.json.keyfile=<CREDENTIALS_FILES> \
-c spark.hadoop.lakefs.api.url= \
-c spark.hadoop.lakefs.api.access_key= \
-c spark.hadoop.lakefs.api.secret_key= \
-c spark.hadoop.lakefs.gc.do_sweep=false \
-c spark.hadoop.lakefs.gc.mark_id=mark_id \
http://treeverse-clients-us-east.s3-website-us-east-1.amazonaws.com/lakefs-spark-client-312-hadoop3/0.7.0/lakefs-spark-client-312-hadoop3-assembly-0.7.0.jar \
repo us-east-1
Or Tzabary
04/18/2023, 1:23 PM
Yoni Augarten
04/18/2023, 1:26 PM
Or Tzabary
04/18/2023, 1:27 PM
Aris Kalgreadis
04/18/2023, 1:34 PM
Yoni Augarten
04/18/2023, 2:39 PM
Aris Kalgreadis
04/18/2023, 3:24 PM