Robin Moffatt
04/25/2023, 5:03 PM
spark.sql.warehouse.dir? I've tried that but am getting org.apache.spark.SparkException: Unable to create database default as failed to create its directory s3://example/main, and no obvious error on the lakeFS side that I can see.
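(For reference, a minimal sketch of the kind of setup being described here, assuming the lakeFS S3 gateway: spark.sql.warehouse.dir pointed at a repository path, with s3a aimed at the lakeFS endpoint. The endpoint, keys, repository and branch below are placeholders, not values from this thread.)
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # s3a talks to the lakeFS S3 gateway instead of AWS S3 (placeholder endpoint/keys)
    .config("spark.hadoop.fs.s3a.endpoint", "http://lakefs.example.com:8000")
    .config("spark.hadoop.fs.s3a.access.key", "<lakefs-access-key-id>")
    .config("spark.hadoop.fs.s3a.secret.key", "<lakefs-secret-access-key>")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    # warehouse directory inside a repository/branch, e.g. s3a://<repo>/<branch>/warehouse
    .config("spark.sql.warehouse.dir", "s3a://example/main/warehouse")
    .getOrCreate()
)
# creating a database is what triggers the warehouse directory creation
spark.sql("CREATE DATABASE IF NOT EXISTS demo")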
04/28/2023, 1:56 PM
SparkApplication in k8s.
Thanks
Omar Talbi
05/02/2023, 7:57 PM
Budi
05/04/2023, 2:00 AM
HT
05/04/2023, 7:22 AM
Paul
05/04/2023, 2:24 PM
Robin Moffatt
05/04/2023, 8:34 PM
lakectl branch revert from the web UI? e.g. what's shown here but without the CLI: https://docs.lakefs.io/use_cases/rollback.html#how-to-rollback-from-a-bad-data-sync
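(An aside, not an answer from the thread: if the web UI doesn't expose it, the revert can also be driven through the API without lakectl. The sketch below assumes the lakefs_client Python SDK's branches.revert_branch / RevertCreation names, which are generated from the OpenAPI spec and may differ between versions; repository, branch and commit values are placeholders.)
import lakefs_client
from lakefs_client import models
from lakefs_client.client import LakefsClient

configuration = lakefs_client.Configuration()
configuration.host = "http://localhost:8000"        # placeholder
configuration.username = "<access-key-id>"           # placeholder
configuration.password = "<secret-access-key>"       # placeholder
client = LakefsClient(configuration)

# revert the branch head by applying the inverse of the given commit (assumed API shape)
client.branches.revert_branch(
    repository="example-repo",                       # placeholder
    branch="main",
    revert_creation=models.RevertCreation(ref="<commit-to-revert>", parent_number=0),
)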
Paul
05/09/2023, 11:47 AM
Jon Erik Kemi Warghed
05/11/2023, 11:35 AM
Iddo Avneri
05/14/2023, 11:00 AM
Vaibhav Kumar
05/15/2023, 5:14 PM
Running ./lakefs --config /Users/simar/lakeFS/cmd/lakefs/config.yaml run but getting the below error:
simar@192 lakeFS % ./lakefs --config /Users/simar/lakeFS/cmd/lakefs/config.yaml run
INFO[0000]/Users/simar/lakeFS/cmd/lakefs/cmd/root.go:80 github.com/treeverse/lakefs/cmd/lakefs/cmd.initConfig() Configuration file fields.file=/Users/simar/lakeFS/cmd/lakefs/config.yaml file=/Users/simar/lakeFS/cmd/lakefs/config.yaml phase=startup
{"error":"InvalidParameterValue: Unsupported action GetCallerIdentity\n\tstatus code: 400, request id: 175F5FE4DE5BD201","file":"pkg/cloud/aws/metadata.go:64","func":"pkg/cloud/aws.(*MetadataProvider).GetMetadata","level":"warning","msg":"Failed to to get AWS account ID for BI","time":"2023-05-15T22:33:42+05:30"}
{"file":"pkg/logging/logger.go:266","func":"pkg/logging.(*logrusEntryWrapper).Fatalf","level":"fatal","msg":"Mismatched adapter detected. lakeFS started with adapter of type 's3', but repository 'test' is of type 'local'","time":"2023-05-15T22:33:42+05:30"}
-- Config.yaml
logging:
  format: json
  level: WARN
  output: "-"
database:
  type: "local"
auth:
  encrypt:
    secret_key: "10a718b3f285d89c36e9864494cdd1507f3bc85b342df24736ea81f9a1134bcc"
blockstore:
  type: s3
  s3:
    force_path_style: true
    endpoint: http://localhost:9000
    discover_bucket_region: false
    credentials:
      access_key_id: minioadmin
      secret_access_key: minioadmin
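(An aside sketch, not from the thread: the mismatched-adapter error suggests the repository 'test' was created against a different blockstore than the one in this config. One way to see what an existing repository points at is to list repositories via the Python SDK; the names below follow the generated lakefs_client API, and the host/keys are placeholders.)
import lakefs_client
from lakefs_client.client import LakefsClient

configuration = lakefs_client.Configuration()
configuration.host = "http://localhost:8000"        # placeholder
configuration.username = "<access-key-id>"           # placeholder
configuration.password = "<secret-access-key>"       # placeholder
client = LakefsClient(configuration)

# each repository records the storage namespace it was created with
# (e.g. local://... vs s3://...), which is what the adapter check is comparing
for repo in client.repositories.list_repositories().results:
    print(repo.id, repo.storage_namespace)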
Cristian Caloian
05/16/2023, 9:29 AM
Running Spark 3.1.2 with the following dependencies installed:
https://repo1.maven.org/maven2/net/java/dev/jets3t/jets3t/0.9.4/jets3t-0.9.4.jar
https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.12.178/aws-java-sdk-1.12.178.jar
https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.7/hadoop-aws-2.7.7.jar
https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.375/aws-java-sdk-bundle-1.11.375.jar
https://repo1.maven.org/maven2/io/lakefs/hadoop-lakefs-assembly/0.1.9/hadoop-lakefs-assembly-0.1.9.jar
http://treeverse-clients-us-east.s3-website-us-east-1.amazonaws.com/lakefs-spark-client-312-hadoop3/0.7.2/lakefs-spark-client-312-hadoop3-assembly-0.7.2.jar
I am using the following command to run the garbage collector:
spark-submit --class io.treeverse.clients.GarbageCollector \
--jars http://treeverse-clients-us-east.s3-website-us-east-1.amazonaws.com/hadoop/hadoop-lakefs-assembly-0.1.12.jar \
-c spark.hadoop.lakefs.api.url="https://<api-url>/api/v1" \
-c spark.hadoop.lakefs.api.access_key="<api-access-key>" \
-c spark.hadoop.lakefs.api.secret_key="<api-secret-key>" \
-c spark.hadoop.fs.s3a.access.key="<s3a-access-key>" \
-c spark.hadoop.fs.s3a.secret.key="<s3a-secret-key>" \
$SPARK_HOME/jars/lakefs-spark-client-312-hadoop3-assembly-0.7.2.jar <repo-name> eu-west-1
This is the error I’m getting:
Exception in thread "main" java.lang.NumberFormatException: For input string: "100M"
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.base/java.lang.Long.parseLong(Long.java:692)
at java.base/java.lang.Long.parseLong(Long.java:817)
at org.apache.hadoop.conf.Configuration.getLong(Configuration.java:1538)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:248)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:46)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:377)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:795)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:595)
at io.treeverse.clients.GarbageCollector.getCommitsDF(GarbageCollector.scala:105)
at io.treeverse.clients.GarbageCollector.getExpiredAddresses(GarbageCollector.scala:217)
at io.treeverse.clients.GarbageCollector$.markAddresses(GarbageCollector.scala:500)
at io.treeverse.clients.GarbageCollector$.main(GarbageCollector.scala:382)
at io.treeverse.clients.GarbageCollector.main(GarbageCollector.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
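(A sketch of one likely angle on this, not a verified fix: the trace shows S3AFileSystem.initialize reading a value through Configuration.getLong, and hadoop-aws 2.7.7 predates support for suffixed sizes such as the "100M" that newer Hadoop cores use as the fs.s3a.multipart.size default. Overriding the property with a plain byte count, or aligning hadoop-aws with the Hadoop 3 build implied by lakefs-spark-client-312-hadoop3, is the kind of change this points at. The snippet below only demonstrates the override; the values are assumptions.)
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("multipart-size-check")
    # a plain byte count (104857600 = 100 MiB) instead of the suffixed "100M"
    .config("spark.hadoop.fs.s3a.multipart.size", "104857600")
    .getOrCreate()
)
# confirm what the Hadoop configuration will hand to S3AFileSystem
print(spark.sparkContext._jsc.hadoopConfiguration().get("fs.s3a.multipart.size"))
For the GC job itself, the equivalent would presumably be an extra -c spark.hadoop.fs.s3a.multipart.size=104857600 on the spark-submit command above.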
Arthur Fournier
05/17/2023, 3:42 AM
Slackbot
05/17/2023, 3:42 AM
Temilola Onaneye
05/19/2023, 10:44 AM
Sumit Sah
05/23/2023, 3:07 PM
HT
05/25/2023, 2:51 AM
What is LAKEFS-ENCRYPT-SECRET used for? We found that if we change it, all users can no longer log in... Is that key used to encrypt end-user passwords/secret keys?
Edit: I mean LAKEFS_AUTH_ENCRYPT_SECRET_KEY
Ariel Shaqed (Scolnicov)
05/25/2023, 6:23 AM
Vaibhav Kumar
05/25/2023, 12:40 PM
Robin Moffatt
05/25/2023, 4:59 PM
The commit is failing with AirflowException("access_key_id must be specified in the lakeFS connection details").
I can see the connection defined as an env var in the Docker Compose file, but it wasn't listed in the Airflow UI. I went ahead and created it and verified it from the CLI:
airflow@f3ee9d2b2072:/opt/airflow$ airflow connections get lakefs
/home/airflow/.local/lib/python3.7/site-packages/airflow/configuration.py:386: FutureWarning: The auth_backends setting in [api] has had airflow.api.auth.backend.session added in the running config, which is needed by the UI. Please update your config before Apache Airflow 3.0.
FutureWarning,
id                 | None
conn_id            | lakefs
conn_type          | HTTP
description        | None
host               | http://lakefs:8000/api/v1
schema             | None
login              | None
password           | None
port               | None
is_encrypted       | None
is_extra_encrypted | None
extra_dejson       | {'access_key_id': 'AKIAIOSFODNN7EXAMPLE', 'secret_access_key': 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'}
get_uri            | http%3A%2F%2Flakefs%3A8000%2Fapi%2Fv1/?access_key_id=AKIAIOSFODNN7EXAMPLE&secret_access_key=wJalrXUtnFEMI%2FK7MDENG%2FbPxRfiCYEXAMPLEKEY
But the commit still fails with the same error as above.
I'm new to Airflow, so any pointers on how to get this working would be much appreciated 🙂
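(A small sketch, not from the thread, for checking what an operator actually resolves for that connection; the exception suggests the lakeFS provider reads access_key_id/secret_access_key from the connection's extra field. The connection id "lakefs" matches the one above; the rest is plain Airflow API.)
from airflow.hooks.base import BaseHook

# resolves the connection the same way an operator would (env var, secrets backend, or metadata DB)
conn = BaseHook.get_connection("lakefs")
print(conn.host)                                  # expected: the lakeFS API URL
print(conn.extra_dejson.get("access_key_id"))     # None here would explain the AirflowException
print(conn.extra_dejson.get("secret_access_key"))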
Narendra Nath
05/28/2023, 4:48 AM
HT
05/30/2023, 2:05 AM
HT
05/30/2023, 9:19 AM
Can the Python client use the lakectl config file (~/.lakectl.yaml)? A bit like the AWS CLI and boto, which both use the ~/.aws configuration by default?
In the docs, I only see auth defined in the code itself: https://docs.lakefs.io/integrations/python.html#initializing and https://pypi.org/project/lakefs-client/#installation--usage
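(A sketch of one way to get that behaviour by hand, since the linked docs only show credentials set in code: read ~/.lakectl.yaml and feed its fields into the client. The YAML keys assumed below — credentials.access_key_id, credentials.secret_access_key, server.endpoint_url — are the ones lakectl writes; adjust if your file differs.)
import os
import yaml
import lakefs_client
from lakefs_client.client import LakefsClient

# load the same file lakectl uses (assumed layout: credentials + server sections)
with open(os.path.expanduser("~/.lakectl.yaml")) as f:
    lakectl_cfg = yaml.safe_load(f)

configuration = lakefs_client.Configuration()
configuration.username = lakectl_cfg["credentials"]["access_key_id"]
configuration.password = lakectl_cfg["credentials"]["secret_access_key"]
# endpoint_url may include the /api/v1 suffix; trim it if the client expects the bare host
configuration.host = lakectl_cfg["server"]["endpoint_url"]

client = LakefsClient(configuration)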
HT
05/31/2023, 3:20 AM
$ lakectl --verbose commit lakefs://hieu-test/main --message "add file"
Branch: lakefs://hieu-test/main
It takes minutes for the command to return, while the web interface says that the commit has been done.
Robin Moffatt
05/31/2023, 12:10 PM
Getting Failed to get lakeFS blockstore type. Details in the notebook - could someone point out where I'm going wrong please? I'm using Everything Bagel with MinIO for storage.
https://gist.github.com/rmoff/c3d6553aec11b569d8f1e1761b7182e5
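(A sketch, not from the gist: "Failed to get lakeFS blockstore type" implies the client is asking the lakeFS server for its storage configuration, so checking that call directly with the same URL and credentials the Spark job uses can narrow things down. The /config/storage path is assumed from the lakeFS API; the host and keys are placeholders for an Everything Bagel-style setup.)
import requests

# same URL/credentials the Spark job is configured with (placeholders here)
LAKEFS_API = "http://lakefs:8000/api/v1"
AUTH = ("AKIAIOSFODNN7EXAMPLE", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")

# the storage-config endpoint reports blockstore_type (e.g. "s3" when backed by MinIO)
resp = requests.get(f"{LAKEFS_API}/config/storage", auth=AUTH)
resp.raise_for_status()
print(resp.json())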
HT
05/31/2023, 9:25 PM
HT
05/31/2023, 9:25 PM
HT
05/31/2023, 9:27 PM
Taha Sadiki
06/01/2023, 10:41 AM
Thomas
06/01/2023, 3:10 PM
Failed to to get AWS account ID for BI. My pod is running but it isn't ready to use, and I don't have any more logs... I think the problem comes from my configuration in the values.yaml file, but I've tested several others without success.
Have any of you had a similar failed deployment?
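(A sketch, not from the thread: "Failed to to get AWS account ID for BI" is logged as a warning — the same line appears in another startup log earlier in this thread — so the pod's readiness failure probably has a different cause. One quick check from inside the cluster is to hit the server's health endpoint directly; the /_health path is an assumption based on the lakeFS Helm chart's probes, and the service address is a placeholder.)
import requests

# placeholder in-cluster service address; adjust to your release name, namespace and port
resp = requests.get("http://lakefs.lakefs.svc.cluster.local/_health", timeout=5)
print(resp.status_code, resp.text)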