Richard Gilmore
07/02/2021, 4:13 PM
spark.sparkContext.hadoopConfiguration.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
spark.sparkContext.hadoopConfiguration.set("fs.lakefs.access.key", dbutils.secrets.get("development","LAKEFS_ACCESS_KEY"))
spark.sparkContext.hadoopConfiguration.set("fs.lakefs.secret.key", dbutils.secrets.get("development","LAKEFS_SECRET_ACCESS_KEY"))
spark.sparkContext.hadoopConfiguration.set("fs.lakefs.endpoint", "<http://lakefs.dev.company.com/api/v1/>")
This write fails silently: no errors, but when I check the location there is no data.
I'm able to read data using the lakefs:// prefix if it was previously written using the other method for writing data.
val masterPath = s"lakefs://${repo}/${master}/parquet/"
case class Test(story: String)
val masterStory = Seq(Test("I live on mastery")).toDS
println(masterPath)
masterStory.write.mode("overwrite").parquet(masterPath)
Barak Amar
spark.sparkContext.hadoopConfiguration.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
spark.sparkContext.hadoopConfiguration.set("fs.lakefs.access.key", "<lakefs key>")
spark.sparkContext.hadoopConfiguration.set("fs.lakefs.secret.key", "<lakefs secret>")
spark.sparkContext.hadoopConfiguration.set("fs.lakefs.endpoint", "<https://bfs.lakefs.dev/api/v1>")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "<s3a key>")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "<s3a secret>")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.region", "us-east-1")
val masterPath = s"<lakefs://test31/master/parquet2/>"
case class Test(story: String)
val masterStory = Seq(Test("Say Yes More")).toDS
println(masterPath)
masterStory.write.mode("overwrite").parquet(masterPath)
I've updated the message after the first run.
Downloaded the parquet created and verified that it holds the updated message.
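Reading it back through the same lakefs:// path is a simple way to verify the round trip (a minimal sketch, reusing masterPath from the snippet above):

// Read the parquet back over the lakeFS filesystem and show the row.
spark.read.parquet(masterPath).show()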
Barak Amar
Richard Gilmore
07/02/2021, 4:54 PM
Richard Gilmore
07/02/2021, 4:55 PM
lakefs://repo/branch/path/
Barak Amar
Barak Amar
Barak Amar
Richard Gilmore
07/08/2021, 8:34 PM
2021-07-08 20:15:08 [Executor task launch worker for task 1] INFO S3AFileSystem:V3: FS_OP_CREATE BUCKET[211459479356-databricks-data] FILE[s3a://acc-databricks-data/lakefs-testing/data/testing-master:7815b225-ef96-450a-b79e-08a45a73e112/rkrI2cQ27qEpbJysg7J46] Creating output stream; permission: rw-r--r--, overwrite: false, bufferSize: 4096
2021-07-08 20:15:08 [Executor task launch worker for task 1] INFO InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 25
2021-07-08 20:15:08 [Executor task launch worker for task 1] INFO S3ABlockOutputStream:V3: FS_OP_CREATE BUCKET[acc-databricks-data] FILE[lakefs-testing/data/testing-master:7815b225-ef96-450a-b79e-08a45a73e112/rkrI2cQ27qEpbJysg7J46] Closing stream; size: 579
2021-07-08 20:15:08 [Executor task launch worker for task 1] INFO S3ABlockOutputStream:V3: FS_OP_CREATE BUCKET[acc-databricks-data] FILE[lakefs-testing/data/testing-master:7815b225-ef96-450a-b79e-08a45a73e112/rkrI2cQ27qEpbJysg7J46] Upload complete; size: 579
2021-07-08 20:15:08 [Executor task launch worker for task 1] INFO BasicWriteTaskStatsTracker: Expected 1 files, but only saw 0. This could be due to the output format not writing empty files, or files being not immediately visible in the filesystem.
2021-07-08 20:15:08 [Executor task launch worker for task 1] INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 2114 bytes result sent to driver
I can read that path directly and it has the data; it's just not registered in the lakeFS ecosystem…
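A sketch of what reading the physical location directly looks like; the bucket and prefix below are taken from the log above:

import org.apache.hadoop.fs.Path

// The executor uploaded the object to the underlying S3 bucket, so it
// is visible under the repository's storage namespace even though it
// was never linked to the branch in lakeFS.
val physicalPrefix = new Path("s3a://acc-databricks-data/lakefs-testing/data/")
val s3 = physicalPrefix.getFileSystem(spark.sparkContext.hadoopConfiguration)
s3.listStatus(physicalPrefix).foreach(s => println(s.getPath))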
Guy Hardonag
07/08/2021, 8:53 PM
Tal Sofer
07/08/2021, 9:07 PM
spark.sparkContext.hadoopConfiguration.set("fs.lakefs.endpoint", "http://lakefs.dev.company.com/api/v1/")
It becomes:
spark.sparkContext.hadoopConfiguration.set("fs.lakefs.endpoint", "<http://lakefs.dev.company.com/api/v1>")
Please let us know if this fixes the problem. If this will not work we will investigate further!Tal Sofer
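Putting the fix together, the full corrected configuration would look like this (a sketch; the endpoint is the one from above, with keys still pulled from the secret scope as in the first message):

spark.sparkContext.hadoopConfiguration.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
spark.sparkContext.hadoopConfiguration.set("fs.lakefs.access.key", dbutils.secrets.get("development", "LAKEFS_ACCESS_KEY"))
spark.sparkContext.hadoopConfiguration.set("fs.lakefs.secret.key", dbutils.secrets.get("development", "LAKEFS_SECRET_ACCESS_KEY"))
// Note: no trailing slash on the endpoint.
spark.sparkContext.hadoopConfiguration.set("fs.lakefs.endpoint", "http://lakefs.dev.company.com/api/v1")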
Tal Sofer
07/09/2021, 6:08 AM
Richard Gilmore
07/09/2021, 10:06 AM
delta tables. It fails on the initial write of the delta table, when creating the delta log, but then is fine for subsequent updates to the table…
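A sketch of what that initial write presumably looks like, reusing the masterStory Dataset from earlier; format("delta") is the standard Delta Lake writer API, and the deltatest path matches the trace below:

// Initial creation of the Delta table; this is the step that fails
// while writing the _delta_log. Subsequent appends to the table succeed.
val deltaPath = "lakefs://testing/master/deltatest/"
masterStory.write.format("delta").mode("overwrite").save(deltaPath)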
Tal Sofer
07/09/2021, 10:11 AM
Richard Gilmore
07/09/2021, 10:12 AM
Tal Sofer
07/09/2021, 10:14 AM
Richard Gilmore
07/09/2021, 10:30 AM
Tal Sofer
07/09/2021, 10:36 AM
Barak Amar
Richard Gilmore
07/09/2021, 10:39 AM
2021-07-09 10:19:33 [WRAPPER-ReplId-6237a-3a591-a91d7-7] TRACE LakeFSFileSystem[OPERATION]: open(lakefs://testing/master/deltatest/_delta_log/_last_checkpoint)
2021-07-09 10:19:33 [WRAPPER-ReplId-6237a-3a591-a91d7-7] TRACE LakeFSFileSystem[OPERATION]: exists(lakefs://testing/master/deltatest/_delta_log)
2021-07-09 10:19:33 [rpc-server-4-2] INFO InitialSnapshot: [tableId=eb8b6bf5-ed4f-4d3f-ba09-b99c769cae20] Created snapshot InitialSnapshot(path=lakefs://testing/master/deltatest/_delta_log, version=-1, metadata=Metadata(7f335ec6-b67a-4445-a135-261caaf54ee7,null,null,Format(parquet,Map()),null,List(),Map(),Some(1625825973686)), logSegment=LogSegment(lakefs://testing/master/deltatest/_delta_log,-1,List(),List(),None,-1), checksumOpt=None)
Richard Gilmore
07/09/2021, 10:39 AM
Tal Sofer
07/09/2021, 11:33 AM
Richard Gilmore
07/09/2021, 11:44 AM
Tal Sofer
07/09/2021, 11:47 AM
Richard Gilmore
07/09/2021, 12:22 PM
Tal Sofer
07/09/2021, 12:26 PM
Richard Gilmore
07/09/2021, 12:27 PM
Tal Sofer
07/09/2021, 12:30 PM
Richard Gilmore
07/09/2021, 12:56 PM
Tal Sofer
07/09/2021, 1:00 PM