# help
p
Hello! I'm trying to use lakeFS as my Iceberg REST catalog. Looking at the documentation, it seems that lakeFS provides an implementation of the Iceberg catalog. I am currently stuck trying to use the catalog from pyiceberg. Any suggestions or ideas? This is my pyiceberg.yaml file:
catalog:
    default:
        uri: http://lakefs:8000/my_repo
        s3.endpoint: http://minio:9000
        s3.access-key-id: admin
        s3.secret-access-key: password
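For reference, the yaml above maps one-to-one onto catalog properties that can also be passed to pyiceberg programmatically. A minimal sketch, assuming the hostnames and credentials from this thread (and that the endpoint actually speaks the REST catalog protocol):

```python
# Same settings as pyiceberg.yaml, expressed as catalog properties.
# The host names and credentials are the (assumed) values from this thread.
props = {
    "type": "rest",  # pyiceberg's REST catalog implementation
    "uri": "http://lakefs:8000/my_repo",
    "s3.endpoint": "http://minio:9000",
    "s3.access-key-id": "admin",
    "s3.secret-access-key": "password",
}

# With pyiceberg installed, loading it would look like:
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("default", **props)
# load_catalog performs a REST handshake up front, which is exactly
# where this fails against lakeFS (it is not a REST catalog).
print(sorted(props))
```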
j
Hi @Paco Ibañez The current Iceberg catalog implementation is JVM-based, so it won't work with PyIceberg. Did you try running with Spark instead?
n
@Paco Ibañez PyIceberg currently supports a select number of catalog implementations:
PyIceberg currently has native support for REST, SQL, Hive, Glue and DynamoDB.
You can work with the lakeFS catalog using pyspark, though.
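For completeness, a Spark session pointed at the lakeFS Iceberg catalog is typically configured along these lines (a sketch based on the lakefs-iceberg documentation; the catalog name `lakefs` and repo `my_repo` are placeholders):

```
spark.sql.catalog.lakefs                org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.lakefs.catalog-impl   io.lakefs.iceberg.LakeFSCatalog
spark.sql.catalog.lakefs.warehouse      lakefs://my_repo
spark.sql.catalog.lakefs.cache-enabled  false
```

Tables are then addressed with the branch in the table path, e.g. something like `lakefs.main.db.my_table`; the exact layout depends on your repo setup.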
p
Hello! Thank you very much for responding! I have not tried Spark yet. I am using pyiceberg because, starting in version 0.6.0, it supports writes, which for small use cases is really convenient (no Spark needed). The setup I currently have working uses Tabular's REST catalog implementation, which is also JVM-based but usable from pyiceberg (the Python client just makes API calls to the catalog service). I thought lakeFS was also implementing the REST catalog, which is in theory supported by pyiceberg. Am I mistaken? Is there a way to look at the lakeFS API spec? Are these endpoints available in lakeFS?
j
Hi @Paco Ibañez lakeFS's Iceberg catalog isn't a REST catalog but rather a wrapper around the Hadoop catalog. Although Tabular's catalog is JVM-based, it doesn't run as part of your executable (it's a standalone REST server), so that doesn't really matter. Would you mind sharing some context on your usage and scenario for Iceberg?
p
Ohh, I see. I'm using Iceberg to store time-series data for a POC. So far, using Tabular's catalog implementation, I am able to ingest data with Spark and also with Prefect using pyiceberg (without requiring a Spark cluster). I am currently exploring whether it is possible to replace Tabular's catalog with lakeFS or Nessie and still ingest from both Spark and Prefect.
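The REST-vs-wrapper distinction above is easy to check empirically: a catalog implementing the Iceberg REST spec answers `GET {uri}/v1/config`, and that handshake is the first thing pyiceberg does on load. A minimal probe, assuming the catalog URI from the yaml earlier in the thread:

```python
# Build the REST-spec config endpoint for a candidate catalog URI.
# lakeFS's Hadoop-wrapper catalog does not serve this endpoint, while a
# true REST catalog (such as Tabular's) does, so a probe tells them apart.
def rest_config_url(catalog_uri: str) -> str:
    return catalog_uri.rstrip("/") + "/v1/config"

url = rest_config_url("http://lakefs:8000/my_repo")
print(url)  # http://lakefs:8000/my_repo/v1/config

# With network access to the service, the actual probe would be:
#   import urllib.request
#   urllib.request.urlopen(url)  # a real REST catalog returns JSON config
```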