# help
Yoni Augarten:
Hey @mschuch, lakeFS exposes two APIs:
1. The lakeFS API, used by the `lakectl` command-line tool and our web interface. This API allows you to interact with lakeFS in many ways, e.g. with git-like operations (branching, diffing, merging, etc.).
2. An S3-compatible API, which is the one you should use for Presto and any other query engine.
I'm assuming the YAML file you are referring to is the configuration for the `lakectl` command. This one should point to the lakeFS API (1). Our convention for this domain is lakefs.example.com:8000 (note that the s3 was dropped), and the path should be `/api/v1`. The Presto configuration should point to (2). From a DNS standpoint, you should point both domains to the lakeFS server. The way lakeFS differentiates between them is the `gateways.s3.domain_name` configuration, which should be set to the domain of the S3-compatible API (2). The convention for this domain is s3.lakefs.example.com. You can test the S3-compatible API using the AWS CLI:
aws s3 --endpoint-url http://s3.lakefs.example.com:8000 ls
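To make the first point concrete, here is a minimal sketch of what the `lakectl` YAML could look like, written out via a shell heredoc. The domain lakefs.example.com and the credentials below are placeholders for illustration, not real values — substitute your own host and keys:

```shell
# Sketch only: write a minimal lakectl config pointing at the lakeFS API (1).
# The host and the AKIA.../wJal... credentials are placeholders.
cat > "$HOME/.lakectl.yaml" <<'EOF'
credentials:
  access_key_id: AKIAIOSFODNN7EXAMPLE
  secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
server:
  endpoint_url: http://lakefs.example.com:8000/api/v1
EOF
```

Note that the endpoint ends with `/api/v1` and uses the non-s3 domain — the S3-compatible domain (2) is only for query engines and S3 clients.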
mschuch:
Hi @Yoni Augarten, I just tested with Trino/Presto but cannot access the data. I always get:
Query 20210220_130947_00027_xhsun failed: Unable to execute HTTP request: testbucket.sl8-2014.xxxx.de:
Here is the config I have for the catalog:
connector.name=hive-hadoop2
hive.metastore.uri=thrift://10.10.146.32:9083

hive.s3.aws-access-key=xxx
hive.s3.aws-secret-key=xxxx
hive.s3.endpoint=http://sl8-2014.xxxx:8000
Here is my Hive metastore conf:
<configuration>
    <property>
        <name>metastore.thrift.uris</name>
        <value>thrift://10.10.146.32:9083</value>
        <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
    </property>
    <property>
        <name>metastore.task.threads.always</name>
        <value>org.apache.hadoop.hive.metastore.events.EventCleanerTask,org.apache.hadoop.hive.metastore.MaterializationsCacheCleanerTask</value>
    </property>
    <property>
        <name>metastore.expression.proxy</name>
        <value>org.apache.hadoop.hive.metastore.DefaultPartitionExpressionProxy</value>
    </property>
    <property>
        <name>metastore.warehouse.dir</name>
        <value>s3a://spark/warehouse/</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.cj.jdbc.Driver</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://mariadb:3306/metastore_db</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>admin</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>admin</value>
    </property>

    <property>
        <name>fs.s3a.access.key</name>
        <value>xxxx</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>xxxx</value>
    </property>
    <property>
        <name>fs.s3a.endpoint</name>
        <value>http://sl8-2014.xxxx:8000</value>
    </property>
    <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
    </property>
</configuration>
I can create a schema with:
CREATE SCHEMA hive.master
WITH (location = 's3a://testbucket/master/test.json')
And a table with:
CREATE TABLE master.test (a varchar, b varchar, c varchar) 
WITH (
  format = 'json',
  external_location = 's3a://testbucket/master/test.json/'
);
But after selecting from the table I get the above error. What am I doing wrong?
Yoni Augarten:
Hey @mschuch, can you please share your lakeFS configuration?
mschuch:
Hi, sure. I start it with Docker Compose and the following settings:
version: '3'
services:
  lakefs:
    image: "treeverse/lakefs:${VERSION:-latest}"
    ports:
      - "8000:8000"
    depends_on:
      - "postgres"
    environment:
      - LAKEFS_AUTH_ENCRYPT_SECRET_KEY=10a718b3f285d89c36e9864494cdd1507f3bc85b342df24736ea81f9a1134bcc
      - LAKEFS_DATABASE_CONNECTION_STRING=postgres://lakefs:lakefs@postgres/postgres?sslmode=disable
      - LAKEFS_BLOCKSTORE_TYPE=s3
      - LAKEFS_BLOCKSTORE_S3_REGION=us-east-1
      - LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE=true
      - LAKEFS_BLOCKSTORE_S3_ENDPOINT=http://10.10.146.217:9000
      - LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=minio
      - LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_SECRET_KEY=minio123
      - LAKEFS_GATEWAYS_S3_DOMAIN_NAME=sl8-2014.xxxxx:8000
      - LAKEFS_GATEWAYS_S3_REGION=us-east-1
      - LAKEFS_LOGGING_LEVEL=${LAKEFS_LOGGING_LEVEL:-INFO}
      - LAKEFS_STATS_ENABLED=false
      - LAKEFS_COMMITTED_LOCAL_CACHE_DIR=${LAKEFS_COMMITTED_LOCAL_CACHE_DIR:-/home/lakefs/.local_tier}
    entrypoint: ["/app/wait-for", "postgres:5432", "--", "/app/lakefs", "run"]
  postgres:
    image: "postgres:${PG_VERSION:-11}"
    command: "-c log_min_messages=FATAL"
    environment:
      POSTGRES_USER: lakefs
      POSTGRES_PASSWORD: lakefs
    logging:
      driver: none
Yoni Augarten:
Let's take this to DM.