Yoni Augarten
02/17/2021, 3:29 PM
lakectl command-line tool and our web interface. This API allows you to interact with lakeFS in many ways, e.g. with Git-like operations (branching, diffing, merging, etc.).
2. An S3-compatible API, which is the one you should use for Presto and any other query engine.
I'm assuming the YAML file you are referring to is the configuration for the lakectl command. This one should point to the lakeFS API (1). Our convention for this domain is lakefs.example.com:8000 (note that the "s3." prefix was dropped), and the path should be /api/v1.
The Presto configuration should point to (2).
From a DNS standpoint, you should point both domains to the lakeFS server. lakeFS tells them apart using the gateways.s3.domain_name configuration, which should be set to the domain of the S3-compatible API (2). The convention for this domain is s3.lakefs.example.com.
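To make the routing rule above concrete, here is a minimal Python sketch of host-based dispatch. It assumes lakeFS compares the request's Host header against gateways.s3.domain_name. It is a simplified, hypothetical illustration and not lakeFS's actual implementation: the function name route_request and the returned labels are made up, and ports are simply ignored. It also shows why DNS matters. With virtual-hosted-style addressing the bucket becomes a subdomain, so names such as testbucket.s3.lakefs.example.com must also resolve to the lakeFS server.

```python
from typing import Optional, Tuple


def route_request(host: str, s3_domain: str) -> Tuple[str, Optional[str]]:
    """Classify a request by its Host header (hypothetical, simplified sketch).

    Returns (target, bucket); bucket is only set for virtual-hosted-style requests.
    """
    # Compare host names case-insensitively and drop any ":port" suffix.
    host_name = host.lower().split(":", 1)[0]
    domain_name = s3_domain.lower().split(":", 1)[0]

    if host_name == domain_name:
        # Path-style S3 request: the bucket is the first segment of the URL path.
        return ("s3-gateway", None)
    if host_name.endswith("." + domain_name):
        # Virtual-hosted-style S3 request: the bucket is the prefix before the domain.
        return ("s3-gateway", host_name[: -(len(domain_name) + 1)])
    # Anything else (e.g. lakefs.example.com) goes to the lakeFS API used by lakectl.
    return ("lakefs-api", None)


S3_DOMAIN = "s3.lakefs.example.com"
print(route_request("lakefs.example.com:8000", S3_DOMAIN))                # ('lakefs-api', None)
print(route_request("s3.lakefs.example.com:8000", S3_DOMAIN))             # ('s3-gateway', None)
print(route_request("testbucket.s3.lakefs.example.com:8000", S3_DOMAIN))  # ('s3-gateway', 'testbucket')
```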
You can test the S3-compatible API using the AWS CLI:
aws s3 --endpoint-url http://s3.lakefs.example.com:8000 ls
mschuch
02/20/2021, 1:15 PM
Query 20210220_130947_00027_xhsun failed: Unable to execute HTTP request: testbucket.sl8-2014.xxxx.de:
I have the following config for the catalog:
connector.name=hive-hadoop2
hive.metastore.uri=thrift://10.10.146.32:9083
hive.s3.aws-access-key=xxx
hive.s3.aws-secret-key=xxxx
hive.s3.endpoint=http://sl8-2014.xxxx:8000
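The host in the error above, testbucket.sl8-2014.xxxx.de, is the bucket name prepended to the endpoint host. That is the virtual-hosted-style form of S3 addressing, whereas the Hive metastore config below sets fs.s3a.path.style.access=true. The minimal Python sketch below contrasts the two URL forms for the same bucket and key. s3_object_url is a hypothetical helper, and sl8-2014.example is a made-up stand-in for the redacted host. If the bucket subdomain does not resolve in DNS, a request of the second form fails before it reaches lakeFS, which is consistent with an "Unable to execute HTTP request" error. Presto's Hive connector has a hive.s3.path-style-access catalog property for switching to path-style requests. Whether that alone fixes this setup is an assumption, not something confirmed in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def s3_object_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    """Build the URL an S3 client would request for bucket/key (hypothetical helper).

    path_style=True:  http://host:port/bucket/key
    path_style=False: http://bucket.host:port/key  (virtual-hosted style)
    """
    parts = urlsplit(endpoint)
    if path_style:
        # The bucket travels in the URL path; only the endpoint host must resolve.
        netloc = parts.netloc
        path = f"/{bucket}/{key}"
    else:
        # The bucket becomes a DNS label in front of the endpoint host,
        # so the client must be able to resolve bucket.host.
        netloc = f"{bucket}.{parts.netloc}"
        path = f"/{key}"
    return urlunsplit((parts.scheme, netloc, path, "", ""))


endpoint = "http://sl8-2014.example:8000"  # hypothetical stand-in for the redacted host
print(s3_object_url(endpoint, "testbucket", "master/test.json/", path_style=True))
# http://sl8-2014.example:8000/testbucket/master/test.json/
print(s3_object_url(endpoint, "testbucket", "master/test.json/", path_style=False))
# http://testbucket.sl8-2014.example:8000/master/test.json/
```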
Here is my Hive metastore config:
<configuration>
<property>
<name>metastore.thrift.uris</name>
<value>thrift://10.10.146.32:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
<property>
<name>metastore.task.threads.always</name>
<value>org.apache.hadoop.hive.metastore.events.EventCleanerTask,org.apache.hadoop.hive.metastore.MaterializationsCacheCleanerTask</value>
</property>
<property>
<name>metastore.expression.proxy</name>
<value>org.apache.hadoop.hive.metastore.DefaultPartitionExpressionProxy</value>
</property>
<property>
<name>metastore.warehouse.dir</name>
<value>s3a://spark/warehouse/</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://mariadb:3306/metastore_db</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>admin</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>admin</value>
</property>
<property>
<name>fs.s3a.access.key</name>
<value>xxxx</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>xxxx</value>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>http://sl8-2014.xxxx:8000</value>
</property>
<property>
<name>fs.s3a.path.style.access</name>
<value>true</value>
</property>
</configuration>
I can create a schema with:
CREATE SCHEMA hive.master
WITH (location = 's3a://testbucket/master/test.json');
And a table with:
CREATE TABLE master.test (a varchar, b varchar, c varchar)
WITH (
format = 'json',
external_location = 's3a://testbucket/master/test.json/'
);
But when I select from the table, I get the error above. What am I doing wrong?
Yoni Augarten
02/20/2021, 1:23 PM
mschuch
02/20/2021, 1:25 PM
version: '3'
services:
  lakefs:
    image: "treeverse/lakefs:${VERSION:-latest}"
    ports:
      - "8000:8000"
    depends_on:
      - "postgres"
    environment:
      - LAKEFS_AUTH_ENCRYPT_SECRET_KEY=10a718b3f285d89c36e9864494cdd1507f3bc85b342df24736ea81f9a1134bcc
      - LAKEFS_DATABASE_CONNECTION_STRING=postgres://lakefs:lakefs@postgres/postgres?sslmode=disable
      - LAKEFS_BLOCKSTORE_TYPE=s3
      - LAKEFS_BLOCKSTORE_S3_REGION=us-east-1
      - LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE=true
      - LAKEFS_BLOCKSTORE_S3_ENDPOINT=http://10.10.146.217:9000
      - LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=minio
      - LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_SECRET_KEY=minio123
      - LAKEFS_GATEWAYS_S3_DOMAIN_NAME=sl8-2014.xxxxx:8000
      - LAKEFS_GATEWAYS_S3_REGION=us-east-1
      - LAKEFS_LOGGING_LEVEL=${LAKEFS_LOGGING_LEVEL:-INFO}
      - LAKEFS_STATS_ENABLED=false
      - LAKEFS_COMMITTED_LOCAL_CACHE_DIR=${LAKEFS_COMMITTED_LOCAL_CACHE_DIR:-/home/lakefs/.local_tier}
    entrypoint: ["/app/wait-for", "postgres:5432", "--", "/app/lakefs", "run"]
  postgres:
    image: "postgres:${PG_VERSION:-11}"
    command: "-c log_min_messages=FATAL"
    environment:
      POSTGRES_USER: lakefs
      POSTGRES_PASSWORD: lakefs
    logging:
      driver: none
Yoni Augarten
02/20/2021, 1:28 PM