# help
t
Hi, I need to get lakeFS to implement git-like operations on our MinIO storage. However, I'm getting an error after I set up everything (as described in https://lakefs.io/git-like-operations-over-minio-with-lakefs/). The error on the CLI is "request failed: [400 Bad Request] error creating repository: could not access storage namespace" when I try to add a new repository in lakeFS. The same error is also thrown in the GUI.
o
Hi @Tamilvanan A
is MinIO deployed locally as well?
if so, perhaps this thread could be useful
t
Hi Oz, Thank you for the prompt support. It's clear how to point LakeFS to internally hosted MinIO running in a docker container. After testing the initial setup, I need LakeFS to point to another public IP where MinIO is deployed from a docker image. How do I configure LakeFS to point to MinIO hosted on another Public IP?
o
by changing the endpoint setting to the proper URL (e.g. http://myminioserver.example.com). Please check there's no firewall or security rule preventing you from communicating with the MinIO server though
t
Hi @Oz Katz But I already tried adding the endpoint details to the configuration file by running the script below before running the docker image:
Copy code
echo "LAKEFS_BLOCKSTORE_S3_ENDPOINT=<minio_endpoint>" >> $LAKEFS_CONFIG_FILE
But, I still end up with the same problem when I run
Copy code
curl https://compose.lakefs.io | docker-compose --env-file $LAKEFS_CONFIG_FILE -f - up -d
Tried this link: https://lakefs.io/git-like-operations-over-minio-with-lakefs/ Is there any issue with the documentation at this link? Is there any documentation on getting lakeFS up and running against a remote instance of MinIO accessible via a public IP? Do I have to set up Docker networking using the examples in your previous link?
o
the example there should work. if you could share the log and the configuration file you used, I might be able to better understand the issue
t
Just a moment, I'm posting the setup details and logs as well
Hi @Oz Katz I tried the below change in the config before docker-compose up:
Copy code
LAKEFS_CONFIG_FILE=./.lakefs-env
echo "AWS_ACCESS_KEY_ID=minioadmin" >> $LAKEFS_CONFIG_FILE
echo "AWS_SECRET_ACCESS_KEY=minioadmin" >> $LAKEFS_CONFIG_FILE
echo "LAKEFS_BLOCKSTORE_S3_ENDPOINT=host.docker.internal:9000" >> $LAKEFS_CONFIG_FILE
echo "LAKEFS_BLOCKSTORE_TYPE=s3" >> $LAKEFS_CONFIG_FILE
echo "LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE=true" >> $LAKEFS_CONFIG_FILE
lakeFS still seems to be giving the same error in the GUI and console: "error creating repository: could not access storage namespace". When I bashed into the docker container running the treeverse/lakefs image, I'm seeing the following logs in lakeFS:
Copy code
time="2021-05-07T11:21:36Z" level=error msg="error making request request" func="pkg/block/s3.(*Adapter).streamToS3" file="build/pkg/block/s3/adapter.go:202" error="Put \"https://host.docker.internal:9000/example-bucket/dummy\": dial tcp: lookup host.docker.internal on 127.0.0.11:53: no such host" host="40.76.222.205:8000" method=POST operation=PutObject path=/api/v1/repositories request_id=5268103d-5c43-4c5c-88d1-98ff8c89cb70 service_name=rest_api url="https://host.docker.internal:9000/example-bucket/dummy"
lakefs_1 | time="2021-05-07T11:21:36Z" level=warning msg="Could not access storage namespace" func="pkg/api.(*Controller).CreateRepository" file="build/pkg/api/controller.go:1112" error="Put \"https://host.docker.internal:9000/example-bucket/dummy\": dial tcp: lookup host.docker.internal on 127.0.0.11:53: no such host" service=api_gateway storage_namespace="s3://example-bucket/"
o
Thanks @Tamilvanan A! What OS and Docker version are you using?
t
Running on CentOS 8
Docker version 20.10.6, build 370c289
docker-compose version 1.29.1, build c34c88b2
o
Thanks.. From the looks of it, it seems like you're running into this: https://stackoverflow.com/a/61424570
t
yes
o
this host works automatically on macOS and Windows, but for some reason, not on Linux
let me see if I can find a quick workaround for that
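For reference, one workaround worth sketching here (not from the thread itself): since the production MinIO lives on another VM anyway, the host.docker.internal lookup can be sidestepped by pointing the endpoint at the VM's reachable address directly. A minimal sketch using the hypothetical documentation IP 203.0.113.10:
Copy code
echo "LAKEFS_BLOCKSTORE_S3_ENDPOINT=http://203.0.113.10:9000" >> $LAKEFS_CONFIG_FILE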
t
Hi @Oz Katz Have you found a solution? Should I try the option below when running docker-compose up?
Copy code
--add-host=host.docker.internal:host-gateway
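For what it's worth, docker-compose does have an equivalent of that flag; a minimal sketch, assuming Docker 20.10+ (where the special host-gateway value is supported) and a version 3 compose file:
Copy code
services:
  lakefs:
    extra_hosts:
      - "host.docker.internal:host-gateway"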
o
i'm not sure this will work with docker-compose. I'm looking into adding an option to run MinIO as part of the docker-compose stack: this way you'll simply do
docker-compose up
and it'll automatically set up lakeFS and MinIO, including the communication between the two.
If you'd like, I'd love to have you beta test it 🙂
t
But, for the production deployment, the MinIO instance runs in a separate VM. I need lakeFS to connect to that other VM over a public IP / private VPN IP
o
Yes, but the problem here occurs when lakeFS is running on the same host as MinIO. Docker networking is a strange beast.. The current docker-compose should work just fine for remote MinIO deployments
t
But, the current setup doesn't work for MinIO on either localhost or an external VM. I'm unable to deploy anything.
o
Ah, I did not realize that. For the remote endpoint, can you connect to that host:port from the machine running lakeFS, using curl or mc?
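A quick way to run that check, sketched here as an illustration with <minio-host> as a placeholder, using MinIO's health endpoint and the mc client:
Copy code
curl -fsS http://<minio-host>:9000/minio/health/live && echo "MinIO reachable"
mc alias set remote http://<minio-host>:9000 minioadmin minioadmin
mc ls remote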
Do keep in mind that if you're looking to do a production deployment, docker-compose is not the best option; for that you should probably look at https://docs.lakefs.io/deploying-aws/
t
Hi @Oz Katz Can you help me out with the docker deployment of lakeFS? I really need help with deploying lakeFS from the docker image. To demonstrate lakeFS working, I'm setting up a deployment as described in your production deployment guide for AWS, but my management is demanding that I deploy the solution using docker at the earliest.
b
Hi @Tamilvanan A I can help you with using lakeFS with docker. Can we schedule a Zoom call, or is there any other way for me to better understand your issues?
t
Sure. I'm ready any time. Are we going to have a call now or are we going to schedule it for later?
b
2 min
t
ok. I'm ready
Hi @Barak Amar I have applied a DNS name to the lakeFS instance. I'm checking how to use Presto with lakeFS. Trino / Presto is also running on the same VM instance. Do we have to install a Hive metastore from the link below? https://github.com/treeverse/lakefs-metastore-clone
b
Working with Presto against lakeFS or S3 requires the same thing: connecting a metastore, as specified in the Trino documentation - https://trino.io/docs/current/connector/hive.html.
The lakefs-metastore-clone is a tool for cloning metastore information between branches.
Here is a link to the lakeFS documentation describing how to start with Presto
t
I already got this link. But to work through it, do I need to install a standalone version of the Hive metastore?
b
You will need a Hive metastore. How are you using Presto to query MinIO today?
t
I'm using MinIO, LakeFS and Trino on docker containers
b
before lakeFS - did you have any Presto setup / connected to your data?
t
Trino is working out of a docker container. But, it is not connected to LakeFS or MinIO
I can login to Trino Container and install any packages if required
Should I install the Hive metastore inside the Trino container?
b
No
but you need metastore
Trino/Presto requires a Hive metastore in order to query S3
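A rough illustration of that wiring, assuming a metastore reachable at thrift://metastore:9083 (the catalog file name and the host are hypothetical, and hive-hadoop2 is the connector name used by Presto/Trino releases of this era):
Copy code
# etc/catalog/lakefs.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://metastore:9083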
t
Shall I install the Hive metastore as a Docker container as well?
b
It is possible - but this part depends on your requirements - not related to lakeFS.
t
I prefer to have Hive as a docker container if possible. That is my end customer requirement
b
That guide describes how to run Presto on S3 using Hive. By changing the endpoint and the credentials to lakeFS, you will be able to query lakeFS.
t
I have seen this link too. So you are suggesting to use the lakeFS details for the endpoint?
ok. Let me try and get back to you
b
all I'm saying is that if you set up your Presto/Trino to work with any S3 source, pointing the metastore and Presto at the lakeFS endpoint/credentials should work.
t
ok
Hi @Barak Amar I need a few clarifications on the Trino and lakeFS integration. Is it possible to have a quick meeting?
b
I can get on Zoom in an hour, but you can ask here; usually that is preferred, as others can help too
t
Sure. I went through the link that says "The Quick Guide for Running Presto Locally on S3". Following the instructions there, I cloned the repository and got everything working properly. I'm having trouble with the following query, shown below:
Copy code
CREATE EXTERNAL TABLE amazon_reviews_parquet(
marketplace string,
customer_id string,
review_id string,
product_id string,
product_parent string,
product_title string,
star_rating int,
helpful_votes int,
total_votes int,
vine string,
verified_purchase string,
review_headline string,
review_body string,
review_date int,
year int)
PARTITIONED BY (product_category string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
's3a://amazon-reviews-pds/parquet/';
What location should I use? The error is related to the location; I just want to point to my lakeFS instance
b
This is great - notice that for each configuration file in the post there are commented-out sections, for example:
etc/catalog/s3.properties
Copy code
#hive.s3.endpoint=http://[WANTED_ENDPOINT]
These settings should point to your lakeFS
Update the credentials key/secret to lakeFS
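To make that concrete, a hedged sketch of what etc/catalog/s3.properties might look like once pointed at lakeFS; the endpoint, key, and secret below are hypothetical placeholders, not values from this thread:
Copy code
hive.s3.endpoint=http://lakefs.example.com:8000
hive.s3.aws-access-key=<LAKEFS_ACCESS_KEY_ID>
hive.s3.aws-secret-key=<LAKEFS_SECRET_ACCESS_KEY>
hive.s3.path-style-access=true
hive.s3.ssl.enabled=false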
The s3a endpoint will be
s3a://<repository name>/<branch>/path_to_your_table
We assume that the repository and branch exist in lakeFS
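So for the CREATE TABLE statement above, the location would take a shape like the following, assuming a hypothetical repository named example-repo with a branch named main:
Copy code
LOCATION
  's3a://example-repo/main/amazon_reviews_parquet/';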
There are two DNS records: one for s3... and one for *.s3...
The s3... record should be set as lakeFS's S3 gateway domain name
There is an environment variable in the docker-compose / docker setup that we pass to the container.
The above will cause lakeFS to process requests to this record as S3 requests
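In the docker-compose file that variable is the one shown below; the domain here is a hypothetical example, and the matching wildcard record *.s3.example.com would also need to resolve to lakeFS:
Copy code
LAKEFS_GATEWAYS_S3_DOMAIN_NAME=s3.example.com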
t
The yaml file which I used with docker-compose up for lakeFS is as follows:
Copy code
version: '3'
services:
  lakefs:
    image: "treeverse/lakefs:${VERSION:-latest}"
    ports:
      - "8000:8000"
    depends_on:
      - "postgres"
    environment:
      - LAKEFS_AUTH_ENCRYPT_SECRET_KEY=${LAKEFS_AUTH_ENCRYPT_SECRET_KEY:-some random secret string}
      - LAKEFS_DATABASE_CONNECTION_STRING=${LAKEFS_DATABASE_CONNECTION_STRING:-postgres://lakefs:lakefs@postgres/postgres?sslmode=disable}
      - LAKEFS_BLOCKSTORE_TYPE=${LAKEFS_BLOCKSTORE_TYPE:-local}
      - LAKEFS_BLOCKSTORE_LOCAL_PATH=${LAKEFS_BLOCKSTORE_LOCAL_PATH:-/home/lakefs}
      - LAKEFS_GATEWAYS_S3_DOMAIN_NAME=${LAKEFS_GATEWAYS_S3_DOMAIN_NAME:-dinakar-etl.eastus.cloudapp.azure.com:8000}
      - LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID:-}
      - LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_SECRET_KEY=${AWS_SECRET_ACCESS_KEY:-}
      - LAKEFS_LOGGING_LEVEL=${LAKEFS_LOGGING_LEVEL:-INFO}
      - LAKEFS_STATS_ENABLED
      - LAKEFS_BLOCKSTORE_S3_ENDPOINT
      - LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE
      - LAKEFS_CATALOGER_TYPE
      - LAKEFS_COMMITTED_LOCAL_CACHE_DIR=${LAKEFS_COMMITTED_LOCAL_CACHE_DIR:-/home/lakefs/.local_tier}
    entrypoint: ["/app/wait-for", "postgres:5432", "--", "/app/lakefs", "run"]
  postgres:
    image: "postgres:${PG_VERSION:-11}"
    command: "-c log_min_messages=FATAL"
    environment:
      POSTGRES_USER: lakefs
      POSTGRES_PASSWORD: lakefs
    logging:
      driver: none
I did configure LAKEFS_GATEWAYS_S3_DOMAIN_NAME as dinakar-etl.eastus.cloudapp.azure.com:8000
g
Hey @Tamilvanan A What are you experiencing now? When trying to create a table in Presto, do you get any error? Do you see any logs in lakeFS?
t
Last time I noticed that it caused a deadlock with the CREATE TABLE query. It hung and was not responding
g
What path did you use in the create table query?
Copy code
LOCATION
'???????';
Do you see any logs in lakeFS?
t
I tried the location as s3a://dinakar-etl.eastus.cloudapp.azure.com:8000/dinakar-test-bucket/test I'm sure I made a minor mistake with the endpoint details. For a clean setup, can you help me out over a Zoom call? It shouldn't take more than 10 or 15 minutes. Posting the logs and config file contents would expose credentials as well. I also configured the DNS name for lakeFS as @Barak Amar mentioned in our last meeting.
b
1 min
t
ok
b
sent the link in private
@Tamilvanan A
t
Hi @Barak Amar Here's the log you wanted to check out.