# help
u
I am setting up lakeFS with docker compose. This Linux machine has an NFS storage mounted at /data. When I start lakeFS I get the error: "level=fatal msg="failed to create catalog" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:131" error="build block adapter: got error opening a local block adapter with path /home/lakefs: path provided is not writable"
u
version: "3" services: lakefs: image: treeverse/lakefs:latest container_name: lakefs ports: - "8000:8000" depends_on: - postgres environment: - LAKEFS_BLOCKSTORE_TYPE=local - LAKEFS_AUTH_ENCRYPT_SECRET_KEY=some random secret string - LAKEFS_DATABASE_CONNECTION_STRING=postgres://lakefs:lakefs@postgres/postgres?sslmode=disable - LAKEFS_BLOCKSTORE_LOCAL_PATH=/home/lakefs - LAKEFS_GATEWAYS_S3_DOMAIN_NAME - LAKEFS_STATS_ENABLED - LAKEFS_LOGGING_LEVEL #- LAKEFS_COMMITTED_LOCAL_CACHE_DIR=/home/lakefs/.local_tier - LAKEFS_COMMITTED_LOCAL_CACHE_SIZE_BYTES - LAKEFS_COMMITTED_SSTABLE_MEMORY_CACHE_SIZE_BYTES - LAKEFS_COMMITTED_LOCAL_CACHE_RANGE_PROPORTION - LAKEFS_COMMITTED_LOCAL_CACHE_RANGE_OPEN_READERS - LAKEFS_COMMITTED_LOCAL_CACHE_RANGE_NUM_SHARDS - LAKEFS_COMMITTED_LOCAL_CACHE_METARANGE_PROPORTION - LAKEFS_COMMITTED_LOCAL_CACHE_METARANGE_OPEN_READERS - LAKEFS_COMMITTED_LOCAL_CACHE_METARANGE_NUM_SHARDS - LAKEFS_COMMITTED_PERMANENT_MIN_RANGE_SIZE_BYTES - LAKEFS_COMMITTED_PERMANENT_MAX_RANGE_SIZE_BYTES - LAKEFS_COMMITTED_PERMANENT_RANGE_RAGGEDNESS_ENTRIES - LAKEFS_COMMITTED_BLOCK_STORAGE_PREFIX volumes: - ./lakefs:/home/lakefs entrypoint: ["/app/wait-for", "postgres:5432", "--", "/app/lakefs", "run"] postgres: image: postgres:11 container_name: postgres environment: POSTGRES_USER: lakefs POSTGRES_PASSWORD: lakefs volumes: - ./db:/var/lib/postgresql/data networks: default: name: lakefs
u
The working directory is /data, and I map /home/lakefs in the container to /data/lakefs on my host machine.
u
can someone give me a clue about what is wrong in configuring the mapping from the local path in the docker image to the path on my host machine?
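(For context: the relevant piece is the volumes entry of the lakefs service in the compose file above. Assuming docker-compose is run from /data, the relative path ./lakefs resolves to /data/lakefs on the host, so the mapping is equivalent to spelling out the absolute path, roughly:)

  lakefs:
    volumes:
      - /data/lakefs:/home/lakefs   # host path : container path (LAKEFS_BLOCKSTORE_LOCAL_PATH)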
u
Hi @donald 👋 Let me have a look
u
@donald can you please check the /home/lakefs directory permissions (from within the container)? From the error message you shared, it looks like it's not writable
u
the application uses the lakefs user, so it should have permissions to write to that directory
u
the lakefs container is dead after that error. I cannot log in to that container to check the permissions
u
only the postgres docker instance is running
u
I see, mind sharing the docker command you run?
u
sudo docker-compose up
u
can you share the permissions set on the /data/lakefs directory?
u
(on your host)
u
the owner is root for /data/lakefs
u
let me change the permissions on these directories on my host machine
u
can you please run ls -ld /data/lakefs?
u
lakefs uses its own user (id 101 in the docker-compose setup)
u
so you can either change the owner/group to use this id with chown, or add permissions to others with chmod
u
did you manage to get it working?
u
After I changed the permissions, I can start lakefs now. But I have another issue: I cannot see the data after I uploaded one file via the web UI
u
And I can see one error from the console, "ERROR: relation "auth_installation_metadata" does not exist at character 23"
u
just making sure, you've gone through the setup, right?
u
Yes, I did the setup step. Then I created one repository and loaded one file by clicking the "load object" button
u
and you managed to see the object using the UI/CLI, right?
u
I just found that the owner of /data/lakefs/mystorage is _apt/systemd-timesync. And I can see the file only when I run "sudo ls /data/lakefs/mystorage"
u
it seems like the permissions are still inconsistent in your directory
u
can you please change the owner of the lakefs directory recursively, using chown -R?
u
yes, it looks like it is an issue related to permissions
u
let me delete /data/lakefs first
u
what is the owner I should set for /data/lakefs on my host machine?
u
the lakefs user doesn't exist on your machine, and because you mount the directory from your host, the container uses the same numeric user IDs; lakefs uses ID 101 within the container
u
so chown -R 101:101 /data/lakefs should do the trick
u
I failed to run the above command
u
It said "operation not permitted"
u
try adding sudo before the command. Please double check the path so you won't accidentally change other directories' permissions (reminder: -R is recursive)
u
you can verify that by entering the container, listing the /home/lakefs directory, and checking whether lakefs is now the owner
u
I changed the owner of /data/lakefs to systemd-timesync:systemd-timesync, but the owner of the repository I created via the web UI under /data/lakefs is _apt:systemd-timesync
u
drwxrwxr-x  4 systemd-timesync systemd-timesync 4096 May  5 15:47 lakefs
u
drwxrwxr-x 4 systemd-timesync systemd-timesync 4096 May  5 15:47 .
drwxrwxr-x 4 yang             yang             4096 May  5 15:46 ..
drwxr-xr-x 3 _apt             systemd-timesync 4096 May  5 15:46 data
drwxr-x--- 2 _apt             systemd-timesync 4096 May  5 15:47 local-repository
u
systemd-timesync might be 101 on your local machine, you can verify that using id systemd-timesync
u
yes, the id for systemd-timesync is 101
u
so that's OK
u
now, is there any data in the data directory? are you able to upload objects and see that files were changed? (don't expect to see the same files, that's part of the lakefs under the hood magic)
u
But I still have the same issue that I cannot view the data I loaded from web
u
if you run ls -l /data/lakefs, what do you see?
u
drwxrwxr-x 4 systemd-timesync systemd-timesync 4096 May  5 15:47 .
drwxrwxr-x 4 yang             yang             4096 May  5 15:46 ..
drwxr-xr-x 3 _apt             systemd-timesync 4096 May  5 15:46 data
drwxr-x--- 2 _apt             systemd-timesync 4096 May  5 15:47 local-repository
u
please note that the files within still have the wrong owner (i.e. local-repository is owned by _apt)
u
have you managed to run sudo chown -R 101:101 /data/lakefs?
u
It works now, after I recreated the repository and reloaded one file
u
yeah, creating a new repository would work too as it inherits the parent directory owner & group (which are now fine)
u
happy to hear you managed to solve this. anything else I can help with?
u
I have deleted /data/db, which is the host directory mapped for the postgres database. Why can I still see the repository I created before, after I deleted /data/db and /data/lakefs and restarted docker-compose?
u
postgres is used to synchronize actions on your repositories and give you strong guarantees for commits; the metadata is still preserved on the storage
u
I see. Thank you very much for your help
u
if you'd like to start from a blank state, I suggest stopping and removing the docker-compose services, cleaning the directories, and starting again
u
sure 🙂 we're happy to have you here! let me know if you have any further questions
u
hey @donald, I wonder if you managed to get everything working
u
When I tried it on the real production system, I first got the following error from console
u
lakefs   | panic: error while connecting to DB: failed to connect to `host=postgres user=lakefs database=postgres`: server error(FATAL: the database system is starting up(SQLSTATE 57p03))
u
it looks like Postgres reports it's not available yet
u
when you say real production system - what do you mean? did you change it to work with a different database now that you're not using docker-compose?
u
Then I got the following output
u
postgres | 2022-05-09 LOG: database system is ready to accept connections
u
sounds like lakeFS started before the database was ready. Are you still seeing errors in lakeFS log?
u
it seems that lakefs does not wait long enough for postgres to start up
u
lakefs exited with code 2
u
can you try to start it again now when postgres is alive?
u
It seems that I can start lakefs when I run "docker-compose up" again while postgres is live
u
But if I run "docker-compose down" to stop all the services and then run "docker-compose up", I get the same error message
u
yeah, I understand, let me have a look if we have a wait-for implementation there
u
is there a way to control how long lakefs wait for postgres startup?
u
I saw there is a command in lakefs: "/app/wait-for", "postgres:5432", "--"
u
exactly... it waits for the postgres container to be listening on port 5432 (the postgres port)
u
it looks like postgres was listening but wasn't available to accept any connections
u
were you running some maintenance operations on that postgres instance? did the logs indicate a similar operation taking place?
u
No, postgres is also a docker container
u
got it. what did you mean then when you said you tried it on production system?
u
I tried this on DEV machines last week and it succeeded with your help. Now I am trying to deploy it to a production system for a trial, so that a wider group of developers can try it.
u
When I evaluated it in the beginning, I installed postgres manually and ran lakefs standalone. Now I am trying to use docker-compose to manage it, as it may be easier to maintain
u
What is the best practice for running lakefs with NAS as storage? docker-compose? standalone?
u
If I run "docker-compose up" twice, the first run starts postgres but lakefs fails because it doesn't wait long enough; the second run starts lakefs. Then it is OK and I can log in to the lakefs web UI and create a repository.
u
for production deployment, I wouldn't use either NAS or docker-compose... NAS/local is not recommended for production
u
you can use any of the main cloud vendors' object stores (AWS, GCP, Azure) to store your objects, and you can provision a managed postgres database. We also have a Helm chart so you can deploy lakeFS on top of Kubernetes.
u
for now we cannot use a public cloud. That is why we have to use NAS
u
and we are not ready to use kubernetes either
u
you can use other s3 api-compatible solutions for the object store, such as MinIO: https://docs.lakefs.io/integrations/minio.html
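(For illustration, following the linked MinIO guide, pointing lakeFS at an S3-compatible store roughly means replacing the local blockstore environment variables with something along these lines; the endpoint and credentials below are placeholders, not real values:)

    environment:
      - LAKEFS_BLOCKSTORE_TYPE=s3
      - LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE=true
      - LAKEFS_BLOCKSTORE_S3_ENDPOINT=http://minio:9000
      - LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=<minio-access-key>
      - LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY=<minio-secret-key>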
u
NAS is strongly discouraged and unsupported, please take that into account
u
and you can run lakeFS in a standalone docker/binary if you like, but you might face scalability issues if there's no orchestrator in place
u
I have evaluated MinIO, but the license is GNU AGPL. We cannot use a product with that license
u
Does your NAS solution support the S3 protocol?
u
No, the NAS is a standard one, it does not support S3
u
Hi @donald, The local block adapter is not suitable for production for various reasons. Specifically:
• lakeFS will create huge directories, and filesystems (as opposed to object stores) do not handle these well.
• Direct object access (as used by the LakeFSFileSystem for Hadoop and Spark) will be unusable. So you will need to scale your lakeFS servers proportionally to data rather than proportionally to metadata.
• Graveler -- the lakeFS component that handles committed metadata -- is designed to use big (~10MiB) files for storing metadata. That gives excellent performance on object stores, but not on a NAS.
Because it is not used in production, we do not perform large-scale integration testing on the "local" block adapter. So my gut feeling is NOT to use this adapter for anything except tests. There are various systems that simulate object stores on top of file systems -- MinIO comes to mind and is documented and tested, but I read that Ceph might also work. The focus of these systems is on achieving reasonable object store performance, so this added layer of abstraction may help. That said: we would be very happy to better understand your setup and requirements. Do DM me if you are willing to meet!
u
Currently we cannot use a public cloud and the infrastructure is not ready yet for Kubernetes. We evaluated some other data version control systems, such as GitLab with LFS. Our long-term goal is to move to Kubernetes; that is the main reason we are trying lakefs as a trial for some projects.
u
I have logged in to the lakefs docker container and looked at the "wait-for" script. The timeout is set to 15, but the file is read-only. What is the default password for root?
u
Git with LFS shares many features with lakeFS, but the nonfunctional characteristics are very different: they have very different performance in many use cases. Which to use will depend on your typical usage patterns. Because underlying object stores scale so much better than a NAS, lakeFS will probably be much easier to scale up. @Oz Katz we should probably have an explanation for these differences somewhere, but I could not find it.
u
I just checked Ceph in our system. It has been rejected as well due to its license.
u
@donald regarding your question on root user, when you "exec" into the container, just pass the --user root flag
u
@Or Tzabary I just renamed the services, for example "lakefs" -> "lakefs2" and "postgres" -> "postgres2", and deleted all postgres data on my host machine. Then I ran "docker-compose up" and everything was OK: I could create the repository and load a file via the web UI. However, I got the same error again after I ran "docker-compose down" to stop all the services and then ran "docker-compose up".
u
Also, I changed the timeout from 15 to 90
u
I looked at the stack trace; github.com/treeverse/lakefs/pkg/db.BuildDatabaseConnection(...) in /build/pkg/db/connect.go:26 caused the issue.
u
lakefs didn't wait for the DB connection at all
u
I'm looking into this
u
I have added a healthcheck to the postgres service with "pg_isready -U lakefs" and then added the dependency condition "service_healthy" to the lakefs service. I tried to start/stop it several times and it all works fine.
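(For reference, a minimal sketch of that change, assuming a docker-compose version that supports the condition form of depends_on; the interval/timeout/retries values here are illustrative, not the exact ones used:)

  postgres:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U lakefs"]   # report healthy only when postgres accepts connections
      interval: 5s
      timeout: 5s
      retries: 10
  lakefs:
    depends_on:
      postgres:
        condition: service_healthy   # start lakefs only after the healthcheck passes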
u
@Or Tzabary just one more question. I saw there is one warning message: "using the local block adapter. This is suitable only for test, but not for production". What does this imply? Just performance?
u
amazing! happy to hear that! would you like to contribute this to the open source? we'll really appreciate it
u
not only; scalability is also a factor... for production use, it's highly recommended NOT to use local storage
u
another reason is compatibility: features like import, staging external objects, garbage collection and the native Hadoop Filesystem are not supported and aren't likely to be supported when using the local adapter (note: we should really document this!!). I would strongly recommend using an object store interface with lakeFS. Here are a couple of options to think about:
1. Using MinIO's S3 NAS gateway with a version of MinIO prior to 05-2021: at the time it was still Apache licensed, so using this version doesn't require accepting AGPL.
2. Using a commercial offering that has S3 protocol support. I believe Weka and Pure Storage both support this; I would assume many other vendors do too.
3. I assume this is less relevant, but using any of the cloud providers' native object stores. Nowadays this is possible even in a private datacenter using something like AWS Outposts.
u
@Oz Katz @Or Tzabary Many thanks for your help