# help
s
Hey there! I’m trying to play with Nextflow and lakeFS. Locally they seem to work nicely together; however, when I configure Nextflow to run on AWS Batch, the instances seem to have the wrong credentials. Any good reference for this setup?
a
I have no personal experience with Nextflow. For any tool that expects to use S3, you will need to configure:
• An access key ID and secret key generated on your lakeFS account
• The lakeFS URL as the S3 endpoint
After all, you probably never want your tool to access AWS S3 directly: it typically accesses lakeFS using the S3 protocol. We have a bunch of blog posts about this for various tools, but not Nextflow. Our AWS CLI doc really describes how to configure any Boto-based tool to work with lakeFS; you might want to start there?
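Just to make it concrete, here is a minimal (untested) sketch of what those two settings look like in any Boto-based tool. The endpoint URL, repository name, branch, and keys below are placeholders, not values from your setup:

```python
# Sketch: point boto3 at a lakeFS server instead of AWS S3.
# All values below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://lakefs.example.com",   # your lakeFS server URL, not AWS
    aws_access_key_id="AKIA-lakefs-key",         # access key created in lakeFS
    aws_secret_access_key="lakefs-secret-key",   # secret generated by lakeFS
)

# Through the lakeFS S3 gateway the repository is addressed as the bucket,
# and the branch is the first path component of the object key.
resp = s3.list_objects_v2(Bucket="my-repo", Prefix="main/data/")
for obj in resp.get("Contents", []):
    print(obj["Key"])
```

Whatever Nextflow does under the hood, it ultimately needs those same pieces: lakeFS keys for signing and the lakeFS URL as the endpoint.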
s
Using that in the local setup (Nextflow + lakeFS) works just fine… it’s the transition to AWS Batch that breaks.
a
Can you configure the endpoint that AWS Batch will use to access S3? Otherwise it will be much tougher to get it to talk to lakeFS. It will need to know the URL of your lakeFS server somehow.
s
I think the problem is somehow with the credentials rather than the endpoint
a
Yeah, that's entirely possible. The thing is that it can be really difficult to tell the two apart: if I take an access key from lakeFS and use it to sign a request to AWS, it just says "Unauthorized". Can you share any details of the configuration that you used? The errors you see, of course, but also the AWS-side client configuration, where the lakeFS server lives, how its endpoint works, and what it logs?
x
One possible solution I see is whether lakeFS could somehow honor the AWS credentials as well: e.g. if the AWS credentials have the same permission to access lakeFS's S3 storage, they could access lakeFS data too (via some configuration the user can enable). Hmm, sorry, never mind; maybe that wouldn't work either, since the endpoint still differs.
We currently run lakeFS on EKS, with S3 and DynamoDB as storage.
a
Yeah @Xubo Fei, this would be really useful. The issue is that AWS provides little way to determine whether an access key is valid, and even then I would end up having to provide my AWS credentials to lakeFS. That is not only horrible security; managing the same credentials on two different systems would be a really poor admin experience. We've been thinking about this, as you can see. Currently the best experience is to use lakeFS access keys for everything, and then maybe ask lakeFS for presigned URLs to access its S3 storage. It's a bit weird and doesn't work with everything, but with presigned URLs you can today use lakeFS for all flows with just lakeFS access key credentials. That still might not help @Sivan Bercovici if we cannot configure the right S3 endpoint.
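To sketch what I mean by the presigned-URL flow (this is from memory and untested; the server URL, repository, branch, path, and credentials are placeholders, and the `presign` parameter name should be checked against the lakeFS API reference for your version):

```python
# Rough sketch: ask lakeFS for a presigned URL to the underlying S3 object,
# so the client only ever holds lakeFS credentials.
import requests

LAKEFS = "https://lakefs.example.com/api/v1"          # placeholder server URL
auth = ("AKIA-lakefs-key", "lakefs-secret-key")        # lakeFS access key / secret

resp = requests.get(
    f"{LAKEFS}/repositories/my-repo/refs/main/objects/stat",
    params={"path": "data/sample.csv", "presign": "true"},  # presign flag: assumption
    auth=auth,
)
resp.raise_for_status()
stat = resp.json()

# If presigning is enabled, physical_address is a time-limited URL straight to
# the backing S3 bucket, so no AWS credentials are needed on the client side.
presigned_url = stat["physical_address"]
data = requests.get(presigned_url).content
print(len(data), "bytes")
```

The point of the pattern is that only the lakeFS server holds AWS credentials; clients get short-lived URLs instead of a second set of keys.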