# help
u
Hi! I have a few questions that came up while evaluating lakeFS. Thanks ahead of time!
1. Is there a benefit to any particular AWS database technology over another? For something like the DS/Research env reference architecture, would RDS be OK?
2. Could someone expand on the use case for the S3 fallback URL/S3 proxy?
gateways.s3.fallback_url (string) - If specified, requests with a non-existing repository will be forwarded to this URL. This can be useful for using lakeFS side-by-side with S3, with the URL pointing at an S3Proxy instance.
3. Is it correct that garbage collection requires a Spark deployment if I'm not using lakeFS Cloud? Are there any other features that require Spark?
u
Hi @Christopher Burke, I can help with the last question. GC requires running a Spark job for lakeFS OSS. I can’t think of any other features that REQUIRE Spark. However, exporting data (https://docs.lakefs.io/howto/export.html) might be easier with Spark, though it is not required.
u
@Christopher Burke, regarding the 1st question, DynamoDB (a key-value store) will be better than RDS: https://docs.lakefs.io/deploy/aws.html
u
In what way is Dynamo better? We have an internal wrapper around AWS infra and at the moment do not expose Dynamo
u
That’s a good question. DynamoDB is a good option in many cases, but not in all of them; it depends on your use case. There are two aspects I would take into consideration:
Pricing: DynamoDB can be much cheaper in many cases, but if you already have an RDS instance running, that might not hold for you.
Performance: this also depends on your use case. You can look at the benchmark section, which has data from benchmarks run on both RDS and DynamoDB.
u
Regarding question 2: lakeFS is S3-compatible and can be accessed the same way you would access S3 (e.g., configure lakeFS as the endpoint in the AWS CLI, and all aws s3 commands will be answered by lakeFS). Because users asked to use the S3 gateway for accessing both lakeFS repositories and S3 buckets, we added the proxy option. That said, if you do want to access both lakeFS and S3, we have better solutions today for most cases. If that is what you need, please expand on your use case and we will help you find a suitable solution.
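To make the fallback behaviour quoted from the docs concrete, here is a minimal Python sketch of the routing decision it describes: a request whose bucket name matches an existing repository is handled by lakeFS, and anything else is forwarded to the fallback URL if one is set. This is only an illustrative model, not lakeFS code. The function name `route_request`, the repository name, the proxy URL, and the error returned when no fallback is configured are all assumptions made for the example.

```python
from typing import Optional, Set


def route_request(bucket: str, repositories: Set[str], fallback_url: Optional[str]) -> str:
    """Model where an S3-gateway-style router would send a request for `bucket`.

    Hypothetical helper for illustration only; it is not part of lakeFS.
    """
    if bucket in repositories:
        # The bucket name matches an existing repository: lakeFS serves it.
        return "lakefs"
    if fallback_url:
        # Non-existing repository and a fallback is configured: forward it.
        return f"forward:{fallback_url}"
    # Assumption: with no fallback configured, the request simply fails.
    return "error:no-such-repository"


if __name__ == "__main__":
    repos = {"example-repo"}                        # hypothetical repository
    proxy = "http://s3proxy.example.internal:8080"  # hypothetical S3Proxy URL
    print(route_request("example-repo", repos, proxy))
    print(route_request("plain-s3-bucket", repos, proxy))
    print(route_request("plain-s3-bucket", repos, None))
```

In this model a single endpoint can answer for both lakeFS repositories and plain S3 buckets, which matches the side-by-side use case the docs mention.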
u
I was mostly just curious about what the s3 fallback provides - I didn't have a particular scenario in mind. Thanks to all for the answers!