# help
f
Hi, I'm using the lakeFS filesystem (within Spark, io.lakefs:hadoop-lakefs-assembly:0.2.1, https://docs.lakefs.io/integrations/spark.html#lakefs-hadoop-filesystem) with presigned URLs and a 2-hour expiration time, and sometimes I get an error 500 from AWS (which, from what I've read, suggests S3 throttling; I checked and it's definitely not expired). java.io.IOException: Server returned HTTP response code: 500 for URL: https://xxxx.s3.eu-west-1.amazonaws.com/... Is this normal? Can we somehow configure the Spark lakeFS filesystem to retry those requests individually, without relying on Spark task retries? That way Spark wouldn't fail the whole task on a "known" issue (it's wasting task retries on that and sometimes it fails; it also takes around 20 seconds to happen).
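For reference, my Spark setup is roughly this (endpoint, keys and job name are placeholders, and I'm going from memory on the exact property names, so treat it as a sketch):
Copy code
# sketch of the spark-submit config I'm using (values are placeholders)
spark-submit \
  --packages io.lakefs:hadoop-lakefs-assembly:0.2.1 \
  --conf spark.hadoop.fs.lakefs.impl=io.lakefs.LakeFSFileSystem \
  --conf spark.hadoop.fs.lakefs.endpoint=https://<lakefs-host>/api/v1 \
  --conf spark.hadoop.fs.lakefs.access.key=<lakefs-access-key> \
  --conf spark.hadoop.fs.lakefs.secret.key=<lakefs-secret-key> \
  --conf spark.hadoop.fs.lakefs.access.mode=presigned \
  my-job.jar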
a
Hi @Florentino Sainz, sorry to hear you're running into issues. What version of lakeFS are you running on the server side? And can you share the configuration -- are you on K8s, what kind of access key or role are you using to authenticate to S3, etc.? This is a bit of a long shot, but we had an issue along with this one that may be relevant: if you are running lakeFS with an assumed role for S3 on K8s, and if you use almost only presigned URLs, then it might end up producing URLs that expire sooner than expected.
f
Looking into that. Just one additional piece of info: Spark retries sometimes fix the issue, btw (not sure if they reuse the same URL or request a new one from lakeFS). If I try the URL in my web browser "a few minutes later" I do get an expired token, so it could be the reason, but it's a 403, not an error 500, which is why I thought it was not related. We are using an EKS (K8s) deployment of the open-source version (enterprise version coming soon afaik), running lakeFS 1.1.0. lakeFS uses a ServiceAccount which maps to an AWS role with direct access to S3 (same account), no assume-role or anything.
a
OK, then you're safely past that bug.
You can get a presigned URL from lakeFS by using the lakectl CLI -- and if the expiration time that AWS encodes in the URL is not actually correct, that command tells you when the URL will actually expire. Could you try running that for me?
Copy code
lakectl fs stat --pre-sign <lakefs://repo/branch/path/to/object>
and then if it shows a "Physical Address Expires" field we can see when it really expires. (But make sure NOT to share the "Physical Address", of course; that is literally a presigned URL to access your data!)
f
If I use one of the URLs from the error it says: <X-Amz-Expires>7200</X-Amz-Expires> <-- this matches my 120m config, and <Expires>2023-11-08T10:11:00Z</Expires>
(gonna do what you asked)
a
So not an expiry. Too bad, I was hoping I'd already fixed that bug
f
hmm
Physical Address Expires: 2023-11-08 12:58:37 +0100 CET
I did what you said with fs stat --pre-sign
and I got only 15 minutes (?)
Copy code
blockstore:
  type: s3
  default_namespace_prefix: s3://{data_s3_buckets[0].bucket_name}/lakefs/
  s3:
    region: eu-west-1
    pre_signed_expiry: 120m
    disable_pre_signed_ui: false
that's my config (the URLs I got from Spark say 7200, though)
a
Yeah, your EKS is probably giving you 15-minute tokens. It's documented somewhere in the AWS docs; I can look it up later.
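If you want to double-check which role lakeFS is actually assuming, the IRSA annotation on its service account should show it; something like this (namespace and service account name are guesses for your setup):
Copy code
# show the IAM role mapped to the lakeFS service account via IRSA
kubectl -n lakefs get serviceaccount lakefs -o yaml | grep eks.amazonaws.com/role-arn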
f
Will explore that route; no worries, I'll check myself
will report back, thanks for the info though 🙂
a
You might consider trying to set:
Copy code
s3:
  pre_signed_expiry: 1h
  web_identity:
    session_duration: 1h
    session_expiry_window: 50m
to get a longer session expiry. I'm not sure you're in exactly the EKS mode those settings are intended for, but they shouldn't harm anything.
f
Yeah, I'm using web_identity, thanks for that. I already extended my role expiration to 12h (the max our security team allows). Will try to configure that and check
btw those options are not in https://docs.lakefs.io/reference/configuration.html 🙂, but I'll try anyway
a
I am not sure that you will be able to ask web_identity for >1h. If you're in the same account then it might work.
f
Yeah, I'm on the same account. Anyway, 1h should be enough too; 2h was arbitrary
a
Yeah, they're very niche and I was hoping not to make the configuration guide even more confusing.
f
oh ok. Btw, does an expiry window of 50m mean it will renew them / consider them outdated after 50 minutes, or after 10 minutes?
a
It will renew them after 10 minutes, so you always have at least ~50m left on your token.
f
kk perfect, will set it to this and test. ty 🙂
Copy code
pre_signed_expiry: 2h
web_identity:
  session_duration: 3h
  session_expiry_window: 2h
a
If it doesn't work, reduce everything to 1h or less. STS has some weird hardcoded behaviours.
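Also worth confirming, even though you said you already raised it: STS won't grant a web-identity session longer than the role's MaxSessionDuration, so something like this (role name is a placeholder) will tell you the real cap:
Copy code
# maximum session duration (seconds) STS can grant for the role lakeFS assumes
aws iam get-role --role-name <lakefs-role> --query 'Role.MaxSessionDuration' --output text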
f
ok thanks, will do that. Gonna try with 2h "just to test", but in prod it will probably be under 1h anyway; I think that should be enough and I don't want to risk the links living too long (even though all our devs who have access to the logs are kind of trusted)
Confirmed, that did the trick, it now jumped to 2 hours 🙂 let's see how it goes, thanks a lot!
Updating back: it did fix the issue on the long-running process (the one which processes big .gz files slowly). However I still see some ERROR 500s, and if I click the link from the error, I can download the file from my web browser. It's happening with Spark local (maybe that one doesn't have retries? not sure how Spark behaves locally) during integration testing:
Copy code
Server returned HTTP response code: 500 for URL: <https://blalbalblalbla>
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1902)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1500)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:268)
    at io.lakefs.storage.HttpRangeInputStream.updateInputStream(HttpRangeInputStream.java:54)
    at io.lakefs.storage.HttpRangeInputStream.read(HttpRangeInputStream.java:100)
    at java.io.InputStream.read(InputStream.java:170)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:102)
    at org.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127)
    at org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:576)
    at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:777)
    at org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.initialize(SpecificParquetRecordReaderBase.java:102)
    at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:180)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:284)
a
That is odd, because lakeFS is not on the read data path. Do you know how long after the stage starts this happens?
f
it is here
Copy code
io.lakefs.storage.HttpRangeInputStream.updateInputStream(HttpRangeInputStream.java:54)
    at io.lakefs.storage.HttpRangeInputStream.read(HttpRangeInputStream.java:100)
isn't it?
^I'm using the Spark lakeFS filesystem
I don't have the Spark UI, but the whole process failed in less than 5 minutes (and the link is still valid)
It's happening during a Delta merge btw, just in case. In Spark local mode there's no task maxFailures; that's why we're very sensitive to this though (I think, just did some minor testing)
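(Side note: I think local mode accepts a second parameter for task maxFailures in the master string, so something like this should at least give me retries while testing; not sure it's the right fix though)
Copy code
# local[threads, maxFailures]: allow up to 4 attempts per task in local mode
spark-submit --master "local[*,4]" <rest of the job arguments>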
a
Yeah, but a 500 from S3 should be fairly rare. This is real S3, right? Not some MinIO container in a test environment...
f
yes, it's real S3
lakeFS internally uses weird (many) prefixes, so I don't know why the throttling happens either 😕 not sure if it's related to presigned URLs having different throttle rules
a
Yeah, we literally follow best practice for object naming. Also, I don't think you're hitting that key very often. It only ever spreads your keys out more. I would expect lakeFS to throttle you well before S3 does.
f
On my side the connection should be mostly direct: EKS -> VPC Gateway endpoint (i.e. the one which uses routes, not PrivateLink) -> S3. No API gateways in the middle or anything.