# help
c
I lied. Our files can be 30GB in size worst case. Big high-resolution video files. How would this affect the lakeFS S3 Gateway? My understanding is that data is written to S3 using the Gateway's Storage Adapter, and metadata is managed via something called Graveler. Not sure if it's just a driver application that tells Spark how to upload data to S3, or if it's going to try to ingest all of that data into memory when uploading.
i
The preferred option is to use presigned URLs via the lakeFS API and avoid the operational overhead of sending the data through lakeFS. If you’re using the lakeFS Hadoop FS, data will not flow through lakeFS, just the metadata. If you’re using the S3 gateway, then it will. Per your question, lakeFS will not load the entire file into memory; it will stream it to S3. Meaning, data will flow through lakeFS, but the memory requirements won’t have to increase linearly with file size.
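For reference, the Hadoop FS route from Spark looks roughly like this. This is a minimal sketch, not a drop-in config: the endpoint, keys, repo and branch names are placeholders, and you’d also need the lakeFS Hadoop FS client (the hadoop-lakefs-assembly package) on the classpath.

```python
# Minimal PySpark sketch of the lakeFS Hadoop FileSystem route: only metadata
# goes through the lakeFS server, the 30GB objects move between Spark and S3.
# Endpoint, keys, repo and branch names below are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("lakefs-hadoopfs-sketch")
    # Route lakefs:// URIs through the lakeFS Hadoop FileSystem
    .config("spark.hadoop.fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
    .config("spark.hadoop.fs.lakefs.endpoint", "https://lakefs.example.com/api/v1")
    .config("spark.hadoop.fs.lakefs.access.key", "<LAKEFS_ACCESS_KEY>")
    .config("spark.hadoop.fs.lakefs.secret.key", "<LAKEFS_SECRET_KEY>")
    # Credentials for the underlying bucket: the bytes are written directly
    # to S3 via s3a, not streamed through the lakeFS server
    .config("spark.hadoop.fs.s3a.access.key", "<AWS_ACCESS_KEY>")
    .config("spark.hadoop.fs.s3a.secret.key", "<AWS_SECRET_KEY>")
    .getOrCreate()
)

# Reads and writes use lakefs://<repo>/<branch>/<path>; lakeFS only records
# the commit metadata (Graveler), while the video data flows Spark <-> S3.
df = spark.read.format("binaryFile").load("lakefs://my-repo/main/videos/")
df.write.mode("overwrite").parquet("lakefs://my-repo/feature-branch/frames/")
```

With this setup the lakeFS server never handles the video bytes, so its memory footprint stays flat regardless of object size; the S3 gateway route streams the bytes through lakeFS instead, which works but adds network hops for 30GB files.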
👍 1