Itai David
02/07/2022, 2:40 PM\
) in Windows' style paths to Unix's style forward slashes (/
) to match S3 behavior.
Out main concern is the effect it might have on users currently using Windows' style paths and the effect it might have on their data.
The original message is attached hereafter. I'm attaching all responses as comments.
All further comments are very appreciated.
Thanks
Hi.
I would love to here some opinions and some your of concerns regarding an issue I'm working on - BUG : Upload directory path not parsed correctly #2880
TL;DR - we are not treating Windows' path delimiter - the backslash - as such
Upon uploading a file, from Windows, the original path, e.g. C:\some\path\to\file
is kept and used as a key.
Later, when the file is returned to the client(s), as a response for ListObjects, we do not parse the key as a path correctly.
While lakectl
shows the key as a full path either ways, the web UI fails to create the nice (and expected) tree display, as described in the bug
On @Tal Sofer's advice, I looked at S3 behaviour, and found out that AWS actually converts Windows` style paths to Unix style. Namely, all backslashes are turned into forward slashes.
That's it for the problem description, now for possible solutions:
Option 1 - change lakefs behaviour to match S3 - this means we actively change backslashes in path to forward slashes.
I can think of 2 problems here:
• Users who use Windows clients and has their data in place (let's say they do not use the web UI and are not bothered by the path parsing difference) will be affected, as new files will be 'named' differently
• Not sure this is a real issue, but Unix allows forward slashes as part of a valid path name. If we change these to backslashes it will affect the file path in lakefs.
Option 2 - Store the file path/name/key the same way we do today, but treat both backslash and forward slash as path delimiters.
This will solve the web UI problem and will not affect the way we keep data, and so, existing user data will not be affected .
However, this is handling the symptom rather than the problem, and I'm not sure of where else this problem gonna pop. Moreover, it does not align with our S3-like approach, so not sure this is even an option
Any advice, opinion or other considerations I fell to mention?
All comments will be greatly appreciated 🙏Itai David
02/07/2022, 2:40 PMchange lakefs behaviour to match s3
, you mean change the lakectl upload command to match aws s3? If so, I think this is the right way to go here. You raised a valid concern for breaking backward compatibility for Windows users relying on the forwardslash to appear in their lakefs repo, but (1) I don’t really think that such users exist (2) we can help them to migrate it when they’ll surface.
To summarize, I think we should change the lakectl fs upload
to handle the OS’s default separator (back for Unix, fwd for Windows) as lakeFS default separator (always back)Itai David
02/07/2022, 2:41 PM--translate-pathnames=false
, or some equally uninviting name).Itai David
02/07/2022, 2:41 PMToSlash
to replace the OS specific path to '/'. This way you will not have any 'if' or OS specific code to handle Windows.Itai David
02/07/2022, 2:42 PMlakefs
webUI there is no option for recursive upload or download, so this is not required, as for each single file upload a specific lakefs
destination is required and, obviously, it is specified in lakefs
format.
Should we consider supporting batch uploads from the web UI too, I guess we will have to support that too
So, if I summarize this up:
• We want to change the way we store the file path (key) to match S3 behavior (Unix path format)
• The translation should be done by the client (lakectl
)
• No need to support the same behavior with our web client (at least for now)
• We will introduce a flag to bypass this behavior (i.e. no path translation and preserving the wrong behavior we currently have). New, correct, behavior (i.e. Windows style paths will be transformed to S3 style) will be enabled by default.
• In case required, we will provide a migration procedure to translate all Windows style paths to S3 style, on existing databases (Do you think we should provide it as part of the fix introduction, or should it wait until someone actually need it?)
Sounds legit?
ThanksItai David
02/07/2022, 2:42 PMItai David
02/07/2022, 2:43 PMItai David
02/07/2022, 2:44 PM