Hi all. I'm attaching a technical discussion that ...
# dev
i
Hi all. I'm attaching a technical discussion that we were (mistakenly) having in an internal thread. It concerns an issue regarding the way we handle Windows paths. TL;DR - we want to change backslashes (
\
) in Windows' style paths to Unix's style forward slashes (
/
) to match S3 behavior. Out main concern is the effect it might have on users currently using Windows' style paths and the effect it might have on their data. The original message is attached hereafter. I'm attaching all responses as comments. All further comments are very appreciated. Thanks Hi. I would love to here some opinions and some your of concerns regarding an issue I'm working on - BUG : Upload directory path not parsed correctly #2880 TL;DR - we are not treating Windows' path delimiter - the backslash - as such Upon uploading a file, from Windows, the original path, e.g. 
C:\some\path\to\file
 is kept and used as a key. Later, when the file is returned to the client(s), as a response for ListObjects, we do not parse the key as a path correctly. While 
lakectl
 shows the key as a full path either ways, the web UI fails to create the nice (and expected) tree display, as described in the bug On @Tal Sofer's advice, I looked at S3 behaviour, and found out that AWS actually converts Windows` style paths to Unix style. Namely, all backslashes are turned into forward slashes. That's it for the problem description, now for possible solutions: Option 1 - change lakefs behaviour to match S3 - this means we actively change backslashes in path to forward slashes. I can think of 2 problems here: • Users who use Windows clients and has their data in place (let's say they do not use the web UI and are not bothered by the path parsing difference) will be affected, as new files will be 'named' differently • Not sure this is a real issue, but Unix allows forward slashes as part of a valid path name. If we change these to backslashes it will affect the file path in lakefs. Option 2 - Store the file path/name/key the same way we do today, but treat both backslash and forward slash as path delimiters. This will solve the web UI problem and will not affect the way we keep data, and so, existing user data will not be affected . However, this is handling the symptom rather than the problem, and I'm not sure of where else this problem gonna pop. Moreover, it does not align with our S3-like approach, so not sure this is even an option Any advice, opinion or other considerations I fell to mention? All comments will be greatly appreciated 🙏
@Itai Admi : Bear in mind that lakeFS acts as an s3 gateway, so we need to  treat forward-slash just as s3 does (i.e. don’t treat it as equal to the default delimiter backslash). So I think option2 isn’t really something we can do and keep compatibility with s3. When you write in option 1 
change lakefs behaviour to match s3
, you mean change the lakectl upload command to match aws s3? If so, I think this is the right way to go here. You raised a valid concern for breaking backward compatibility for Windows users relying on the forwardslash to appear in their lakefs repo, but (1) I don’t really think that such users exist (2) we can help them to migrate it when they’ll surface. To summarize, I think we should change the 
lakectl fs upload
 to handle the OS’s default separator (back for Unix, fwd for Windows) as lakeFS default separator (always back)
@Ariel Shaqed (Scolnicov) : When you say "aws" you mean just the cli client, right? (You might test by reading source code, or trying to upload using the cli to lakeFS). If so, then I would change our cli client to do what the aws cli client does (but only on Windows - I expect you can use filepath or somesuch standard module to perform the translation!), and add a flag to revert to the old bad behaviour (
--translate-pathnames=false
, or  some equally uninviting name).
@Barak Amar : You can use https://pkg.go.dev/path/filepath package to process the command line argument in case it is a path. Use function like 
ToSlash
 to replace the OS specific path to '/'. This way you will not have any 'if' or OS specific code to handle Windows.
@Itai David : Thanks everyone for your comments I've done some additional digging. Indeed AWS perform the translation between local path style and S3 path style (namely, Unix style), in both directions, at the clients I guess that means we will align with this behavior. Please note that for S3 the same path translation is supported by the web UI too. For 
lakefs
 webUI there is no option for recursive upload or download, so this is not required, as for each single file upload a specific 
lakefs
 destination is required and, obviously, it is specified in 
lakefs
 format. Should we consider supporting batch uploads from the web UI too, I guess we will have to support that too So, if I summarize this up: • We want to change the way we store the file path (key) to match S3 behavior (Unix path format) • The translation should be done by the client (
lakectl
) • No need to support the same behavior with our web client (at least for now) • We will introduce a flag to bypass this behavior (i.e. no path translation and preserving the wrong behavior we currently have). New, correct, behavior (i.e. Windows style paths will be transformed to S3 style) will be enabled by default. • In case required, we will provide a migration procedure to translate all Windows style paths to S3 style, on existing databases (Do you think we should provide it as part of the fix introduction, or should it wait until someone actually need it?) Sounds legit? Thanks
@Itai Admi : Agreed, except that I don't think we need a flag for using the current path translation.
@Itai David : OK. What about users that want to keep using their already existing data, uploaded from Windows? Are we offering a migration tool as part of this feature?
@Ariel Shaqed (Scolnicov) : Arguably not many people have done this bad upload, given the bug hasn't shown up yet. Personally I want a flag precisely for this b/c (backwards compatibility). And I definitely don't want a migration tool: if someone complains, let's give them a tool. But let's not without a good use-case, or we won't even know the scale of their problem...