Selva
12/06/2022, 5:20 AM
Guy Hardonag
12/06/2022, 5:53 AM
Selva
12/06/2022, 6:28 AM
Guy Hardonag
12/06/2022, 6:55 AM
Selva
12/13/2022, 7:45 AM
repo
|_code
  |_prep.m
  |_scale.m
  |_split.m
  |_train.m
  |_val.m
|_data
  |_prep_results
    |_<txt files>
  |_scale_results
    |_<txt files>
  |_split_results
    |_<txt files>
  |_train_results
    |_<model files>
My users want a Git GUI experience for code changes and local debugging. That is: clone the repo to c://repo, make changes to the script, and run the MATLAB script with limited inputs. Once satisfied, they commit the script and our powerful server machine (HPC) runs it on the full data (creating millions of files).
Git is not useful for this case, because I cannot commit millions of files and I cannot clone them to my local machine later. I need some kind of feature where I clone only the code folder and not the data folder. DVC promised this by storing only references, but it cannot work on millions of files. Hence, I have to ditch it.
Then I came up with a second alternative: breaking my repo into two (which I hate, because it raises the chance of manual errors by users)
repo_git
|_code
  |_prep.m
  |_scale.m
  |_split.m
  |_train.m
  |_val.m
repo_lakefs
|_data
  |_prep_results
  |_scale_results
  |_split_results
  |_train_results
So now my users can clone repo_git alone, make code changes, and run them. Once satisfied, they can commit the code change and our HPC will run it on the full data.
So the code that writes the output in prep.m, which is currently:
Pitch = [0.7;0.8;1;1.25;1.5];
Price = [10.0;13.59;10.50;12.00;16.69];
Stock = [376;502;465;1091;562];
T = table(Pitch,Price,Stock)
writetable(T,'../data/prep_results/tabledata.txt');
should be changed to:
Pitch = [0.7;0.8;1;1.25;1.5];
Price = [10.0;13.59;10.50;12.00;16.69];
Stock = [376;502;465;1091;562];
T = table(Pitch,Price,Stock)
%Todo: create a LakeFS branch with the same name as current git branch
%Todo: Get the azure path of the data folder in this new branch and store in lakefspath variable
txtpath = fullfile(lakefspath,'prep_results','tabledata.txt');
writetable(T,txtpath);
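One untested way the two todos might be filled in is sketched below. It assumes that git and the lakectl CLI are on the PATH and that lakectl is already configured. The names repoName and sourceBranch, and the helpers toLakefsBranch and lakefsUri, are hypothetical. Instead of resolving an Azure path, the sketch writes the table to a temporary local file and uploads it to a lakefs:// URI with lakectl, which is a different approach from writing straight into storage. The URI is joined with '/' rather than fullfile, since fullfile would insert backslashes on Windows.

```matlab
% Sketch only. Assumes git and lakectl are on the PATH and lakectl is configured.
repoName     = 'repo_lakefs';  % hypothetical lakeFS repository name
sourceBranch = 'main';         % hypothetical branch that new branches start from

Pitch = [0.7;0.8;1;1.25;1.5];
Price = [10.0;13.59;10.50;12.00;16.69];
Stock = [376;502;465;1091;562];
T = table(Pitch,Price,Stock);

% Todo 1: create a lakeFS branch named after the current git branch.
[status, out] = system('git rev-parse --abbrev-ref HEAD');
assert(status == 0, 'Could not read the git branch: %s', out);
branch = toLakefsBranch(out);
cmd = sprintf('lakectl branch create %s --source %s', ...
    lakefsUri(repoName, branch, ''), lakefsUri(repoName, sourceBranch, ''));
[status, out] = system(cmd);
if status ~= 0
    % Also happens when the branch already exists, so only warn here.
    warning('lakectl branch create returned %d: %s', status, out);
end

% Todo 2 (alternative): write locally, then upload to the branch.
localFile = [tempname '.txt'];
writetable(T, localFile);
txtpath = lakefsUri(repoName, branch, 'data/prep_results/tabledata.txt');
cmd = sprintf('lakectl fs upload %s --source "%s"', txtpath, localFile);
[status, out] = system(cmd);
assert(status == 0, 'Upload failed: %s', out);

function name = toLakefsBranch(gitBranch)
    % Trim the trailing newline from the git output and replace characters
    % (such as '/') that lakeFS may not accept in branch names (assumption).
    name = regexprep(strtrim(gitBranch), '[^A-Za-z0-9_-]', '-');
end

function uri = lakefsUri(repo, branch, relPath)
    % Build lakefs://<repo>/<branch>[/<relPath>] with forward slashes.
    uri = ['lakefs://' repo '/' branch];
    if ~isempty(relPath)
        uri = [uri '/' relPath];
    end
end
```

The uploaded file stays uncommitted on the lakeFS branch until a commit is made there (for example with lakectl commit), so the HPC job would probably need that as a final step.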
Could you please fill in the todos? Also, if you have any other alternatives, please suggest them.
Lynn Rozen
12/13/2022, 1:06 PM
Selva
12/13/2022, 7:19 PM