# help
r
Hello guys, I was testing the Java Client upload API via JMeter. With the Number of Threads (Users) set to 10 it works fine. When I increased it to 20 I got:
io.lakefs.clients.api.ApiException: java.net.SocketTimeoutException: timeout
And the lakeFS server logs show:
DEBUG  [2022-06-02T11:11:59+05:30]lakeFS/pkg/httputil/logging.go:78 pkg/httputil.DebugLoggingMiddleware.func1.1 HTTP call ended                               host="localhost:8000" method=POST path="/api/v1/repositories/btest/branches/68873464/objects?path=containerd.gz&storageClass=" request_id=4f5fdba2-5fa1-436f-aa64-d0003ca6146a sent_bytes=31 service_name=rest_api status_code=500 took=1m30.3184134s
Do I need to increase the timeout? The OkHttpClient default value is 10 seconds, right?
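If increasing it is the way to go, I would try something like this. This is only a sketch: I'm assuming the generated io.lakefs.clients.api.ApiClient exposes the standard OpenAPI okhttp timeout setters (values in milliseconds), and the endpoint and keys below are placeholders:
import io.lakefs.clients.api.ApiClient;

// Sketch only: raise the OkHttp timeouts on the generated lakeFS client.
// Assumes the standard OpenAPI okhttp-gson setters are available in this client version;
// basePath and credentials are placeholders.
ApiClient apiClient = new ApiClient();
apiClient.setBasePath("http://localhost:8000/api/v1");
apiClient.setUsername("<ACCESS_KEY_ID>");
apiClient.setPassword("<SECRET_ACCESS_KEY>");
apiClient.setConnectTimeout(30_000);   // OkHttp default is 10 seconds
apiClient.setReadTimeout(120_000);     // allow long-running commits under load
apiClient.setWriteTimeout(120_000);    // allow large (~45MB) uploads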
t
Hi @Raman Kharche! A quick question -
"With the Number of Threads (Users) set to 10 it works fine. When I increased it to 20"
Can you please share which number you changed and how? Also, if you can share what you are trying to do, that would be great.
r
[attachment: image.png]
I am trying to check how the upload API will work via the Java client when the number of users is 10/20, maybe 50, and these users are each trying to upload a file of around 45MB.
50 users is too much; 25 would be the worst case.
t
Thanks! I would like to understand the setup of your testing environment to help you move forward.
• Where is your lakeFS server running? Is it a local machine?
• Where is your bucket located?
• What is your testing program doing? I mean, is it a simple upload, or are you testing additional lakeFS operations?
r
• The lakeFS server is local
• It's currently local development, so the blockstore is local as well
• I have written one API in Java which calls the upload API + commit API via the lakeFS Java Client
t
Thanks! I have another question to get the full picture - where is your DB? Local as well?
r
Yes, local. A Postgres DB.
t
Let’s try the following: can you please try doing only uploads in your test? I would like to first make sure that this part happens as quickly as I expect it to.
I mean, let’s try without commits
r
Yeah, it is working if I don't commit. It even works with 50 users (Threads).
t
@Raman Kharche Thanks for trying it out and letting me know! Let me explain the reasoning behind what you are seeing.
Commit is a blocking operation: it locks the branch because commits have to happen serially. So what I think is happening is that an upload is delayed because the lock is held by a commit operation. As opposed to commit, an upload operation is non-blocking (multiple writers are expected), and that’s why multiple concurrent uploads are fast.
In terms of lakeFS best practices, we recommend that each user use their own branch to make changes and commits, and then merge back to a “main” branch after making the desired change. Merge is also a blocking operation, but by nature it is a much less frequent one.
I’m curious to learn about your use case - do you have a use case in which your users will have to do multiple concurrent commits?
I will also open an issue for us to investigate the limits of concurrent commits to the same branch and share it here.
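In the meantime, to make the recommended flow concrete, here is a rough sketch with the Java client. Note that createBranch and the synchronous uploadObject signature here are my assumptions about the generated client, so please double-check them against your client version:
import io.lakefs.clients.api.ApiClient;
import io.lakefs.clients.api.ApiException;
import io.lakefs.clients.api.BranchesApi;
import io.lakefs.clients.api.CommitsApi;
import io.lakefs.clients.api.ObjectsApi;
import io.lakefs.clients.api.model.BranchCreation;
import io.lakefs.clients.api.model.CommitCreation;
import java.io.File;

public class PerUserBranchFlow {
    // Sketch: each user works on their own branch, created from main, and commits only there.
    // Assumed (please verify): BranchesApi.createBranch(repo, BranchCreation) and the
    // synchronous ObjectsApi.uploadObject(repo, branch, path, storageClass, ifNoneMatch, file).
    public static void uploadAndCommit(ApiClient client, String repo, String userBranch,
                                       String path, File file) throws ApiException {
        // create the user's branch from main (error handling for an existing branch is omitted)
        BranchCreation branchCreation = new BranchCreation();
        branchCreation.setName(userBranch);
        branchCreation.setSource("main");
        new BranchesApi(client).createBranch(repo, branchCreation);

        // upload the ~45MB file to the user's branch; uploads are non-blocking,
        // so many users can do this concurrently
        new ObjectsApi(client).uploadObject(repo, userBranch, path, "", "*", file);

        // commit on the user's branch only; this locks just that branch, not main
        CommitCreation commitCreation = new CommitCreation();
        commitCreation.setMessage("upload " + path);
        new CommitsApi(client).commit(repo, userBranch, commitCreation, null);
    }
}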
r
Okay. These are the steps:
• Create a branch for each user from main
• Each user uploads 1 file (~45MB) in his own branch only. For example, say the branch name is user1: the user uploads the file and commits it in the user1 branch.
So as soon as the upload finishes, I call the commit API.
t
To clarify, are these steps what you plan to do? :)
r
I was thinking, if a user is uploading a file in his branch (user1), to directly commit it in his branch.
t
Yes, exactly
When they want to merge back, main will be locked.
r
But I'm not merging it into main. So basically, when a user uploads a file it will create one Uncommitted Change, right? So I'm just committing this change in the user's branch, not in the main branch.
t
Sorry I was not clear :) If your users don't want to share the output of their work with others - they should commit to their user branches as you mentioned and never merge their changes back to main. In case they later want to share their changes, they will need to merge back.
Let me know if that makes sense😊
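And if/when they do want to share later, the merge back would look roughly like this. Heads up: mergeIntoBranch and the Merge model are my assumptions about the generated RefsApi, so please verify the exact signature in your client version:
import io.lakefs.clients.api.ApiClient;
import io.lakefs.clients.api.ApiException;
import io.lakefs.clients.api.RefsApi;
import io.lakefs.clients.api.model.Merge;

public class MergeBack {
    // Sketch: merge a user's branch into main only when they want to share their work.
    // Merge locks the destination branch, but it is usually far less frequent than commits.
    // The RefsApi.mergeIntoBranch(repo, sourceRef, destinationBranch, Merge) signature is
    // an assumption - check it against your generated client.
    public static void mergeUserBranchToMain(ApiClient client, String repo, String userBranch)
            throws ApiException {
        Merge merge = new Merge();
        merge.setMessage("merge " + userBranch + " into main");
        new RefsApi(client).mergeIntoBranch(repo, userBranch, "main", merge);
    }
}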
r
Yeah right, so if the user thinks all is well, then the user will merge his branch to main.
t
Btw, this blog post explains this use case with examples. You may want to check it out!
r
Also, the commit is working if I do uploadObjectAsync (in onSuccess I am hitting the commit API).
t
Do you mean that the flow (1. upload to the main branch, 2. commit to the main branch) done by 20 concurrent users is successful?
r
Not just 20; it works for 50 concurrent users 🙂
t
@Raman Kharche Thinking about this more - when you used uploadObjectAsync, did you end up committing objects successfully or did you create empty commits? Did you somehow wait until the upload completed?
r
Yeah, I waited till the file got uploaded. Something like this:
objectsApi.uploadObjectAsync(repoName, branchName, path, "", "*", content, new ApiCallback<ObjectStats>() {
    @Override
    public void onFailure(ApiException e, int statusCode, Map<String, List<String>> responseHeaders) {
        // the upload failed - log the exception and status code here
    }

    @SneakyThrows
    @Override
    public void onSuccess(ObjectStats result, int statusCode, Map<String, List<String>> responseHeaders) {
        CommitsApi commitsApi = new CommitsApi(lakeFsClient.createLakeFsClient());
        CommitCreation commitCreation = new CommitCreation();
        commitCreation.setMessage("upload: " + fileUploadRequest.getCommitMessage());
        commitCreation.setDate(System.currentTimeMillis() / 1000); // the commit date is a Unix epoch in seconds, not milliseconds
        commitCreation.setMetadata(new HashMap<>());
        commitsApi.commit(repoName, branchName, commitCreation, null);
    }

    @Override
    public void onUploadProgress(long bytesWritten, long contentLength, boolean done) {

    }

    @Override
    public void onDownloadProgress(long bytesRead, long contentLength, boolean done) {

    }
});
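One thing I'm considering (just a sketch, not my exact code) is blocking the JMeter sampler thread until the async upload + commit finishes, using a CountDownLatch. The wiring and the timeout below are illustrative:
// Sketch only: block the calling (JMeter sampler) thread until the async upload + commit completes.
// Needs: import java.util.concurrent.CountDownLatch; import java.util.concurrent.TimeUnit;
CountDownLatch completed = new CountDownLatch(1);
objectsApi.uploadObjectAsync(repoName, branchName, path, "", "*", content, new ApiCallback<ObjectStats>() {
    @Override
    public void onFailure(ApiException e, int statusCode, Map<String, List<String>> responseHeaders) {
        // log the upload failure
        completed.countDown(); // release the sampler thread even on failure
    }

    @Override
    public void onSuccess(ObjectStats result, int statusCode, Map<String, List<String>> responseHeaders) {
        try {
            CommitsApi commitsApi = new CommitsApi(lakeFsClient.createLakeFsClient());
            CommitCreation commitCreation = new CommitCreation();
            commitCreation.setMessage("upload: " + fileUploadRequest.getCommitMessage());
            commitsApi.commit(repoName, branchName, commitCreation, null);
        } catch (ApiException ex) {
            // log the commit failure
        } finally {
            completed.countDown();
        }
    }

    @Override
    public void onUploadProgress(long bytesWritten, long contentLength, boolean done) {
    }

    @Override
    public void onDownloadProgress(long bytesRead, long contentLength, boolean done) {
    }
});
try {
    completed.await(5, TimeUnit.MINUTES); // cap the wait; tune for the ~45MB uploads
} catch (InterruptedException ie) {
    Thread.currentThread().interrupt();
}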
t
Got it, thanks! And when you did this, did each user commit to their own branch or to a shared one?
r
To their own branch. Not in the main branch.
t
That makes more sense now 🙂 thanks again