# help
g
Playing around with the lakeFS API using the Java client, I started by running the sample on Getting Started, but I'm having some issues:
Expected URL scheme 'http' or 'https' but no scheme was found for /api/v...
this is my code so far:
val defaultClient = Configuration.getDefaultApiClient()
defaultClient.setBasePath("/api/v1")

// Configure HTTP basic authorization: basic_auth
val basic_auth = defaultClient.getAuthentication("basic_auth") as HttpBasicAuth
basic_auth.username = "AKIAIOSFOLQUICKSTART"
basic_auth.password = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

val apiInstance = ActionsApi(defaultClient)
val repository = "quickstart"
val runId = "runId_example"
try {
    val result = apiInstance.getRun(repository, runId)
        .execute()
    println(result)
} catch (e: ApiException) {
    System.err.println("Exception when calling ActionsApi#getRun")
    System.err.println("Status code: " + e.code)
    System.err.println("Reason: " + e.responseBody)
    System.err.println("Response headers: " + e.responseHeaders)
    e.printStackTrace()
}
I have no idea what `runId` is supposed to be. The repo `"quickstart"` is the one from the lakeFS `--quickstart` mode. I also don't know where I should put the URL to connect to; Barak mentioned "as part of each API constructor", but I couldn't figure out what he meant specifically.
a
I think this connects to the right path but did not specify the host:
defaultClient.setBasePath("/api/v1")
Could you try
defaultClient.setBasePath("http://localhost:8000/api/v1")
?
You can find some examples of client creation in the various Java high-level clients in the repo, for instance under `clients/hadoopfs`.
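Putting the two fixes together, a minimal client setup might look like the sketch below. This is my reading of the thread, not an official recipe: the base path and credentials are the quickstart defaults quoted above, and you would swap in your own endpoint and key pair outside quickstart mode.

```kotlin
import io.lakefs.clients.sdk.Configuration
import io.lakefs.clients.sdk.auth.HttpBasicAuth

// The base path must include scheme, host, port AND the /api/v1 prefix.
val client = Configuration.getDefaultApiClient().apply {
    setBasePath("http://localhost:8000/api/v1")
}

// Quickstart credentials from the thread; replace with your own in production.
(client.getAuthentication("basic_auth") as HttpBasicAuth).apply {
    username = "AKIAIOSFOLQUICKSTART"
    password = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
}
```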
g
with http://localhost:8000/api/v1 I get this error instead
Exception when calling ActionsApi#getRun
Status code: 404
Reason: {"message":"not found: not found"}
Response headers: {content-length=[35], content-type=[application/json], date=[Wed, 14 Feb 2024 12:53:20 GMT], x-content-type-options=[nosniff], x-request-id=[79c62be2-5501-484d-b901-c4d0e9d4b690]}
io.lakefs.clients.sdk.ApiException: Message: Not Found
HTTP response code: 404
HTTP response body: {"message":"not found: not found"}
HTTP response headers: {content-length=[35], content-type=[application/json], date=[Wed, 14 Feb 2024 12:53:20 GMT], x-content-type-options=[nosniff], x-request-id=[79c62be2-5501-484d-b901-c4d0e9d4b690]}
at io.lakefs.clients.sdk.ApiClient.handleResponse(ApiClient.java:1138)
at io.lakefs.clients.sdk.ApiClient.execute(ApiClient.java:1051)
at io.lakefs.clients.sdk.ActionsApi.getRunWithHttpInfo(ActionsApi.java:145)
at io.lakefs.clients.sdk.ActionsApi.access$100(ActionsApi.java:42)
at io.lakefs.clients.sdk.ActionsApi$APIgetRunRequest.execute(ActionsApi.java:199)
at com.example.ApplicationKt$configureRouting$1$1.invokeSuspend(Application.kt:77)
at com.example.ApplicationKt$configureRouting$1$1.invoke(Application.kt)
a
Hi Giuseppe, I'd start by reading the OpenAPI spec to understand what 404 means. I'd bet here that the run id does not exist.
g
what is the `runId`?
a
It will probably be easier to follow the API after reading about actions and hooks. Run IDs are opaque; you can get them from lakeFS by listing runs, via the API, lakectl, or the web UI.
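As a sketch of what that listing might look like through the generated SDK (assuming a `defaultClient` configured as earlier in the thread, and a repository that actually has action runs; `listRepositoryRuns` is the operation name in the OpenAPI spec, and the exact field names on the result are my assumption from the generated model classes):

```kotlin
val actions = ActionsApi(defaultClient)
// List recent action runs for the repository; each entry carries a run id
// that can then be passed to getRun.
val runs = actions.listRepositoryRuns("quickstart").execute()
runs.results.forEach { run ->
    println("${run.runId}  ${run.eventType}  ${run.status}")
}
```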
g
Actually, I'm not even sure I need to execute `apiInstance::getRun`, I was just eager to get anything running. I'd be more than happy if I could somehow get this on the Java client:

for change in client.diff(repo, from_ref, target_branch, prefix=prefix):

`ApiClient` seems to have no `diff` or similar method available
a
1. You're definitely talking to your lakeFS. 2. The best guide for exploring our OpenAPI is the lakeFS API Reference. It is generated from swagger.yml, which is the definitive reference. You can search either one for "diff" to find diffBranch (difference between a branch and its staging area, aka uncommitted changes) and diffRefs (difference between two lakeFS refs). HTH.
g
sorry, but I still fail to grasp this. In the lakeFS API Reference you linked I found, for example, this:

GET /repositories/{repository}/branches/{branch}/diff (diff branch)

Do you mean I should manually create an HTTP GET to get the info I'd like to have?
a
That's the OpenAPI visualizer, which indeed primarily shows the REST API. Personally I usually prefer just to read the spec and use the generated SDK. If you go to the spec, that call is called diffBranch, and the generated JVM client uses that name. We committed as part of 1.0 to supply the REST API as an OpenAPI spec, and we additionally generate and publish clients for Python and the JVM.
You can do that, or you can use the generated API call; it performs the same HTTP request.
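A hedged sketch of what those two operations look like through the generated JVM client (assuming a configured `defaultClient`; in the SDK, `diffBranch` lives on `BranchesApi` and `diffRefs` on `RefsApi`, matching the operation names in the spec, and the ref names here are illustrative):

```kotlin
// Uncommitted changes: a branch against its own staging area.
val branchDiff = BranchesApi(defaultClient)
    .diffBranch("quickstart", "main")
    .execute()

// Committed changes between two refs, e.g. a feature branch vs. main.
val refsDiff = RefsApi(defaultClient)
    .diffRefs("quickstart", "main", "my-feature-branch")
    .execute()

branchDiff.results.forEach { println("${it.type}: ${it.path}") }
```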
g
at the end I could see the staged files with

val api = BranchesApi(defaultClient)
val diffList = api.diffBranch(schema.repositoryId, schema.branchId).execute()

where `schema` is the parsed class from the received text (I'm posting for others looking at the same problem)
However, now I'm looking for the simplest way to read a few-kB YAML file, nothing huge, without any multi-threading and/or atomicity concerns. I'm looking over and over at the hadoopfs examples, but I can't figure it out. Could you help me by giving me some hints, Ariel?
I see there is a `LakeFSFileSystem` class, which looks interesting; its comment says:
> * A dummy implementation of the core lakeFS Filesystem.
> * This class implements a {@link LakeFSFileSystem} that can be registered to
> * Spark and support limited write and read actions.
but I can't see anything for reading
a
Sure. There are actually no issues with multithreading or atomicity. For reading and writing in Python we recommend the higher-level SDK; it makes things easier. lakeFSFS is a great solution for Hadoop-based work, for instance in Spark. I would not recommend introducing Hadoop anywhere you don't need it.
g
yes, I saw that, but I'm looking for reading in the jvm world
a
We currently have nothing similar for the Java world. Of course, with significant demand this may change.
You will need to getObject and read the returned stream.
g
looks like it's even simpler than that

val objectsApi = ObjectsApi(defaultClient)
val objectStatsList = objectsApi.listObjects("quickstart", "main").execute()
println(objectStatsList)
val readme = objectStatsList.results.first { it.path == "README.md" }
val readme2 = objectsApi.getObject("quickstart", "main", "README.md").execute()
println(Path(readme.physicalAddress.substringAfter("local://")).readText())
println(readme2.readText())
is it that simple because I'm in `quickstart` mode? Would it work the same in production with MinIO behind?
e
It has nothing to do with the quickstart. It should work the same way regardless of the repository and/or block store type. So, yes, it should work the same way with MinIO.
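One caveat worth flagging (my reading, not stated in the thread): of the two reads in the snippet above, only the `getObject` call goes through the lakeFS server and is block-store-agnostic. Reading `physicalAddress` as a local filesystem path only works because quickstart uses `local://` storage; behind MinIO the physical address would be an `s3://` URI. The portable variant is just:

```kotlin
// Portable read: content is streamed through the lakeFS API, regardless of
// the underlying block store (local, MinIO, S3, ...).
// In the generated Java SDK, getObject returns a java.io.File.
val file = ObjectsApi(defaultClient)
    .getObject("quickstart", "main", "README.md")
    .execute()
println(file.readText())
```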
g
I see `InternalApi(defaultClient).lakeFSVersion` is marked as deprecated; what shall it be substituted with? Everything I can find still points to `InternalApi`
a
It is an internal API, used by the GUI. Typical user programs should have little use for this API. Even the format of the reply strings might not be what you expect. "Internal" means that it is not subject to the lakeFS compatibility guarantees. You may of course call whatever API you want, just please be mindful of the guarantees that we can provide.
@Giuseppe Barbieri, I am not sure what exactly we are trying to achieve here. I might have an easier time if you could explain the end goal. We might be able to provide (much) more helpful advice than by answering you one bit at a time in such an unstructured manner.
g
I'm working with the client java api for our internal researchers and on the side, I'm trying to develop some sort of high level wrapper, counterpart of the python one already existing
imho, then `@Deprecated` is the wrong annotation; you should probably create a specific one called, for example, `@Internal`
a
The source of truth is the OpenAPI spec that I have linked. Cross-language specs cannot assume the same annotations across all languages, and indeed even in a single language or even platform such as the JVM there is no point in adding an annotation not supported by the toolchain.
> I'm working with the client java api for our internal researchers and on the side, I'm trying to develop some sort of high level wrapper, counterpart of the python one already existing
If so, I would recommend sticking to supported APIs.