
Adi Polak

09/09/2022, 12:31 PM
Heya, I am looking to better understand the lakeFS data model with DynamoDB. When we think about consistency and atomic operations, where does the locking mechanism take place? And how do we ensure operations stay atomic when the lakeFS server needs to scale?

Itai Admi

09/09/2022, 2:27 PM
The lakeFS data model has not changed with the transition to the `kvstore`. The simple `kvstore` interface allows many different databases to implement it; currently we have DynamoDB and Postgres. We use the `SetIf` functionality to ensure that an operation that started on a given entity (e.g. a branch) will only succeed if the entity didn't change since it was read (e.g. multiple operations trying to change the branch HEAD). Other than that, we must configure the DB in a way that ensures read-after-write consistency; luckily, DynamoDB has that option.

Adi Polak

09/11/2022, 4:00 AM
Thanks, @Itai Admi. So as far as I understand, the lock is managed by the key-value store, and since DynamoDB supports strong read-after-write consistency, this by design makes lakeFS branch operations atomic. Would it be possible to leverage S3 as a KV store as well? In 2020 S3 introduced strong read-after-write consistency, and by design it is a key-value store. Or could S3 naming collisions introduce problems with key names in the future?

Itai Admi

09/11/2022, 7:55 AM
The `SetIf` functionality is a fancy name for `compare-and-swap`. There’s no easy way to achieve this with S3, so the answer is unfortunately not at this moment.

Adi Polak

09/12/2022, 5:04 AM
So basically you pass the predicate (the compare), and if the predicate holds, then swap? So the DynamoDB API itself is PutItemWithContext?

Itai Admi

09/12/2022, 7:20 AM
The DynamoDB API call is PutItem (`PutItemWithContext` is the Go SDK API). We use ConditionExpression to compare the blobs. Code reference.

Adi Polak

09/12/2022, 1:03 PM
Thanks!!