Hi Everyone, I have a question about the database storage sizing guide (lakeFS #dev)


Niro

08/08/2022, 2:13 PM

Hi everyone, I have a question about the database storage sizing guide in the lakeFS documentation. According to the documentation, the storage requirement is about 150 MiB per 100,000 uncommitted writes, which works out to roughly 1500 bytes per write. Looking at the code, I see that lakeFS writes the following `Entry` struct per write:

```go
ent := &Entry{
	Address:      entry.PhysicalAddress,
	AddressType:  addressTypeToProto(entry.AddressType),
	Metadata:     entry.Metadata,
	LastModified: timestamppb.New(entry.CreationDate),
	ETag:         entry.Checksum,
	Size:         entry.Size,
	ContentType:  ContentTypeOrDefault(entry.ContentType),
}
```

Doing a rough calculation that takes the field limits into account:

- Address: according to AWS guidelines, does not exceed 1024 bytes
- AddressType: int32, 4 bytes
- Metadata: AWS limits user-defined metadata to 2 KB, 2048 bytes
- LastModified: int64, 8 bytes
- ETag: AWS limitation, 1024 bytes
- Size: int64, 8 bytes
- ContentType: let's use the worst-case scenario, 1024 bytes

Summing this up gives over 5000 bytes, which is far from the given estimate, and that is without taking into account other data that is saved, such as the entry key and checksum. Am I missing something here? (Keep in mind that these are general approximations; I'm not trying to do exact math here, just to get a sense of the size.)
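The worst-case arithmetic above can be sketched in Go as a quick sanity check. This is only an illustration under stated assumptions: the per-field bounds are the ones from the list, while protobuf encoding overhead, the entry key, and the checksum are ignored. The names `fieldBound`, `worstCase`, `sumBytes`, and `budgetPerWrite` are hypothetical helpers, not part of lakeFS.

```go
package main

import "fmt"

// fieldBound pairs an Entry field name with an assumed size in bytes.
type fieldBound struct {
	Field string
	Bytes int
}

// worstCase holds the per-field upper bounds from the list above
// (AWS limits plus a pessimistic ContentType). Protobuf encoding
// overhead, the entry key and the checksum are deliberately ignored.
var worstCase = []fieldBound{
	{"Address", 1024},
	{"AddressType", 4},
	{"Metadata", 2048},
	{"LastModified", 8},
	{"ETag", 1024},
	{"Size", 8},
	{"ContentType", 1024},
}

// sumBytes adds up the byte estimates of all fields.
func sumBytes(fields []fieldBound) int {
	total := 0
	for _, f := range fields {
		total += f.Bytes
	}
	return total
}

// budgetPerWrite converts the sizing guide's 150 MiB per 100,000
// uncommitted writes into bytes per write.
func budgetPerWrite() float64 {
	const mib = 1 << 20
	return 150 * mib / 100_000.0
}

func main() {
	total := sumBytes(worstCase)
	budget := budgetPerWrite()
	fmt.Printf("worst-case entry: %d bytes\n", total)
	fmt.Printf("guide budget per write: %.0f bytes\n", budget)
	fmt.Printf("ratio: %.1fx\n", float64(total)/budget)
}
```

Running it prints a worst case of 5140 bytes against a budget of about 1573 bytes per write, i.e. roughly 3.3 times the guide's figure.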

Ariel Shaqed (Scolnicov)

08/08/2022, 3:47 PM

I think you may be summing upper bounds, which is good for estimating the worst case but not so good for estimating the average case. Address is typically much less than 256 bytes (you'd need a 224-byte storage namespace prefix to get there), metadata barely exists, the ETag is 32 bytes or so, and the content type typically fits into 32 bytes. (For Graveler SSTables, of course, column compression will really smash these numbers down!) If users want to DoS themselves, I imagine they could set a 1000-byte storage namespace prefix, invent content types, and use huge metadata. So you're right that 1500 bytes is very little for those users, but I'm not sure we would consider that "typical". Do you think we should clarify that this is an expected number rather than a hard limit?
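For comparison, here is a rough Go sketch of the typical case, plugging in the sizes from the reply above. The 128-byte Address is an assumed value (the reply only says it is typically much less than 256 bytes), Metadata is assumed to be empty, and the fixed-width fields keep the sizes from the earlier list. As before, encoding overhead, the entry key, and the checksum are ignored, and `fieldEstimate`, `typicalCase`, and `sumBytes` are hypothetical names.

```go
package main

import "fmt"

// fieldEstimate pairs an Entry field name with an assumed size in bytes.
type fieldEstimate struct {
	Field string
	Bytes int
}

// typicalCase uses the rough "typical" sizes from the reply.
// Address (128 bytes) is a hypothetical value well under 256;
// Metadata is assumed empty.
var typicalCase = []fieldEstimate{
	{"Address", 128},
	{"AddressType", 4},
	{"Metadata", 0},
	{"LastModified", 8},
	{"ETag", 32},
	{"Size", 8},
	{"ContentType", 32},
}

// sumBytes adds up the byte estimates of all fields.
func sumBytes(fields []fieldEstimate) int {
	total := 0
	for _, f := range fields {
		total += f.Bytes
	}
	return total
}

func main() {
	// Sizing-guide budget: 150 MiB per 100,000 uncommitted writes.
	const budget = 150 * (1 << 20) / 100_000.0
	total := sumBytes(typicalCase)
	fmt.Printf("typical entry: ~%d bytes\n", total)
	fmt.Printf("headroom vs. guide: ~%.0f bytes\n", budget-float64(total))
}
```

Under these assumptions a typical entry comes to about 212 bytes, leaving plenty of room within the roughly 1573-byte per-write budget for the key, checksum, and encoding overhead.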

Niro

08/08/2022, 4:01 PM

I was evaluating what might change in the sizing guide with the transition to the key-value store, and was trying to understand how the numbers were derived. So I guess that, for the average case, there isn't a significant difference between the implementations.

