Blob Foreign Key #6164
jiaoew1991 started this conversation in Ideas
Replies: 1 comment
Just to complete the picture: there is now also a detailed write-up about blob v2, its history, and how it works:
In Lance storage v2.2, the blob v2 data format supports S3 raw data references, which is a very cool feature: https://lancedb.com/blog/lance-file-format-2-2-taming-complex-data/
At our company, we use Lance storage extensively for video, images, audio, and other large binary data, but working with it has always been a headache. Because the blob data is so large, compute jobs easily run out of memory (OOM), and the data is awkward to query in warehouse scenarios. Sharing the same binary data across multiple tables is also not as simple as copying a number: copying the blobs is slow and wastes storage.
Our current solution is therefore one binary table plus multiple business tables that store the rowid of the binary table. This solves some problems but creates others. First, you still have to access the binary table through the SDK, which is very repetitive. Second, with multiple binary tables the logic becomes more complex. Third, the rowid itself is not stable: compaction, updates, and other operations can change it, so we cannot modify the binary table and are forced to freeze it after making all changes at once.
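To make the third problem concrete, here is a minimal pure-Python sketch of the pattern described above. It does not use the Lance SDK: `BlobTable`, `training_set`, and `review_queue` are hypothetical names, and a row id is modeled as a plain list position, which only mimics the instability and is not how Lance actually encodes row ids.

```python
from dataclasses import dataclass, field


@dataclass
class BlobTable:
    """Toy stand-in for the binary table; a row id is just a list position."""
    rows: list = field(default_factory=list)  # None marks a deleted row

    def append(self, payload: bytes) -> int:
        self.rows.append(payload)
        return len(self.rows) - 1  # position-based row id (assumption)

    def delete(self, rowid: int) -> None:
        self.rows[rowid] = None

    def compact(self) -> None:
        # Dropping deleted rows shifts every later row to a new position.
        self.rows = [r for r in self.rows if r is not None]

    def take(self, rowid: int) -> bytes:
        return self.rows[rowid]


blobs = BlobTable()
cat = blobs.append(b"cat.jpg bytes")
dog = blobs.append(b"dog.mp4 bytes")
bird = blobs.append(b"bird.wav bytes")

# Two business tables share one blob by storing only its row id.
training_set = [{"label": "bird", "blob_rowid": bird}]
review_queue = [{"reviewer": "alice", "blob_rowid": bird}]

before = blobs.take(training_set[0]["blob_rowid"])

# Modify the binary table, then compact it.
blobs.delete(cat)
blobs.compact()

# The stored row id no longer points at the bird blob.
try:
    after = blobs.take(training_set[0]["blob_rowid"])
except IndexError:
    after = None
```

Both business tables break at once, because neither was told that the blob moved. That is why, in this scheme, the binary table has to be frozen once the references are written.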
There may be a better solution. Blob v2 already supports S3 raw data references, so why not also support references to an external rowid? This would be like blobs supporting foreign keys, letting us keep the advantages of the blob API without wasting storage.
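As a rough illustration of what such a reference could look like from the user's side, here is a hypothetical pure-Python sketch. None of it is Lance API: `BlobRef`, `BlobResolver`, the example URI, and the use of a stable string key in place of a rowid are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class BlobRef:
    """Hypothetical foreign-key style reference: which table, which stable key."""
    table_uri: str
    key: str


class BlobResolver:
    """Hypothetical resolver: maps a table URI to a function that loads bytes by key."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[str], bytes]] = {}

    def register(self, table_uri: str, loader: Callable[[str], bytes]) -> None:
        self._loaders[table_uri] = loader

    def read(self, ref: BlobRef) -> bytes:
        # Bytes are fetched only when asked for, so business rows stay small.
        return self._loaders[ref.table_uri](ref.key)


# In-memory stand-in for a binary table keyed by a stable identifier.
media_store = {"clip-001": b"video bytes", "img-042": b"image bytes"}

resolver = BlobResolver()
resolver.register("s3://example-bucket/media.lance", media_store.__getitem__)

# Two business tables hold the same small reference instead of a copy of the blob.
ref = BlobRef("s3://example-bucket/media.lance", "clip-001")
orders = [{"order_id": 1, "preview": ref}]
catalog = [{"sku": "A-7", "preview": ref}]

payload = resolver.read(orders[0]["preview"])
```

The point of the design is that business tables carry only a small, copyable reference, and the lookup into the binary table happens behind one read call instead of being repeated in every consumer.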