[README]:consistencyModel #346
Hi @vaibhav5140, I left some comments.
## Consistency Model
Analytics Accelerator Library for Amazon S3 implements a time-based consistency model using TTL (Time-To-Live) for metadata caching. This ensures bounded staleness while maintaining performance. The library caches object metadata with a configurable TTL period. When metadata TTL expires, the library performs a conditional HEAD request to S3 to verify if the object has changed. If the object is unchanged, the TTL period is reset. If the object has changed, the cache is updated with new metadata.
We can drop "conditional" from the sentence.
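For reference while reviewing the paragraph above, here is a minimal sketch of the TTL behaviour it describes: serve cached metadata until the TTL expires, then issue a HEAD, reset the TTL if the object is unchanged, and replace the entry if it changed. The names `TtlMetadataCache`, `Metadata`, and the `head` callback are hypothetical and are not the library's API; change detection by ETag comparison is also an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Metadata:
    etag: str
    size: int


@dataclass
class CacheEntry:
    metadata: Metadata
    expires_at: float


class TtlMetadataCache:
    """Hypothetical TTL cache; `head` stands in for an S3 HEAD request."""

    def __init__(self, head: Callable[[str], Metadata], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._head = head
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, CacheEntry] = {}
        self.head_calls = 0  # counts simulated HEAD requests

    def get(self, key: str) -> Metadata:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry.expires_at:
            return entry.metadata  # still within the TTL window
        fresh = self._head(key)
        self.head_calls += 1
        if entry is not None and entry.metadata.etag == fresh.etag:
            entry.expires_at = now + self._ttl  # unchanged: reset the TTL
            return entry.metadata
        # First access, or the object changed: store the new metadata.
        self._entries[key] = CacheEntry(fresh, now + self._ttl)
        return fresh
```

With a 60-second TTL in this sketch, a modification made at t=10 stays invisible to `get` until the first call at or after t=60, which is the bounded staleness the paragraph refers to.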
Within-Stream Consistency: Each stream maintains strict consistency throughout its lifetime by locking to a specific object version. If cached blocks are evicted and the object has changed in S3, subsequent reads will fail with 412 Precondition Failed errors rather than serving inconsistent data.
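A small sketch may help make the within-stream guarantee concrete. It assumes the stream pins the object's ETag when it is opened and sends that ETag as an `If-Match`-style precondition on every range read; `FakeObjectStore`, `PinnedStream`, and `PreconditionFailed` are hypothetical stand-ins, not the library's classes.

```python
from typing import Dict, Tuple


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 Precondition Failed response."""
    status = 412


class FakeObjectStore:
    """In-memory stand-in for S3 that enforces an If-Match precondition."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[str, bytes]] = {}

    def put(self, key: str, etag: str, data: bytes) -> None:
        self._objects[key] = (etag, data)

    def head(self, key: str) -> str:
        return self._objects[key][0]

    def get_range(self, key: str, start: int, end: int, if_match: str) -> bytes:
        etag, data = self._objects[key]
        if etag != if_match:
            raise PreconditionFailed(f"{key}: expected {if_match}, found {etag}")
        return data[start:end]


class PinnedStream:
    """Hypothetical stream that locks to the version seen at open time."""

    def __init__(self, store: FakeObjectStore, key: str) -> None:
        self._store = store
        self._key = key
        self._etag = store.head(key)  # pin the version once
        self._blocks: Dict[Tuple[int, int], bytes] = {}

    def read(self, start: int, end: int) -> bytes:
        block = self._blocks.get((start, end))
        if block is None:  # not cached: fetch with the pinned ETag
            block = self._store.get_range(self._key, start, end, self._etag)
            self._blocks[(start, end)] = block
        return block

    def evict_blocks(self) -> None:
        self._blocks.clear()
```

In this sketch, a stream keeps serving cached blocks after the object is overwritten, but once those blocks are evicted the next read raises `PreconditionFailed` instead of mixing bytes from two versions.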
Cross-Stream Consistency: Streams created within the same TTL window see the same object version. When an S3 object is modified, existing streams continue using their original version, while new streams will see the updated version after TTL expiration. This can result in different streams temporarily accessing different versions of the same object, but the maximum staleness is bounded by the configured TTL duration.
I think you can drop the sentence "This can result .....". It might confuse the reader.
The metadata cache may evict entries when reaching its size limit (5000 entries by default), triggering fresh metadata fetching on the next access without affecting existing streams' consistency.
We might want to add a sentence describing the reasoning behind also having a size limit. Something like: "To avoid an ever-growing metadata cache when the TTL is set to a very large number, we also use a threshold to limit the number of metadata entries kept in memory."
Next, you should also say what the eviction strategy is when this threshold is reached.
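To illustrate the kind of statement the README could make here, below is a sketch of a capacity-bounded metadata store. The PR text does not say which eviction strategy the library uses, so least-recently-used (LRU) eviction is purely an assumption for illustration; `BoundedMetadataStore` is a hypothetical name, and only the default of 5000 comes from the diff.

```python
from collections import OrderedDict
from typing import Any, Optional


class BoundedMetadataStore:
    """Hypothetical capacity-bounded store using LRU eviction (an assumption)."""

    def __init__(self, capacity: int = 5000) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._entries: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._entries:
            return None  # caller would fetch fresh metadata from S3
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key: str, metadata: Any) -> None:
        self._entries[key] = metadata
        self._entries.move_to_end(key)
        while len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # drop least recently used

    def __len__(self) -> int:
        return len(self._entries)
```

Whatever strategy the library actually uses, the point of the cap is the same as in the sketch: memory stays bounded even when the TTL is very large, and an evicted key simply costs one fresh metadata fetch on its next access.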
* `metadatastore.capacity` - Maximum size for metadata cache entries, default is 5000
TTL = 0 (Strong Consistency):
Setting `metadata.ttl.default=0` provides the strongest consistency guarantees. Every metadata access triggers a fresh HEAD request to S3. This configuration results in higher latency due to increased S3 requests but is recommended when absolute freshness is required.
I think saying it provides Strong Consistency is enough. Since Strong Consistency is a technical term, I am not sure we can say "strongest" here.
High TTL Values (Relaxed Consistency):
You mean "eventual consistency". We are not talking about relaxed consistency models here.
Setting a high TTL value (e.g., `metadata.ttl.default` set to 1 hour) allows metadata to be cached for longer. This configuration offers better performance due to reduced S3 requests and is suitable for static data or when staleness can be tolerated.
Should this say "reduced HEAD requests"?
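To put rough numbers on the TTL trade-off discussed in this thread, the sketch below counts how many HEAD requests one object would trigger under different TTLs. It is a simplified model, not the library's logic: `count_head_requests` is a hypothetical helper, the TTL is expressed in seconds only for the example (the unit of `metadata.ttl.default` is not stated in this PR), and an entry is treated as expired once its age reaches the TTL.

```python
from typing import Iterable, Optional


def count_head_requests(access_times: Iterable[float], ttl_seconds: float) -> int:
    """Count metadata fetches for one object given sorted access timestamps."""
    heads = 0
    fetched_at: Optional[float] = None
    for t in access_times:
        # Fetch on first access, or when the cached entry's age reaches the TTL.
        if fetched_at is None or t - fetched_at >= ttl_seconds:
            heads += 1
            fetched_at = t
    return heads


# One metadata access every 10 seconds for an hour: 360 accesses in total.
accesses = list(range(0, 3600, 10))

print(count_head_requests(accesses, ttl_seconds=0))     # 360: every access fetches
print(count_head_requests(accesses, ttl_seconds=60))    # 60: one fetch per minute
print(count_head_requests(accesses, ttl_seconds=3600))  # 1: a single fetch
```

Under these assumptions, the saving from a high TTL is specifically in HEAD requests, which supports wording the README sentence that way.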
Description of change
[README Update]
Follow-up PR to #338, adding Consistency Model documentation for AAL.
Does this contribution need a changelog entry?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).