vaibhav5140
Contributor

Description of change

[README Update]
Follow-up PR to #338 to add Consistency Model documentation for AAL

Does this contribution need a changelog entry?

  • I have updated the CHANGELOG or README if appropriate

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

@vaibhav5140 vaibhav5140 temporarily deployed to integration-tests August 12, 2025 15:41 — with GitHub Actions Inactive
Collaborator

@fuatbasik left a comment

Hi @vaibhav5140, I left some comments.


## Consistency Model

Analytics Accelerator Library for Amazon S3 implements a time-based consistency model using TTL (Time-To-Live) for metadata caching. This ensures bounded staleness while maintaining performance. The library caches object metadata with a configurable TTL period. When metadata TTL expires, the library performs a conditional HEAD request to S3 to verify if the object has changed. If the object is unchanged, the TTL period is reset. If the object has changed, the cache is updated with new metadata.
Collaborator

We can drop "conditional" from the sentence.
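
For readers following this thread, the TTL behaviour described in the paragraph under review can be sketched roughly as below. This is an illustrative Python model, not the library's implementation: `TtlMetadataCache`, `head_fn`, and the injected `clock` are hypothetical names, and a bare ETag string stands in for whatever metadata the library actually caches.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Entry:
    etag: str          # cached metadata (hypothetical: just the ETag)
    expires_at: float  # time at which the entry must be revalidated


class TtlMetadataCache:
    """Illustrative TTL cache; names and structure are assumptions."""

    def __init__(self, head_fn: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._head = head_fn      # stands in for a HEAD request to S3
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Entry] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry.expires_at:
            return entry.etag     # still fresh: no request issued
        # Missing or expired: ask the store again. If the object is unchanged,
        # this simply resets the TTL; if it changed, the new value replaces it.
        etag = self._head(key)
        self._entries[key] = Entry(etag, now + self._ttl)
        return etag
```

Note that with `ttl_seconds=0` the freshness check never passes, so every `get` goes back to the store, which matches the TTL = 0 case discussed later in the diff.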


Within-Stream Consistency: Each stream maintains strict consistency throughout its lifetime by locking to a specific object version. If cached blocks are evicted and the object has changed in S3, subsequent reads will fail with 412 Precondition Failed errors rather than serving inconsistent data.

Cross-Stream Consistency: Streams created within the same TTL window see the same object version. When an S3 object is modified, existing streams continue using their original version, while new streams will see the updated version after TTL expiration. This can result in different streams temporarily accessing different versions of the same object, but the maximum staleness is bounded by the configured TTL duration.
Collaborator

I think you can drop the sentence "This can result ...". It might confuse the reader.
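
To make the within-stream and cross-stream wording easier to review, here is a small sketch of version pinning. It is a toy model under stated assumptions: `FakeStore`, `PinnedStream`, and `PreconditionFailed` are hypothetical names, the store is in memory, and the stream reads the ETag directly at open time rather than through a metadata cache.

```python
class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 Precondition Failed response."""


class FakeStore:
    """In-memory stand-in for S3; each write produces a new ETag."""

    def __init__(self):
        self._objects = {}   # key -> (etag, data)
        self._version = 0

    def put(self, key, data: bytes) -> str:
        self._version += 1
        etag = f"etag-{self._version}"
        self._objects[key] = (etag, data)
        return etag

    def head(self, key) -> str:
        return self._objects[key][0]

    def get_range(self, key, start, end, if_match) -> bytes:
        etag, data = self._objects[key]
        if etag != if_match:
            raise PreconditionFailed(f"412: {key} no longer matches {if_match}")
        return data[start:end]


class PinnedStream:
    """Reads a single object version for its whole lifetime."""

    def __init__(self, store, key):
        self._store = store
        self._key = key
        self._etag = store.head(key)   # version is fixed at open time
        self._blocks = {}              # (start, end) -> cached bytes

    def read(self, start, end) -> bytes:
        block = self._blocks.get((start, end))
        if block is None:
            # Cache miss: fetch conditionally so a changed object fails loudly
            # instead of mixing bytes from two versions.
            block = self._store.get_range(self._key, start, end, self._etag)
            self._blocks[(start, end)] = block
        return block

    def evict(self):
        self._blocks.clear()
```

In this model an existing stream keeps serving its cached blocks after an overwrite, fails with `PreconditionFailed` once those blocks are evicted, and a stream opened afterwards reads the new version.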


The metadata cache may evict entries when reaching its size limit (5000 entries by default), triggering fresh metadata fetching on the next access without affecting existing streams' consistency.
Collaborator

We might want to add a sentence describing the reasoning behind having a size limit in addition to the TTL. Something like: "To avoid an ever-growing metadata cache when TTL is set to a very large number, we also use a threshold to limit the number of metadata entries kept in memory." Next, you should also describe the eviction strategy used when this threshold is reached.
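
As a reference point for the eviction question, a capacity-bounded store could look like the sketch below. The PR text does not say which eviction strategy the library uses, so least-recently-used eviction here is purely an assumption for illustration, and `BoundedMetadataStore` is a hypothetical name. Only the default of 5000 entries comes from the diff.

```python
from collections import OrderedDict


class BoundedMetadataStore:
    """Capacity-limited map. LRU eviction is an ASSUMPTION, not the
    library's documented behaviour."""

    def __init__(self, capacity: int = 5000):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._entries = OrderedDict()   # oldest use first, newest use last

    def get(self, key):
        if key not in self._entries:
            return None                 # caller would fetch fresh metadata
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # drop least recently used

    def __len__(self):
        return len(self._entries)
```

Whatever strategy is actually used, the point for the README is the same: an evicted entry only causes a fresh metadata fetch on the next access, and the size limit keeps memory bounded even when the TTL is very large.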

* `metadatastore.capacity` - Maximum size for metadata cache entries, default is 5000

TTL = 0 (Strong Consistency):
Setting `metadata.ttl.default=0` provides the strongest consistency guarantees. Every metadata access triggers a fresh HEAD request to S3. This configuration results in higher latency due to increased S3 requests but is recommended when absolute freshness is required.
Collaborator

I think saying "provides Strong Consistency" is enough. Since Strong Consistency is a technical term, I am not sure we can use "strongest" here.

High TTL Values (Relaxed Consistency):
Collaborator

@fuatbasik Aug 18, 2025

You mean eventual consistency. We are not talking about relaxed consistency models here.

Setting a high TTL value (e.g., `metadata.ttl.default` for 1 hour) allows for longer caching of metadata. This configuration offers better performance due to reduced S3 requests and is suitable for static data or when staleness can be tolerated.
Collaborator

Should this say "reduced HEAD requests"?
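
To put a number on the trade-off between the two settings, the helper below counts how many metadata fetches a single object would need for a given access pattern. It is a back-of-the-envelope model, not library code: `count_head_requests` is a hypothetical name, timestamps and TTL are in arbitrary but matching units (the diff does not state the unit of `metadata.ttl.default`), and cache eviction is ignored.

```python
def count_head_requests(access_times, ttl):
    """Count metadata fetches for one object.

    access_times: ascending timestamps at which metadata is needed.
    ttl: how long a fetched entry stays fresh, in the same unit.
    """
    fetches = 0
    expires_at = None
    for t in access_times:
        if expires_at is None or t >= expires_at:
            fetches += 1            # entry missing or expired: one HEAD
            expires_at = t + ttl
    return fetches


# One access per minute for an hour (60 accesses), times in seconds.
accesses = list(range(0, 3600, 60))
every_time = count_head_requests(accesses, ttl=0)     # one HEAD per access
ten_minutes = count_head_requests(accesses, ttl=600)
one_hour = count_head_requests(accesses, ttl=3600)    # a single HEAD
```

The saving is specifically in HEAD requests; in this model the number of data reads is unaffected by the TTL.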
