Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added changes to enable full file cache stats #17538

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mayanksharma27
Copy link
Contributor

@mayanksharma27 mayanksharma27 commented Mar 6, 2025

Description

  • Added Changes to enable FullFileCacheStats
  • Added integration tests for FileCacheStats and FullFileCacheStats for writable warm.

Related Issues

Resolves #17479.
Meta Issue #13149

Check List

  • [ ✓ ] Functionality includes testing.
  • [ NA] API changes companion pull request created, if applicable.
  • [NA] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

github-actions bot commented Mar 6, 2025

❌ Gradle check result for da0dcc9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@gbbafna gbbafna requested review from rayshrey and shourya035 March 10, 2025 08:21
Copy link
Contributor

@rayshrey rayshrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me from the functionality point of view.
But have a few concerns related to extensibility (specially when we are going to introduce the pinning functionality)

  • We have added FullFileCacheStats to account for the stats of the full file. Will we require a similar file for accounting for stats of blocked files (and pinned files in future) ? In that case should we not have it in a single place - CacheUsage would be a good place for that since we are already maintaning the usage and activeUsage overall there.

  • Also adding to my comment on making changes in the LRUCache instead of FileCache, how about adding changes to the Node in LRUCache to check whether the IndexInput is for block or full file. We already have a weight param for Node - we can add file type, pinned status etc as per our use case to simplify the stats/usage aggregation. We would just need to make changes in the replaceNode and addNode functions to account for our needed stats/usages. Something similar to below.

static class Node<K, V> {
        final K key;
        V value;
        long weight;
        int refCount;
        // Add below properties for Node
        FileType fileType; 
        PinStatus pinned;
}
public class CacheUsage {
        private final long usage;
        private final long activeUsage;
        // Add below properties in CacheUsage
        private final long blockUsage;
        private final long nonBlockUsage;
        private final long pinnedUsage;
}

@gbbafna @ankitkala @mayanksharma27 thoughts on the above approach ?

@mayanksharma27 mayanksharma27 force-pushed the smynk-tiering-FullfileCacheStats branch from da0dcc9 to 38847e8 Compare March 17, 2025 05:00
@github-actions github-actions bot added enhancement Enhancement or improvement to existing feature or request Storage:Remote labels Mar 17, 2025
@mayanksharma27
Copy link
Contributor Author

mayanksharma27 commented Mar 17, 2025

@rayshrey regarding your comment above -

  • We have added FullFileCacheStats to account for the stats of the full file. Will we require a similar file for accounting for stats of blocked files (and pinned files in future) ? In that case should we not have it in a single place - CacheUsage would be a good place for that since we are already maintaning the usage and activeUsage overall there.
  • We wont be requiring a similar file for both of these as current filecachestats would handle it. As part of the recent commit I have -
    - Moved Usage and ActiveUsage as part of stats itself; and removed CacheUsage altogether; this is done to keep FileCacheStats as single source of truth for all metrics related to LRU Cache.
  • Also adding to my comment on making changes in the LRUCache instead of FileCache, how about adding changes to the Node in LRUCache to check whether the IndexInput is for block or full file. We already have a weight param for Node - we can add file type, pinned status etc as per our use case to simplify the stats/usage aggregation. We would just need to make changes in the replaceNode and addNode functions to account for our needed stats/usages. Something similar to below.
  • While adding pinning info in Node does makes sense, adding business logic - like filetype needed just to publish metrics and not needed for cache functionality is not clean. I instead have added this logic as part of CacheStats class.

Copy link
Contributor

❕ Gradle check result for 38847e8: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.http.SearchRestCancellationIT.testAutomaticCancellationMultiSearchDuringQueryPhase

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link

codecov bot commented Mar 17, 2025

Codecov Report

Attention: Patch coverage is 76.27907% with 51 lines in your changes missing coverage. Please review.

Project coverage is 72.44%. Comparing base (e306d51) to head (38847e8).
Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
...dex/store/remote/utils/cache/stats/CacheStats.java 52.17% 21 Missing and 1 partial ⚠️
.../remote/utils/cache/stats/DefaultStatsCounter.java 72.22% 9 Missing and 6 partials ⚠️
...dex/store/remote/filecache/FullFileCacheStats.java 70.96% 9 Missing ⚠️
...h/index/store/remote/filecache/FileCacheStats.java 50.00% 2 Missing and 1 partial ⚠️
...search/index/store/remote/filecache/FileCache.java 83.33% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17538      +/-   ##
============================================
+ Coverage     72.27%   72.44%   +0.16%     
- Complexity    65611    65756     +145     
============================================
  Files          5311     5311              
  Lines        304942   305099     +157     
  Branches      44225    44247      +22     
============================================
+ Hits         220407   221034     +627     
+ Misses        66448    65959     -489     
- Partials      18087    18106      +19     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

}

@Override
public String toString() {
return snapshot().toString();
}

private boolean isFullFile(V value) {
Copy link
Contributor

@skumawat2025 skumawat2025 Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using this type checking every-time we remove, get, and put value to FileCache.
Would it make sense to have this value pre-computed for FullFIleCachedIndexInput?
Maybe introducing a method in CachedIndexInput Thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good point @skumawat2025 . So for every get we would be calling instanceof . We should do a benchmark for this change by reusing FileCacheBenchmark . If we see any perf issues, we will need to solve for it .

/**
* A non thread-safe {@link StatsCounter} implementation.
*
* @opensearch.internal
*/
public class DefaultStatsCounter<K> implements StatsCounter<K> {
public class DefaultStatsCounter<K, V> implements StatsCounter<K, V> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about having a different implementation of StatsCounter in case of FullFileStats? Maybe using a Factory while initialising Cache. Currently FullFileStats won't make sense in case of SearchableSnapshots right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can make sense if we support pinning some files there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same suggestion here as well - have a base stats counter implementation and then have DefaultStatsCounter as below

DefaultStatsCounter {
    StatsCounter overallStatsCounter;
    StatsCounter fullFileStatsCounter;
    StatsCounter blockedFileStatsCounter;
    StatsCounter pinnedFileStatsCounter;
}

* Represents Stats about FullFiles in a {@link RefCountedCache}.
*/
@ExperimentalApi
public class FullFileStats {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be static class?

CacheUsage cacheUsage = theCache.usage();
logger.trace("Total Usage: " + cacheUsage.usage() + " , Active Usage: " + cacheUsage.activeUsage());
long cacheUsage = theCache.usage();
logger.trace("Total Usage: " + cacheUsage);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we log the active usage here as well since it was logged previously as well.

}

@Override
public String toString() {
return snapshot().toString();
}

private boolean isFullFile(V value) {
return value instanceof CachedFullFileIndexInput;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using this instanceof check, I think it will be better to use the Path of the the file to identify it's type(block files will have block as part of their name). This will also ensure that we don't need to change the signature of StatsCounter to StatsCounter<K,V>, thus minimising the changes.

@@ -33,6 +41,7 @@ public class FileCacheStats implements Writeable, ToXContentFragment {
private final long evicted;
private final long hits;
private final long misses;
private final FullFileCacheStats fullFileCacheStats;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach is not extensible, in case we need to add another BlockedFileCacheStats or PinnedFileCacheStats, we would have to create multiple duplicate classes with the same fields and functionality. Let's create a base class similar to below

public class FileCacheStatsBase {
    private long used;
    private long total;
    private long evicted;
    private long hits;
    private long misses;
}

Then we can create overallCacheStats, FullFileCacheStats, BlockedFileCacheStats, PinnedFileCacheStats and have our FileCacheStats as below

public class FileCacheStats {
    private FileCacheStatsBase overallCacheStats;
    private FileCacheStatsBase FullFileCacheStats;
    private FileCacheStatsBase BlockedFileCacheStats;
    private FileCacheStatsBase PinnedFileCacheStats;

    // return the overall stats for currently existing methods such as getHits/getMisses etc
    public long getCacheHits/getCacheMisses/...() {
        overallCacheStats.hits/misses;
    }

    // similarly for all other stats
    public long getFullFileCacheStats() {
        return FullFileCacheStats;
    }

}

* Represents Stats about FullFiles in a {@link RefCountedCache}.
*/
@ExperimentalApi
public class FullFileStats {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar recommendation as provided for FileCacheStats, let's use inheritance to make things extensible here as well (Have a base class with all common fields and then use those classes together in a single class.

/**
* A non thread-safe {@link StatsCounter} implementation.
*
* @opensearch.internal
*/
public class DefaultStatsCounter<K> implements StatsCounter<K> {
public class DefaultStatsCounter<K, V> implements StatsCounter<K, V> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same suggestion here as well - have a base stats counter implementation and then have DefaultStatsCounter as below

DefaultStatsCounter {
    StatsCounter overallStatsCounter;
    StatsCounter fullFileStatsCounter;
    StatsCounter blockedFileStatsCounter;
    StatsCounter pinnedFileStatsCounter;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement Enhancement or improvement to existing feature or request _No response_ Storage:Remote
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

[Writable Warm] More refined stats in FileCache
4 participants