Ingesters stopped triggering tsdb compaction #6672
Replies: 3 comments
-
If you aren't getting any samples, it makes sense that there is nothing to compact. It's been a while since I ran Consul, but if you are scraping Prometheus metrics from it, there is a metric called "consul_raft_leader" that tells you whether the Consul cluster has a leader. If Consul had no leader after the crash, it makes sense that the ingesters were unhealthy. That's one possible explanation of what happened. Consider switching to memberlist; Consul was a pain point for me until I stopped using it. Historically, running Consul as a cluster was never recommended for Cortex, because clustering can run into these issues, where it loses its leader.
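If Prometheus metrics from Consul aren't available, a similar leader check can be made against Consul's HTTP API. The sketch below assumes the agent serves GET /v1/status/leader, which returns the current Raft leader's address as a JSON string and an empty string when the cluster has no leader. The helper names consulHasLeader and fetchLeader, and the example agent address, are hypothetical.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// consulHasLeader interprets a body from GET /v1/status/leader (assumed to be a
// JSON string holding the leader address, or "" when there is no leader).
func consulHasLeader(body []byte) (bool, error) {
	var addr string
	if err := json.Unmarshal(body, &addr); err != nil {
		return false, fmt.Errorf("unexpected leader response %q: %w", body, err)
	}
	return addr != "", nil
}

// fetchLeader queries an agent at baseURL, e.g. "http://127.0.0.1:8500"
// (hypothetical address), and reports whether the cluster has a leader.
func fetchLeader(ctx context.Context, baseURL string) (bool, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/v1/status/leader", nil)
	if err != nil {
		return false, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	return consulHasLeader(body)
}

func main() {
	// Offline demo with canned bodies; fetchLeader needs a reachable agent.
	for _, body := range []string{`"10.0.0.12:8300"`, `""`} {
		ok, err := consulHasLeader([]byte(body))
		fmt.Println(body, "has leader:", ok, "err:", err)
	}
}
```

Polling this from a monitoring job and alerting when it reports no leader would give roughly the same signal as "consul_raft_leader".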
-
Thank you @friedrichg for the reply. I can confirm the ingesters were receiving samples. Yes, I think it is better for us to move to memberlist.
-
Thank you for your reply. After investigating, we found that compaction is triggered either from CompactionLoop (cortex/pkg/ingester/ingester.go, line 2297 at commit 21e8366) or from CompactBlock. We know execution never reached cortex/pkg/ingester/ingester.go line 2346 at 21e8366; unfortunately, it never got to the point where the trigger is incremented. For now we are making sure Consul is stable, and we plan to migrate to memberlist soon. Slack thread:
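To make the "never reached the trigger" observation concrete, here is a hypothetical, heavily simplified Go model of a tick-driven compaction loop in which a state guard returns before a trigger counter is incremented. The names (ingester, RingState, compactBlocks, runTicks) are illustrative and are not Cortex's actual code; the guard condition (skipping compaction while the ingester is JOINING) is an assumption based on our reading of Cortex and may not match commit 21e8366. In this model, if an ingester's ring state stopped updating, for example because the KV store had no leader, every tick would be skipped and nothing would ever be compacted or shipped.

```go
package main

import (
	"context"
	"fmt"
)

// RingState is a hypothetical stand-in for the state an ingester has in the ring.
type RingState string

const (
	Joining RingState = "JOINING"
	Active  RingState = "ACTIVE"
)

// ingester is a hypothetical, simplified model; it is not Cortex's type.
type ingester struct {
	state     func() RingState // ring state as currently observed
	triggered int              // models a "compactions triggered" counter
	skipped   int              // ticks on which compaction was skipped
}

// compactBlocks models a guard-then-count shape: if the guard returns early,
// the line that increments the trigger counter is never reached.
func (i *ingester) compactBlocks() {
	if i.state() == Joining { // assumed guard condition
		i.skipped++
		return
	}
	i.triggered++
	// Head compaction and block upload would happen here.
}

// compactionLoop attempts one compaction per tick until ticks is closed
// or ctx is cancelled.
func (i *ingester) compactionLoop(ctx context.Context, ticks <-chan struct{}) {
	for {
		select {
		case _, ok := <-ticks:
			if !ok {
				return
			}
			i.compactBlocks()
		case <-ctx.Done():
			return
		}
	}
}

// runTicks feeds n ticks to a fresh ingester that always reports state.
func runTicks(state RingState, n int) *ingester {
	ticks := make(chan struct{}, n)
	for k := 0; k < n; k++ {
		ticks <- struct{}{}
	}
	close(ticks)
	ing := &ingester{state: func() RingState { return state }}
	ing.compactionLoop(context.Background(), ticks)
	return ing
}

func main() {
	for _, s := range []RingState{Active, Joining} {
		ing := runTicks(s, 3)
		fmt.Printf("%s: triggered=%d skipped=%d\n", s, ing.triggered, ing.skipped)
	}
}
```

A buffered, closed channel stands in for a time.Ticker so the model runs deterministically.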
-
Describe the bug
Ingesters stopped triggering TSDB compactions, which caused OOM errors and data loss because nothing was pushed to remote storage (Google Cloud Storage).
To Reproduce
Expected behavior
Environment:
Additional Context
Consul server logs