Skip to content

Cleaning nodestore_node table #1808

Open
@aminvakil

Description

@aminvakil

Problem Statement

Over time nodestore_node table gets bigger and bigger and currently there is no procedure to clean it up.

A forum comment explaining what nodestore_node is by @untitaker :

To answer the question about purpose of nodestore:

nodestore stores the raw, unindexed parts of the data of an event. For example all event breadcrumbs and contexts are stored there. Clickhouse only stores the data you can search and aggregate by. It only has indices, so to speak.

nodestore has multiple backends. On sentry.io it is not postgres, but google bigtable. We use the postgres backend for local development and for onpremise, but for large-scale installations it is suboptimal.

We don’t have a solution right now for the storage problem in any case, sorry. One thing you should ensure actually runs is sentry cleanup, but it’s definitely possible that nodestore takes this amount of disk space.

https://forum.sentry.io/t/postgres-nodestore-node-table-124gb/12753/3

Solution Brainstorm

There was an idea suggested on forum which worked for me, but I lost all event details.
Something like this would work:

DELETE FROM public.nodestore_node WHERE “timestamp” < NOW() - INTERVAL '1 day';
VACUUM FULL public.nodestore_node;

Change 1 day according to your needs.

Maybe put this in a cron container which gets run every night, we should think about its performance issues though, this took a long time to get executed on our instance, maybe because it wasn't run before, but I'm not sure.

Metadata

Metadata

Assignees

Projects

Status

No status

Status

No status

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions