Add sizing guideline for ballast files #6754

Open
@florence-crl

Florence Morris (florence-crl) commented:

from @knz
There's a formula that's easier to understand than to explain. The idea is to combine two things:

  1. How fast their data grows over time. To know this, they should use metrics/monitoring and plot their storage growth over days/weeks/months. They also need to understand their storage spikes (e.g. bulk I/O events and the disk space those events need).
  2. How fast they are able to react to a "low storage" condition, e.g. by adding nodes or more disk space. Some businesses can react within 1 day; others need 2 weeks to work on it.
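
As a rough illustration of point 1, here is a minimal Python sketch that derives an average growth rate and the largest spike from a series of disk-usage samples. The function names and the sample data are hypothetical; in practice the samples would come from whatever monitoring is already in place.

```python
from datetime import date


def growth_gb_per_day(samples):
    """Average growth in GB/day between the first and the last sample."""
    (first_day, first_gb), (last_day, last_gb) = samples[0], samples[-1]
    elapsed_days = (last_day - first_day).days
    if elapsed_days <= 0:
        raise ValueError("samples must span at least one day")
    return (last_gb - first_gb) / elapsed_days


def largest_spike_gb(samples):
    """Largest increase in used GB between two consecutive samples."""
    increases = [b[1] - a[1] for a, b in zip(samples, samples[1:])]
    return max([0.0] + increases)


# Hypothetical weekly samples: (day, used disk space in GB).
samples = [
    (date(2024, 1, 1), 100.0),
    (date(2024, 1, 8), 101.0),
    (date(2024, 1, 15), 104.0),  # e.g. a bulk I/O event this week
    (date(2024, 1, 22), 105.0),
    (date(2024, 1, 29), 106.0),
]

print(f"growth: {growth_gb_per_day(samples):.2f} GB/day")
print(f"largest weekly spike: {largest_spike_gb(samples):.1f} GB")
```

A first-to-last slope is the simplest possible estimate; a longer sampling window smooths out one-off events but reacts more slowly to a change in trend.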

Once they know these two things, they need to choose a ballast large enough to cover the disk space their data grows by (1) during their reaction period (2).

Examples:

  • They generate 1GB per week and need 2 weeks of turnaround to grow their disk space: they need a 2GB ballast.
  • They generate only 100MB per week, but they perform a bulk I/O event that needs 2GB every day, and they can only react to a disk shortage within 2 days: they probably need a 2-3GB ballast.
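
The two examples above can be reproduced with a small Python sketch. The function `ballast_gb` and its parameters are hypothetical, and treating the bulk I/O event as a one-time spike added on top of the steady growth is an assumption about how the second example is meant to be read.

```python
def ballast_gb(growth_gb_per_week, reaction_days, spike_gb=0.0):
    """Disk space (GB) consumed during the reaction period.

    Assumption: steady growth is linear, and the largest expected
    spike (e.g. a bulk I/O event) is added once on top of it.
    """
    steady_growth = growth_gb_per_week * reaction_days / 7
    return steady_growth + spike_gb


# Example 1: 1GB per week, 2 weeks (14 days) to react.
print(round(ballast_gb(1.0, 14), 2))               # 2.0

# Example 2: 100MB per week, 2 days to react, 2GB bulk I/O spike.
print(round(ballast_gb(0.1, 2, spike_gb=2.0), 2))  # 2.03
```

Under this reading the second example comes out just above 2GB, which is the lower end of the 2-3GB range quoted above; the upper end presumably leaves some headroom.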

One layer of complexity is that intermediate disk usage can appear larger than the long-term state because of RocksDB compactions. For example, if they create a lot of data quickly, disk usage will be higher than the amount of data they have put in through SQL until RocksDB compacts it.

Another layer is MVCC: if they delete data, the data is still on disk until it is garbage-collected (set in the zone config, default 25 hours). So if their workload is delete-heavy, they need to take that into account.
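
To get a feel for the MVCC effect, here is a back-of-the-envelope Python sketch. It assumes a constant delete rate and that deleted data is reclaimed as soon as the GC period elapses, which is a simplification; the function name is hypothetical.

```python
DEFAULT_GC_TTL_HOURS = 25  # the default quoted above


def gb_awaiting_gc(delete_gb_per_hour, gc_ttl_hours=DEFAULT_GC_TTL_HOURS):
    """Rough steady-state amount of deleted data still held on disk.

    Assumption: data deleted at a constant rate stays on disk for the
    whole GC period and is reclaimed immediately afterwards.
    """
    return delete_gb_per_hour * gc_ttl_hours


# Deleting 0.5GB per hour keeps roughly 12.5GB of "deleted" data on disk.
print(gb_awaiting_gc(0.5))  # 12.5
```

For a delete-heavy workload, this amount is disk space that deletes have not yet freed and that the sizing has to allow for.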

Both things can usually be ignored if their disk usage evolves slowly (which is common) and they monitor it at a high level (e.g. our capacity metric in the UI, or their own export using Prometheus).

from @jseldess
An addition is that we need to strongly recommend that they put alerts in place to notify them of "low storage" conditions so they can set their process in motion. For example, they could use Prometheus metrics to alert when a node is running low on disk space.
Ideally, a customer shouldn’t get to the point where they need to use a ballast file.
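
One way to express such an alert condition is to compare the projected time until the disk is full with the reaction period. The Python sketch below is only an illustration: the function name and thresholds are hypothetical, and the inputs would in practice be read from the monitoring system (e.g. Prometheus) rather than passed in by hand.

```python
def should_alert(available_gb, growth_gb_per_day, reaction_days):
    """True when the disk is projected to fill up within the reaction period."""
    if growth_gb_per_day <= 0:
        return False  # usage is flat or shrinking, nothing to project
    days_until_full = available_gb / growth_gb_per_day
    return days_until_full <= reaction_days


# 10GB free, growing 1GB per day, 14 days needed to react: alert now.
print(should_alert(10.0, 1.0, 14))   # True

# 100GB free at the same growth rate: no alert yet.
print(should_alert(100.0, 1.0, 14))  # False
```

Alerting at (or before) this point means the reaction process starts while there is still free space, so the ballast stays a last resort.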

cc: @Annebirzin @piyush-singh since this ties into observability and alerting

Jira Issue: DOC-453
