5477 speed up distributions export by phoffer · Pull Request #5605 · rubyforgood/human-essentials

phoffer · 2026-06-18T02:49:16Z

This addresses the timeout issue in #5477. The two biggest drivers of processing time are the number of distributions and the number of items an organization has, because the export does work for every (distribution, organization item) pair: distributions × organization.items = total calculations.
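To make that cost model concrete, here is a minimal plain-Ruby sketch of the nested-loop shape described above. It is not the actual export code: `Item`, `LineItem`, `Distribution`, and `naive_rows` are hypothetical stand-ins, and it assumes each per-item lookup scans the distribution's line items with `select`. Under that assumption the work is distributions × organization items × line items per distribution.

```ruby
# Hypothetical stand-ins for the ActiveRecord models; only the fields needed here.
Item = Struct.new(:id, :name)
LineItem = Struct.new(:item_id, :quantity)
Distribution = Struct.new(:id, :line_items)

# Naive shape: for every distribution, scan its line items once per organization item.
# Returns [rows, comparisons] so the amount of work can be inspected.
def naive_rows(distributions, items)
  comparisons = 0
  rows = distributions.map do |distribution|
    items.map do |item|
      matches = distribution.line_items.select do |li|
        comparisons += 1
        li.item_id == item.id
      end
      matches.sum(&:quantity)
    end
  end
  [rows, comparisons]
end

items = (1..50).map { |i| Item.new(i, "Item #{i}") }
distributions = (1..100).map do |d|
  Distribution.new(d, [LineItem.new(1, d), LineItem.new(2, 2 * d)])
end

rows, comparisons = naive_rows(distributions, items)
puts comparisons # 100 distributions * 50 items * 2 line items = 10000
```

Doubling either the distribution count or the item count doubles `comparisons`, which matches the linear scaling reported in the findings.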

Overall, I think this can speed up processing by roughly 6x, depending on data volume and composition (i.e., associated data).

My approach was to measure how data volume affects processing time: first across varying numbers of distributions, then separately across varying numbers of items. All performance numbers are relative, since my computer is not a typical server and was not memory-constrained for this purpose. I tried to keep system load consistent across test runs, and ran enough of them to get a sense of the best realistic performance for each variation of the code. All testing was done on an M2 Pro MacBook Pro.

All of this depends heavily on what real-world data looks like. My test data had few items per distribution and mainly varied the distribution count and organization item count. That could be far from reality, but the performance gains should still be sizable.

I have added a skipped spec, which is what I mainly used to test various distribution counts. Small tweaks allowed testing different item counts with the same number of distributions.
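The measurement approach (vary one dimension, hold the other fixed, compare per-unit cost) can be sketched in plain Ruby. This is not the skipped spec from the PR: `elapsed_seconds`, `fake_export`, and `ITEM_COUNT` are hypothetical, and `fake_export` is only a stand-in workload doing one unit of work per (distribution, item) pair. It uses the monotonic clock so wall-clock adjustments do not skew timings.

```ruby
# Time a block using the monotonic clock (unaffected by system clock changes).
def elapsed_seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Hypothetical stand-in for the export: one unit of work per (distribution, item) pair.
def fake_export(distribution_count, item_count)
  total = 0
  distribution_count.times do |d|
    item_count.times { |i| total += (d ^ i) & 1 }
  end
  total
end

# Hold the item count fixed and vary only the distribution count.
ITEM_COUNT = 200
timings = [500, 1_000, 2_000].to_h do |count|
  [count, elapsed_seconds { fake_export(count, ITEM_COUNT) }]
end

timings.each do |count, seconds|
  # A roughly constant per-distribution cost across sizes suggests linear scaling.
  printf("%5d distributions: %.4fs (%.2e s/distribution)\n", count, seconds, seconds / count)
end
```

Swapping which argument is held constant gives the item-count variant; as noted above, absolute numbers from a laptop are only useful relative to each other.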

A couple of general findings:

- Processing time scales linearly with distribution count. This may not hold on production systems with other resource contention.
- Processing time scales (mostly) linearly with organization item count.
- I have lots of data recorded; please let me know if it would be useful to share.

Summary of changes, commit by commit:

adbfe3a
Add a controller spec and a performance test spec, and tweak a method argument to match actual usage.

b41deb0
Switch to #find_each. This had no speed impact on my machine, but it smooths memory usage and will likely help on production systems with other load and memory constraints. This is better for the system overall.
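Why batching smooths memory can be shown without Rails. The sketch below is a plain-Ruby emulation, not ActiveRecord: `FakeRelation`, `each_all`, and `peak_loaded` are hypothetical names. It mirrors the documented idea behind ActiveRecord's `find_each` (records loaded in batches, default `batch_size: 1000`, ordered by primary key) by tracking how many records are materialized at once. One real-world caveat: `find_each` ignores any existing `order` on the relation (Rails logs a warning about this), so if the export relies on a particular row order, that is worth double-checking.

```ruby
# Plain-Ruby emulation of the batching idea behind ActiveRecord's find_each.
# The "table" is an array of ids; records are hashes built on demand.
class FakeRelation
  attr_reader :peak_loaded

  def initialize(ids)
    @ids = ids.sort # find_each walks in primary-key order
    @peak_loaded = 0
  end

  # Like a plain `each`: materialize every record at once.
  def each_all(&block)
    records = @ids.map { |id| { id: id } }
    @peak_loaded = [@peak_loaded, records.size].max
    records.each(&block)
  end

  # Like `find_each`: materialize at most batch_size records at a time.
  def find_each(batch_size: 1000, &block)
    @ids.each_slice(batch_size) do |batch_ids|
      records = batch_ids.map { |id| { id: id } }
      @peak_loaded = [@peak_loaded, records.size].max
      records.each(&block)
    end
  end
end

batched = FakeRelation.new((1..5_000).to_a)
seen = 0
batched.find_each { |_record| seen += 1 }
puts "find_each: #{seen} records, peak #{batched.peak_loaded} in memory"

eager = FakeRelation.new((1..5_000).to_a)
eager.each_all { |_record| }
puts "each: peak #{eager.peak_loaded} in memory"
```

Both paths visit every record, so total work is unchanged; only the peak number of live records differs, which is consistent with seeing no speed change locally.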

dcf3059
Use group_by to pre-group organization.line_items (once per distribution), rather than calling distribution.line_items.select for each organization item in each distribution. This removes about 60% of the original processing time per distribution.
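The difference between the two lookup styles can be sketched in plain Ruby for a single distribution's line items. `LineItem` and the sample ids are hypothetical; the point is that `select` rescans every line item for each organization item, while `group_by` makes one pass to build an `item_id => [line_items]` hash and then does a hash lookup per item, producing the same totals.

```ruby
LineItem = Struct.new(:item_id, :quantity)

line_items = [LineItem.new(1, 5), LineItem.new(3, 2), LineItem.new(1, 4)]
org_item_ids = [1, 2, 3, 4]

# Before: one full scan of the line items per organization item (items x line items).
before = org_item_ids.map do |item_id|
  line_items.select { |li| li.item_id == item_id }.sum(&:quantity)
end

# After: a single pass to group by item_id, then a hash lookup per organization item.
grouped = line_items.group_by(&:item_id)
after = org_item_ids.map do |item_id|
  grouped.fetch(item_id, []).sum(&:quantity)
end

p before # => [9, 0, 2, 0]
p after  # => [9, 0, 2, 0]
```

`fetch(item_id, [])` covers items the distribution does not contain, so they still contribute a zero total.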

439ff17
Use a memoized array for organization line items that are not in the distribution. This replaces the three-entry calculation in the innermost loop with a pre-computed set of zeroes, and switches to flat_map instead of shoveling individual values. This change removed roughly 60% of the processing time remaining after the previous commit.
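A minimal sketch of the "reuse a zero array plus flat_map" pattern, assuming three numeric columns per item. The PR does not spell out what the three entries are, so `quantity`, `value`, and the line-item count used here are hypothetical placeholders, as are `LineItem`, `MISSING_ITEM_COLUMNS`, and `item_columns`.

```ruby
# Hypothetical per-item columns; the real three entries in the export may differ.
LineItem = Struct.new(:item_id, :quantity, :value)

# Built and frozen once, then reused for every item a distribution does not contain.
MISSING_ITEM_COLUMNS = [0, 0, 0].freeze

def item_columns(org_item_ids, line_items)
  grouped = line_items.group_by(&:item_id)
  # flat_map concatenates each item's three columns into one flat row.
  org_item_ids.flat_map do |item_id|
    matches = grouped[item_id]
    next MISSING_ITEM_COLUMNS unless matches

    [matches.sum(&:quantity), matches.sum(&:value), matches.size]
  end
end

row = item_columns([1, 2, 3], [LineItem.new(1, 5, 10), LineItem.new(3, 2, 4)])
p row # => [5, 10, 1, 0, 0, 0, 2, 4, 1]
```

Because `flat_map` copies the elements into a new array, sharing one frozen zero array across all rows is safe: the resulting row is a fresh, unfrozen array.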

664e292
Update the comment for future devs
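The roughly 6x estimate follows from compounding the two reported reductions: each ~60% cut applies to the time left over by the previous one. A quick check using exact rationals to avoid floating-point noise (the 60% figures are the approximate numbers quoted above, not new measurements):

```ruby
# Approximate fraction of processing time removed by each commit (~60% each).
reductions = [Rational(60, 100), Rational(60, 100)]

# Each reduction applies to whatever time remains after the previous one.
remaining = reductions.reduce(Rational(1)) { |left, cut| left * (1 - cut) }
speedup = 1 / remaining

puts remaining.to_f # => 0.16
puts speedup.to_f   # => 6.25
```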

Other thoughts

- Moving this to a background job would still be good long term, but this should buy some more runway.
- I wanted to keep this maintainable rather than turning it into a LeetCode exercise. There are further tweaks that could squeeze out a bit more, but I don't think the gain is worth the maintainability cost (e.g., clever data plucking to avoid instantiating so many line_item + item pairs, which would get ugly quickly).
- The distributions_controller#index action has a number of unnecessary statements, but they seem to have negligible impact. The DistributionTotalsService query runs twice (once in the controller, then again in the export service), but 1) the second run probably hits the ActiveRecord query cache, and 2) at least with my test dataset, its cost was negligible. With different production data, though, it could matter.

Checklist:

- I have performed a self-review of my own code.
- I have commented my code, particularly in hard-to-understand areas.
- I have made corresponding changes to the documentation.
- I have added tests that prove my fix is effective or that my feature works.
- New and existing unit tests pass locally with my changes ("bundle exec rake").
- Title includes "WIP" if work is in progress.
- I acknowledge that I will not force push my branch once reviews have started.


Resolves #5477

Description

Type of change

- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing functionality to not work as expected)
- This change requires a documentation update
- Documentation update

How Has This Been Tested?

Screenshots

dorner · 2026-06-19T19:31:16Z

@phoffer please see my comment on the issue: it's incredibly unlikely that a single organization has thousands of distributions in a year. It's almost certainly an issue with how we're calling the database.

phoffer added 5 commits June 16, 2026 21:53:

- add some new specs and testing (adbfe3a)
- switch to find_each (b41deb0)
- pre-group line_items instead of nested iteration (dcf3059)
- re-use array for missing line item (439ff17)
- update comment (664e292)

phoffer changed the title from "5477 exporting distributions timeout" to "5477 speed up distributions export" on Jun 18, 2026.

phoffer wants to merge 5 commits into rubyforgood:main from phoffer:ph-5477-exporting-distributions-timeout.