Code refactoring for history.js #69

Open

Mehulantony wants to merge 4 commits into main from history-refactor

Conversation

@Mehulantony (Collaborator)

Refactored history.js to eliminate ~70 lines of duplicated code by extracting common logic into a shared getVersionHistory() helper function. The since() and history() operations now delegate to this shared base, reducing maintenance burden and improving code clarity.
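In outline, the delegation looks like this (a minimal sketch, not the full implementation; getAllAncestors and getAllDescendants are the existing helpers that walk ancestor and descendant versions):

const history = async (req, res, next) => getVersionHistory(req, res, next, getAllAncestors)

const since = async (req, res, next) => getVersionHistory(req, res, next, getAllDescendants)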

Comment thread: controllers/history.js

@thehabes thehabes (Collaborator) left a comment


I am unable to start the app locally. I think you need to remove express-rate-limit from this PR.

Your instincts are correct. This is an expensive operation. However, introducing rate limiting is out of scope for this PR. That change also happens to be a breaking change that prevents the app from starting even after a fresh npm install; we would have to debug which package is missing and install it manually to get the app running.

Even worse, express-rate-limit would need to be adjusted for the fact that this app runs in cluster mode, where multiple workers must coordinate rate limiting at the same time, which in practice requires something like Redis. The RERUM API is not intended to run single-threaded.

For your records, the static review report below shows how hard the reviewer flagged this.

Static Review Comments

Review Date: 2026-05-04
Reviewer: Pair Static Review - Claude & @thehabes

Claude and Bryan make mistakes. Verify all issues and suggestions. Avoid unnecessary scope creep.

Category Issues Found
🔴 Critical 2
🟠 Major 4
🟡 Minor 2
🔵 Suggestions 0

PR scope note: The PR title and description describe a refactor of controllers/history.js (extracting getVersionHistory() from since/history). The refactor itself is clean and idiomatic. However, the PR also bundles a new rate-limiter feature touching routes/history.js, routes/since.js, and both test files — that is a separate concern and is the source of almost every issue below.


Critical Issues 🔴

Issue 1: New dependency express-rate-limit is imported but never added to package.json

Files:

  • routes/history.js:2
  • routes/since.js:2
  • routes/__tests__/history.test.js:6
  • routes/__tests__/since.test.js:6

Category: Breaking Change / Runtime Error

Problem:
Four files now import rateLimit from 'express-rate-limit', but package.json (and package-lock.json) have not been updated to declare the dependency. On a fresh npm install, the package is not present in node_modules, and Node's ESM loader throws at import time. This is reproducible right now under pm2:

Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'express-rate-limit' imported from /mnt/e/oss-rerum-server/routes/since.js
    at Object.getPackageJSONURL (node:internal/modules/package_json_reader:301:9)
    ...

The result is that pm2 start -i max bin/rerum_v1.js (the documented production startup) fails to bring up workers — the app cannot deploy as-is. This is a hard deploy-blocker.

Current Code (routes/history.js):

import express from 'express'
import rateLimit from 'express-rate-limit'

Suggested Fix:
Remove express-rate-limit

npm uninstall express-rate-limit

Refactor as necessary. If the dependency is instead added to package.json, the following issues should be addressed.
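For reference, routes/history.js with the limiter removed would look roughly like this (a sketch; the controller import path is assumed from the project layout, not confirmed by this PR):

import express from 'express'
import * as controller from '../controllers/history.js' // assumed import path

const router = express.Router()

router.route('/:_id')
    .get(controller.history)
    .head(controller.historyHeadRequest)

export default router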


Issue 2: Rate limit is per-process, but production runs 4 or more pm2 cluster workers

Files: routes/history.js:8-13, routes/since.js:8-13

Category: Logic Error / Configuration

Problem:
express-rate-limit defaults to an in-memory store. Each pm2 worker keeps its own counter, so under pm2 start -i max (4 workers per the project CLAUDE.md), the effective limit is max × workers per IP — i.e., 400 requests/minute, not the documented/intended 100. The actual ceiling a client hits depends on which worker the load balancer routes them to, making the limit non-deterministic and easy to bypass.

If 100/min/IP is genuinely the intent, a shared store is required (Redis, Memcached, or rate-limit-mongo — which would reuse your existing MongoDB Atlas).

Current Code:

const historyLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 100,
  standardHeaders: true,
  legacyHeaders: false
})

Suggested Fix (one option — shared MongoDB store):

import rateLimit from 'express-rate-limit'
import MongoStore from 'rate-limit-mongo'

const historyLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 100,
  standardHeaders: true,
  legacyHeaders: false,
  store: new MongoStore({
    uri: process.env.MONGO_CONNECTION_STRING,
    collectionName: 'rate-limits',
    expireTimeMs: 60 * 1000
  })
})

If a per-worker limit is acceptable (e.g., the intent is just to absorb obvious abuse and you're fine with max × workers), document it explicitly in code and bump max down accordingly (e.g., max: 25 so the cluster total ≈ 100).
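A minimal sketch of that documented per-worker variant, assuming 4 pm2 workers:

import rateLimit from 'express-rate-limit'

// In-memory store: each pm2 worker keeps its own counter, so the cluster-wide
// ceiling is roughly max × workers. max is scaled down so 4 workers ≈ 100 req/min/IP.
const historyLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 25,
  standardHeaders: true,
  legacyHeaders: false
})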

How to Verify:

  1. Start with pm2 start -i 4 bin/rerum_v1.js.
  2. Hammer /v1/history/:id from one IP at >100 req/min and observe whether the 101st is throttled or not.
  3. With the in-memory default, you will be able to exceed 100/min until the cluster total is reached.

Major Issues 🟠

Issue 3: No trust proxy configuration — every request keys to the proxy IP

Files: routes/history.js:8-13, routes/since.js:8-13, app.js

Category: Security / Logic Error

Problem:
express-rate-limit keys by req.ip. Without app.set('trust proxy', …) configured on the Express app, when the API runs behind a load balancer/reverse proxy (typical for the production RHEL VM), every request appears to originate from the proxy's IP. That means all users globally share a single counter, so a single abusive client can lock everyone else out, or a high-traffic legitimate origin can self-DoS.

express-rate-limit v7+ will warn loudly at runtime if it detects this misconfiguration, but it won't fix it. This needs to be set per the actual deployment topology, not blindly to true (which is itself a spoofable misconfiguration).

Suggested Fix (in app.js, near top):

// Adjust to match your deployment: e.g., 1 if behind a single trusted reverse proxy
app.set('trust proxy', 1)

Document the chosen value and why. If the production server terminates TLS itself with no proxy in front, omit this and document that too.

How to Verify:

  1. Behind a proxy, req.ip should resolve to the original client IP, not the proxy. Add a temporary console.log(req.ip) in a route and curl from two different external IPs to confirm.
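A sketch of that temporary check (remove it after verifying; assumes it is registered in app.js before the routers):

// Temporary diagnostic: with trust proxy set correctly, req.ip should be the
// original client address rather than the reverse proxy in front of the app.
app.use((req, res, next) => {
  console.log('client ip:', req.ip)
  next()
})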

Issue 4: Limiter config duplicated in 4 places — drift is inevitable

Files: routes/history.js, routes/since.js, routes/__tests__/history.test.js, routes/__tests__/since.test.js

Category: Code Quality / Maintainability

Problem:
The same rateLimit({ windowMs: 60 * 1000, max: 100, standardHeaders: true, legacyHeaders: false }) block is copy-pasted four times. If the team later tunes max for production, the tests will silently keep their own copy and remain green even if production behavior diverges. The tests are not testing the production limiter — they are testing a re-instantiated lookalike, which gives false assurance.

Suggested Fix:
Extract a single shared module and import it everywhere:

// middleware/rate-limit.js
import rateLimit from 'express-rate-limit'

export const versionHistoryLimiter = rateLimit({
  windowMs: Number(process.env.RATE_LIMIT_WINDOW_MS ?? 60_000),
  max: Number(process.env.RATE_LIMIT_MAX ?? 100),
  standardHeaders: true,
  legacyHeaders: false
})

Then in routes/history.js, routes/since.js, and both test files:

import { versionHistoryLimiter } from '../middleware/rate-limit.js'
...
router.route('/:_id')
    .get(versionHistoryLimiter, controller.history)
    .head(versionHistoryLimiter, controller.historyHeadRequest)

This also makes it trivial to swap in the shared store (Issue 2).

How to Verify:

  1. Change RATE_LIMIT_MAX in .env and confirm both tests and production routes pick up the new value.
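For illustration, the corresponding .env entries (values are examples; the variable names match the middleware/rate-limit.js sketch above):

# Rate limiting (illustrative values)
RATE_LIMIT_WINDOW_MS=60000
RATE_LIMIT_MAX=100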

Issue 5: Rate limit applied to HEAD requests is unusual and may break caching clients

Files: routes/history.js:14-16, routes/since.js:14-16

Category: Logic / API Design

Problem:
Both .get(...) and .head(...) now go through the same limiter sharing the same counter. HEAD is RFC-defined as a no-body alternative to GET — clients often issue HEAD for cache validation, ETag checks, or to test reachability. Counting HEAD against the same 100/min budget as GET means an aggressive cache-validating client (or monitoring tool issuing periodic HEADs) eats into the GET budget unexpectedly.

Common patterns:

  • Skip HEAD entirely for limits, OR
  • Give HEAD its own (more permissive) limiter, OR
  • Configure skip on the limiter for HEAD methods.

Suggested Fix:

const historyLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 100,
  standardHeaders: true,
  legacyHeaders: false,
  skip: req => req.method === 'HEAD'
})

Or simply do not pass the limiter to the HEAD handler:

router.route('/:_id')
    .get(historyLimiter, controller.history)
    .head(controller.historyHeadRequest)

How to Verify:

  1. Issue 100 GETs in a minute, then a HEAD — confirm HEAD succeeds (returns 200) when the chosen behavior is "skip HEAD."

Issue 6: Rate-limit error response breaks the API's standard JSON error envelope

Files: routes/history.js, routes/since.js

Category: API Consistency

Problem:
The rest of this API funnels errors through rest.js/createExpressError, returning structured JSON with message and status. express-rate-limit defaults to res.status(429).send('Too many requests, please try again later.') — plain text. Clients parsing JSON on every error path will choke or silently miss the rate-limit case.

Suggested Fix:

// createExpressError is the shared error helper from rest.js referenced above
const historyLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 100,
  standardHeaders: true,
  legacyHeaders: false,
  handler: (req, res, next, options) => {
    next(createExpressError({
      message: 'Rate limit exceeded for /history. Please retry after a minute.',
      status: options.statusCode // 429
    }))
  }
})

This routes the 429 through the same messenger middleware as every other error.

How to Verify:

  1. Force a 429 (e.g., set max: 1 locally and curl twice).
  2. Confirm the second response body is JSON with message and status, matching the format documented at /v1/API.html.
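For reference, a conforming 429 body would look something like this (illustrative; the exact shape depends on createExpressError and the messenger middleware):

{
  "message": "Rate limit exceeded for /history. Please retry after a minute.",
  "status": 429
}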

Minor Issues 🟡

Issue 7: New getVersionHistory helper has no JSDoc

File: controllers/history.js:157

Category: Documentation

Problem:
The new helper carries the meat of the logic for both since and history, but it's the only function in the file without a JSDoc block. The project CLAUDE.md asks for JSDoc on functions you touch.

Current Code:

const getVersionHistory = async (req, res, next, versionGetter) => {

Suggested Fix:

/**
 * Shared resolver for /history and /since version queries.
 * Looks up the keyed object (by _id or slug), gathers all related versions,
 * filters them via the supplied versionGetter (ancestors or descendants),
 * applies idNegotiation, and writes the JSON-LD response.
 *
 * @param {express.Request} req
 * @param {express.Response} res
 * @param {express.NextFunction} next
 * @param {(all: object[], obj: object, acc: object[]) => object[]} versionGetter
 *        Either getAllAncestors (for /history) or getAllDescendants (for /since).
 */
const getVersionHistory = async (req, res, next, versionGetter) => {

Issue 8: Magic numbers for windowMs/max are not configurable

Files: routes/history.js:9-10, routes/since.js:9-10

Category: Code Quality

Problem:
60 * 1000 and 100 are hardcoded. Production load and dev/sandbox load are very different — TinyThings prototyping might want a much higher cap, while production behind public traffic might want much lower. Reading from process.env (with sensible defaults) lets each environment tune via .env without code changes.

See suggested fix in Issue 4 — it folds this in.


Suggestions 🔵

None.


If there are significant code changes in response to this review, please test those changes: run the application manually and test, or perform the internal application tests when applicable.
