
Proposal: Implementation of Short-Term In-Memory Cache in Express for Mitigating Traffic Spikes and Preventing DDoS #6395

Open
andrehrferreira opened this issue Mar 16, 2025 · 4 comments

Comments

andrehrferreira commented Mar 16, 2025

Good morning everyone =)

Currently, Express is widely used by applications of various sizes and purposes, serving as the foundation for numerous critical web services. However, a common challenge in high-traffic applications is the need to handle unexpected spikes in simultaneous access, which can be caused both by legitimate events (such as promotions, launches, and viral content) and by DDoS attacks. While there are various strategies to address these situations, the current Express architecture requires that every request go through routing, middleware, and handlers, which can generate avoidable overhead before a response is even produced.

The proposed solution is the implementation of a short-term in-memory cache, with an expiration time between 2 and 5 seconds, to store and reuse the most frequently accessed responses within this short period. This concept is already widely used in high-performance HTTP servers and complements traditional strategies such as CDNs, Redis, and other external caching layers, significantly reducing the impact of traffic spikes on an application.

This approach consists of intercepting requests before they reach the routing and middleware layer, checking whether the same response can be delivered to multiple requests within the same short timeframe. This technique proves particularly efficient for GET requests, where most responses are identical for different users over a short period, such as static HTML pages, public query APIs, and frequently accessed content.
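To make the idea concrete, here is a rough middleware-level sketch of the concept. It is only an illustration: the proposal below argues for integrating the cache deeper than middleware, and the microCache name and the 2-second TTL are purely illustrative choices, not part of the proposal.

```js
const MICRO_CACHE_TTL_MS = 2000;            // illustrative 2-second window
const cache = new Map();                    // key -> { body, contentType, expiresAt }

function microCache(req, res, next) {
  if (req.method !== 'GET') return next();  // only short-cache idempotent GETs

  const key = req.originalUrl;
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    if (hit.contentType) res.set('Content-Type', hit.contentType);
    return res.send(hit.body);              // identical response reused for the burst
  }

  // Capture the outgoing body so requests in the next few seconds can reuse it.
  const send = res.send.bind(res);
  res.send = (body) => {
    const out = send(body);                 // let Express finish the response first
    if (res.statusCode === 200) {
      cache.set(key, {
        body,
        contentType: res.get('Content-Type'),
        expiresAt: Date.now() + MICRO_CACHE_TTL_MS,
      });
    }
    return out;
  };
  next();
}

// app.use(microCache);  // conceptual usage only
```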

The benefits of this strategy include:

  • Reduction in server load, preventing the need to repeatedly process identical requests.
  • Fewer unnecessary database and external service accesses, reducing latency and operational costs.
  • Minimization of the impact of traffic spikes, preventing bottlenecks and improving application stability.
  • Enhanced resilience against DDoS attacks, as cached responses reduce processing overhead.

To ensure an efficient implementation, a robust cache key generation mechanism is essential, avoiding collisions and ensuring that stored responses accurately match incoming requests. The use of MurmurHash3 for fast, low-collision hash generation, combined with URLSearchParams for request parameter normalization, has proven to be an effective approach for this scenario.
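A sketch of what such a key scheme could look like is shown below. It assumes a MurmurHash3 implementation is supplied by a library of choice; the murmur32 parameter is a stand-in for that function and is not an existing Express or Node API.

```js
// buildCacheKey is a sketch; murmur32(str) stands in for a MurmurHash3
// implementation provided by a library (not shown here).
function buildCacheKey(method, url, murmur32) {
  const { pathname, searchParams } = new URL(url, 'http://local');
  searchParams.sort(); // normalize: ?a=1&b=2 and ?b=2&a=1 produce the same key

  // A fast, low-collision 32-bit hash keeps keys small and cheap to compare.
  return murmur32(`${method}:${pathname}?${searchParams.toString()}`);
}

// buildCacheKey('GET', '/products?b=2&a=1', murmur32)
//   === buildCacheKey('GET', '/products?a=1&b=2', murmur32)
```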

Unlike solutions relying solely on Redis or other external caching systems, this approach eliminates the latency associated with TCP-based queries, as the cache resides directly in the application’s memory. Additionally, because the cache expires within a few seconds, the risk of serving stale information remains minimal, even in scenarios where data changes rapidly.

Implementing this system within Express would provide applications with a native mechanism for handling massive access loads, without relying on external solutions or additional processing layers. This approach has already proven effective in various modern HTTP server scenarios and could significantly impact Express’s scalability and resilience.

Automatic Cache Cleanup and Memory Control

To ensure efficient memory management, an automatic cache cleanup system should be implemented. Since this cache is short-lived, a timed eviction mechanism can be used to remove expired cache entries, ensuring that outdated responses are not stored unnecessarily.

Additionally, a maximum memory threshold should be defined, preventing the cache from growing uncontrollably and consuming excessive system resources. When the memory limit is reached, the system should adopt a Least Recently Used (LRU) strategy to remove older or less frequently accessed cache entries, keeping only the most relevant responses available.
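A rough sketch of that behavior, combining a TTL check, a periodic sweep, and a simple LRU bound, could look like the following; the TTL and entry limit are illustrative values, not proposed defaults.

```js
const TTL_MS = 3000;
const MAX_ENTRIES = 10_000;
const store = new Map(); // Map preserves insertion order, which we exploit for LRU

function cacheSet(key, value) {
  if (store.size >= MAX_ENTRIES) {
    // Evict the least recently used entry (the first key in insertion order).
    store.delete(store.keys().next().value);
  }
  store.set(key, { value, expiresAt: Date.now() + TTL_MS });
}

function cacheGet(key) {
  const entry = store.get(key);
  if (!entry) return undefined;
  if (entry.expiresAt <= Date.now()) {
    store.delete(key);
    return undefined;
  }
  // Re-insert to mark the entry as most recently used.
  store.delete(key);
  store.set(key, entry);
  return entry.value;
}

// Timed eviction: sweep expired entries so memory is reclaimed even for keys
// that are never read again. unref() keeps the timer from holding the process open.
setInterval(() => {
  const now = Date.now();
  for (const [key, entry] of store) {
    if (entry.expiresAt <= now) store.delete(key);
  }
}, 1000).unref();
```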

Why This Implementation Should Not Be an Optional Module

For this solution to be truly effective, it should not be implemented as an optional middleware, but rather be directly integrated into the core request handling layer of Express. If implemented as a middleware, Express would still need to resolve the route and execute the entire middleware stack before reaching the caching logic. This would lead to a significant performance loss, as each request would still pass through unnecessary processing steps before benefiting from the cache.

Currently, Express's router is not as efficient as other solutions like find-my-way, used in frameworks such as Fastify. Routing in Express involves iterating over registered routes and middleware, which introduces additional overhead, especially in high-throughput applications. By integrating the cache mechanism before route resolution, the server can immediately return cached responses, avoiding unnecessary routing computations.

Furthermore, the effectiveness of a caching system diminishes if the request has already passed through a large middleware stack before reaching it. The more middleware an application has, the less noticeable the performance gain will be, as Express will still need to process the request through multiple layers before determining if a cached response exists.

To maximize efficiency, this caching mechanism must be implemented at the pre-processing stage of the handler, intercepting and analyzing the raw HTTP request before Express begins routing and executing middleware. By doing so, the system can determine within microseconds whether a cached response can be returned, avoiding unnecessary computations and significantly improving response times in high-load environments.
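As a rough sketch of the interception point being described (not the proposed core integration itself), an Express app can today be wrapped in a plain Node request listener so that cache hits never reach routing or middleware. The cache shape and the omitted population step are assumptions for illustration only.

```js
const http = require('http');
const express = require('express');

const app = express();
app.get('/health', (req, res) => res.send('ok')); // placeholder route
// ... other routes and middleware registered on app as usual ...

// Minimal stand-in for the short-lived store described above.
const cache = new Map(); // key -> { statusCode, headers, body, expiresAt }

const server = http.createServer((req, res) => {
  if (req.method === 'GET') {
    const hit = cache.get(req.url);
    if (hit && hit.expiresAt > Date.now()) {
      // Cache hit: reply straight from memory, skipping routing and middleware.
      res.writeHead(hit.statusCode, hit.headers);
      return res.end(hit.body);
    }
  }
  // Cache miss (or non-GET): hand the raw request to Express as usual.
  // Populating the cache on the way out is omitted here for brevity.
  app(req, res);
});

server.listen(3000);
```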

Conclusion

The proposed implementation will significantly reduce infrastructure costs, especially in scenarios with a high volume of repeated requests. By integrating short-term in-memory caching directly into the core of Express, the framework becomes more robust and better equipped to handle large amounts of traffic on a single instance.

Additionally, this approach enhances Express's resilience against DDoS attacks and sudden traffic spikes, ensuring that frequently accessed routes do not repeatedly consume computational resources unnecessarily. By reducing the need for redundant request processing, database queries, and middleware execution, this implementation allows Express to operate with greater efficiency and scalability.

Ultimately, this improvement would make Express a stronger choice for high-performance applications, enabling it to compete more effectively with modern alternatives while maintaining its simplicity and widespread adoption.


dpopp07 commented Mar 17, 2025

Thanks for taking the time to write this up! It's an interesting proposal. I'm inclined to think that this is out of scope for an intentionally-minimalist project like Express, and is better handled by other components in the overall architecture for a high-traffic service, but I'm curious what others think.

Implementing this system within Express would provide applications with a native mechanism for handling massive access loads, without relying on external solutions or additional processing layers.

If a service truly is handling massive loads, it probably already should be relying on external components, and response caching is something that is likely better handled by a purpose-built reverse proxy set up in front of the application. That's currently the recommendation in the Express documentation.


andrehrferreira commented Mar 17, 2025

I fully agree that using a load balancer, such as Nginx, in front of the application is the best practice. This remains an effective solution, but in scenarios involving DDoS attacks or massive traffic spikes, even with a load balancer distributing requests, the application can still become overwhelmed.

I experienced this firsthand while using Express during Black Friday when my Google Analytics recorded traffic surges of thousands of requests per second due to a promotion by a major Brazilian YouTuber. We observed a 90/10 distribution pattern, where 90% of the requests were concentrated on just 10% of the endpoints.

In this scenario, Express was repeatedly processing the same requests, resolving routes, and executing multiple middleware layers to serve identical content to different users. For instance, if a request had to go through 10 middleware functions before reaching the controller—where it would then query Redis to retrieve a JSON response—this introduced unnecessary latency, directly impacting the application's scalability and performance.

The goal of this approach is not to replace load balancers but to complement them by preventing unnecessary calls to the router and middleware, which can be incredibly slow depending on the scenario. By optimizing how frequently these layers are executed for repetitive requests, we can significantly reduce overhead and improve response times.

@xxsuperdreamxx

@andrehrferreira, a practical approach would be to split your Express backend into microservices and use load balancing for those services. As you mentioned, 90% of requests are concentrated on a small number of endpoints, so there's no need to distribute the entire Express backend.

When it comes to DDoS prevention, the most effective strategies operate at the server level rather than within Express itself. In our setup, we utilize four DDoS prevention systems—Cloudflare, Sucuri, FortiDDoS, and another system I cannot disclose. Each system runs on separate virtual machines to avoid adding unnecessary overhead.

For serving static or repetitive content, leveraging a reverse proxy to cache responses is a great optimization. Additionally, we've implemented strategies like storing repetitive data in Local Storage or JWTs, which has helped us reduce traffic by 40% on certain endpoints.

Using 10 middleware functions seems excessive. It might be a good time to review and optimize your code logic. It's also worth considering whether some of the libraries you're using could be replaced with native Node.js or Bun alternatives. For instance, I’ve swapped out bcrypt for the native Node scrypt module, which has worked well. Another optimization we implemented was migrating from RSA encryption to EdDSA. These small adjustments can have a significant impact, particularly since we both know that JavaScript, being inherently single-threaded, isn't the most efficient language for certain tasks.
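A minimal sketch of that bcrypt-to-scrypt swap, using only Node's built-in crypto module (the salt size and key length below are illustrative):

```js
const { scryptSync, randomBytes, timingSafeEqual } = require('crypto');

function hashPassword(password) {
  const salt = randomBytes(16);
  const hash = scryptSync(password, salt, 64);
  return `${salt.toString('hex')}:${hash.toString('hex')}`;
}

function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split(':');
  const hash = scryptSync(password, Buffer.from(saltHex, 'hex'), 64);
  // Constant-time comparison avoids leaking information via timing.
  return timingSafeEqual(hash, Buffer.from(hashHex, 'hex'));
}
```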

I hope that helps!

@andrehrferreira

@xxsuperdreamxx

Hey there,

Just to clarify — the example I mentioned was purely hypothetical. In practice, my projects already have a custom HTTP server in place. That said, I’d like to share my thoughts on the topic.

First of all, I completely disagree with the unnecessary use of microservices when there’s no real justification. Valid use cases for microservices include things like consuming queues, handling heavy workloads such as payment processing, image or video manipulation, and so on. Outside of those scenarios, a well-structured monolith can be far more efficient and maintainable.

Caching layers, WAFs, Cloudflare, and NGINX still play a critical role in any solid architecture. The problem is that many developers don’t configure caching headers properly. For example, if your application relies on ETag to return a 304, Express still has to resolve the route, run the middleware stack, and generate the full response body just to compute the ETag, and that applies even to an OPTIONS request. Without proper Cache-Control headers, any effort to leverage CDN-level caching on the API becomes pointless.
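As a small illustration of what properly configured headers look like on a public GET endpoint (the route and max-age values are hypothetical, not from any real project):

```js
// Allows a CDN or reverse proxy to serve this response for a minute without
// the request ever reaching Express.
app.get('/api/products', (req, res) => {
  res.set('Cache-Control', 'public, max-age=60, stale-while-revalidate=30');
  res.json({ products: [] }); // placeholder payload
});
```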

Furthermore, tools like Cloudflare’s WAF and DDoS protection are effective when dealing with actual attacks — but that doesn't solve the issue of legitimate high traffic reaching the application, which, while not extremely common, is still a valid scenario for public APIs.

That’s why I’d like to open a broader discussion on introducing an optional intermediate layer between Node’s native HTTP server and the router. The goal is to reduce unnecessary processing — especially considering the known performance bottleneck in the creation of req and res objects, which I’ve discussed in a separate issue #5998.

In summary, I believe this feature — even as an optional toggle — could serve as a valuable optimization strategy. It can be enabled or disabled as needed, and in many cases, it would result in noticeable performance improvements across the board.
