5 Useful Concepts To Consider if Sequential Processing is Not an Option

Feb 16, 2024



In applications grappling with a high influx of requests, the traditional sequential processing approach often emerges as a bottleneck. This can trigger resource starvation, jeopardizing the application’s functionality.

Dealing with a high input load presents a challenge without a one-size-fits-all solution. The approach varies significantly depending on the specific requirements of an application. Factors such as memory usage, processing demands, and the balance between read and write intensiveness all play a crucial role in shaping the design of our application. Here, we focus on unraveling a set of straightforward yet impactful concepts that might help you navigate and overcome such situations.

Consider an e-commerce platform where each order request triggers non-trivial business logic and is processed in the same thread that receives it. Initially, when the platform is in its infancy and receives only a handful of orders per day, processing each request sequentially in the receiving thread may seem sufficient.

However, as the platform gains traction and the volume of incoming orders escalates to multiple requests per second, sticking with the same design approach would spell disaster. Opening a thread per request under such circumstances would quickly overwhelm the system, leading to CPU and memory issues.

Concepts that Help with Sequential Processing

Decoupling Message Reception & Processing

Given that sequential processing of incoming requests is not suitable for our scenario, our primary strategy for performance improvement involves the separation of message reception from processing.

In terms of threads, we will have two distinct ones: the receiver thread responsible for handling message reception and the persister thread dedicated to data persistence. We will persist data to enable later asynchronous processing. This approach allows for a more nuanced scaling of resources to support these functionalities effectively.

The persister thread is solely responsible for persisting data in some storage, and it’s crucial that we persist all buffered data in a single trip to that storage. The receiver thread, in turn, should steer clear of complex business logic, database interactions, and requests to external APIs. Involving the receiver thread in such tasks would extend message-receiving time and prevent it from handling new requests until the current one completes. This could compromise the efficient handling of a high input load, so keeping the receiver thread as fast as possible is essential.

What would be the primary role of our receiver thread? Typically, it involves consuming input requests and storing them for later asynchronous processing. However, dealing with high loads presents challenges, particularly regarding memory usage. Keeping every request in memory until it is processed isn’t feasible, as it would quickly exhaust available resources. Thus, let’s consider the scenario where data needs to be saved to a database.

Databases often have connection pools with fixed capacities, restricting the number of simultaneous queries. If we were to handle database interactions within the receiver thread, we’d face limitations in the number of consumer threads we can open. So, how do we efficiently avoid this bottleneck and ensure smooth data processing?

Receiving and persisting requests

The concept is straightforward: the receiver thread enqueues messages into an internal queue. Once a message is enqueued, the receiver thread immediately moves on to the next incoming request. When the internal queue reaches a predefined threshold, the persister thread takes over. Using a single connection, it flushes all queued messages to the database or any designated storage in a single batch. This decoupled approach ensures that the receiver thread remains agnostic of the subsequent processing steps while efficiently handling a high volume of requests. Importantly, the persister thread doesn’t poll the queue to check whether the threshold has been exceeded; following the observer pattern, it is notified once that happens.
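The receiver and persister interaction described above can be sketched roughly as follows. This is a minimal illustration rather than a production design: the `ThresholdBuffer` class, the `persister_loop` function, and the `save_batch` callable are hypothetical names, and a condition variable stands in for the observer-style notification.

```python
import threading


class ThresholdBuffer:
    """Thread-safe buffer that wakes the persister once a threshold is reached."""

    def __init__(self, threshold):
        self.threshold = threshold
        self._items = []
        self._closed = False
        self._cond = threading.Condition()

    def enqueue(self, message):
        # Called by receiver threads: append and return immediately.
        with self._cond:
            self._items.append(message)
            if len(self._items) >= self.threshold:
                # Notify the persister instead of letting it poll.
                self._cond.notify()

    def close(self):
        # Signal shutdown so the persister flushes whatever is left.
        with self._cond:
            self._closed = True
            self._cond.notify()

    def drain(self):
        # Called by the persister: sleep until a full batch exists or we shut down.
        with self._cond:
            self._cond.wait_for(
                lambda: len(self._items) >= self.threshold or self._closed
            )
            batch, self._items = self._items, []
            return batch, self._closed


def persister_loop(buffer, save_batch):
    # save_batch is a hypothetical callable that writes one batch in a single trip.
    while True:
        batch, closed = buffer.drain()
        if batch:
            save_batch(batch)
        if closed:
            return
```

Receiver threads only ever call `enqueue`, so the time they spend per message is limited to a short critical section. The persister thread runs `persister_loop` and is the only place that touches storage.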

Additionally, we can run many receiver threads concurrently to speed up the handling of incoming requests. For tasks involving atomic and lightweight processing lasting just a few milliseconds, we can significantly scale up these threads. This approach ensures scalability tailored to the unique needs of your application and the resources at hand. Always keep the number of receiver and processing threads configurable, so that it can be tuned externally to match specific requirements.



Buffering

The internal queue is just one option among various data structures that can assist in buffering incoming requests before persisting. Alternatives such as lists or sets can also serve as effective buffers, facilitating the decoupling of request reception and processing.

Our buffer data structure must support concurrency, allowing receiver threads to handle incoming requests simultaneously. If the buffer consumes a significant amount of memory, consider switching to alternative storage solutions such as databases or files. However, if memory efficiency can be maintained, buffering in application memory remains the preferred option.

Consider a scenario where the buffer size is set to 1000. Without buffering, each request would require its own database interaction, and potentially its own connection, resulting in considerable resource usage. Introducing a buffer lets us consolidate these requests, reducing the number of database interactions to one per batch of 1000 requests.
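As a rough illustration of persisting a whole buffer at once, the sketch below writes a batch of 1000 buffered orders in a single transaction. It uses Python’s built-in `sqlite3` module purely for convenience; the `orders` table, its columns, and the function names are assumptions made for this example, and with a networked database the same pattern would map to one batched statement over one pooled connection.

```python
import sqlite3


def create_orders_table(conn):
    # Hypothetical schema used only for this example.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )


def persist_batch(conn, orders):
    # One transaction and one executemany call for the whole batch,
    # instead of a separate statement and commit per order.
    with conn:
        conn.executemany(
            "INSERT INTO orders (order_id, payload) VALUES (?, ?)", orders
        )


def count_orders(conn):
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Calling `persist_batch(conn, buffered_orders)` once per full buffer replaces 1000 individual writes with a single batched one.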

This approach significantly scales down the resources required to persist the input load. However, what happens when the buffer reaches its capacity? Different strategies address this issue. One option is to implement a flushing mechanism when the buffer reaches 80% (or a similar threshold) of its capacity. This threshold serves as a preemptive measure, allowing ample time to process and persist the buffered data before reaching maximum capacity. If the nature of an application allows for estimating this threshold and the rate of data persistence is faster than the rate of incoming messages, utilizing such a threshold can effectively manage the buffer’s capacity.
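Whether a given threshold leaves enough headroom can be estimated with simple arithmetic. The sketch below assumes, for illustration only, that buffered messages stay in the buffer until the flush completes, so every message arriving during the flush must fit into the free slots; the function names and the example rates are hypothetical.

```python
def headroom(capacity, threshold_ratio):
    # Free slots left at the moment the flush is triggered.
    return capacity - round(capacity * threshold_ratio)


def threshold_is_safe(capacity, threshold_ratio, incoming_per_second, flush_seconds):
    # Messages that arrive while the flush is still running must fit
    # into the remaining headroom, otherwise the buffer overfills.
    arrivals_during_flush = incoming_per_second * flush_seconds
    return arrivals_during_flush <= headroom(capacity, threshold_ratio)
```

With a capacity of 1000 and an 80% threshold, 200 slots remain free when the flush starts. At 500 messages per second, a flush lasting 0.3 seconds lets in about 150 messages and fits, whereas a flush lasting 0.5 seconds lets in about 250 and does not.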

Alternatively, in scenarios where calculating a threshold in advance is not feasible, another approach involves temporarily suspending message reception while the buffer is cleared. By halting the consumption of new messages until the buffer is successfully emptied and data is persisted, this strategy ensures that the buffer won’t overfill. Once the buffer has been cleared and the data has been successfully persisted, message consumption can resume, maintaining the continuity of processing.
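One simple way to approximate this suspension is a bounded queue whose `put` call blocks while the queue is full. The sketch below uses Python’s standard `queue.Queue`; the `receive` and `persist` functions, the `None` sentinel, and the `save_batch` callable are assumptions made for this example. Note that it resumes reception as soon as any slot frees up rather than waiting for the buffer to be completely emptied.

```python
import queue
import threading


def receive(source, buffer):
    # buffer.put blocks while the buffer is full, which pauses reception
    # until the persister has freed up space.
    for message in source:
        buffer.put(message)
    buffer.put(None)  # sentinel: no more messages


def persist(buffer, save_batch, batch_size):
    # save_batch is a hypothetical callable that writes one batch to storage.
    batch = []
    while True:
        message = buffer.get()
        if message is None:
            break
        batch.append(message)
        if len(batch) >= batch_size:
            save_batch(batch)
            batch = []
    if batch:
        save_batch(batch)  # flush the final partial batch
```

Creating the buffer as `queue.Queue(maxsize=1000)` caps memory usage at 1000 pending messages regardless of how fast requests arrive.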

The choice of buffer design and management strategy should align closely with the specific requirements and characteristics of an application. Whether utilizing a predefined threshold or implementing temporary suspension of message reception, the goal remains the same: efficiently managing the buffer’s capacity to prevent overfilling and ensure seamless processing of incoming data.

Dissect processing into smaller components

We usually need to do more than just consume data; processing involves various actions tailored to the specific features of an application. It’s challenging to generalize due to the diverse range of functionalities involved.

Breaking down processing tasks into smaller steps offers significant benefits, especially optimizing time and resource utilization. Let’s delve deeper into our e-commerce platform example to illustrate this principle. When managing user orders, the application juggles various tasks, including order validation, inventory updates, payment processing, shipping management, and, potentially, trend analytics.

Instead of executing every step of an order sequentially, it is advantageous to divide the work into smaller operations that can run in parallel. The time saved at each step adds up across the whole process. For instance, while orders are being validated, inventory management and payment processing can occur concurrently. This parallel execution shortens the overall order processing time and maximizes resource utilization.

Furthermore, splitting order fulfillment and analytics into distinct parallel streams ensures that each stream proceeds as soon as the operations it depends on have completed successfully. This divide-and-conquer strategy improves efficiency and makes the application more responsive to fluctuating workload demands.
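A minimal sketch of such a split, using Python’s standard `concurrent.futures`, might look like the following. All step functions are hypothetical placeholders, and the ordering is an assumption made for this example: an order is validated first, inventory and payment then run in parallel, shipping waits for both, and analytics runs as an independent stream.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical placeholder steps; real ones would call services or a database.
def validate(order):
    if order["quantity"] <= 0:
        raise ValueError("quantity must be positive")


def reserve_inventory(order):
    return f"reserved {order['quantity']} x {order['sku']}"


def charge_payment(order):
    return f"charged order {order['id']}"


def schedule_shipping(order):
    return f"shipping scheduled for order {order['id']}"


def record_analytics(order):
    return f"recorded order {order['id']}"


def process_order(order, pool):
    validate(order)  # everything else depends on a valid order
    analytics = pool.submit(record_analytics, order)   # independent stream
    inventory = pool.submit(reserve_inventory, order)  # runs in parallel with payment
    payment = pool.submit(charge_payment, order)
    # result() re-raises any exception, so shipping only starts
    # after both preceding steps have succeeded.
    results = {"inventory": inventory.result(), "payment": payment.result()}
    results["shipping"] = schedule_shipping(order)
    results["analytics"] = analytics.result()
    return results
```

A caller would create one shared pool, for example `ThreadPoolExecutor(max_workers=3)`, and pass it to `process_order` for each order.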

This approach reduces overall processing time, enabling faster order processing. Additionally, when a processing phase involves a significant amount of data, consider batching to optimize memory usage. By limiting memory usage per batch, you can avoid memory issues and execute batch operations more efficiently. Just as our persister thread flushes data over a single database connection, batch operations avoid repeated overhead such as opening and closing connections or files, resulting in faster processing and improved performance.
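As a small illustration of bounding memory per batch, the generator below yields fixed-size chunks from any iterable without loading it fully into memory. The name `in_batches` is an assumption made for this example.

```python
from itertools import islice


def in_batches(items, batch_size):
    # Pull at most batch_size items at a time, so only one batch
    # is held in memory at any moment.
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Each yielded batch can then be handled with one connection or one file handle, for example `for batch in in_batches(records, 500): ...`.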

Session affinity

Session affinity, often called sticky sessions, is a technique used in web application architectures to ensure that requests from the same client are consistently routed to the same server. This means that once a user establishes a session with our application, all subsequent requests from that user are directed to the same server, regardless of how many servers are available in the application’s infrastructure.

At its core, session affinity relies on two main components: client-side session management and server-side configuration. On the client side, we use techniques such as cookies or local storage to store a unique session identifier. This identifier is included in each request the user makes to our application. On the server side, we configure our load balancer or application server to recognize and honor the session identifier included in incoming requests. Instead of routing each request to any available server in a round-robin fashion, the load balancer ensures that requests with the same session identifier are always directed to the same server.

Remember receiver and persister threads? Let’s introduce another crucial component: the processing thread. By incorporating multiple processing threads, we can effectively distribute and manage the workload among them. However, orchestrating multiple instances simultaneously necessitates synchronization mechanisms for seamless operation.

Sticky sessions play a crucial role in ensuring that data pertaining to a specific user or entity consistently reaches the same processing thread within our application architecture. This means that once a user establishes a session with our application, all subsequent requests from that user are directed to the same processing thread, regardless of how many threads are concurrently handling requests.

By ensuring that each processing thread handles a distinct set of data associated with a particular user or entity, sticky sessions facilitate concurrent execution and minimize thread blocking. This means that different threads can simultaneously process requests from different users or entities without interfering with each other’s operations. As a result, our application can handle a higher volume of requests more efficiently, leading to improved performance and responsiveness.

In conjunction with sticky sessions, the modulo operation serves as a powerful tool for achieving consistent routing of requests to specific processing threads. The modulo operation involves using the unique identifier of a monitored entity (such as a user ID or entity ID) and performing a mathematical operation with the number of parallel consumer threads in our application. Calculating the remainder of dividing the unique identifier by the number of parallel consumer threads ensures that requests with the same identifier will always yield the same remainder, thereby ensuring deterministic routing to the corresponding processing thread.

Request consistently routed to the same thread

Employing the modulo operation in conjunction with sticky sessions provides a robust mechanism for managing request routing and load distribution within our application architecture. It allows us to achieve efficient utilization of resources while maintaining data consistency and optimizing performance. By carefully implementing and fine-tuning these techniques, we can enhance the scalability, reliability, and responsiveness of our application in high-load environments.
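The modulo routing described above can be sketched in a few lines. This is an illustrative example only: `worker_index`, `stable_key`, and `Router` are hypothetical names, and each processing thread is assumed to consume from its own queue. For non-numeric identifiers the sketch derives a number with `zlib.crc32`, because Python’s built-in `hash` for strings differs between process runs.

```python
import queue
import zlib


def stable_key(entity_id):
    # Integers are used as-is; anything else is reduced to a stable number.
    if isinstance(entity_id, int):
        return entity_id
    return zlib.crc32(str(entity_id).encode("utf-8"))


def worker_index(entity_id, worker_count):
    # The same identifier always yields the same remainder,
    # and therefore the same processing thread.
    return stable_key(entity_id) % worker_count


class Router:
    def __init__(self, worker_count):
        # One queue per processing thread.
        self.queues = [queue.Queue() for _ in range(worker_count)]

    def route(self, entity_id, message):
        index = worker_index(entity_id, len(self.queues))
        self.queues[index].put((entity_id, message))
        return index
```

With four processing threads, user 42 always lands on thread `42 % 4 = 2`. One consequence of this design is that changing the number of threads changes the mapping, so the thread count should only be changed while no entity has work in flight.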



Data accumulation

Data accumulation involves aggregating incoming data over a specific period, thereby reducing the memory footprint and speeding up both persisting and processing. For instance, if our application aims to calculate daily totals or averages from real-time data, we can store only one record in the database for each day and accumulate incoming data in that row.

Take, for example, a scenario where a sensor sends temperature readings every second throughout the day, but our application user is only interested in the average temperature for that day. Without data accumulation, persisting each temperature reading individually would result in an excessive number of rows in our database. However, leveraging data accumulation can optimize storage and processing efficiency.

Upon receiving the first temperature reading for a given day from a sensor, we create a new row dedicated to that sensor and day in our storage. This row contains essential information such as the total number of requests received and the accumulated sum of temperature readings. Subsequent temperature readings received throughout the day are then added to the accumulated sum in the corresponding row.

By the end of the day, our storage will contain only one row per sensor for that day, holding everything needed to calculate the average temperature: the accumulated sum divided by the number of readings. This approach reduces the memory footprint and storage requirements and simplifies data processing and analysis, ultimately leading to improved application performance and efficiency.
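The accumulation steps above could look roughly like this. The sketch uses Python’s built-in `sqlite3` module; the `daily_temperature` table, its columns, and the function names are assumptions made for this example rather than a prescribed schema.

```python
import sqlite3


def create_daily_table(conn):
    # One row per sensor and day, holding the reading count and running sum.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_temperature ("
        "sensor_id TEXT NOT NULL, day TEXT NOT NULL, "
        "reading_count INTEGER NOT NULL DEFAULT 0, "
        "temperature_sum REAL NOT NULL DEFAULT 0, "
        "PRIMARY KEY (sensor_id, day))"
    )


def accumulate(conn, sensor_id, day, temperature):
    with conn:  # both statements commit together
        # Creates the row on the first reading of the day, otherwise does nothing.
        conn.execute(
            "INSERT OR IGNORE INTO daily_temperature (sensor_id, day) VALUES (?, ?)",
            (sensor_id, day),
        )
        # Adds the reading to the running totals.
        conn.execute(
            "UPDATE daily_temperature "
            "SET reading_count = reading_count + 1, "
            "temperature_sum = temperature_sum + ? "
            "WHERE sensor_id = ? AND day = ?",
            (temperature, sensor_id, day),
        )


def daily_average(conn, sensor_id, day):
    row = conn.execute(
        "SELECT temperature_sum / reading_count FROM daily_temperature "
        "WHERE sensor_id = ? AND day = ?",
        (sensor_id, day),
    ).fetchone()
    return row[0] if row else None
```

However many readings arrive, the table holds a single row per sensor and day, and the average is derived on demand from the stored sum and count.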

Accumulating data certainly offers advantages, but it’s not without its tradeoffs. One notable concern is the potential data loss due to various issues such as network failures or sender errors. When data is missed during the accumulation process, the integrity of the entire accumulation period can be compromised. It’s imperative to have a robust backup and recovery plan in place to address such scenarios effectively.

In the event of missing data, it’s essential to promptly implement a recovery plan to restore the integrity of the accumulation period. This typically involves invalidating the affected accumulation period, reimporting the missing data for the given timeframe, and then performing the accumulation process again. By proactively preparing for such eventualities, you can mitigate the risks associated with data loss and ensure the accuracy and reliability of your accumulated data.
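The recovery steps can be sketched as a single function that invalidates a period and rebuilds it from reimported data. This is a simplified in-memory illustration; `rebuild_period`, the dictionary layout, and the assumption that the raw readings for the period can be obtained again from the sender are all hypothetical.

```python
def rebuild_period(accumulated, key, reimported_readings):
    # Step 1: invalidate the affected accumulation period.
    accumulated.pop(key, None)

    # Step 2: run the accumulation again over the reimported data.
    count = 0
    total = 0.0
    for value in reimported_readings:
        count += 1
        total += value

    # Step 3: store the rebuilt record only if any data was recovered.
    if count:
        accumulated[key] = {"count": count, "sum": total}
    return accumulated.get(key)
```

Here `key` would identify the period, for example a `(sensor_id, day)` pair, and `accumulated` maps each period to its count and sum.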

Having a well-defined backup and recovery strategy not only safeguards against data integrity issues but also minimizes the impact of potential disruptions on your production environment. By being prepared for contingencies, you can maintain the stability and efficiency of your application while leveraging the benefits of data accumulation effectively.

Optimizing data accumulation techniques for real-time analysis in high-load applications offers numerous benefits, including reduced memory footprint, improved storage efficiency, and simplified data processing. By leveraging data accumulation strategies, developers can build more scalable and efficient applications capable of handling the demands of modern-day data-intensive environments.


In conclusion, navigating the challenges of high-load applications demands a multifaceted strategy encompassing various techniques such as decoupling message reception from processing, concurrency optimization, efficient buffering, session affinity, and data accumulation.

While we’ve discussed several key concepts in this blog, it’s essential to recognize that the landscape of performance optimization is vast, and there may be other strategies and methodologies that could further enhance application efficiency. By remaining open to exploring new approaches and continuously refining our techniques, we can adapt and evolve our applications to handle even the most demanding workloads.

Lastly, it’s important to note that most of the topics discussed relate to a single-node application instance. A further improvement can be achieved by introducing multiple active nodes that distribute the load among them. This distributed approach improves performance, scalability, resilience, and fault tolerance, helping the application operate robustly even under extreme conditions.

If you want to learn more about how you can get there, take a look at our other blog posts.