Tips for getting ahead of usage spikes in shared MongoDB clusters

July 30, 2018 | September 3rd, 2019

Usage spikes happen. MongoDB workloads can encounter spikes for a number of reasons:

  • Seasonal patterns
  • New features
  • New customer growth
  • Big marketing events

We recently covered how to tell when it’s time to scale MongoDB. In every usage spike scenario, it’s important to be prepared to gracefully handle the increase in operations, connections, and storage. Let’s cover some considerations when preparing for usage spikes.

Scale Out in Advance

For sharded collections with proper shard keys and targeted operations, scaling out can achieve near-linear scaling, which makes it the recommended way to scale MongoDB: you increase throughput while maintaining high availability. Adding shards scales both your write throughput and your read throughput, whereas adding secondary members only scales read performance, and only while all members are available. If you run MongoDB components in an environment where several virtual machines or containers can be deployed on the same physical hardware, it’s important to distribute the components across different physical devices as much as possible. Distribution prevents components from exhausting shared resources such as storage and networking.

Understand your limits

If you know the limits and the latency threshold of your current cluster and workload, you can calculate how many additional shards you’ll need.

For example, suppose you expect a 50% increase in traffic and your 10 shards currently handle 20,000 requests per second at 10 milliseconds or less per request. If throughput scales roughly linearly with shard count, you need 15 shards in total (10 × 1.5), so an additional five shards will handle the expected increase while maintaining the same query latency. Plan accordingly to allow adequate time for balancing chunks between shards. Starting in MongoDB 3.4, multiple chunk migrations can run in parallel, which can significantly decrease the balancing time.
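The arithmetic above can be sketched as a small helper. This is only a rough estimate: it assumes throughput scales linearly with shard count, which holds only for well-distributed, targeted operations. The function name and parameters are illustrative, not part of any MongoDB tooling.

```python
def additional_shards_needed(current_shards: int, increase_pct: int) -> int:
    """Estimate extra shards for a traffic increase, assuming linear scaling.

    increase_pct is the expected traffic growth as a whole percentage (50 = +50%).
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    target_shards = (current_shards * (100 + increase_pct) + 99) // 100
    return max(0, target_shards - current_shards)


# The scenario from the text: 10 shards and a 50% traffic increase.
extra = additional_shards_needed(10, 50)
```

Rounding up is deliberate: a fractional shard requirement still means provisioning a whole extra shard.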

Scaling out is not limited to your data nodes. The additional workload will also put more demand on the mongos tier, especially when scaling up your application tier for the added traffic. The mongos layer will have to handle more client connections and more shard connections, process more operations, and return more results per second. In some cases, the mongos can become a bottleneck and add latency to your requests when concurrency increases.

Add additional mongos servers in advance

More mongos servers reduce the concurrency each individual mongos has to handle and the resources each mongos server consumes. You can also assign each application server only a subset of your mongos servers. Using a subset prevents the driver from establishing connections to every mongos server, which keeps connection counts per mongos server lower.
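One way to assign subsets is sketched below, assuming each application server is given its own fixed list of mongos hosts in its connection string. The hostnames, the subset size, and the hash-based selection are all hypothetical choices made for illustration.

```python
import hashlib


def mongos_subset(app_server: str, mongos_hosts: list[str], size: int) -> list[str]:
    """Pick a stable subset of mongos hosts for one application server."""
    hosts = sorted(mongos_hosts)
    # Hash the application server name so the same server always gets the same subset.
    digest = hashlib.sha256(app_server.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(hosts)
    return [hosts[(start + i) % len(hosts)] for i in range(min(size, len(hosts)))]


def connection_uri(hosts: list[str]) -> str:
    """Build a standard MongoDB connection string from a list of host:port pairs."""
    return "mongodb://" + ",".join(hosts) + "/"


# Hypothetical hosts: six mongos servers, two per application server.
all_mongos = [f"mongos{i}.example.net:27017" for i in range(1, 7)]
subset = mongos_subset("app-server-01", all_mongos, 2)
uri = connection_uri(subset)
```

Because the selection is deterministic, restarting an application server does not shuffle its connections onto different mongos servers.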

Scale Up in Advance

Even when scaling your cluster out, you may also need to scale up. The increased workload and added components increase the overall work per process and the resources each process consumes. The processes and the storage engine also have internal data structures and threads that are responsible for specific actions, all of which require additional CPU and memory when the workload increases. Adding other components will also increase connection counts. The mongod process will now have more connections between each shard in addition to the connections established by the mongos. Remember that each active connection can use up to one megabyte of memory.
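As a back-of-the-envelope sketch, the worst-case memory consumed by connections on a single mongod can be estimated from the one-megabyte-per-active-connection figure above. The connection counts used here are made-up inputs; substitute the numbers from your own monitoring.

```python
def connection_memory_mb(mongos_connections: int,
                         inter_shard_connections: int,
                         mb_per_connection: int = 1) -> int:
    """Upper-bound memory (in MB) used by active connections on one mongod."""
    return (mongos_connections + inter_shard_connections) * mb_per_connection


# Hypothetical: 4,000 connections from the mongos tier plus 500 from other shards.
worst_case_mb = connection_memory_mb(4000, 500)
```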

An alternative to scaling out reads is scaling up reads by adding additional secondaries. In MongoDB, you can configure a read preference for your operations, which lets them use the secondary members of your cluster to distribute reads. Read preferences that make use of secondaries are fine for reads that can tolerate eventual consistency. In other words, they don’t need to read data that was just written to the primary. It is essential to provision enough secondaries so that if one is down for planned or unplanned maintenance, your remaining secondaries can handle the workload.
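The provisioning rule in the last sentence can be sketched as follows, assuming you have measured roughly how many read operations per second one secondary can sustain. Both input numbers and the function name are illustrative.

```python
def secondaries_needed(read_ops_per_sec: int,
                       ops_per_secondary: int,
                       tolerated_failures: int = 1) -> int:
    """Secondaries required to serve the read load with some members down."""
    # Ceiling division: a partially loaded secondary is still a whole secondary.
    base = (read_ops_per_sec + ops_per_secondary - 1) // ops_per_secondary
    return base + tolerated_failures


# Hypothetical: 9,000 secondary reads/sec, each secondary sustains about 3,000.
needed = secondaries_needed(9000, 3000)
```

With these inputs, three secondaries carry the load and a fourth covers one member being down for maintenance.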

Data Balance

Balanced data is critical to balancing operations. For sharded collections, check that the sharding status output shows each sharded collection’s chunk distribution is in line with the thresholds defined in the MongoDB documentation. Balanced data ensures your operations scale across shards for the upcoming usage spike.
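A balance check along these lines might look like the sketch below. It assumes the chunk-count migration thresholds documented for MongoDB releases of this era (a difference of 2 for collections with fewer than 20 chunks, 4 for 20 to 79 chunks, and 8 for 80 or more); verify the values against the documentation for your version. The per-shard chunk counts are made-up inputs of the kind you would read from the sharding status output.

```python
def migration_threshold(total_chunks: int) -> int:
    """Chunk-count difference at which the balancer starts migrating (assumed values)."""
    if total_chunks < 20:
        return 2
    if total_chunks < 80:
        return 4
    return 8


def is_balanced(chunks_per_shard: dict[str, int]) -> bool:
    """True if the spread between fullest and emptiest shard is below the threshold."""
    total = sum(chunks_per_shard.values())
    spread = max(chunks_per_shard.values()) - min(chunks_per_shard.values())
    return spread < migration_threshold(total)


# Hypothetical chunk counts for one sharded collection.
balanced = is_balanced({"shard01": 10, "shard02": 10, "shard03": 9})
skewed = is_balanced({"shard01": 60, "shard02": 40})
```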

Achieving Balance

If the balance of a collection falls outside of the defined thresholds, you need to investigate why this is happening.

  1. Once you confirm the balancer is enabled, check your maxSize. A shard’s maxSize is a user-defined limit, measured in megabytes, of how much data a shard should contain. If a shard’s data files exceed this size, the balancer will no longer select this shard as the destination for chunk migrations.
  2. Confirm your balancing window is large enough to balance the new chunks that were created outside of the balancing window.
  3. Evaluate your chosen shard key. A poorly chosen shard key can lead to hot spotting, which in turn leads to data imbalances: new data is created on one shard or a subset of shards and then has to be migrated to other shards at a later time.
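The hot spotting described in the last item can be illustrated with a toy simulation. This is not how MongoDB routes documents internally: the range boundaries are invented, and SHA-256 stands in for MongoDB's own hashing purely to show the effect of a monotonically increasing key versus a hashed one.

```python
import hashlib
from collections import Counter


def shard_for_ranged_key(key: int, boundaries: list[int]) -> int:
    """Index of the range that owns `key`; the last range is open-ended."""
    for index, upper in enumerate(boundaries):
        if key < upper:
            return index
    return len(boundaries)


def shard_for_hashed_key(key: int, shard_count: int) -> int:
    """Spread keys by hashing them first (illustrative hash, not MongoDB's)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


boundaries = [1000, 2000, 3000]   # four ranges, one per shard
new_keys = range(5000, 6000)      # monotonically increasing inserts

ranged = Counter(shard_for_ranged_key(k, boundaries) for k in new_keys)
hashed = Counter(shard_for_hashed_key(k, 4) for k in new_keys)
```

With the ranged key, every new insert lands on the last shard, so the balancer has to move that data later. With the hashed key, the same inserts are spread across all four shards from the start.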

Unsharded Data

It’s not common to mix sharded and unsharded collections in the same database. When possible, shard any collection that is large enough to distribute operations. When sharded, the collection or collections will also see an increase in operations during the spike. In MongoDB, each database has a primary shard. The primary shard is responsible for every request for unsharded collections. When the primary shard is overwhelmed with operations that can’t be distributed to other shards in the cluster, query latency can increase. The same holds true for applications that have multiple databases using the same primary shard. The movePrimary command lets you distribute databases and their unsharded collections across different shards. If you use the movePrimary command, it’s critical to review the documentation first. This command often needs a maintenance window to prevent data loss.
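To see whether several databases share one primary shard, you could group database metadata by primary shard, as in this sketch. It assumes input documents shaped like those in the config.databases collection (an `_id` holding the database name and a `primary` holding the shard name); the sample values are invented.

```python
from collections import defaultdict


def databases_by_primary(db_docs: list[dict]) -> dict[str, list[str]]:
    """Group database names by the shard that acts as their primary shard."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for doc in db_docs:
        grouped[doc["primary"]].append(doc["_id"])
    return {shard: sorted(names) for shard, names in grouped.items()}


# Hypothetical documents, shaped like entries in config.databases.
sample = [
    {"_id": "orders", "primary": "shard01"},
    {"_id": "analytics", "primary": "shard01"},
    {"_id": "sessions", "primary": "shard02"},
]
grouping = databases_by_primary(sample)
```

A shard that shows up with many busy databases is a candidate for redistributing with movePrimary, subject to the cautions above.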

Unused Data and Indexing

For read-intensive spikes, it’s necessary to:

  1. Remove any unneeded documents, and
  2. Have proper indexing in place.

Over time, collections can grow quickly and accumulate documents nobody reads anymore. These documents are great candidates for purging. They can be the result of an application bug, data that is no longer needed, or data from testing.

To prevent this data from creating unnecessarily large indexes, being paged in from disk, or being uncompressed in memory, remove it from your collections in advance. If a significant number of documents are removed, it’s important to run an initial sync on all members to rebuild the data files and remove any fragmentation. This applies to both WiredTiger and MMAPv1.

Unused indexes also add overhead for writes (because index entries must be written too) and for reads (when the optimizer has to evaluate them as potential plans). For more information on investigating unused indexes, read our blog about how to find unused indexes in MongoDB.
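As a sketch of what that investigation can look like, the function below filters documents shaped like the output of the `$indexStats` aggregation stage, where each document carries the index `name` and an `accesses.ops` counter. The sample data is invented. Keep in mind that these counters are tracked per member and reset when the process restarts, so a zero count is a hint and not proof.

```python
def unused_indexes(index_stats: list[dict]) -> list[str]:
    """Names of indexes with zero recorded accesses, excluding the mandatory _id index."""
    return sorted(
        stat["name"]
        for stat in index_stats
        if stat["name"] != "_id_" and stat["accesses"]["ops"] == 0
    )


# Hypothetical documents, shaped like $indexStats output for one collection.
sample_stats = [
    {"name": "_id_", "accesses": {"ops": 0}},
    {"name": "email_1", "accesses": {"ops": 18234}},
    {"name": "legacy_flag_1", "accesses": {"ops": 0}},
]
candidates = unused_indexes(sample_stats)
```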

Balancing

We’ve covered balancing a bit already, but it’s important to reiterate some points:

  1. Good shard keys prevent unneeded balancing
  2. Allow adequate time for balancing existing and newly sharded collections
  3. Set a window to prevent balancing during peak times
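A balancing window is defined by a start and a stop time in HH:MM form. The sketch below checks whether a given time falls inside such a window, including windows that cross midnight. It is a local helper for reasoning about a schedule, not a call into MongoDB, and the window values are examples.

```python
from datetime import time


def in_balancing_window(now: time, start: str, stop: str) -> bool:
    """True if `now` falls inside the window from start (inclusive) to stop (exclusive)."""
    window_start = time.fromisoformat(start)
    window_stop = time.fromisoformat(stop)
    if window_start <= window_stop:
        return window_start <= now < window_stop
    # The window wraps past midnight, for example 23:00 to 06:00.
    return now >= window_start or now < window_stop


# Example: an overnight window that avoids daytime peak traffic.
overnight = in_balancing_window(time(2, 0), "23:00", "06:00")
midday = in_balancing_window(time(12, 0), "23:00", "06:00")
```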

Backups

Depending on the specifics of your implementation, scheduling backups can be very important. For backups that rely on mongodump, the data will be fetched from disk, uncompressed, and cached in the WiredTiger cache. Fetching the entire collection into cache will cause your hot set to be evicted from cache and may cause unexpected cache utilization. This forces WiredTiger to spend time evicting pages instead of processing operations. When possible, use snapshots, secondaries, or even hidden secondaries to minimize the potential impact of mongodump.

Queue and Caching Systems

Lastly, implementing a caching system and queue system can help manage a usage spike. Each method serves a different purpose during a spike in traffic.

A caching system prevents unnecessary and redundant calls to the database. Caching helps avoid degradation in performance because caching systems often outperform databases for static data.
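A minimal in-process version of this idea is a cache with a time-to-live, sketched below. Production systems usually put a dedicated cache service in front of the database; the class name, the TTL value, and the `loader` callback standing in for a database query are all illustrative.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Cache loaded values for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[str, tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value, calling `loader` only on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()  # stands in for the database query
        self._entries[key] = (now + self._ttl, value)
        return value
```

Repeated reads of the same static document within the TTL are served from memory, so only the first read reaches the database.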

A queue system allows you to throttle the rate of requests against the database. The application can keep accepting requests as expected, while the queue ensures the database is not overwhelmed by them.
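The throttling idea can be sketched with a simple in-memory queue that releases a bounded number of requests per tick. A real deployment would typically use a dedicated message queue; the class, the tick model, and the `handler` callback standing in for a database write are assumptions made for illustration.

```python
from collections import deque
from typing import Any, Callable


class ThrottledQueue:
    """Accept requests immediately, but release them to the database at a capped rate."""

    def __init__(self, max_per_tick: int):
        self._max_per_tick = max_per_tick
        self._pending: deque = deque()

    def submit(self, request: Any) -> None:
        """Enqueue a request; the caller is never blocked by database load."""
        self._pending.append(request)

    def drain_tick(self, handler: Callable[[Any], None]) -> int:
        """Process at most max_per_tick queued requests in FIFO order; return the count."""
        count = min(self._max_per_tick, len(self._pending))
        for _ in range(count):
            handler(self._pending.popleft())
        return count

    def backlog(self) -> int:
        """Number of requests still waiting."""
        return len(self._pending)
```

Calling `drain_tick` on a fixed schedule caps the request rate the database sees, while a growing `backlog()` signals that the spike is outpacing the configured rate.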

The Best Way to Get Ahead of Usage Spikes? Database as a Service

Want one surefire way to get ahead of database usage spikes? Try a Database as a Service vendor, like ObjectRocket. We take care of your databases so you can focus on development and your business. From schema design to query optimization, our expertise is always included. Give ObjectRocket a try today.