Scaling with MongoDB: Setting up a sharding infrastructure

In a recent blog post, I discussed when you need to scale MongoDB. In this post, the focus is on how to scale MongoDB.

MongoDB version 3.0 introduced WiredTiger as the default storage engine. MongoDB has since been able to provide two approaches when it comes to scalability. Mongo is now able to expand vertically as well as horizontally. Both approaches warrant a closer look.

Scaling Vertically

Scaling vertically increases computing resources, such as the number and type of CPUs, or the amount of RAM or disk space. With this in mind, when scaling vertically with WiredTiger, you need to identify which resource is contributing to any bottleneck (whether it be CPU/RAM/ disk space or a some combination thereof).

Note on WiredTiger RAM
If the allocated RAM is less than 1GB, WiredTiger typically defaults to either 256MB or 50 percent of the available RAM. As a result, the WiredTiger cache must be resized if more RAM is allocated.

Scaling Horizontally

Scaling horizontally distributes computing resources across a network, typically by adding servers. To distribute data across multiple nodes as they’re added, MongoDB utilizes a database architecture called sharding, which uses key ranges to partition data that is distributed among multiple database instances.

Setting Up the Sharding Infrastructure

The physical components involved in setting up a sharding configuration requires the following items.

  • Mongos functions as the query router.
  • Config servers hold the sharding metadata.
  • Data nodes hold the actual data.

For more information about the physical components, visit our MongoDB Overview.

Defining Shard Keys

After you set up the physical sharding infrastructure, focus on the logical aspect of sharding. Shard keys represent the various fields within a collection that MongoDB uses to partition data. Conveniently, Mongo lets you define these keys.

To identify the fields that involve implementing shard keys:

  1. Identify the collections to shard. Collections that are larger than 200MB and whose data can be distributed evenly represent good candidates for sharding.
  2. Generate or devise an appropriate shard key. When creating a shard key, consider the following recommendations and questions:
  • Gain some insight into the manner in which the application interacts with the database.
  • Consider whether the application is more read- or write-heavy, or whether it’s balanced evenly.
  • What is the most important activity against the database? For example, the application might write large amounts of data to the database, but the most important activity might involve queries returning data in less than 100 ms.
  • What are the expected weekly and monthly patterns of data growth?
  • Do any pain or problem areas need to be addressed, such as slow queries?
  • Is the application busier during certain hours of the day, week, month, or year? Is it busy all the time?

After you investigate these issues, you can begin a more detailed analysis. Our next blog post will discuss how to find the right shard key.

Exit mobile version