There are lots of definitions for scaling. Scaling could be defined as removing the scales from a fish. However, with databases, scaling refers to having the ability to expand to meet additional needs around storage/disk, RAM/memory, CPUs/compute cycles, networking, or other resources.
How do you know when it’s time to scale?
Today, it’s common to see rapid growth of data and rapid adoption rates for applications, for example when an app like Pokémon Go goes viral and usage takes off. When this happens, you’ll outgrow your initial environment pretty quickly. That growth can show up as rising physical data storage needs, or as performance hits and degradation that require more resources: think CPU, RAM, networking, or a combination of all of those areas.
You can either plan proactively for growth, or you may choose to scale once you start seeing small performance hits.
When you proactively plan to scale, there are at least two general patterns:
- You know that you have a big marketing push coming up where you think that you will be adding a significant number of customers and/or amount of data.
- Your application or business tends to be cyclical in nature (e.g., Christmas shopping, New Year’s resolutions), with periods of heavy activity where you will want to capture and keep a high volume of data.
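To plan proactively, it helps to put rough numbers on expected growth. The minimal Python sketch below assumes steady compound monthly growth plus an optional one-off surge for a planned event such as a marketing push or a holiday season. The growth rates, the disk size, and the headroom limit used in the example are illustrative assumptions, not recommendations.

```python
from typing import Dict, List, Optional


def project_storage(current_gb: float, monthly_growth: float, months: int,
                    surges: Optional[Dict[int, float]] = None) -> List[float]:
    """Project data size month by month.

    monthly_growth: baseline compound growth rate, e.g. 0.05 for 5% per month.
    surges: optional {month_index: extra_growth} for planned events
            (month_index starts at 1), e.g. {3: 0.40} for +40% in month 3.
    """
    surges = surges or {}
    sizes = []
    size = current_gb
    for month in range(1, months + 1):
        # Baseline growth plus any extra growth expected for this month.
        size *= 1 + monthly_growth + surges.get(month, 0.0)
        sizes.append(size)
    return sizes


def first_month_over(sizes: List[float], limit_gb: float) -> Optional[int]:
    """Return the first month (1-based) whose projected size exceeds limit_gb."""
    for month, size in enumerate(sizes, start=1):
        if size > limit_gb:
            return month
    return None
```

For example, with 500 GB today, 5% monthly growth, and a hypothetical 800 GB headroom limit (80% of a 1,000 GB disk), steady growth crosses the limit in month 10, but a planned 40% surge in month 3 pulls that forward to month 4. That gap is the window you have for scaling before the event rather than during it.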
Are you seeing small warning signs? When you start hitting bottlenecks and expect growth to continue, you should already be thinking about scaling.
Trouble signs to watch for include:
- increased query times for end users
- increased login times
- requests and servers freezing up
- the dreaded “database slow” cries from developers
- slower server response times
- increased load on hosts
- out-of-memory errors
- unintended elections
- errors in the logs
When you start seeing these signs, it’s time to start scaling so you can keep up with demand and make sure you aren’t losing customers.
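Many of these signs can be watched automatically. The Python sketch below flags metrics that stay above a threshold across several consecutive samples, so a single spike does not trigger a scaling decision. The metric names and threshold values are hypothetical placeholders; in practice you would feed it from whatever monitoring data you already collect and tune the limits to your own baseline.

```python
from typing import Dict, List

# Hypothetical thresholds, one per trouble sign; tune to your own baseline.
THRESHOLDS: Dict[str, float] = {
    "p95_query_ms": 200.0,       # increased query times for end users
    "memory_used_pct": 90.0,     # risk of out-of-memory errors
    "cpu_load_pct": 80.0,        # increased load on hosts
    "unplanned_elections": 0.0,  # any unintended election is worth a look
    "error_log_lines": 50.0,     # errors in the logs per sampling interval
}


def warning_signs(sample: Dict[str, float],
                  thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return the names of metrics in one sample that exceed their threshold."""
    return sorted(name for name, limit in thresholds.items()
                  if sample.get(name, 0.0) > limit)


def sustained_signs(samples: List[Dict[str, float]],
                    min_consecutive: int = 3) -> List[str]:
    """Return metrics over threshold in each of the last `min_consecutive`
    samples, filtering out one-off spikes."""
    recent = samples[-min_consecutive:]
    if len(recent) < min_consecutive:
        return []  # not enough history to call it a trend yet
    flagged = [set(warning_signs(s)) for s in recent]
    return sorted(set.intersection(*flagged))
```

Requiring several consecutive breaches is a simple design choice that trades a little reaction time for fewer false alarms; a real setup might instead compare against a rolling baseline.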
There are two ways to scale: Up (Vertically) or Out (Horizontally).
Scaling up is the proverbial Big Iron method: one big machine with lots of resources (more CPU cores, higher CPU speeds, lots of RAM and storage).
The main benefits of vertical scaling include reduced architectural complexity and fewer hosts to maintain. This is helpful if you don’t have anyone that can handle the maintenance for you.
Today, there are many ways to scale up vertically: better commodity hardware, cheaper and better disks and storage, cheaper memory, better software, and better networking, so you can more gracefully handle failovers and interruptions.
Scaling up works well for many applications and needs, and for those we would recommend the Replica Sets discussed below. Keep in mind, though, that scaling vertically can carry hidden costs. If your environment continues to grow rapidly, you may have to keep moving to larger and larger machines, or keep adding resources to your hosts, until that is no longer an option. Upgrade cycles are also less efficient on a single larger host than in a horizontally scaled environment. With continued growth, you will have to decide whether to keep scaling up or whether you would benefit from scaling horizontally.
Sharding is horizontal scaling. Sharding stores data across multiple nodes, distributing the load and the processes across the hosts. Each shard is itself deployed as a replica set (a primary plus secondaries), and you can add shards as needed.
The Balancer migrates ‘chunks’ of data between shards to keep them evenly distributed.
This increases read and write capacity by distributing read and write operations across a group of machines, instead of hammering one machine with writes or with reads. Luckily, there have been great improvements in balancer function over the last few releases.
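To make the Balancer’s job concrete, here is a toy Python model: it repeatedly moves one chunk from the shard holding the most chunks to the shard holding the fewest until the difference falls below a threshold. This is a simplified, count-based illustration under our own assumptions, not MongoDB’s actual balancer algorithm, whose thresholds and criteria have changed across releases.

```python
from typing import Dict, List, Tuple


def balance(chunks: Dict[str, int], threshold: int = 2) -> List[Tuple[str, str]]:
    """Simulate chunk migrations between shards (toy model).

    chunks: {shard_name: number_of_chunks}; modified in place.
    threshold: stop once (max - min) chunk count is below this value.
               Must be at least 2, otherwise an odd remainder would make
               a single chunk bounce between shards forever.
    Returns the list of (from_shard, to_shard) migrations performed.
    """
    if threshold < 2:
        raise ValueError("threshold must be >= 2")
    migrations: List[Tuple[str, str]] = []
    if not chunks:
        return migrations
    while True:
        donor = max(chunks, key=chunks.get)      # most-loaded shard
        recipient = min(chunks, key=chunks.get)  # least-loaded shard
        if chunks[donor] - chunks[recipient] < threshold:
            return migrations
        # Migrate one chunk; each move strictly narrows the imbalance.
        chunks[donor] -= 1
        chunks[recipient] += 1
        migrations.append((donor, recipient))
```

For example, adding an empty third shard to a cluster with 10 chunks on each of two shards (`{"shard0": 10, "shard1": 10, "shard2": 0}`) triggers six migrations, all onto `shard2`, ending at 7/7/6. The shard names are hypothetical.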
Scaling horizontally takes advantage of MongoDB’s built-in sharding capability and also benefits from the ability to use cheaper commodity hardware.
When you scale horizontally, you add additional resources with physical or virtual hosts.
- Physical – lots of lower cost commodity hardware
- Virtual – add additional CPU cores or nodes via VMs or cloud
- Networking – add load balancers, additional mongos processes, etc.
You can also use improved load-balancing technologies (hardware and software) to route traffic where it needs to go.
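As a rough illustration of spreading client traffic across several mongos routers, the Python sketch below implements a round-robin selector that skips hosts marked unhealthy. The hostnames are hypothetical, and in practice MongoDB drivers already select among multiple mongos hosts listed in the connection string, so this only illustrates the idea rather than something you would need to build.

```python
from itertools import cycle
from typing import Iterable, Set


class RoundRobinRouter:
    """Toy load balancer that spreads requests across mongos routers,
    skipping any host currently marked unhealthy."""

    def __init__(self, hosts: Iterable[str]):
        self.hosts = list(hosts)
        if not self.hosts:
            raise ValueError("need at least one host")
        self.unhealthy: Set[str] = set()
        self._ring = cycle(self.hosts)

    def mark_down(self, host: str) -> None:
        """Stop sending traffic to `host` (e.g. after a failed health check)."""
        self.unhealthy.add(host)

    def mark_up(self, host: str) -> None:
        """Return `host` to rotation."""
        self.unhealthy.discard(host)

    def next_host(self) -> str:
        """Return the next healthy host, trying each host at most once."""
        for _ in range(len(self.hosts)):
            host = next(self._ring)
            if host not in self.unhealthy:
                return host
        raise RuntimeError("no healthy mongos routers available")
```

Adding another mongos to the list is all it takes to add routing capacity in this model, which mirrors why extra mongos processes appear under the networking bullet above.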
How is Replica Set Scaling Implemented in MongoDB?
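MongoDB can also scale out via a single larger Replica Set, typically one Primary and two Secondaries. Members exchange heartbeats to track each other’s up/down state, and the Secondaries replicate from the Primary by applying operations from its oplog. Reads can be spread across the Secondaries, but all writes go to the Primary.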
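The one-Primary, two-Secondary layout is common because electing a Primary requires a strict majority of voting members. The Python sketch below shows that arithmetic, plus a simplified reachability check that treats a member as down once its last heartbeat is older than a timeout (we assume the 10-second default). It is an illustration of the majority rule, not MongoDB’s election code.

```python
from typing import Dict, List

HEARTBEAT_TIMEOUT_S = 10.0  # assumed default heartbeat timeout, in seconds


def votes_needed(voting_members: int) -> int:
    """A strict majority of voting members is needed to elect a Primary."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """How many voting members can be lost while a Primary can still be elected."""
    return voting_members - votes_needed(voting_members)


def reachable_members(last_heartbeat_age_s: Dict[str, float],
                      timeout_s: float = HEARTBEAT_TIMEOUT_S) -> List[str]:
    """Members whose last heartbeat is recent enough to count as up (simplified)."""
    return sorted(m for m, age in last_heartbeat_age_s.items() if age <= timeout_s)


def can_elect_primary(voting_members: int, reachable: int) -> bool:
    """True if enough voting members are reachable to form a majority."""
    return reachable >= votes_needed(voting_members)
```

With three members, two votes are needed, so the set survives losing any one member. Adding a fourth voting member does not raise fault tolerance (it is still 1); five members tolerate two failures. This is one reason "larger" replica sets usually grow in odd numbers of voting members.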
Horizontal Scaling: Replica Set or Sharding?
The trade-off with sharding is some increase in overall complexity. But sharding also simplifies maintenance by allowing rolling upgrades and by letting you perform certain operations, such as index builds, in parallel across your shards/nodes.
Here are some comparisons between using larger Replica Sets vs. Sharded Clusters:
| Larger Replica Sets | Sharded Clusters |
|---|---|
| Lots of reads across a wide data set (you don’t want scatter-gather queries) | Lots of writes/updates (you want to go directly to the exact shards for results) |
| Lots of data, lower activity rates | Lots of data, lots of activity |
| Need more “normal” resources, e.g. just disks | Need more of all resources: disks, RAM, CPU, write scopes |
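The difference between going “directly to exact shards” and a scatter-gather query comes down to whether the query includes the shard key. The toy Python router below models a collection range-sharded on a single numeric key. The split points and shard names are hypothetical, and real mongos routing handles much more (compound and hashed shard keys, range predicates, chunk metadata refreshes).

```python
from bisect import bisect_right
from typing import List, Optional


class RangeRouter:
    """Toy router for a collection range-sharded on one numeric key.

    split_points: sorted chunk boundaries. Chunk i covers
    [split_points[i-1], split_points[i]), with open ends at the extremes
    (lower bound inclusive, upper bound exclusive).
    chunk_owner: shard that owns each chunk (len(split_points) + 1 entries).
    """

    def __init__(self, split_points: List[int], chunk_owner: List[str]):
        if len(chunk_owner) != len(split_points) + 1:
            raise ValueError("need exactly one owner per chunk")
        self.split_points = sorted(split_points)
        self.chunk_owner = chunk_owner

    def shards_for(self, key: Optional[int]) -> List[str]:
        """Shards a query must visit: one if the shard key is given, else all."""
        if key is None:
            # No shard key in the query: broadcast to every shard (scatter-gather).
            return sorted(set(self.chunk_owner))
        # Shard key present: locate the single chunk, hence a single shard.
        return [self.chunk_owner[bisect_right(self.split_points, key)]]
```

With split points `[100, 200]` owned by `shardA`, `shardB`, and `shardC`, a query for key `150` touches only `shardB`, while a query without the key must visit all three shards. That is why write-heavy workloads that always supply the shard key fit the right-hand column of the table.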
Why Managed MongoDB from ObjectRocket?
The ObjectRocket Difference, in a word: expertise.
We have been managing MongoDB at scale from the get-go. We have supported larger replica sets, and we were one of the first providers to support larger sharded MongoDB clusters. Our engineers and DBAs have the experience, and they have run into many problems that other providers never get the chance to see.
Some of our largest customers in the marketing technology vertical (mobile analytics, media and email campaigns, mobile advertising fraud detection, and digital media) often hit bugs that no one else will see or know how to fix. Billions of messages, thousands of campaigns, and billions of documents from a variety of customers, large and small, are all hosted on our platform.
And everything is included in the price that we provide:
- The best hands-on support, hands-down. The right response 24×7.
Our aim is to provide cloud solutions with fully robust setups. We do not provision your secondaries on virtual volumes, which avoids the performance issues that can arise if an election occurs and your PRIMARY ends up on a less robust host that was provisioned only for secondaries.
That’s it for scaling. Tune in to a future blog post that will cover sharding in more detail, including tips on selecting the best shard keys and more.