Black Friday, Cyber Monday, and even Green Monday (the last Monday to save some “green” with optimum shipping dates for Christmas) are big topics of discussion at this time of year. Most of the public conversation is about hunting for great deals, shopping for coveted holiday gifts, and the jaw-dropping amount of money that consumers spend.
Sales reports show that Black Friday 2018 online sales were $6.2 billion, an increase of 23.6 percent from the year before. Cyber Monday was the biggest sales day of 2018, with online revenues of $7.9 billion, an increase of 19.3 percent over 2017.
But another very important conversation is going on in parallel: the massive technology and data management effort required to keep online shopping flowing smoothly, credit card numbers secure, and both online and on-site sales processes glitch-free. Most shoppers probably never stop to think about the amount or complexity of the technology that enables their shopping, or how prices are set and purchases processed. For those of us whose livelihood depends on the data technology behind the shopping experience, however, this is the most critical planning and implementation time of the entire year.
So if you are a merchant, support a merchant, or are involved in any way with e-commerce or finance, you’re probably really busy right now and keeping your cell phone nearby at all times. Now is the time to prepare your databases, their supporting infrastructure, and your applications for the heavy loads they might endure during the shopping season.
This post covers the major areas of your data technology stack to consider as Black Friday approaches, pointers on how to prepare, and how your ObjectRocket team can work with you to get ready for Black Friday and Cyber Monday.
Common Steps To Prepare Your Data Stack
There are several steps you should take to prepare, no matter which databases you use. They are standard best practices for keeping your data stores running smoothly, but they take on special urgency before Black Friday and Cyber Monday.
- Prepare for large amounts of traffic: capacity planning, growth projection, data stability, high availability, replication, and failover clusters are the primary concepts for withstanding increased traffic loads. Consider these items when preparing:
- For sharded clusters, consider adding shards so that you do not run out of disk space during peak periods. Here at ObjectRocket, for customers who anticipate a lot of growth, we typically aim to keep disk usage at around 60% to prevent outages caused by a full disk.
- With Elasticsearch 6.x and later, and especially on smaller plan sizes, account for the possibility of rapid growth and keep disk usage under 95% (the default flood-stage disk watermark) to avoid a read-only lock. Keep in mind when capacity planning that each shard has a translog that can grow to 512MB and stay that size for up to 12 hours.
- For a small, simple replica set that does not use auto-scaling, consider moving to a plan size that can handle the concurrency and growth in your growth projection.
- As traffic increases, so do connection requests to the database. Review connection pooling settings with your application team members or customers to make sure the correct settings are in place, along with the correct OS settings (such as ulimits, which let you display and set resource limits for users). Sometimes applications need more connections than a single replica set, or the 4 default mongos that we offer (mongos are the special nodes that listen for connections on a sharded cluster), can handle. In that case, we often work with customers to allocate more mongos to handle the connections.
- Make sure the oplog is sized correctly. The oplog is a capped (fixed-size) collection that records every operation that modifies data (inserts, updates, and deletes), and secondary nodes replicate data by reading the oplog of their sync source. If that oplog is too small, secondaries that fall behind, especially during peak periods, can go out of sync because the operations they still need have already been overwritten. The bigger the oplog, the longer a secondary can lag behind the primary, or even be down, and still catch up.
- Test, tweak, and test some more: the more you test your databases, the better you know them. This is the time to run stress and performance tests and apply any resulting changes, well before your customers start shopping.
- Look at any past hiccups, data failures, and areas of weakness, and determine whether anything needs beefing up (but don’t apply big fixes right before Black Friday).
- Perform load and performance tests early in the preparation period, but also year-round, and build these tests into your continuous integration process.
- Simulate traffic patterns and length of load in a test environment, if possible.
- Leverage analytics: if you’ve been watching traffic flows and other patterns over the past several months, you should be familiar with what’s “normal”.
- Look for any performance or input/output (I/O) issues that seem suboptimal and that you can resolve before the big day.
- Study your traffic patterns: find the maximum number of connections at the busiest time, then plan for 20% more capacity.
- For Elasticsearch, you can use third-party metrics collectors such as New Relic, which report CPU usage, garbage collection (GC) time, and query time. These stats will help you plan for Black Friday week.
- Be sure to hold off on major repairs or upgrades too close to the big days! And, be sure to communicate this moratorium across your team.
- Review your security measures again. Know and understand them, and be prepared to explain them confidently to others should the need arise. At minimum, make sure you’re not a “soft target” for a data breach.
- Privacy regulations, like security, are increasingly important. Consider how GDPR requirements affect your sales campaigns (and related cookies) and how you use and store customers’ data.
- Continuous improvement with an eye for the future: It’s a great idea to hold retrospective meetings (retros) after every Black Friday/Cyber Monday to identify areas that could be improved on next time!
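The sizing rules of thumb above (about 60% disk usage on sharded clusters, staying under Elasticsearch's 95% watermark with up to 512MB of translog per shard, an oplog long enough to ride out a peak, and 20% connection headroom) boil down to a few lines of arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, the translog allowance simply reuses the 512MB-per-shard figure from the text, and every input should come from your own metrics. The Elasticsearch check is done per node, because the flood-stage watermark is evaluated against each node's disk.

```python
import math


def shards_needed(projected_data_gb: float, disk_per_shard_gb: float,
                  target_usage: float = 0.60) -> int:
    """Fewest shards that keep each shard's disk at or below target_usage."""
    return math.ceil(projected_data_gb / (disk_per_shard_gb * target_usage))


def es_node_disk_usage(node_data_gib: float, shards_on_node: int,
                       node_disk_gib: float,
                       translog_gib_per_shard: float = 0.5) -> float:
    """Worst-case disk fraction on one Elasticsearch node.

    Assumes every shard on the node holds a full 512MB (0.5 GiB) translog;
    compare the result against the 0.95 flood-stage threshold.
    """
    used = node_data_gib + shards_on_node * translog_gib_per_shard
    return used / node_disk_gib


def oplog_window_hours(oplog_size_gb: float, peak_oplog_gb_per_hour: float) -> float:
    """Roughly how long a secondary can lag (or be down) at peak write volume."""
    return oplog_size_gb / peak_oplog_gb_per_hour


def connection_target(peak_connections: int, headroom_pct: int = 20) -> int:
    """Peak observed connections plus headroom, rounded up using integer math."""
    return (peak_connections * (100 + headroom_pct) + 99) // 100


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(shards_needed(1000, 500))                      # → 4
    print(round(es_node_disk_usage(800, 40, 1000), 2))   # → 0.82
    print(oplog_window_hours(50, 5))                     # → 10.0
    print(connection_target(1000))                       # → 1200
```

For the oplog input, the mongo shell's rs.printReplicationInfo() reports the configured oplog size and the time span it currently covers, which makes a useful sanity check against the estimated window.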
Here are some Pro Tips for three of the databases that ObjectRocket supports. Maybe you even run all three (polyglot persistence)!
MongoDB Pro Tips
MongoDB is an increasingly popular choice among the NoSQL databases for e-commerce, with speed and scalability being the primary benefits. Many merchants use the MongoDB Product Catalog to manage their product offerings, pricing, shipping, and inventory. Pro tips from our MongoDB experts include:
- For replica sets, include all three replica set hostnames in the connection string; for sharded clusters, include all 4 mongos.
- Ensure that your applications use a connection pool to avoid opening too many unnecessary connections to the database.
- Consider how much your traffic and read/writes might increase (your growth projection) and take advantage of MongoDB’s auto-scaling feature.
- Stay on top of performance tuning (memory usage, connections handling, health of the replica sets) both before and during the big shopping days.
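The first two tips come together in the connection string. The Python sketch below builds a MongoDB URI that lists every replica set member (or every mongos) and caps the driver's pool with the standard maxPoolSize option. The function name, hostnames, and credentials are hypothetical; percent-encoding the username and password with quote_plus follows PyMongo's advice for credentials that contain characters such as @ or :.

```python
from typing import Optional, Sequence
from urllib.parse import quote_plus


def mongodb_uri(hosts: Sequence[str], username: str, password: str,
                database: str = "admin", replica_set: Optional[str] = None,
                max_pool_size: int = 100) -> str:
    """Build a connection string that lists every host (hypothetical helper).

    hosts: every replica set member (replica set) or every mongos router
    (sharded cluster), each as "hostname:port".
    """
    if not hosts:
        raise ValueError("list at least one host")
    # Percent-encode credentials so characters such as @ or : don't break the URI.
    creds = f"{quote_plus(username)}:{quote_plus(password)}@"
    options = []
    if replica_set:
        # replicaSet applies to replica sets only; omit it when connecting to mongos.
        options.append(f"replicaSet={quote_plus(replica_set)}")
    # Cap each client's connection pool (100 is also PyMongo's default).
    options.append(f"maxPoolSize={max_pool_size}")
    return f"mongodb://{creds}{','.join(hosts)}/{database}?{'&'.join(options)}"
```

With PyMongo installed, pass the result to pymongo.MongoClient once per process and reuse that client: it is thread-safe and manages the connection pool itself, so creating a new client per request defeats pooling.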
Elasticsearch Pro Tips
We collected some Pro tips for preparing databases for Black Friday and Cyber Monday from our team of Elasticsearch experts. Elasticsearch is often a critical part of the e-commerce technology stack, whether it is used primarily for visualization and analysis of streaming log data or for the critical online-shopping functionality of search (with auto-completion, faceted search, and synonyms).
- Two days before Black Friday, increase the frequency of snapshots. We suggest taking snapshots two to three times per day throughout the shopping weekend, then returning to the normal backup schedule after Cyber Monday. Schedule these snapshots before peak traffic, again after peak traffic, and a third at a time of your choosing. Because Elasticsearch snapshots are incremental (each one copies only data not already in the repository), taking three per day keeps each snapshot quite short.
- If you use Elasticsearch primarily for visualizing log files:
- Plan for increased data consumption during the event.
- Ensure that Kibana dashboards are up-to-date with the appropriate settings to analyze the most critical logs and events.
- Know what your baselines are in your aggregations/dashboards so that you can identify spikes and issues. Some use cases can benefit from Elastalert integration to alert on different events.
- If you use Elasticsearch primarily for Search functionality:
- Since these will more than likely be read-heavy days, you want to:
- Increase the node count to gain more search threads across the cluster
- Increase replica count to give you more places to read from
- Use Rally, Elastic’s benchmarking tool, to test whether your current Elasticsearch cluster can handle the planned search throughput.
- Check Elasticsearch logs for any additional errors that may be thrown.
- Ensure that all applications are using all 4 connection endpoints.
- Look at current slow queries, and if you want help analyzing or improving queries, reach out to your ObjectRocket team.
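For read-heavy search traffic, adding replicas is a dynamic index-settings change made through the PUT /<index>/_settings API. The Python sketch below only builds that request with the standard library; it does not send it. The endpoint URL, index name, and helper name are hypothetical. The guard reflects that Elasticsearch never allocates two copies of the same shard to one node, so N replicas need at least N + 1 data nodes to be fully assigned.

```python
import json
import urllib.request


def replica_update_request(endpoint: str, index: str, replicas: int,
                           data_nodes: int) -> urllib.request.Request:
    """Build (but do not send) a request that sets number_of_replicas on an index."""
    # Two copies of the same shard never share a node, so N replicas
    # need at least N + 1 data nodes to be fully assigned.
    if replicas > data_nodes - 1:
        raise ValueError(
            f"{replicas} replicas need at least {replicas + 1} data nodes, "
            f"but only {data_nodes} are available")
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        f"{endpoint.rstrip('/')}/{index}/_settings",  # PUT /<index>/_settings
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

To apply it, send the request with urllib.request.urlopen (adding whatever authentication your cluster requires) to any of your connection endpoints. Adding replicas copies shard data across the cluster and costs disk and network while it runs, so make the change well before peak traffic rather than during it.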
Redis Pro Tips
Redis is commonly used for highly ephemeral data, which of course means caching: query, session, and full-page caches. During the heavy traffic of Black Friday and Cyber Monday, caching is key to keeping applications performing at their best. Our Redis experts suggest the following tips:
- If you’re using Redis as a session store or cache, consider resizing upwards if you think you might hit the memory or bandwidth limit with increased Black Friday traffic.
- Confirm that you’re setting a TTL (“time to live” determines the key’s expiration time) on any keys that can and should automatically expire.
- Confirm that maxmemory-policy is set appropriately, since it determines what Redis does when it runs out of memory. Contact ObjectRocket support if you need this setting modified on your instance.
- For production instances, make sure you have a healthy memory fragmentation ratio prior to any spikes in traffic.
- If you use Redis as the backend for a job queue management framework, we highly recommend having a dashboard for that framework and checking it to monitor job status. Examples of such dashboards include resque-web, Arena, and the Sidekiq web UI.
- As tempting as it may be, do NOT run FLUSHDB, FLUSHALL, or other large deletion operations immediately before a campaign; instead, write a SCAN and DEL script.
- Check the INFO commandstats output and the SLOWLOG for current entries, and look for patterns that can be optimized so you aren’t running into blocking operations. Avoid KEYS * and other long-running operations.
- On a development instance (not production!), run a stress test while watching output from the MONITOR command to analyze patterns.
- On a development instance with sample data, use rdb-tools, redis-cli --bigkeys, and redis-cli --memkeys to analyze your data distribution and memory usage.
- Set up an agent that feeds monitoring data from your instances into your existing stats platform, for example Datadog, New Relic, or Prometheus.
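A SCAN-and-DEL cleanup, recommended above in place of FLUSHDB or FLUSHALL, can be sketched in Python. delete_matching is a hypothetical helper that expects a client exposing scan_iter() and delete(), the names those methods have in redis-py. The FakeRedis class is an in-memory stand-in included only so the sketch runs without a server; its glob matching uses fnmatch, which approximates Redis's MATCH syntax.

```python
import fnmatch


def delete_matching(client, pattern: str, batch_size: int = 500) -> int:
    """Delete keys matching pattern in bounded batches; return how many were removed."""
    deleted = 0
    batch = []
    # SCAN walks the keyspace incrementally, so no single call blocks the
    # server the way KEYS or FLUSHALL on a large dataset can.
    for key in client.scan_iter(match=pattern, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            deleted += client.delete(*batch)  # small DELs keep each command short
            batch.clear()
    if batch:
        deleted += client.delete(*batch)
    return deleted


class FakeRedis:
    """Hypothetical in-memory stand-in for demonstration; not part of redis-py."""

    def __init__(self, keys):
        self.store = dict.fromkeys(keys, b"")

    def scan_iter(self, match=None, count=None):
        # Iterate over a snapshot so deletions during the scan are safe.
        for key in list(self.store):
            if match is None or fnmatch.fnmatchcase(key, match):
                yield key

    def delete(self, *names):
        return sum(self.store.pop(name, None) is not None for name in names)
```

Against a real instance you would pass a redis.Redis(...) client instead of FakeRedis, and try the script on a development instance first.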
What ObjectRocket Does to Partner with You
For many customers, the holiday season represents a large percentage of their yearly business volume. It is during this period that stakes are highest and uptime is most critical. We do a lot of work to help our customers prepare for these seasonal traffic events.
Here are just a few examples of what your ObjectRocket team does:
- Slow query analysis: We can provide advice for queries that are taking longer than optimal (for example, in Elasticsearch any query that takes longer than 1 second is suboptimal).
- Slow query logging needs to be enabled on the cluster (our customers can request this). Once we have a day or two of data, we look at the queries that took longer than the defined threshold and run them through the query profiler, which identifies the most expensive part of each query. We then consult with the customer and advise on how to reduce query time.
- Performance Tuning: We also spend time tuning unoptimized operations for our customers, especially:
- Bad regex queries can slow your database to a crawl. Reference our blog post on some of the optimizations that can be made to regex queries.
- Queries requiring collection scans (that is, queries that do not use an index). These operations can be quite expensive because they often scan many documents to return only a few.
- Operations using non-performant operators, which often examine many documents and index keys. Where possible, avoid operators such as $ne, $nin, and $exists: true.
- Indexing: Having the proper indexes in place is often the most critical task, and it’s how most of our customers have survived Black Friday. We can profile the instance to make sure that the proper indexes are in place for most operations, and if additional indexes are needed, we work with you to create them.
- Load Testing & Scaling: Upon request, we can coordinate load testing to simulate expected increases in traffic before they happen. This is imperative to ensure your application’s performance remains optimal during anticipated growth and traffic surges.
- Maintenance Moratorium: We’ve come to appreciate how heavily our customers depend on us during this time, so we have implemented a Maintenance Moratorium, a soft freeze on infrastructure maintenance work from early November into early January. During this time, only emergency maintenance that has gone through a rigorous approval process is allowed.
- 24x7x365 Support: The ObjectRocket Support team is staffed around the clock to meet your needs. Our DBAs and data technology experts are seasoned veterans of the Black Friday and Cyber Monday madness; they are here for you before, during, and after this critical period!
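Several of the patterns flagged during performance tuning ($ne, $nin, $exists: true, and regexes that are not case-sensitive prefix expressions) can be caught before a query reaches production. The Python sketch below walks a MongoDB filter document, written as a plain dict (the shape PyMongo accepts), and lists those patterns. lint_filter and its messages are hypothetical, and the prefix check is a simplification of MongoDB's rule that only case-sensitive regexes anchored with ^ (or \A) and followed by literal characters can use an index efficiently.

```python
import re
from typing import List

# Operators the tips above flag as tending to examine many documents and index keys.
SLOW_OPERATORS = {"$ne", "$nin"}

# Simplified prefix-expression test: ^ (or \A) followed by literal characters.
PREFIX_REGEX = re.compile(r"(\^|\\A)[A-Za-z0-9_\- ]+")


def is_index_friendly_regex(pattern: str, options: str = "") -> bool:
    """True for case-sensitive, anchored prefix regexes such as ^SKU-1."""
    if "i" in options:  # case-insensitive regexes cannot use an index efficiently
        return False
    return PREFIX_REGEX.match(pattern) is not None


def lint_filter(flt: dict, path: str = "") -> List[str]:
    """Return warnings for filter patterns that tend to defeat indexes (hypothetical)."""
    warnings = []
    for key, value in flt.items():
        where = f"{path}.{key}" if path else key
        if key in SLOW_OPERATORS:
            warnings.append(f"{where}: {key} usually examines many index keys")
        elif key == "$exists" and value is True:
            warnings.append(f"{where}: $exists: true can examine many documents")
        elif (key == "$regex" and isinstance(value, str)
              and not is_index_friendly_regex(value, flt.get("$options", ""))):
            warnings.append(f"{where}: regex is not a case-sensitive prefix")
        # Recurse into nested operator documents and $and/$or/$nor arrays.
        if isinstance(value, dict):
            warnings.extend(lint_filter(value, where))
        elif isinstance(value, list):
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    warnings.extend(lint_filter(item, f"{where}[{i}]"))
    return warnings
```

A clean lint result does not prove a query is fast; the query’s explain() output (look for COLLSCAN stages) remains the real check, and indexes still have to match the fields being filtered and sorted.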
As always, the best way to reach ObjectRocket Support is by emailing us at firstname.lastname@example.org. Let us know how we can help, and Happy Black Friday!