Appboy is a relationship management platform for publishers with mobile apps. We help our customers identify their users and how they are interacting with their mobile apps, and based on behavior and demographics, we provide the means to target an audience with multi-channel messaging through push notifications, emails and a multitude of in-app messaging options, including the industry’s first news feed.
To achieve such specific targeting, we built a powerful analytics engine using MongoDB to store our data. The Appboy platform collects billions of data points each month from our varied customers, including photo sharing apps, games, text messaging apps, digital magazines and more. MongoDB is our primary data store and houses almost all of our pre-aggregated analytic data. MongoDB’s flexible data store easily keeps track of time series data across dimensions, and ObjectRocket has proven to be a great database provider as we’ve grown to track billions of data points each month.
To back into our schema, when we were building our analytics dashboard, we started thinking about the types of graphs that we wanted to display and how we could generate and display them to our customers blazingly fast. A basic example is tracking the number of app opens over time. At the most granular level, Appboy shows this data hourly, so that’s the level at which we can pre-aggregate to get the fastest results.
The simplest way we could approach this is to have a collection called “app_opens_by_hour” and have each document look something like this:
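The original listing is not reproduced here, so the following is only a minimal sketch of what such a document might look like, written as a Python dict. The field names `app_id`, `date`, `hour` and `opens` are inferred from the surrounding text, and all values are made up for illustration.

```python
from datetime import date

# Hypothetical document in the hourly collection.
# Field names are inferred from the text; values are illustrative only.
hourly_doc = {
    "app_id": "app-123",                    # made-up app identifier
    "date": date(2013, 11, 1).isoformat(),  # day the opens occurred
    "hour": 14,                             # hour of the day, 0-23
    "opens": 42,                            # counter bumped on every app open
}
```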
When an app open occurs, simply do an upsert on the {app_id, date, hour} tuple with an $inc on the opens field. With this approach, you keep the documents small, the document count low (24 documents per day per app), and store the data in a way that can be further rolled up using the aggregation framework. The increment operations will also be fast because we can expect the document to be in the working set every time we want to $inc it. However, you would probably never look at a single hour by itself, but rather in the context of a day. With that in mind, MongoDB’s data modeling principles lead us toward embedding documents and having a single document per day. The revised schema is:
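Again, the original listing is missing, so this is a sketch under assumptions rather than Appboy's actual schema. It shows one document per app per day, with a running total and an embedded sub-document of hourly counters. The names `total_opens` and `hours` are hypothetical.

```python
# Hypothetical per-day document with hourly counters embedded.
# Only hours that saw at least one open need to exist as keys.
daily_doc = {
    "app_id": "app-123",   # made-up app identifier
    "date": "2013-11-01",  # one document per app per day
    "total_opens": 67,     # running total for the whole day
    "hours": {             # hour of day -> opens in that hour
        "9": 10,
        "14": 42,
        "20": 15,
    },
}

# The daily total should always equal the sum of the hourly counters.
hourly_sum = sum(daily_doc["hours"].values())
```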
Now, when an app open occurs, you upsert the document based on the {app_id, date} tuple to $inc both the total opens and the counter for the current hour. With one document per day, pulling the distribution of app opens per day is extremely easy, and even pulling it for a month means looking at no more than 31 documents! Keeping the total opens per day on the document makes it easy to aggregate open counts across longer spans of time.
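The upsert described above can be sketched as a filter document plus a `$inc` update that uses dotted paths to reach the embedded hour counter. The helper names `build_open_upsert` and `apply_inc` and the field names are assumptions for illustration. `apply_inc` is only a tiny in-memory stand-in for what the server does, so the sketch runs without a database.

```python
from datetime import datetime, timezone

def build_open_upsert(app_id, ts):
    """Return (filter, update) for recording one app open at time ts."""
    filter_doc = {"app_id": app_id, "date": ts.strftime("%Y-%m-%d")}
    update_doc = {"$inc": {"total_opens": 1, f"hours.{ts.hour}": 1}}
    return filter_doc, update_doc

def apply_inc(doc, update):
    """In-memory imitation of $inc with dotted paths (illustration only)."""
    for path, amount in update["$inc"].items():
        *parents, leaf = path.split(".")
        target = doc
        for key in parents:
            # Create missing embedded documents, as $inc does on the server.
            target = target.setdefault(key, {})
        target[leaf] = target.get(leaf, 0) + amount
    return doc

ts = datetime(2013, 11, 1, 14, 30, tzinfo=timezone.utc)
filter_doc, update_doc = build_open_upsert("app-123", ts)

doc = dict(filter_doc)      # what an upsert would create on the first write
apply_inc(doc, update_doc)  # first open in hour 14
apply_inc(doc, update_doc)  # second open in hour 14
```

With PyMongo, the same pair would typically be passed as `collection.update_one(filter_doc, update_doc, upsert=True)`. Summing `total_opens` over the matching daily documents then gives the count for any longer span.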
This is all well and good, but what about when we want to show app opens over time based on some dimension? Or compare how iPods are doing against iPhones? Here is where we can use MongoDB’s flexible schema and embedded documents to track hourly data. Say we want to track the number of app opens by device type (iPad Air, iPhone 4, iPhone 5S, iPod Touch, etc.). Since we don’t have to declare fields ahead of time with MongoDB, our application layer can programmatically $inc fields based on the values in each dimension. Take a look at this schema, which can be easily generated dynamically:
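Since the original listing is not available, here is one way such a schema and its dynamically generated `$inc` could look. This is a sketch, not Appboy's actual layout. The `device` key, the nesting, and the helper `build_dimension_update` are all hypothetical.

```python
# Hypothetical per-day document with a per-device breakdown embedded.
doc_with_devices = {
    "app_id": "app-123",
    "date": "2013-11-01",
    "total_opens": 3,
    "hours": {"14": 3},
    "device": {
        "iPhone 5S": {"total_opens": 2, "hours": {"14": 2}},
        "iPad Air": {"total_opens": 1, "hours": {"14": 1}},
    },
}

def build_dimension_update(hour, dimensions):
    """Build a $inc update covering the totals plus every given dimension."""
    inc = {"total_opens": 1, f"hours.{hour}": 1}
    for name, value in dimensions.items():
        # Dots and a leading "$" have special meaning in MongoDB field
        # paths, so this sketch replaces dots and strips leading "$".
        safe = str(value).replace(".", "_").lstrip("$")
        inc[f"{name}.{safe}.total_opens"] = 1
        inc[f"{name}.{safe}.hours.{hour}"] = 1
    return {"$inc": inc}

update = build_dimension_update(14, {"device": "iPhone 5S"})
```

Because the field paths are derived from the incoming values, a device type that has never been seen before simply creates its own embedded counters on the first write.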
This schema is easily extensible if you want to add other dimensions. Appboy uses similar schemas in certain places in our product. To be clear, though: pre-aggregated analytics are great for fast lookups of series data, but be sure to store the raw data as well. Doing so lets you perform arbitrary analysis of your data (whereas pre-aggregated data requires that you know the question ahead of time), and it also gives you user-level attribution.
App opens are conceptually no different than some other event which happens inside the app. Therefore, you can make the schema slightly more generic and create time series data on any event. At Appboy’s scale, tracking billions of data points each month, that means making many billions of writes each month.
We moved to ObjectRocket from another database provider because we liked the shard-first strategy they promote. Before ObjectRocket, we had been scaling up vertically as we grew, but sharding lets us add more servers horizontally as we need to accommodate more reads and writes. When working with MongoDB, I can’t recommend sharding from the beginning strongly enough! It took us about a month of development effort to rewrite parts of our application to support the restrictions that sharding places on an application. Choosing a shard key is extremely important, and ObjectRocket jumped on the phone with us multiple times to discuss and suggest shard keys for each collection in our application.
Scaling by adding another shard, instead of vertically scaling, improves our cost-per-unit of scaling, making ObjectRocket more cost-effective and predictable from the get-go. ObjectRocket’s sharding management is simple and powerful, with a level of sophistication we couldn’t find elsewhere. And in terms of value, ObjectRocket is far better than the competitors: each shard has 2 secondary servers instead of just a secondary and an arbiter, and backups are included.
Running MongoDB with ObjectRocket gives us the performance, value and consistency our customers demand, making it a trusted choice for Appboy.
I’ll be presenting at the NYC MongoDB meetup at the eBay offices on 11/19, discussing mobile app analytics and delving deeper into Appboy’s use of MongoDB’s flexible schemas and statistical analysis on top of MongoDB. Shoot me a line at @jon_hyman or jon@appboy.com if you have any questions, or let me know how you’re using MongoDB for analytics!