mongodb

Using MongoDB Regular Expressions (Regex) without slowing your cluster down

By February 26, 2019 August 19th, 2019 No Comments

One bad regex query can impact the performance of your whole MongoDB instance. We’ll illustrate this by going through some examples of different MongoDB regex queries and how they can cause issues for your database.

What’s a regex query?

Regular expressions, or regex queries, are similar to wild card searches. Here’s a typical regex query format:

db.foo.find({"field"://}).

When you use this format, the filter will scan a whole index, which in effect is similar to a collection scan and in turn can consume up database resources.

How to avoid regex queries from scanning a whole index

One way to avoid this is by anchoring the filter. In other words, the filter starts with a ^ like the following example:

db.foo.find({"field":/^/}).

Case insensitive with no anchor

These regex searches also tend to use indexes inefficiently. Case insensitive searches use the /i option to let the regex function know to use case insensitivity in its search:

db.foo.find({"field"://i}).

Regex performance differences

To illustrate the performance differences between the various options of regex, it’s best to use examples. Let’s use a test collection called ‘foo’ that contains names and characteristics of different types of business that has the following sample data:

{ _id: 1, "name": "wigit_company","nbr_employees":100,"state":"texas" }

with an index definition on the name field

{
"v" : 1,
"key" : {
"name" : 1
},
"name" : "name_1",
"ns" : "test.foo"
}

Example 1: Normal query (with no regex)

First, let’s perform a normal query with no regex. We’ll be using the handy explain function to illustrate the query planner. As noted in my previous blog, we are interested in the portion of the explain output where it refers totalKeysExamined vs. totalDocsExamined, which indicates how many index keys it has to scan versus how many documents were scanned. The number for both should either match or be close as possible to one another. This query is relatively fast as the totalKeysExamined matches the totalDocsExamined:

db.foo.find({"name" : "new_business_test"}).explain('executionStats');

snippet results of explain plan

"executionStats" : {
"nReturned" : 1, 
"executionTimeMillis" : 2,
"totalKeysExamined" : 1, <-- scan one index key
"totalDocsExamined" : 1, <-- totalDocsExamined

Example 2: Query with regex

Our second example illustrates the difference in performance when querying with the regex function. Note that since it is a wild card search, it has to scan the whole index because it doesn’t know if there are other combinations of the field that it has to search through. The query has to scan through 3675 index keys to return one document, which is not efficient.

db.foo.find({"name" : /new_business_test/}).explain('executionStats');

snippet results of explain plan

"executionStages" : {
"stage" : "SINGLE_SHARD",
"nReturned" : 1, 
"executionTimeMillis" : 18,
"totalKeysExamined" : 3675, <-- scan this many index keys
"totalDocsExamined" : 1,    <-- totalDocsExamined

Example 3: Case-insensitive query with regex

Our third example illustrates the query with a combination of regex and case insensitive functions. Similar to example 2, it has to scan through 3675 index keys to return one document.

db.foo.find({"name" : /new_business_test/i}).explain('executionStats');

snippet results of explain plan

"nReturned" : 1, 
"executionTimeMillis" : 21,
"totalKeysExamined" : 3675, <-- scan this many index keys
"totalDocsExamined" : 1,    <-- totalDocsExamined

Example 4: Regex query with anchor

This example shows a regex query with an anchored regex. Similar to the first example, where the query is using an exact match, it is able to use the index efficiently. Note that it has to only scan two index keys to return one document.

db.foo.find({"name" : /^new_business_test/}).explain('executionStats');

snippet results of explain plan

"executionStages" : {
"stage" : "SINGLE_SHARD",
"nReturned" : 1, 
"executionTimeMillis" : 1,
"totalKeysExamined" : 2, <-- scan this many index keys
"totalDocsExamined" : 1, <-- totalDocsExamined

Example 5: Regex query with anchor + case-insensitive

This example illustrates that even if you use an anchor with a case-insensitive option, it negates the ability for the query to be able to use the index efficiently. Similar to examples 2 and 3, it has to scan a lot of index keys in order to return one document.

db.foo.find({"name" : /^new_business_test/i}).explain('executionStats');

snippet results of explain plan

"executionStats" : {
"nReturned" : 1, 
"executionTimeMillis" : 20,
"totalKeysExamined" : 3675, <-- scan this many index keys
"totalDocsExamined" : 1,    <-- totalDocsExamined

Note: Refer to your driver documentation for regular expression defaults and syntax. Also, we can cover how this compares to a text index and the $text operator as well as related considerations.

Have MongoDB optimization questions?

Interested in having our MongoDB DBAs help you with query and overall cluster optimizations? We’re here for you. Learn more about our managed MongoDB offerings. Current customers can always reach out directly to our support team.