Indexes

A Btree Example, http://commons.wikimedia.org/wiki/File:Btree.png

Indexes are the root to high performance in MongoDB, as it allows the database to search through less documents to satisfy a query. Without an index MongoDB has to scan through all of the documents to ensure it has answered the query correctly.

An index increases the amount of storage needed to represent a document and the time it takes to insert a document, trading it against faster search time for the terms in the document indexed.

One of the core issues to remember about indexes is that they are inclusive. That means they can only answer questions about documents that have been included in the index.

Index Gotchas

$nin and $ne queries that cannot be answered by indexes and force collection scans. If you need to use these ensure you are filtering down using indexes as much as possible leaving the $nin and $ne terms to the very last part of the query selector.

MongoDB have several types of indexes

Single field indexes
Compound indexes
Multikey indexes
Geo-spatial indexes
Text indexes

It also supports a couple of variations of the above indexes

Sparse indexes
Unique indexes
Time To Live indexes
Covered Indexes

Single field indexes

Take the following document

{
  _id: ObjectId("523cba3c73a8049bcdbf6007"),
  name: 'Peter Jackson',
  age: 50,
  nationality: "New Zealand",
  address: {
    street: "Some Street 22"
  },
  department: {
    floor: 1,
    building: 1
  }
}

Let’s look at the different ways we can apply a single field index

var values = db.getSisterDB("indexes").values;
values.ensureIndex({name: 1});

This indexes the name field in ascending order.

var values = db.getSisterDB("indexes").values;
values.ensureIndex({"address.street": 1});

This indexes the street field in the embedded document under the address field.

var values = db.getSisterDB("indexes").values;
values.ensureIndex({department: 1});

This indexes the department subdocument allowing for strict equality matches on the subdocument. That is to say it will only match on the query for a subdocument that contains all the fields in the indexed subdocument.

var values = db.getSisterDB("indexes").values;
values.findOne({department: {floor: 1, building: 1}});

Compound indexes

A compound index is an index that contains references to multiple fields within a document.

var values = db.getSisterDB("indexes").values;
values.ensureIndex({nationality: 1, age: -1, name: 1});

The compound indexes have some interesting properties. Obviously the index is usable if you have a query that includes nationality, age and name. But it’s also able to answer other queries using the index.

Any query starting with nationality
Any query starting with nationality and age
Any query starting with nationality, age and name
Any query starting with nationality and name

The reason is that for compound indexes order matters as we match from left to right. F.ex if you reverse a query to start with name and age it will not match the order of fields in the compound index and MongoDB is not able to use the index.

Compound Index Field Order

Always make sure the order of fields in a compound index match the order of fields in the queries you want to execute against the collection.

One additional note about using a compound index is about sorting. The ordering and direction of fields in a compound index decide if it’s possible to use the index in the query as well as for the sort.

Given the index above of {nationality: 1, age: -1, name: 1} we can support the following sorts using the index.

var values = db.getSisterDB("indexes").values;
values.find().sort({nationality: 1, age: -1}).toArray();
values.find().sort({nationality: -1, age: 1}).toArray();
values.find().sort({nationality: -1, age: 1, name: -1}).toArray();
values.find().sort({nationality: 1, age: -1, name: 1}).toArray();

Sort can use the index if they match the order specified or the exact reverse order specified but not otherwise.

Multikey indexes

Multikey indexes lets MongoDB index arrays of values. Take the following document.

{
  "title": "Superman",
  "tags": ["comic", "action", "xray"],
  "issues": [
    {
      "number": 1,
      "published_on": "June 1938"
    }
  ]
}

Multikey indexes lets us search on the values in the tags array as well as in the issues array. Let’s create two indexes to cover both.

var comics = db.getSisterDB("store").comics;
comics.ensureIndex({tags: 1});
comics.ensureIndex({issues: 1});

The two indexes lets you do exact matches on values in the tags and issues arrays of values.

var comics = db.getSisterDB("store").comics;
comics.find({tags: "action"});
comics.find({issues: {number: 1, published_on: "June 1938"}}).toArray();

The first query will use the index on tags to return the document. The second query will use the index on issues to return the document. One thing to notice about the second query is that it’s dependent on the order of the fields in the documents indexed. Meaning that if the number and published_on field change order the second query would fail. If the document changes structure it would be better to create a specific compound index on the fields needed in sub element documents. A better index would be.

var comics = db.getSisterDB("store").comics;
comics.ensureIndex({"issues.number":1, "issues.published_on":1});

To use the index correctly the second query can be issues as.

var comics = db.getSisterDB("store").comics;
comics.find({
  "issues.number":1,
  "issues.published_on": "June 1938"}).toArray()

Geospatial indexes

MongoDB offers several a couple of Geospatial indexes. The indexes makes it possible to perform efficient Geospatial queries.

Specialized 2d Sphere index

The 2d Geospatial Sphere index allows to perform queries on a earth-like sphere making for better accuracy in matching locations.

Take the following example document.

{
  loc: {
    type: "Point",
    coordinates: [60, 79]
  },
  type: "house"
}

Create a 2dsphere index.

var locations = db.getSisterDB("geo").locations;
locations.ensureIndex({loc: "2dsphere", house: 1});

Query the index using a square box and the type.

var locations = db.getSisterDB("geo").locations;
locations.find({loc: {
    $geoWithin: {
      $geometry: {
        type: "Polygon",
        coordinates: [[
          [ 0 , 0 ] , [ 0 , 80 ] , [ 80 , 80 ] , [ 80 , 0 ] , [ 0 , 0 ]
        ]]
      }
    }
  }}).toArray();

Gotchas 2dsphere

The 2d sphere index is a pure GeoSpatial index and is limited to the ranges for latitude (-90 - 90) and longitude (-180 to 180). It also only accepts $geometry like queries and supports a subset of the 2d index. In return it’s faster and more accurate than the general 2d index.

General 2d index

The 2d index is a flat index that does not take into consideration any projection. One of the benefits of the 2d index is that it allows to set lower and upper bounds for the coordinate system as well as the search resolution. This makes the index a general 2d index.

Let’s add a sample document.

var houses = db.getSisterDB("2d").houses;
houses.insert({
  price_room: [10000, 3],
  type: "house"
});

Notice that the price_room is just an array. This is because the 2d index is not inherently tied to the GeoJSON format in the same way as the 2dsphere index.

Let’s create a 2d index.

var houses = db.getSisterDB("2d").houses;
houses.ensureIndex({price_room: "2d"}, { min: 0, max: 200000, bits: 32 });

Now let’s look for all houses that fall inside the range of 2000 to 20000 in price and has 0 to 5 rooms.

db.houses.find( { price_room :
  { $geoWithin : {
      $box : [ [ 2000 , 0 ] , [ 20000 , 5 ] ]
    }
  }
}).toArray();

2d indexes

The min and max values lets you project any 2d data with numeric values into a 2d index where you can use geo queries like $near, $box etc to cut and slice the data. Once one realizes it’s a generalized 2d index it becomes very useful for a range of shape queries not easily done using the normal query operators.

Text indexes

From 2.6 on text search is integrated into the MongoDB query language (In 2.4 it was available as beta command). It relies on an underlying text index.

Let’s insert some sample documents.

var entries = db.getSisterDB("blogs").entries;
entries.insert( {
  title : "my blog post",
  text : "i am writing a blog. yay",
  site: "home",
  language: "english" });
entries.insert( {
  title : "my 2nd post",
  text : "this is a new blog i am typing. yay",
  site: "work",
  language: "english" });
entries.insert( {
  title : "knives are Fun",
  text : "this is a new blog i am writing. yay",
  site: "home",
  language: "english" });

Let’s define create the text index.

var entries = db.getSisterDB("blogs").entries;
entries.ensureIndex({title: "text", text: "text"}, { weights: {
    title: 10,
    text: 5
  },
  name: "TextIndex",
  default_language: "english",
  language_override: "language" });

This ensureIndex command shows how weights can be used to control the weighting of fields. In this case any search that matches title should be ranked higher than a match in the text field. We also pass in a name parameter that allows to give the index a custom name. The default_language specifies that any document missing a specific language field should default to english. The option language_override tells the text index to look for individual documents language definition under the language field. If the language field for a specific document is set to f.ex spanish, MongoDB will index it using the spanish stop list and stemming.

Now let’s query for all the blog entries that contain the **blog(()) word and filter by the site field.

var entries = db.getSisterDB("blogs").entries;
entries.find({$text: {$search: "blog"}, site: "home"})

The query matches all the documents that contain the word blog in either the title or text field and then filters them by the site field. To include the individual search scores modify the query slightly.

var entries = db.getSisterDB("blogs").entries;
entries.find({$text: {$search: "blog"}, site: "home"},
  {score: {$meta: "textScore"}}).sort({score: {$meta: "textScore"}});

The query includes the score given to the individual documents and sorts them in descending order by the score.

Text Indexes Can Get Big

Text indexes can grow to be bigger than the actual stored documents and can take a while to build if the collection is big. They also add additional overhead to writes such as inserts and updates compared to simpler indexes.

Sparse indexes

Sparse indexes are indexes where no values are included for fields that do not exist. Take the following two documents.

var sparse = db.getSisterDB("indexes").sparse;
sparse.insert({ hello: "world", number: 1 });
sparse.insert({ hello: "world" });

A non-sparse index for the field number will contain an entry for both documents in the index. A sparse index will contain only the documents that contains the number field. This saves memory and disk space for the index in comparison to a normal field level index.

To create a sparse index.

var sparse = db.getSisterDB("indexes").sparse;
sparse.ensureIndex({number: 1}, {sparse: true});

Unique indexes

An unique index is different from a normal index in that it only allows a single document to exist for a field value. Let’s define the index for a field.

var unique = db.getSisterDB("indexes").unique;
unique.ensureIndex({number: 1}, {unique: true});

Now let’s try to insert some documents

var unique = db.getSisterDB("indexes").unique;
unique.insert({ hello: "world", number: 1 });
unique.insert({ hello: "world", number: 1 });

The second insert will fail as there is already a document with the field number equal to 1.

Time To Live indexes

Time to live indexes (TTL) are a special type of index that will remove documents that fail to meet the index condition. One use for TTL indexes is for a cache of documents, allowing old documents to be gradually removed by MongoDB instead of bulk removing documents with an external process.

Let’s insert some documents.

var ttl = db.getSisterDB("indexes").ttl;
ttl.insert({ created_on: new Date() });

Let’s define an a TTL index on created_on with a expire time of 1000 seconds in the future.

var ttl = db.getSisterDB("indexes").ttl;
ttl.ensureIndex({created_on: 1}, {expireAfterSeconds: 1000});

When the documents cross the created_on + 1000 seconds they will get removed.

Notes about TTL

The expireAfterSeconds is not a hard limit. MongoDB will remove and expired documents once it has time to do it. So the actual time of removal might vary.

Covered Indexes

Covered indexes are queries that can be answered using only the information stored in the index. Basically MongoDB answers the index using the fields stored in a covered index. Let’s insert some documents.

var covered = db.getSisterDB("indexes").covered;
covered.insert({ text: "hello", site: "home"});
covered.insert({ text: "hello", site: "work" });

Let’s define the covered index.

var covered = db.getSisterDB("indexes").covered;
covered.ensureIndex({text: 1, site: 1});

Let’s perform a covered index query.

var covered = db.getSisterDB("indexes").covered;
covered.find({text: "hello"}, {_id: 0, text:1, site:1});

Let’s look at the query plan.

var covered = db.getSisterDB("indexes").covered;
covered.find({text: "hello"}, {_id: 0, text:1, site:1}).explain();

The results look like.

{
  "cursor" : "BtreeCursor text_1_site_1",
  "isMultiKey" : false,
  "n" : 2,
  "nscannedObjects" : 0,
  "nscanned" : 2,
  "nscannedObjectsAllPlans" : 0,
  "nscannedAllPlans" : 2,
  "scanAndOrder" : false,
  "indexOnly" : true,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "millis" : 0,
  ...
  "server" : "christkv.local:27017"
}

Notice how the query plan result includes indexOnly set to true meaning that the query was completely covered by the index and MongoDB never touched the documents.

Covered Index Gotchas

Noticed how {_id: 0, text:1, site:1} excludes _id. A covered index query cannot include the _id field.