Schema Basics

Before presenting Schema patterns for MongoDB we will go through the basics of MongoDB Schema design and ways on how to model traditional relational relationships such as one-to-one, one-to-many and many-to-many.

One-To-One (1:1)

A One to One Relational Example

The 1:1 relationship can be modeled in two ways using MongoDB. The first way is to embed the relationship as a document, the second one is as a link to a document in a separate collection. Let’s look at both ways of modeling the one to one relationship using the following two documents.

Model

{
  name: "Peter Wilkinson",
  age: 27
}

{
  street: "100 some road",
  city: "Nevermore"
}

Strategy

Embedding

The first approach is simply to embed the address as a document in the User document.

{
  name: "Peter Wilkinson",
  age: 27,
  address: {
    street: "100 some road",
    city: "Nevermore"
  }
}

The benefit is that we can retrieve the user details and the address using a single read operation.

Linking

The second approach is to link the address and user document using a foreign key.

{
  _id: 1,
  name: "Peter Wilkinson",
  age: 27
}

{
  user_id: 1,
  street: "100 some road",
  city: "Nevermore"
}

This is similar to how traditional relational databases would store the data. It is however important to note that MongoDB does not enforce any foreign key constraints so the relation only exists as part of the application level schema.

Embedding Preferred

In the one to one relationship Embedding is the preferred way to model the relationship as it’s a more efficient way to retrieve the document.

One-To-Many (1:N)

A One to Many Relational Example

The 1:N relationship can be modeled in couple of different ways using MongoDB . The first one is embedding, the second one is linking and the third one is a bucketing strategy that is useful for some particular cases. Let’s use the model of a Blog Post and its Comments.

Model

{
  title: "An awesome blog",
  url: "http://awesomeblog.com",
  text: "This is an awesome blog we have just started"
}

A Blog Post is a single document that describes one specific blog post.

{
  name: "Peter Critic",
  created_on: ISODate("2014-01-01T10:01:22Z"),
  comment: "Awesome blog post"
}

{
  name: "John Page",
  created_on: ISODate("2014-01-01T11:01:22Z"),
  comment: "Not so awesome blog"
}

For each Blog Post we can have one or more Comments.

Strategy

Embedding

The first approach is to embed the comments in the Blog Post.

{
  title: "An awesome blog",
  url: "http://awesomeblog.com",
  text: "This is an awesome blog we have just started",
  comments: [{
    name: "Peter Critic",
    created_on: ISODate("2014-01-01T10:01:22Z"),
    comment: "Awesome blog post"
  }, {
    name: "John Page",
    created_on: ISODate("2014-01-01T11:01:22Z"),
    comment: "Not so awesome blog"
  }]
}

The benefits are that we can easily retrieve all the comments with the Blog Post in a single read. Adding new comments is as simple as appending the new comment document to the end of the comments array. However there are three possible problems with this approach.

The first one is that the comments array might grow larger than the maximum document size of 16 MB.

The second has to do with write performance. As each Blog Post will get comments added to it over time it makes it hard for MongoDB to predict the correct document padding to apply when a new document is created. This means that the document has to be moved around in memory as it grows causing additional IO and impacting write performance.

It’s however important to note that this only matters for high write traffic and might not be a problem for smaller applications.

The third one is performing pagination off the comments. As cannot easily filter out comments returned from the single Blog Post we will have to retrieve all the comments and filter in the application.

Linking

The second approach is to link comments to the Blog Post using a more traditional foreign key.

{
  _id: 1,
  title: "An awesome blog",
  url: "http://awesomeblog.com",
  text: "This is an awesome blog we have just started"
}

{
  blog_entry_id: 1,
  name: "Peter Critic",
  created_on: ISODate("2014-01-01T10:01:22Z"),
  comment: "Awesome blog post"
}

{
  blog_entry_id: 1,
  name: "John Page",
  created_on: ISODate("2014-01-01T11:01:22Z"),
  comment: "Not so awesome blog"
}

The benefits from this model is that additional comments will not grow the original Blog Post document, making it less likely that the applications will run in the the maximum document size of 16 MB. It’s also much easier to return paginated comments as the application can slice and dice the comments more easily. On the downside if we have 1000 comments on a blog post we need to retrieve all 1000 documents causing a lot of reads from the database.

Bucketing

The third approach is a hybrid of the two above. Basically it tries to balance the rigidity of the embedding strategy with the flexibility of the linking strategy. For this example we might decide that we will split the comments into buckets with a maximum of 50 comments in each bucket.

{
  _id: 1,
  title: "An awesome blog",
  url: "http://awesomeblog.com",
  text: "This is an awesome blog we have just started"
}

{
  blog_entry_id: 1,
  page: 1,
  count: 50,
  comments: [{
    name: "Peter Critic",
    created_on: ISODate("2014-01-01T10:01:22Z"),
    comment: "Awesome blog post"
  }, ...]
}

{
  blog_entry_id: 1,
  page: 2,
  count: 1,
  comments: [{
    name: "John Page",
    created_on: ISODate("2014-01-01T11:01:22Z"),
    comment: "Not so awesome blog"
  }]
}

The main benefit of using buckets in this case is that we can perform a single read to fetch 50 comments at the time, allowing for efficient pagination.

When to use bucketing

When you have the possibility of splitting up your documents in discreet batches it makes sense to consider bucketing to speed up retrieval of documents.

Typical cases are things like bucketing data by hours, days or number of entries on a page (such as comments pagination).

Many-To-Many (N:M)

A Many to Many Relational Example

N:M relationships are modeled in the relational database by using a join table. A typical example is the relationship between books and authors where an author has authored multiple authors and a book can be written by multiple authors. Let’s look at two ways of modeling many to many relationships.

Two Way Embedding

Embedding the books in an authors document

Model

In Two Way Embedding we will include the Book foreign keys under the books field.

{
  _id: 1,
  name: "Peter Standford",
  books: [1, 2]
}

{
  _id: 2,
  name: "Georg Peterson",
  books: [2]
}

In the same way for each Book we include the Author foreign keys under the author field.

{
  _id: 1,
  title: "A tale of two people",
  categories: ["drama"],
  authors: [1, 2]
}

{
  _id: 2,
  title: "A tale of two space ships",
  categories: ["scifi"],
  authors: [1]
}

Queries

var db = db.getSisterDB("library");
var booksCollection = db.books;
var authorsCollection = db.authors;

var author = authorsCollection.findOne({name: "Peter Standford"});
var books = booksCollection.find({_id: {$in: author.books}}).toArray();

var db = db.getSisterDB("library");
var booksCollection = db.books;
var authorsCollection = db.authors;

var book = booksCollection.findOne({title: "A tale of two space ships"});
var authors = authorsCollection.find({_id: {$in: book.authors}}).toArray();

As we can see we have to perform two queries in both directions. First finding either the author or the book and then performing an $in query to find the books or authors.

Consider

If one way is massively unbalanced in size this modeling might not be feasible. Such a possible scenario is Products and Categories where f.ex a TV might have a single Category associated with it but a Category might have n number of items associated with it, meaning embedding all the product id’s in a Category is not feasible.

One Way Embedding

The One Way Embedding strategy take optimizes the many to many relationship by embedding only in one direction which is very useful if one side is massively unbalanced in size. Consider the case above of the categories. Let’s pull the categories out in a separate document.

Model

{
  _id: 1,
  name: "drama"
}

{
  _id: 1,
  title: "A tale of two people",
  categories: [1],
  authors: [1, 2]
}

The reason we are doing a single direction for categories is due to there being a lot more books in the drama category than categories in a book. If one embeds the books in the category document it’s easy to foresee that one could break the 16MB max document size for certain broad categories.

Queries

var db = db.getSisterDB("library");
var booksCol = db.books;
var categoriesCol = db.categories;

var book = booksCol.findOne({title: "A tale of two space ships"});
var categories = categoriesCol.find({_id: {$in: book.categories}}).toArray();

var db = db.getSisterDB("library");
var booksCollection = db.books;
var categoriesCollection = db.categories;

var category = categoriesCollection.findOne({name: "drama"});
var books = booksCollection.find({categories: category.id}).toArray();

Establish Relationship Balance

Establish the max size of N and the size of M. F.ex if N is a max of 3 categories for a book and M is a max of 500000 books in a category you should pick One Way Embedding. If N is a max of 3 and M is a max of 5 then Two Way Embedding might work well.