Nested Categories

The nested categories schema design pattern targets the hierarchical structures traditionally found in a product catalog on an line e-commerce website.

A Category Hierarchy Example

Trees using Paths

Let’s insert the category /electronics/embedded and a product that belongs in this category.

var categories = db.getSisterDB("catalog").categories;
categories.insert([{
    "name": "electronics"
  , "parent": "/"
  , "category": "/electronics"
}, {
    "name": "embedded"
  , "parent": "/electronics"
  , "category": "/electronics/embedded"
}, {
    "name": "cases"
  , "parent": "/"
  , "category": "/cases"
}, {
    "name": "big"
  , "parent": "/cases"
  , "category": "/cases/big"
}, {
    "name": "small"
  , "parent": "/cases"
  , "category": "/cases/small"
}]);

var products = db.getSisterDB("catalog").products;
products.insert({
    "name": "Arduino"
  , "cost": 125
  , "currency": "USD"
  , "categories": ["/electronics/embedded"]
});

In the Trees as Paths we are using what looks like file directory paths from UNIX. Together with regular expressions we can slice the paths as we need to efficiently retrieve any level of the tree structure.

A couple of notes about the schema above. Notice that the product has an array of categories. This lets you easily list the same product in multiple categories (maybe by attributes such as embedded but also Arduino).

Lets fetch all the categories just below the top level /

var col = db.getSisterDB("catalog").categories;
var categories = col.find({parent: /^\/$/}).toArray();

for(var i = 0; i < categories.length; i++) {
  print(categories[i].category);
}

Notice the regular expression /^\/$/. This translates to all documents where the field parent starts and ends with /.

Locate the entire tree structure below the cases level /cases

var col = db.getSisterDB("catalog").categories;
var categories = col.find({category: /^\/cases\//}).toArray();

for(var i = 0; i < categories.length; i++) {
  printjson(categories[i]);
}

Notice the regular expression /^\/cases\/$/. This matches all documents where the field category starts with /cases/, thus returning all the categories in the subtree below /cases

To locate all the products for the /electronics/embedded category

var col = db.getSisterDB("catalog").products;
var products = col.find({categories: /^\/electronics\/embedded$/}).toArray();

for(var i = 0; i < products.length; i++) {
  printjson(products[i]);
}

This will match any documents where the categories array contains the string /electronics/embedded.

Indexes

Adding indexes ensure the lookup is as fast as possible as the database needs to spend less effort in locating the data. Below are some recommended indexes for the schema.

var col = db.getSisterDB("catalog").categories;
col.ensureIndex({parent:1})
col.ensureIndex({category:1})

var col = db.getSisterDB("catalog").products;
col.ensureIndex({categories:1})

Covered Indexes

Covered indexes are indexes that contain enough information to be able to answer the query with the data stored in the index and are in general, many times faster than queries that need to search the documents. We could create a covered index for categories to allow for quick retrieval of a specific level.

var col = db.getSisterDB("catalog").categories;
col.ensureIndex({parent:1, name: 1})

The Index {parent:1, name:1} is a compound index and will contain both the parent and name field and can cover queries containing those fields.

Let’s rewrite the query of all the categories just below the top level / and look at the explain plan.

var col = db.getSisterDB("catalog").categories;
col.find({parent: /^\/$/}, {_id:0, parent:1, name:1}).explain();

This should return a document result that contains a field indexOnly that is set to true indicating that the query can be answered by only using the index. However you do have to give up the _id field for this query but in many cases covered index queries can give a radical performance boost for a query where _id is not needed and a small set of fields need to be returned.

Knowing this we can rewrite the query for the direct sub categories of the path /.

var col = db.getSisterDB("catalog").categories;
var categories = col.find({parent: /^\/$/}, {_id:0, parent:1, name:1});

for(var i = 0; i < categories.length; i++) {
  print(categories[i].category);
}

Pros and Cons

Pros

  • Quick retrieval of a subtree, backed by index
  • Flexible

Cons

  • Expensive to retrieve the parent path for a specific node (ex:all parent categories of small)
  • Relies on regular expressions make it more complicated (wrong regexp’s can cause collection scans)

Trees using Ancestors Array

In Trees using Ancestor’s Array each tree node contains it’s node path allowing you to retrieve a single node and being able to retrieve all it’s parents nodes in a single query. Below is the categories tree from the above example.

var categories = db.getSisterDB("catalog").categories;
categories.insert([{
  "_id": "root"
} , {
    "_id": "electronics"
  , "tree": ["root"]
  , "parent": "root"
}, {
    "_id": "embedded"
  , "tree": ["root", "electronics"]
  , "parent": "electronics"
}, {
    "_id": "cases"
  , "tree": ["root"]
  , "parent": "root"
}, {
    "_id": "big"
  , "tree": ["root", "cases"]
  , "parent": "cases"
}, {
    "_id": "small"
  , "tree": ["root", "cases"]
  , "parent": "cases"
}, {
    "_id": "yellow"
  , "tree": ["root", "cases", "small"]
  , "parent": "small"
}]);

var products = db.getSisterDB("catalog").products;
products.insert({
    "name": "Arduino"
  , "cost": 125
  , "currency": "USD"
  , "categories": ["embedded"]
});

Locating all the direct descendants of the cases tree node.

var col = db.getSisterDB("catalog").categories;
var categories = col.find({parent: "cases"}).toArray();

for(var i = 0; i < categories.length; i++) {
  printjson(categories[i]);
}

Locate all the nodes that share the common parent node cases (allowing you to extract a subtree of categories)

var col = db.getSisterDB("catalog").categories;
var categories = col.find({tree: "cases"}).toArray();

for(var i = 0; i < categories.length; i++) {
  printjson(categories[i]);
}

Locate all the parent nodes for the yellow node.

var col = db.getSisterDB("catalog").categories;
var nodes = col.findOne({_id: "small"}).tree;
var categories = col.find({_id: {$in: nodes}}).toArray();

for(var i = 0; i < categories.length; i++) {
  printjson(categories[i]);
}

One thing to notice is that one can retrieve the entire path from the root for a specific node in two queries. This is in contrast to the Path based tree where there is no cheap way to retrieve it.

Indexes

Just as the Path based tree we can benefit from a couple of indexes to improve retrieval performance.

var col = db.getSisterDB("catalog").categories;
col.ensureIndex({parent:1})
col.ensureIndex({tree:1})

var col = db.getSisterDB("catalog").products;
col.ensureIndex({categories:1})

Notice that in this case we are not explicitly creating a _id as it’s by default a unique index.

Pros and Cons

Pros

  • Quick retrieval of all ascendants and descendants of a particular node
  • More straightforward to understand than the Path Based tree
  • Not reliant on regular expressions for matching

Cons

  • More expensive than the Path Based tree