Time Series
The first pattern we are going to explore is the time series pattern. It is a write optimization pattern intended to maximize write performance for a typical analytics application. A time series is made up of discrete measurements at timed intervals. Examples include counting the number of page views per second or recording the temperature per minute. For this pattern we will discuss time series in the context of web page views.
| Schema Attributes | |
|---|---|
| Optimized For | Write Performance |
| Pre-Allocation | Benefits from Pre-Allocation |
To maximize write performance for a time series, we are going to assume that we are interested in discrete buckets of time. That is to say, an individual page view is not interesting to the application, only the number of page views in a particular second, minute, hour, day or any time range in between. This means the smallest bucket we store covers a single minute, with a counter for each second inside it.
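As a minimal sketch of the bucketing idea, the following plain JavaScript helper maps the time of an event to its minute bucket and to the second within that minute. The name `bucketFor` is hypothetical, and the sketch assumes that timestamps are handled in UTC.

```javascript
// Hypothetical helper: map an event time to its minute bucket (UTC assumed).
function bucketFor(eventTime) {
  // Copy the date so the caller's object is not modified.
  var minute = new Date(eventTime.getTime());
  // Zero out seconds and milliseconds to get the start of the minute.
  minute.setUTCSeconds(0, 0);
  return {
    timestampMinute: minute,
    second: eventTime.getUTCSeconds()
  };
}

var bucket = bucketFor(new Date("2014-01-01T10:01:02.500Z"));
console.log(bucket.timestampMinute.toISOString()); // 2014-01-01T10:01:00.000Z
console.log(bucket.second); // 2
```

Every page view that falls within the same minute maps to the same `timestampMinute`, which is what lets a single document absorb all of them.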
Schema
Taking that into account let’s model a bucket to keep all our page views for a particular minute.
{
  "page": "/index.htm",
  "timestamp_minute": ISODate("2014-01-01T10:01:00Z"),
  "total_views": 0,
  "seconds": {
    "0": 0
  }
}
Let’s have a quick view of what the fields mean.
Field | Description |
---|---|
page | The web page we are measuring |
timestamp_minute | The actual minute the bucket is for |
total_views | Total page views in this minute |
seconds | Page views for a specific second in the minute |
As we can see the document represents not only a single minute of page views for a specific page but also allows for looking at individual seconds.
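To illustrate reading such a bucket, here is a small sketch in plain JavaScript that sums the per-second counters of one document and looks up a single second. The function name `viewsInMinute` and the sample document are made up for illustration; in the shell the document would come from a query.

```javascript
// Hypothetical helper: sum the per-second counters of one minute bucket.
function viewsInMinute(bucket) {
  var total = 0;
  Object.keys(bucket.seconds).forEach(function (second) {
    total += bucket.seconds[second];
  });
  return total;
}

// Sample bucket, invented for this example.
var sample = {
  page: "/index.htm",
  seconds: { "0": 3, "2": 1, "59": 4 }
};
console.log(viewsInMinute(sample)); // 8
console.log(sample.seconds["2"]);   // 1
```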
Update Page Views
Let’s simulate what happens in an application that is counting page views for a specific page. We are going to update the ISODate("2014-01-01T10:01:00Z") bucket with a page view that arrives in second 2 of that minute.
use analytics
var secondInMinute = 2;
// Build {$inc: {"seconds.2": 1}} using a computed field name
var updateStatement = {$inc: {}};
updateStatement["$inc"]["seconds." + secondInMinute] = 1;
db.page_views.update({
  page: "/index.htm",
  timestamp_minute: ISODate("2014-01-01T10:01:00Z")
}, updateStatement, true) // true enables upsert
The first part of the updateStatement sets up the $inc value to increment the field named 2 inside the seconds field, which corresponds to second 2 of our time period. If the field does not exist MongoDB sets it to one; otherwise it increments the existing value by one. Notice the last parameter of the update statement. It tells MongoDB to do an upsert, meaning MongoDB will create a new document if none matches the query. Let’s query for the document.
use analytics
db.page_views.findOne({
  page: "/index.htm",
  timestamp_minute: ISODate("2014-01-01T10:01:00Z")
});
Returns the following document.
{
  "_id" : ObjectId("52de4ef8297f2f3b6f41d242"),
  "page" : "/index.htm",
  "timestamp_minute" : ISODate("2014-01-01T10:01:00Z"),
  "seconds" : {
    "2" : 1
  }
}
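The update shown earlier hard-codes the second. As a hedged sketch, the filter and update documents could instead be derived from the time of the page view. `pageViewUpdate` is a hypothetical name, a plain `Date` stands in for the shell's `ISODate`, and UTC is assumed.

```javascript
// Hypothetical helper: build the filter and $inc update for one page view.
function pageViewUpdate(page, viewedAt) {
  // Start of the minute the view falls into (copy, do not mutate the input).
  var minute = new Date(viewedAt.getTime());
  minute.setUTCSeconds(0, 0);
  // Computed field name, e.g. "seconds.2".
  var inc = {};
  inc["seconds." + viewedAt.getUTCSeconds()] = 1;
  return {
    filter: { page: page, timestamp_minute: minute },
    update: { $inc: inc }
  };
}

var op = pageViewUpdate("/index.htm", new Date("2014-01-01T10:01:02Z"));
console.log(JSON.stringify(op.update)); // {"$inc":{"seconds.2":1}}
```

In the shell, the result could then be passed to `db.page_views.update(op.filter, op.update, true)`, mirroring the upsert call used in this section.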
Unfortunately there is a slight problem with this way of creating new buckets: the document grows over time, forcing MongoDB to move it around and incurring a write performance penalty. Luckily there is a workaround to improve the write performance.
Pre-Allocation
We can pre-allocate documents for our minute buckets. Let’s look at a simple script that takes a starting timestamp and pre-allocates the 60 minute buckets for the hour that follows.
var preAllocate = function(coll, pageName, timestamp) {
  // Work on a copy so the caller's date is not modified
  var minute = new Date(timestamp.getTime());
  for(var i = 0; i < 60; i++) {
    coll.insert({
      "page": pageName,
      "timestamp_minute" : minute,
      "seconds" : {
        "0":0,"1":0,"2":0,"3":0,"4":0,"5":0,"6":0,"7":0,"8":0,"9":0,
        "10":0,"11":0,"12":0,"13":0,"14":0,"15":0,"16":0,"17":0,"18":0,"19":0,
        "20":0,"21":0,"22":0,"23":0,"24":0,"25":0,"26":0,"27":0,"28":0,"29":0,
        "30":0,"31":0,"32":0,"33":0,"34":0,"35":0,"36":0,"37":0,"38":0,"39":0,
        "40":0,"41":0,"42":0,"43":0,"44":0,"45":0,"46":0,"47":0,"48":0,"49":0,
        "50":0,"51":0,"52":0,"53":0,"54":0,"55":0,"56":0,"57":0,"58":0,"59":0
      }
    })
    // Advance to the next minute bucket
    minute.setMinutes(minute.getMinutes() + 1);
  }
}
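The literal list of 60 counters can also be generated. The following sketch, using the hypothetical name `emptySeconds`, builds the same `seconds` sub-document in a loop; it is an alternative way to write the literal, not a change to the pattern.

```javascript
// Hypothetical helper: build {"0": 0, "1": 0, ..., "59": 0}.
function emptySeconds() {
  var seconds = {};
  for (var s = 0; s < 60; s++) {
    // Field names are strings, matching the literal in the script.
    seconds[String(s)] = 0;
  }
  return seconds;
}

var seconds = emptySeconds();
console.log(Object.keys(seconds).length); // 60
console.log(seconds["59"]); // 0
```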
Let’s take this pre-allocation method out for a test run.
var col = db.getSisterDB("analytics").page_views;
col.drop();
preAllocate(col, "/index.htm", ISODate("2014-01-01T10:01:00Z"));
For this example we drop any existing documents in the page_views collection for clarity. Now run the following commands.
var col = db.getSisterDB("analytics").page_views;
col.count()
col.find()
The col.count() call returns 60, showing that we have inserted 60 buckets. Looking over the results from col.find(), you’ll notice that each one has an incrementing timestamp and that the interval is 1 minute.
With our pre-allocated documents, the update command will hit an existing empty bucket, and since the bucket is already at its maximum size it will never grow and MongoDB avoids having to copy the data to a new location. This increases write performance, as MongoDB can spend more of its time performing updates in place.
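The difference between the two approaches can be sketched in memory. The code below is not MongoDB code: `applyInc` is a hypothetical stand-in that only mimics how `$inc` creates or increments a `seconds` counter, to show that an upserted bucket gains fields while a pre-allocated one keeps its shape.

```javascript
// Hypothetical in-memory stand-in for $inc on "seconds.<n>"; not MongoDB code.
function applyInc(bucket, second) {
  var key = String(second);
  // A missing counter is created at 1, an existing one is incremented.
  bucket.seconds[key] = (bucket.seconds[key] || 0) + 1;
  return bucket;
}

// Bucket created by upsert: starts with no counters.
var upserted = { seconds: {} };
// Pre-allocated bucket: all 60 counters exist up front.
var preAllocated = { seconds: {} };
for (var s = 0; s < 60; s++) {
  preAllocated.seconds[String(s)] = 0;
}

// Four page views, two of them in the same second.
[2, 2, 17, 45].forEach(function (second) {
  applyInc(upserted, second);
  applyInc(preAllocated, second);
});

console.log(Object.keys(upserted.seconds).length);     // 3 (grew from 0)
console.log(Object.keys(preAllocated.seconds).length); // 60 (unchanged)
console.log(preAllocated.seconds["2"]);                // 2
```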