The GridFS API
The MongoDB Node.js driver now supports a stream-based API for GridFS that's compatible with Node.js' streams3, so you can .pipe() directly from file streams to MongoDB. In this tutorial, you'll see how to use the new GridFS streaming API to upload a CC-licensed 28 MB recording of the overture from Richard Wagner's opera Die Meistersinger von Nürnberg to MongoDB using streams.
Uploading a File
You can use GridFS to upload a file to MongoDB. This example
assumes that you have a file named meistersinger.mp3
in the
root directory of your project. You can use whichever file you want, or you
can just download a Die Meistersinger Overture mp3.
To use the streaming GridFS API, you first need to create a GridFSBucket.
const client = new mongodb.MongoClient(uri);

client.connect(function(error) {
  assert.ifError(error);
  const db = client.db(dbName);
  const bucket = new mongodb.GridFSBucket(db);
  // Use bucket...
});
The bucket has an openUploadStream() method that creates an upload stream for a given file name. You can pipe a Node.js fs read stream to the upload stream.
const assert = require('assert');
const fs = require('fs');
const mongodb = require('mongodb');

const uri = 'mongodb://localhost:27017';
const dbName = 'test';

const client = new mongodb.MongoClient(uri);

client.connect(function(error) {
  assert.ifError(error);

  const db = client.db(dbName);
  const bucket = new mongodb.GridFSBucket(db);

  // Pipe the file from disk into GridFS
  fs.createReadStream('./meistersinger.mp3').
    pipe(bucket.openUploadStream('meistersinger.mp3')).
    on('error', function(error) {
      assert.ifError(error);
    }).
    on('finish', function() {
      console.log('done!');
      process.exit(0);
    });
});
Assuming that your test database was empty, you should see that the above script created two collections in your test database: fs.files and fs.chunks. The fs.files collection contains high-level metadata about the files stored in this bucket. For instance, the file you just uploaded has a document that looks like the one below.
> db.fs.files.findOne()
{
  "_id" : ObjectId("561fc381e81346c82d6397bb"),
  "length" : 27847575,
  "chunkSize" : 261120,
  "uploadDate" : ISODate("2015-10-15T15:17:21.819Z"),
  "md5" : "2459f1cdec4d9af39117c3424326d5e5",
  "filename" : "meistersinger.mp3"
}
The above document indicates that the file is named 'meistersinger.mp3', and tells you its size in bytes, when it was uploaded, and the MD5 hash of its contents. There's also a chunkSize field indicating that the file is broken up into chunks of 255 kilobytes (261120 bytes), which is the default.
> db.fs.chunks.count()
107
Not surprisingly, 27847575 / 261120 is approximately 106.64, so the fs.chunks collection contains 106 chunks of size 255KB and 1 final chunk that's roughly 255KB * 0.64.
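If you want to double-check that arithmetic, the expected chunk count is just the file's length divided by the chunk size, rounded up, using the values from the fs.files document above:

// Expected number of GridFS chunks: ceil(length / chunkSize)
const length = 27847575;   // "length" from the fs.files document
const chunkSize = 261120;  // "chunkSize" from the fs.files document

console.log(Math.ceil(length / chunkSize)); // 107

Each individual chunk document is similar to the document below.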
> db.fs.chunks.findOne({}, { data: 0 })
{
  "_id" : ObjectId("561fc381e81346c82d6397bc"),
  "files_id" : ObjectId("561fc381e81346c82d6397bb"),
  "n" : 0
}
Each chunk document keeps track of which file it belongs to (files_id) and its order in the list of chunks (n). It also has a data field that contains the raw bytes of the file.
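The n field is what the driver sorts on when it reassembles a file. You normally never touch fs.chunks directly, but if you want to inspect a file's chunks by hand, here's a minimal sketch that reuses the db handle from above; fileId, the file's _id from fs.files, is an assumption for illustration.

// Sketch: list one file's chunks in read order, omitting the raw bytes.
// `fileId` is assumed to be the _id of a document in fs.files.
db.collection('fs.chunks').
  find({ files_id: fileId }, { projection: { data: 0 } }).
  sort({ n: 1 }).
  toArray(function(error, chunks) {
    assert.ifError(error);
    chunks.forEach(function(chunk) {
      console.log(chunk.n, chunk._id);
    });
  });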
You can configure both the chunk size and the fs prefix for the files and chunks collections at the bucket level. For instance, if you specify the chunkSizeBytes and bucketName options as shown below, you'll get 27195 chunks in the songs.chunks collection.
const bucket = new mongodb.GridFSBucket(db, {
  chunkSizeBytes: 1024,
  bucketName: 'songs'
});

fs.createReadStream('./meistersinger.mp3').
  pipe(bucket.openUploadStream('meistersinger.mp3')).
  on('error', function(error) {
    assert.ifError(error);
  }).
  on('finish', function() {
    console.log('done!');
    process.exit(0);
  });
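Assuming you re-ran the upload against an empty database, you can verify the new chunk count from the shell, the same way as before:

> db.songs.chunks.count()
27195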
Downloading a File
Congratulations, you've successfully uploaded a file to MongoDB! However, a file sitting in MongoDB isn't particularly useful. To stream the file to your hard drive, to an HTTP response, or to npm modules like speaker, you're going to need a download stream. The easiest way to get a download stream is the openDownloadStreamByName() method.
const bucket = new mongodb.GridFSBucket(db, {
  chunkSizeBytes: 1024,
  bucketName: 'songs'
});

bucket.openDownloadStreamByName('meistersinger.mp3').
  pipe(fs.createWriteStream('./output.mp3')).
  on('error', function(error) {
    assert.ifError(error);
  }).
  on('finish', function() { // write streams emit 'finish', not 'end'
    console.log('done!');
    process.exit(0);
  });
Now you have an output.mp3 file that's a copy of the original meistersinger.mp3 file. The download stream also enables you to do some neat tricks. For instance, you can cut off the beginning of the song by specifying a number of bytes to skip. The below code skips the first 41 seconds of the mp3 and jumps right to the good part of the song.
bucket.openDownloadStreamByName('meistersinger.mp3').
  start(1024 * 1585). // <-- skip the first 1585 KB, approximately 41 seconds
  pipe(fs.createWriteStream('./output.mp3')).
  on('error', function(error) {
    assert.ifError(error);
  }).
  on('finish', function() {
    console.log('done!');
    process.exit(0);
  });
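Piping to an HTTP response, mentioned earlier, works the same way, because an http.ServerResponse is just another writable stream. Here's a minimal sketch, assuming the bucket from above is in scope; the port and Content-Type header are illustrative assumptions:

const http = require('http');

// Sketch: stream the stored mp3 from GridFS straight into each HTTP response.
http.createServer(function(req, res) {
  res.writeHead(200, { 'Content-Type': 'audio/mpeg' });
  bucket.openDownloadStreamByName('meistersinger.mp3').
    on('error', function() {
      res.end(); // abort the response if the download stream fails
    }).
    pipe(res);
}).listen(3000);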
An important performance consideration: the GridFS streaming API can't load partial chunks. When a download stream needs to pull a chunk from MongoDB, it loads the entire chunk into memory. The 255 kilobyte default chunk size is usually sufficient, but you can reduce the chunk size to reduce memory overhead.
Moving On
Congratulations, you've just used MongoDB and Node.js streams to store and manipulate a .mp3 file. With GridFS, you have a file system with all the horizontal scalability features of MongoDB. It also has a stream-based API you can use to pipe() files to and from MongoDB.