Just how bad can it be?

MongoDB has proven very useful in recent years, and the way it integrates with Python makes it an ideal storage tool for Python-based projects. However, I've started to notice that on performance-sensitive projects it can be a little on the slow side. So I blindly implemented my own database to see if it cured the problems I was having, and it did ... but it made me wonder. So here's the very simple benchmark: generate and write 5000 records into a table with three indexes. First, the Mongo version;

        for index, session in enumerate(range(start, start+count)):
            record = {
                'day': int(random()*6),
                'hour': int(random()*24),
                'when': time(),
                'origin': 'linux.co.uk',
                'sid': start+count-index}
            db.insert(record)

Versus the home-grown Mongo clone sitting on LMDB;

        with self.db._env.begin(write=True) as txn:
            for index, session in enumerate(range(start, start+count)):
                record = {
                    'day': int(random()*6),
                    'hour': int(random()*24),
                    'when': time(),
                    'origin': 'linux.co.uk',
                    'sid': start+count-index
                }
                self.db.put(record, txn=txn)

And the library is live and running in my development code, so I know it really is writing the data and indexes correctly ...
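
The library itself isn't published here, but for context, this is a minimal sketch of the sort of thing it does underneath: one LMDB write transaction covering the primary record plus three hand-maintained secondary indexes. It uses the standard "lmdb" Python binding, but the sub-database layout and helper names are illustrative guesses, not the real library's API;

        # a sketch only: the layout and the put() helper below are
        # hypothetical, not the actual library's internals
        import json
        import lmdb

        env = lmdb.open('/tmp/demodb', max_dbs=4, map_size=2**30)
        data = env.open_db(b'data')                        # primary store
        by_origin = env.open_db(b'origin', dupsort=True)   # secondary indexes
        by_day = env.open_db(b'day', dupsort=True)
        by_sid = env.open_db(b'sid', dupsort=True)

        def put(txn, key, record):
            txn.put(key, json.dumps(record).encode(), db=data)
            # each index entry maps a field value back to the primary key
            txn.put(str(record['origin']).encode(), key, db=by_origin)
            txn.put(str(record['day']).encode(), key, db=by_day)
            txn.put(str(record['sid']).encode(), key, db=by_sid)

The telling difference is that single write transaction wrapped around the whole loop: LMDB commits once at the end and never leaves the process, whereas every Mongo insert() is a round-trip to the server.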

Mongo comes up with 2005 cycles per second.
Homespun comes up with 63717 cycles per second.

I'm sure there are ways to make this benchmark closer, but it's "real" in terms of what a programmer might implement (indeed this code is pulled from a unit test case), and both are running default settings with three natural indexes. That's a gap of roughly 32x.

Wow!
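
For anyone who wants to make the comparison closer without touching the server, the obvious lever is batching: pymongo 3.x has insert_many, which ships the documents in bulk messages rather than one round-trip per record. A sketch, reusing db and count from the full listing further down, and not what the numbers above measure;

        from random import random
        from time import time

        records = [{
            'day': int(random()*6),
            'hour': int(random()*24),
            'when': time(),
            'origin': 'linux.co.uk',
            'sid': count-index} for index in range(count)]
        # ordered=False lets the server process the batch without
        # stopping at the first failed document
        db.insert_many(records, ordered=False)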

Update ~~

Ok, just upgraded to MongoDB 3.4.2 and switched from mmapv1 to the WiredTiger storage engine;

> db.serverStatus().storageEngine
{
    "name" : "wiredTiger",
    "supportsCommittedReads" : true,
    "readOnly" : false,
    "persistent" : true
}

On running the benchmark now, we've gone from 2005 per second down to 1924 per second ... (!)

If anyone has a faster Mongo server and wants to try the difference, the full code I'm running is below (it assumes a mongod on the default localhost:27017);

        from pymongo import MongoClient
        from random import random
        from time import time

        connection = MongoClient()
        db = connection.demodb.testtable
        db.drop()
        db.create_index("origin")
        # NB: the records below carry 'sid' rather than 'session', so this
        # index is built over a field that is absent (indexed as null)
        db.create_index("session")
        db.create_index("day")
        start = 0
        count = 5000
        begin = time()
        for index, session in enumerate(range(start, start+count)):
            record = {
                'day': int(random()*6),
                'hour': int(random()*24),
                'when': time(),
                'origin': 'linux.co.uk',
                'sid': start+count-index}
            db.insert(record)
        finish = time()
        speed = 5000/(finish-begin)
        print("Mongo Create Speed/sec=", speed)

Update ~~

From Twitter;

@garethbult I blame the Python driver. Surely that has a part to play in any performance comparison? Try a test using Node.

Fair comment, although there are reasons why I work in Python and not Node: partly familiarity, but mostly maturity and stability. That aside, I've ported the test to NodeJS, and here it is;

var MongoClient = require('mongodb').MongoClient, assert = require('assert');

MongoClient.connect("mongodb://localhost:27017/demodb", function(err, db) {  
  if(!err) {
    var testtable = db.collection('testtable');
    testtable.drop();
    testtable.createIndex("origin", {'w':1}, function(err, indexName) { 
        assert.equal("origin_1", indexName);
    });
    testtable.createIndex("session", {'w':1}, function(err, indexName) { 
        assert.equal("session_1", indexName);
    });
    testtable.createIndex("day", {'w':1}, function(err, indexName) { 
        assert.equal("day_1", indexName);
    });
    var start=0, count=10000, now = new Date();
    for(index=start; index<count; index++) {
        record = {
            day: parseInt(Math.random()*6),
            hour: parseInt(Math.random()*24),
            when: now,
            origin: 'linux.co.uk',
            sid: start+count-index
        }
        testtable.insert(record, {'w':1}, function(err, result) {
            assert.equal(result['result']['ok'],1);
        });
    }
    finish = new Date();
    seconds = (finish-now)/1000;
    console.log("Requests per second = ", parseInt((1/seconds) * count));
    process.exit();
  }
});

Results for NodeJS;
$ node test.js
Requests per second =  10718  

This is actually MUCH better than Python / Mongo, but still around 1/6th the speed of the homespun version. There is an additional problem in that there is something wrong with my JS code; bonus points to whoever can point it out (I've no idea what it is atm). Symptoms are that if I try to test with count=500000, the process runs out of memory and the machine goes into load average meltdown.

