Oy! Pack it in!
Computers are never fast enough, Python is never fast enough, but there's always scope for improvement ...
So a couple of years ago while playing around with micro-services, I came across a requirement for a local, fast, key-value based database that could handle a few more advanced features like automatic secondary indexes and compound keys. The net result of this was PYNNDB.
Now as far as it goes this is fine: everything is stored on disk as JSON, so the programmer essentially reads and writes Python dicts, while at the same time having access to a feature-set that approximates to something like "dbase".
One remaining issue however is speed. If we're doing a linear seek through a table, the speed limit is still in the low hundreds of thousands of rows per second, primarily because once a row has been read, you need to convert the serialised JSON data into a Python "dict" (via "json.loads") before Python can use it. This can be mitigated a little by using the likes of "ujson", however there is still the fundamental issue that converting JSON to a Python "dict" is a time-consuming process, not least because it creates a bunch of Python objects, an operation that in itself becomes significant once you start to repeat it millions of times per second.
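To put a rough number on that de-serialisation cost, here's a minimal stand-alone timing sketch using only the standard library ("ujson.loads" would be a drop-in replacement for "json.loads" here; the record and iteration count are arbitrary):

```python
import json
import time

# A small serialised record, decoded fresh on every iteration.
record = json.dumps({'name': 'Fred Bloggs', 'age': 21})

beg = time.time()
n = 100000
total = 0
for _ in range(n):
    row = json.loads(record)  # each call builds a new dict plus all its values
    total += row['age']
elapsed = time.time() - beg
print(f'{n / elapsed / 1000000:.2f}M loads/sec')
```

Even for a two-field record, every iteration pays for a full parse and a fresh set of Python objects, which is exactly the overhead we'd like to avoid.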
Enter ZLMDB. This is kind of cool and runs along the same lines as PYNNDB, but has the ability to use flatbuffers to serialise data. The benefit here is that you can read data out of a "flatbuffers" record without de-serialising the entire record, i.e. you can read out just the data you need, which makes it "much" faster. The serious downside is that flatbuffers requires a compiled schema, so it would not typically be classified as a NOSQL implementation.
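The principle can be sketched with the standard "struct" module (this is not the actual flatbuffers wire format, just the underlying idea): a compiled schema boils down to knowing each field's offset and type up front, so a single field can be pulled straight out of the raw buffer:

```python
import struct

# A hypothetical 'compiled schema': field name -> (offset, struct format).
# 'name' is a fixed 16-byte string at offset 0, 'age' an int32 at offset 16.
SCHEMA = {'name': (0, '16s'), 'age': (16, '<i')}

buf = struct.pack('<16si', b'Fred Bloggs'.ljust(16, b'\0'), 21)

def get(buf, field):
    offset, fmt = SCHEMA[field]
    # Decode just this field; the rest of the buffer is never touched.
    return struct.unpack_from(fmt, buf, offset)[0]

print(get(buf, 'age'))
```

The catch is that SCHEMA has to exist before any data can be read, which is exactly the compiled-schema constraint described above.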
So .. the solution would seem to be: we want to be able to read from a serialised structure, but at the same time we need the structure to honour schema-less semantics so that we still adhere to NOSQL principles. Turns out that not only is this possible, it's also pretty quick; indeed (although it's not "finished") it's currently looking a little faster than the compiled "flatbuffers" solution.
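To show that schema-less lazy reads are at least possible, here's a toy sketch (integer values only, and nothing like the real gpack encoding): each field carries its own key in the buffer, so a reader can walk along it and decode only the field it was asked for:

```python
import struct

def pack(d):
    """Toy schema-less encoding: for each key, store the key length,
    the key bytes, then an 8-byte little-endian integer value."""
    out = bytearray()
    for key, value in d.items():
        k = key.encode()
        out += struct.pack('<B', len(k)) + k + struct.pack('<q', value)
    return bytes(out)

def read_field(buf, name):
    """Scan the buffer, decoding only the requested field;
    everything else is skipped over without creating Python objects."""
    want = name.encode()
    pos = 0
    while pos < len(buf):
        klen = buf[pos]
        key = buf[pos + 1:pos + 1 + klen]
        pos += 1 + klen
        if key == want:
            return struct.unpack_from('<q', buf, pos)[0]
        pos += 8
    raise KeyError(name)

buf = pack({'age': 21, 'shoe_size': 9})
print(read_field(buf, 'age'))
```

No schema is compiled anywhere: the keys travel with the data, yet a lookup still decodes only one value. (A linear key scan is obviously naive; this is presumably where something like a hash lookup table earns its keep.)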
Raw Speed
First thing to establish is: if we create a structure to effectively replace the "dict" we would have received from "json.loads", how much slower is it going to be? Currently I have a module called "gpack" that implements three different classes:
# GHASHTABLE - a C-extension that implements a fast C-level hash lookup table
# GCODEC - converts a Python dict into our new packed format
# GOBJECT - a wrapper around our packed data
JSON = {
    'name': 'Fred Bloggs',
    'age': 21
}
htab = gpack.GHASHTABLE()
codec = gpack.GCODEC(htab)
assert codec.encode(JSON) is True
obj = gpack.GOBJECT(htab)
obj._setbuffer(codec.buffer)
print(obj.name, obj.age)
=> Fred Bloggs 21
So typically, in a database scenario looping through rows in a table, we would read a row, use "_setbuffer" to point at the data read from the row, and "obj" would then be the top-level object through which we could access the attributes within the row. So, removing all the encoding, decoding and database from the equation, how does our new class compare to a raw dict in terms of speed:
JSON = {
    'name': 'Fred Bloggs',
    'age': 21,
    'mylist': ["one", "two", "three", "four", 5],
    'mydict': {
        'a': 1,
        'b': 2,
        'c': {
            'nested': 'this is a long string',
            'num': 9321,
            'list': [{'a': 1}, {'b': 2, 'c': 3}]
        }
    },
    'amt': 1.23,
    'end': '****',
}
htab = gpack.GHASHTABLE()
codec = gpack.GCODEC(htab)
assert codec.encode(JSON) is True
obj = gpack.GOBJECT(htab)
obj._setbuffer(codec.buffer)
beg = time.time()
total = 0
max = 1000000
for x in range(max):
    total += obj.age
end = time.time()
rate1 = max/(end-beg)/1000000
print(f'Time1: "{end-beg:2.6}" , Cycles: {rate1:2.02}M/sec')
encoded = ujson.dumps(JSON)
beg = time.time()
total = 0
max = 1000000
obj = ujson.loads(encoded)
for x in range(max):
    total += obj['age']
end = time.time()
rate2 = max/(end-beg)/1000000
print(f'Time2: "{end-beg:2.6}" , Cycles: {rate2:2.02}M/sec')
print(f'Compare RAW dict speed: {round(rate1/rate2,2)}x')
Gives us:
Time1: "0.146585" , Cycles: 6.8M/sec
Time2: "0.145299" , Cycles: 6.9M/sec
Compare RAW dict speed: 0.99x
So for our test case, GOBJECT (which is reading data directly from within a serialised object) is coming in at pretty much the same speed as a raw Python dictionary .. so we're not really losing anything speed-wise by substituting our new custom object for dicts.
Compared to JSON "loads"
So now, what happens when we emulate what's going on when we read from a database? With JSON encoding we need to "loads" the buffer to de-serialise it before we can access it, but with GOBJECT we can read data directly from the serialised structure. (Note we're doing 10x fewer iterations for the JSON loop just to speed things up a little.)
beg = time.time()
total = 0
max = 1000000
for x in range(max):
    obj._setbuffer(codec.buffer)
    total += obj.age
end = time.time()
rate1 = max/(end-beg)/1000000
print(f'Time1: "{end-beg:2.6}" , Cycles: {rate1:2.02}M/sec')
encoded = ujson.dumps(JSON)
beg = time.time()
total = 0
max = 100000
for x in range(max):
    obj = ujson.loads(encoded)
    total += obj['age']
end = time.time()
rate2 = max/(end-beg)/1000000
print(f'Time2: "{end-beg:2.6}" , Cycles: {rate2:2.02}M/sec')
print(f'Best Speed increase: {round(rate1/rate2,2)}x')
Gives us:
Time1: "0.420047" , Cycles: 2.4M/sec
Time2: "0.335924" , Cycles: 0.3M/sec
Best Speed increase: 8.0x
So, allowing for the fact that GOBJECT is still being tweaked, there's already an 8x performance boost in there .. which is pretty substantial. By its very nature GOBJECT is mutable rather than immutable, which means you can modify the attributes within the structure without creating a new Python object, and that carries some huge performance benefits when you need to update your data.
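The underlying idea, that a mutable serialised buffer lets you patch a value in place without rebuilding any objects, can be sketched with the standard "struct" module (a toy illustration, not the gpack format):

```python
import struct

# One serialised int32 field ('age', say) held in a mutable buffer.
buf = bytearray(struct.pack('<i', 21))

# Patch the value in place: no new record, no re-serialisation.
struct.pack_into('<i', buf, 0, 22)

print(struct.unpack_from('<i', buf)[0])
```

Contrast this with the JSON route, where any update means decoding the whole record into a dict, changing it, and re-encoding the lot.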
Next steps ...
It'll be interesting to see how fast PYNNDB gets once it too is coded as a C-Extension. So far it's looking like, for the above test (i.e. sum(age)), a scan rate of around 10M rows per second (on my test machine) is potentially achievable.