OMG an ORM for NoSQL?!

OMG an ORM for NoSQL?!

My previous contract involved using SQLAlchemy with Postgres, neither product was something I'd spent much time on previously, so using them in anger was quite enlightening. I can sort of see why people like to use ORM's, especially if you're coming from a background where they are the norm (Java?), but in the same vein I can see why so many projects suffer from massive IO bottlenecks that DBA's find so hard to fix.

Having previously written a little Database system in Python that sort of addresses performance problems I've had on other projects, I vaguely wondered if the same principle could be applied (to that database) to address the performance (and other) issues I found when using SQLAlchemy.

Although it doesn't have all the features of Alchemy and there are probably a number of edge cases I've not exercised yet, there is a working implementation now integrated into Pymamba;
https://oddjobz.github.io/pymamba/README-ORM.html

In a nutshell, it encapsulates the database structure and removes responsibility for inter-table references (and eventually referential integrity) from the programmer, and adds calculated or virtual field functionality at application level, rather than inside the database's DDL (if it has one).

What does it look like?

As an example I'm going to implement an address book solution based around four interlinked tables. We can have multiple address books, each book can contain entries for businesses and people, businesses can be related to multiple people, and each person can be associated with multiple businesses. In addition, businesses and people can be associated with multiple addresses, and each address can potentially be used by multiple people or businesses. Bit of a nightmare if you're coding this up by hand?

I'm going to leave out the table definitions for a second as these are detailed in addressbook.py in the source repo, and go straight to the code;

db = Database('databases/contacts_db', size=size_mb(10))
tbl_address_book = AddressBookModel(table=db.table('address_book'))
tbl_business = BusinessModel(table=db.table('business'))
tbl_person = PersonModel(table=db.table('person'))
tbl_address = AddressModel(table=db.table('address'))
address_book_business = ManyToMany(db, tbl_address_book, tbl_business)
address_book_person = ManyToMany(db, tbl_address_book, tbl_person)
business_person = ManyToMany(db, tbl_business, tbl_person)
business_address = ManyToMany(db, tbl_business, tbl_address)
person_address = ManyToMany(db, tbl_person, tbl_address)

What that's done is to create all the infrastructure we need (including the tables) for the ORM to do it's thing. All we need to do now is to add some data, first we need to add an address book;

book = tbl_address_book.add({'name': 'Personal Address Book'})

Once we have an anchor (i.e. a python variable) we can start filling out some of the data in the associated tables;

book.business.append({'name': "Fred's Business"}) \
 .person.append({'name': "Fred Bloggs"}) \
 .address.append({'name': "Fred's House", 'postcode': "FRED 001"})
book.save()
book.business.append({'name': "John's Business"}) \
 .person.append({'name': "John Smith"}) \
 .address.append({'name': "John's House", 'postcode': "JOHN 001"})
book.save()

Will have created an address book, two contacts, two businesses, two addresses, and linked them together as implied above. Not that nothing was saved to the database until we arrived at save(). Just to check we can add a little bit of print code;

tbl_address_book.list()
+--------------------------------+
| name                           | 
+--------------------------------+
| Personal Address Book          | 
+--------------------------------+
tbl_business.list()
+--------------------------------+
| name                           | 
+--------------------------------+
| Fred's Business                | 
| John's Business                | 
+--------------------------------+
tbl_person.list()
+--------------------------------+
| name                           | 
+--------------------------------+
| Fred Bloggs                    | 
| John Smith                     | 
+--------------------------------+
tbl_address.list()
+--------------------------------+-----------------+
| name                           | postcode        | 
+--------------------------------+-----------------+
| Fred's House                   | FRED 001        | 
| John's House                   | JOHN 001        | 
+--------------------------------+-----------------+

So if we revisit the record at a later date, getting hold of all these details involves very little work. Once we read in a record, we immediately have access to all linked information, all of which is available on a lazy-load basis, so if we don't try to access linked information, the ORM won't waste resource by pre-fetching it.

So let's say we grab an address book and want to list all the post-codes in the address book. To do that we're going to need to read the address book businesses, and for each business read all the people, then for all the people read all the addresses, then for each linked address, print the postcode. Consider;

print(book.business[0].person[0].address[0].postcode)
FRED 001

Address book has a virtual field called business that maps to all connected businesses and presents them as a list. Each business has a virtual field called people, which presents all people linked to the business as a list, then people has virtual field called address that presents all addresses linked to the person as a list. The easy way to extract a postcode list would be with a nested list comprehension;

print([address.postcode
       for book in tbl_address_book.find()
       for business in book.business
       for person in business.person
       for address in person.address
       ])
['FRED 001', 'JOHN 001']

Conversely, if we wanted to find out what address book(s) a given postcode existed in, we could do this;

print([book.name
       for address in tbl_address.find(expression=lambda doc: doc['postcode'] == 'JOHN 001')
       for person in address.person
       for business in person.business
       for book in business.address_book
       ])
['Personal Address Book']

But wait (!) it gets better, let's say we want to prefix the name of all addresses in a specific address book with an asterisk if they begin with 'F'. Ok, maybe a little contrived, but;

def fun(address):
    if address.postcode[0] == 'F':
        address.postcode = '*'+address.postcode
        address.save()

[fun(address)
    for book in tbl_address_book.find()
    for business in book.business
    for person in business.person
    for address in person.address
]
=> results in ..
['*FRED 001', 'JOHN 001']

Anyway, it's still new and I've spotted a bug while I've been playing, but it does have the potential to lend relationship management to NoSQL as a native function. Next on the list is to eradicate versioning issues and the need to use tools like alembic :-)