Life in chains

Back in the '80s I was introduced to Unix pipes and was amazed by what could be accomplished by joining together a sequence of relatively simple commands. More recently I've come across a similar mechanism in jQuery for chaining together a list of selectors to produce a very complex operation with a very concise syntax.

Sadly, nothing quite like this seems to exist for Python. I've seen many people asking the question, but no really definitive solutions. After experimenting extensively with generators recently, I wondered if they could be applied to fill this void. Turns out it's easier than I'd imagined!

If you're bored already, the code is here;

The Pipe

To make a pipeline there are essentially two components: firstly, something to put together a number of pipeline stages, and secondly, something to exercise the assembled pipe.
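
A minimal sketch of those two components, assuming each stage is a generator function that takes one item and yields zero or more items. The class name `SketchPipe` and the `items` and `map` stages are hypothetical and exist only for illustration; `_add_stage` and the trailing call mirror names used in this post, but the real implementation may well differ.

```python
class SketchPipe:
    """Hypothetical stand-in for the pipeline class, not the real code."""

    def __init__(self):
        self._stages = []  # handlers, in the order they were chained

    def _add_stage(self, handler):
        # Component 1: assembly. Returning self is what allows chaining.
        self._stages.append(handler)
        return self

    def items(self, iterable):
        # Hypothetical source stage: ignores its input, emits each element.
        def handler(item=None):
            yield from iterable
        return self._add_stage(handler)

    def map(self, func):
        # Hypothetical transform stage: emits func(item) for each item.
        def handler(item=None):
            yield func(item)
        return self._add_stage(handler)

    def _feed(self, index, item):
        # Push one item through stage `index` and every stage after it.
        if index == len(self._stages):
            yield item
            return
        for result in self._stages[index](item):
            yield from self._feed(index + 1, result)

    def __call__(self):
        # Component 2: execution. Nothing runs until the pipe is called.
        # The first stage is fed a single None, matching item=None above.
        return list(self._feed(0, None))


result = SketchPipe().items(['a', 'b']).map(str.upper)()
print(result)
```

Returning a list from the call is a choice made for this sketch; a real pipe could just as easily stay lazy and hand back a generator.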

To give a simple example, we want a pipeline stage called filter to limit what actually makes it through the pipe. If we assume that filter will take in all items in the pipeline thus far, but only pass on those that meet the filter criteria, the filter routine looks like this;

    def filter(self, expression):
        def handler(item=None):
            # Yield nothing for items that fail the test, dropping them from the pipe.
            if not expression(item): return
            yield item
        return self._add_stage(handler)

For the sake of this example, if we look at another stage "cat" which should provide us with some input from a file, it's going to look like this;

    def cat(self, filename):
        def handler(item=None):
            with open(filename) as handle:
                while True:
                    line = handle.readline()
                    if not line: return
                    yield line.rstrip('\r\n')
        return self._add_stage(handler)

You will notice that (a) there is a very standard form to these routines and (b) they look very easy to implement. Anyway, if we assume there is an equally simple routine called print (there is) then when we use the pipeline we will be able to do;

>>> pyipe().cat('/etc/lsb-release').filter(lambda s:'CODE' in s).print()()

If you're not a fan of lambda functions, then you could equally write;

def is_in(s):
    return 'CODE' in s

>>> pyipe().cat('/etc/lsb-release').filter(is_in).print()()

Obviously the lambda is more concise and more flexible. So for demonstration purposes I've written a small collection of POSIX shell style components (filter, print, count, limit, cat, sort, uniq, cut) which can be chained together in any combination, much like Unix commands from the bash shell.
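
To illustrate how components like these might follow the same handler pattern, here is a hedged sketch of `limit` and `uniq`. The `MiniPipe` base class and the `items` source stage are hypothetical scaffolding so the example runs on its own, and the stage bodies are my guess at the behaviour, not the code from the repo.

```python
class MiniPipe:
    """Hypothetical base class, just enough to run the stages below."""

    def __init__(self):
        self._stages = []

    def _add_stage(self, handler):
        self._stages.append(handler)
        return self

    def _feed(self, index, item):
        # Push one item through stage `index` and every stage after it.
        if index == len(self._stages):
            yield item
            return
        for result in self._stages[index](item):
            yield from self._feed(index + 1, result)

    def __call__(self):
        return list(self._feed(0, None))

    def items(self, iterable):
        # Hypothetical source stage standing in for cat.
        def handler(item=None):
            yield from iterable
        return self._add_stage(handler)

    def limit(self, count):
        # Pass on the first `count` items and drop the rest.
        # State lives in the closure, so this sketch is single-use.
        seen = 0
        def handler(item=None):
            nonlocal seen
            if seen >= count: return
            seen += 1
            yield item
        return self._add_stage(handler)

    def uniq(self):
        # Like the shell's uniq: drop an item equal to the one before it.
        previous = object()  # sentinel that compares equal to nothing else
        def handler(item=None):
            nonlocal previous
            if item == previous: return
            previous = item
            yield item
        return self._add_stage(handler)


lines = ['a', 'a', 'b', 'b', 'c']
print(MiniPipe().items(lines).uniq().limit(2)())
```

Note that in this per-item model `limit` can only discard surplus items; it does not stop the upstream stages from producing them.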

Database components

It's easy to extend the base and add your own. I have a couple of other modules I will be adding to the repo soon: one to read/write from Mongo, and another for LMDB. Why is this useful? Well, just for kick-off, I'm currently converting some Mongo data to work from a (much faster) LMDB setup, so to transfer a Mongo collection to LMDB I can do;


Or if I just want "some" data I can do something like;


And if I want to check the results in the target database I can do;

pyipe().lmdb('demodb/testtable').filter(lambda doc: doc['age']==51).print()()

With enough good components, the database(s) in question become far less relevant, and moving information around directly from the Python shell becomes a bit of a snip ... (more examples / outputs on the GitHub page)