Just as a reminder to myself: I'm going to start posting useful microservice snippets so that, when I figure out how I'm going to add templates and libraries, I have a stock of material ready to add. In this instance, I've previously been calling cron (the Unix tool) for batch processing, writing a stand-alone CLI tool to actually carry out the operation.

It always seemed very inefficient, but it was quick and easy.

On reflection, however, I've come up with a better standard mechanism that lets a microservice run batch jobs as part of its default event loop. It looks like this: first, initialise a batch instance with something like this in the post_init for your microservice;

"""microservice initialisation routine"""
def function(self, details):
    """this is just a cut down skeleton"""
    self.cron_install("0 0 * * *", self.cron_midnight)
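
If you have more than one job, it's just one extra line per registration. The extra schedules and routine names below (everything other than cron_midnight) are purely hypothetical, just to show the shape;

"""microservice initialisation routine, several jobs"""
def function(self, details):
    """again a cut down skeleton"""
    self.cron_install("0 0 * * *", self.cron_midnight)    # daily at midnight
    self.cron_install("*/15 * * * *", self.cron_refresh)  # every 15 minutes
    self.cron_install("0 9 * * 1", self.cron_report)      # 09:00 each Monday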

Then we create a utility routine in private called cron_install, which just creates a closure to encompass the repeating process;

"""install a cron job"""
from twisted.internet.defer import inlineCallbacks
from twisted.internet import reactor
from crontab import CronTab
def function(self, spec, routine):
    """install a cronjob"""
    cronjob = CronTab(spec)

    @inlineCallbacks
    def runner():
        """run the job"""
        print('>> Running CRON job "{}"'.format(spec))
        yield routine()
        print('>> CRON job complete')
        reactor.callLater(cronjob.next(), runner)
    reactor.callLater(cronjob.next(), runner)
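
One thing worth knowing: if routine() raises, the exception aborts runner before it reaches the callLater, so that job won't be rescheduled. If you'd like to try the pattern outside of a microservice, the same mechanism works as a plain Twisted script; here's a minimal sketch, where cron_install mirrors the routine above and every_minute is just an illustrative stand-in job;

"""standalone sketch of the same scheduling pattern"""
from crontab import CronTab
from twisted.internet import reactor
from twisted.internet.defer import inlineCallbacks
from twisted.internet.task import deferLater

def cron_install(spec, routine):
    """schedule 'routine' to run per cron 'spec' on the reactor"""
    cronjob = CronTab(spec)

    @inlineCallbacks
    def runner():
        print('>> Running CRON job "{}"'.format(spec))
        yield routine()
        print('>> CRON job complete')
        reactor.callLater(cronjob.next(), runner)
    reactor.callLater(cronjob.next(), runner)

@inlineCallbacks
def every_minute():
    """stand-in job: pretend to do a second's worth of work"""
    yield deferLater(reactor, 1, lambda: None)

cron_install('* * * * *', every_minute)
reactor.run()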

And in this particular instance we need a routine in private called cron_midnight, which is the job itself. In this case I've included an actual job, just to show it has the scope to do some real work.

"""CRON job to run at midnight"""
from twisted.internet.defer import inlineCallbacks
@inlineCallbacks
def function(self):
    """midnight script"""
    myself = yield self.db.aggregators.find_one({'uid': self.uid})
    if not myself:
        print('%% aggregator not configured "{}"'.format(self.uid))
        return
    for origin in myself.get('origins', {}):
        yield self.publish('pub.log.'+origin, {'type': 'reset'})

Very simple: if you have a bunch of jobs, all you need is one line per job in post_init to reflect the schedule and the routine to run, and you're away. If the jobs are a little on the heavy side, or you don't want them to conflict with other running code, just add a new microservice and run them in a dedicated context. The module I'm using is the crontab parser module, available via pip, which can be installed with;

python3 -m pip install crontab

(not to be confused with python-crontab, which is a different module for reading and writing actual system crontab files)
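
As a quick sanity check that you've got the right one: this crontab module just parses a spec and tells you how many seconds until it next matches, which is exactly what the scheduler above relies on. The spec below is arbitrary;

"""sanity check for the 'crontab' parser module"""
from crontab import CronTab

entry = CronTab('0 0 * * *')  # midnight, every day
print(entry.next())           # float: seconds until the spec next matches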