Posts Tagged ‘redis’

The Architecture of a New Project

Wednesday, January 11th, 2012

Yesterday I started working with Ajax Push, wrote a quick demo for a friend, then stripped that down and wrote a functional demo project with documentation. I did this to test whether Ajax Push worked well enough for another concept project. As it turns out, APE (Ajax Push Engine) does work, but it leaves a little to be desired.

While I was working with APE and tweaking the documentation and demo, a problem I had faced a few weeks back popped into my mind. Ajax Push was perfect for this application: it was all server push rather than client communication, and the concept would work wonderfully.

What now?

We’re faced with a few dilemmas. This problem is 99% Ajax/long polling and 1% frontend. Android and iOS apps need to be developed to interface with the system, but that is the simple part of the project.


At first I considered Python/Pyramid as the frontend, Varnish for caching content, and APE for handling the Ajax Push/long polling. I’ll need to write an API to handle the Android and iOS apps authenticating and communicating with the system. I suspect my app will become an OAuth2 endpoint for the apps, which I’ll explain in a moment.

It was at this point that I realized I could use node.js to handle the long polling, and since the frontend requirements are so lightweight, I could do most of the web app in node.js as well. Since I’ll be using node.js quite heavily, I’ll probably use Redis and CouchDB for storage, just in case.


Now, I had an epiphany. While I don’t really intend to open the API for the project initially, there’s a certain logic to making your own project use the same API that you will later make public. If anything, it makes designing the iOS and Android apps easier, since they use an API rather than relying on separate methods for communicating with the web app. That gives one interface rather than two, and if Windows Mobile gets an app later, the API is already designed. Since we’re an OAuth2 endpoint, our mobile apps can take advantage of numerous existing libraries, saving quite a bit of time.

Later, if the API is made public, we’re not facing a new engineering challenge and we’ve had some first-hand experience with the API.

Recently there has been a lot of discussion about using ‘the right tool for the job’ and why that is wrong. ‘Use the same language for every part of the project’ is the other school of thought. There are things I know Python does well, and there are things I know it doesn’t do well. There are things Erlang can handle, and things it shouldn’t. While I’m not a fan of JavaScript, for this project it really does seem like the right tool for the job. The difference between APE and node.js was SpiderMonkey versus V8. In both cases I’m writing JavaScript, so why not choose the option that has a much larger installed base and a demo with a use case very similar to my final app?

Now what?

While I’ve not used node.js, I’m expecting the next few days to be a rapid iteration of development and testing.

…and I’ll be using git. :)

git init

Using Redis (or Memcached) as a buffer for SQL resulting in near-realtime stats

Saturday, October 23rd, 2010

While I’ve used memcached for a number of things where MySQL’s query cache just isn’t quick enough, a plain key-value store without set operations such as unions didn’t work for this particular project. While it would have been easy enough to run memcached alongside Redis, running two software stacks to solve the same problem wasn’t appealing.

What I’ve come up with will work for either memcached or Redis and the theory is simple:

Create a unique key for your counter, increment the key, and store the key in a set. Have a separate process iterate through the set, write the summarized data to your database, and reinsert the key into the set if it is for the current hour.

Using r as our Redis object, the pseudocode looks like:

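    import time

    # r is assumed to be a redis-py client created with decode_responses=True,
    # so keys and values come back as str rather than bytes.
    dayhour_key = time.strftime('%Y%m%d%H', time.localtime())
    r.sinterstore('processlog', ['log'])    # snapshot the 'log' set into 'processlog'
    numitems = r.scard('processlog')        # number of items in our working set 'processlog'

    for _ in range(numitems):
        logkey = r.spop('processlog')       # grab an item from 'processlog' and delete it from that set
        (prefix, table, dayhour) = logkey.split(':')    # keys look like stat:id:datehour
        count = int(r.get(logkey) or 0)     # get the count from our key (0 if the key is missing)
        if count == 0:
            # nothing to write (leftover from a same-hour decrement);
            # delete the key once its hour has passed
            if dayhour < dayhour_key:
                r.srem('log', logkey)
                r.delete(logkey)
        elif dayhour < dayhour_key:
            # do our atomic update/insert incrementing table by count
            r.srem('log', logkey)
            r.delete(logkey)                # delete our key, it is not from the current hour
        else:
            # do our atomic update/insert incrementing table by count
            # we are processing the current hour, so we decrement by count rather than
            # delete, in case another process modified the value while we were working
            r.decrby(logkey, count)         # decrement the key by count
            r.sadd('log', logkey)           # keep the key in our set for the next pass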

The concept is to use a key that is as granular as the data you want to keep. In this case we append a datehour stamp of yyyymmddHH (year, month, day, hour) to our unique id and end up with a key of stat:id:datehour. We use stat: to signify that the entry is for statistics. For Zone 1 we end up with a key of stat:1:2010102314 (assuming 2pm), which is incremented and added to our 'log' set. When our log daemon processes a key from the current hour, it decrements the key by the count and re-adds it to the set. When it processes a key from a past hour, we know the key cannot receive any more updates, so we are free to delete it. Keys from before the current hour are removed from the set once processed, but any key from the current hour must stay in the set for the next pass.
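The increment side is not shown in the pseudocode. A minimal sketch of what it might look like, assuming `r` is a redis-py client; the helper names `stat_key` and `record_hit` are hypothetical and chosen only for illustration:

```python
import time

def stat_key(zone_id, now=None):
    """Build a key like stat:1:2010102314 for the hour containing `now`."""
    # now=None means "the current time"; it is a parameter only so the
    # function can be exercised with a fixed timestamp.
    datehour = time.strftime('%Y%m%d%H', time.localtime(now))
    return 'stat:%s:%s' % (zone_id, datehour)

def record_hit(r, zone_id, now=None):
    """Count one hit for zone_id and queue its key for the log daemon."""
    key = stat_key(zone_id, now)
    r.incr(key)         # creates the key at 1 if it does not exist yet
    r.sadd('log', key)  # adding an existing member to a set is harmless
    return key
```

Because 'log' is a set, a key that is hit thousands of times in an hour still appears in it only once, so the log daemon does one database write per key per pass.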

We decrement the key by the count in the current hour, rather than deleting it, just in case something else has updated that key while we were working. If the count is 0 when the key is processed in the next hour, our routine has nothing to write, but it still needs to delete the key.
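To see why the decrement matters, here is a small simulation in which a plain dict stands in for Redis (no server involved). The function name `flush_by_decrement` and the `late_hits` parameter are made up for this illustration:

```python
def flush_by_decrement(store, key, late_hits):
    """Simulate one daemon pass over a current-hour key.

    Returns (count written to the database, count left in the store).
    """
    count = store[key]        # the daemon reads the current count
    store[key] += late_hits   # other processes record hits while it works
    # ... the daemon writes `count` to the database here ...
    store[key] -= count       # subtract only what was written
    return count, store[key]

written, remaining = flush_by_decrement({'stat:1:2010102314': 5},
                                        'stat:1:2010102314', 2)
```

Here the daemon writes 5 and the 2 late hits remain in the key for the next pass. Deleting the key instead would have thrown those 2 hits away.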

In preliminary testing, we've saved roughly 280 transactions per second and stats are rarely more than a minute or two behind realtime. It also allowed us to move from daily to hourly statistics. The same theory could be applied to per-minute statistics as well.
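For per-minute statistics, only the timestamp format should need to change. A sketch, with the hypothetical helpers `bucket_stamp` and `is_closed` and an assumed per-minute format. Because every field is zero-padded to a fixed width, comparing stamps as strings orders them chronologically, which is what the `dayhour < dayhour_key` test in the log daemon depends on:

```python
import time

HOURLY = '%Y%m%d%H'         # the format used in this post
PER_MINUTE = '%Y%m%d%H%M'   # assumed format for per-minute buckets

def bucket_stamp(fmt, now=None):
    """Return the stamp of the bucket containing `now` (default: current time)."""
    return time.strftime(fmt, time.localtime(now))

def is_closed(stamp, fmt, now=None):
    """True when `stamp` belongs to a bucket earlier than the current one."""
    return stamp < bucket_stamp(fmt, now)
```

A closed bucket can no longer receive updates, so its key is safe to delete after it has been written out.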
