Using Redis (or Memcached) as a buffer for SQL resulting in near-realtime stats
Saturday, October 23rd, 2010

While I’ve used memcached for a number of things where MySQL’s query cache just isn’t quick enough, the nature of a key-value store without unions didn’t work for this particular project. While it would have been easy enough to run memcached alongside Redis, running two software stacks to solve the same problem wasn’t appealing.
What I’ve come up with will work for either memcached or Redis and the theory is simple:
Create a unique key for your counter, increment the key, and add the key to a set. Have a separate process iterate through the set, write the summarized data to your database, and re-add the key to the set if it is for the current hour.
Using r as our Redis object, the pseudocode looks like:
import time

# r is our Redis connection, e.g. r = redis.Redis(decode_responses=True)

dayhour_key = time.strftime('%Y%m%d%H', time.localtime())

r.sinterstore('processlog', ['log'])    # snapshot our set 'log' into 'processlog'
numitems = r.scard('processlog')        # return number of items in our working set

for loop in range(0, numitems):
    logkey = r.spop('processlog')       # grab an item from the working set and delete it from the set
    (table, dayhour) = logkey.rsplit(':', 1)  # 'stat:1:2010102314' -> ('stat:1', '2010102314')
    count = int(r.get(logkey) or 0)     # get the count from our key

    if count == 0:
        # the count is 0 (leftover from a same-hour decrement), just delete the key
        r.delete(logkey)
    else:
        if dayhour < dayhour_key:
            # do our atomic update/insert, incrementing the SQL row by count
            r.srem('log', logkey)
            r.delete(logkey)            # delete our key, it is not from the current hour
        else:
            # same atomic update/insert, then decrement by count in case another
            # process modified the value while we were working
            r.decrby(logkey, count)     # decrement the key by count
            r.sadd('log', logkey)       # re-add the key to our set for the next run
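The "atomic update/insert" step is whatever upsert your database supports. A minimal sketch using MySQL's INSERT ... ON DUPLICATE KEY UPDATE might look like the following; the flush_to_sql name, the DB-API cursor cur, and the stats table with a unique key on (zone_id, dayhour) are illustrative assumptions, not the production schema:

def flush_to_sql(cur, logkey, count):
    # logkey looks like 'stat:1:2010102314' -> zone 1, hour 2010-10-23 14:00
    prefix, zone_id, dayhour = logkey.split(':')
    cur.execute(
        "INSERT INTO stats (zone_id, dayhour, hits) VALUES (%s, %s, %s)"
        " ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits)",
        (zone_id, dayhour, count))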
The concept is to use a key that is as granular as the data you want to keep. In this case we append a dayhour stamp of yyyymmddHH (year, month, day, hour) to our unique id and end up with a key of stat:id:dayhour. We use the stat: prefix to signify that the entry is for statistics. For Zone 1 we end up with a key of stat:1:2010102314 (assuming 2pm), which is incremented and added to our 'log' set. When our log daemon processes a key from the current hour, we decrement the key by the count and re-add it to our set. When the log daemon processes a key from a past hour, we know it cannot receive any more updates, so we are free to delete it. Since we used pop to remove the item from the working set, any key prior to the current hour drops out automatically, but we need to re-add any key from the current hour.
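The write path is equally small: build the key for the current hour, increment it, and add it to the 'log' set. A minimal sketch of that side, where the record_stat helper and the connection setup are my own illustration rather than the production code:

import time
import redis

r = redis.Redis()   # assumed local Redis instance

def record_stat(r, stat_id):
    # build a key like stat:1:2010102314 for the current hour
    dayhour = time.strftime('%Y%m%d%H', time.localtime())
    logkey = 'stat:%s:%s' % (stat_id, dayhour)
    r.incr(logkey)          # bump this id's counter for the hour
    r.sadd('log', logkey)   # register the key so the log daemon will pick it up

record_stat(r, 1)           # e.g. one hit for Zone 1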
We decrement the key by the count in the current hour just in case something else has updated that key while we were working. If in the next hour the count is 0, our routine skips the database write but still deletes the key.
In preliminary testing, we've saved roughly 280 transactions per second and stats are rarely more than a minute or two behind realtime. It also allowed us to move from daily to hourly statistics. The same theory could be applied to per-minute statistics as well.