Embedded indexing versus Client/Server

For a particular application, I need temporary but persistent storage of some data. That data consists of a key and a payload, and the key can be a duplicate, which is what causes the problem.

DB_File in Perl handles duplicates, and I can delete a key/value pair without too much difficulty. However, file locking is not handled very well by DB_File, which created concurrency issues with the threaded daemon.
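
For reference, here's roughly what that setup looks like (the file name and keys are placeholders, not the daemon's actual code): a BTREE tie with R_DUP allows the duplicate keys, and del_dup() removes one specific key/value pair, but DB_File does no locking of its own, so concurrent access has to be serialized externally (e.g. flock on a separate lock file).

    #!/usr/bin/perl
    # DB_File sketch: BTREE with R_DUP permits duplicate keys, and
    # del_dup() removes a specific key/value pair. DB_File does no
    # file locking itself, so a threaded daemon must serialize access
    # externally. File name and keys are illustrative.
    use strict;
    use warnings;
    use DB_File;
    use Fcntl qw(O_CREAT O_RDWR);

    $DB_BTREE->{'flags'} = R_DUP;            # allow duplicate keys

    my %h;
    my $db = tie %h, 'DB_File', '/tmp/scratch.db',
        O_CREAT | O_RDWR, 0644, $DB_BTREE
        or die "tie failed: $!";

    $h{somekey} = 'payload one';
    $h{somekey} = 'payload two';             # second record, same key

    $db->del_dup('somekey', 'payload one');  # delete that specific pair

    undef $db;
    untie %h;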

SQLite3 had no problem with duplicates, and can be compiled with support for the DELETE FROM ... LIMIT clause to easily remove a single duplicate key. Rather than recompile the packaged SQLite3 in Debian, I made a slight modification to the code on my side so that I could do further testing. Due to a few issues with threading and a potential issue with storing binary data and retrieving it in Perl, I needed to reevaluate.
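
Assuming a build with SQLITE_ENABLE_UPDATE_DELETE_LIMIT, the duplicate-key delete looks something like this via DBI (table and column names are made up for illustration):

    #!/usr/bin/perl
    # SQLite sketch: duplicate keys are just duplicate rows, and with
    # SQLITE_ENABLE_UPDATE_DELETE_LIMIT compiled in, DELETE ... LIMIT 1
    # removes exactly one of them. Names here are illustrative.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:SQLite:dbname=/tmp/scratch.sqlite',
        '', '', { RaiseError => 1 });

    $dbh->do('CREATE TABLE IF NOT EXISTS stash (k TEXT, payload BLOB)');

    my $ins = $dbh->prepare('INSERT INTO stash (k, payload) VALUES (?, ?)');
    $ins->execute('somekey', 'payload one');
    $ins->execute('somekey', 'payload two');

    # this statement only parses if SQLite was built with the
    # LIMIT-on-DELETE option enabled
    $dbh->do('DELETE FROM stash WHERE k = ? LIMIT 1', undef, 'somekey');

On a stock build without that option, the same effect is available with a rowid subquery: DELETE FROM stash WHERE rowid = (SELECT rowid FROM stash WHERE k = ? LIMIT 1).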

BerkeleyDB solves a few problems. It supports concurrency and proper file locking, but a minor limitation is that duplicate keys are not handled well when you want to delete one: a plain delete removes every record stored under that key. It’ll require a rewrite of some functionality to use BerkeleyDB, but I believe that solution will provide the least potential for failures.
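
To be concrete about that limitation (paths and keys again illustrative): with DB_DUP set, db_del() drops every record under the key, so removing one specific key/value pair means positioning a cursor on it first.

    #!/usr/bin/perl
    # BerkeleyDB sketch: with DB_DUP, db_del('somekey') would remove
    # BOTH records below, so deleting just one pair means walking a
    # cursor to the exact key/value pair and deleting there.
    use strict;
    use warnings;
    use BerkeleyDB;

    my $db = BerkeleyDB::Btree->new(
        -Filename => '/tmp/scratch.bdb',
        -Flags    => DB_CREATE,
        -Property => DB_DUP,
    ) or die "cannot open: $BerkeleyDB::Error";

    $db->db_put('somekey', 'payload one');
    $db->db_put('somekey', 'payload two');

    # position a cursor on the exact pair, then delete only that record
    my ($k, $v) = ('somekey', 'payload one');
    my $cursor = $db->db_cursor();
    if ($cursor->c_get($k, $v, DB_GET_BOTH) == 0) {
        $cursor->c_del();
    }
    $cursor->c_close();
    $db->db_close();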

I could have used MySQL, which I am very comfortable with, but the data really only needs to be stored for a few minutes in most cases, and the amount stored is 10-20K at most. With MySQL’s client timeout, I couldn’t really guarantee everything would work every time without writing considerable error checking. While MySQL would handle everything perfectly, it was overkill for the task at hand.

I’m rewriting the DB_File methods to use BerkeleyDB and modifying the saved data slightly to work around the key delete issue; a sketch of the general idea follows.
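
I won’t go into the daemon’s exact change here, but the general idea is to keep every stored key unique so a delete can only ever match one record. Below is a minimal sketch of one such scheme; the NUL separator and counter are assumptions for illustration, not the actual code.

    #!/usr/bin/perl
    # Illustrative sketch only: make each stored key unique by appending
    # a sequence number, so db_del() always matches exactly one record.
    # The separator and counter scheme are assumptions.
    use strict;
    use warnings;
    use BerkeleyDB;

    my $db = BerkeleyDB::Btree->new(
        -Filename => '/tmp/scratch.bdb',
        -Flags    => DB_CREATE,
    ) or die "cannot open: $BerkeleyDB::Error";

    my $seq = 0;
    sub put_dup {
        my ($key, $payload) = @_;
        my $unique = $key . "\x00" . ++$seq;   # "somekey\0<n>" sorts with its siblings
        $db->db_put($unique, $payload);
        return $unique;                        # caller keeps this for deletion
    }

    my $first  = put_dup('somekey', 'payload one');
    my $second = put_dup('somekey', 'payload two');

    # deleting one logical duplicate is now an unambiguous single-key delete
    $db->db_del($first);
    $db->db_close();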

It should work and should raise the reliability of this process from 99.2% to 99.9%, which will be a considerable improvement.
