Posts Tagged ‘file_db’

From File_DB to BerkeleyDB

Thursday, April 2nd, 2009

File_DB lacked decent file locking and concurrence.  I wasn’t really willing to move to MySQL which would have solved the problem, but, added a few minor inconveniences along the way.  I needed to store a few thousand bytes for a number of seconds.  While File_DB was wrapped with file locking and assisted by my own lock routine, it lacked truly concurrent access which I felt was leading to some of the issues we were seeing.

However, after a relatively painless conversion from File_DB to BerkeleyDB, it did not solve the problem completely.  The error I was addressing is now much harder to get to occur in normal use, but, I am able to reproduce it with a small test script.

The documentation for the perl methods to access BerkeleyDB are a bit sparse for setting up CDB, but, after digging through the documentation, and a few examples on the net, I ended up with some code that did indeed work consistently.

Since CDB isn’t documented very well, I ended up with the following script to test file locking and ensure things worked as expected.


use Data::Dumper;
use BerkeleyDB;

my %hash;
my $filename = "filt.db";
unlink $filename;

my $env = new BerkeleyDB::Env

my $db = tie %hash, 'BerkeleyDB::Hash',
  -Filename   => $filename,
  -Flags        => DB_CREATE,
  -Env        => $env
or die "Cannot open $filename: $!\n" ;

my $lock = $db->cds_lock();

$hash{"abc"} = "def" ;
my $a = $hash{"ABC"} ;
# ...

print Dumper(%hash);
undef $db ;
untie %hash ;

Path issues caused most of the issues as did previous tests not actually clearing out the _db* and filt.db file. One test got CDB working, I modified a few things and didn’t realize I had actually broken CDB creation because the other files were still present. Once I moved the script to another location, it failed to work. A few quick modifications and I was back in business.

Perhaps this will save someone a few minutes of time debugging BerkeleyDB and Perl.


Due to a logic error in the way I handled deletions to work around the fact that BerkeleyDB doesn’t allow you to delete a single record when you have a duplicate key, my code didn’t work properly in production. After diagnosing that and fixing it with a little bit of code, 125 successive tests resulted in 100% completion. I’ve pushed it to a few machines and will monitor it, but, I do believe that BerkeleyDB fixed the issues I was having with File_DB.

Embedded indexing versus Client/Server

Saturday, March 28th, 2009

For a particular application, I require temporary persistent storage of some data.  That data consists of a key value and a payload.  That key value can be a dupe, which is what causes the problem.

File_DB in perl handles duplicates and I can delete a key/value pair without too much difficulty.  However, file locking is not handled very well with File_DB which created concurrency issues with the threaded daemon.

Sqlite3 had no problem with duplicates, and could be compiled with the delete from/limit clause to easily handle duplicate keys.  Rather than recompile the packaged Sqlite3 in Debian, I made a slight modification to the code on my side so that I could do further testing.  Due to a few issues with threading and a potential issue with storing binary data and retrieving it in perl, I needed to reevaluate.

BerkeleyDB solves a few problems.  It supports concurrency, it supports proper file locking, but, a minor limitation is that duplicate keys are not handled well when you want to delete a key.  It’ll require a rewrite of some functionality to use BerkeleyDB, but, I believe that solution will provide the least potential for failures.

I could have use MySQL which I am very comfortable with, but, the storage of the data really only needs to be there for a few minutes in most cases, and the amount of data stored is 10-20K at most.  With MySQL’s client timeout, I couldn’t really guarantee everything would work every time without writing in considerable error checking.  While MySQL would handle everything perfectly, it was overkill for the task at hand.

I’m rewriting the File_DB methods to use BerkeleyDB and modifying the saved data slightly to work around the key delete issue.

It should work and should raise the reliability of this process from 99.2% to 99.9% which will be a considerable improvement.

Multithreaded madness

Monday, March 23rd, 2009

An application I wrote long ago that used File_DB for short-term persistent multithreaded storage had a few issues with concurrency.  I debated rewriting the script to use BerkeleyDB which included proper file locking, but, decided to use Sqlite3 instead as it was closer to SQL and would eliminate a bit of code.

The transition was relatively easy.  Writing self-test functions worked well and a few bugs were dealt with along the way.  Most of the issues were getting used to Sqlite3’s quirks, but, all in all the code worked fine.  Multiple tests with multiple terminal windows resulted in everything working as expected including locking tables, concurrency issues and removing a logic error on the prior application.

First startup of the application resulted in a rather strange result which didn’t make a lot of sense.  I chalked that up to something I had done during testing, deleted the sqlite3 database file and restarted the application.


The application started, set it self as a daemon and detached from the terminal.  I sent a task to the daemon, and bam.  It seemed to work, it complained of a length error in the unpack which meant there was some data that didn’t get retrieved correctly from the database.  A second task was sent and the error received was even stranger.  Trying to connect to sqlite3 through the command line resulted in:

sqlite> select * from tasks;
SQL error: database disk image is malformed

Ok, something broke badly.

I checked and doublechecked my code with perl running in strict mode and could find nothing that would cause this.  It seems that the packaged version of sqlite3 in debian’s packaged perl is not compiled with threading enabled.


I missed that when I was digging through the library configs and will have to build that package for testing.  I did want to avoid using the BerkeleyDB library and move to Sqlite3, but, I think in the interest of finishing this project a little sooner, I will rewrite the code and adjust the locking and use Sqlite3 in the future.

Sqlite3 works very well with SQLAlchemy and TurboGears, but, in this case, it didn’t quite solve the problem that I needed solved.

Entries (RSS) and Comments (RSS).
Cluster host: li