Archive for the ‘Programming Languages’ Category

Why do you use an Object Relational Mapping (ORM) System in Development?

Monday, October 12th, 2009

Here’s a programmer that is saying goodbye to ORMs at Hatful of Hollow.

Another site offers a tutorial of sorts on ORMs: Why should you use an ORM.

While both have their points, both have missed a fundamental benefit that an ORM hands you.

Most of my development is in Pylons. Django’s ORM and template language can do the same thing. A programmer who has used PHP/Smarty to develop large-scale systems will likely resist ORMs. After working with a team to develop 90k+ lines of PHP/Smarty over a six-year period, moving to an ORM required a paradigm shift.

Let’s consider the following structure. We have a cp_ticket table and a cp_ticket_detail table. A Ticket can have multiple detail records. The output we wish to have is:

ticket id, ticket header information
         ticket detail line
         ticket detail line #2
ticket id, ticket header information
         ticket detail line
         ticket detail line #2
         ticket detail line #3
ticket id, ticket header information
         ticket detail line
         ticket detail line #2

Our model:

class cp_ticket(DeclarativeBase):
    __tablename__ = 'cp_ticket'

    ticket_id = Column(mysql.MSBigInteger(20, unsigned = True), primary_key=True, autoincrement = True)
    priority = Column(mysql.MSEnum('1','2','3','4','5'), default = '3')

    ticket_detail = relation('cp_ticket_detail', order_by='cp_ticket_detail.ticket_detail_id')

class cp_ticket_detail(DeclarativeBase):
    __tablename__ = 'cp_ticket_detail'

    ticket_id = Column(mysql.MSBigInteger(20, unsigned = True), ForeignKey('cp_ticket.ticket_id'), default = '0')
    ticket_detail_id = Column(mysql.MSBigInteger(20, unsigned = True), primary_key=True, autoincrement = True)
    stamp = Column(mysql.MSTimeStamp, PassiveDefault('CURRENT_TIMESTAMP'))
    detail = Column(mysql.MSLongText, default = '')

Our query to pass to our template:

        tickets = meta.Session.query(cp_ticket).filter(cp_ticket.client_id==1).all()

Compared with the query as you would write it without an ORM:

select * from cp_ticket,cp_ticket_detail where client_id=1 and cp_ticket.ticket_id=cp_ticket_detail.ticket_id;

Both are doing the same fundamental thing, but the ORM maps the results almost identically to the way we want to display the data. This makes template design easy.

Using Mako, we use the following code to display the results:

<table border="1">
 <tr><th>Ticket ID</th><th>Status</th><th>Detail</th></tr>
%for ticket in tmpl_context.tickets:
  <tr>
    <td><strong>${ticket.ticket_id}</strong></td>
    <td><strong>${ticket.priority}</strong></td>
  </tr>
  %for detail in ticket.ticket_detail:
  <tr>
    <td></td>
    <td>${detail.stamp}</td>
    <td>${detail.detail}</td>
  </tr>
  % endfor
% endfor
</table>

To do the same thing without using an ORM, you need to revert to a control break structure similar to the following:

current_ticket=0
for ticket in tickets:
  if (current_ticket != ticket.ticket_id):
    #new row, print the header
    print "<tr><td>first piece</td></tr>"
    current_ticket = ticket.ticket_id
  # print our detail row
  print "<tr><td></td><td>stamp and detail</td></tr>"

Control Break structures require you to be able to set a variable within your template language. Some template languages don’t allow that. If your template language (in any language) can’t do variable assignments in the template, guess where your html generation logic needs to go?
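If the control break has to live in code rather than in the template, one way to sketch it in Python is with itertools.groupby, which turns the flat join rows into the nested shape the ORM hands you for free. The row layout and sample values below are made up for illustration, and groupby only works if the rows are already ordered by ticket_id (an ORDER BY in the query).

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical flat rows as a join would return them:
# (ticket_id, priority, stamp, detail), already ordered by ticket_id.
rows = [
    (1, '3', '2009-10-01 09:00:00', 'first detail'),
    (1, '3', '2009-10-01 09:30:00', 'second detail'),
    (2, '1', '2009-10-02 14:00:00', 'only detail'),
]

def group_tickets(rows):
    """Turn flat join rows into [(ticket_id, priority, [details...]), ...]."""
    tickets = []
    # Each change in (ticket_id, priority) is the "control break".
    for (ticket_id, priority), group in groupby(rows, key=itemgetter(0, 1)):
        details = [(stamp, detail) for _, _, stamp, detail in group]
        tickets.append((ticket_id, priority, details))
    return tickets

tickets = group_tickets(rows)
```

The template can then use the same two nested loops as the ORM version, at the cost of maintaining this regrouping step yourself.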

With an ORM, the template contains your display logic. Your webmaster/design team can modify the template without having to modify html contained within your code. The loops are simple to understand and designers usually have little problem avoiding the lines that start with %.

Sure, you could wrap much of this logic in your template to do the control-break structure, but, as you get more complex data, deciding how to display the data requires a define or some other functionality.

An ORM adds some insulation to the process, but the result is a much easier page structure when displaying related data. Granted, there are some performance hits, and SQLAlchemy appears to create some queries that are not optimal. Unless the performance hit is tremendous, though, I think the benefits of the ORM for developing a web application outweigh it.

Once you move into an environment where you are dealing with multiple developers, having a defined schema with comments is much easier than using reflection to figure out the meaning of a status field defined as enum('U','A','P','C','R','S').

However, as the original poster mentions, you can do raw SQL within SQLAlchemy and do all of your work with reflection as he has done with his ORM^H^H^H, abstraction. If he’s still using SQLAlchemy, he can selectively decide when to use it and when to avoid it.


mysql-python and Snow Leopard

Tuesday, September 1st, 2009

After the upgrade to Snow Leopard, mysql-python cannot be installed through easy_install.

* Install mysql’s x86_64 version from the .dmg file. I had a problem doing this when booted into the 64bit kernel: the system reported ‘no mountable file systems’ when trying to mount the .dmg file. A reboot into 32bit mode allowed the .dmg to be mounted and installed. A macbook with a 32bit kernel had no problem.
* Change into your virtual environment if desired, source bin/activate
* fetch MySQL-python-1.2.3c1

tar xzf MySQL-python-1.2.3c1.tar.gz
cd MySQL-python-1.2.3c1
ARCHFLAGS='-arch x86_64' python setup.py build
ARCHFLAGS='-arch x86_64' python setup.py install

If everything works, you should see:

$ python
Python 2.6.1 (r261:67515, Jul  7 2009, 23:51:51)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
>>>

Some of the possible things you’ll encounter:

After python setup.py build:

ld: warning: in build/temp.macosx-10.6-universal-2.6/_mysql.o, file is not of required architecture
ld: warning: in /usr/local/mysql/lib/libmysqlclient.dylib, file is not of required architecture
ld: warning: in /usr/local/mysql/lib/libmygcc.a, file is not of required architecture
ld: warning: in build/temp.macosx-10.6-universal-2.6/_mysql.o, file is not of required architecture
ld: warning: in /usr/local/mysql/lib/libmysqlclient.dylib, file is not of required architecture
ld: warning: in /usr/local/mysql/lib/libmygcc.a, file is not of required architecture

This means that you have the i386 version of mysql installed. Or, if you have the x86_64 version, you didn’t include the proper ARCHFLAGS setting.

ImportError: dynamic module does not define init function (init_mysql)

This means that easy_install or the build/install process was run without ARCHFLAGS and tried to build a combined ppc/i386/x86_64 build.

If you see messages like:

$ python
Python 2.6.1 (r261:67515, Jul  7 2009, 23:51:51)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
/Users/xxxxx/Python/django/lib/python2.6/site-packages/MySQL_python-1.2.3c1-py2.6-macosx-10.6-universal.egg/_mysql.py:3: UserWarning: Module _mysql was already imported from /Users/xxxxx/Python/django/lib/python2.6/site-packages/MySQL_python-1.2.3c1-py2.6-macosx-10.6-universal.egg/_mysql.pyc, but /Users/xxxxx/Python/django/MySQL-python-1.2.3c1 is being added to sys.path
>>>

Then you are still in the build directory. cd .. and try again.
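When chasing the architecture warnings above, it can help to confirm which architecture the running interpreter itself is using. This is only a quick check using the standard library, not part of the install steps:

```python
import platform
import struct

# Pointer size in bits: 64 for an x86_64 interpreter, 32 for i386.
pointer_bits = struct.calcsize("P") * 8
print("%s, %d-bit" % (platform.machine(), pointer_bits))
```

If this reports 32-bit, an x86_64 build of the module will not load in that interpreter.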


Google’s App Engine goof

Friday, July 3rd, 2009

While Google’s App Engine is a well planned service and it does work incredibly well for what it does, sometimes things break due to resource limits, etc.

While the App Engine platform is still running, there appears to be an issue with this particular application’s committed resources. The App Gallery has exceeded its memory quota.

Google App Engine App Gallery


User Interface Design

Wednesday, June 24th, 2009

Programmers are not designers. Technical people should not design User Interfaces.

* 810 source files
* 90658 lines of code
* 10213 lines of html

For an internal project tasked to a series of programmers throughout the years without enough oversight, it is a mass of undocumented code with multiple programming styles. PHP allowed lazy programming, Smarty didn’t have some of the finesse required, so, the User Interface suffered. Functional but confusing to anyone that hadn’t worked intimately with the interface or been walked through it.

The truest statement is that it is easier for me to do things through the MySQL command line than through the application. While this does have a tendency to introduce possible typos, it has altered SQL practices here.

update table set value=123 where othervalue=246;

could have an accidental typo of

update table set value=123 where othervalue-=246;

which would have completely unintended consequences. One typo altered the DNS entries for 48000 records. Shortly after that typo, it became company policy that a query like that is never executed on the command line, regardless of how simple the command.

Even within code, the above command would be entered as:

update table set value=123 where othervalue in (246);

This prevented a number of potential typos. Even limit clauses with deletions were enforced to make sure things didn’t go too haywire in an update.

With Python, indenting is mandatory, which results in multiple programmers’ code looking similar and being easier to troubleshoot. Utilizing SQLAlchemy, which enforces bind variables when talking with the database engine, we’ve eliminated the potential for a typo updating too many records. Cascade deletes are enforced in SQLAlchemy even when running on top of MyISAM. With MVC, our data model is much better defined and we’re not tied down to remembering the relationship between two tables and possible dependencies. Conversion from the existing MySQL database to a DeclarativeBase model hasn’t been without issues, but a simple python program allowed the generation of a simple model that took care of most of the issues. Hand tweaking the database model while developing the application has allowed for quite a bit of insight into issues that had been worked around rather than fixed with adjustments to the database.
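To show what bind variables look like in practice, here is a minimal sketch using Python’s built-in sqlite3 module rather than SQLAlchemy or MySQL, so it runs anywhere. The table and column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (value INTEGER, othervalue INTEGER)")
conn.executemany("INSERT INTO records VALUES (?, ?)",
                 [(1, 246), (2, 300), (3, 400)])

# The values travel separately from the SQL text, so a slip while typing a
# value cannot turn into a stray operator inside the WHERE clause.
cur = conn.execute("UPDATE records SET value = ? WHERE othervalue = ?",
                   (123, 246))
updated = cur.rowcount
conn.commit()
```

Checking the affected row count before committing is a cheap extra guard against an update touching more records than expected.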

Fundamental design issues in the database structure were worked around with code rather than fixed. Data that should have been retained was not, and relationships between tables were defined in code rather than in the database, leading to a painful conversion.

When it was decided to rewrite the application in Python using TurboGears, I wasn’t that familiar with the codebase nor the user interface. Initially it was envisioned that the templates would be copied and the backend engine would be written to power those templates. After a few hours running through the application, and attempting the conversion on a number of templates, I realized the application was functional but it was extremely difficult to use in its current state. So much for having a programmer design an interface.

Some functionality from the existing system was needed so I peered into the codebase and was unprepared for that surprise. At this point it became evident that a non-programmer had designed the interface. While Smarty was a decent template language, it was not a formtool, so, methods were designed to give a consistent user experience when dealing with error handling. A single php file was responsible for display, form submission and validation and writing to the database for each ‘page’ in the application. The code inside should have been straightforward.

* Set up default CSS classes for each form field for an ‘ok’ result
* Validate any passed values and set the CSS class as ‘error’ for any value that fails validation
* Insert/Update the record if the validation passes
* Display the page

Some validation takes place numerous times throughout the application, and, for some reason one of the ‘coders’ decided that copy and paste of another function that used that same validation code was better than writing a function to do the validation. Of course when that validation method needed to be changed, it needed to be changed in eight places.

So, what should have been somewhat simple has changed considerably:

* Evaluate each page
* Redesign each page to make the process understandable
* Adjust terminology to make it understandable to the application’s users
* modify the database model
* rewrite the form and validation

A process that should have been simple has turned into quite a bit more work than anticipated. Basically, development boils down to looking at the page, figuring out what it should be, pushing the buttons to see what they do and rewriting from scratch.

TurboGears has added a considerable amount of efficiency to the process. One page that dealt with editing a page of information was reduced from 117 lines of code to 12 lines of code. Since TurboGears uses ToscaWidgets and Formencode, validation and form presentation is removed from the code resulting in a controller that contains the code that modifies the tables in the database with validated input. Since Formencode already has 95% of the validators that are needed for this project, we can rest assured that someone else has done the work to make sure that field will be properly validated. Other validation methods can be maintained and self-tested locally, but, defined in such a manner that they are reused throughout the application rather than being cut and pasted into each model that is validating data. In addition, bugs should be much less frequent as a result of a much-reduced codebase.

Due to the MVC framework and the libraries selected by the developers at TurboGears, I wouldn’t be surprised if the new codebase is 10%-15% the size of the existing application with greater functionality. The code should be more maintainable as python enforces some structure which will increase readability.

While I am not a designer, even using ToscaWidgets and makeform, the interface is much more consistent. Picking the right words, adding the appropriate help text to the fields and making sure things work as expected has resulted in a much cleaner, understandable interface.

While there are some aspects of ToscaWidgets that are a little too structured for some pages, our current strategy is to develop the pages using ToscaWidgets or makeform to make things as clear as possible making notes to overload the Widget class for our special forms at a later date.

While it hasn’t been a seamless transition, it did provide a good opportunity to rework the site and see a number of the problems that the application has had for a long time.


Recursive Category Table in Python

Saturday, June 13th, 2009

While working on a project, the problem of building a list from a recursive category table came up. Each row in the table holds the parent_id of its parent category, and the result is a list with the id and the name of each category. The category name is prepended with spaces equivalent to the level of indentation.

model:

class AchievementCategory(DeclarativeBase):
	__tablename__ = 'achievement_categories'

	id = Column(mysql.MSBigInteger(20, unsigned = True), primary_key = True)
	parent_id = Column(mysql.MSBigInteger(20, unsigned = True), default = 0)
	name = Column(Unicode(80))

code:

def get_cats(n = 0, c_list = None, level = 0):
	# A mutable default ([]) would be shared between calls, so create the list here
	if c_list is None:
		c_list = []
	sql = DBSession.query(AchievementCategory).filter_by(parent_id = n).order_by(AchievementCategory.name).all()
	for e in sql:
		c_list.append([e.id, level * " " + e.name, level])
		get_cats(e.id, c_list, level+1)
	return c_list

print get_cats()
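
The function above runs one query per category. If the table is small enough to load in one go, a variation is to fetch every row once and do the recursion in memory. This is only a sketch: the sample rows are made up, and it works on plain (id, parent_id, name) tuples rather than the model.

```python
# Hypothetical (id, parent_id, name) rows, as one query over the whole
# table might return them; 0 marks a top-level category.
rows = [
    (1, 0, "Quests"),
    (2, 0, "Combat"),
    (3, 1, "Daily"),
    (4, 3, "Fishing"),
]

def build_cat_list(rows, root=0):
    """Return [[id, indented name, level], ...] in depth-first order."""
    # Index the children of each parent once.
    children = {}
    for cat_id, parent_id, name in rows:
        children.setdefault(parent_id, []).append((name, cat_id))

    result = []

    def walk(parent_id, level):
        # sorted() orders siblings by name, like order_by(name) in the query
        for name, cat_id in sorted(children.get(parent_id, [])):
            result.append([cat_id, " " * level + name, level])
            walk(cat_id, level + 1)

    walk(root, 0)
    return result

cats = build_cat_list(rows)
```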

TurboGears, Tableform and a callable option to a Widget

Wednesday, June 10th, 2009

While doing some TurboGears development I ran into an issue where I needed to generate a select field’s options from the database that was dependent on authentication. Since defining the query in the model results in a cached result when the class is instantiated, the query couldn’t be defined there. There are multiple mentions of using a callable to deal with this situation, but, no code example.

From this posting in Google Groups for TurboGears, we were able to figure out the code that made this work.

template:

<div xmlns="http://www.w3.org/1999/xhtml"
      xmlns:py="http://genshi.edgewall.org/"
      xmlns:xi="http://www.w3.org/2001/XInclude"
      py:strip="">

${tmpl_context.form(value=value)}

</div>

controller:

    @expose('cp.templates.template')
    def form(self):
        c.form = TestForm()
        c.availips = [[3,3],[2,2]]
        return dict(template='form',title='Test Form',value=None)

model:

from pylons import c

def get_ips():
    return c.availips

class TestForm(TableForm):
    action = '/test/testadd'
    submit_text = 'Add test'

    class fields(WidgetsList):
        User = TextField(label_text='FTP Username', size=40, validator=NotEmpty())
        Password = PasswordField(label_text='FTP Password', size=40, validator=NotEmpty())
        ip = SingleSelectField(label_text="Assign to IP",options=get_ips)
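
Why the callable helps can be shown without TurboGears at all. The toy classes below are stand-ins I made up for illustration, not the ToscaWidgets implementation: a list passed as options is fixed when the field is created, while a callable is only evaluated when the field is rendered.

```python
class Context(object):
    """Stand-in for the per-request template context (hypothetical)."""
    availips = []

c = Context()

def get_ips():
    # Looked up when called, not when the form class is defined.
    return c.availips

class SelectField(object):
    """Toy widget: accepts either a list of options or a callable."""
    def __init__(self, options):
        self.options = options

    def render_options(self):
        opts = self.options() if callable(self.options) else self.options
        return list(opts)

static_field = SelectField(options=list(c.availips))  # evaluated once, now
lazy_field = SelectField(options=get_ips)              # evaluated per render

# Later, the controller fills in the per-request values.
c.availips = [[3, 3], [2, 2]]
```

Rendering static_field still gives the empty list captured at definition time, while lazy_field picks up whatever the controller set for the current request.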

Combined Web Site Logging splitter for AWStats

Friday, May 29th, 2009

AWStats has an interesting problem when working with combined logging. When you have 500 domains and combined logfiles at roughly 2 gigabytes a day, awstats spends a lot of time shuffling through all of the log files to return the results. The simple solution appeared to be a small python script that read the awstats config directory and split the logfile into pieces so that awstats could run on individual logfiles. It requires one loop through the combined logfile to create all of the logfiles, rather than looping through the 2 gigabyte logfile for each domain when awstats was set up with combined logging.

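#!/usr/bin/python

import os
import re

# Build the list of domains from config files named awstats.<domain>.conf
domainlist = {}

for name in os.listdir('/etc/awstats'):
  if re.search(r'\.conf$', name):
    dom = re.sub(r'^awstats\.', '', name)
    dom = re.sub(r'\.conf$', '', dom)
    domainlist[dom] = None

# One pass over the combined log; the first field of each line is the domain
loglist = open('/var/log/apache2/combined-access.log.1', 'r')
for line in loglist:
  parts = line.split(None, 1)
  if len(parts) != 2:
    continue  # skip blank or malformed lines
  (domain, logline) = parts
  if domain in domainlist:
    # Open each per-domain logfile the first time the domain is seen
    if domainlist[domain] is None:
      domainlist[domain] = open('/var/log/apache2/' + domain + '-access.log.1', 'w')
    domainlist[domain].write(logline)
loglist.close()

# Close the per-domain logfiles that were opened
for handle in domainlist.values():
  if handle is not None:
    handle.close()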

While the code isn’t particularly earthshattering, it cut down log processing to roughly 20 minutes per day rather than the previous 16-30 hours per day.


RSA with Perl, PHP and Python

Tuesday, April 28th, 2009

Ages ago we had a system that used MySQL’s built in DES3 encryption. It made coding applications in multiple languages easy because we could send it a string with a key and it would be encoded in the database. It wasn’t secure if someone got hold of the code and the database, but, rather than use a hash, we could store things and get the plaintext back if we needed it. Due to some policy issues with Debian, DES3 encryption was removed from MySQL and we were faced with converting that data to another format. We chose RSA for the fact that it was supported in every language we were currently developing in — Perl, PHP and C, and we knew it was supported in Python even though we hadn’t started development with Python at the time.

However, due to issues with PHP and long key lengths (at the time the code was written), our payloads had to be broken into packets smaller than the smallest key length which was 256 bytes. Since we didn’t know if we would run into similar issues using a longer packet length even if we had a longer key, we opted to use a packet size smaller than the smallest key we could generate. Our code initially converted data using PHP, stored the data in MySQL, and allowed a Perl script to access the data. Getting Perl and PHP to cooperate was somewhat difficult until we dug into PHP’s source code to see just how they were handling RSA.

PHP code:

define("ENCPAYLOAD_FORMAT",'Na*');

function cp_encrypt($hostname,$message) {
  $public_key=file_get_contents(KEY_LOCATION . $hostname . '.public.key');
  openssl_get_publickey($public_key);

  $blockct = intval(strlen($message) / 245)  + 1;
  $encpayload = "";
  for ($loop=0;$loop<$blockct;$loop++) {
    $blocktext = substr($message,$loop * 245, 245);
    openssl_public_encrypt($blocktext,$encblocktext,$public_key);
    $encpayload .= $encblocktext;
  }
  return(pack(ENCPAYLOAD_FORMAT,$blockct,$encpayload));
}

function cp_decrypt($hostname,$message) {
  $priv_key=file_get_contents(KEY_LOCATION . $hostname . '.private.key');
  openssl_get_privatekey ($priv_key);
  $arr = unpack('Nblockct/a*',$message);
  $blockct = $arr['blockct'];$encpayload=$arr[1];
  $decmessage = "";
  for ($loop=0;$loop<$blockct;$loop++) {
    $blocktext = substr($encpayload, $loop*256, 256);
    openssl_private_decrypt($blocktext,$decblocktext,$priv_key);
    $decmessage .= $decblocktext;
  }

  return($decmessage);
}

Perl Code:

use Crypt::OpenSSL::RSA;
use constant ENCPAYLOAD_FORMAT => 'Na*';

sub cp_encrypt {
  my $hostname = shift;
  my $message = shift;

  my $keyfile = $KEY_LOCATION . $hostname . '.public.key';

  if (-e $keyfile) {
    open PUBLIC, $keyfile;
    my $public_key = do{local $/; <PUBLIC>};
    close(PUBLIC);
    my $rsa = Crypt::OpenSSL::RSA->new_public_key($public_key);
    $rsa->use_pkcs1_padding();

    my $blockct = int(length($message) / 245)  + 1;
    my $encpayload = "";
      for ($loop=0;$loop<$blockct;$loop++) {
        $encpayload .= $rsa->encrypt(substr($message,$loop * 245, 245));
      }
    return(pack(ENCPAYLOAD_FORMAT,$blockct,$encpayload));
  }
  return(-1);
}

sub cp_decrypt {
  my $hostname = shift;
  my $message = shift;

  my $keyfile = $KEY_LOCATION . $hostname . '.private.key';
  if (-e $keyfile) {
    open PRIVATE, $keyfile;
    my $private_key = do{local $/; <PRIVATE>};
    close(PRIVATE);
    my $rsa = Crypt::OpenSSL::RSA->new_private_key($private_key);
    $rsa->use_pkcs1_padding();
    my ($blockct,$encpayload) = unpack(ENCPAYLOAD_FORMAT,$message);
    my $decmessage = "";
    for ($loop=0;$loop<$blockct;$loop++) {
      $decmessage .= $rsa->decrypt(substr($encpayload, $loop*256, 256));
    }
    return($decmessage);
  }
  return(-1);
}

1;

Python code:

import M2Crypto.RSA
from os import path
from struct import unpack, pack, calcsize

ENCPAYLOAD_FORMAT = 'Na*'

def perl_unpack (perlpack, payload):
    if (perlpack == 'Na*'):
        count = calcsize('!L')
        perlpack = '!L%ds' % (len(payload) - count)
        return unpack(perlpack,payload)
    return

def perl_pack (perlpack, blockcount, payload):
    if (perlpack == 'Na*'):
        perlpack = '!L%ds' % len(payload)
        return pack(perlpack,blockcount,payload)
    return

def cp_encrypt (hostname,message):
  keyfile = KEY_LOCATION + hostname + '.public.key'
  if (path.exists(keyfile)):
    public_key = M2Crypto.RSA.load_pub_key(keyfile)

    blockct = int(len(message) / 245)  + 1
    encpayload = ""
    for loop in range(0,blockct):
      encpayload += public_key.public_encrypt(message[(loop*245):(245*(loop+1))],
                    M2Crypto.RSA.pkcs1_padding)
    return(perl_pack(ENCPAYLOAD_FORMAT, blockct, encpayload))
  return(-1)

def cp_decrypt (hostname, message):
  keyfile = KEY_LOCATION + hostname + '.private.key'
  if (path.exists(keyfile)):
    privatekey = M2Crypto.RSA.load_key(keyfile)
    (blockct,encpayload) = perl_unpack(ENCPAYLOAD_FORMAT,message)

    decmessage = ""
    for loop in range(0,blockct):
      decmessage += privatekey.private_decrypt(encpayload[(loop*256):(256*(loop+1))], M2Crypto.RSA.pkcs1_padding)
    return(decmessage)
  return(-1)

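With PKCS#1 v1.5 padding, RSA cannot encrypt a message longer than the key size minus 11 bytes of padding overhead, which is where the 245-byte block size comes from: a 2048-bit key is 256 bytes, and 256 - 11 = 245. Each 245-byte block of plaintext becomes a 256-byte block of ciphertext, which is why the decrypt routines step through the payload 256 bytes at a time. The security of the key rests on the difficulty of factoring its modulus, the product of two very large prime numbers, so exposing even a portion of the private key can help an attacker decrypt the data.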
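
The block arithmetic shared by the three implementations can be sketched on its own, without any crypto library. This assumes the 256-byte (2048-bit) key and PKCS#1 v1.5 padding used above; the helper name is mine.

```python
KEY_BYTES = 256        # 2048-bit key, as in the code above
PADDING_OVERHEAD = 11  # minimum PKCS#1 v1.5 overhead in bytes
BLOCK_SIZE = KEY_BYTES - PADDING_OVERHEAD  # 245

def split_blocks(message, block_size=BLOCK_SIZE):
    """Split a byte string into chunks of at most block_size bytes."""
    return [message[i:i + block_size]
            for i in range(0, len(message), block_size)]

blocks = split_blocks(b"x" * 500)
# Every plaintext block, however short, encrypts to KEY_BYTES of ciphertext.
encrypted_size = len(blocks) * KEY_BYTES
```

Unlike int(len(message) / 245) + 1, this slicing does not produce a trailing empty block when the message length is an exact multiple of 245. Since the block count is stored in front of the payload, the decrypt side works with either.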


Python, Perl and PHP interoperability with pack and unpack

Monday, April 27th, 2009

Perl has very powerful capabilities for dealing with structures.  PHP’s support of those structures was based on Perl’s wisdom.  Python went a different direction.

Perl pack/unpack definitions

PING_FORMAT => '(a4n2N2N/a*)@245';
TASK_FORMAT => 'a4NIN/a*a*';
RETR_FORMAT => 'a4N/a*N';
ENCPAYLOAD_FORMAT => 'Na*';

PHP pack/unpack definitions

define('TASK_FORMAT', 'a4NINa*a*');
define("ENCPAYLOAD_FORMAT",'Na*');

These formats are used by a communications package written in Perl that talks to 32-bit and 64-bit machines that may not share the same endianness. The problem I’ve run into now is that Python does not support the Perl format syntax, and I don’t know why it doesn’t at least offer some compatibility. pack and unpack give enormous power to communication systems between machines, and PHP’s support of the Perl formats allowed for reasonable interoperability between the two platforms.

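Python, on the other hand, opted not to support some of the format features, which was one issue, but the bigger one is that the struct module requires the length of every field to be stated in the format string, so a variable-length packet cannot be described by a single fixed format.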

In Python, we’re able to replicate N, network endian Long by using !L:

>>> import struct
>>> print struct.unpack('!L','\0\0\1\0');
(256,)

However, there is no method to support a variable length payload behind that value.  We’re able to set a fixed length like 5s, but, this means that we’ve got to know the length of the payload being sent.

>>> print struct.unpack('!L5s','\0\0\1\0abcde');
(256, 'abcde')

If we overstate the size of the field, Python is more than happy to tell us that the payload length doesn’t match the length of the data.

>>> print struct.unpack('!L8s','\0\0\1\0abcde');
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/struct.py", line 87, in unpack
    return o.unpack(s)
struct.error: unpack requires a string argument of length 12

The cheeseshop/pypi seems to show no suitable alternative, which brings up a quandary. For this particular solution, I’ll write a wrapper function to do the heavy lifting on the two unpack strings I need to deal with, and then I’ll debate pulling the perl unpack/pack routines out of the perl source and wrapping them into an .egg, possibly for distribution.
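
As a rough idea of what such a wrapper could look like, here is a sketch for the 'a4N/a*N' layout of RETR_FORMAT: read the fixed header first, then use the length it carries to build the format for the rest. The function and field names (tag, payload, status) are my own placeholders, not taken from the real package.

```python
import struct

def pack_retr(tag, payload, status):
    """Pack like Perl's 'a4N/a*N': 4-byte tag, length-prefixed bytes, uint32."""
    return struct.pack("!4sL%dsL" % len(payload),
                       tag, len(payload), payload, status)

def unpack_retr(data):
    """Unpack the layout above in two steps, reading the length first."""
    tag, length = struct.unpack_from("!4sL", data, 0)
    offset = struct.calcsize("!4sL")
    # Now that the payload length is known, a fixed-size format works.
    payload, status = struct.unpack_from("!%dsL" % length, data, offset)
    return tag, payload, status

packet = pack_retr(b"RETR", b"abcde", 7)
```

The "!" prefix selects network byte order with no alignment padding, which is what Perl’s N expects on the other end.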


From File_DB to BerkeleyDB

Thursday, April 2nd, 2009

File_DB lacked decent file locking and concurrency. I wasn’t really willing to move to MySQL, which would have solved the problem but added a few minor inconveniences along the way. I needed to store a few thousand bytes for a number of seconds. While File_DB was wrapped with file locking and assisted by my own lock routine, it lacked truly concurrent access, which I felt was leading to some of the issues we were seeing.

However, after a relatively painless conversion from File_DB to BerkeleyDB, it did not solve the problem completely.  The error I was addressing is now much harder to get to occur in normal use, but, I am able to reproduce it with a small test script.

The documentation for the perl methods to access BerkeleyDB is a bit sparse for setting up CDB, but after digging through the documentation and a few examples on the net, I ended up with some code that did indeed work consistently.

Since CDB isn’t documented very well, I ended up with the following script to test file locking and ensure things worked as expected.

#!/usr/bin/perl

use Data::Dumper;
use BerkeleyDB;

my %hash;
my $filename = "filt.db";
unlink $filename;

my $env = new BerkeleyDB::Env
   -Flags => DB_INIT_CDB|DB_INIT_MPOOL|DB_CREATE;

my $db = tie %hash, 'BerkeleyDB::Hash',
  -Filename   => $filename,
  -Flags        => DB_CREATE,
  -Env        => $env
or die "Cannot open $filename: $!\n" ;

my $lock = $db->cds_lock();

$hash{"abc"} = "def" ;
my $a = $hash{"ABC"} ;
# ...
sleep(10);

print Dumper(%hash);
$lock->cds_unlock();
undef $db ;
untie %hash ;

Path issues caused most of the issues as did previous tests not actually clearing out the _db* and filt.db file. One test got CDB working, I modified a few things and didn’t realize I had actually broken CDB creation because the other files were still present. Once I moved the script to another location, it failed to work. A few quick modifications and I was back in business.

Perhaps this will save someone a few minutes of time debugging BerkeleyDB and Perl.

-----

Due to a logic error in the way I handled deletions to work around the fact that BerkeleyDB doesn’t allow you to delete a single record when you have a duplicate key, my code didn’t work properly in production. After diagnosing that and fixing it with a little bit of code, 125 successive tests resulted in 100% completion. I’ve pushed it to a few machines and will monitor it, but, I do believe that BerkeleyDB fixed the issues I was having with File_DB.
