Archive for the ‘Programming Languages’ Category

Variable Naming Conventions

Friday, January 13th, 2012

While working on a new project, I have to come up with a database schema and variable names for use throughout the project. Some names read the way a person would say them, but there is some logic to choosing names that are more descriptive.

Consider start_date and end_date as variable names. They read fine in code, but when you later look at the database schema, you may not make the association that they are related. date_start and date_end are probably better names, since the shared prefix keeps them together.

Likewise, when you’re storing files submitted by users, thumbnail, high_res and low_res probably aren’t as descriptive as file_thumbnail, file_highres and file_lowres.

Later, when you decide to add dimensions, you can then have file_thumbnail, file_thumbnail_width and file_thumbnail_height and your database schema will be more readable.

You can do the same with table names. trouble_ticket, trouble_ticket_detail and trouble_ticket_attachment make the relationships easier to see than trouble_ticket, attachment_ticket and ticket_detail.
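
As a rough sketch of how this plays out, here are some hypothetical table definitions written with SQLAlchemy (which I use elsewhere); the names are made up for illustration:

from sqlalchemy import Column, DateTime, Integer, MetaData, String, Table

metadata = MetaData()

# Prefix grouping keeps related columns together when the schema is listed.
trouble_ticket = Table('trouble_ticket', metadata,
    Column('id', Integer, primary_key=True),
    Column('date_start', DateTime),
    Column('date_end', DateTime),
    Column('file_thumbnail', String(255)),
    Column('file_thumbnail_width', Integer),
    Column('file_thumbnail_height', Integer),
)

# Child tables share the parent's prefix so they sort next to it.
trouble_ticket_detail = Table('trouble_ticket_detail', metadata,
    Column('id', Integer, primary_key=True),
    Column('trouble_ticket_id', Integer),
)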

A little planning can make things much easier down the road.

Git it done

Sunday, December 25th, 2011

I’ve written software for a number of years and I’ve used a lot of different version control systems, from the old VMS days of ;12 file version numbers to today, where I primarily use git.

For the last nine years, I’ve used SVN with its quirky Apache DAV setup and the stupid uid/gid issues of running svn on a development server where that Apache was also used for testing. Ok, so, that was a poor architecture choice on my part.

With Pyramid, I started to run into small issues that I knew I could fix, and my early tickets consisted of

diff -Naur

output pasted into the ticket, or a note telling the team what to fix. Eventually I found a bug, broke down and decided to submit the fix the right way.

I forked it, cloned it, made my changes, did my git add . and git commit, followed by git push, then created a pull request from the web interface. I do intend to figure out how to do the fork-and-pull workflow without resorting to the web interface. I don’t remember how long it took for the fix to be pulled in, but it wasn’t long. It was a very minor change to make some templates XHTML compliant, but the project lead merely had to merge my fix (if they agreed) and it was done. They didn’t have to remake the changes on their copy of the source.

Git isn’t that hard

With that newfound appreciation, I submitted a fix to Pyramid OpenID which took roughly a month to get incorporated. It was a small fix, but again, very little effort required to merge the changes in.

I’ve used GitHub, Bitbucket, code.google.com (for SVN and git) and recently set up Gitosis with Gitweb for some private repositories. After a few months of working with git, I exported all of my local SVN repositories and imported them into git. Over the next few days I found a few stray projects, quick weekend hacks I had never put under version control, and imported them as well.

While I have had one mystery issue on GitHub, a commit that appeared out of nowhere and that no one seems able to explain, overall my experience with git has been fairly positive. Once I merge the other branch, the mystery commit should disappear, but it is annoying having to specify the branch on every push so that git push doesn’t submit changes to both master and my branch. If I forget, I have a git command in a bash script that lets me revert to the version before that change.

Why use Version Control Software as the sole developer?

First and foremost, it is a simple, almost realtime backup (to your last commit) of your codebase. You can go back in time and look at what changed, and you can commit chunks of code as ‘save points’ so you can look back and see what has gone on. GitHub seems to be the preferred host for open source projects, though if you have five or fewer team members and need a private repo, BitBucket might fill that need since they charge per team member rather than by number of projects; with five or fewer team members, BitBucket’s private repository hosting is free. Or, if you feel like setting up Gitosis+Gitweb, you can host your own private repo on any machine where you have a shell account.

Deploying code from git is easy as well: git pull, restart apache, done. It isn’t difficult to set up multiple branches so that you have production, staging and development branches. That lets you fix bugs on staging, push them to production, and handle longer-term additions on the development branch.

What about multiple users?

This is where git, or any version control software, really becomes powerful. Multiple people can work on the same codebase, changes can be merged, and branches and tags can be used to test things without affecting production code, then merged in later.

How do I set up Gitosis?

I used the guide from here and had Gitosis running in 15 minutes. I tried Gitolite prior to this but preferred Gitosis. After a few days I decided it was time to set up Gitweb, which was fairly straightforward. If you get a 404 when viewing your gitweb root, make sure there is no trailing / on your $projectroot.

What benefit is there?

If you’re doing any development, use version control. It doesn’t matter which one, just use one. If you have multiple people on your team, absolutely use version control. It ends the ‘what did you change?’ conversation when something breaks. With git or any other VCS you gain accountability: you can see who made which changes and track the evolution of a problem.

Maybe you want to test a new feature and keep it separate from production – use branches or tags. Once that branch is declared complete, you can merge it with production. Even if there are modifications made to master in the meantime, you can merge those in along the way so that you’re not maintaining two codebases that require a large merge later on. Conflict resolution is a little cumbersome, but it is much easier to keep a development branch in sync with staging/production bugfixes than it is to do a huge merge at the end of a large project. Save yourself some time when working on a new branch and merge master in frequently.

What do I use?

Mostly public projects? GitHub, Bitbucket, code.google.com, SourceForge (really, they are coming back, and their offerings do include git).

A few small private projects and some public projects? BitBucket is free if you have five or fewer team members. GitHub seems somewhat costly for a small organization that wants private repos. Gitosis can be run on a single account on a small VPS.

Mostly private projects? Gitosis. It took 15 minutes to get it set up, working, and the first project imported. It took a few days before I installed Gitweb, which isn’t needed but is a handy tool for looking through the commit logs.

Do I have to use git?

No. You can use SVN, Mercurial, or any other version control software. You’ll find that the open source world seems to have embraced git, and GitHub appears to be the most popular host for open source projects.

If you’re managing an open source project, the number of people familiar with git is increasing every day.

What am I missing?

The main thing I miss is an automatically updated release or version number. I haven’t found a way to keep a ‘release’ number that can be incorporated into file headers, so that when I look at a production system I can easily see which codebase it is running. I’d like to put that version number in a template. With SVN, I could set an svn property and use $Revision$. While I’ve been manually updating the build id in a template, it would be nice to have that hook without a git add/git commit/update/git add/git commit cycle.
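
One approach I haven’t tried yet, so take this as a sketch of an idea rather than a working setup: skip keyword expansion entirely and ask git for the current commit id when the application starts, then expose it to the templates. The function name and the template-global idea here are made up for illustration, and it assumes the code was deployed from a git checkout with git on the PATH.

import subprocess

def get_build_id(default='unknown'):
    # Return the short commit id of the checkout the app is running from.
    try:
        out = subprocess.check_output(['git', 'rev-parse', '--short', 'HEAD'])
        return out.decode().strip()
    except Exception:
        return default

BUILD_ID = get_build_id()  # e.g. hand this to the templates as a global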

For a few projects we’re using Sphinx for documentation, and having those docs automatically build and push to the documentation host would be nice. I believe this can be done with git hooks, but I haven’t really investigated it much.
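
If I ever dig into it, I imagine the hook would look something like this untested sketch of a post-receive script: update a working copy, run sphinx-build, and write the output wherever the docs are served from. The paths here are hypothetical.

#!/usr/bin/env python
# Hypothetical post-receive hook: rebuild the Sphinx docs after a push.
import subprocess

CHECKOUT = '/srv/project'        # working copy the docs are built from
DOCS_SOURCE = CHECKOUT + '/docs'
DOCS_OUTPUT = '/var/www/docs'    # wherever the built docs are hosted

# update the working copy, then rebuild the HTML docs
subprocess.check_call(['git', '--git-dir', CHECKOUT + '/.git',
                       '--work-tree', CHECKOUT, 'pull'])
subprocess.check_call(['sphinx-build', '-b', 'html', DOCS_SOURCE, DOCS_OUTPUT])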

Version control of files that should stay hidden. .gitignore keeps files out of the repository entirely, but if I want files tracked yet not published, I haven’t found a way to do that. I have a document push script that I don’t want to be public, but I would like to be able to do a git pull and not have to hunt down an old repo with a copy of that script each time. I know you can set up masking so that the hostname or other private parts of files are hidden, which would let me put the actual production.ini or development.ini in the repo. I find that important because documentation is usually clipped from a production file, but when new changes are made, modifications to those files are sometimes forgotten and one has to dig around to see what changed.

All in all, git works very well. Any version control software is a benefit. Use it.

Documenting projects as you write them

Monday, December 12th, 2011

In Feb 2009 I started converting a PHP application over to Python and a framework. Today, I have finished all of the functional code and am at the point where I need to write a number of interfaces to the message daemon (currently a Perl script I wrote after ripping apart a C/libace daemon we had written).

The code did work in prior frameworks, and I’ve since moved to Pyramid, but now I’m having to figure out why I ever wrote an __init__ that takes 13 arguments. Of course everything is a wrapper around a wrapper around a single call, and nothing is documented other than some sparse comments. Encrypted RPC payloads are sent to the daemon – oops, I also changed the host and key I’m testing from.

Yes, I actually am using RPC, in production, the right way.

Total Physical Source Lines of Code (SLOC) = 5,154

The penultimate 3% has added almost 200 lines of code, and I suspect the last 2%, adding the interfaces, will add another 100 or so. Had I written better internal documentation, getting pulled away from the project for weeks or months at a time would have meant less ramp-up time when sitting back down to code. There were a few times when it took me a few hours just to get back up to speed with something I had written 18 months ago because I didn’t know what my original intentions were, or what problem I was fixing.

Original PHP/Smarty project:

Total Physical Source Lines of Code (SLOC) = 45,040

Compared with the PHP project I started converting in 2009, the rewrite is roughly a 10:1 reduction in the size of the codebase. It isn’t a truly fair comparison; the new code does more, has much better validation checks, and adds a number of features.

It’s been a long road and I’ve faced a number of challenges along the way. Since Coderetreat, I’ve attempted to write tests as I write code, and that is a habit I’ll have to reinforce. I don’t know that I’ll fully adopt Test Driven Development, but I can see more test code being written during development rather than sitting down with the project after it is done and writing tests. Additionally, I’m going to use Sphinx even for internal documentation.

People might question why I went with TurboGears, then Pylons, and ended up with Pyramid, but at the time I evaluated the frameworks, Django‘s ORM wasn’t powerful enough for some of the things I needed to do and I knew I needed SQLAlchemy. While Django and SQLAlchemy could be used together at the time, I felt TurboGears was a closer match. As it turns out, Pyramid is just about perfect for me: light enough that it doesn’t get in the way, heavy enough that it contains the hooks I need to get things done.

If I wrote a framework, and I have considered it, Pyramid is fairly close to what I would end up with.

Lesson learned… document.

Today is going to be a very frustrating day wiring up stuff to classmethods that have very little documentation and buried __init__ blocks. Yes, I’ll be documenting things today.

Global Day of Coderetreat

Sunday, December 4th, 2011

Yesterday I participated in the Global Day of Coderetreat. Coderetreat is an event inspired by Corey Haines who spent time traveling around the country teaching groups fundamentals of software development – asking for just enough money to make it to the next city, accommodations on someone’s couch, etc.

It is a one-day, intensive, pairs-programming exercise focused on Test Driven Development. CoderetreatMiami was organized by Tom Ordonez and Carlos Ordonez from Aeronautic Investments, Inc. and facilitated by Bryce Kerley and Michael Feathers.

After a brief intro, the rules for Conway’s Game of Life were explained to us: you have a matrix and you look at each live node. If a live node has two or three live neighbors, it lives; otherwise it dies. Then you look at all of the dead nodes, and if a dead node has exactly three living neighbors, it comes alive.
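
For reference, the rules translate into code along these lines. This is a rough sketch written after the fact, not any of the code we wrote that day (none of that survived, as you’ll see), using live cells stored as coordinate tuples:

def neighbors(cell):
    # the eight cells surrounding (x, y)
    x, y = cell
    return [(x + dx, y + dy)
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]

def step(live):
    # live cells with two or three live neighbors survive
    next_gen = set(cell for cell in live
                   if sum(1 for n in neighbors(cell) if n in live) in (2, 3))
    # dead neighbors of live cells come alive with exactly three live neighbors
    candidates = set(n for cell in live for n in neighbors(cell)) - live
    next_gen.update(cell for cell in candidates
                    if sum(1 for n in neighbors(cell) if n in live) == 3)
    return next_gen

# a glider, iterated one generation
live = set([(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)])
live = step(live)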

Then we were told to choose a partner to pair with and start writing code. We chose Python, wrote a quick library and some functions, and wrote some test code to make sure our functions were working as they should. We stored the cells as tuples in a list. After 40 minutes, we had just started on unit testing when we were told that time was up: delete your code…

What? People were a little shocked. We just spent 45 minutes writing the code, delete it? Can’t save it, can’t work on it later, can’t save it to a repo, etc. Delete it.

rm -rf life/

After a few minutes of discussion, we were told to pair up with another person, someone we hadn’t paired with, and do it again. This time I paired up with a Drupal/Javascript guy and we proceeded to write the Game of Life again. We didn’t get as far, since my Javascript isn’t as strong as my Python or Perl, but we did have some tests written and some functionality. Time’s up, delete your code. Again?!?

I then paired up with another person and we used Python again. We changed our strategy a bit and decided to do a bounding-box check around the live cells to avoid walking the entire grid. Time’s up, delete your code.

We took a brief break, followed by Michael Feathers showing us Test Driven Development in Ruby, starting from an empty function with a test that defined what his expected output should be. Run the test script: failure. Fix this, test: failure. Fix another bug, test: different results, still a failure. Fix the code, test: passed. Then we looked at a more detailed example of a (in his words) badly written Game of Life, and he showed us a few iterations of the testing.

Time to pair up again and we’re off and running. Perl this time, with an additional condition: try to write it without if statements. I’ve missed something because I can’t quite remember how to pass arrays of arrays and strings in Perl, so I take a few minutes to write some test code to remind myself that @{$blah} gets me what I need, and we’re off and running. Boolean logic and binary anding get us pretty close to not needing ifs. We decide our test code can have ifs, but not our game functions. Again, writing a test that has a few cells populated and writing the check_alive function, we get through that and start to write the check_dead routine and bam. Time’s up. Delete the code? Yes…

Another partner, and this time it is PHP. While I am comfortable with PHP, we’re told: no two-dimensional arrays. After some internal debate, we decide that using a transform on a one-dimensional array is really just using a two-dimensional array, so we settle on some tricky column math and three loops to test the adjacent cells. After a bit of coding, we start writing some test functions for check_alive and run into an out-of-bounds error because one of our test points is on the edge (a case we talked about but didn’t code for). Time’s up, delete the code.

On my last pairing, I am paired with someone I know. I’ve gotten the game working twice; he’s gotten it working once. He’s using a language called Processing, which has a really simple IDE, the ability to run the code, and a way to graphically display our matrix. Prior to this, all of our development had been test code: looking at True/False assertions or lists of strings to make sure they were equivalent. We write our code really quickly but run into a problem with how Processing stores global arrays, so we have to do a little trickery to swap arrays before the draw function. At the end of 45 minutes we’re very close: it iterates once, goes to a blank screen, then displays the start screen again. We know it is something with the array copy (and Don ends up solving it later). 45 minutes is up… you can keep this code if you want to work on it.

Normally they do one more session where you are paired up with your original partner for a final attempt, but we ran a little short on time. After a recap, we were each asked to stand up and give a brief introduction: what we learned, what surprised us and what we’ll do differently in the future.

For me, I’ve often focused on unit tests well after the code has been written. While I don’t think I can easily change that on a number of projects, I think I will try some Test Driven Development for new code and some other projects.

All in all, it was a great experience. I met a lot of great people, picked up some new coding techniques and saw a rather intriguing method of teaching. Pairs programming is good, but for learning, iterating over the same problem six times in a rather intense environment showed me how other people thought through and attacked the problem.

People looked at the problem in very different ways. One group used functional programming and maps; my first attempt used tuples in a list (as did my second, which tried to speed up the dead-cell check by looking at a maximum bounding box rather than doing a complete traversal of the matrix). When we were asked to write it without if statements, and later with only a one-dimensional array, people’s approaches changed again.

While there is always more than one way to solve a problem, seeing the different ways people approached the same one, even after a number of iterations, was intriguing.

Pictures from CoderetreatMiami.

Quite a fun event, highly recommended if you get the chance to attend one.

Python coding standards for imports

Friday, December 2nd, 2011

Recently I’ve been refactoring a lot of code and I’m settling on a few conventions that I find easier to read.

With imports, I prefer putting each import on its own line. It takes up more screen space, but when looking at a commit diff, I can see what was added or removed much more easily than spotting a change in the middle of a long line (though more of the diff tools are showing inline differences now).

However, what I started doing recently is grouping imports like this:

from module import blah, \
                   blah2, \
                   blah3

I keep them in alphabetical order, but I find that reading through them this way removes the ‘wall of text’ effect.

Original imports:

from module import blah
from module import blah2
from module import blah3

I find that the grouped style makes it easier to see at a glance that several imports came from the same module.

Another possibility, as mentioned by Chris McDonough, is:

from module import (blah,
                    blah2,
                    blah3)
