Hey BlackBerry, do you get paid for bandwidth burned on data networks?

January 5th, 2012

Requests from:

User-Agent: BlackBerry8530/5.0.0.886 Profile/MIDP-2.1 Configuration/CLDC-1.1 VendorID/105
Accept: application/vnd.rim.html,text/html,application/xhtml+xml,
application/vnd.wap.xhtml+xml,text/vnd.sun.j2me.app-descriptor,
image/vnd.rim.png,image/jpeg,application/xvnd.rim.pme.b,
application/vnd.rim.ucs,image/gif;anim=1,application/vnd.rim.jscriptc;
v=0-8-72,application/x-javascript,application/vnd.rim.css;v=2,text/css;
media=screen,application/vnd.wap.wmlc;q=0.9,application/vnd.wap.wmlscriptc;q=0.7,
text/vnd.wap.wml;q=0.7,*/*;q=0.5

We prefer a number of content types, but, if worst comes to worst, we’ll accept everything anyhow.

440 bytes transmitted on EVERY request made from a BlackBerry when you could have just sent:

Accept: */*

and saved 428 bytes PER request.

This particular page had 97 assets; at 428 excess bytes per request, that’s over 40k in wasted bandwidth sending headers to the CDN.
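The arithmetic is easy to sanity-check. A quick sketch, with the Accept header value reproduced from the capture above (byte counts ignore the rest of the request headers):

```python
# Rough cost of the BlackBerry Accept header versus a minimal one.
blackberry_accept = (
    "Accept: application/vnd.rim.html,text/html,application/xhtml+xml,"
    "application/vnd.wap.xhtml+xml,text/vnd.sun.j2me.app-descriptor,"
    "image/vnd.rim.png,image/jpeg,application/xvnd.rim.pme.b,"
    "application/vnd.rim.ucs,image/gif;anim=1,application/vnd.rim.jscriptc;"
    "v=0-8-72,application/x-javascript,application/vnd.rim.css;v=2,text/css;"
    "media=screen,application/vnd.wap.wmlc;q=0.9,"
    "application/vnd.wap.wmlscriptc;q=0.7,text/vnd.wap.wml;q=0.7,*/*;q=0.5"
)
minimal_accept = "Accept: */*"

# Both header lines end in CRLF on the wire, so the terminators cancel out.
wasted_per_request = len(blackberry_accept) - len(minimal_accept)
wasted_per_page = wasted_per_request * 97  # the 97 assets on this page

print(wasted_per_request, "bytes wasted per request")
print(wasted_per_page, "bytes wasted per page load")
```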

Compression and Massive Logging to flatfiles for DDoS analysis

January 2nd, 2012

While working with a DDoS attack that has gone on for over two years, we’ve learned that varnishncsa is not the best logging platform out there. While Varnish does a superb job of protecting the site, the logging leaves a little to be desired. A kill/varnishncsa redirect script runs every night at midnight, logrotate compresses the files, and we’re left with a big set of logfiles — logfiles that don’t represent the entire picture.

Because we’re firewalling attacker IPs, our logs only show the requests that make it through the firewall – which limits the data that we can collect. From a forensic-analysis standpoint, that makes the collected data less valuable. As a result, we need to collect the data off a span port, and, even though it is a denial-of-service attack against the web server, it is good to log all TCP/UDP/SYN traffic on the machine to make sure we register everything.

In an ideal world, the machine would have three ethernet ports, or you would do this monitoring from another machine, but this is a component of the ISO I’m putting together that can be used as a front-end proxy cache that logs the attacks. The concept is to create an ISO or USB-stick installation that sets up Varnish, IPSet, this logger, and the blocker that adds the rules to IPSet.

Tux, a kernel-mode HTTP accelerator, used to log to a compressed file and had a tux2w3c helper that would convert the logs to an ASCII-readable format that could be processed by weblog software. Since we’re logging not the actual web request but the TCP packet received, we have a lot more information to look at. Our analysis software can look for markers within that data, make decisions, and feed IPSet so the machine can self-protect and self-heal through the use of expiration times on the rules.
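The marker-matching idea could be sketched like this; the marker strings, set name, and timeouts below are invented for illustration, and the ipset set is assumed to have been created with timeout support:

```python
# Scan a captured TCP payload for known attack signatures and emit a
# block decision with an expiration, matching ipset's rule timeouts.
ATTACK_MARKERS = {
    b"GET /forum.php?flood": 3600,  # hypothetical flood signature: 1 hour ban
    b"\x00" * 64: 600,              # hypothetical null-padded probe: 10 minutes
}

def inspect_payload(src_ip, payload):
    """Return (ip, timeout_seconds) when the payload matches a marker."""
    for marker, timeout in ATTACK_MARKERS.items():
        if marker in payload:
            return (src_ip, timeout)
    return None

def ipset_command(decision, set_name="ddos_block"):
    """Build the ipset invocation; the timeout lets the rule self-expire."""
    ip, timeout = decision
    return ["ipset", "add", set_name, ip, "timeout", str(timeout)]
```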

Initially I believe the log format will look something like this:

<timestamp><attacker ip><countrycode><attacked IP><port><tcp payload>

A tool to output the logfile in an ASCII-readable form will be written as well so that the data can later be analyzed. Each row will be bzip2-compressed so that the daemon can run endlessly. Logfile names will be portlog.incidentid.20120102 and won’t require rotation. I suspect it might be worthwhile to later include the hour in the filename, resulting in 24 files per day.
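A minimal sketch of that record layout, under some assumptions of mine: IPv4 addresses, a two-letter country code supplied by the caller (e.g. from a GeoIP lookup), and a length prefix per compressed row so a reader can walk the file:

```python
import bz2
import struct
import time

def encode_record(attacker_ip, country, attacked_ip, port, payload):
    """Pack one row of the proposed format:
    <timestamp><attacker ip><countrycode><attacked IP><port><tcp payload>."""
    header = struct.pack(
        "!d4s2s4sH",                              # network byte order
        time.time(),                              # timestamp as a double
        bytes(map(int, attacker_ip.split("."))),  # IPv4 -> 4 raw bytes
        country.encode("ascii"),                  # 2-letter country code
        bytes(map(int, attacked_ip.split("."))),
        port,
    )
    return header + payload

def append_row(path, record):
    """bzip2-compress each row independently so the daemon can run
    endlessly; a length prefix lets a reader walk the file row by row."""
    blob = bz2.compress(record)
    with open(path, "ab") as fh:
        fh.write(struct.pack("!I", len(blob)) + blob)
```

A row would then land in a file named along the lines of portlog.incidentid.20120102, opened in append mode so no rotation is needed.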

Git it done

December 25th, 2011

I’ve written software for a number of years and I’ve used a lot of different version control systems, from the old VMS ;12 days to today, where I primarily use git.

For the last nine years, I’ve used SVN with its quirky Apache DAV setup and the stupid uid/gid issues of running svn on a development server where that same Apache was also used for testing. OK, so that was a poor architecture choice on my part.

With Pyramid, I started to run into small issues that I knew I could fix and my early tickets consisted of

diff -Naur

output pasted into the ticket, or, telling the team what to fix. While dealing with Pyramid, I found a bug, broke down and decided to submit bug fixes the right way.

I forked it, I cloned it, I made my changes, I did my git add . and git commit, followed by my git push, then, from the web interface, created a pull request. I do intend to figure out how to do the fork and pull without resorting to the web interface. I don’t remember how long it took for the fix to be imported, but it wasn’t long. It was a very minor change to make some templates XHTML-compliant, but the project lead merely had to merge my fix (if they agreed) and it was done. They didn’t have to remake the changes on their copy of the source.

Git isn’t that hard

With that newfound appreciation, I submitted a fix to Pyramid OpenID which took roughly a month to get incorporated. It was a small fix, but again, very little effort required to merge the changes in.

I’ve used GitHub, Bitbucket, code.google.com (for SVN and Git) and recently set up Gitosis with Gitweb for some private repositories. After a few months of working with git, I exported all of my local SVN repositories and imported them into git. Over the next few days, I would find a few stray projects that were quick weekend hacks that I never used VCS for, and decided to import them as well.

While I have had a mystery issue with a commit that appears out of nowhere on GitHub that no one seems to be able to fix, overall my experience with git has been fairly positive. Once I merge the other branch, the mystery commit should disappear, but it is annoying having to specify the branch on every push so that git push doesn’t submit changes to both master and my branch. If I forget, I have a git command in a bash script that lets me revert to the version before that change.

Why use Version Control Software as the sole developer?

First and foremost, it is a simple, almost realtime backup (to your last commit) of your codebase. You can go back in time and look at the changes to see what changed. You can commit chunks of code as ‘save points’ so you can look back and see what has gone on. GitHub seems to be the preferred repo for Open Source projects, though, if you have five or fewer team members and need a private repo, BitBucket might fill that need since they charge by team member rather than by number of projects; with five or fewer members, BitBucket’s private repository hosting is free. Or, if you feel like setting up Gitosis+Gitweb, you can host your own private repo on a machine where you have a shell account.

Deploying code from git is easy as well: git pull, restart apache, done. It isn’t difficult to set up multiple branches so that you have production, staging, and development branches. This allows you to fix bugs on staging, push them to production, and handle longer-term additions on your development branch.

What about multiple users?

This is where git, or any version control software, really becomes powerful. Multiple people can work on the same codebase, changes can be merged, branches and tags can be used to do tests without affecting production code and then later merged.

How do I set up Gitosis?

I used the guide from here and had Gitosis running after 15 minutes. I tried Gitolite prior to this, but, preferred Gitosis. After a few days, I decided it was time to set up Gitweb which was fairly straightforward. If you get a 404 when viewing your gitweb root, make sure there is no trailing / on your $projectroot.

What benefit is there?

If you’re doing any development, use version control. It doesn’t matter which one, just use one. If you have multiple people on your team, absolutely use version control. It ends the ‘what did you change?’ phase of software development when something breaks. With git or any other VCS, you gain accountability. You can see who made which changes and track the evolution of a problem. Maybe you want to test a new feature and keep it separate from production – use branches or tags. Once that branch is declared complete, you can merge it with production. Even if there are modifications made to the master, you can merge those in along the way so that you’re not maintaining two codebases that require a large merge later on. Conflict resolution is a little cumbersome, but, it is much easier to keep a development branch in sync with staging/production bugfixes than it is to do a huge merge at the end of a large project. Save yourself some time when working on a new branch and merge master in frequently.

What do I use?

Mostly public projects? GitHub, BitBucket, Code.google.com, SourceForge (really, they are coming back and their offerings do include git)

A few small private projects and some public projects? BitBucket is free if you have five or fewer team members. GitHub seems somewhat costly for a small organization that needs private repos. Gitosis can be run on a single account on a small VPS.

Mostly private projects? Gitosis. It took 15 minutes to get it set up, working, and the first project imported. It took a few days before I installed Gitweb, which isn’t needed but is a handy tool for looking through the commit logs.

Do I have to use git?

No, you can use svn, mercurial or git or any other version control software. You’ll find that the open source world seems to have embraced git and GitHub appears to be the most popular hosting for open source projects.

If you’re managing an open source project, the number of people familiar with git is increasing every day.

What am I missing?

The only thing I miss is an updated release or version number. I’ve not found a way to maintain a ‘release’ number that can be incorporated into file headers so that, when I look at a production system, I can easily see which codebase it is running. I’d like to put that version number in a template. With SVN, I could set an svn property and use $Revision$. While I’ve been manually updating the build id in a template, it would be nice to have that hook without a git add/git commit/update/git add/git commit cycle.
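One workaround is a commit hook that stamps the current git describe output into a version module the templates can read. A sketch, where the hook location, module path, and variable name are my assumptions:

```python
#!/usr/bin/env python
# Sketch of a post-commit hook (installed as .git/hooks/post-commit) that
# writes the current commit id into a small version module.
import subprocess

def render_version(build_id):
    """Render the contents of the generated version module."""
    return ('# generated by the post-commit hook; do not edit\n'
            'BUILD_ID = "%s"\n' % build_id)

if __name__ == "__main__":
    build_id = subprocess.check_output(
        ["git", "describe", "--always", "--tags"]).strip().decode()
    with open("myapp/_version.py", "w") as fh:  # hypothetical module path
        fh.write(render_version(build_id))
```

The template then imports BUILD_ID; the generated file goes in .gitignore so the hook doesn’t trigger another add/commit cycle.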

For a few projects we’re using Sphinx for documentation, and having those docs autobuild and push to the document hosting would be nice. I believe this can be done with git hooks, but I haven’t really investigated it much.

Version control of files that should be hidden: .gitignore keeps files untracked, but if I want files tracked yet not published, I haven’t found a way to do that. I have a document push script that I don’t want to be public, but I would like to be able to do a git pull and not have to hunt down an old repo with a copy of that script each time. I know you can set up masking on scripts, which would allow me to hide the hostname or other private parts of files, letting me post an actual production.ini or development.ini to the repo. I find that important because documentation is usually clipped from a production file, but when new changes are made, modifications to those files are sometimes forgotten and one needs to dig around to see what changed.
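The masking mentioned above can be done with git’s clean/smudge filter mechanism; a sketch, where the filter name and sed expression are only examples (the clean filter rewrites the file as it is staged, so the redacted form is what gets committed):

```
# .gitattributes: run production.ini through a "redact" filter on commit
production.ini filter=redact

# one-time setup in each clone:
#   git config filter.redact.clean  "sed -e 's/^host = .*/host = REDACTED/'"
#   git config filter.redact.smudge cat
```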

All in all, git works very well. Any version control software is a benefit. Use it.

Documenting projects as you write them

December 12th, 2011

In Feb 2009 I started converting a PHP application over to Python and a framework. Today, I have finished all of the functional code and am at the point where I need to write a number of interfaces to the message daemon (currently a Perl script I wrote after ripping apart a C/libace daemon we had written).

The code did work in prior frameworks; I’ve since moved to Pyramid, and now I’m having to figure out why I ever wrote an __init__ that takes 13 arguments. Of course, everything is a wrapper around a wrapper around a single call, and nothing is documented other than some sparse comments. Encrypted RPC payloads are sent to the daemon – oops, I also changed the host and key I’m testing from.

Yes, I actually am using RPC, in production, the right way.

Total Physical Source Lines of Code (SLOC) = 5,154

The penultimate 3% has added almost 200 lines of code, and I suspect the last 2%, adding the interfaces, will add another 100 or so. Had I written better internal documentation, getting pulled away from the project for weeks or months at a time would have meant less ramp-up time when sitting back down to code. There were a few times when it took me a few hours just to get up to speed with something I had written 18 months earlier because I didn’t know what my original intentions were or what problem I was fixing.

Original PHP/Smarty project:

Total Physical Source Lines of Code (SLOC) = 45,040

The rewrite I started in 2009 has resulted in roughly a 10:1 reduction in the codebase. It isn’t a truly fair comparison — the new code does more, has much better validation checks, and adds a number of features.

It’s been a long road and I’ve faced a number of challenges along the way. Since Coderetreat, I’ve attempted to write tests as I write code; that is a habit I’ll have to reinforce. I don’t know that I’ll actually do Test Driven Development, but I can see more test code being written during development rather than sitting down with the project after it is done and writing test code. Additionally, I’m going to use Sphinx even for internal documentation.

People might question why I went with TurboGears, then Pylons, and ended up with Pyramid, but, at the time I evaluated the frameworks, Django’s ORM wasn’t powerful enough for some of the things I needed to do and I knew I needed SQLAlchemy. While Django and SQLAlchemy could be used together at the time, I felt TurboGears was a closer match. As it turns out, Pyramid is just about perfect for me: light enough that it doesn’t get in the way, heavy enough that it contains the hooks I need to get things done.

If I wrote a framework, and I have considered it, Pyramid is fairly close to what I would end up with.

Lesson learned… document.

Today is going to be a very frustrating day wiring up stuff to classmethods that have very little documentation and buried __init__ blocks. Yes, I’ll be documenting things today.

What is a startup?

December 12th, 2011

All this talk about startups, and I often wonder if people really understand what a startup is.

A bakery is not a startup. A consulting company is not a startup. Anything that requires multiplying the number of people in the business to scale is not a startup.

A startup is a company that has an idea where doubling income does not require doubling staff. It is a business where scaling to add another 1000 clients requires very little additional hardware.

A software company is a startup. After writing that first copy, selling 1000 more has only slight incremental costs. Selling 10000 more requires only slightly more resources than that. A subscription web site is a startup. The difference between 10 paying subscribers and 1000 paying subscribers in terms of the labor to produce the site is minimal.

If you start a business and you are primarily responsible for earning the income through your direct efforts, you are an employee, not a business. A consultant is not running a business; s/he is a contracted employee with many employers. A software developer who sells his product and can take three days off without materially affecting his income is running a business.
