Quick Python search and replace script

October 28th, 2011

We have a client machine that is under moderate load and has a ton of modified files. Normally we just restore off the last backup or the previous generation backup, but over 120k files have been exploited since June 2011. Since the machine is doing quite a bit of work, we need to throttle our replacements so that we don’t kill the server.

#!/usr/bin/python
"""

Quick search and replace to replace an exploit on a client's site while
trying to keep the load disruption on the machine to a minimum.

Replace the variable exploit with the code to be removed. By default,
this script starts at the current directory. When the one minute load
average exceeds max_load, we sleep in 30 second increments until the
load drops.

"""

import glob
import os
import re
import time

path = '.'
max_load = 10

exploit = """
<script>var i,y,x="3cblahblahblah3e";y='';for(i=0;i
""".strip()

file_exclude = re.compile(r'\.(gif|jpe?g|swf|css|js|flv|wmv|mp3|mp4|pdf|ico|png|zip)$',
                          re.IGNORECASE)

def check_load():
    # Block while the one minute load average is above max_load,
    # re-checking every 30 seconds until it drops.
    load_avg = int(os.getloadavg()[0])
    while load_avg > max_load:
        time.sleep(30)
        load_avg = int(os.getloadavg()[0])

def getdir(path):
    # Recurse through the tree, checking the load before each directory
    # and skipping binary/static files that can't contain the exploit.
    check_load()
    for name in os.listdir(path):
        file_path = os.path.join(path, name)
        if os.path.isdir(file_path):
            getdir(file_path)
        else:
            if not file_exclude.search(file_path):
                process_file(file_path)

def process_file(file_path):
    # Read the whole file and, if the exploit is present, strip it and
    # write the cleaned contents back in place.
    f = open(file_path, 'r+')
    contents = f.read()
    if exploit in contents:
        print 'fixing:', file_path
        contents = contents.replace(exploit, '')
        f.truncate(0)
        f.seek(0, os.SEEK_SET)
        f.write(contents)
    f.close()

getdir(path)

Thankfully, since this server runs Apache as www-data rather than SetUID, the damage wasn’t as bad as it could have been.

IPTables Performance

October 26th, 2011

I gave a talk a while back at Hack and Tell regarding a DDoS attack that we had, and last night I was reminded of a section of it while diagnosing a client machine with some performance problems.

IPTables rule evaluation is sequential. The longer your ruleset, the more time it takes to process each packet. There are shortcuts and hash lookup methods like IPSet and nf-hipac which help with the large rulesets you might need when dealing with a DDoS, but this client’s machine is handling legitimate traffic, and SI% (softirq CPU time) was higher than I suspected it should be.

Creating shortcuts in the ruleset to decide whether to process a packet means that the very first rule should be your ACCEPT for RELATED,ESTABLISHED. Since a packet in those states isn’t NEW and is part of an existing stream, there isn’t a reason to continue with rule checks, so we short-circuit the evaluation and automatically accept the packet. This resulted in a 120ms drop in Time to First Byte – yikes. You might contend that blocking an IP won’t affect the current stream, and you’d be correct. Only when that IP sends a NEW connection would it be firewalled.
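
That first rule looks like this, inserted at the top of the INPUT chain:

/sbin/iptables -I INPUT 1 -m state --state RELATED,ESTABLISHED -j ACCEPT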

The next set of rules is your DROPs for your highest volume service, in this case port 80, followed by the ACCEPT for NEW connections. Obviously, port 443/https may be a good candidate for the first or second ruleset depending on your traffic patterns.

The other services on the machine (ssh, ftp, smtp, pop, imap, etc.) can be placed after those as needed. Your goal is to make sure that http/https is served quickly.
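
Putting that ordering together, a sketch of the ruleset (the blocked network 192.0.2.0/24 is only a placeholder):

/sbin/iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
/sbin/iptables -A INPUT -p tcp --dport 80 -s 192.0.2.0/24 -j DROP
/sbin/iptables -A INPUT -p tcp --dport 80 -m state --state NEW -j ACCEPT
/sbin/iptables -A INPUT -p tcp --dport 443 -m state --state NEW -j ACCEPT
/sbin/iptables -A INPUT -p tcp --dport ssh -m state --state NEW -j ACCEPT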

Another thing to consider is using RECENT as minor protection:

/sbin/iptables -A INPUT -p tcp --dport ssh -i eth0 -m state --state NEW -m recent --set
/sbin/iptables -A INPUT -p tcp --dport ssh -i eth0 -m state --state NEW -m recent --update --seconds 60 --hitcount 6 -j DROP

The above ruleset records each NEW connection to SSH and drops an IP on its sixth connection attempt within 60 seconds. Because --update refreshes the timer on every attempt, 60 seconds after the LAST connection attempt must lapse before that IP can connect again. You can protect any port like this, but you wouldn’t want to rely on this for http/https except in extreme cases.

SetUID versus www-data

October 25th, 2011

For years I’ve been an advocate of running Apache as www-data rather than SetUID.

Quickly, an explanation of the differences:

www-data

Apache (or an alternate webserver) runs as a low privilege account, usually www-data or httpd or a similarly named user/group. When a request is served, the low privilege apache process needs to have access to read the file, which usually means that the files must be world readable and the directories world executable. As such, any rogue process on the machine that knows the filesystem structure could traverse the filesystem and read files like wp-config.php. Preventing that traversal becomes difficult when one can read config files and learn which domains are served from that machine, and a predictable filesystem layout makes it easier still. However, any file that is not world writeable cannot be modified by the web process. This is why an exploit running on a site is only able to write to particular directories – usually ones that are made world writeable to allow uploads.
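
As a sketch, the permissions this model typically forces (the paths here are just examples):

chmod 644 /var/www/example.com/wp-config.php   # world readable so the www-data process can read it
chmod 777 /var/www/example.com/uploads         # world writeable so the web process can accept uploads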

SetUID

In this case, the server takes the request, then changes the UID to the user account that owns the files. Traversing the filesystem to find files like wp-config.php becomes difficult if the file and directory permissions are set correctly. The web process is the user, so it is able to write to any file – just as it could with your FTP account. An exploit that is loaded now has the ability to modify any file in your FTP account.

Why www-data or SetUID

While www-data has some shortcomings, SetUID is immensely more popular for two reasons: it avoids trouble tickets from people who can’t understand why their application can’t upload a file to a directory, and it protects the user’s files from being read by other people also running on that machine.

There is another mode that can be used – running the web server as www-data but with suPHP, which spawns PHP processes as a particular user while the webserver itself still runs in low privilege mode. Any PHP script still has permission to write files on the filesystem, but CGI/WSGI scripts would not have the ability to write to those files.

While I’m not a real fan of SetUID, one of the projects we’re working on will use it, mostly to avoid the common questions regarding why a directory needs to be chmod 777, and it will cut down on the support tickets.

As a result, we need to plan for multiple generations of backups, because rather than getting trouble tickets regarding applications that can’t write files, we’ll be getting trouble tickets from users whose sites have been exploited and every file modified – rather than just the files in the directories that were given write permission.

I do have another theory on how to deal with this using groups. Basically, you would still run the webserver in a low privilege mode, but it would switch to www-data.usergroup, which would prevent traversal, and the user could selectively allow group write on directories that need it. Since only usergroup would be given read access, a script running as a different user would not be able to traverse the directory, because each user’s tree would be owned by their own user:group.

I guess we would call this SetGID hosting.
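
A sketch of the permissions that scheme implies (the user bob and the paths are hypothetical):

chown -R bob:bob /home/bob/public_html    # each tree owned by its own user:group
chmod -R o-rwx /home/bob/public_html      # no world access, so other users cannot traverse it
chmod -R g+rX /home/bob/public_html       # group read so the www-data.bob process can serve files
chmod g+w /home/bob/public_html/uploads   # selectively allow group write where needed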

Credit cards, recurring payments, client upgrades

October 24th, 2011

After some code wrangling late last night, I solved a conundrum regarding payment plan upgrades and wrote the code to support it. Our goal is to avoid partial credits and prorated charges. While that means we are delivering service we may not be charging for, in the long run the number of customer service calls regarding small incremental charges would probably eat that revenue.

If a user has a free plan, and upgrades to a paid plan, they get charged immediately and their billing period is set.

If a user has a paid plan and upgrades to a more expensive pay plan, they are upgraded to the plan, but, not charged for the upgrade through the end of their current billing period.

If a user has a paid plan and downgrades to a cheaper plan, they are given until the end of the current billing period with their current ‘services’, at which point the system may have to selectively deactivate accounts if they haven’t disabled the extra features.

If a user has a paid plan and downgrades to free, the billing profile is removed and the plan is downgraded on the billing period end date.
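
Here is a minimal sketch of how those four cases might translate into code. The Plan and Account classes, the charge() stub, and the 30 day period are assumptions for illustration, not the actual billing code:

import datetime

class Plan(object):
    def __init__(self, name, price):
        self.name = name
        self.price = price

class Account(object):
    def __init__(self, plan):
        self.plan = plan           # active plan
        self.pending_plan = None   # takes effect at period_end
        self.billing_profile = 'card-on-file'
        self.period_end = None

def charge(account, amount):
    # placeholder for the real payment gateway call
    pass

def change_plan(account, new_plan, today=None):
    today = today or datetime.date.today()
    old = account.plan
    if old.price == 0 and new_plan.price > 0:
        # free -> paid: charge immediately and set the billing period
        charge(account, new_plan.price)
        account.plan = new_plan
        account.period_end = today + datetime.timedelta(days=30)
    elif new_plan.price > old.price:
        # paid -> more expensive: upgrade now, bill the new rate next period
        account.plan = new_plan
    elif new_plan.price > 0:
        # paid -> cheaper: keep current services until the period ends
        account.pending_plan = new_plan
    else:
        # paid -> free: remove the billing profile, downgrade at period end
        account.billing_profile = None
        account.pending_plan = new_plan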

The issue I ran into here was one I’ve dealt with for years – what do you do about price increases and decreases on a service? Prorating ends up with a tiny charge or credit applied to the next invoice, but usually results in a phone call from someone who doesn’t understand why they were charged $5 and $2.72 last month and $10 this month.

User Experience doesn’t end with the web page design.

These are issues that you need to think about when deploying a paid service that recurs, because people will adjust their plan based on features, etc. You can’t immediately downgrade the account, as they have paid through the end of the period, assuming you billed in advance. If you bill after the fact, you face the possibility that you’ve delivered service for a month before you collect any money.

This gets trickier as you expand to longer term payments, i.e. quarterly or bi-annual payments. Based on the merchant service agreement, annual agreements are usually not allowed since you only have six months to contest a charge; if your vendor disappears after those six months and you have six months remaining, you have no recourse with your credit card company.

In any case, it wasn’t really a showstopper for this app, but, I certainly spent more time thinking about how to handle and automate a portion of the business that is currently done manually with my existing business.

Google+ API Wishlist

October 23rd, 2011

While I was a very early adopter of Google+, today I’ve basically disabled my Twitter account, and my Facebook account remains open only to manage a few advertising campaigns and applications. I’ve used Google+ as my primary social outlet since late June. Initially I started to write a scraper for Google+ to fix a few things I didn’t like about it, but Google did mention that an API was coming. Google+’s API currently offers read-only access to your account and surely needs improvement.

While Games do appear to have access to additional APIs, releasing them to the general public so that developers can create their own apps would be greatly appreciated. I understand the complexity of writing an API and getting it right the first time, so I’d like to put forward a list of items that would be helpful.

Post to Stream
  Circles/Public/People
  Notification list. Perhaps the post circles contain a tuple that can
    turn notification on for each of the circles or people. If Public is
    passed as a notification target, ignore it silently. Alternatively, a
    second list of notification targets.
  Content of post
  Attached media object(s): picture URL, gallery URL, link URL, video tag/URL.
    Currently Google+ only supports a single attached object, but why not
    plan for the future here. Options might include a preview thumbnail or
    full size object inserted into the stream.
  Email People not yet using Google+, default to false/no.
Get Circles
  return a list of the circles that the user currently has
Get Members in Circles
  return a list of the members in a circle. If no circle name passed, return
  list of circles with members. Pagination if too large?
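
As a purely hypothetical illustration of what the Post to Stream call above might look like from a client, where every name, parameter and URL is invented, since the current API is read-only:

import json
import urllib2

def post_to_stream(token, content, circles=None, notify=None, media=None):
    # hypothetical endpoint and payload shape, invented for illustration
    payload = {
        'content': content,
        'circles': circles or ['public'],  # circles, 'public', or people IDs
        'notify': notify or [],            # separate list of notification targets
        'media': media or [],              # attached media object URLs
        'email_non_users': False,          # default to no email for non-users
    }
    req = urllib2.Request('https://example.invalid/plus/v1/stream',
                          json.dumps(payload),
                          {'Content-Type': 'application/json',
                           'Authorization': 'Bearer ' + token})
    return urllib2.urlopen(req)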

What would be nice for the Google+ API

Add Member to Circle
  Add a member ID to a particular circle
Delete Member from Circle
  Delete a member ID from a circle
Add Circle
Delete Circle

Personally, adding members to circles would greatly simplify the manual process behind http://plus.cd34.com/, but, I understand the obvious spam implications here.

With just the basic functionality listed above, even if we couldn’t attach objects, we could have our blogs post to Google+ or have our favorite desktop/webtop software post to Google+, making it one of the ‘Big Three’ rather than the duopoly the social media world currently has.

I would love to have the ability to post to Google+ from certain apps that I have running locally. I used to tweet weekly statistics from an IPv6 traffic tracker: the percentage of email received over IPv6, IPv6 traffic volumes and other such data. I also set up a small project that I thought was fun – replaying historic events synchronized to the actual event so that people could follow along. At present, there is no easy way to do this. Knowing what application published to the stream would also be very helpful, allowing developers to customize the ‘posted by’ line. When someone sees a post, they would know if it was automated or entered through the web client.

As a hobbyist, I’d love to see a slightly expanded API.
