Posts Tagged ‘mod_rewrite’

Apache2 – Using mod_rewrite and RewriteMap with a Python program to georedirect

Friday, August 2nd, 2013

Almost every machine we run has geoip loaded and clients are able to access the country code of a surfer without having to write too much code. One client decided that they wanted to redirect users from certain states to another page to deal with sales tax and shipping issues but had a mixture of perl scripts and php scripts running their shopping cart.

Having done something very similar using mod_rewrite and rewritemap with a text file, the simple solution appeared to be a short Python script to interface with the GeoIPLite data and use mod_rewrite.

In our VirtualHost config, we put:

RewriteMap statecode prg:/var/www/rewritemap/tools/rewritemap.py

To set up our Python environment, we did:

virtualenv /var/www/rewritemap
cd /var/www/rewritemap
source bin/activate
git clone https://github.com/maxmind/geoip-api-python.git
cd geoip-api-python
python setup.py install
cd ..
mkdir tools
cd tools
wget -N http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz
gunzip GeoLiteCity.dat.gz

rewritemap.py

#!/var/www/rewritemap/bin/python

import sys

import GeoIP
gi = GeoIP.open("/var/www/rewritemap/tools/GeoLiteCity.dat", \
         GeoIP.GEOIP_STANDARD)

STATE = 'NULL'
gir = gi.record_by_name(sys.stdin.readline())
if gir != None:
  if gir['country_code'] == 'US':
    STATE = gir['region']

print STATE

Then, in our .htaccess we put:

RewriteEngine on
RewriteCond ${statecode:%{REMOTE_ADDR}|NONE} ^(PA|DC|CO)$
RewriteRule .* /specialpage/ [R=302,L]

Apache mod_rewrite Performance issue discussion and fix

Monday, May 16th, 2011

This weekend I was with a client that was having some issues unrelated to this issue, but, it raised an interesting point. Apache’s handlers have a load order dependent on the modules installed and there are certain modules that slow down apache enough that you want to avoid them on production servers – mod_status being one of those.

The story behind this one is probably something that you’ve run into. WebApp written for one machine, client base grows and it is time to expand. Moving from one server to two, is infinitely harder than moving from two to three. However, you have a legacy that you need to support – clients that won’t change the hyperlink pointing to some API that you’ve designed, so, you use mod_rewrite to fix the problem.

A simple mod_rewrite can redirect the traffic to our old location to the new location so that you don’t need to worry about clients that aren’t going to change the HTML. Lets also pretend this app was written before RESTful APIs were handy and we need to also pass the query string.

RewriteEngine on
RewriteRule ^specialapi.php$ http://newserver.superapp.com/specialapi.php [R=301,L,QSA]

So, after some testing, we’re satisfied that things work as expected and we’re happy that we could split things effectively.

What happens in a request for that url

Our original API did some processing of the request based on some command line arguments, and redirects the person elsewhere. When we do a normal request for this object, using strace, we get the following output:

accept(4, {sa_family=AF_INET6, sin6_port=htons(49632), inet_pton(AF_INET6, "2001:470:5:590::cd34", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 10
fcntl64(10, F_GETFD)                    = 0
fcntl64(10, F_SETFD, FD_CLOEXEC)        = 0
getsockname(10, {sa_family=AF_INET6, sin6_port=htons(80), inet_pton(AF_INET6, "2604:3500::c:21", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
fcntl64(10, F_GETFL)                    = 0x2 (flags O_RDWR)
fcntl64(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
gettimeofday({1305564609, 151401}, NULL) = 0
gettimeofday({1305564609, 151686}, NULL) = 0
read(10, "GET /test/api.php HTTP/1.1\r\nHost"..., 8000) = 435
gettimeofday({1305564609, 153050}, NULL) = 0
gettimeofday({1305564609, 153303}, NULL) = 0
gettimeofday({1305564609, 153521}, NULL) = 0
gettimeofday({1305564609, 153741}, NULL) = 0
gettimeofday({1305564609, 153933}, NULL) = 0
gettimeofday({1305564609, 154152}, NULL) = 0
gettimeofday({1305564609, 154317}, NULL) = 0
gettimeofday({1305564609, 154533}, NULL) = 0
gettimeofday({1305564609, 154722}, NULL) = 0
gettimeofday({1305564609, 154914}, NULL) = 0
gettimeofday({1305564609, 155103}, NULL) = 0
gettimeofday({1305564609, 155295}, NULL) = 0
gettimeofday({1305564609, 155483}, NULL) = 0
gettimeofday({1305564609, 156089}, NULL) = 0
gettimeofday({1305564609, 156279}, NULL) = 0
gettimeofday({1305564609, 156496}, NULL) = 0
gettimeofday({1305564609, 156685}, NULL) = 0
gettimeofday({1305564609, 156877}, NULL) = 0
gettimeofday({1305564609, 157065}, NULL) = 0
stat64("/var/www/uc/test/api.php", {st_mode=S_IFREG|0644, st_size=22, ...}) = 0
open("/var/www/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/var/www/uc/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 11
fcntl64(11, F_GETFD)                    = 0x1 (flags FD_CLOEXEC)
fcntl64(11, F_SETFD, FD_CLOEXEC)        = 0
fstat64(11, {st_mode=S_IFREG|0644, st_size=30, ...}) = 0
read(11, "ErrorDocument 404 /index.html\n", 4096) = 30
read(11, "", 4096)                      = 0
close(11)                               = 0
open("/var/www/uc/test/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/var/www/uc/test/api.php/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOTDIR (Not a directory)
setitimer(ITIMER_PROF, {it_interval={0, 0}, it_value={60, 0}}, NULL) = 0
rt_sigaction(SIGPROF, {0xb70c1a60, [PROF], SA_RESTART}, {0xb70c1a60, [PROF], SA_RESTART}, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [PROF], NULL, 8) = 0
umask(077)                              = 022
umask(022)                              = 077
getcwd("/", 4095)                       = 2
chdir("/var/www/uc/test")               = 0
setitimer(ITIMER_PROF, {it_interval={0, 0}, it_value={30, 0}}, NULL) = 0
time(NULL)                              = 1305564609
open("/var/www/uc/test/api.php", O_RDONLY|O_LARGEFILE) = 11
fstat64(11, {st_mode=S_IFREG|0644, st_size=22, ...}) = 0
fstat64(11, {st_mode=S_IFREG|0644, st_size=22, ...}) = 0
fstat64(11, {st_mode=S_IFREG|0644, st_size=22, ...}) = 0
mmap2(NULL, 22, PROT_READ, MAP_SHARED, 11, 0) = 0xb6de7000
munmap(0xb6de7000, 22)                  = 0
close(11)                               = 0
chdir("/")                              = 0
umask(022)                              = 022
open("/dev/urandom", O_RDONLY)          = 11
read(11, "\247q\340\"", 4)              = 4
close(11)                               = 0
open("/dev/urandom", O_RDONLY)          = 11
read(11, "\216\241*W", 4)               = 4
close(11)                               = 0
open("/dev/urandom", O_RDONLY)          = 11
read(11, "\270\267\22+", 4)             = 4
close(11)                               = 0
setitimer(ITIMER_PROF, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
writev(10, [{"HTTP/1.1 200 OK\r\nDate: Mon, 16 M"..., 237}, {"\37\213\10\0\0\0\0\0\0\3", 10}, {"+I-.\1\0", 6}, {"\f~\177\330\4\0\0\0", 8}], 4) = 261
gettimeofday({1305564609, 174811}, NULL) = 0
gettimeofday({1305564609, 175003}, NULL) = 0
read(10, 0xb93489e0, 8000)              = -1 EAGAIN (Resource temporarily unavailable)
write(7, "2001:470:5:590::cd34 - - [16/May"..., 214) = 214
write(8, "vhost_combined\n", 15)        = 15

Briefly, the request comes in for the asset http://testserver.com/test/api.php as you can see by the:

Apache checks to see if the file exists:

stat64("/var/www/uc/test/api.php", {st_mode=S_IFREG|0644, st_size=22, ...}) = 0

And does something odd:

open("/var/www/uc/test/api.php/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOTDIR (Not a directory)

Even though the file exists, and isn’t a directory, apache is checking to see if there is a .htaccess file in the api.php directory. This is where part of the issue comes to light.

Eventually, apache changes to the directory and serves the content:

chdir("/var/www/uc/test")               = 0
setitimer(ITIMER_PROF, {it_interval={0, 0}, it_value={30, 0}}, NULL) = 0
time(NULL)                              = 1305564609
open("/var/www/uc/test/api.php", O_RDONLY|O_LARGEFILE) = 11
fstat64(11, {st_mode=S_IFREG|0644, st_size=22, ...}) = 0
fstat64(11, {st_mode=S_IFREG|0644, st_size=22, ...}) = 0
fstat64(11, {st_mode=S_IFREG|0644, st_size=22, ...}) = 0

So, a normal request works, and we’re able to see what Apache is doing. Now, lets put our modified rule in to redirect people to the new location:

accept(4, {sa_family=AF_INET6, sin6_port=htons(50286), inet_pton(AF_INET6, "2001:470:5:590::cd34", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 10
fcntl64(10, F_GETFD)                    = 0
fcntl64(10, F_SETFD, FD_CLOEXEC)        = 0
getsockname(10, {sa_family=AF_INET6, sin6_port=htons(80), inet_pton(AF_INET6, "2604:3500::c:21", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
fcntl64(10, F_GETFL)                    = 0x2 (flags O_RDWR)
fcntl64(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
gettimeofday({1305565527, 718766}, NULL) = 0
gettimeofday({1305565527, 718990}, NULL) = 0
read(10, "GET /test/api.php HTTP/1.1\r\nHost"..., 8000) = 435
gettimeofday({1305565527, 719683}, NULL) = 0
gettimeofday({1305565527, 719909}, NULL) = 0
gettimeofday({1305565527, 720127}, NULL) = 0
gettimeofday({1305565527, 720347}, NULL) = 0
gettimeofday({1305565527, 720539}, NULL) = 0
gettimeofday({1305565527, 720732}, NULL) = 0
gettimeofday({1305565527, 720921}, NULL) = 0
gettimeofday({1305565527, 721936}, NULL) = 0
gettimeofday({1305565527, 722127}, NULL) = 0
gettimeofday({1305565527, 722343}, NULL) = 0
gettimeofday({1305565527, 722533}, NULL) = 0
gettimeofday({1305565527, 722724}, NULL) = 0
gettimeofday({1305565527, 722913}, NULL) = 0
gettimeofday({1305565527, 723106}, NULL) = 0
gettimeofday({1305565527, 723295}, NULL) = 0
gettimeofday({1305565527, 723487}, NULL) = 0
gettimeofday({1305565527, 723676}, NULL) = 0
gettimeofday({1305565527, 723869}, NULL) = 0
gettimeofday({1305565527, 724058}, NULL) = 0
stat64("/var/www/uc/test/api.php", {st_mode=S_IFREG|0644, st_size=22, ...}) = 0
open("/var/www/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/var/www/uc/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 11
fcntl64(11, F_GETFD)                    = 0x1 (flags FD_CLOEXEC)
fcntl64(11, F_SETFD, FD_CLOEXEC)        = 0
fstat64(11, {st_mode=S_IFREG|0644, st_size=30, ...}) = 0
read(11, "ErrorDocument 404 /index.html\n", 4096) = 30
read(11, "", 4096)                      = 0
close(11)                               = 0
open("/var/www/uc/test/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 11
fcntl64(11, F_GETFD)                    = 0x1 (flags FD_CLOEXEC)
fcntl64(11, F_SETFD, FD_CLOEXEC)        = 0
fstat64(11, {st_mode=S_IFREG|0644, st_size=72, ...}) = 0
read(11, "RewriteEngine on\nRewriteRule ^ap"..., 4096) = 72
read(11, "", 4096)                      = 0
close(11)                               = 0
open("/var/www/uc/test/api.php/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOTDIR (Not a directory)
writev(10, [{"HTTP/1.1 301 Moved Permanently\r\n"..., 303}, {"\37\213\10\0\0\0\0\0\0\3", 10}, {"mP\301N\3030\f\275\367+LOpX\334\26\t!\224E\32k\21\2236\250D9p\364\32\263"..., 236}, {"\314\226,\242>\1\0\0", 8}], 4) = 557
gettimeofday({1305565527, 734362}, NULL) = 0
gettimeofday({1305565527, 734577}, NULL) = 0
read(10, 0xb93489e0, 8000)              = -1 EAGAIN (Resource temporarily unavailable)
write(7, "2001:470:5:590::cd34 - - [16/May"..., 215) = 215
write(8, "vhost_combined\n", 15)        = 15

In this case, we see something that shouldn’t really happen. Even though our mod_rewrite has rewritten the url, apache is still checking to see if api.php and api.php/.htaccess exist:

stat64("/var/www/uc/test/api.php", {st_mode=S_IFREG|0644, st_size=22, ...}) = 0

open("/var/www/uc/test/api.php/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOTDIR (Not a directory)

So, even with the mod_rewrite rule passing the file over to another machine, apache is still testing the existence of the file and a directory named api.php containing the file .htaccess. The latter check being the one that we’re going to fix.

accept(4, {sa_family=AF_INET6, sin6_port=htons(50516), inet_pton(AF_INET6, "2001:470:5:590::cd34", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 10
fcntl64(10, F_GETFD)                    = 0
fcntl64(10, F_SETFD, FD_CLOEXEC)        = 0
getsockname(10, {sa_family=AF_INET6, sin6_port=htons(80), inet_pton(AF_INET6, "2604:3500::c:21", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
fcntl64(10, F_GETFL)                    = 0x2 (flags O_RDWR)
fcntl64(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
gettimeofday({1305565791, 419574}, NULL) = 0
gettimeofday({1305565791, 419798}, NULL) = 0
read(10, "GET /test/api.php HTTP/1.1\r\nHost"..., 8000) = 409
gettimeofday({1305565791, 420459}, NULL) = 0
gettimeofday({1305565791, 420687}, NULL) = 0
gettimeofday({1305565791, 420905}, NULL) = 0
gettimeofday({1305565791, 421126}, NULL) = 0
gettimeofday({1305565791, 421319}, NULL) = 0
gettimeofday({1305565791, 421603}, NULL) = 0
gettimeofday({1305565791, 421891}, NULL) = 0
gettimeofday({1305565791, 422112}, NULL) = 0
gettimeofday({1305565791, 422360}, NULL) = 0
gettimeofday({1305565791, 422585}, NULL) = 0
gettimeofday({1305565791, 422809}, NULL) = 0
gettimeofday({1305565791, 423063}, NULL) = 0
gettimeofday({1305565791, 423313}, NULL) = 0
gettimeofday({1305565791, 423567}, NULL) = 0
gettimeofday({1305565791, 423818}, NULL) = 0
gettimeofday({1305565791, 424071}, NULL) = 0
gettimeofday({1305565791, 424297}, NULL) = 0
stat64("/var/www/uc/test/api.php", 0xbf8e9bfc) = -1 ENOENT (No such file or directory)
lstat64("/var", {st_mode=S_IFDIR|S_ISGID|0755, st_size=148, ...}) = 0
lstat64("/var/www", {st_mode=S_IFDIR|S_ISGID|0711, st_size=78, ...}) = 0
open("/var/www/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
lstat64("/var/www/uc", {st_mode=S_IFDIR|S_ISGID|0755, st_size=4096, ...}) = 0
open("/var/www/uc/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 11
fcntl64(11, F_GETFD)                    = 0x1 (flags FD_CLOEXEC)
fcntl64(11, F_SETFD, FD_CLOEXEC)        = 0
fstat64(11, {st_mode=S_IFREG|0644, st_size=30, ...}) = 0
read(11, "ErrorDocument 404 /index.html\n", 4096) = 30
read(11, "", 4096)                      = 0
close(11)                               = 0
lstat64("/var/www/uc/test", {st_mode=S_IFDIR|S_ISGID|0755, st_size=48, ...}) = 0
open("/var/www/uc/test/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 11
fcntl64(11, F_GETFD)                    = 0x1 (flags FD_CLOEXEC)
fcntl64(11, F_SETFD, FD_CLOEXEC)        = 0
fstat64(11, {st_mode=S_IFREG|0644, st_size=72, ...}) = 0
read(11, "RewriteEngine on\nRewriteRule ^ap"..., 4096) = 72
read(11, "", 4096)                      = 0
close(11)                               = 0
lstat64("/var/www/uc/test/api.php", 0xbf8e9bfc) = -1 ENOENT (No such file or directory)
writev(10, [{"HTTP/1.1 301 Moved Permanently\r\n"..., 303}, {"\37\213\10\0\0\0\0\0\0\3", 10}, {"mP\301N\3030\f\275\367+LOpX\334\26\t!\224E\32k\21\2236\250D9p\364\32\263"..., 236}, {"\314\226,\242>\1\0\0", 8}], 4) = 557
gettimeofday({1305565791, 435764}, NULL) = 0
gettimeofday({1305565791, 435986}, NULL) = 0
read(10, 0xb934a9e8, 8000)              = -1 EAGAIN (Resource temporarily unavailable)
write(7, "2001:470:5:590::cd34 - - [16/May"..., 215) = 215
write(8, "vhost_combined\n", 15)        = 15

So, in this case we’re left with:

stat64("/var/www/uc/test/api.php", 0xbf8e9bfc) = -1 ENOENT (No such file or directory)
and
lstat64("/var/www/uc/test/api.php", 0xbf8e9bfc) = -1 ENOENT (No such file or directory)

And we’re not trying to open /var/www/uc/test/api.php/.htaccess, so, we’ve made the process a little smoother.

Briefly, when you use mod_rewrite to redirect traffic from an existing file, move the file out of the way to save extra lookups.

Additionally, you can move your mod_rewrite into your config file, and set AllowOverride none in your config which will prevent Apache from looking for .htaccess files in each of your directories. If you have a lot of static content being accessed, this will help considerably.

Entries (RSS) and Comments (RSS).
Cluster host: li