Google Latitude UX adjustments

April 24th, 2012

I use Google Latitude quite a bit, with roughly three to four checkins a day. I remember when you could get free things at Arby's with a checkin – though I never took advantage of that. Even the manager at our local Arby's had no idea the offer existed, nor did they have a way to track it in their point-of-sale system. Likewise, our local Walgreens didn't know how to handle a coupon on a phone that they couldn't collect, though they did offer the discount if we bought the product.

However, the one thing that is very annoying is the amount of data that must be transferred for a checkin. I run a T-Mobile phone on AT&T's network, which means I'm limited to 2G speeds that max out at 512kb/sec. The first checkin after a phone restart can take two or three minutes of transferring data.

When I do a checkin, rather than waiting for the fine GPS location, I should be presented with a screen based on the coarse lookup showing the places I've checked in to before. That first screen would usually contain the place I'm checking into. Granted, I could go to a different store or restaurant, but 95% of the time that first short list is going to contain the place I'm likely to check in to.

While that list is being presented and GPS is getting a better position lock, I could opt to refresh once I see GPS has locked in, or hit search. Hitting search while results are still loading takes me into Maps rather than searching checkin locations; then I have to go to the location, click on it, then click checkin. Cumbersome at 3G speeds, irritating at 2G speeds.

However, once I have done that, the amount of data for a checkin must be incredible as it will normally take 10-15 seconds to get to the next page that shows the leaderboard. Even at 2G speeds, I can’t imagine how much data needs to be sent that ties up the phone that long. I can upload a 115k image in less time than it takes to get the leaderboard after checking in. I know it isn’t a lookup time problem as both the send/receive data indicators are solid during the leaderboard download.

It has made me seriously consider bringing in an unlocked HTC Desire Z from Canada so I could have a keyboard phone on AT&T. I tried T-Mobile for 47 hours and missed text messages and several phone calls even though I can see their antenna from my backyard.

Watching several apps, the amount of data transmitted is sometimes quite scary.

A discussion of Web Site Performance – from a design perspective

March 23rd, 2012

One of the things I always run into is clients who want their site to be faster. Often I'm told that it is the server or MySQL slowing their site down. Today, I had a conversation with a site owner who was talking about how slow their site was.

“The site loads the first post and sits there for five seconds, then the rest of the page comes in.”

Immediately, I think: <script src=" is likely the problem.

Load the page, yes, pauses… right where the social media buttons are loaded.

Successive reloads are better, but that one javascript include appears to always be fetched. It turns out the expire time on that javascript is set to a date in the past, so it is re-fetched on every pageview regardless of whether it has been modified. And that script doesn't load very quickly, adding to the delay.

Disable the plugin, reload, and the site is fast. The initial reaction is: let's move those includes to async javascript. Social media buttons don't need to hold up the page load – they can be rendered after the site has loaded. It might look a little funny, but most of the social media buttons are below the fold anyhow, and we're trying to get the site to display quickly.

There is a difference between the page being slow and the page rendering slowly. The latter is what most people will see and use to judge that the site is slow. So, the first thing we need to do is move things to async. As an example, the social media buttons on this site are loaded by my cd34-social plugin.

But, the meat of the conversion to async is here:

<script type="text/javascript">
<!--
var scripts = [
  "https://apis.google.com/js/plusone.js",
  "http://platform.twitter.com/widgets.js",
  "http://connect.facebook.net/en_US/all.js#xfbml=1"
];
for (var i = 0; i < scripts.length; i++) {
  var s = document.createElement("script");
  s.type = "text/javascript";
  s.async = true;
  s.src = scripts[i];
  // insert each new tag before the first script tag in the document
  var first = document.getElementsByTagName("script")[0];
  first.parentNode.insertBefore(s, first);
}
// -->
</script>

What this code does is create script tags for https://apis.google.com/js/plusone.js, http://platform.twitter.com/widgets.js and http://connect.facebook.net/en_US/all.js#xfbml=1 with the async attribute set, inserting each one before the first script tag in the document. Because they are async, the browser downloads them in the background rather than blocking on them.

This way the social buttons still load, but they won't hold up the page rendering.

However, this isn't the only issue we've run into. The plugin they used includes its own social media buttons. The plugin should use CSS sprites: a single large image containing a bar or matrix of icons, with CSS background positioning used to shift the image around and display only the portion you want. This way, you fetch one image rather than the 16 social media buttons plus their 16 hover images.

Here is a collection of those sprites as used by Google, Facebook, Twitter and Twitter's Bootstrap template:

With these sprites, you save the overhead of multiple fetches for each icon, and, present a much quicker overall experience for the surfer.
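As a minimal sketch of the technique (the class names, file name and offsets here are hypothetical, not taken from any of the plugins above), a sprite sheet of 32×32 icons in two rows – normal state on top, hover state below – might be wired up like this:

```css
/* One image holding all of the 32x32 icons in two rows
   (normal on the top row, hover on the bottom row) */
.social-icon {
  display: inline-block;
  width: 32px;
  height: 32px;
  background-image: url(social-sprites.png);
  background-repeat: no-repeat;
}
/* Shift the background left to expose the icon we want */
.social-icon.twitter  { background-position: 0 0; }
.social-icon.facebook { background-position: -32px 0; }
/* Hover states live in the same image, one row down */
.social-icon.twitter:hover  { background-position: 0 -32px; }
.social-icon.facebook:hover { background-position: -32px -32px; }
```

One HTTP fetch covers every icon and every hover state, and the hover swap is instant since the image is already in memory.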

Not everything can be solved with low latency webservers, some performance problems are on the browser/rendering side.

REMOTE_ADDR handling with Varnish and Load Balancers

March 18th, 2012

While working with the ever-present spam issue on this blog, I've started to have issues with many of the plugins not looking up the correct IP address. While each plugin author could be contacted, trackbacks and comments through WordPress itself still record the Varnish server's IP address.

In our VCL, in vcl_recv, we put the following:

       if (req.http.x-forwarded-for) {
           set req.http.X-Forwarded-For =
               req.http.X-Forwarded-For + ", " + client.ip;
       } else {
           set req.http.X-Forwarded-For = client.ip;
       }

and in our wp-config.php we put:

// Prefer X-Forwarded-For (set by Varnish), then Client-IP, then REMOTE_ADDR
$temp_ip = explode(',', isset($_SERVER['HTTP_X_FORWARDED_FOR'])
  ? $_SERVER['HTTP_X_FORWARDED_FOR'] :
  (isset($_SERVER['HTTP_CLIENT_IP']) ?
  $_SERVER['HTTP_CLIENT_IP'] : $_SERVER['REMOTE_ADDR']));
// The original client is the first address in the comma-separated list
$remote_addr = trim($temp_ip[0]);
// Strip anything that isn't valid in an IPv4 or IPv6 address
// (a-f must be allowed or IPv6 addresses get mangled)
$_SERVER['REMOTE_ADDR'] = preg_replace('/[^0-9a-fA-F.:]/', '', $remote_addr);

While we only need to check HTTP_X_FORWARDED_FOR in our case, this also handles things if you are behind one of a number of other proxy servers, and corrects $_SERVER['REMOTE_ADDR']. The ticket that was opened (and later closed) which would have made it very easy to overload a get_ip function says it should be fixed on the server side.

in /wp-includes/comment.php:

 * We use REMOTE_ADDR here directly. If you are behind a proxy, you should ensure
 * that it is properly set, such as in wp-config.php, for your environment.
 * See {@link http://core.trac.wordpress.org/ticket/9235}

If you're using Apache, you can also use mod_rpaf, which fixes this at the module level.
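As a sketch of what that looks like (module path and proxy addresses here are placeholders – use the path for your distribution's mod_rpaf build and your Varnish server's IP):

```apache
# Trust X-Forwarded-For, but only when the request
# came from our Varnish box
LoadModule rpaf_module modules/mod_rpaf-2.0.so
RPAFenable On
RPAFsethostname On
RPAFproxy_ips 127.0.0.1 10.0.0.2
RPAFheader X-Forwarded-For
```

With this in place, REMOTE_ADDR is rewritten before PHP ever sees the request, so no application-level fix is needed.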

DNS Authoritative Server: disabling recursive lookups and analyzing the logs

March 8th, 2012

We operate a number of DNS servers, one of which we left open for recursive lookups for clients to see their sites before they’ve moved the DNS. While we normally handle that through a proxy server we maintain, DNS was a secondary method for doing this which was sometimes easier than having a client use a proxy server.

Today, after 60 days notice, we shut off external recursive lookups at 6pm EST. In the two hours since, we've received a number of 'lookups' from a few 'high volume' IP addresses. The list is surprising.
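For reference, assuming BIND (which the log format suggests; the addresses below are placeholders for your own ranges), restricting recursion to trusted networks while still answering authoritatively for everyone looks something like this in named.conf:

```
// Placeholder ACL -- substitute your own address space
acl "trusted" { 127.0.0.1; 10.0.0.0/8; };

options {
    // Authoritative answers for our zones are still served to anyone;
    // only clients matching the ACL may ask us to recurse
    allow-recursion { trusted; };
};
```

Queries from outside the ACL are answered with REFUSED and logged as 'denied', which is what the analysis below is counting.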

Output from the following command (annotated with whois info):

grep -E 'denied$' /var/log/syslog|cut -f 8 -d ' '|cut -f 1 -d '#'|sort|uniq -c|sort -nr|head -n 30
 506452 212.146.85.194  (GTS Telecom Romania Operations)
  38444 141.101.125.86  (Cloudflare EU)
  10490 141.101.124.86  (Cloudflare EU)
  10236 67.228.130.45  (Softlayer)
   4784 212.227.135.196  (OneandOne AG)
   1277 173.201.216.32  (Godaddy)
    673 163.121.134.170 
    620 163.121.194.154
    617 2001:2060:ffff:a01::53 (Sonera, Finland)
    528 95.142.101.5  (Cyber Technology BVBA) 
    528 95.142.100.5  (Cyber Technology BVBA)
    528 178.237.35.125
    464 66.93.87.2
    323 193.210.18.18
    306 67.15.238.64
    294 193.210.19.19
    270 94.23.147.151
    230 76.76.11.241
    213 72.53.193.43
    207 204.194.238.17
    202 72.53.193.42
    192 208.80.194.121
    190 78.28.197.6
    190 78.28.197.5
    190 212.93.150.198
    180 204.194.238.15
    168 213.157.178.54
    160 208.69.35.22
    153 163.121.128.90
    152 212.6.108.157

What are we seeing?

GTS Telecom Romania Operations:
# grep 212.146.85.194 /var/log/syslog|cut -f 2 -d "'"|sort|uniq -c|sort -nr|more
 539590 isc.org/ANY/IN

CloudFlare EU:
# grep 141.101.125.86 /var/log/syslog|cut -f 2 -d "'"|sort|uniq -c|sort -nr|more
  41660 ripe.net/ANY/IN

# grep 141.101.124.86 /var/log/syslog|cut -f 2 -d "'"|sort|uniq -c|sort -nr|more
  10490 ripe.net/ANY/IN

SoftLayer:
# grep 67.228.130.45 /var/log/syslog|cut -f 2 -d "'"|sort|uniq -c|sort -nr|more
  10924 ripe.net/ANY/IN
      1 pdkamoaaaaekt0000dkaaabaaafbadli.ripe.net/ANY/IN
      1 oobdjlaaaaekt0000dkaaabaaafbadli.ripe.net/ANY/IN
      1 onigfiaaaaekt0000dkaaabaaafbadli.ripe.net/ANY/IN
      1 ojfhfgaaaaekt0000dkaaabaaafbadli.ripe.net/ANY/IN
      1 nphhdiaaaaekt0000dkaaabaaafbadli.ripe.net/ANY/IN
      1 ngffklaaaaekt0000dkaaabaaafbadli.ripe.net/ANY/IN

OneandOne AG:
# grep 212.227.135.196 /var/log/syslog|cut -f 2 -d "'"|sort|uniq -c|sort -nr|more
   5196 isc.org/ANY/IN

Godaddy:
# grep 173.201.216.32 /var/log/syslog|cut -f 2 -d "'"|sort|uniq -c|sort -nr|more
   3031 ripe.net/ANY/IN

Lumped into that was an IPv6 resolver that actually pointed out an interesting issue. The domains that 2001:2060:ffff:a01::53 is looking up should be hitting our server for authoritative answers, but it appears to be doing recursive lookups. I've made a minor change to our configs to see if I can log a bit more data, as it isn't a continuous stream.

It is interesting to see the number of hits that occurred in a two-hour period, which explains why the network PPS rate on that server has been higher than normal.

This traffic is part of a DNS reflection DDOS attack using spoofed UDP packets with the source address set to the targets above. Why they've chosen isc.org or ripe.net as their typical query, I don't know for sure – though both return very large ANY responses, which makes for good amplification. Since UDP is a connectionless protocol, there is no SYN/ACK handshake, making DNS susceptible to spoofed packets. Our DNS resolver, which was previously publicly available, was sending its responses to the spoofed source IP in the UDP header. The attackers chose a particular group of servers to receive those responses.

Oddly, I did see some TCP traffic from one of Cloudflare's EU servers, which should have been impossible as they are using anycast. Shortly after disabling recursive lookups, the attack stopped. This suggests the attackers are watching the servers involved in the attack, and when they saw that ours was no longer effective, they stopped sending the spoofed traffic.

We are in the process of upgrading our DNS servers and swapping things around, and this was the first step in redesigning our DNS infrastructure. It was only by chance that we noticed our resolver was being used in the DNS reflection attack, as it was only sending out a few Mb/sec more traffic than it should have been.

My apologies to the servers that were receiving DDOS traffic from our resolvers.

50000 Connection node.js/socket.io test

February 22nd, 2012

While working on a project, I started doing some benchmarking, but benchmarks != real world.

So, I created a quick test, 50k.cd34.com – if you can, hit the URL, send it to your friends, and let's see if we can hit 50000 simultaneous connections. There is a counter that updates, and the background color changes from black to white as it gets closer to 50000 viewers.
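The fade itself is trivial to compute. As a hypothetical sketch (the actual page's code may do this differently), a helper that maps the live connection count onto a grey background color could look like:

```javascript
// Map a connection count onto a grey from #000000 (no viewers)
// to #ffffff (goal reached); counts past the goal clamp to white.
function fillColor(count, goal) {
  var ratio = Math.min(count / goal, 1);
  var level = Math.round(ratio * 255);
  var hex = ('0' + level.toString(16)).slice(-2);
  return '#' + hex + hex + hex;
}

// e.g., on each counter update pushed from the server:
// document.body.style.backgroundColor = fillColor(n, 50000);
```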

Code available on code.google.com.

I’ll probably be fixing IPv6 with socket.io soon. I was rather dismayed that it didn’t work.
