Posts Tagged ‘google’

Google Latitude UX adjustments

Tuesday, April 24th, 2012

I use Google Latitude quite a bit, with roughly three to four checkins a day. I remember when you could get free items at Arby's with a checkin – though I never took advantage of it. Even the manager at our local Arby's had no idea the promotion existed, nor did they have a way to track it in their point-of-sale system. Likewise, our local Walgreens didn't know how to handle a phone-based coupon they couldn't physically collect, though they did offer to apply the discount if we bought the product.

However, the one thing that is very annoying is the amount of data that must be transferred for a checkin. I run a T-Mobile phone on AT&T's network, which limits me to 2G at a maximum of 512kb/sec. The first checkin after a phone restart can take two or three minutes transferring data.

When I do a checkin, rather than waiting for the fine GPS location, I should be presented with a screen built from the coarse lookup, listing the places I have checked into before. That first screen would usually contain the place I'm checking into. Granted, I could be at a new store or restaurant, but 95% of the time that first, short list is going to contain the place I'm likely to check in to.

While that list is being presented and GPS is getting a better position lock, I could opt to refresh once I see GPS has locked in, or hit search. Hitting search while results are still loading takes me into Maps rather than searching checkin locations; I then have to navigate to the location, click on it, then click checkin. Cumbersome at 3G speeds, irritating at 2G speeds.
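The flow described here – show a short list of likely places from the coarse fix while GPS converges – could be sketched roughly like this. The place names, coordinates, and visit counts below are invented for illustration:

```python
import math

def coarse_candidates(history, coarse_lat, coarse_lon, limit=5):
    """Rank previously visited places by distance from a coarse
    (cell-tower) fix so a short list can be shown before GPS locks.
    `history` is a list of (name, lat, lon, visit_count) tuples."""
    def score(place):
        name, lat, lon, visits = place
        # Equirectangular approximation is fine at city scale.
        dx = (lon - coarse_lon) * math.cos(math.radians(coarse_lat))
        dy = lat - coarse_lat
        # Closer places first; ties broken by more frequent visits.
        return (math.hypot(dx, dy), -visits)
    return sorted(history, key=score)[:limit]

history = [
    ("Arby's", 28.54, -81.38, 12),
    ("Walgreens", 28.55, -81.37, 3),
    ("Airport", 28.43, -81.31, 1),
]
print(coarse_candidates(history, 28.545, -81.375, limit=2))
```

When the fine GPS fix arrives, the same ranking can be re-run with the better coordinates to refresh the list in place.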

However, once I have done that, the amount of data for a checkin must be considerable, as it normally takes 10-15 seconds to reach the next page showing the leaderboard. Even at 2G speeds, I can't imagine what needs to be sent that ties up the phone for that long. I can upload a 115k image in less time than it takes to load the leaderboard after checking in. I know it isn't a lookup-time problem, as both the send and receive data indicators are solid during the leaderboard download.
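A quick back-of-envelope check supports the complaint, taking the 512kb/sec (kilobits) figure above at face value and ignoring latency and protocol overhead:

```python
def transfer_seconds(size_kb, link_kbit_per_s):
    """Ideal transfer time for a payload of size_kb kilobytes
    over a link of link_kbit_per_s kilobits per second."""
    return (size_kb * 8.0) / link_kbit_per_s

# A 115 KB image at 512 kbit/s: under two seconds at line rate.
print(round(transfer_seconds(115, 512), 2))

# Conversely, a 10-15 second wait implies roughly 640-960 KB
# transferred at line rate -- a lot of data for one leaderboard.
print(transfer_seconds(640, 512), transfer_seconds(960, 512))
```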

It has made me seriously consider importing an unlocked HTC Desire Z from Canada so I could have a keyboard phone on AT&T. I tried T-Mobile for 47 hours and missed text messages and several phone calls, even though I can see their antenna from my backyard.

Watching several apps, the amount of data transmitted is sometimes quite scary.

Google Groups Captcha 404

Thursday, December 8th, 2011

The other day I was reading a thread on Google Groups and wanted to add the user to my Google+ circles, as we work on a number of somewhat related projects. A search of his name came up with too many results to be helpful, so I figured I would try searching by his email address.

I mistyped the first captcha:

Properly typed the second captcha:

and received a 404 page:

It is completely repeatable and I've tested it numerous times. You can, of course, go back to the original page, click the …, and get a new captcha – but make sure you solve it on the first try! Note that the topic is also set to “” on the second captcha.

Now if only there were a place to report bugs on Google Groups. A fifteen-minute search of the sparse FAQ didn't turn up anything.

Google+ API Wishlist

Sunday, October 23rd, 2011

While I was a very early adopter of Google+, today I've basically disabled my Twitter account, and my Facebook account remains open only to manage a few advertising campaigns and applications. I've used Google+ as my primary social outlet since late June. Initially I started to write a scraper for Google+ to fix a few things I didn't like about it, but Google did mention that an API was coming. Google+'s API currently offers read-only access to your account, and it surely needs improvement.

While Games do appear to have access to these APIs, releasing them to the general public so that developers can create their own apps would be greatly appreciated. I understand the complexity of writing an API and getting it right the first time, so I'd like to put forward a list of items that would be helpful.

Post to Stream
  Notification list. Perhaps each entry in the post's circle list carries a
    flag that turns notification on for that circle or person. If Public is
    passed as a notification target, ignore it silently. Alternatively, a
    second list of notification targets.
  Content of post
  Attached media object(s): Picture URL, Gallery URL, Link URL, Video tag/URL.
    Currently Google+ only supports a single attached object, but why not
    plan for the future here. Options might include preview thumbnail/fullsize
    inserted into stream.
  Email people not yet using Google+, defaulting to false/no.
Get Circles
  Return a list of the circles the user currently has.
Get Members in Circles
  Return a list of the members in a circle. If no circle name is passed,
  return a list of circles with members. Pagination if too large?
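As a shape for the Post to Stream call above, a hypothetical client wrapper might look like the following. None of these endpoints or field names exist; they are invented purely to illustrate the wishlist, and the payload is just built locally rather than sent anywhere:

```python
class PlusClient(object):
    """Hypothetical wrapper for the wishlist 'Post to Stream' call.
    Fields and semantics are invented; Google never shipped this."""

    def __init__(self, token):
        self.token = token  # imaginary OAuth token

    def post_to_stream(self, content, circles=None, notify=None,
                       media=None, email_non_users=False):
        payload = {
            'content': content,
            'circles': circles or ['Public'],
            # Per the wishlist: silently ignore a notification
            # request aimed at Public.
            'notify': [c for c in (notify or []) if c != 'Public'],
            'media': media or [],        # list, to plan for the future
            'emailNonUsers': email_non_users,  # default false/no
        }
        return payload  # would be POSTed to the (imaginary) API

client = PlusClient('oauth-token-here')
post = client.post_to_stream('Hello, stream!',
                             circles=['Friends', 'Public'],
                             notify=['Friends', 'Public'])
print(post['notify'])
```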

What would be nice for the Google+ API

Add Member to Circle
  Add a member ID to a particular circle.
Delete Member from Circle
  Delete a member ID from a circle.
Add Circle
Delete Circle
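A toy in-memory model shows the shape of these four circle-management calls. Again, the real API never exposed any of this, and the member ID below is made up:

```python
class Circles(object):
    """Toy in-memory model of the circle-management wishlist calls."""

    def __init__(self):
        self._circles = {}  # circle name -> set of member IDs

    def add_circle(self, name):
        self._circles.setdefault(name, set())

    def delete_circle(self, name):
        self._circles.pop(name, None)

    def add_member(self, circle, member_id):
        self._circles[circle].add(member_id)

    def delete_member(self, circle, member_id):
        self._circles[circle].discard(member_id)

    def members(self, circle=None):
        # No circle name passed: return all circles with members.
        if circle is None:
            return {c: sorted(m) for c, m in self._circles.items()}
        return sorted(self._circles[circle])

circles = Circles()
circles.add_circle('Friends')
circles.add_member('Friends', '104560124403688998123')  # hypothetical ID
print(circles.members('Friends'))
```

The spam concern mentioned below is real: a server-side version of `add_member` would need rate limits and abuse detection that this sketch ignores.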

Personally, the ability to add members to circles would greatly simplify the manual process behind circle management, but I understand the obvious spam implications here.

With even the basic functionality listed above, and even if we couldn't attach objects, we could have our blogs post to Google+ or have our favorite desktop/webtop software post to Google+, making it one of the 'Big Three' rather than leaving the social media world the duopoly it currently is.

I would love to have the ability to post to Google+ from certain apps that I have running locally. I used to tweet IPv6 traffic-tracker data: weekly statistics on the percentage of email received over IPv6, IPv6 traffic volumes, and other such data. I also set up a small project that I thought was fun – replaying historic events synchronized to the actual event times so that people could follow along. At present, there is no easy way to do this. Knowing which application published to the stream would also be very helpful – allowing developers to customize the 'posted by' line, so that when someone sees a post, they know whether it was automated or entered through the web client.

As a hobbyist, I’d love to see a slightly expanded API.

Google+, Python, and mechanize

Sunday, July 3rd, 2011

Since Google+'s release, I've wanted access to an API. I'm told it's coming soon. I couldn't wait.

#!/usr/bin/env python

import mechanize

cj = mechanize.LWPCookieJar()

br = mechanize.Browser()
br.set_cookiejar(cj)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.810.0 Safari/535.1 cd34/0.9b')]

# Fetch the Google account login page and pick its first form.
# (Login URL from that era; it has likely changed since.)
br.open('https://www.google.com/accounts/ServiceLogin')
br.select_form(nr=0)

br.form.find_control("Email").readonly = False
br.form['Email'] = ''
br.form['Passwd'] = 'supersecretpasswordhere'
br.submit()

for l in br.links():
    print l

cj.save("cookies.txt")

Reverse Engineering Youtube Statistics Generation Algorithm

Saturday, November 27th, 2010

While surfing Youtube a while back, I noticed that you could view the statistics for a given video. While most of the videos I view are quite boring and have low viewcounts, I thought that might be the trigger – only popular videos have stats. However, while surfing Youtube today to see how they handled some statistics, I saw patterns emerge that tossed that theory out the window. Videos with even a few hundred views had statistics.

Since we can assume that Google has kept track of every view and statistic possible since Youtube was merged with their platform, even old videos have data back into late 2007, as evidenced by many different videos. Some mention 30 Nov 2007 as the earliest data collection date.

So we face a quandary. We have videos from 2005 through today, stats from late 2007 through today, and stats displayed on the video page that have been rolled out since mid-2010. Old videos that don't currently display stats are obviously still gathering them, but must carry a flag saying the old data hasn't been imported, as the page will only mention Honors for this Video. How do you approach the problem?

We know that the data is collected and applied in batches, and it appears that every video has statistics from a particular date forward. Recent videos all have full statistics, even with a few hundred views, no comments, and no favorites. The catalyst doesn't appear to be an interaction with a video; merely viewing one must signal the system to backfill its statistics. There is probably some weight given to popular videos, though those videos would have a lot more history. One must balance the time required to import a very popular video against importing the history of hundreds of less popular videos. One of the benefits of Bigtable – if architected properly – would be processing each video's history in one shot, setting a stats-processed flag, and moving on to the next video. One might surmise that Google knew to collect the view data but may not have thought about how the data would be used.

How do you choose videos to be processed? When processing the live views, you might decide to put a video into a queue for backfill processing. But on a very lightly viewed video, this might delay backfilling another video whose statistics would be more interesting or provocative. We can assume there is a fixed date in time after which a video doesn't require backfilling, which makes the backfill decision a little easier.

As the logs are processed, we might keep a list of the video_id, creation date, and number of daily views, and insert that data into a backfill queue for our backfill process. In the backfill process, we would look at the creation date, the number of daily views, and the number of mentions in the queue. To prioritize the items to process, we might look at the velocity of hits from one day to the next – triggering a job-queue entry for a video that is suddenly getting popular. We might also weight decisions by the views and by the creation date's delta from the fixed point in time where stat displays started. This would allow us to take a lightly viewed video created just before the fixed point and prioritize it in the backfill queue. Now we have a dual-priority system that tackles both problems at once, meeting in the middle. Each day, new entries are inserted into the queue, altering the priority of existing entries, which would allow the stats to be backfilled in a manner that appears very proactive.
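One way to sketch this dual-priority queue: score each video by its view velocity plus how close its creation date sits to the fixed point where stat displays started. Everything below is invented for illustration (the scoring weights, the fractional-year dates, and the sample videos), and the real system is obviously far more involved:

```python
import heapq

STATS_EPOCH = 2010.5  # fixed point when stat display rolled out (fractional year)

def priority(created, views_yesterday, views_today):
    """Lower value = backfill sooner. Combines day-over-day view
    velocity with closeness of the creation date to the fixed point."""
    velocity = views_today - views_yesterday
    closeness = 1.0 / (1.0 + abs(STATS_EPOCH - created))
    # The 100x weight on closeness is arbitrary; it lets a quiet video
    # created just before the epoch outrank a quiet old one.
    return -(velocity + 100.0 * closeness)

queue = []
for vid, created, yesterday, today in [
    ('old-quiet',   2006.0,   2,    3),
    ('pre-epoch',   2010.4,   5,    6),
    ('going-viral', 2009.0, 100, 5000),
]:
    heapq.heappush(queue, (priority(created, yesterday, today), vid))

print([vid for _, vid in sorted(queue)])
```

Re-pushing a video with a fresh score each day gives the effect described above: new entries reshuffle the priorities of existing ones.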

At some point, videos created prior to the fixed point in time that haven't been viewed could be added to a cleanup queue. Since they weren't viewed, generating statistics for them isn't as important; and if a video has been viewed, it is already in the queue. Since the queue can dispatch jobs to as many machines as Google wants, stats could be added to Youtube videos based on the load of their distributed cluster.

What do you think?

How would you backfill log data from an estimated 900 million videos serving 2 billion video views a week?
