Opinions on shortened URLs are a dime a dozen these days, but the basic facts are:
- They’re awfully convenient for passing around (and this was true even before Twitter came about)
- They are, by nature, short-lived (either the services or the URLs)
- You should never rely on their being around later on
So basically you have absolutely no excuse for not being able to handle them. I decided to mess around with the concept a few weeks back to see how simple I could make it all work, and came up with a couple of useful Python classes that I can share with the world:
Creating short URLs
The trouble with creating short URLs is that there are entirely too many shortening services, and far too many variations on APIs. In fact, nearly all of them suffer from “not invented here” syndrome and try to “enhance” their APIs with a lot of stuff that you basically don’t (ever) need, wrapping their results in JSON or XML.
Me, I refuse to put up with that kind of crap.
So I poked around a bit, found the simplest services to work against and created the following class, which will try all its known services in turn until it gives you a working URL:
    import httplib
    import urllib

    BITLY_AUTH = 'login=foo&apiKey=bar'

    class URLShortener:
        # host -> request prefix that takes the URL-encoded long URL
        services = {
            'api.bit.ly': "http://api.bit.ly/shorten?version=2.0.1&%s&format=text&longUrl=" % BITLY_AUTH,
            'api.tr.im': '/api/trim_simple?url=',
            'tinyurl.com': '/api-create.php?url=',
            'is.gd': '/api.php?longurl='
        }

        def query(self, url):
            # Try each service in turn and return the first plausible short URL
            for shortener in self.services.keys():
                c = httplib.HTTPConnection(shortener)
                c.request("GET", self.services[shortener] + urllib.quote(url))
                r = c.getresponse()
                shorturl = r.read().strip()
                # Replies are plain text: either a short URL or an error message
                if ("Error" not in shorturl) and ("http://" in shorturl):
                    return shorturl
            # Every service failed
            raise IOError
Yes, the error handling is naïve – any network exceptions and stuff ought to be caught upstream from this – but it works fine so far.
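For what it's worth, here is a rough Python 3 sketch of what slightly less naïve handling could look like: a failing service is skipped instead of aborting the whole loop. The `shorten` function and the `fetch(host, path)` callable are hypothetical names introduced for illustration (they are not part of the class above), and the set of exceptions caught is an assumption about what a real fetcher would raise.

```python
import http.client
from urllib.parse import quote


def shorten(url, services, fetch):
    """Return the first usable short URL, or raise IOError if all services fail.

    `services` maps a host name to the request prefix that takes the long URL.
    `fetch(host, path)` is a hypothetical callable that performs the GET and
    returns the response body as text.
    """
    for host, prefix in services.items():
        try:
            # safe="" percent-encodes every reserved character, including "/"
            body = fetch(host, prefix + quote(url, safe="")).strip()
        except (OSError, http.client.HTTPException):
            # Refused connection, DNS failure, timeout, malformed reply:
            # move on to the next service.
            continue
        if body.startswith("http://") and "Error" not in body:
            return body
    raise IOError("no shortening service returned a usable URL")
```

Passing the fetcher in as an argument is only a convenience for trying the loop without touching the network; a real one would wrap the same `HTTPConnection` calls as the class above.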
Expanding short URLs
This is the really fun bit, because it is not immediately obvious whether a short URL will actually be useful as is: there are plenty of times when you’ll be redirected to something else, and while fooling around with the Google Reader API (something I’ll eventually write about later), I found that this also applied (in spades) to Feedburner links and whatnot.
So I decided to build some smarts into the process and have it not only ping some known hosts twice, but also act as a link checker of sorts, learning which hosts were actually redirecting to other places:
    import httplib
    import urlparse

    class URLExpander:
        def __init__(self):
            # Kept on the instance (not the class) so pickling the object preserves them
            # known shortening services
            self.shorteners = ['tr.im','is.gd','tinyurl.com','bit.ly','snipurl.com','cli.gs',
                               'feedproxy.google.com','feeds.arstechnica.com']
            # hosts that need to be pinged twice
            self.twofers = [u'\u272Adf.ws']
            # learned hosts
            self.learned = []

        def resolve(self, url, components):
            """ Try to resolve a single URL """
            c = httplib.HTTPConnection(components.netloc)
            # Send the query string too, and never an empty path
            path = components.path or '/'
            if components.query:
                path += '?' + components.query
            c.request("GET", path)
            r = c.getresponse()
            l = r.getheader('Location')
            if l is None:
                return url # it might be impossible to resolve, so best leave it as is
            else:
                return l

        def query(self, url, recurse = True):
            """ Resolve a URL """
            components = urlparse.urlparse(url)
            # Check weird shortening services first
            if (components.netloc in self.twofers) and recurse:
                return self.query(self.resolve(url, components), False)
            # Then check known shortening services
            if components.netloc in self.shorteners:
                return self.resolve(url, components)
            # If we haven't seen this host before, ping it, just in case
            if components.netloc not in self.learned:
                ping = self.resolve(url, components)
                if ping != url:
                    self.shorteners.append(components.netloc)
                self.learned.append(components.netloc)
                return ping
            # The original URL was OK
            return url
This one’s a bit more convoluted but has turned out to be very useful indeed, and you can simply pickle the whole object to preserve its learned hosts.
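One caveat worth knowing: pickling an instance stores that instance's own attributes, so lists defined on the class body are not captured. A minimal Python 3 sketch that sidesteps the question is to pickle the two host lists explicitly instead of the whole object. The `dump_hosts` and `load_hosts` helpers are hypothetical names, not part of the class above.

```python
import pickle


def dump_hosts(shorteners, learned):
    """Serialise the known and learned host lists to bytes."""
    state = {"shorteners": list(shorteners), "learned": list(learned)}
    return pickle.dumps(state)


def load_hosts(blob):
    """Return (shorteners, learned) from bytes produced by dump_hosts()."""
    state = pickle.loads(blob)
    return state["shorteners"], state["learned"]


# Round trip through bytes; writing `blob` to a file works the same way.
blob = dump_hosts(["tr.im", "is.gd"], ["feeds.example.com"])
shorteners, learned = load_hosts(blob)
```

Storing plain lists also means the saved file does not depend on the class keeping its name or module, which can matter if the code gets reorganised later.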
