Short URLs are everywhere these days, and sometimes we just need to know where one leads, so I wrote this handy little function to find out. Redirection can be a tricky thing: there are 301 (“permanent”) and 302 (“temporary”) style status codes, and there can be multiple layers of redirection. I think the simplest approach is this: whenever the server returns a Location HTTP header whose value differs from the URL you requested, you can be pretty sure it’s a redirect. The function below uses the HTTP HEAD method to request only the headers, so as not to waste bandwidth, and calls itself recursively until it gets a non-redirecting result. As a safeguard against infinite recursion, it keeps a depth counter and gives up after more than 10 redirects.
import urlparse
import httplib

# Recursively follow redirects until there isn't a location header
def resolve_http_redirect(url, depth=0):
    if depth > 10:
        # depth is an int, so convert it before concatenating
        raise Exception("Redirected " + str(depth) + " times, giving up.")
    o = urlparse.urlparse(url, allow_fragments=True)
    conn = httplib.HTTPConnection(o.netloc)
    path = o.path
    if o.query:
        path += '?' + o.query
    # HEAD asks for the headers only, not the body
    conn.request("HEAD", path)
    res = conn.getresponse()
    headers = dict(res.getheaders())
    if 'location' in headers and headers['location'] != url:
        return resolve_http_redirect(headers['location'], depth + 1)
    else:
        return url
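The listing above targets Python 2 (urlparse and httplib). For Python 3, a rough port might look like the sketch below. A few things in it are my own assumptions rather than part of the original function: the max_depth parameter, the 10-second timeout, choosing HTTPSConnection for https URLs, and resolving relative Location values with urljoin.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def resolve_http_redirect(url, depth=0, max_depth=10):
    """Follow Location headers with HEAD requests and return the final URL."""
    if depth > max_depth:
        raise RuntimeError("Redirected %d times, giving up." % depth)
    parts = urlsplit(url)
    # Pick the connection class from the scheme (assumption: only http/https).
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        location = conn.getresponse().getheader("Location")
    finally:
        conn.close()
    if location:
        # Location may be relative, so resolve it against the current URL.
        target = urljoin(url, location)
        if target != url:
            return resolve_http_redirect(target, depth + 1, max_depth)
    return url
```

Calling resolve_http_redirect with a short URL should return the address it finally lands on, or raise RuntimeError if the chain is longer than max_depth.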
6 responses to “Resolving HTTP Redirects in Python”
That works!!
Does the trick, thx.
Just what I needed. However, for Python 2.4 I needed to refer to the urlparse components as [0], [1], etc.
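For anyone in the same situation, here is a small sketch of the index-based access. As far as I know, the named attributes (netloc, path, query) arrived in Python 2.5, so on 2.4 the result is a plain 6-tuple. The example is written for Python 3 (on Python 2 the function lives in the urlparse module), and the URL is just a placeholder.

```python
from urllib.parse import urlparse

url = "http://example.com/some/path?id=42"  # placeholder URL for illustration
parts = urlparse(url)

# The result behaves like a 6-tuple:
# (scheme, netloc, path, params, query, fragment)
netloc = parts[1]
path = parts[2]
query = parts[4]
if query:
    path += "?" + query
```

Here netloc ends up as the host part and path as the path with the query string appended, which is what the function passes to the HEAD request.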
Thanks!
Two years later, this code is still useful, thanks! 🙂
Just to err on the safe side, I’d wrap the conn.request call inside a try/except, in case some URL is mangled.
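A minimal sketch of that idea, in Python 3. The helper name head_headers, the timeout, and the choice to return None on failure are assumptions for illustration. It also wraps the connection constructor, since a mangled host or port can fail there before the request is ever sent.

```python
import http.client
from urllib.parse import urlsplit


def head_headers(url):
    """Return the HEAD response headers as a dict, or None on failure."""
    parts = urlsplit(url)
    conn = None
    try:
        # A mangled netloc (e.g. a non-numeric port) fails here with InvalidURL.
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
        conn.request("HEAD", parts.path or "/")
        return dict(conn.getresponse().getheaders())
    except (http.client.HTTPException, OSError):
        # HTTPException covers protocol errors (InvalidURL is a subclass);
        # OSError covers DNS lookups, refused connections, and timeouts.
        return None
    finally:
        if conn is not None:
            conn.close()
```

Whether to return None, re-raise, or just return the original URL unchanged depends on the caller, so treat the None here as one option among several.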
I agree, this is excellent code and thanks!
Nice, but some sites redirect you to error pages for unsupported browsers, so I suggest faking the user agent by adding a header like so:
conn.request("HEAD", path, headers={"User-Agent": "Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1; .NET CLR 1.1.4322)"})
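As a sketch of how that suggestion could be packaged, here is a small Python 3 helper. The name head_with_user_agent and the USER_AGENT constant are hypothetical; the header string is the one from the comment above, and any browser-like value could be substituted.

```python
import http.client

# Placeholder value; any browser-like string can be substituted.
USER_AGENT = "Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1; .NET CLR 1.1.4322)"


def head_with_user_agent(conn, path, user_agent=USER_AGENT):
    """Send a HEAD request on an open connection with a custom User-Agent."""
    conn.request("HEAD", path, headers={"User-Agent": user_agent})
    return conn.getresponse()
```

The conn argument is expected to be an http.client.HTTPConnection (or HTTPSConnection) that the caller has already created for the right host.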