Zac Witte

Resolving HTTP Redirects in Python

Short URLs are everywhere these days, and sometimes we just need to know where one leads, so I wrote this handy little function to find out. Redirection can be tricky: there are 301 (“permanent”) and 302 (“temporary”) status codes, and there can be multiple layers of redirection. I think the simplest approach is this: whenever the server returns a Location HTTP header whose value differs from the URL you requested, we can be pretty sure it’s a redirect. The function below uses the HTTP HEAD method to request only the headers, so as not to waste bandwidth, and calls itself recursively until it gets a non-redirecting result. As a safeguard against infinite recursion, it keeps a depth counter and gives up after more than 10 redirects.

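import urlparse
import httplib

# Recursively follow redirects until there isn't a location header
def resolve_http_redirect(url, depth=0):
    if depth > 10:
        # depth is an int, so convert it before concatenating
        raise Exception("Redirected " + str(depth) + " times, giving up.")
    o = urlparse.urlparse(url, allow_fragments=True)
    # Use an HTTPS connection when the URL scheme asks for one
    if o.scheme == 'https':
        conn = httplib.HTTPSConnection(o.netloc)
    else:
        conn = httplib.HTTPConnection(o.netloc)
    # An empty path is not a valid request target, so fall back to "/"
    path = o.path or '/'
    if o.query:
        path += '?' + o.query
    conn.request("HEAD", path)
    res = conn.getresponse()
    headers = dict(res.getheaders())
    conn.close()
    # The Location value may be relative, so resolve it against the current URL
    location = headers.get('location')
    if location:
        location = urlparse.urljoin(url, location)
    if location and location != url:
        return resolve_http_redirect(location, depth + 1)
    else:
        return url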
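
The listing above uses the Python 2 module names (urlparse, httplib). For anyone on Python 3, here is a rough sketch of the same idea, not a drop-in replacement. It assumes http.client and urllib.parse, uses a loop instead of recursion, resolves relative Location values with urljoin, and picks an HTTPS connection for https URLs. The names head_location, fetch_location and max_depth are my own additions for illustration; fetch_location exists only so the redirect logic can be tried without touching the network.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def head_location(url):
    """Send a HEAD request and return the Location header, or None."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        return conn.getresponse().getheader("Location")
    finally:
        conn.close()


def resolve_http_redirect(url, max_depth=10, fetch_location=head_location):
    """Follow Location headers until none is returned or the target repeats."""
    for _ in range(max_depth + 1):
        location = fetch_location(url)
        if not location:
            return url
        target = urljoin(url, location)  # the Location value may be relative
        if target == url:
            return url
        url = target
    raise RuntimeError("Redirected more than %d times, giving up." % max_depth)
```

Calling resolve_http_redirect with a short URL returns the final address, and passing a dictionary's get method as fetch_location is an easy way to exercise the redirect handling offline.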

May 8, 2010

admin

Tech

python

6 responses to “Resolving HTTP Redirects in Python”

Aaa says:

October 22, 2011 at 7:00 am

That works!!

etrain says:

December 6, 2011 at 12:41 pm

Does the trick, thx.

Ken says:

January 28, 2012 at 9:06 pm

Just what I needed. However, for Python 2.4 I needed to refer to the urlparse components as [0], [1], etc.

Thanks!

Zen says:

April 19, 2012 at 11:22 am

Two years later, this code is still useful, thanks! 🙂
Just to err on the safe side, I’d wrap the conn.request call inside a try/except, in case some URL is mangled.
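
A minimal sketch of what that wrapping might look like, written for Python 3 (http.client and urllib.parse) rather than the Python 2 modules in the post. The helper name safe_head and the choice to return None on failure are assumptions made for illustration only; the caller could just as well log the error or re-raise it.

```python
import http.client
from urllib.parse import urlsplit


def safe_head(url, timeout=10):
    """Return the HEAD response headers as a dict, or None if the request fails."""
    conn = None
    try:
        parts = urlsplit(url)  # may raise ValueError on a mangled URL
        if parts.scheme == "https":
            conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
        else:
            conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        res = conn.getresponse()
        # Lower-case the header names so lookups do not depend on server casing
        return {name.lower(): value for name, value in res.getheaders()}
    except (http.client.HTTPException, OSError, ValueError):
        # Covers invalid URLs, DNS failures, refused connections and timeouts
        return None
    finally:
        if conn is not None:
            conn.close()
```

Creating the connection inside the try block matters here, because a bad port in the URL is rejected when the connection object is built, before any request is sent.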

TheChrisONeil says:

April 27, 2012 at 9:51 am

I agree, this is excellent code and thanks!

some guy says:

June 8, 2012 at 9:31 pm

Nice, but some sites redirect you to error pages for unsupported browsers, so I suggest faking the user agent by adding a header like so:

conn.request("HEAD", path, headers={"User-Agent": "Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1; .NET CLR 1.1.4322)"})
