What are the differences between the urllib, urllib2, and requests modules?

In Python, what are the differences between the urllib, urllib2, and requests modules? Why are there three? They seem to do the same thing...


urllib and urllib2 are both Python modules that handle URL requests, but they offer different functionality.

1) urllib2 can accept a Request object to set the headers for a URL request; urllib accepts only a URL.

2) urllib provides the urlencode method, which is used to generate GET query strings; urllib2 doesn't have such a function. This is one of the reasons why urllib is often used alongside urllib2.
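
For instance, here is a minimal sketch of the two modules working together (the URL, query parameters, and User-Agent string are made up for illustration):

import urllib
import urllib2

# urllib supplies urlencode(); urllib2 supplies Request and urlopen()
query = urllib.urlencode({'q': 'python'})              # 'q=python'
request = urllib2.Request('http://www.example.com/search?' + query)
request.add_header('User-Agent', 'my-fetcher')         # headers require a Request object
response = urllib2.urlopen(request)
html = response.read()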

Requests - Requests is a simple, easy-to-use HTTP library written in Python.

1) Python Requests encodes the parameters automatically, so you just pass them as simple arguments, unlike with urllib, where you need to use the urllib.urlencode() method to encode the parameters before passing them.

2) It automatically decodes the response into Unicode.

3) Requests also has far more convenient error handling. If your authentication failed, urllib2 would raise a urllib2.URLError, while Requests returns a normal response object, as expected. All you have to do to see whether the request was successful is check the boolean response.ok.
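
For instance, a minimal sketch of that error handling (the URL and credentials are made up for illustration):

import requests

# parameters go in as a plain dict; requests encodes them for you
resp = requests.get('http://www.example.com/user',
                    params={'id': 42},
                    auth=('user', 'wrong-password'))

print(resp.ok)           # False on a 401; no exception is raised
print(resp.status_code)  # e.g. 401
print(resp.text)         # body already decoded to Unicode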

For an example, see https://dancallahan.info/journal/python-requests/


I know it's been said already, but I'd highly recommend the Requests Python package: http://docs.python-requests.org/en/latest/index.html

If you've used languages other than Python, you're probably thinking urllib and urllib2 are easy to use, don't require much code, and are highly capable; that's how I used to think. But the Requests package is so unbelievably useful and concise that everyone should be using it.

First, it supports a fully RESTful API and is as easy as:

import requests
...

resp = requests.get('http://www.mywebsite.com/user')
resp = requests.post('http://www.mywebsite.com/user')
resp = requests.put('http://www.mywebsite.com/user/put')
resp = requests.delete('http://www.mywebsite.com/user/delete')

Regardless of whether you use GET or POST, you never have to encode parameters again; it simply takes a dictionary as an argument and is good to go.

userdata = {"firstname": "John", "lastname": "Doe", "password": "jdoe123"}
resp = requests.post('http://www.mywebsite.com/user', data=userdata)

Plus it even has a built-in JSON decoder (again, I know json.loads() isn't a lot more to write, but this sure is convenient):

resp.json()

Or if your response data is just text, use:

resp.text

This is just the tip of the iceberg. Here is the list of features from the requests site:

  • International Domains and URLs
  • Keep-Alive & Connection Pooling
  • Sessions with Cookie Persistence (see the sketch after this list)
  • Browser-style SSL Verification
  • Basic/Digest Authentication
  • Elegant Key/Value Cookies
  • Automatic Decompression
  • Unicode Response Bodies
  • Multipart File Uploads
  • Connection Timeouts
  • .netrc support
  • Python 2.6-3.4
  • Thread-safe
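
To illustrate a couple of those features, here is a minimal sketch of a session with cookie persistence and a connection timeout (the URLs and credentials are made up for illustration):

import requests

# a Session reuses pooled connections and carries cookies across requests
s = requests.Session()
s.auth = ('user', 'pass')                # basic auth applied to every request on this session

s.get('http://www.mywebsite.com/login')  # cookies set here are stored on the session
resp = s.get('http://www.mywebsite.com/user', timeout=5)  # give up if no response in 5 seconds
print(resp.ok)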

Just to add to the existing answers, I don't see anyone mentioning that Python requests is not part of the standard library. If you are OK with adding dependencies, then requests is fine. However, if you are trying to avoid adding dependencies, urllib is a standard library module that is already available to you.


One considerable difference is porting from Python 2 to Python 3. urllib2 does not exist for Python 3; its functionality was ported into the urllib package (mainly urllib.request and urllib.error). So if you are using urllib2 heavily and want to migrate to Python 3 in the future, consider using urllib. However, the 2to3 tool will automatically do most of the work for you.
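
As a rough illustration, here is the same request written for each version (the URL is made up; the two halves are alternatives, not one script):

# Python 2
import urllib
import urllib2

data = urllib.urlencode({'foo': 'bar'})
response = urllib2.urlopen('http://www.example.com', data)

# Python 3: urlopen() moved to urllib.request, urlencode() to urllib.parse
import urllib.request
import urllib.parse

data = urllib.parse.urlencode({'foo': 'bar'}).encode('ascii')  # urlopen() needs bytes in Python 3
response = urllib.request.urlopen('http://www.example.com', data)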


You should generally use urllib2, since it makes things a bit easier at times by accepting Request objects; it will also raise a URLError on protocol errors. With Google App Engine, though, you can't use either. You have to use the URL Fetch API that Google provides in its sandboxed Python environment, sketched below.
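
For reference, a minimal sketch using the URL Fetch API's urlfetch module (the URL is made up for illustration):

from google.appengine.api import urlfetch

# fetch() stands in for urlopen() inside the App Engine sandbox
result = urlfetch.fetch('http://www.example.com')
if result.status_code == 200:
    body = result.content  # raw response body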


urllib2 provides some extra functionality; namely, the urlopen() function can allow you to specify headers (normally you'd have had to use httplib in the past, which is far more verbose). More importantly, urllib2 provides the Request class, which allows for a more declarative approach to making a request:

import urllib
from urllib2 import Request, urlopen

r = Request(url='http://www.mysite.com')
r.add_header('User-Agent', 'awesome fetcher')
r.add_data(urllib.urlencode({'foo': 'bar'}))  # attaching data makes this a POST request
response = urlopen(r)

Note that urlencode() is only in urllib, not urllib2.

There are also handlers for implementing more advanced URL support in urllib2, as sketched below. The short answer is: unless you're working with legacy code, you probably want to use the URL opener from urllib2, but you still need to import urllib for some of the utility functions.
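
For example, here is a minimal sketch of a custom opener built from handlers, using HTTPCookieProcessor so cookies persist across requests (the URL is made up for illustration):

import urllib2
import cookielib

# build_opener() chains handlers into an opener; the jar stores cookies between calls
jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
response = opener.open('http://www.example.com')  # cookies set here land in jar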

Bonus answer: with Google App Engine, you can use any of httplib, urllib, or urllib2, but all of them are just wrappers for Google's URL Fetch API. That is, you are still subject to the same limitations, such as ports, protocols, and the length of the response allowed. You can use the core of the libraries as you would expect for retrieving HTTP URLs, though.




