Python Requests - managing cookies



Answers

You should be reusing the whole session object, not the associated cookiejar. Use self.s for all requests you make.

If your requests still fail when you reuse the session, they are failing for a different reason, not because the cookies are not being sent back.
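
To see that a session sends its stored cookies back on its own, here is a minimal offline sketch. The cookie name, value and example.com URL are made up for illustration, and the cookie is set by hand to stand in for what the server would set at login. The request is only prepared, never sent:

```python
import requests

session = requests.Session()

# Stand-in for a cookie the server would set on login
# (name, value and domain are hypothetical).
session.cookies.set("sessionid", "abc123", domain="example.com", path="/")

# Build the request through the session without sending it,
# so we can inspect the headers it would carry.
request = requests.Request("GET", "http://example.com/protected-page")
prepared = session.prepare_request(request)

cookie_header = prepared.headers.get("Cookie")
print(cookie_header)
```

No cookies= argument is passed anywhere; the session's own jar supplies the header.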

Note that if you need to use auth=('username', 'password') then the authentication is HTTPAuth-based, not cookie-based. You need to pass in the same authentication for all calls. The requests session can do that for you too:

s = requests.Session()
s.auth = ('username', 'password')
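
In current versions of requests, Session() takes no arguments, so the credentials are set as an attribute instead. A small sketch, assuming HTTP Basic auth and a placeholder URL, that checks the Authorization header is attached to a request made through the session (again prepared only, not sent):

```python
import requests

session = requests.Session()
session.auth = ("username", "password")  # placeholder credentials

# Hypothetical URL; the request is prepared but never sent.
request = requests.Request("GET", "http://example.com/protected-page")
prepared = session.prepare_request(request)

# A (user, password) tuple is treated as HTTP Basic auth.
auth_header = prepared.headers["Authorization"]
print(auth_header)
```

Every request made via this session carries the same header, so you do not have to repeat auth= on each call.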

If, however, the login page is a form with a username and password field, you'll need to call the form target instead. Check if the form is POST or GET, and check the fieldnames:

s.post(loginTarget, data={usernamefield: username, passwordfield: password, otherfield: othervalue})

and not use HTTP authentication at all.
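
As a rough sketch of what such a form login sends, assuming a POST form with fields named username and password (the field names, values and URL here are invented; read the real ones from the form's HTML):

```python
import requests

session = requests.Session()

login_target = "http://example.com/login"  # hypothetical form action URL
form_fields = {
    "username": "alice",   # hypothetical field name and value
    "password": "s3cret",  # hypothetical field name and value
}

# Prepare the POST without sending it to inspect the form body.
request = requests.Request("POST", login_target, data=form_fields)
prepared = session.prepare_request(request)

content_type = prepared.headers["Content-Type"]
body = prepared.body
print(content_type)
print(body)

# To actually log in you would call:
#   session.post(login_target, data=form_fields)
# and then keep using the same session for later pages.
```

Any cookies the server sets in response to the real POST end up in the session, which is why the later requests must go through the same object.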

Question

I'm trying to automatically fetch some content from a site using requests (and bs4).

I have a script that gets a cookie:

def getCookies(self):
    username = 'username'
    password = 'password'
    URL = 'logonURL'
    r = requests.get(URL, auth=('username', 'password'))
    cookies = r.cookies

A dump of the cookies looks like:

<<class 'requests.cookies.RequestsCookieJar'>[<Cookie ASP.NET_SessionId=yqokjr55ezarqbijyrwnov45 for URL.com/>, <Cookie BIGipServerPE_Journals.lww.com_80=1440336906.20480.0000 for URL.com/>, <Cookie JournalsLockCookie=id=a5720750-3f20-4207-a500-93ae4389213c&ip=IP address for URL.com/>]>

But when I pass the cookie object to the next URL:

soup = Soup(s.get(URL, cookies=cookies).content)

it's not working: dumping the soup shows that I'm not giving the web server my credentials properly.

I tried running a requests session:

def getCookies(self):
    self.s = requests.session()
    username = 'username'
    password = 'password'
    URL = 'logURL'
    r = self.s.get(URL, auth=('username', 'password'))

and I get the same result: no joy.

I looked at the headers with the Live HTTP Headers add-on in Firefox when I visit the second page, and see a very different form:

Cookie: WT_FPC=id=264b0aa85e0247eb4f11355304127862:lv=1355317068013:ss=1355314918680; UserInfo=Username=username; BIGipServerPE_Journals.lww.com_80=1423559690.20480.0000; PlatformAuthCookie=true; Institution=ReferrerUrl=http://logonURL.com/?wa=wsignin1.0&wtrealm=urn:adis&wctx=http://URL.com/_layouts/Authenticate.aspx?Source=%252fpecnews%252ftoc%252f2012%252f06440&token=method|ExpireAbsolute; counterSessionGuidId=6e2bd57f-b6da-4dd4-bcb0-742428e08b5e; MyListsRefresh=12/13/2012 12:59:04 AM; ASP.NET_SessionId=40a04p45zppozc45wbadah45; JournalsLockCookie=id=85d1f38f-dcbb-476a-bc2e-92f7ac1ae493&ip=10.204.217.84; FedAuth=77u/PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz48U2VjdXJpdHlDb250ZXh0VG9rZW4gcDE6SWQ9Il9mMGU5N2M3Zi1jNzQ5LTQ4ZjktYTUxNS1mODNlYjJiNGNlYzUtNEU1MDQzOEY0RTk5QURCNDFBQTA0Mjc0RDE5QzREMEEiIHhtbG5zOnAxPSJodHRwOi8vZG9jcy5vYXNpcy1vcGVuLm9yZy93c3MvMjAwNC8wMS9vYXNpcy0yMDA0MDEtd3NzLXdzc2VjdXJpdHktdXRpbGl0eS0xLjAueHNkIiB4bWxucz0iaHR0cDovL2RvY3Mub2FzaXMtb3Blbi5vcmcvd3Mtc3gvd3Mtc2VjdXJlY29udmVyc2F0aW9uLzIwMDUxMiI+PElkZW50aWZpZXI+dXJuOnV1aWQ6ZjJmNGY5MGItMmE4Yy00OTdlLTkwNzktY2EwYjM3MTBkN2I1PC9JZGVudGlmaWVyPjxJbnN0YW5jZT51cm46dXVpZDo2NzMxN2U5Ny1lMWQ3LTQ2YzUtOTg2OC05ZGJhYjA3NDkzOWY8L0luc3RhbmNlPjwvU2VjdXJpdHlDb250ZXh0VG9rZW4+

I have redacted the username, password, and URLs from the question for obvious reasons.

Am I missing something obvious? Is there a different or proper way to capture the cookie? The method I'm currently using is not working.

EDIT:

This is a standalone version of the session-based code:

s = requests.session()
username = 'username'
password = 'password'
URL = 'logonURL.aspx'
r = s.get(URL, auth=('username', 'password'))
URL = r"URL.aspx"
soup = Soup(s.get(URL).content)

Reading a dump of the soup, I can see in the HTML that it's telling me I don't have access. This string only appears in the browser when you're not logged in.




Using urllib instead of wget for session authentication

You should use = instead of : in your query:

s.post(loginUrl, data="name={}&password={}".format(username,password))

Also, you could (and should, following requests best practice) pass a dict to the data argument:

s.post(
    loginUrl,
    data={
        'name': username,
        'password': password,
    }
) 
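
One reason to prefer the dict: requests form-encodes it for you, while a hand-formatted string is sent as-is. The sketch below uses only the standard library's urlencode to show the difference, with a made-up password that contains reserved characters:

```python
from urllib.parse import urlencode, parse_qs

username = "alice"      # hypothetical value
password = "p&ss=word"  # hypothetical value containing '&' and '='

# Hand-built string: the '&' and '=' inside the password are taken
# as field separators, so the server sees a truncated password.
naive_body = "name={}&password={}".format(username, password)

# urlencode percent-escapes reserved characters, which is the kind
# of encoding applied when a dict is passed as data=.
safe_body = urlencode({"name": username, "password": password})

naive_fields = parse_qs(naive_body)
safe_fields = parse_qs(safe_body)
print(naive_fields)
print(safe_fields)
```

With the hand-built string the server would receive a password of just p plus a stray ss field; with the encoded body the password round-trips intact.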




