pythonpyramidpost-mortem

TL;DR: the standard Cookie library cannot handle the malformed cookie persisted by the 3rd party javascript library due to this bug, which fails our authentication system.

Recently, the SurveyMonkey Contribute, SMC thereafter, had an outage: a group of users could not login or take survey in our site. They were bounced back and forth between the home page and login page until the browser was tired of this game and threw the towel.

Before we jump to the technical details, let me explain how the authentication works in SMC:

• We use pyramid’s session to track the authentication: the session data is persisted to the redis, and the session cookie is set to the browser after the user successfully logins.

• As a SurveyMonkey product, we also honor the auth cookie issued from *.surveymonkey.com in the login page, we will automatically log the user in and create the session if she can be authenticated via the auth cookie.

It looked like the authentication by auth cookie worked as expected, but the session cookie was somehow corrupted. When the user was redirected from the login page to the home page, the home page failed to authenticate the user with the session cookie, thus it redirected the user to the login page again, — which incurred the redirect loop.

Our first instinct was the cookie was messed up in the client side as the issue only reproduced to specific users, it simply did not reproduce in my machine. But my colleague, Nate managed to reproduce the issue in the Chrome’s incognito mode and also Firefox. No.

Our second guess was the redis operation issue. No, we verified that the session data were correctly persisted to the redis.

We need to dig deeper.

## Debug in a sandbox

Luckily (or unluckily?), Nate also encountered this issue in the testing environment, so we could debug this issue in a sandbox with the liberty to poke around.

First we replicated virtualenv in the testing environment, then we started a paster instance with the same app.ini but bound to an unused port:

/tmp/debugging/bin/pserve /tmp/app.ini --server-name=paste httpport=8888

Thanks to the Chrome Developer Tools, we can replay the request in curl: click the Network tab in Chrome Developer Tools, and right click the request to replay, and choose Copy as cURL from the context menu, see below:

And in another terminal, run the curl command with the overridden endpoint and host header to hit the paster instance in our sandbox:

curl localhost:8888 -H 'Host: contribute.example.com' [... remaining options]

With the verbose DEBUG logging enabled, Nate pointed out that during the redirects, only the setter of the session was invoked, the getter was never called. How this could be possible?

After setting multiple breakpoints and poking around, the following snippet caught our attention:

        if self.use_cookies:
if secret:
try:
else:
self.cookie = Cookie.SimpleCookie(input=cookieheader)

This piece of code is how beaker loads the session key by parsing the cookie header, and we found that the Cookie.CookieError were always raised during the redirects! Since SignedCookie is a thin wrapper of Cookie.BaseCookie, we can reproduce the issue as:

>>> import Cookie
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/Cookie.py", line 579, in __init__
self.__ParseString(rawdata)
File "/usr/lib/python2.7/Cookie.py", line 665, in __ParseString
self.__set(K, rval, cval)
File "/usr/lib/python2.7/Cookie.py", line 585, in __set
M.set(key, real_value, coded_value)
File "/usr/lib/python2.7/Cookie.py", line 460, in set
raise CookieError("Illegal key value: %s" % key)
Cookie.CookieError: Illegal key value: proxy:srcTag:yj

Eureka! The CookieError was raised due to the malformed key proxy:srcTag:yj when beaker tries to load the session from the cookie; then the cookie was discarded, which effectively ignored the session.

But why the auth cookie is recognized in the login page, should the CookieError also be raised?

Further literature research showed that the pyramid reinvented its own wheel and did NOT depend on the standard Cookie library.

The Google search shows that the cookie might be set by BrightTag, a tracking service provider. It is later confirmed by searching “proxy:srcTag:yj” in the source code.

The cookie name is specified in the RFC6265 as a token, which is defined in the RFC2616 as:

   token          = 1*<any CHAR except CTLs or separators>
separators     = "(" | ")" | "<" | ">" | "@"
| "," | ";" | ":" | "\" | <">
| "/" | "[" | "]" | "?" | "="
| "{" | "}" | SP | HT

Though the constraints are not enforced by many browsers according to the Wikipedia. This issue is also acknowledged by the core python developers, and fixed in python 2.7.10. Later, my colleague Junkang pointed out that the bug is also fixed in beaker 1.8.0.