Python urllib2 and TLS
I was thinking about dropping support for TLSv1.0 in this webserver. All the major browser vendors have announced that they are dropping it from their browsers. And you’d think that since TLSv1.2 has been around for a decade, even very old clients ought to be able to negotiate a TLSv1.2 connection.
But when I checked, I was surprised to find that this webserver receives a ton of TLSv1 connections… including from the application that powers Planet Musings. Yikes!
The latter is built around the Universal Feed Parser, which uses the standard Python urllib2 to negotiate the connection. And therein lay the problem …
At least in its default configuration, urllib2 won’t negotiate anything higher than a TLSv1.0 connection. And, sure enough, that’s a problem:
ERROR:planet.runner:Error processing http://excursionset.com/blog?format=RSS
ERROR:planet.runner:URLError: <urlopen error [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:590)>
...
ERROR:planet.runner:Error processing https://www.scottaaronson.com/blog/?feed=atom
ERROR:planet.runner:URLError: <urlopen error [Errno 54] Connection reset by peer>
...
ERROR:planet.runner:Error processing https://www.science20.com/quantum_diaries_survivor/feed
ERROR:planet.runner:URLError: <urlopen error EOF occurred in violation of protocol (_ssl.c:590)>
Even if I’m still supporting TLSv1.0, others have already dropped support for it.
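Before blaming the library, it can help to see what the interpreter's own ssl module is able to negotiate. The following is a minimal sketch, not part of Venus or feedparser; the function name tls_capabilities is made up for illustration, and it is written for Python 3, although the same ssl attributes exist in recent 2.7 releases.

```python
import ssl


def tls_capabilities():
    """Summarize what the local ssl module / OpenSSL build reports."""
    return {
        # Version string of the OpenSSL library Python was linked against.
        "openssl": ssl.OPENSSL_VERSION,
        # The HAS_TLSv1_x flags only exist on newer Pythons,
        # so fall back to None (meaning "unknown") when they are missing.
        "tls1_2": getattr(ssl, "HAS_TLSv1_2", None),
        "tls1_3": getattr(ssl, "HAS_TLSv1_3", None),
    }


for name, value in tls_capabilities().items():
    print(name, value)
```

If the OpenSSL reported here is very old, no amount of urllib2 configuration will get you past TLSv1.0.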
Now, you might find it strange that urllib2 defaults to a TLSv1.0 connection, when it’s certainly capable of negotiating something more secure (whatever OpenSSL supports). But, prior to Python 2.7.9, urllib2 didn’t even check the server’s SSL certificate. Any encryption was bogus (wide open to a MiTM attack). So why bother negotiating a more secure connection?
Switching from the system Python to Python 2.7.15 (installed by Fink) yielded a slew of
ERROR:planet.runner:URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:726)>
errors. Apparently, no root certificate file was getting loaded.
The solution to both of these problems turned out to be:
--- a/feedparser/http.py
+++ b/feedparser/http.py
@@ -5,13 +5,15 @@ import gzip
 import re
 import struct
 import zlib
+import ssl
+import certifi
 
 try:
     import urllib.parse
     import urllib.request
 except ImportError:
     from urllib import splithost, splittype, splituser
-    from urllib2 import build_opener, HTTPDigestAuthHandler, HTTPRedirectHandler, HTTPDefaultErrorHandler, Request
+    from urllib2 import build_opener, HTTPSHandler, HTTPDigestAuthHandler, HTTPRedirectHandler, HTTPDefaultErrorHandler, Request
     from urlparse import urlparse
 
     class urllib(object):
@@ -170,7 +172,9 @@ def get(url, etag=None, modified=None, agent=None, referrer=None, handlers=None,
 
     # try to open with urllib2 (to use optional headers)
     request = _build_urllib2_request(url, agent, ACCEPT_HEADER, etag, modified, referrer, auth, request_headers)
-    opener = urllib.request.build_opener(*tuple(handlers + [_FeedURLHandler()]))
+    context = ssl.SSLContext(ssl.PROTOCOL_TLS)
+    context.load_verify_locations(cafile=certifi.where())
+    opener = urllib.request.build_opener(*tuple(handlers + [HTTPSHandler(context=context)] + [_FeedURLHandler()]))
     opener.addheaders = [] # RMK - must clear so we only send our custom User-Agent
     f = opener.open(request)
     data = f.read()
Actually, the two certifi lines (the import and the load_verify_locations() call, shown in red in the original post) aren’t strictly necessary. As long as you set an ssl.SSLContext(), a suitable set of root certificates gets loaded. But, honestly, I don’t trust the internals of urllib2 to do the right thing anymore, so I want to make sure that a well-curated set of root certificates is used.
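For anyone making the same change on Python 3, where urllib2 has become urllib.request, the idea can be sketched as below. This is an illustration under stated assumptions rather than the patch above: build_verifying_opener and CA_BUNDLE are hypothetical names, certifi is treated as optional, and ssl.create_default_context() is swapped in for a bare ssl.SSLContext() because it explicitly turns on certificate and hostname verification.

```python
import ssl
import urllib.request


def build_verifying_opener(cafile=None):
    """Return (opener, context); the opener's HTTPS handler verifies certificates."""
    # create_default_context() enables CERT_REQUIRED and hostname checking.
    # cafile=None means "use the system's default trust store".
    context = ssl.create_default_context(cafile=cafile)
    # Refuse anything older than TLSv1.2 where the ssl module supports
    # setting a floor (the attribute is missing on older builds).
    if hasattr(context, "minimum_version"):
        context.minimum_version = ssl.TLSVersion.TLSv1_2
    handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(handler), context


# Assumption: prefer certifi's curated bundle when the package is installed.
try:
    import certifi
    CA_BUNDLE = certifi.where()
except ImportError:
    CA_BUNDLE = None

opener, context = build_verifying_opener(CA_BUNDLE)
```

The returned opener is used like any other, e.g. opener.open(request), so it drops into code that already calls build_opener().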
With these changes, Venus negotiates a TLSv1.3 connection. Yay!
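If you want to confirm which protocol version a given server actually ends up with, a small check along these lines should do. It is a sketch in Python 3 using only the standard library; negotiated_tls_version is a hypothetical helper, not something Venus or feedparser provides, and the result depends on both the local OpenSSL and the remote server.

```python
import socket
import ssl


def negotiated_tls_version(host, port=443, timeout=10.0):
    """Open a verified TLS connection and return the protocol name as a string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # server_hostname enables SNI and hostname verification.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Calling negotiated_tls_version() with the hostname of one of the failing feeds returns a string such as "TLSv1.2" or "TLSv1.3", or raises an ssl.SSLError or OSError if the handshake or connection fails.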
Now, if only everyone else would update their Python scripts …
Update:
This article goes some of the way towards explaining the brokenness of Python’s TLS implementation on MacOSX. But only some of the way …
Update 2:
Another offender turned out to be the very application (MarsEdit 3) that I used to prepare this post. Upgrading to MarsEdit 4 was a bit of a bother. Apple’s App-sandboxing prevented my Markdown+itex2MML text filter from working. One is no longer allowed to use IPC::Open2 to pipe text through the commandline itex2MML. So I had to create a Perl Extension Module for itex2MML. Now there’s a MathML::itex2MML module on CPAN to go along with the Rubygem.