Skip to the Main Content

Note:These pages make extensive use of the latest XHTML and CSS Standards. They ought to look great in any standards-compliant modern browser. Unfortunately, they will probably look horrible in older browsers, like Netscape 4.x and IE 4.x. Moreover, many posts use MathML, which is, currently only supported in Mozilla. My best suggestion (and you will thank me when surfing an ever-increasing number of sites on the web which have been crafted to use the new standards) is to upgrade to the latest version of your browser. If that's not possible, consider moving to the Standards-compliant and open-source Mozilla browser.

December 27, 2018

Python urllib2 and TLS

I was thinking about dropping support for TLSv1.0 in this webserver. All the major browser vendors have announced that they are dropping it from their browsers. And you’d think that since TLSv1.2 has been around for a decade, even very old clients ought to be able to negotiate a TLSv1.2 connection.

But, when I checked, you can imagine my surprise that this webserver receives a ton of TLSv1 connections… including from the application that powers Planet Musings. Yikes!

The latter is built around the Universal Feed Parser which uses the standard Python urrlib2 to negotiate the connection. And therein lay the problem …

At least in its default configuration, urllib2 won’t negotiate anything higher than a TLSv1.0 connection. And, sure enough, that’s a problem:

ERROR:planet.runner:Error processing http://excursionset.com/blog?format=RSS
ERROR:planet.runner:URLError: <urlopen error [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:590)>
...
ERROR:planet.runner:Error processing https://www.scottaaronson.com/blog/?feed=atom
ERROR:planet.runner:URLError: <urlopen error [Errno 54] Connection reset by peer>
...
ERROR:planet.runner:Error processing https://www.science20.com/quantum_diaries_survivor/feed
ERROR:planet.runner:URLError: <urlopen error EOF occurred in violation of protocol (_ssl.c:590)>

Even if I’m still supporting TLSv1.0, others have already dropped support for it.

Now, you might find it strange that urllib2 defaults to a TLSv1.0 connection, when it’s certainly capable of negotiating something more secure (whatever OpenSSL supports). But, prior to Python 2.7.9, urllib2 didn’t even check the server’s SSL certificate. Any encryption was bogus (wide open to a MiTM attack). So why bother negotiating a more secure connection?

Switching from the system Python to Python 2.7.15 (installed by Fink) yielded a slew of

ERROR:planet.runner:URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:726)>

errors. Apparently, no root certificate file was getting loaded.

The solution to both of these problems turned out to be:

--- a/feedparser/http.py
+++ b/feedparser/http.py
@@ -5,13 +5,15 @@ import gzip
 import re
 import struct
 import zlib
+import ssl
+import certifi

 try:
     import urllib.parse
     import urllib.request
 except ImportError:
     from urllib import splithost, splittype, splituser
-    from urllib2 import build_opener, HTTPDigestAuthHandler, HTTPRedirectHandler, HTTPDefaultErrorHandler, Request
+    from urllib2 import build_opener, HTTPSHandler, HTTPDigestAuthHandler, HTTPRedirectHandler, HTTPDefaultErrorHandler, Request
     from urlparse import urlparse

     class urllib(object):
@@ -170,7 +172,9 @@ def get(url, etag=None, modified=None, agent=None, referrer=None, handlers=None,

     # try to open with urllib2 (to use optional headers)
     request = _build_urllib2_request(url, agent, ACCEPT_HEADER, etag, modified, referrer, auth, request_headers)
-    opener = urllib.request.build_opener(*tuple(handlers + [_FeedURLHandler()]))
+    context = ssl.SSLContext(ssl.PROTOCOL_TLS)
+    context.load_verify_locations(cafile=certifi.where())
+    opener = urllib.request.build_opener(*tuple(handlers + [HTTPSHandler(context=context)] + [_FeedURLHandler()]))
     opener.addheaders = [] # RMK - must clear so we only send our custom User-Agent
     f = opener.open(request)
     data = f.read()

Actually, the lines in red aren’t strictly necessary. As long as you set a ssl.SSLContext(), a suitable set of root certificates gets loaded. But, honestly, I don’t trust the internals of urllib2 to do the right thing anymore, so I want to make sure that a well-curated set of root certificates is used.

With these changes, Venus negotiates a TLSv1.3 connection. Yay!

Now, if only everyone else would update their Python scripts …

Update:

This article goes some of the way towards explaining the brokenness of Python’s TLS implementation on MacOSX. But only some of the way …

Update 2:

Another offender turned out to be the very application (MarsEdit 3) that I used to prepare this post. Upgrading to MarsEdit 4 was a bit of a bother. Apple’s App-sandboxing prevented my Markdown+itex2MML text filter from working. One is no longer allowed to use IPC::Open2 to pipe text through the commandline itex2MML. So I had to create a Perl Extension Module for itex2MML. Now there’s a MathML::itex2MML module on CPAN to go along with the Rubygem.
Posted by distler at December 27, 2018 11:28 AM

TrackBack URL for this Entry:   https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/3079

1 Comment & 0 Trackbacks

Re: Python urllib2 and TLS

very nice article i was searching on web for such a grate article, keep it up bro and thanks for sharing.
for more click here to find you dream job dream jobs

Posted by: rocky on January 14, 2019 3:02 AM | Permalink | Reply to this

Post a New Comment