Safety Discovers ReDoS Vulnerabilities in Top Python Packages

December 21, 2022

This is the first of a series of posts disclosing novel security vulnerabilities found by Safety's Cybersecurity Intelligence Team, which maintains Safety's proprietary Python vulnerability database.

In this post, we cover ongoing research by our Cybersecurity team on an often underestimated type of vulnerability: the Regular Expression Denial of Service (ReDoS) attack.

Our team has found ReDoS attack vectors in some of the most popular and widely used Python packages.

How Does a Regular Expression Denial of Service (ReDoS) Vulnerability Work?

This attack vector is most often a concern when a web interface, a command line, or another interface accepts user input. A successful exploit can drive up the load on your system or service and reduce its availability. Although interfaces are the most common concern, Regular Expressions also appear in many other components of a software stack, such as firewalls and backends.

[Figure not available. Source: OWASP]

A Regular Expression is a powerful tool for matching the data and text you need. A poorly constructed one, however, can consume excessive time and CPU resources: in the worst case, matching time grows exponentially with the length of the input.

When writing a Regular Expression that matches the data you want, it is easy to miss the critical step of also protecting against these potentially exponential runs. As a result, poorly performing Regular Expressions can sit in essential components of the software stack, where they are hard to detect and potentially exploitable.

A typical example of a vulnerable Regular Expression is ^(a+)+$

With an input like aaaX, this Regular Expression will "walk" every possible way of splitting the a's between the inner and outer groups before finally failing at the last character. This is called backtracking, and, in this case, the number of steps grows exponentially with the number of a's in the input.
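The growth described above can be observed with a short timing script. This is a minimal sketch, assuming CPython's built-in re module, which uses a backtracking engine. The helper names time_match and measure are made up for illustration, and the exact timings will differ from machine to machine.

```python
import re
import time

# Nested quantifiers: the engine can split a run of a's in many ways.
VULNERABLE = re.compile(r"^(a+)+$")
# Accepts the same strings with a single quantifier, so there is nothing to retry.
LINEAR = re.compile(r"^a+$")


def time_match(pattern, text):
    """Return (match_result, elapsed_seconds) for one match attempt."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


def measure(pattern, sizes):
    """Time the pattern against 'a' * n + 'X' for each n in sizes."""
    return {n: time_match(pattern, "a" * n + "X")[1] for n in sizes}


if __name__ == "__main__":
    # Sizes are kept small on purpose: each extra 'a' roughly doubles the work.
    for n, seconds in measure(VULNERABLE, range(10, 22, 2)).items():
        print(f"vulnerable n={n:2d}: {seconds:.4f}s")
    for n, seconds in measure(LINEAR, range(10, 22, 2)).items():
        print(f"linear     n={n:2d}: {seconds:.6f}s")
```

Both patterns reject the input, but only the nested one slows down sharply as n increases. Removing the nesting is one common way to avoid the problem when the simpler pattern accepts the same strings.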

Depending on the programming language, different Regular Expression algorithms, engines, and implementations exist. It's a vast topic, and we won't try to cover that in detail here, but you can read more about Regular Expression matching and algorithms here.

The steps we followed

With the aid of a variety of tools, we found vulnerable Regular Expressions in many popular Python open-source projects.

We then determined whether exploitation was possible by creating proofs of concept, informed the package maintainers following the responsible disclosure process, and, in some cases, helped to issue patches.

As a result of this research, several flaws were assigned CVE identifiers, and patched versions of the affected packages were released. The first of our findings are summarized in this post.

Affected Packages

We selected top Python packages by download count, analyzing every Python package with more than 20 million downloads in the last 30 days.

OAuthLib

OAuthLib is a framework that implements the logic of OAuth1 or OAuth2 without assuming a specific HTTP request object or web framework. OAuthLib is downloaded more than 85 million times each month. 

Vulnerability Details:

The IPv6 Regular Expression in uri_validate.py was vulnerable. As a result, an attacker who provides a malicious redirect URI, or who can otherwise reach the uri_validate functions, could cause a denial of service in a web application that uses OAuthLib.

Who is impacted?

The flaw was introduced in OAuthLib version 3.1.1. It affects OAuthLib applications that use the OAuth2.0 provider support or call the uri_validate functions directly.

You can find more information here.

Patches

This issue was fixed in the 3.2.1 release of OAuthLib.

Workarounds

Is there a way for users to fix or remediate the vulnerability without upgrading to 3.2.1?
Yes. The redirect_uri can be verified in a web toolkit such as bottle-oauthlib or django-oauth-toolkit before OAuthLib is called. A simple check that rejects the request when a : is present can prevent the DoS, assuming neither a port nor an IPv6 address is fundamentally required.
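A rough sketch of such a pre-check follows. It assumes the rule is applied only to the authority (host) part of the URI, because the scheme separator :// always contains a colon. The function name redirect_uri_looks_safe is hypothetical and is not part of OAuthLib or of either toolkit.

```python
def redirect_uri_looks_safe(uri: str) -> bool:
    """Reject URIs whose authority contains ':' (no ports, no IPv6 literals).

    Hypothetical helper for illustration only; it is a coarse filter meant
    to run before the URI reaches OAuthLib, not a full URI validator.
    """
    scheme, separator, rest = uri.partition("://")
    if not scheme or not separator:
        # Not of the form scheme://...; reject rather than guess.
        return False
    # The authority ends at the first '/', '?' or '#'.
    authority = rest
    for delimiter in "/?#":
        authority = authority.split(delimiter, 1)[0]
    return bool(authority) and ":" not in authority


if __name__ == "__main__":
    print(redirect_uri_looks_safe("https://example.com/callback"))
    print(redirect_uri_looks_safe("https://example.com:8443/callback"))
```

The check uses plain string operations only, so it runs in time linear in the length of the URI. As the workaround itself notes, it also rejects legitimate URIs that carry a port or an IPv6 address.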

Proof of Concept:

from oauthlib.uri_validate import is_absolute_uri
is_absolute_uri("http://[:::::::::::::::::::::::::::::::::::::::]/path")

Assigned Identifiers:

Safety Cybersecurity: 50959

CVE: CVE-2022-36087

Wheel

Wheel is a reference implementation of the Python wheel packaging standard. Wheel is downloaded more than 155 million times each month. 

Vulnerability Details:

The Regular Expression used to verify the validity of Wheel file names was discovered to be vulnerable. This vulnerability can be exploited in two ways: through the use of Wheel as a library or through the use of the Wheel Command Line Interface (CLI). While the use of Wheel as a library is already discouraged in the documentation, the use of the CLI is a more significant concern because it is the "reference implementation of the Python Wheel packaging standard."

Wheel can also be used as an extension for setuptools (via bdist_wheel). However, this path doesn't seem to be vulnerable, because parts of the payload string are sanitized at several points along the way.

Who is impacted?

Wheel versions earlier than 0.38.0 are affected when parsing a maliciously crafted Wheel file.

Patches

Wheel 0.38.0 includes the patch. After our disclosure, the maintainers acknowledged the issue, discussed a possible fix, and then applied it in 0.38.0.

Assigned Identifiers:

Safety Cybersecurity: 51499

CVE: CVE-2022-40898

Mako

Mako is a template library written in Python. It provides a familiar, non-XML syntax that compiles into Python modules for maximum performance. Mako is downloaded more than 20 million times each month. 

Vulnerability Details:

The Regular Expression that matches the start of tags in the Lexer class, which parses template strings, was vulnerable.

We found this to be reachable in Mako's code not only by calling the Lexer class directly but also through babelplugin and linguaplugin, which use the process_file function of the MessageExtractor class.

[Two code screenshots from /mako/ext/extract.py are not reproduced here]

Patches

The issue was fixed, and a patched version of Mako (1.2.2) was released less than 24 hours after we sent the report to Mako's security email.

Assigned Identifiers:

Safety Cybersecurity: 50870

CVE: CVE-2022-40023

Setuptools

Setuptools is downloaded more than 203 million times each month. 

Vulnerability Details:

The vulnerable Regular Expression is in package_index. As a result, a user who fetches malicious HTML from a package on PyPI or from a custom PackageIndex page may be attacked.

It's worth mentioning that only a small portion of the user base is impacted by this flaw. Setuptools maintainers pointed out that package_index is deprecated (not formally, but “in spirit”) and the vulnerability isn't reachable through standard, recommended workflows.

Patches & Vendor Response

Maintainers acknowledged the issue, discussed a possible fix, and then applied it in 65.5.1.

Assigned Identifiers:

Safety Cybersecurity: 52495

CVE: CVE-2022-40897

Future

Future allows you to use a single, clean Python 3.x-compatible codebase to support both Python 2 and Python 3 with minimal overhead. Future is downloaded more than 33 million times each month. 

Vulnerability Details:

This is a known, unpatched vulnerability. LOOSE_HTTP_DATE_RE.match is called when http.cookiejar.CookieJar is used to parse Set-Cookie headers returned by a server. Processing a response from a malicious HTTP server can lead to extreme CPU usage and block execution for a long time. The same issue was found, patched, and backported in Python core in 2019.

You can find more information here.

Patches & Vendor Response

The issue was reported to Future maintainers on September 1, 2022, but there was no response. We noted that the project might be inactive because the last commit on the GitHub repository was on November 30, 2021, and the last closed issue was on November 27, 2020. There are now 182 open issues.

Assigned Identifiers:

Safety Cybersecurity: 52510

CVE: CVE-2022-40899

Acknowledgements: Special thanks to Sebastian Chnelik, Cybersecurity Analyst at Safety, who researched and discovered these vulnerabilities.
