Take away working URLs, and the modern world stops. But years of inconsistent parsing specifications have created an environment ripe for exploitation, putting many businesses at serious risk.
Security researchers have found that the modern internet has a serious problem with how it parses URLs. Specifically, too many URL parsers follow inconsistent rules, creating a web that savvy attackers can easily exploit.
It doesn’t take much to find an example of URL parsing being manipulated in the wild to devastating effect. The researchers cited the late-2021 Log4j exploit as a perfect example.
According to the report, Log4j’s popularity meant millions of servers were affected, leaving administrators scrambling to locate Log4j within their systems while proof-of-concept attacks circulated in the wild.
SEE: Google Chrome Security and UI Tips You Need to Know (TechRepublic Premium)
Log4j is a widely used Java logging library; the exploit abused its lookup feature, letting an attacker embed a malicious URL in logged data so that the victim’s server fetched and executed a remote payload.
The initial fix allowed Java lookups only to allowlisted sites, but attackers quickly found a way around it. By adding a localhost URL to the malicious URL and separating the two with a # symbol, attackers were able to confuse the parsers and continue their attacks.
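The bypass hinged on two parsers disagreeing about where the host ends. A minimal sketch of that disagreement, using Python’s standard-library parser rather than the Java parsers involved in the actual exploit (the hosts and port are made up for illustration):

```python
from urllib.parse import urlsplit

# An attacker-crafted URL in the style of the Log4j allowlist bypass:
# an allowlisted host, a '#', then the attacker's domain.
url = "ldap://127.0.0.1#evil.example.com:1389/payload"

# A fragment-aware parser stops at '#', so the host looks allowlisted.
print(urlsplit(url).hostname)  # 127.0.0.1

# A naive parser that treats everything before the first '/' as the
# authority sees a very different "host".
authority = url.split("://", 1)[1].split("/", 1)[0]
print(authority)  # 127.0.0.1#evil.example.com:1389
```

If the allowlist check uses the first parser and the connection logic uses something closer to the second, the check passes while the request goes to the attacker’s domain.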
Log4j was serious on its own, and the fact that the bypass relied on something as universal as URLs makes it even more so. URL parsing vulnerabilities can be hard to picture, so it helps to understand exactly what URL parsing involves, and the report does a great job of explaining it.
Figure A breaks a URL down, color-coded, into its five components. The first standard for translating URLs into something machines can process was published in 1994, when URLs themselves were created, and several requests for comments (RFCs) have since further developed the URL standards.
Unfortunately, not all parsers keep up with the latest standards. This means that there are many parsers and many different ways to translate URLs. Herein lies the problem.
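Those five components are easy to see in practice. Here is a quick look using Python’s standard-library parser, one of the tools the researchers analyzed (the example URL is illustrative):

```python
from urllib.parse import urlsplit

parts = urlsplit("https://user@example.com:8042/over/there?name=ferret#nose")

# The five generic components defined by RFC 3986:
print(parts.scheme)    # https
print(parts.netloc)    # user@example.com:8042  (the authority)
print(parts.path)      # /over/there
print(parts.query)     # name=ferret
print(parts.fragment)  # nose
```

Every parser must make the same five cuts; the vulnerabilities arise when different parsers make them in different places.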
URL parsing flaws discovered by researchers
Snyk and Team82 researchers worked together to analyze 16 URL parsing libraries and tools written in a variety of languages:
- urllib (Python)
- urllib3 (Python)
- rfc3986 (Python)
- httptools (Python)
- curl lib (cURL)
- Chrome (Browser)
- Uri (.NET)
- URL (Java)
- URI (Java)
- parse_url (PHP)
- URL (NodeJS)
- url-parse (NodeJS)
- net/url (Go)
- uri (Ruby)
- URI (Perl)
They identified five possible scenarios in which URL parsers might behave in an unexpected way.
- Scheme confusion, in which the attacker uses a malformed URL scheme
- Slash confusion, involving an unexpected number of slashes
- Backslash confusion, in which a URL contains backslashes (\)
- URL-encoded data confusion, involving URLs that contain URL-encoded data
- Scheme mixup, which involves parsing a URL belonging to one scheme (HTTP, HTTPS, etc.) with a parser for a different scheme
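Some of these confusions can be reproduced with nothing but Python’s standard-library parser. A minimal sketch (the hosts are made up; the browser behavior described in the comments follows the WHATWG URL Standard, which urllib does not implement):

```python
from urllib.parse import urlsplit, unquote

# Slash confusion: an extra slash empties the authority, pushing the
# would-be host into the path.
extra = urlsplit("http:///evil.example.com/x")
print(extra.hostname, extra.path)  # None /evil.example.com/x

# Backslash confusion: urllib leaves backslashes in the path, while
# WHATWG-style parsers (browsers) treat '\' like '/' and would see
# evil.example.com as the host.
back = urlsplit("http:\\\\evil.example.com\\x")
print(back.hostname, back.path)  # None \\evil.example.com\x

# URL-encoded data confusion: one decoding pass still hides a second
# layer of encoding (%252e -> %2e -> .).
enc = "http://example.com/%252e%252e/secret"
once = unquote(urlsplit(enc).path)  # '/%2e%2e/secret'
twice = unquote(once)               # '/../secret'
print(once, twice)
```

Two systems that decode a different number of times, or that disagree about slashes, end up acting on two different URLs.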
The research uncovered eight vulnerabilities, all of which have been documented and fixed. However, the team noted that unsupported versions of Flask still contain these vulnerabilities.
How to prevent URL parsing attacks
It’s a good idea to proactively protect yourself against vulnerabilities that could wreak havoc on the scale of Log4j, but given how deeply URL parsers are embedded in software stacks, that might not be easy.
According to the report’s authors, the first step is to identify and understand the parsers in your software: learn how they behave, which URLs they support and other details. User-supplied URLs should never be trusted; canonicalize and validate them first, with parser differences taken into account during validation.
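A minimal sketch of that canonicalize-then-validate step in Python, where the allowlist, function name and accepted schemes are hypothetical choices for illustration, not prescriptions from the report:

```python
from urllib.parse import urlsplit, urlunsplit

ALLOWED_HOSTS = {"api.example.com"}  # hypothetical allowlist

def canonicalize_and_validate(raw_url: str) -> str:
    """Canonicalize a user-supplied URL, then validate the result."""
    parts = urlsplit(raw_url)
    # Canonicalize: lowercase scheme and host, drop the fragment.
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Validate the *canonical* form, never the raw input.
    if scheme not in {"http", "https"}:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    if host not in ALLOWED_HOSTS:
        raise ValueError(f"host not allowlisted: {host!r}")
    # Rebuilding the netloc from the parsed host also drops any
    # user@ portion an attacker may have smuggled in.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((scheme, netloc, parts.path, parts.query, ""))

print(canonicalize_and_validate("HTTPS://API.EXAMPLE.COM/v1?x=1#frag"))
```

Downstream code then only ever sees the single canonical form, rather than whatever shape the user originally supplied.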
SEE: Password breach: Why pop-culture and passwords don’t mix (free PDF) (TechRepublic)
The report also offers several good-practice tips for URL parsing that can help minimize the chance of parser-based attacks:
- Avoid using URL parsers entirely whenever possible. According to the report authors, “it is easy to achieve in many instances.”
- When using microservices, parse URLs at the front end, then send the parsed information across the different environments.
- Understand the differences between the parsers involved in your application logic and how those differences affect other systems.
- Canonicalize before parsing, so that even if a URL is malicious, the trusted, known form is what gets forwarded to parsers and other systems.
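The tip about parsing at the front end can be sketched as follows; the function name and field layout are illustrative assumptions, not an API from the report:

```python
from urllib.parse import urlsplit

def parse_at_edge(raw_url: str) -> dict:
    """Parse the URL exactly once, at the system's front end."""
    parts = urlsplit(raw_url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
    }

# Downstream services receive the structured result and never re-parse
# the raw string, so they cannot disagree about where the host ends.
record = parse_at_edge("https://api.example.com:8443/v1/items?id=7")
print(record["host"], record["port"], record["path"])
```

Passing structured components instead of the raw string removes the opportunity for a second parser to reach a different conclusion.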