Browser URL Encoding, Decoding, and XSS

This article was originally written in early 2010 and was lightly updated in 2015.

Cross-site scripting attacks can be difficult to reproduce because browsers differ in how they handle URLs. The problem is made worse by how little information is available about URL encoding and decoding. Hopefully this article will help you understand the problem and the browser oddities that can make XSS difficult to reproduce.

First, most browsers URL encode any special characters in the URL, so if you type < in a URL, the browser converts it to %3C (as required by RFC 1738 Section 2.2). Update 11/2015: All browsers (including IE) now follow the RFC. As of IE11, URL encoding is no longer an attack vector (https://msdn.microsoft.com/library/bg182625(v=vs.85).aspx#utf8). Using IE used to be a trick for this scenario, but scenarios that require unencoded input directly in the URL are now generally not exploitable.
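To make the encoding step concrete, here is a minimal Python sketch that percent-encodes a sample payload. It uses Python's urllib.parse.quote as a stand-in for the browser; that is an assumption for illustration, since a real browser decides for itself which characters to leave untouched, and the payload string is just an example.

```python
from urllib.parse import quote

# Example payload (illustrative only, not from any real application).
payload = "<script>alert(1)</script>"

# Percent-encode every reserved or unsafe character. quote() is Python's
# encoder, not a browser's, so the exact set of characters that a given
# browser leaves unencoded may differ from what is shown here.
encoded = quote(payload, safe="")

print(quote("<"))  # %3C
print(encoded)     # %3Cscript%3Ealert%281%29%3C%2Fscript%3E
```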

In most instances, web applications take that input and URL decode it so they can work with the actual user input rather than a URL encoded version. This incidentally helps an attacker who can only pass in URL encoded input, because the application decodes it for the attacker. If this is the case, the attacker does not need to worry about the browser's URL encoding at all.
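The decoding behavior described above can be sketched as follows. The handler render_search_page and the parameter name q are hypothetical, made up for this example; the point is only that a standard query-string parser such as urllib.parse.parse_qs decodes the value before the application sees it.

```python
from urllib.parse import parse_qs


def render_search_page(query_string: str) -> str:
    """Hypothetical vulnerable handler: decodes the parameter, then echoes it unescaped."""
    # parse_qs percent-decodes parameter names and values.
    params = parse_qs(query_string)
    term = params.get("q", [""])[0]
    # The decoded value is written into the page without HTML escaping.
    return "<p>Results for " + term + "</p>"


# The browser sent the payload URL encoded, but the application decodes it.
page = render_search_page("q=%3Cscript%3Ealert(1)%3C%2Fscript%3E")
print(page)  # <p>Results for <script>alert(1)</script></p>
```

Even though the request only ever contained %3C and %3E, the response contains a working script tag.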

In some instances the application will not decode the input. This means that the attacker needs to find a way to bypass the browser’s URL encoding. Versions of Internet Explorer before IE11 don’t conform to RFC 1738 and pass URLs along without URL encoding them, and IE11 still sends URL parameter names and values without URL encoding them. Built-in XSS filters will commonly disable the attack, but you shouldn’t rely on a browser’s XSS filter to prevent XSS in your site.
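A rough sketch of why the missing decode step matters: the hypothetical handler render_raw below echoes the query value exactly as received. Both request strings are made-up examples, one as an encoding browser would send it and one as a client that skips URL encoding would send it.

```python
def render_raw(query_string: str) -> str:
    """Hypothetical handler that echoes the raw query value without decoding it."""
    # Split "name=value" on the first "=" and keep the value untouched.
    _, _, value = query_string.partition("=")
    return "<p>Results for " + value + "</p>"


# What a browser that URL encodes the query would send.
encoded_request = "q=%3Cscript%3Ealert(1)%3C/script%3E"
# What a client that does not URL encode the query would send.
raw_request = "q=<script>alert(1)</script>"

print(render_raw(encoded_request))  # payload stays inert as %3Cscript%3E...
print(render_raw(raw_request))      # payload arrives as a literal <script> tag
```

With the encoded request the page contains only harmless percent-escapes, so the attack works only if the payload reaches the server unencoded.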

A second way to prevent the browser from URL encoding the input is to set the enctype="text/plain" attribute on a form and submit the form as a POST. According to the Browser Security Handbook, this is supported by current versions of IE, FF, and Opera (Update: this was written in 2009 and is now likely out of date). This technique only works with a POST, but fortunately in almost every instance you will be able to convert GET requests to POST requests. Here is some HTML I use to submit the attack as a POST and prevent the browser from encoding it.
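A minimal sketch of such a form, wrapped in Python so the resulting request bodies can be compared. The target URL http://victim.example/search, the field name q, and the helper names are all assumptions for illustration, not taken from a real application. The text_plain_body helper approximates the text/plain form encoding (each field written as name=value followed by CRLF, with no percent-encoding); actual browser output may differ in details.

```python
from html import escape
from urllib.parse import urlencode


def build_form(action: str, name: str, value: str) -> str:
    """Build a hypothetical auto-submittable form that POSTs as text/plain."""
    return (
        f'<form action="{escape(action)}" method="POST" enctype="text/plain">\n'
        f'  <input type="hidden" name="{escape(name)}" value="{escape(value)}">\n'
        '  <input type="submit" value="Send">\n'
        "</form>"
    )


def text_plain_body(fields) -> str:
    """Approximate the text/plain form encoding: name=value plus CRLF per field."""
    return "".join(f"{name}={value}\r\n" for name, value in fields)


fields = [("q", "<script>alert(1)</script>")]

print(build_form("http://victim.example/search", "q", fields[0][1]))
# Default form encoding (application/x-www-form-urlencoded): payload is encoded.
print(urlencode(fields))
# text/plain encoding: payload is sent as typed.
print(repr(text_plain_body(fields)))
```

The comparison at the end is the point: the default encoding turns < into %3C, while the text/plain body carries the literal characters.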

Another interesting oddity is that when you copy URLs out of Firefox or Chrome they are URL encoded, which can be very annoying. To prevent this, simply type a character in the URL and erase it before you copy the URL.