Handling User Input

Recall the fundamental security problem described in Chapter 1: all user input is untrusted. A huge variety of different attacks against web applications involve submitting unexpected input, crafted to cause behavior that was not intended by the application’s designers. Correspondingly, a key requirement for an application’s security defenses is that it must handle user input in a safe manner.

Input-based vulnerabilities can arise anywhere within an application’s functionality, and in relation to practically every type of technology in common use. “Input validation” is often cited as the necessary defense against these attacks. However, there is no single protective mechanism that can be employed everywhere, and defending against malicious input is often not as straightforward as it sounds.

Varieties of Input

A typical web application processes user-supplied data in a range of different forms. Some kinds of input validation may not be feasible or desirable for all of these forms of input. Figure -1 shows the kind of input validation often performed by a user registration function.

In many cases, an application may be able to impose very stringent validation checks on a specific item of input. For example, a username submitted to a login function may be required to have a maximum length of eight characters and contain only alphabetical letters.

In other cases, the application must tolerate a wider range of possible input. For example, an address field submitted to a personal details page might legitimately contain letters, numbers, spaces, hyphens, apostrophes, and other characters. For this item, there are still restrictions that can feasibly be imposed, however. The data should not exceed a reasonable length limit (such as 50 characters), and should not contain any HTML mark-up.
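The two cases above, a strictly constrained username and a more permissive address field, can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the function names and the exact set of punctuation allowed in an address are assumptions, and the ASCII-only character classes would need widening for international names and addresses.

```python
import re

# Rule from the text: at most eight characters, alphabetical letters only.
USERNAME_PATTERN = re.compile(r"[A-Za-z]{1,8}")

# Example length limit from the text.
MAX_ADDRESS_LENGTH = 50

# Assumed character set: letters, digits, spaces and some punctuation.
# Angle brackets are excluded, so HTML tags cannot be formed.
ADDRESS_PATTERN = re.compile(r"[A-Za-z0-9 ,.'#/-]+")


def is_valid_username(username: str) -> bool:
    """Return True only if the whole string is 1 to 8 ASCII letters."""
    # fullmatch anchors the pattern at both ends of the string, so a
    # trailing newline or appended payload fails the check.
    return USERNAME_PATTERN.fullmatch(username) is not None


def is_valid_address(address: str) -> bool:
    """Return True if the address is short enough and uses allowed characters."""
    if len(address) > MAX_ADDRESS_LENGTH:
        return False
    return ADDRESS_PATTERN.fullmatch(address) is not None
```

Note that the address check still admits apostrophes and hyphens, so it narrows the input without making it safe for every later use.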

In some situations, an application may need to accept completely arbitrary input from users. For example, a user of a blogging application may create a blog whose subject is web application hacking. Posts and comments made to the blog may quite legitimately contain explicit attack strings that are being discussed. The application may need to store this input within a database, write it to disk, and display it back to users in a safe way. It cannot simply reject the input because it looks potentially malicious without substantially diminishing the value of the application to some of its user base.


Figure -1: An application performing input validation

In addition to the various kinds of input that are entered by users via the browser interface, a typical application also receives numerous items of data that began their life on the server and that are sent to the client so that the client can transmit them back to the server on subsequent requests. This includes items such as cookies and hidden form fields, which are not seen by ordinary users of the application but which an attacker can of course view and modify.

In these cases, applications can often perform very specific validation of the data received. For example, a parameter might be required to have one of a specific set of known values, such as a cookie indicating the user’s preferred language, or to be in a specific format, such as a customer ID number. Further, when an application detects that server-generated data has been modified in a way that is not possible for an ordinary user with a standard browser, this is often an indication that the user is attempting to probe the application for vulnerabilities. In these cases, the application should reject the request and log the incident for potential investigation.
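As a rough sketch of this kind of check, the code below validates a language cookie against a fixed set of values and a customer ID against a fixed format, logging any mismatch. The allowed languages, the eight-digit ID format, the logger name, and the function name are all hypothetical choices made for illustration.

```python
import logging
import re

logger = logging.getLogger("input_validation")

# Hypothetical: the only language values the server ever issues.
ALLOWED_LANGUAGES = {"en", "fr", "de"}

# Hypothetical format: a customer ID is exactly eight ASCII digits.
CUSTOMER_ID_PATTERN = re.compile(r"[0-9]{8}")


def check_server_issued_values(lang_cookie: str, customer_id: str,
                               client_ip: str) -> bool:
    """Return True if both values look like something the server issued."""
    if lang_cookie not in ALLOWED_LANGUAGES:
        # An ordinary browser cannot produce this value, so record it.
        logger.warning("Unexpected language cookie %r from %s",
                       lang_cookie, client_ip)
        return False
    if CUSTOMER_ID_PATTERN.fullmatch(customer_id) is None:
        logger.warning("Malformed customer ID %r from %s",
                       customer_id, client_ip)
        return False
    return True
```

A caller would reject the request whenever this returns False, leaving the log entry for later investigation.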

Approaches to Input Handling

There are various broad approaches that are commonly taken to the problem of handling user input. Different approaches are often preferable for different situations and different types of input, and a combination of approaches may sometimes be desirable.

“Reject Known Bad”

This approach typically employs a blacklist containing a set of literal strings or patterns that are known to be used in attacks. The validation mechanism blocks any data that matches the blacklist and allows everything else.

In general, this is regarded as the least effective approach to validating user input, for two main reasons. First, a typical vulnerability in a web application can be exploited using a wide variety of different input, which may be encoded or represented in various different ways. Except in the simplest of cases, it is likely that a blacklist will omit some patterns of input that can be used to attack the application. Second, techniques for exploitation are constantly evolving. Novel methods for exploiting existing categories of vulnerability are unlikely to be blocked by current blacklists.
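The first weakness is easy to demonstrate. The filter below is a deliberately naive sketch with a hypothetical three-entry blacklist; it blocks the literal strings it knows about but passes trivially altered or alternative payloads.

```python
# Hypothetical blacklist of literal attack strings.
BLACKLIST = ["<script>", "select", "or 1=1"]


def passes_blacklist(value: str) -> bool:
    """Naive filter: allow the input unless it contains a blacklisted literal."""
    return not any(bad in value for bad in BLACKLIST)


# Blocked: matches a blacklist entry exactly.
blocked = passes_blacklist("<script>alert(1)</script>")

# Allowed: the same payload with different letter case.
case_variant = passes_blacklist("<ScRiPt>alert(1)</ScRiPt>")

# Allowed: a different vector that needs no script tag at all.
other_vector = passes_blacklist("<img src=x onerror=alert(1)>")

# Allowed: an equivalent SQL condition with different literals.
sql_variant = passes_blacklist("' OR 2=2--")
```

Lowercasing the input before matching would close the second case but not the third or fourth, which is the general difficulty with enumerating bad input.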

“Accept Known Good”

This approach employs a white list containing a set of literal strings or patterns, or a set of criteria, that is known to match only benign input. The validation mechanism allows data that matches the white list, and blocks everything else. For example, before looking up a requested product code in the database, an application might validate that it contains only alphanumeric characters and is exactly six characters long. Given the subsequent processing that will be done on the product code, the developers know that input passing this test cannot possibly cause any problems.
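The product code example might look something like the sketch below, where validation happens before the lookup is attempted. The in-memory catalog stands in for the database and, like the function name, is an assumption made for illustration.

```python
import re
from typing import Optional

# Rule from the text: exactly six alphanumeric characters.
PRODUCT_CODE_PATTERN = re.compile(r"[A-Za-z0-9]{6}")

# Hypothetical stand-in for the product database.
CATALOG = {"AB12CD": "Desk lamp", "ZX98YW": "Office chair"}


def look_up_product(code: str) -> Optional[str]:
    """Validate the code against the white list, then look it up."""
    if PRODUCT_CODE_PATTERN.fullmatch(code) is None:
        # Anything outside the white list never reaches the lookup.
        raise ValueError("invalid product code")
    return CATALOG.get(code)
```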

In cases where this approach is feasible, it is regarded as the most effective way of handling potentially malicious input. Provided that due care is taken in constructing the white list, an attacker will not be able to use crafted input to interfere with the application’s behavior. However, there are numerous situations in which an application must accept data for processing that does not meet any reasonable criteria for what is known to be “good.” For example, some people’s names contain the apostrophe and hyphen characters. These can be used in attacks against databases, but it may be a requirement that the application should permit anyone to register under their real name. Hence, while it is often extremely effective, the white-list-based approach does not represent an all-purpose solution to the problem of handling user input.


Sanitization

This approach recognizes the need to sometimes accept data that cannot be guaranteed as safe. Instead of rejecting this input, the application sanitizes it in various ways to prevent it from having any adverse effects. Potentially malicious characters may be removed from the data altogether, leaving only what is known to be safe, or they may be suitably encoded or “escaped” before further processing is performed. Approaches based on data sanitization are often highly effective, and in many situations they can be relied upon as a general solution to the problem of malicious input. For example, the usual defense against cross-site scripting attacks is to HTML-encode dangerous characters before these are embedded into pages of the application (see Chapter 12). However, effective sanitization may be difficult to achieve if several kinds of potentially malicious data need to be accommodated within one item of input. In this situation, a boundary validation approach is desirable.
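HTML-encoding can be illustrated with the `html.escape` function from Python's standard library. The `render_comment` helper and the surrounding paragraph tag are hypothetical; the sketch covers only data placed in HTML element content, not other contexts such as script blocks or URLs.

```python
import html


def render_comment(comment: str) -> str:
    """Embed user-supplied text in a page fragment after HTML-encoding it."""
    # html.escape replaces &, < and > with entities and, by default,
    # also encodes double and single quotes.
    return "<p>" + html.escape(comment) + "</p>"


fragment = render_comment("<script>alert('x')</script>")
# fragment == "<p>&lt;script&gt;alert(&#x27;x&#x27;)&lt;/script&gt;</p>"
```

The attack string is preserved and displayed as text, which is what a blog about web application hacking would need.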

Safe Data Handling

Many web application vulnerabilities arise because user-supplied data is processed in unsafe ways. Vulnerabilities can often be avoided not by validating the input itself but by ensuring that the processing performed on it is inherently safe. In some situations, there are safe programming methods available that avoid common problems. For example, SQL injection attacks can be prevented through the correct use of parameterized queries for database access (see Chapter 9). In other situations, application functionality can be designed in such a way that inherently unsafe practices, such as passing user input to an operating system command interpreter, are avoided altogether.

This approach cannot be applied to every kind of task that web applications need to perform, but where it is available it is an effective general approach to handling potentially malicious input.
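The parameterized query mentioned above can be sketched with Python's built-in `sqlite3` module. The `users` table, its columns, and the `find_user` helper are assumptions made for the example; other database drivers follow the same idea with their own placeholder syntax.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, username: str):
    """Look up a user by name using a parameterized query."""
    # The ? placeholder keeps the statement's structure fixed. The driver
    # passes username purely as data, so quotes in it cannot alter the SQL.
    cur = conn.execute(
        "SELECT id, username FROM users WHERE username = ?",
        (username,),
    )
    return cur.fetchone()


# Hypothetical in-memory database for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO users (username) VALUES (?)", ("o'brien",))

legit = find_user(conn, "o'brien")        # the apostrophe is handled safely
attack = find_user(conn, "' OR '1'='1")   # treated as a literal name
```

Here a name containing an apostrophe is accepted without any special validation, while the injection string simply fails to match any row.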

Semantic Checks

The defenses described so far all address the need to defend the application against various kinds of malformed data whose content has been crafted to interfere with the application’s processing. However, with some vulnerabilities the input supplied by the attacker is identical to the input that an ordinary, non-malicious user may submit. What makes it malicious is the different circumstances in which it is submitted. For example, an attacker might seek to gain access to another user’s bank account by changing an account number transmitted in a hidden form field. No amount of syntactic validation will distinguish between the user’s data and the attacker’s. To prevent unauthorized access, the application needs to validate that the account number submitted belongs to the user who has submitted it.
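A semantic check of this kind compares the submitted value against what the authenticated user is entitled to, not against a format. In the sketch below, the ownership mapping and the function name are hypothetical, and the session user is assumed to come from server-side session state, never from the request itself.

```python
# Hypothetical record of which user owns which account.
ACCOUNT_OWNERS = {"10001": "alice", "10002": "bob"}


def can_access_account(session_user: str, account_number: str) -> bool:
    """Return True only if the account belongs to the logged-in user."""
    # Both account numbers below are syntactically valid; only the
    # ownership lookup distinguishes a legitimate request from an attack.
    return ACCOUNT_OWNERS.get(account_number) == session_user
```

With this mapping, a request from alice for account 10001 is allowed, while the same well-formed request for account 10002 is refused.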

Boundary Validation

The idea of validating data across trust boundaries is a familiar one. The core security problem with web applications arises because data received from users is untrusted. While input validation checks implemented on the client side may improve performance and the user’s experience, they do not provide any assurance over the data that actually reaches the server. The point at which user data is first received by the server-side application represents a huge trust boundary, at which the application needs to take measures to defend itself against malicious input.

Given the nature of the core problem, it is tempting to think of the input validation problem in terms of a frontier between the Internet, which is “bad” and untrusted, and the server-side application, which is “good” and trusted. In this picture, the role of input validation is to clean potentially malicious data on arrival and then pass the clean data to the trusted application. From this point onwards, the data may be trusted and processed without any further checks or concern about possible attacks.
