NoSQL Injection
Continuing from module 3, and the analysis of other Injection methods, let’s examine NoSQL and how to perform injection attacks to it. The term NoSQL is used to refer to various data stores that break from standard relational database architectures. NoSQL data stores represent data using key/value mappings and do not rely on a fixed schema such as a conventional database table. Keys and values can be arbitrarily defined, and the format of the value generally is not relevant to the data store. A further feature of key/value storage is that a value may be a data structure itself, allowing hierarchical storage, unlike the flat data structure inside a database schema.
NoSQL advocates claim this has several advantages, mainly in handling very large data sets, where the data store’s hierarchical structure can be optimized exactly as required to reduce the overhead in retrieving data sets. In these instances, a conventional database may require complex cross-referencing of tables to retrieve information on behalf of an application. From a web application security perspective, the key consideration is how the application queries data, because this determines what forms of injection are possible. In the case of SQL injection, the SQL language is
broadly similar across different database products. NoSQL, by contrast, is a name given to a disparate range of data stores, all with their own behaviors. They don’t all use a single query language. Here are some of the common query methods used by NoSQL data stores:
● Key/value lookup
● XPath
● Programming languages such as JavaScript
Let’s now recall a simple SQL query used from a login form:
SELECT * FROM users WHERE username = '$username' AND password = '$password'
The SQL statement will be written like this in MongoDB:
db.users.find({username: username, password: password});
MongoDB is a NoSQL database program, and as we can see here, we no longer deal with a query language in the form of a string, as SQL is, therefore one would think that injection is no longer possible. But this is not the case, of course. If we assume here that the username field, or parameter, is coming from a deserialized JSON object, manipulation of the above query is not only possible but inevitable. Such as, if we supply a JSON document as the input to the application, an attacker will be able to perform the exact same login bypass that was before possible only with SQL injection:
{
"username": {"$gt": ""},
"password": {"$gt": ""}
}
The actual vulnerable handler of the request will look like this:
app.post('/', function (req, res) {
db.users.find({username: req.body.username, password: req.body.password},
function (err, users) {
});
});
Here, the username and password fields are not validated to ensure that they are strings. Therefore, when the JSON document is deserialized, those fields may contain anything but strings that can be used to manipulate the structure of the query. In MongoDB, the field $gt has a special meaning, which is used as the greater than comparator. As such, the username and the password from the database will be compared to the empty string "" and as a result return a positive outcome, i.e. a true statement. The HTTP request to exploit this vulnerability will look more or less like the one below.
POST http://target/ HTTP/1.1
Content-Type: application/json
{
"username": {"$gt": ""},
"password": {"$gt": ""}
}
Let’s see another example:
POST http://target/ HTTP/1.1
Content-Type: application/x-www-form-urlencoded
username[$gt]=&password[$gt]=
It is not unusual to see JSON documents as we saw in the previous example, but they are not as widespread as
url-encoded key-value pairs, simply known as urlencoding. Now, in this example, the string username[$gt]= is a
special syntax used by the qs module (default in ExpressJS ). This syntax is the equivalent of making a JavaScript
object/hash with a single parameter called $gt mapped to no value. In essence, the request above will result into a
JavaScript object like this:
{
"username": {"$gt": undefined},
"password": {"$gt": undefined}
}
78
Securing your DataStore (Input Validation, Output Encoding, Parameterized Queries)
There are many ways to secure our database. We have talked about some of them in this course but never analyzed them. Securing our datastore is really important, because many of users tend to only partially secure their database, which may cause it to still be vulnerable. Let’s examine some of the best ways.
Input validation is the process of testing input received by the application for compliance against a standard defined within the application. It can be as simple as strictly typing a parameter and as complex as using regular expressions or business logic to validate input.There are two different types of input validation approaches: whitelist validation (sometimes referred to as inclusion or positive validation) and blacklist validation (sometimes known as exclusion or negative validation).
Whitelist validation is the practice of only accepting input that is known to be good. This can involve validating compliance with the expected known values, type, length or size, numeric range, or other format standards before accepting the input for further processing. For example, validating that an input value is a credit card number may involve validating that the input value contains only numbers, is between 13 and 16 digits long, and passes the business logic check of correctly passing the Luhn formula (the formula for calculating the validity of a number based on the last “check” digit of the card). When using whitelist validation we should consider the following points:
● Known value
● Data type
● Data size
● Data range
● Data content
A common method of implementing content validation is to use regular expressions. Let’s see an example of a regular expression for validating a US ZIP Code contained in a string:
^\d{5}(-\d{4})?$
In this case, the regular expression matches both five-digit and five-digit + four-digit ZIP Codes as follows:
● ^\d{5} Match exactly five numeric digits at the start of the string.
● (–\d{4})? Match the dash character plus exactly four digits either once (present) or not at all (not present).
● $ This would appear at the end of the string. If there is additional content at the end of the string, the regular expression will not match.
In general, whitelist validation is the more powerful of the two input validation approaches. It can, however, be difficult to implement in scenarios where there is complex input, or where the full set of possible inputs cannot be easily determined. Difficult examples may include applications that are localized in languages with large character sets (e.g. Unicode character sets such as the various Chinese and Japanese character sets).
To continue, blacklisting is the practice of only rejecting input that is known to be bad. This commonly involves rejecting input that contains content that is specifically known to be malicious by looking through the content for a number of “known bad” characters, strings, or patterns. This approach is generally weaker than whitelist validation because the list of potentially bad characters is extremely large, and as such, any list of bad content is likely to be large, slow to run through, incomplete, and difficult to keep up to date.
A common method of implementing a blacklist is also to use regular expressions, with a list of characters or strings to disallow, such as the following example:
‘|%|--|;|/\∗|\\\∗|_|\[|@|xp_
In general, we should not use blacklisting in isolation, and we should use whitelisting if possible. However, in scenarios where we cannot use whitelisting, blacklisting can still provide a useful partial control. In these scenarios, however, it is recommended that we use blacklisting in conjunction with output encoding to ensure that input passed elsewhere (e.g. to the database) is subject to an additional check to ensure that it is correctly handled to prevent SQL injection.
For an example, in PHP it is usual to escape parameters (blacklist them) using the function
mysqli_real_escape_string(); before sending the SQL query:
$mysqli = new mysqli('hostname', 'db_username', 'db_password', 'db_name');
$query = sprintf("SELECT * FROM `Users` WHERE UserName='%s' AND Password='%s'",
$mysqli->real_escape_string($username),
$mysqli->real_escape_string($password));
$mysqli->query($query);
This function prepends backslashes to the following characters: \x00, \n, \r, \, ', " and \x1a. This function is normally used to make data safe before sending a query to MySQL.
There are other functions for many database types in PHP such as pg_escape_string() for PostgreSQL. The function addslashes(string $str) works for escaping characters, and is used especially for querying on databases that do not have escaping functions in PHP. It returns a string with backslashes before characters that need to be quoted in database queries, etc. These characters are single quote ('), double quote ("), backslash (\) and NUL (the NULL byte), as we said earlier.
Routinely passing escaped strings to SQL is error prone because it is easy to forget to escape a given string. Creating a transparent layer to secure the input can reduce this error-proneness, if not entirely eliminate it.
As we said earlier, some of the ways are not good to be implemented alone in a database, so in addition to Input Validation, most of the time it is necessary to implement Output Encoding. With this technique, we encode what is passed between different modules or parts of the application. In the context of SQL injection, this is applied as requirements to encode, or “quote,” content that is sent to the database to ensure that it is not treated inappropriately.
Even in situations where whitelist input validation is used, sometimes content may not be safe to send to the database, especially if it is to be used in dynamic SQL. For example, a last name such as O’Boyle is valid, and should be allowed through whitelist input validation. This name, however, could cause significant problems in situations where this input is used to dynamically generate a SQL query, such as the following:
String sql = “INSERT INTO names VALUES (‘“ + fname + ”‘,‘“ + lname + ”’);”
Additionally, malicious input into the first name field, such as:
’,’’); DROP TABLE names--
can be used to alter the SQL executed to the following:
INSERT INTO names VALUES (’’,’’); DROP TABLE names--’,’’);
Here, besides other methods, it will be necessary to encode (or quote) the data sent to the database. This approach has a limitation, in that it is necessary to encode values every time they are used in a database query; if one encode is missed, the application may well be vulnerable to SQL injection. Let’s now see an example of encoding in MySQL, where our practical examples are based.
MySQL Server uses the single quote as a terminator for a string literal, so it is necessary to encode the single quote when it is included in strings that will be included within dynamic SQL. In MySQL, we can do this either by replacing the single quote with two single quotes as with other database systems, or by quoting the single quote with a backslash (\). Either of these will cause the single quote to be treated as a part of the string literal, and not as a string terminator, effectively preventing an attacker from being able to exploit SQL injection on that particular query. We can do this in Java via code that is similar to the following:
sql = sql.replace(“’”, “\’”);
Additionally, PHP provides the mysql_real_escape() function, which will automatically quote the single quote with a backslash, as well as quoting other potentially harmful characters such as 0x00 (NULL), newline (\n), carriage return (\r), double quotes (“), backslash (\), and 0x1A (Ctrl+Z):
mysql_real_escape_string($user);
For example, the preceding code would cause the string O’Boyle to be quoted to the string O\’Boyle. If stored to the database, it will be stored as O’Boyle but will not cause string termination issues while being manipulated while quoted. You should be careful when doing a string replacement in stored procedure code, however. Because the single quote needs to be quoted since it is a string terminator, you need to replace a single quote with two single quotes in stored procedure code via the slightly less straightforward replacement of one quote (represented by a quoted single quote) with a quoted single quote (represented by a quoted backslash and a quoted single quote) as follows:
SET @sql = REPLACE(@sql, ‘\’’, ‘\\\’’)
which may be more logical and clearer to represent as character codes:
SET @enc = REPLACE(@input, CHAR(39), CHAR(92, 39));
For other types of SQL functionality, it may also be necessary to quote information that is submitted in dynamic SQL, namely where using wildcards in a LIKE clause. Depending on the application logic in place, it may be possible for an attacker to subvert logic by supplying wildcards in the input that is later used in the LIKE clause. In MySQL, the wildcards that follow are valid in a LIKE clause:
● % Match zero or more of any characters
● _ Match exactly one of any character
To prevent a match on one of the characters shown here, we can escape the wildcard character with the backslash character (\). For example, in Java:
sql = sql.replace(“%”, “\%”);
sql = sql.replace(“_”, “\_”);
Most databases and application development platforms provide APIs for handling untrusted input in a secure way, which prevents SQL injection vulnerabilities from arising. In parameterized queries (also known as prepared statements), the construction of a SQL statement containing user input is performed in two steps:
● The application specifies the query’s structure, leaving placeholders for each item of user input.
● The application specifies the contents of each placeholder.
Crucially, there is no way in which crafted data that is specified at the second step can interfere with the structure of the query specified in the first step. Because the query structure has already been defined, the relevant API handles any type of placeholder data in a safe manner, so it is always interpreted as data rather than part of the statement’s structure.
Here is an example of a vulnerable piece of login page pseudocode using dynamic SQL:
Username = request(“username”)
Password = request(“password”)
Sql = “SELECT ∗ FROM users WHERE username=’” + Username + “‘ AND password=’”+
Password + “’”
Result = Db.Execute(Sql)
If (Result) /∗ successful login ∗/
Let’s see an example of how we can secure our web application in Java. Java provides the Java Database Connectivity (JDBC) framework (implemented in the java.sql and javax.sql namespaces) as a vendor-independent method of accessing databases. JDBC supports a rich variety of data access methods, including the ability to use parameterized statements through the PreparedStatement class.
Here we can see the earlier vulnerable example rewritten using a JDBC prepared statement. Note that when the parameters are added (through the use of the various set<type> functions, such as etString), the index position (starting at 1) of the placeholder question mark is specified:
Connection con = DriverManager.getConnection(connectionString);
String sql = “SELECT ∗ FROM users WHERE username=? AND password=?”;
PreparedStatement lookupUser = con.prepareStatement(sql);
// Add parameters to SQL query
lookupUser.setString(1, username); // add String to position 1
lookupUser.setString(2, password); // add String to position 2
rs = lookupUser.executeQuery();
In addition to the JDBC framework that is provided with Java, additional packages are often used to access databases efficiently within J2EE applications. A commonly used persistence framework for accessing databases is Hibernate.
Although it is possible to utilize native SQL functionality, as well as the JDBC functionality shown earlier, Hibernate also provides its own functionality for binding variables to a parameterized statement. Methods are provided on the Query object to use either named parameters (specified using a colon; e.g. :parameter) or the JDBC-style question mark placeholder “?)”.
In the following example we can see the use of Hibernate with named parameters:
String sql = “SELECT ∗ FROM users WHERE username=:username AND”
+“password=:password”;
Query lookupUser = session.createQuery(sql);
// Add parameters to SQL query
lookupUser.setString(“username”, username); // add username
lookupUser.setString(“password”, password); // add password
List rs = lookupUser.list();
The final example shows us the use of Hibernate with JDBC-style question mark placeholders for the parameters.Note that Hibernate indexes parameters from 0, and not 1, as does JDBC. Therefore, the first parameter in the list will be 0 and the second will be 1:
String sql = “SELECT ∗ FROM users WHERE username=? AND password=?”;
Query lookupUser = session.createQuery(sql);
// Add parameters to SQL query
lookupUser.setString(0, username); // add username
lookupUser.setString(1, password); // add password
List rs = lookupUser.list();
If this method becomes an effective solution against SQL injection, we need to keep in mind several important provisos:
● It should be used for every database queries. First, by focusing only on input that has been immediately received from the user, it is easy to overlook second-order attacks, because data that has already been processed is assumed to be trusted. Second, it is easy to make mistakes about the specific cases in which the data being handled is user-controllable. In a large application, different items of data are held within the session or received from the client. Assumptions made by one developer may not be communicated to others. The handling of specific data items may change in the future, introducing a SQL injection flaw into previously safe queries. It is much safer to take the approach of mandating the use of parameterized queries throughout the application.
● Every item of data inserted into the query should be properly parameterized. Many times, most of a query’s parameters are handled safely, but one or two items are concatenated directly into the string used to specify the query structure. The use of parameterized queries will not prevent SQL injection if some parameters are handled in this way.
● Parameter placeholders cannot be used to specify the table and column names used in the query. In some rare cases, applications need to specify these items within a SQL query on the basis of user-supplied data. In this situation, the best approach is to use a whitelist of known good values (the list of tables and columns actually used within the database) and to reject any input that does not match an item on this list. Failing this,strict validation should be enforced on the user input — for example, allowing only alphanumeric characters, excluding whitespace, and enforcing a suitable length limit.
● Parameter placeholders cannot be used for any other parts of the query, such as the ASC or DESC keywords that appear within an ORDER BY clause, or any other SQL keyword, since these form part of the query structure. As with table and column names, if it is necessary for these items to be specified based on user-supplied data, rigorous whitelist validation should be applied to prevent attacks.
Securing your DataStore (Least Privileges, Canonicalization, Handling Sensitive Data)
The application should use the lowest possible level of privileges when accessing the database. In general, the application does not need DBAlevel permissions. It usually only needs to read and write its own data. In security-critical situations, the application may employ a different database account for performing different actions. For example, if 90 percent of its database queries require only read access, these can be performed using an account that does not have write privileges. If a particular query needs to read only a subset of data (for example, the orders table but not the user accounts table), an account with the corresponding level of access can be used. If this approach is enforced
throughout the application, any residual SQL injection flaws that may exist are likely to have their impact significantly reduced.
If we adopt a policy where we use stored procedures everywhere, and don’t allow application accounts to directly execute their own queries, then we can restrict those accounts to only be able to execute the stored procedures they need. Don’t grant them any rights directly to the tables in the database.
Continuing, SQL injection is not the only threat to your database data. Attackers can simply change the parameter values from one of the legal values they are presented with, to a value that is unauthorized for them, but the application itself might be authorized to access. As such, minimizing the privileges granted to our application will reduce the likelihood of such unauthorized access attempts, even when an attacker is not trying to use SQL injection as part of their exploit.
We should also minimize the privileges of the operating system account that the DBMS runs under, and never run our DBMS as root of our system. Most DBMSs run out of the box with a very powerful system account. For example, MySQL runs as system on Windows by default. Change the DBMS's OS account to something more appropriate, with restricted privileges. Finally, some good tricks to implement on your datastore, are:
● Multiple DB Users
The designer of web applications should not only avoid using the same owner/admin account in the web applications to connect to the database, different DB users could be used for different web applications. In general, each separate web application that requires access to the database could have a designated database user account that the web-app will use to connect to the DB. That way, the designer of the application can have good granularity in the access control, thus reducing the privileges as much as possible. Each DB user will then have select access to what it needs only, and write-access as needed. As an example, a login page requires read access to the username and password fields of a table, but no write access of any form (no insert, update, or delete). However, the sign-up page certainly requires insert privilege to that table; this restriction can only be enforced if these web apps use different DB users to connect to the database.
● Views
SQL views can further increase the granularity of access by limiting the read access to specific fields of a table or joins of tables. It could potentially have additional benefits: for example, suppose that the system is required (perhaps due to some specific legal requirements) to store the passwords of the users, instead of salted-hashed passwords. The designer could use views to compensate for this limitation; revoke all access to the table (from all DB users except the owner/admin) and create a view that outputs the hash of the password field and not the field itself. Any SQL injection attack that succeeds in stealing DB information will be restricted to stealing the
hash of the passwords (could even be a keyed hash), since no DB user for any of the web applications has access to the table itself.
A difficulty with input validation and output encoding is ensuring that the data being evaluated or transformed is in the format that will be interpreted as intended by the end user of that input. A common technique for evading input validation and output encoding controls is to encode the input before it is sent to the application in such a way that it is then decoded and interpreted to suit the attacker’s aims, as we saw in the encoding chapter of module 3. Let’s remember some alternative ways to encode the single quote character:
● %27 URL encoding
● %2527 Double URL encoding
● %%317 Nested double URL encoding
● %u0027 Unicode representation
● %u02b9 Unicode representation
● %ca%b9 Unicode representation
● ' HTML entity
● ' Decimal HTML entity
● ' Hexadecimal HTML entity
● %26apos; Mixed URL/HTML encoding
In some cases, these are alternative encodings of the character (%27 is the URL-encoded representation of the single quote), and in other cases these are double-encoded on the assumption that the data will be explicitly decoded by the application (%2527 when URLdecoded will be %27, as will %%317) or are various Unicode representations, either valid or invalid. Not all of these representations will be interpreted as a single quote normally; in most cases, they will rely on certain conditions being in place (such as decoding at the application, application server, WAF, or Web server
level), and therefore it will be very difficult to predict whether your application will interpret them this way.
For these reasons, it is important to consider canonicalization as part of your input validation approach. Canonicalization is the process of reducing input to a standard or simple form. For the single-quote examples, this would normally be a single-quote character (‘).
One method, which is often the easiest to implement, is to reject all input that is not already in a canonical format.For example, we can reject all HTML- and URL-encoded input from being accepted by the application. This is one of the most reliable methods in situations where we are not expecting encoded input. This is also the approach that is often adopted by default when we do whitelist input validation, as we may not accept unusual forms of characters when validating for known good input. At the very least, this could involve not accepting the characters used to encode data (such as %, &, and #), and therefore not allowing these characters to be input.
If rejecting input that can contain encoded forms is not possible, we need to look at ways to decode or otherwise make safe the input that we receive. This may include several decoding steps, such as URL decoding and HTML decoding, potentially repeated several times. This approach can be error-prone, however, as we will need to perform a check after each decoding step to determine whether the input still contains encoded data.
Another approach may be to decode the input once, and then reject the data if it still contains encoded characters.This approach assumes that genuine input will not contain double-encoded values, which should be a valid assumption in most cases.As we saw, a way of encoding is with Unicode, and one approach is normalization of the input. This converts the Unicode input into its simplest form, following a defined set of rules. Unicode normalization differs from canonicalization in that there may be multiple normal forms of a Unicode character according to which set of rules is followed.
Continuing, the normalization process will decompose the Unicode character into its representative components, and then reassemble the character in its simplest form. In most cases, it will transform double-width and other Unicode encodings into their ASCII equivalents, where they exist. We can normalize input in Java with the Normalizer class (since Java 6) as follows:
normalized = Normalizer.normalize(input, Normalizer.Form.NFKC);
We can also normalize input in PHP with the PEAR::I18N_UnicodeNormalizer package from the PEAR repository, as follows:
$normalized = I18N_UnicodeNormalizer::toNFKC($input, ‘UTF-8’);
Another approach is to first check that the Unicode is valid (and is not an invalid representation), and then to convert the data into a predictable format (for example, a Western European character set such as ISO-8859-1). The input would then be used in that format within the application from that point on. This is a deliberately lossy approach, as Unicode characters that cannot be represented in the character set converted to will normally be lost. However, for the purposes of making input validation decisions, it can be useful in situations where the application is not localized into languages outside Western Europe. Next, we check for Unicode validity for UTF-8 encoded Unicode by applying the
set of regular expressions:
● [x00-\x7F] - ASCII
● [\xC2-\xDF][\x80-\xBF] - Two-byte representation
● \xE0[\xA0-\xBF][\x80-\xBF] - Two-byte representation
● [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} - Three-byte representation
● \xED[\x80-\x9F][\x80-\xBF] - Three-byte representation
● \xF0[\x90-\xBF][\x80-\xBF]{2} - Planes 1–3
● [\xF1-\xF3][\x80-\xBF]{3} - Planes 4–15
● \xF4[\x80-\x8F][\x80-\xBF]{2} - Plane 16
If the input matches any of these conditions, it should be a valid UTF-8 encoding. If it doesn’t match, the input is not a valid UTF-8 encoding and should be rejected. Now that we have checked that the input is validly formed, we can convert it to a predictable format—for example, converting a Unicode UTF-8 string to another character set, such as ISO-8859-1.
In Java, we can use the CharsetEncoder class, or the simpler string method getBytes() (Java 6 and later) as follows:
string ascii = utf8.getBytes(“ISO-8859-1”);
Also, in PHP, we can do this with utf8_decode as follows:
$ascii = utf8_decode($utf8string);
A final technique for mitigating the seriousness of SQL injection is to consider the storage and access of sensitive information within the database. One of the goals of an attacker is to gain access to the data that is held within the database—often because that data will have some form of monetary value. Examples of the types of information an attacker may be interested in obtaining may include usernames and passwords, personal information, or financial information, such as credit card details. Because of this, it is worth considering additional controls over sensitive information. Some example controls or design decisions to consider might be the following:
● Passwords: Where possible, you should not store users’ passwords within the database. A more secure alternative is to store a salted one-way hash (using a secure hash algorithm such as SHA256) of each user’s password instead of the password itself. The salt, which is an additional small piece of random data, should then ideally be stored separately from the password hash. In this case, instead of comparing a user’s password to the one in the database during the login process, you would compare the salted hash calculated from the details supplied by the user to the value stored in the database. Note that this will prevent the application from being able to e- mail the user his existing password when he forgets it; in this case, it would be necessary to generate a new, secure password for the user and provide that to him instead.
● Credit card and other financial information: You should store details such as credit cards encrypted with an approved (i.e. FIPS-certified) encryption algorithm. This is a requirement of the Payment Card Industry Data Security Standards (PCI-DSS) for credit card information. However, you should also consider encrypting other financial information that may be in the application, such as bank account details. The encryption key should not be stored in the database.
● Archiving: Where an application is not required to maintain a full history of all of the sensitive information that is submitted to it (e.g. personally identifiable information), you should consider archiving or removing the unneeded information after a reasonable period of time. Where the application does not require this information after initial processing, you should archive or remove unneeded information immediately. In this case, removing information where the exposure would be a major privacy breach may reduce the impact of any future security breach by reducing the amount of customer information to which an attacker can gain access.
Securing LDAP, XPath and NoSQL
In this course, we didn’t only examine SQL Injection but some other technologies that you may use and want to secure. Many times the idea behind SQL Injection security can be implemented in them too, but let’s see some useful info about them:
In the case that it is necessary to insert user-supplied input into an LDAP query, this operation should be performed only on simple items of data that can be subjected to strict input validation. The user input should be checked against a white list of acceptable characters, which should ideally include only alphanumeric characters. Characters that may be used to interfere with the LDAP query should be blocked, including ( ) ; , * | & = and the null byte. Any input that does not match the white list should be rejected, not sanitized.
XPath Injection security is essentially similar to SQL injection security. The application must sanitize user input. Specifically, the single and double quote characters should be disallowed. This can be done either in the application itself, or in a third party product (e.g. application firewall.) Testing application susceptibility to XPath Injection can be easily performed by injecting a single quote or a double quote, and inspecting the response.
The user input should be checked against a white list of acceptable characters, which should ideally include only alphanumeric characters. Characters that may be used to interfere with the XPath query should be blocked, including ( ) = ‘ [ ] : , * / and all whitespace. Any input that does not match the white list should be rejected, not sanitized.
As we said earlier in this module, NoSQL data stores are basically vulnerable to the same security risks as traditional RDBMS data stores, so the usual best practices for storing sensitive data should be applied when developing a NoSQL-based application. These include:
● Encrypting sensitive database fields;
● Keeping unencrypted values in a sandboxed environment;
● Using sufficient input validation;
● Applying strong user authentication policies.
Of course, it would be ideal if there were an accepted standard for authentication, authorization and encryption in the yet-to-mature NoSQL space. Until such a standardized consensus can be reached, the best approach is to look at security in the middleware layer, rather than on the cluster level, as most middleware software comes with ready-made support for authentication, authorization and access control.
Continuing from module 3, and the analysis of other Injection methods, let’s examine NoSQL and how to perform injection attacks to it. The term NoSQL is used to refer to various data stores that break from standard relational database architectures. NoSQL data stores represent data using key/value mappings and do not rely on a fixed schema such as a conventional database table. Keys and values can be arbitrarily defined, and the format of the value generally is not relevant to the data store. A further feature of key/value storage is that a value may be a data structure itself, allowing hierarchical storage, unlike the flat data structure inside a database schema.
NoSQL advocates claim this has several advantages, mainly in handling very large data sets, where the data store’s hierarchical structure can be optimized exactly as required to reduce the overhead in retrieving data sets. In these instances, a conventional database may require complex cross-referencing of tables to retrieve information on behalf of an application. From a web application security perspective, the key consideration is how the application queries data, because this determines what forms of injection are possible. In the case of SQL injection, the SQL language is
broadly similar across different database products. NoSQL, by contrast, is a name given to a disparate range of data stores, all with their own behaviors. They don’t all use a single query language. Here are some of the common query methods used by NoSQL data stores:
● Key/value lookup
● XPath
● Programming languages such as JavaScript
Let’s now recall a simple SQL query used from a login form:
SELECT * FROM users WHERE username = '$username' AND password = '$password'
The SQL statement will be written like this in MongoDB:
db.users.find({username: username, password: password});
MongoDB is a NoSQL database program, and as we can see here, we no longer deal with a query language in the form of a string, as SQL is, therefore one would think that injection is no longer possible. But this is not the case, of course. If we assume here that the username field, or parameter, is coming from a deserialized JSON object, manipulation of the above query is not only possible but inevitable. Such as, if we supply a JSON document as the input to the application, an attacker will be able to perform the exact same login bypass that was before possible only with SQL injection:
{
"username": {"$gt": ""},
"password": {"$gt": ""}
}
The actual vulnerable handler of the request will look like this:
app.post('/', function (req, res) {
db.users.find({username: req.body.username, password: req.body.password},
function (err, users) {
});
});
Here, the username and password fields are not validated to ensure that they are strings. Therefore, when the JSON document is deserialized, those fields may contain anything but strings that can be used to manipulate the structure of the query. In MongoDB, the field $gt has a special meaning, which is used as the greater than comparator. As such, the username and the password from the database will be compared to the empty string "" and as a result return a positive outcome, i.e. a true statement. The HTTP request to exploit this vulnerability will look more or less like the one below.
POST http://target/ HTTP/1.1
Content-Type: application/json
{
"username": {"$gt": ""},
"password": {"$gt": ""}
}
Let’s see another example:
POST http://target/ HTTP/1.1
Content-Type: application/x-www-form-urlencoded
username[$gt]=&password[$gt]=
It is not unusual to see JSON documents as we saw in the previous example, but they are not as widespread as
url-encoded key-value pairs, simply known as urlencoding. Now, in this example, the string username[$gt]= is a
special syntax used by the qs module (default in ExpressJS ). This syntax is the equivalent of making a JavaScript
object/hash with a single parameter called $gt mapped to no value. In essence, the request above will result into a
JavaScript object like this:
{
"username": {"$gt": undefined},
"password": {"$gt": undefined}
}
78
Securing your DataStore (Input Validation, Output Encoding, Parameterized Queries)
There are many ways to secure our database. We have talked about some of them in this course but never analyzed them. Securing our datastore is really important, because many of users tend to only partially secure their database, which may cause it to still be vulnerable. Let’s examine some of the best ways.
Input Validation
Whitelist validation is the practice of only accepting input that is known to be good. This can involve validating compliance with the expected known values, type, length or size, numeric range, or other format standards before accepting the input for further processing. For example, validating that an input value is a credit card number may involve validating that the input value contains only numbers, is between 13 and 16 digits long, and passes the business logic check of correctly passing the Luhn formula (the formula for calculating the validity of a number based on the last “check” digit of the card). When using whitelist validation we should consider the following points:
● Known value
● Data type
● Data size
● Data range
● Data content
A common method of implementing content validation is to use regular expressions. Let’s see an example of a regular expression for validating a US ZIP Code contained in a string:
^\d{5}(-\d{4})?$
In this case, the regular expression matches both five-digit and five-digit + four-digit ZIP Codes as follows:
● ^\d{5} Match exactly five numeric digits at the start of the string.
● (–\d{4})? Match the dash character plus exactly four digits either once (present) or not at all (not present).
● $ This would appear at the end of the string. If there is additional content at the end of the string, the regular expression will not match.
In general, whitelist validation is the more powerful of the two input validation approaches. It can, however, be difficult to implement in scenarios where there is complex input, or where the full set of possible inputs cannot be easily determined. Difficult examples may include applications that are localized in languages with large character sets (e.g. Unicode character sets such as the various Chinese and Japanese character sets).
To continue, blacklisting is the practice of only rejecting input that is known to be bad. This commonly involves rejecting input that contains content that is specifically known to be malicious by looking through the content for a number of “known bad” characters, strings, or patterns. This approach is generally weaker than whitelist validation because the list of potentially bad characters is extremely large, and as such, any list of bad content is likely to be large, slow to run through, incomplete, and difficult to keep up to date.
A common method of implementing a blacklist is also to use regular expressions, with a list of characters or strings to disallow, such as the following example:
‘|%|--|;|/\∗|\\\∗|_|\[|@|xp_
In general, we should not use blacklisting in isolation, and we should use whitelisting if possible. However, in scenarios where we cannot use whitelisting, blacklisting can still provide a useful partial control. In these scenarios, however, it is recommended that we use blacklisting in conjunction with output encoding to ensure that input passed elsewhere (e.g. to the database) is subject to an additional check to ensure that it is correctly handled to prevent SQL injection.
For an example, in PHP it is usual to escape parameters (blacklist them) using the function
mysqli_real_escape_string(); before sending the SQL query:
$mysqli = new mysqli('hostname', 'db_username', 'db_password', 'db_name');
$query = sprintf("SELECT * FROM `Users` WHERE UserName='%s' AND Password='%s'",
$mysqli->real_escape_string($username),
$mysqli->real_escape_string($password));
$mysqli->query($query);
This function prepends backslashes to the following characters: \x00, \n, \r, \, ', " and \x1a. This function is normally used to make data safe before sending a query to MySQL.
There are other functions for many database types in PHP such as pg_escape_string() for PostgreSQL. The function addslashes(string $str) works for escaping characters, and is used especially for querying on databases that do not have escaping functions in PHP. It returns a string with backslashes before characters that need to be quoted in database queries, etc. These characters are single quote ('), double quote ("), backslash (\) and NUL (the NULL byte), as we said earlier.
Routinely passing escaped strings to SQL is error prone because it is easy to forget to escape a given string. Creating a transparent layer to secure the input can reduce this error-proneness, if not entirely eliminate it.
Output Encoding
Even in situations where whitelist input validation is used, sometimes content may not be safe to send to the database, especially if it is to be used in dynamic SQL. For example, a last name such as O’Boyle is valid, and should be allowed through whitelist input validation. This name, however, could cause significant problems in situations where this input is used to dynamically generate a SQL query, such as the following:
String sql = “INSERT INTO names VALUES (‘“ + fname + ”‘,‘“ + lname + ”’);”
Additionally, malicious input into the first name field, such as:
’,’’); DROP TABLE names--
can be used to alter the SQL executed to the following:
INSERT INTO names VALUES (’’,’’); DROP TABLE names--’,’’);
Here, besides other methods, it will be necessary to encode (or quote) the data sent to the database. This approach has a limitation, in that it is necessary to encode values every time they are used in a database query; if one encode is missed, the application may well be vulnerable to SQL injection. Let’s now see an example of encoding in MySQL, where our practical examples are based.
MySQL Server uses the single quote as a terminator for a string literal, so it is necessary to encode the single quote when it is included in strings that will be included within dynamic SQL. In MySQL, we can do this either by replacing the single quote with two single quotes as with other database systems, or by quoting the single quote with a backslash (\). Either of these will cause the single quote to be treated as a part of the string literal, and not as a string terminator, effectively preventing an attacker from being able to exploit SQL injection on that particular query. We can do this in Java via code that is similar to the following:
sql = sql.replace(“’”, “\’”);
Additionally, PHP provides the mysql_real_escape() function, which will automatically quote the single quote with a backslash, as well as quoting other potentially harmful characters such as 0x00 (NULL), newline (\n), carriage return (\r), double quotes (“), backslash (\), and 0x1A (Ctrl+Z):
mysql_real_escape_string($user);
For example, the preceding code would cause the string O’Boyle to be quoted to the string O\’Boyle. If stored to the database, it will be stored as O’Boyle but will not cause string termination issues while being manipulated while quoted. You should be careful when doing a string replacement in stored procedure code, however. Because the single quote needs to be quoted since it is a string terminator, you need to replace a single quote with two single quotes in stored procedure code via the slightly less straightforward replacement of one quote (represented by a quoted single quote) with a quoted single quote (represented by a quoted backslash and a quoted single quote) as follows:
SET @sql = REPLACE(@sql, ‘\’’, ‘\\\’’)
which may be more logical and clearer to represent as character codes:
SET @enc = REPLACE(@input, CHAR(39), CHAR(92, 39));
For other types of SQL functionality, it may also be necessary to quote information that is submitted in dynamic SQL, namely where using wildcards in a LIKE clause. Depending on the application logic in place, it may be possible for an attacker to subvert logic by supplying wildcards in the input that is later used in the LIKE clause. In MySQL, the wildcards that follow are valid in a LIKE clause:
● % Match zero or more of any characters
● _ Match exactly one of any character
To prevent a match on one of the characters shown here, we can escape the wildcard character with the backslash character (\). For example, in Java:
sql = sql.replace(“%”, “\%”);
sql = sql.replace(“_”, “\_”);
Parameterized Queries
● The application specifies the query’s structure, leaving placeholders for each item of user input.
● The application specifies the contents of each placeholder.
Crucially, there is no way in which crafted data that is specified at the second step can interfere with the structure of the query specified in the first step. Because the query structure has already been defined, the relevant API handles any type of placeholder data in a safe manner, so it is always interpreted as data rather than part of the statement’s structure.
Here is an example of a vulnerable piece of login page pseudocode using dynamic SQL:
Username = request(“username”)
Password = request(“password”)
Sql = “SELECT ∗ FROM users WHERE username=’” + Username + “‘ AND password=’”+
Password + “’”
Result = Db.Execute(Sql)
If (Result) /∗ successful login ∗/
Let’s see an example of how we can secure our web application in Java. Java provides the Java Database Connectivity (JDBC) framework (implemented in the java.sql and javax.sql namespaces) as a vendor-independent method of accessing databases. JDBC supports a rich variety of data access methods, including the ability to use parameterized statements through the PreparedStatement class.
Here we can see the earlier vulnerable example rewritten using a JDBC prepared statement. Note that when the parameters are added (through the use of the various set<type> functions, such as etString), the index position (starting at 1) of the placeholder question mark is specified:
Connection con = DriverManager.getConnection(connectionString);
String sql = “SELECT ∗ FROM users WHERE username=? AND password=?”;
PreparedStatement lookupUser = con.prepareStatement(sql);
// Add parameters to SQL query
lookupUser.setString(1, username); // add String to position 1
lookupUser.setString(2, password); // add String to position 2
rs = lookupUser.executeQuery();
In addition to the JDBC framework that is provided with Java, additional packages are often used to access databases efficiently within J2EE applications. A commonly used persistence framework for accessing databases is Hibernate.
Although it is possible to utilize native SQL functionality, as well as the JDBC functionality shown earlier, Hibernate also provides its own functionality for binding variables to a parameterized statement. Methods are provided on the Query object to use either named parameters (specified using a colon; e.g. :parameter) or the JDBC-style question mark placeholder “?)”.
In the following example we can see the use of Hibernate with named parameters:
String sql = “SELECT ∗ FROM users WHERE username=:username AND”
+“password=:password”;
Query lookupUser = session.createQuery(sql);
// Add parameters to SQL query
lookupUser.setString(“username”, username); // add username
lookupUser.setString(“password”, password); // add password
List rs = lookupUser.list();
The final example shows us the use of Hibernate with JDBC-style question mark placeholders for the parameters.Note that Hibernate indexes parameters from 0, and not 1, as does JDBC. Therefore, the first parameter in the list will be 0 and the second will be 1:
String sql = “SELECT ∗ FROM users WHERE username=? AND password=?”;
Query lookupUser = session.createQuery(sql);
// Add parameters to SQL query
lookupUser.setString(0, username); // add username
lookupUser.setString(1, password); // add password
List rs = lookupUser.list();
If this method becomes an effective solution against SQL injection, we need to keep in mind several important provisos:
● It should be used for every database queries. First, by focusing only on input that has been immediately received from the user, it is easy to overlook second-order attacks, because data that has already been processed is assumed to be trusted. Second, it is easy to make mistakes about the specific cases in which the data being handled is user-controllable. In a large application, different items of data are held within the session or received from the client. Assumptions made by one developer may not be communicated to others. The handling of specific data items may change in the future, introducing a SQL injection flaw into previously safe queries. It is much safer to take the approach of mandating the use of parameterized queries throughout the application.
● Every item of data inserted into the query should be properly parameterized. Many times, most of a query’s parameters are handled safely, but one or two items are concatenated directly into the string used to specify the query structure. The use of parameterized queries will not prevent SQL injection if some parameters are handled in this way.
● Parameter placeholders cannot be used to specify the table and column names used in the query. In some rare cases, applications need to specify these items within a SQL query on the basis of user-supplied data. In this situation, the best approach is to use a whitelist of known good values (the list of tables and columns actually used within the database) and to reject any input that does not match an item on this list. Failing this,strict validation should be enforced on the user input — for example, allowing only alphanumeric characters, excluding whitespace, and enforcing a suitable length limit.
● Parameter placeholders cannot be used for any other parts of the query, such as the ASC or DESC keywords that appear within an ORDER BY clause, or any other SQL keyword, since these form part of the query structure. As with table and column names, if it is necessary for these items to be specified based on user-supplied data, rigorous whitelist validation should be applied to prevent attacks.
Securing your DataStore (Least Privileges, Canonicalization, Handling Sensitive Data)
Least Privileges
throughout the application, any residual SQL injection flaws that may exist are likely to have their impact significantly reduced.
If we adopt a policy where we use stored procedures everywhere, and don’t allow application accounts to directly execute their own queries, then we can restrict those accounts to only be able to execute the stored procedures they need. Don’t grant them any rights directly to the tables in the database.
Continuing, SQL injection is not the only threat to your database data. Attackers can simply change the parameter values from one of the legal values they are presented with, to a value that is unauthorized for them, but the application itself might be authorized to access. As such, minimizing the privileges granted to our application will reduce the likelihood of such unauthorized access attempts, even when an attacker is not trying to use SQL injection as part of their exploit.
We should also minimize the privileges of the operating system account that the DBMS runs under, and never run our DBMS as root of our system. Most DBMSs run out of the box with a very powerful system account. For example, MySQL runs as system on Windows by default. Change the DBMS's OS account to something more appropriate, with restricted privileges. Finally, some good tricks to implement on your datastore, are:
● Multiple DB Users
The designer of web applications should not only avoid using the same owner/admin account in the web applications to connect to the database, different DB users could be used for different web applications. In general, each separate web application that requires access to the database could have a designated database user account that the web-app will use to connect to the DB. That way, the designer of the application can have good granularity in the access control, thus reducing the privileges as much as possible. Each DB user will then have select access to what it needs only, and write-access as needed. As an example, a login page requires read access to the username and password fields of a table, but no write access of any form (no insert, update, or delete). However, the sign-up page certainly requires insert privilege to that table; this restriction can only be enforced if these web apps use different DB users to connect to the database.
● Views
SQL views can further increase the granularity of access by limiting the read access to specific fields of a table or joins of tables. It could potentially have additional benefits: for example, suppose that the system is required (perhaps due to some specific legal requirements) to store the passwords of the users, instead of salted-hashed passwords. The designer could use views to compensate for this limitation; revoke all access to the table (from all DB users except the owner/admin) and create a view that outputs the hash of the password field and not the field itself. Any SQL injection attack that succeeds in stealing DB information will be restricted to stealing the
hash of the passwords (could even be a keyed hash), since no DB user for any of the web applications has access to the table itself.
Canonicalization
● %27 URL encoding
● %2527 Double URL encoding
● %%317 Nested double URL encoding
● %u0027 Unicode representation
● %u02b9 Unicode representation
● %ca%b9 Unicode representation
● ' HTML entity
● ' Decimal HTML entity
● ' Hexadecimal HTML entity
● %26apos; Mixed URL/HTML encoding
In some cases, these are alternative encodings of the character (%27 is the URL-encoded representation of the single quote), and in other cases these are double-encoded on the assumption that the data will be explicitly decoded by the application (%2527 when URLdecoded will be %27, as will %%317) or are various Unicode representations, either valid or invalid. Not all of these representations will be interpreted as a single quote normally; in most cases, they will rely on certain conditions being in place (such as decoding at the application, application server, WAF, or Web server
level), and therefore it will be very difficult to predict whether your application will interpret them this way.
For these reasons, it is important to consider canonicalization as part of your input validation approach. Canonicalization is the process of reducing input to a standard or simple form. For the single-quote examples, this would normally be a single-quote character (‘).
One method, which is often the easiest to implement, is to reject all input that is not already in a canonical format.For example, we can reject all HTML- and URL-encoded input from being accepted by the application. This is one of the most reliable methods in situations where we are not expecting encoded input. This is also the approach that is often adopted by default when we do whitelist input validation, as we may not accept unusual forms of characters when validating for known good input. At the very least, this could involve not accepting the characters used to encode data (such as %, &, and #), and therefore not allowing these characters to be input.
If rejecting input that can contain encoded forms is not possible, we need to look at ways to decode or otherwise make safe the input that we receive. This may include several decoding steps, such as URL decoding and HTML decoding, potentially repeated several times. This approach can be error-prone, however, as we will need to perform a check after each decoding step to determine whether the input still contains encoded data.
Another approach may be to decode the input once, and then reject the data if it still contains encoded characters.This approach assumes that genuine input will not contain double-encoded values, which should be a valid assumption in most cases.As we saw, a way of encoding is with Unicode, and one approach is normalization of the input. This converts the Unicode input into its simplest form, following a defined set of rules. Unicode normalization differs from canonicalization in that there may be multiple normal forms of a Unicode character according to which set of rules is followed.
Continuing, the normalization process will decompose the Unicode character into its representative components, and then reassemble the character in its simplest form. In most cases, it will transform double-width and other Unicode encodings into their ASCII equivalents, where they exist. We can normalize input in Java with the Normalizer class (since Java 6) as follows:
normalized = Normalizer.normalize(input, Normalizer.Form.NFKC);
We can also normalize input in PHP with the PEAR::I18N_UnicodeNormalizer package from the PEAR repository, as follows:
$normalized = I18N_UnicodeNormalizer::toNFKC($input, ‘UTF-8’);
Another approach is to first check that the Unicode is valid (and is not an invalid representation), and then to convert the data into a predictable format (for example, a Western European character set such as ISO-8859-1). The input would then be used in that format within the application from that point on. This is a deliberately lossy approach, as Unicode characters that cannot be represented in the character set converted to will normally be lost. However, for the purposes of making input validation decisions, it can be useful in situations where the application is not localized into languages outside Western Europe. Next, we check for Unicode validity for UTF-8 encoded Unicode by applying the
set of regular expressions:
● [x00-\x7F] - ASCII
● [\xC2-\xDF][\x80-\xBF] - Two-byte representation
● \xE0[\xA0-\xBF][\x80-\xBF] - Two-byte representation
● [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} - Three-byte representation
● \xED[\x80-\x9F][\x80-\xBF] - Three-byte representation
● \xF0[\x90-\xBF][\x80-\xBF]{2} - Planes 1–3
● [\xF1-\xF3][\x80-\xBF]{3} - Planes 4–15
● \xF4[\x80-\x8F][\x80-\xBF]{2} - Plane 16
If the input matches any of these conditions, it should be a valid UTF-8 encoding. If it doesn’t match, the input is not a valid UTF-8 encoding and should be rejected. Now that we have checked that the input is validly formed, we can convert it to a predictable format—for example, converting a Unicode UTF-8 string to another character set, such as ISO-8859-1.
In Java, we can use the CharsetEncoder class, or the simpler string method getBytes() (Java 6 and later) as follows:
string ascii = utf8.getBytes(“ISO-8859-1”);
Also, in PHP, we can do this with utf8_decode as follows:
$ascii = utf8_decode($utf8string);
Handling Sensitive Data
● Passwords: Where possible, you should not store users’ passwords within the database. A more secure alternative is to store a salted one-way hash (using a secure hash algorithm such as SHA256) of each user’s password instead of the password itself. The salt, which is an additional small piece of random data, should then ideally be stored separately from the password hash. In this case, instead of comparing a user’s password to the one in the database during the login process, you would compare the salted hash calculated from the details supplied by the user to the value stored in the database. Note that this will prevent the application from being able to e- mail the user his existing password when he forgets it; in this case, it would be necessary to generate a new, secure password for the user and provide that to him instead.
● Credit card and other financial information: You should store details such as credit cards encrypted with an approved (i.e. FIPS-certified) encryption algorithm. This is a requirement of the Payment Card Industry Data Security Standards (PCI-DSS) for credit card information. However, you should also consider encrypting other financial information that may be in the application, such as bank account details. The encryption key should not be stored in the database.
● Archiving: Where an application is not required to maintain a full history of all of the sensitive information that is submitted to it (e.g. personally identifiable information), you should consider archiving or removing the unneeded information after a reasonable period of time. Where the application does not require this information after initial processing, you should archive or remove unneeded information immediately. In this case, removing information where the exposure would be a major privacy breach may reduce the impact of any future security breach by reducing the amount of customer information to which an attacker can gain access.
Securing LDAP, XPath and NoSQL
In this course, we didn’t only examine SQL Injection but some other technologies that you may use and want to secure. Many times the idea behind SQL Injection security can be implemented in them too, but let’s see some useful info about them:
LDAP Injection
XPath Injection
The user input should be checked against a white list of acceptable characters, which should ideally include only alphanumeric characters. Characters that may be used to interfere with the XPath query should be blocked, including ( ) = ‘ [ ] : , * / and all whitespace. Any input that does not match the white list should be rejected, not sanitized.
NoSQL
● Encrypting sensitive database fields;
● Keeping unencrypted values in a sandboxed environment;
● Using sufficient input validation;
● Applying strong user authentication policies.
Of course, it would be ideal if there were an accepted standard for authentication, authorization and encryption in the yet-to-mature NoSQL space. Until such a standardized consensus can be reached, the best approach is to look at security in the middleware layer, rather than on the cluster level, as most middleware software comes with ready-made support for authentication, authorization and access control.