Kinda LIKE SQL Injection

TLDR: This post is about some late 90’s level hacking. But the fact is, that there just doesn’t exist a decent explanation of this vulnerability anywhere on the internet.. and yesterday, in 2018, I found another application vulnerable to it (to quite serious effect). I’m afraid that was the straw that broke the camel’s back. So now we’re doing this… we’re making the blog-post that should have been made 20 years ago. There is a simple zipped-up MySQL/PHP lab at the bottom of this post, feel free to skip to that if you are so inclined.

Let’s talk about a SQL feature that’s been around for longer than I’ve been breathing… the ‘LIKE‘ operator. And how, on a good day, it lets an attackers fingerprint an application database and, on a really good day, enumerate content from it.

The issue exists when passing user-supplied data into a SQL ‘LIKE’ operator. e.g:

Select * from table where column LIKE '{user input}'

For simplicity, lets assume we are using MySQL and PHP here.

Now, you’ve sanitised your user’s input through mysql_real_escape_string(), and you are using prepared statements for all your site’s queries… that’s good. But did you know, that even while doing both those things, the metacharacters for SQL’s LIKE operator ( _ and % ) still get treated specially? Let’s remind you what those do:

Ergo…

Select * from users where username LIKE 'daniel'

is the same as:

Select * from users where username LIKE 'd_niel'

is the same as:

Select * from users where username LIKE 'd_n%_'

is the same as:

Select * from users where username LIKE '%'.

So, you have identified an application where you are injecting into a LIKE query… now what? What can we do?

Fingerprint the DB

MySQL isn’t the only database to use LIKE metacharacters, MS Access uses a (?) instead of the underscore, and even NoSQL databases have equivalent functions to the LIKE operator. MongoDB’s ‘FIND’ actually uses full PCRE. In any case, the take-home is that search fields are often an interesting place for enumerating a database stack. A script which works through a decision-tree of accepted metacharacters could fingerprint common databases easily, and is left as an exercise to the reader.

A lot of web applications officially expose the use of wildcards in their searches to users, for example with ‘?’ or ‘*’ characters. These characters are often directly substituted for the DB’s native ones behind the scenes, so you can often still fingerprint the DB from the search’s behaviour, even with these odd characters.

See potentially interesting result-sets

  • You may be able to see all results in one hit (using ‘%’ or an empty string). Silly as it is, even in 2018 this is usually more taxing on databases/applications than the developers had margined for, and might cause a Denial of Service.. so be careful.
  • You may be able to find the shortest and longest (in string length) content by incrementing ‘_’ characters until the first and last result-sets are found.

Often, in queries where you are searching through a lot of data, you might find that unusually large or unusually small results (in string length), or ones which contain non-standard characters or formatting, will be test entries you probably weren’t really supposed to see. E-commerce sites often have test/dummy products that will stand out like this.

Understand the query structure and DB schema better 

Suppose you are black-box testing an e-commerce application where you can type in a product. You don’t know if you are supposed to type in the product’s code (‘#9051202716-23’), its name (‘Sony PlayStation 4’), the product’s genre (‘games console’), or part of the product’s description (‘successor to the PlayStation 3’) to get results. It might be, that your LIKE query actually searches through all of these things. Rather than making assumptions about what our input should look like, we can trivially enumerate this (via brute-force), and better understand the application’s database schema.

Get all possible queried data

On an application which reflects all queried column data, dumping all that information might be as simply as typing an empty string or percent-sign. And if not all queried column data gets reflected back then we can likely still brute-force these values and get them that way.

It might be that the application searched 10 different columns but will only display the number of results found. Even in this extreme case, with some efficient automation we can brute-force all of the queried column’s data.  In the context of an e-commerce site searching products, this may not a problem, but an application should never let you search against something that it is not prepared to show you or you don’t have access to see; i’ve seen far worse cases.

I’ve made a lab to demonstrate exploitation in a semi-realistic scenario, called ‘BlackHat Books’. Feel free to download it and let me know if you get the flag.

blackhatbooks.zip