Much Badoo About Nothing

This is just a short post about toying with the Badoo app for iOS, but it also touches on something ever-so-slightly useful about testing the app-upgrade mechanisms of mobile apps. “Urghh more dating app hacking” I hear you say. I know, I know, this is getting old. At some point I’ll get a real hobby, I promise.
As of version 5 of Badoo, which has been out for a while now, two things happened. Firstly, they added a forceful crash upon jailbreak detection that I couldn’t be bothered to circumvent… because, secondly, they remade the UI from scratch and I don’t like unnecessary changes in my life. Na Uh.
Jailbreak detection routines for older Badoo versions have been a bit laughable. You have had:
– (bool) jailbroken in class GADDevice,
+(bool) deviceIsJailbroken in class FlurryUtil and finally…
+(bool) appIsCracked also in class FlurryUtil.
These are such common methods/classes that xCon will automatically patch them out and you might never have even known they existed. But they did.
So, I already HAD a solid app which I liked and which worked… why can’t I just keep using it? Using App-Admin from Cydia or AppCake, let’s downgrade to the latest release of the 4.x branch. App-Admin thinks this is 4.57.4 and AppCake thinks this is 4.9.
4.9 seems suspiciously high. I’m always wary of AppCake, and I wouldn’t be surprised if this is a maliciously-modified binary… but oh well, let’s install it anyway! (Mr Optimistic).
Well we are back to the glory-days of the orange Badoo icon, but this happens when you open < v5.0 of the app today:

Apparently it’s time to update

Oh no! A version check and a view which tells us to go away and upgrade. Daniel is sad 😪
Ok well… I very much doubt the devs at Badoo are doing per-app-version API keys and burning those keys used for older, now unsupported, app versions. And I doubt the app’s API calls or endpoints have changed since the V4 days either. Sooooooo… we just need to force the V4 app to work again, right?

*cracks knuckles*.

OK guys. This is some very technical, next-level shit that is about to go down in Cycript. I’m not sure your eyes can withstand the eliteness of what they are about to see…

root# cycript -p $(ps -A | grep "Application" | grep "Badoo" | cut -d' ' -f2)
cy# [UIApp.keyWindow setHidden:YES]

*wipes sweat away from face with forearm and presses enter*

“Sign in with Facebook”

The upgrade view has been hidden and we’re at the default “Sign in with Facebook” login. Looks good so far.

Let’s see if it works…

Victory! (Ginger Morticia aside).

Yup. That’s really it. Underneath the top-level “please upgrade” view (think webpage z-indexes) the app is just chilling there, perfectly functional.

I suppose for a dating app the ramifications here are pretty “meh”, but I have seen the same “throw an upgrade page over it” technique used to prevent use of an outdated (and vulnerable) MDM application on iOS… which is totally uncool. When you are testing iOS apps, try to download a few older versions, time permitting, and see exactly what prevents them from functioning. If you can get these working you might have an easier time introspecting them than you would with more modern versions that have all the security bells and whistles (such as cert pinning and jailbreak detection).


"App Forgery" – A Modern Take on The World's Second-Oldest Profession

In this (pretty long) post, I’m going to attempt to coin a name for an application vulnerability, most commonly found in mobile apps. This is “App Forgery”.
I’ve decided it’d be better to explain the details of this vulnerability using a report-style write-up for an example, real, vulnerable app.
I’ve picked on the first app which came to mind that is vulnerable to app forgery, which is the “Trainline” app for iOS. The app allows travellers in the UK to purchase digital train tickets and, in this instance, app forgery lets Mr Bad Guy travel for free indefinitely.
Trainline app usage (skip ahead if you’ve used the app a few times):
For the unfamiliar, basic usage of the app is this: You link your bank card, tell the app where you want to go, type in the 3-digit CVC from the card and you can now download a mobile ticket (which they insist on calling an “mTicket”). An mTicket shows all of the travel information you would find on a normal ticket and is even designed to look a little like a traditional physical one. You tap “activate” on that ticket the day you want to use it and it goes from greyed-out and static to coloured-in and animated, showing that it’s active.
The guts of an mTicket are displayed across two views in the app, which are toggled via a “UISegmentedControl” (a button).

QR Code (left) / Traditional ticket representation (right)

View one shows a QR code which contains all of the train ticket information. Scanning this QR code with a device provides the only realistic means to easily validate information contained on the mTicket, since inspectors don’t have the time or means to validate tickets manually from the visual portion of the ticket. After scanning the code, the handheld devices I have seen appear to light up either green for “valid” or presumably red for “invalid”.
View two contains a visual representation of a traditional ticket, with journey and ticket details. The current time scrolls from left to right in <marquee> fashion, against a background made from three different colours. These colours differ for each journey and there is no discernible way to determine a journey’s colours prior to activating a ticket. In theory, inspectors on any train should be aware of the “correct colours” for that journey and spot a fake ticket. The fact that the current time shown on the ticket scrolls also thwarts people attempting to share tickets via screenshots/video.
Basic introduction to the app out of the way, here’s the write-up for App Forgery.

MEDIUM – “App Forgery” Possible

It is possible for an unlawful person to create a forged/counterfeit Trainline application and mTickets, allowing the forger to travel on trains for free.
It was found that mTickets created within the application are almost always incorrectly validated¹, with validation based on the physical appearance of mTickets rather than a proven tie between data on a passenger’s mTicket and server-side data.
Typical mTicket validation before boarding a train normally involves scanning the mTicket’s QR code at a barrier to allow passing through it. If, for any reason, the mTicket fails to validate, validation will fall back to the ticket being inspected manually by nearby staff. Alternatively, some stations without these barriers only perform manual inspection, and many smaller stations perform no ticket checks at all prior to boarding. Then, at some point during the journey, additional ticket checks are often performed by an on-board ticket inspector, using a handheld device for scanning the QR code on mTickets. This device is often unavailable to inspectors, also resulting in mTicket validation being performed manually.
Manual mTicket inspection consists of the inspector assessing, by eye, the physical appearance of the second (non QR code) view of the mTicket. This portion of the mTicket contains no information that the inspector can use to appropriately validate it, given their time and resources². This gives anyone with moderate iOS development skill the chance to create forgeries of the Trainline app/mTickets.
As a proof of concept, the Trainline app’s mTicket section was forged from scratch. The process took around 5 hours, which mainly consisted of determining some of the assets in use in the real Trainline app, such as fonts, and the exact positioning of UI elements.

Trainline app forged in Xcode

Legitimate Trainline app VS forgery

The QR code and background colours of the current time area will of course be invalid on any forgery, although these are so infrequently checked that it really does not matter. When the QR code fails to validate at barriers or, on rare occasion, on a train… the likely outcome is that inspectors will briefly assess the right segment of the mTicket and allow the passenger to continue travelling.
This particular app forgery was designed so that each of the journey details on the mTicket behaves like a text field and can be modified by tapping on it. This allows a forger to easily recreate the appearance of any journey details they wish and travel on that journey for free.
Some of the considerations/context which resulted in this issue being given a “Medium” severity risk rating were:

  • The attack does not rely on any server-side processing, and is therefore difficult to detect.
  • In theory, a person could purchase a legitimate ticket, note the ticket’s QR code and colours, refund the ticket and implant these into a forged app, recreating a more legitimate-appearing ticket.
  • The level of social engineering involved is minimal. Inspectors who are able to spot errors in a forged app are unlikely to assume anything untoward and, at worst, would likely accept the ticket as the result of an error within the legitimate app.
  • The barrier to creating a forged app is relatively high, although one could be easily shared/distributed afterwards.


  1. mTickets should be validated only by way of QR code. There is no appropriate alternative within Trainline mTickets currently. This will likely involve greatly increasing the numbers of available QR code scanners to station and train ticket inspectors.
  2. Consider a revision/upgrade to the handheld QR code scanners. These devices do not appear to offer the inspector enough verbosity to allow them to determine whether a scanned QR code is fake, has been previously scanned, has expired, or is erroneous. These devices, if network connected, should be able to provide this information to help catch and deter forgeries.
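
To illustrate the kind of verbosity the second recommendation is asking for, here is a minimal sketch of a server-side lookup that a network-connected scanner could call. Everything in it (the `TicketRecord` fields, the `check_scan` function, the result names) is hypothetical and is not taken from Trainline's systems; it only shows a verdict that is richer than green/red.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class ScanResult(Enum):
    VALID = "valid"
    UNKNOWN_TICKET = "unknown ticket (possible forgery)"
    REFUNDED = "refunded"
    EXPIRED = "expired"
    ALREADY_SCANNED = "already scanned"


@dataclass
class TicketRecord:
    # Hypothetical server-side record; the field names are illustrative only.
    ticket_id: str
    valid_until: datetime
    refunded: bool = False
    scanned: bool = False


def check_scan(records: dict, ticket_id: str, now: datetime) -> ScanResult:
    """Look the scanned ID up server-side and return a verbose verdict."""
    record = records.get(ticket_id)
    if record is None:
        return ScanResult.UNKNOWN_TICKET  # the server never issued this ticket
    if record.refunded:
        return ScanResult.REFUNDED
    if now > record.valid_until:
        return ScanResult.EXPIRED
    if record.scanned:
        return ScanResult.ALREADY_SCANNED
    record.scanned = True  # remember the scan so a copied code stands out
    return ScanResult.VALID
```

A real deployment would have to decide how to treat a ticket that is legitimately scanned twice (once at a barrier, once on the train); the sketch simply reports it and leaves the decision to the inspector.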

1. This has been determined from the personal experience of the tester, having used the application for a number of years. This does not account for other users’ experiences.
2. A small mitigation against app forgery does exist by way of the coloured background where the current time is displayed. These colours change depending on journey details and should provide a means for inspectors to determine suspicious tickets. However, from experience discussing this feature with inspectors, staff are usually unaware of the correct colours for their journey.

Hopefully this explains things well enough to help you spot and report on app forgery vulnerabilities in the future.

The Happn'ing

Years ago, one of the first posts I ever wrote was about my experience scripting a bot for the dating site OKCupid. It was just a PoC bashed together over a few beers with a friend.
Since then (and becoming single) I’ve scripted bits and bobs for virtually every major dating site/app… it’s become a bit of a weird hobby.
A while ago I wrote a reasonably feature-filled script for managing a user account on the dating app Happn, imaginatively called ”. It was immediately spotted by a few Happn employees on my GitHub, who starred the project, but then prevented it from actually working by blocking the Python user-agent on the Happn servers. I made the repo private and updated it to work again, with the intention of spending some more time developing it. That time never really came and I stopped using Happn a while ago, so I made the tool public and this is just a quick post to share it.
It comes with command-line options to manage just about everything you can do within the app, as well as a few naughty/fun things to play with… such as the “warandpeace” exploit, which crashes the Happn app of a victim/target account. If you’re struggling to manage your 5k Happn bot swarm I can’t imagine writing a wrapper for this would be particularly hard 😉

Ready the Anti-BEAM Beam! Breaking the Virgin BEAM app in 12 minutes

I’ve been travelling on Virgin trains a lot recently and finally decided to take a look at their free movie-streaming app “BEAM”.
Super-excited to be about to watch Forrest Gump on my journey, I found that whenever I hit play, the app’s custom video-player decided to freeze and eventually crash the app on my device of choice: an iPhone 6s.
Determined to watch Hanks’ award-winning performance, I looked at the device’s system-log for clues. This revealed that, while the BEAM app’s process was in its final moments in this world, the words “Jailbreak Detected” were logged.
Throwing the app into Hopper and searching for “Jailbreak Detected” shows the following line:

This indicates that the method handleNotification of class CapmediaDrmVideoViewController1 is likely responsible for creating the sys-log entry. “DRM”? Yeah, that sounds about right. On the off-chance this method is also responsible for the jailbreak-detection routine, let’s just swizzle that method and null it using a Theos tweak:

%hook CapmediaDrmVideoViewController1
- (void)handleNotification:(id)no {} // no-op: the original method never runs
%end

compiling. compiling… compiling…
open BEAM -> click Forrest Gump -> click play ->

Yep, that works. Bypass done 🤠 stop the clock.
Looking at the network traffic for the app afterwards, I noticed that a Virgin-owned domain was being POST’d to with an “errorid” parameter every time I opened a video. This didn’t stop the app from working but was clearly updating Virgin about my jailbreak. This told me that, actually, handleNotification didn’t do the detection routine itself… but was probably responsible for causing the purposeful crash afterwards. So while the app knew it was jailbroken and was able to dial home and tell mom… it wasn’t able to force-quit. We hadn’t zapped the problem at its root, just kinda circumvented it.
In theory, some 31337 virgin… employee (I couldn’t help myself) would be able to put two and two together, noticing that my phone had purportedly crashed but was still able to successfully stream the entirety of Tom Hanks’ back-catalogue, and infer a bypass had been made, potentially leading to a revised security mechanism in a future update. We can’t have that!
A cheeky class-dump reveals that the CapmediaDrmVideoViewController1 class also has a sendPostRequestForPlaybackError method. Honestly people, it doesn’t get any easier than this. Let’s null this guy too:

%hook CapmediaDrmVideoViewController1
- (void)sendPostRequestForPlaybackError:(id)err {} // no-op: nothing is reported home
- (void)handleNotification:(id)no {} // no-op: the forced crash never runs
%end

compiling. compiling… compiling…
So now the app is both bypassed and flying under the radar 🙌
This really isn’t good enough, Virgin.
Lessons for your developers:

  • Don’t log things that don’t need logging. (“Jailbreak Detected” 🤦)
  • Ideally, don’t detect a jailbreak with one routine and do something about it in another.
  • Do your JB detection at the start of application launch! The biggest deterrent for me trying to bypass an app’s JB-detection is repeatedly watching the app crash before the app’s launch-screen has even ended. I’ll give up if it happens enough.
  • Don’t just force-crash/exit() when you detect a jailbreak… start crippling the app from the inside. Delete critical app files so it won’t work again, set flags which force the app into a code-path that prevents it from working, and store them somewhere that’ll persist even if the app is deleted and re-downloaded (à la keychain). Be sneaky. Very sneaky.


PentestCTF – Another CTF Framework

Instead of doing my final-year project at University, I made (another) open-source CTF/Lab framework, primarily for my own learning benefit during its development, but also because I realised how powerful a group learning environment like a CTF is and I wanted to deploy one at my University.
It’s pretty much the same as all the others I’ve seen, although it has a few extra features. Namely:

  • The ability to detect when its own labs are down and show that on the labs page.
  • It has its own accompanying iOS app. iOS labs can automatically open the main PentestCTF app and submit their flags.
  • An API. Make your own CLI tool or interface for it, see if I care.
  • A good/simple schema. A lot of CTFs have messy schemas and small improvements to the app are a bit of a headfu** to implement.
  • Achievements. There are only a few added by default, but they are there and you can make rules to add new ones.
  • Admins don’t need to give you a default password or let you type a password into their “super-user” session. They just send you a link with a token to create your account yourself, and you request another link with a token to make any changes to your account.
  • Game modes. So, the idea is that you instigate a “game mode” to get people in the hacking mood if things slow down. You can create bonus points for things like “most improved over a month” or “most scoreboard overtakes”.

A demo of the site should be live on if you want to take a look. Contact me if you want the source. FYI: On the demo site, the labs were thrown on there from various sources; @Strawp, a BAE CTF in 2015 and my friend Guga who has no online presence… props to those guys.

JTAGulator 3D Printed Case

Super-quick post.
I 3D printed a case for my friend’s JTAGulator and it came out pretty well so I thought I’d share. JTAGulator provide 2D drawing/design files to easily cut out an acrylic case for the board (JTAGulator: Acrylic Case Assembly), so I just stencilled over that in CAD software, extruded the two (top/bottom) pieces an eighth of an inch and traced the JTAGulator logo onto the top face.
The files are scaled to inches rather than mm so if, like me, you use a slicer like Cura that talks in mm, you will need to scale the model in Cura/whatever by 25.4.
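
As a quick sanity check on that scale factor, the arithmetic can be sketched like this (the helper name is just for illustration):

```python
MM_PER_INCH = 25.4  # exact by definition of the inch


def inches_to_mm(value_in_inches: float) -> float:
    """Convert a dimension in inches to millimetres."""
    return value_in_inches * MM_PER_INCH


# The eighth-of-an-inch extrusion mentioned above, expressed in mm.
print(f"{inches_to_mm(1 / 8):.3f} mm")
```

So after scaling, each extruded piece should come out a little over 3 mm thick. If it comes out paper-thin, the model is probably still being read as inches.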
The writing isn’t very tall so make sure you’re quick to pause the print and swap filament if you want the text to be a different colour.
OBJ & SketchUp files: jtagulator_case_3dprint

SQL Injection using System Variables in MySQL

For BSides Manchester 2015 the UK pen-testing company aptly named ‘Pentest’ held a SQL injection challenge where the injection point required structuring the payload in a specific manner, with MySQL voodoo to keep the payload under 90 characters and bypass a basic WAF.
I was fairly certain the lab could also be accomplished using MySQL variables, but was unable to get the job done. Lo and behold, it totally was possible and it turned out I had overcomplicated the solution, which they revealed could be achieved with the following:

'or@:=(select table_name from information_schema.tables limit 40,1)union select 1,2,@%23

This was something I’d never looked at before, and it just didn’t cross my mind to store the query result and retrieve it using a variable in one hit using a UNION. I was trying to do this over two queries and therefore my variable would always be empty/null when I tried to retrieve it, as MySQL variables are scoped to a SESSION (a single database connection), and are emptied once the first query completes and the application closes its connection.
This led me completely down the wrong rabbit hole trying to solve the challenge, but also into discovering something reasonably interesting: SQLi with System Variables.
Ok so this is effectively a new technique of SQLi, specific to MySQL (I think). However, it’s also most often useless, since it requires one of two rather outlandish conditions to be met for an application to be exploitable via this method, both of which would usually mean that the application can be exploited much more trivially. These conditions are:

  • You need to be able to inject at the very beginning of the query

OR (as was pointed out to me by SQLMap’s Miroslav Stampar)

  • The MySQL database needs to be configured to allow stacked queries

In the event you can do this though, this method should provide a few benefits over a more traditional SQLi technique:

  1. The ability to inject into small input-length areas
  2. The ability to execute arbitrarily long queries into these areas
  3. The ability to bypass a decent amount of WAFs or poor SQLi defence mechanisms (blacklists)
  4. The ability to add a level of stealth to an otherwise fairly obvious attack
  5. The ability to make forensic analysis of the attack potentially more complicated

Ok so if you read the MySQL SYSTEM reference documentation here, you’ll see that there are a lot of these SYSTEM variables which alter the configuration of the database. 40 of these variables hold string values and are either GLOBAL (permanent) only, or can be set as either a GLOBAL or a SESSION variable.
The online reference doesn’t say what the constraints are on each of these 40 SYSTEM/GLOBAL string variables, and I soon found some only accept specifically formatted string data, and that others are capped to short lengths… so I made a short application which just tried to set each one to a longer and longer arbitrary string, to find out which of these variables could be of use in storing arbitrary, and arbitrarily long, string data.
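
The probing idea can be sketched roughly as below. This is not the application I actually used, just a minimal illustration of the approach. The `try_set` callback is hypothetical: in practice it would run something like a `SET GLOBAL` for one variable against a test database and report whether the value was accepted intact. The sketch also assumes that if a given length is rejected, every longer length is rejected too.

```python
from typing import Callable


def longest_accepted(try_set: Callable[[str], bool], limit: int = 65536) -> int:
    """Return the longest string length (capped at `limit`) that try_set accepts."""
    low, high = 0, 1
    # Grow the test string by doubling until it is rejected or we pass the cap.
    while high <= limit and try_set("A" * high):
        low, high = high, high * 2
    high = min(high, limit + 1)
    # Bisect between the last accepted and first rejected length.
    while high - low > 1:
        mid = (low + high) // 2
        if try_set("A" * mid):
            low = mid
        else:
            high = mid
    return low


# Stand-in for a variable that is silently capped at 100 characters.
print(longest_accepted(lambda value: len(value) <= 100))
```

A variable that only takes specifically formatted data rejects even a one-character junk string and so scores 0, while one that swallows anything runs all the way up to the cap.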
It turns out there is just one SYSTEM/GLOBAL variable out of the lot which is fit for this purpose and won’t break MySQL if you fill it with crap => init_slave
My idea was not to store the result of a query within the init_slave variable and then select it, like in the BSides challenge, but to actually store a query there, prepare and execute it.

SET GLOBAL init_slave='select table_name from information_schema.tables limit 40,1';
PREPARE stmt1 FROM @@init_slave;
EXECUTE stmt1;

If you can inject the above queries into an application, you ultimately run the contents of init_slave.

select table_name from information_schema.tables limit 40,1

Because neither prerequisite was met for these types of queries to work in the challenge, I gave up on this line of thinking. Although, for fun, I revisited it afterwards to see what would have been possible if those prerequisites had been met.
In the event user input is restricted via a basic WAF or similar, like in the challenge, the identified SQL words might render the injection point realistically unexploitable. We can potentially get around this scenario with some creative concatenation:

SET GLOBAL init_slave="selec";
SET GLOBAL init_slave=concat(@@init_slave,"t table_name fro");
SET GLOBAL init_slave=concat(@@init_slave,"m information_schem");
SET GLOBAL init_slave=concat(@@init_slave,"a.table");
SET GLOBAL init_slave=concat(@@init_slave,"s limi");
SET GLOBAL init_slave=concat(@@init_slave,"t 40,1");
PREPARE stmt1 FROM @@init_slave;
EXECUTE stmt1;

All we need to get past the filter is “SET”, “GLOBAL”, “CONCAT”, “PREPARE” and “EXECUTE”. That’s a fairly reasonable ask.
In the event our input is also length restricted, again like the challenge, we can take this method to the extreme and concat a single character at a time.

SET GLOBAL init_slave="selec"
SET GLOBAL init_slave=concat(@@init_slave,"t")
SET GLOBAL init_slave=concat(@@init_slave," ")
SET GLOBAL init_slave=concat(@@init_slave,"*")
SET GLOBAL init_slave=concat(@@init_slave," ")
SET GLOBAL init_slave=concat(@@init_slave,"f")

The longest required query for this method when adding one character at a time is 46 characters, roughly half the 90-character limit of the BSides challenge. In reality, though, your payload will need a bit of manipulating to fit the particular injection point.
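
To make the one-character-at-a-time idea concrete, here is a small sketch that generates the sequence of statements for a given query and chunk size. The function name and parameters are mine, and it assumes the query contains no double quotes (nothing is escaped).

```python
def concat_statements(query: str, chunk_size: int = 1, variable: str = "init_slave"):
    """Split `query` into chunks and build the SET GLOBAL statements that reassemble it."""
    if not query:
        raise ValueError("query must not be empty")
    chunks = [query[i:i + chunk_size] for i in range(0, len(query), chunk_size)]
    # The first statement seeds the variable, the rest append to it.
    statements = [f'SET GLOBAL {variable}="{chunks[0]}"']
    for chunk in chunks[1:]:
        statements.append(f'SET GLOBAL {variable}=concat(@@{variable},"{chunk}")')
    return statements


query = "select table_name from information_schema.tables limit 40,1"
statements = concat_statements(query)
print(len(statements), "statements, longest is", max(len(s) for s in statements), "characters")
```

Bumping `chunk_size` up trades fewer requests for longer payloads, which is the same trade-off as the hand-split example further up.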
The only issue with this technique is that it would require a lot more requests to be made to the application; which may or may not be an issue.
However, this could be seen as a positive, since being able to trickle one relatively dumb-looking payload per week to a vulnerable application could help the attack go unnoticed.
Ultimately, this is a rather obscure method of SQLi, but it might be the only available method in some extreme edge-cases. So it might be worth adding it to the memory banks for a rainy day.
A ‘FIX’?
At first I wasn’t convinced this was a bug which needed fixing, and that it might just be abuse of normal operations. The more I think about it though, the more ridiculous it seems to me that SYSTEM variables can be used in a prepared statement in the first place. I’ve found no legitimate use-case for this whatsoever.
Since the init_slave value cannot be copied into a SESSION variable and then executed over two queries (the variable would empty), we only have to worry about preventing init_slave (or any other ‘vulnerable’ SYSTEM variables) from being used as part of a prepared statement. So a fix could easily be created by prohibiting that from happening.
Edit: I’ve submitted a MySQL bug report, with the above as the recommended behaviour. Let’s see if it gets laughed at.
Edit2: As expected, the powers that be at MySQL/Oracle refused to acknowledge this as a bug in MySQL, let alone one with security implications (because executing system configuration strings as a query is a solid feature, right?). Full transcript of communication is here: 81986. I pretty much did a mic-drop at the end. There could be more correspondence from Oracle but I won’t be bothering to look. It’s probably the only time I’ll ever submit anything directly to them. The process was slow and painful, and the engineer assigned to my case was the human embodiment of why we have vulnerable software.

Gotta Captcha'm All – Automating Image (and Audio!) Captchas.

A captcha serves one purpose. To ensure that a human has performed a task, and not a machine.
In web applications, they attempt to prevent attackers from creating automated bits of code to brute-force forms, fuzz user input or cause a denial of service.
It’s very much a non-trivial task these days to differentiate the man from the machine using these image (and sometimes audio) “challenges”, as the logical steps a human brain takes to decipher characters from a captcha can almost always be replicated, often more effectively, in code. The types of people you deploy a captcha to shield yourself against are unlikely to be thwarted by something that can be programmatically broken. You’re often just adding another hurdle with a captcha. Some people like hurdles.
With this in mind, if you have chosen to use a captcha to protect a mission-critical application from attack… I am of the opinion you’re already a little bit screwed. A captcha is suitable for stopping a casual WordPress blog like this from being overrun by spam comments from knock-off Barbour jacket merchants, nothing more.
On a recent test, a mission-critical application for a bank was indeed vulnerable to a nasty DoS, caused by using the ‘RadCaptcha’ captcha system, which is built into the commercial ‘Telerik’ .NET framework. It’s a particularly crappy captcha. A previous pentest from another company had already highlighted this, but without a demonstration of how it could be broken the bank were reluctant to swap it out.
For the rest of this post, I’ll detail some of the steps I took, and tools I used, to create a PoC for bypassing Telerik RadCaptcha. At the end of it you should have a reasonable idea of how to incorporate captcha-beating functionality into your own scripts. The secondary take-home should be to not use RadCaptcha.
A RadCaptcha protected form typically incorporates both an image captcha, and alternative audio captcha for the visually impaired.
The image captcha looks like this:


…and the audio captcha sounds like this:


Right, let’s break these bad boys. Starting with that image captcha.
Here are some of the problems with it:

  • The characters are evenly spaced (the image can be perfectly divided into five segments of 55 pixels in width, each containing one character). In an ideal world the characters would be at uneven spacing to make the process of determining where characters start and end more complex for a program. Most OCR tools designed to break captchas won’t have a problem figuring this out, but if they did, we could programmatically chop the image into five segments and perform OCR on each character separately.
  • The image only seems to have two colours: white and a shade of grey. If character edges are coloured similarly to their backgrounds, it can be tricky for OCR tools to distinguish character edges.
  • The “dust” effect is terrible. Tiny speckles, none of which obstruct the characters in any way.
  • No other effects. No lines scrawling all over the text, nothing.
  • Character warping. There basically is none. It just looks like a quirky font :/
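
The "chop it into five segments" fallback from the first point is simple enough to sketch. The function below only computes the crop boxes, in the (left, upper, right, lower) order that imaging libraries such as Pillow use for cropping. The 50-pixel height in the example is a made-up value, since only the 55-pixel segment width is known.

```python
def segment_boxes(width: int, height: int, segments: int = 5):
    """Return one (left, upper, right, lower) crop box per evenly spaced character."""
    if width % segments:
        raise ValueError("image width does not divide evenly into segments")
    step = width // segments
    return [(i * step, 0, (i + 1) * step, height) for i in range(segments)]


# Five 55-pixel-wide characters means a 275-pixel-wide image.
for box in segment_boxes(275, 50):
    print(box)
```

Each box could then be cropped out and fed to the OCR tool as a single-character image.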

Step 1 – While the image is basically made of two colours anyway, let’s convert it to a greyscale .pnm file. Most OCR tools like working with .pnm files but don’t include the ability to do the conversion themselves:

djpeg -pnm -grey Imgcaptcha.jpg > Imgcaptcha.pnm

Step 2 – Install gocr (one of many free OCR tools for *nix) and read the manual:

apt-get/brew install gocr && man gocr

Step 3 – Win:

gocr -d 50 -C a-zA-Z0-9 -a 85 -m 16 -i Imgcaptcha.pnm

The -d 50 tells gocr to attempt to remove clusters of pixels less than 50 pixels in size (the “dust”). This completely removes the effect.
-C a-zA-Z0-9 defines the character set to use, which should aid accuracy.
-a 85 specifies the certainty level we want for a character. If the output from this command contains fewer than 5 characters, we know that gocr was less than 85% certain about one or more characters, so we can skip that captcha and grab another one. You can ramp it up to about 95 with RadCaptcha and never miss a character though (doh).
-m 16 tells gocr to work in a mode whereby it won’t attempt to separate overlapping characters. Since there won’t ever be any in RadCaptcha, this could improve things.
Done. We can turn this process into a one-liner and integrate it into any tool we want to attack a RadCaptcha form. Absurdly easy. Daniel 1 – Telerik 0.
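
For integrating this into a script, a rough Python wrapper might look like the sketch below. It shells out to gocr with exactly the flags used above, so it assumes gocr is installed and the image has already been converted to .pnm. The function names and the "five clean alphanumeric characters or bust" check are my own, and the check simply treats anything else gocr prints as a miss.

```python
import subprocess


def looks_complete(text: str, expected_length: int = 5) -> bool:
    """True if the OCR output is exactly the expected number of ASCII letters/digits."""
    candidate = "".join(text.split())
    return (len(candidate) == expected_length
            and candidate.isascii()
            and candidate.isalnum())


def solve_image_captcha(pnm_path: str):
    """Run gocr on a .pnm captcha; return the guess, or None if it should be skipped."""
    result = subprocess.run(
        ["gocr", "-d", "50", "-C", "a-zA-Z0-9", "-a", "85", "-m", "16", "-i", pnm_path],
        capture_output=True, text=True, check=True,
    )
    guess = "".join(result.stdout.split())
    return guess if looks_complete(guess) else None
```

A calling script would loop: fetch a captcha, convert it, call `solve_image_captcha`, and request a fresh captcha whenever it gets `None` back.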
Now for that audio captcha.
We didn’t really have to design a system to break the image captcha, we just used off the shelf tools, actually designed for the job. Here though, we need to construct a process for defeating the audio captcha ourselves (since the closest off the shelf tools for this are for clear audio recognition, and they don’t much like captchas).
Let’s pick apart its bad bits and have a think. Listen to it one more time 😛

  • The voice uses the NATO phonetic alphabet. This makes each letter last longer, and creates a signature that may be easier to detect.
  • By getting my hands on a copy of the Telerik framework, I could see that the way this system works is it has one audio recording for each character A-Z, 0-9. The framework stitches combinations of these .WAV files together, adds some noise and then dumps the result as a captcha. The fact each character has only one recording is obviously poor. It’s the equivalent of having no character warping in an image captcha.

Here’s the¬†process I ended up going with:
Firstly: Create some baseline files:

  1. Obtain enough of the audio captcha files, so that we have all letters A-Z and all digits 0-9 somewhere in at least one of them.
  2. Remove the “noise” effect from these captcha files.
  3. Cut out each character from the captcha files programmatically by detecting the small silences in-between characters, and save them into their own file. e.g. 9.wav, alfa.wav, bravo.wav etc.

Note: If you have access to a Telerik installation, you could just rip the raw character sound files out of the framework and use those. Although I tried this and actually found it made the process less accurate. (It’s also less hacker-like and you lose cool points.)
Then: To perform the character recognition from a captcha:

  1. Take an audio captcha with unknown characters.
  2. Strip the noise.
  3. Split the .WAV on the silence in the same way as before to separate the unknown characters into individual files.
  4. Use an audio fingerprinting tool to match the similarity of our unknown character files against each of the baseline files (alfa.wav etc).
  5. On a reasonably high match, store the matching character and process the next one, etc etc.
  6. Script this whole process. WIN.

Ok so let’s create those baseline files for starters:
Firstly, save copies of captchas containing A – Z and 0 – 9. This should be as simple as refreshing the protected page a number of times and saving the .WAVs.
We’ll use the *nix tool sox for most of the audio processing. It’s apparently “the swiss-army-knife of sound processing programs”. That sounds good, I’m sold.

apt-get install sox

To remove the noise from a captcha we first need to create a “noise profile” for it in sox. We can then use this profile to tell sox how to negate the noise and output a “clean” version of the captcha.

sox RadCaptcha_Audio_4ec6deb0.wav -n noiseprof noise.prof
sox RadCaptcha_Audio_4ec6deb0.wav CLEANED_RadCaptcha_Audio_4ec6deb0.wav noisered noise.prof 0.21

Now we need to split the captcha at each moment of silence in-between characters, so we can get just the characters we need out of it.

sox -V3 CLEANED_RadCaptcha_Audio_4ec6deb0.wav CHARACTER.wav silence 1 3.0 0.1% 1 0.3 0.1% : newfile : restart
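
For anyone curious what “splitting on silence” boils down to, here is a minimal Python sketch of the idea. It is not how sox implements its silence effect; the function name, the amplitude threshold and the min_silence count are all made up for illustration, and the samples are assumed to be floats between -1 and 1.

```python
def split_on_silence(samples, threshold=0.01, min_silence=3):
    """Return (start, end) index pairs (end exclusive) for runs of sound
    separated by at least `min_silence` consecutive quiet samples.

    `threshold` and `min_silence` are illustrative values, not sox defaults.
    """
    segments = []
    start = None    # index where the current segment began, if any
    quiet_run = 0   # consecutive quiet samples seen inside a segment
    for i, sample in enumerate(samples):
        if abs(sample) > threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence:
                # The segment ended where the quiet run began.
                segments.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        # Audio ended mid-segment; trim any trailing quiet samples.
        segments.append((start, len(samples) - quiet_run))
    return segments


if __name__ == "__main__":
    audio = [0, 0, 0.5, 0.6, 0, 0.4, 0, 0, 0, 0.9, 0.8, 0, 0]
    print(split_on_silence(audio))
```

Short dips (like the single quiet sample in the first burst above) stay inside one segment, which is the same reason the sox command takes a minimum silence duration.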

Rename the files to something more sensible (in this case foxtrot.wav), and you should have a bunch of files like this:


Repeat this enough times and Bingo! We have all the characters of the alphabet (in NATO phonetic form) and all digits, de-noised and contained in their own files.
Ok so now we want to be able to use these files to detect the characters in a new captcha programmatically. Let’s grab a new captcha (RadCaptcha_Audio_1bc2eaa5.wav) and perform pretty much the same process on that file as we did to generate our base files: strip the noise and separate at the silence. This will give us the unknown characters from a captcha in separate files.

sox RadCaptcha_Audio_1bc2eaa5.wav -n noiseprof noise.prof
sox RadCaptcha_Audio_1bc2eaa5.wav CLEANED_RadCaptcha_Audio_1bc2eaa5.wav noisered noise.prof 0.21
sox -V3 CLEANED_RadCaptcha_Audio_1bc2eaa5.wav UNKNOWN_CHARACTERS.wav silence 1 3.0 0.1% 1 0.3 0.1% : newfile : restart

To compare these unknown character audio files with our baseline ones and determine which characters make up the captcha, I found a pretty good audio fingerprinting perl script (
Either download the above .zip and extract the script, or grab it from the above URL.
Quickly install its dependencies:

apt-get/brew install libchromaprint*
cpan Statistics::LineFit Data::Dumper Capture::Tiny

Audio similarity tools, I have learned, don’t like comparing 1-second audio files like my foxtrot.wav and a potential character from an audio captcha. Amazingly, we can defeat this problem by just stretching the audio files. So we do this to all our [A-Z0-9] .WAV files and make them at least 5 seconds apiece (the longer the better).

sox foxtrot.wav foxtrot_slow.wav speed 0.3
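
A quick sanity check on the numbers, sketched in Python. As far as I know, the sox speed effect scales a clip’s length by 1/factor (and drags the pitch down with it), so a factor of 0.3 makes a clip roughly 3.3 times longer. The helper names and the example durations below are hypothetical; measure your own clips.

```python
def stretched_duration(seconds, speed):
    """Length of a clip after `sox ... speed <speed>` (assumes length scales by 1/speed)."""
    return seconds / speed


def speed_for_target(seconds, target_seconds):
    """Speed factor needed to stretch a clip of `seconds` to `target_seconds`."""
    return seconds / target_seconds


if __name__ == "__main__":
    # Example durations only; not measured from the real captcha files.
    print(stretched_duration(1.5, 0.3))
    print(speed_for_target(1.0, 5.0))
```

If a character clip really is only about a second long, a factor nearer 0.2 would be needed to reach the 5-second mark.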

The base files won’t sound like numbers or NATO phonetic letters anymore, but as long as we do the same thing to the unknown characters from a captcha, that doesn’t matter.

sox UNKNOWN_CHARACTERS1of5.wav UNKNOWN_CHARACTERS1of5_slow.wav speed 0.3

They sound quite similar right? Good. They should, the unknown character is also foxtrot, or “F“.
Time to test the perl script:

root@hiburn8:# perl foxtrot_slow.WAV UNKNOWN_CHARACTERS1of5_slow_slow.wav | tail -1 |
Goodnees of fit R2 for File1 & File2 = 0.99214093

The files match with over 99% certainty. Seems like a success to me!
All we do now is create a loop to check each unknown character against our base files, and move on to the next character when we get a match somewhere around the 80% mark. It’s not terrifically efficient, but it works.
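
That loop could look something like the Python sketch below. It is only an outline under assumptions: the similarity callable stands in for whatever wraps the fingerprinting script and returns its R2 score, and the function name, file names and 0.8 threshold are placeholders rather than anything from the real tooling.

```python
def recognise(unknown_files, baseline, similarity, threshold=0.8):
    """Guess the character in each unknown clip.

    unknown_files: paths of the split, stretched captcha characters, in order.
    baseline:      dict mapping a character (e.g. "F") to its baseline file.
    similarity:    callable(path_a, path_b) -> score between 0 and 1
                   (hypothetical wrapper around the fingerprinting script).
    """
    guessed = []
    for unknown in unknown_files:
        match = "?"  # placeholder if nothing clears the threshold
        for char, base_file in baseline.items():
            if similarity(base_file, unknown) >= threshold:
                match = char
                break  # good enough, move on to the next character
        guessed.append(match)
    return "".join(guessed)


if __name__ == "__main__":
    # Fake scores so the sketch runs without any audio tooling installed.
    fake_scores = {
        ("foxtrot_slow.wav", "unknown1_slow.wav"): 0.99,
        ("9_slow.wav", "unknown2_slow.wav"): 0.91,
    }
    baseline = {"F": "foxtrot_slow.wav", "9": "9_slow.wav"}
    score = lambda a, b: fake_scores.get((a, b), 0.1)
    print(recognise(["unknown1_slow.wav", "unknown2_slow.wav"], baseline, score))
```

Taking the first match above the threshold keeps things quick; picking the highest-scoring baseline instead would be the safer choice if two characters ever score closely.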
Daniel 2 – Telerik 0
It’s worth stringing processes like this together on a test if they don’t take you more than a few hours. If a developer has made a conscious decision to include a captcha somewhere, he/she has obviously put it there to add security. If you can prove that it doesn’t really work… that’s a finding, regardless of whether you then find other vulnerabilities with the form.
– hiburn8

"Bypassing" CSP's Data-Exfiltration Protections

A long time ago now, I tweeted a challenge to see if anyone knew what the following URL would attempt to do: ';$.ajax({url:'/wp-login.php?action=register',type:'POST',data:"user_login='dr'&user_email=''&gclient_id=&gredirect_uri='123456'&ws_plugin__s2member_custom_reg_field_user_pass2='123456'&ws_plugin__s2member_custom_reg_field_first_name='d'&ws_plugin__s2member_custom_reg_field_last_name='r'&ws_plugin__s2member_custom_reg_field_address_1='1'&ws_plugin__s2member_custom_reg_field_address_2=&ws_plugin__s2member_custom_reg_field_city=s&ws_plugin__s2member_custom_reg_field_country=u&ws_plugin__s2member_custom_reg_field_mobile_devices='" encodeURI(document.cookie) "'&ws_plugin__s2member_custom_reg_field_mobile_devices2=Apple&redirect_to=&wp-submit=Register"});var lol='a

Don’t worry, I don’t expect you to stare at that monstrosity. Instead I’ll just tell you;
So a friend of mine was competing in WhiteHatRally last year, which is a sort of “solve the clues to figure out where to go”-style car race for charity, and he realised that it might be possible to stalk the other competitors, who were displaying their progress on FB via a GPS tracking site, in order to determine where they were going and beat them there. So initially we looked at spending a weekend making an app to cheat at a charity race by scraping the other contestants’ locations in semi-real-time… we’re that cool. But I ended up spending the weekend learning the ins and outs of Content Security Policy (CSP) instead, which actually is quite cool.
In failing to make an app to cheat at a charity race I noticed that the site had pretty dire input validation and pretty much everything was vulnerable to XSS. This was a prime example: ‘;alert();
We realised that an even more effective form of cheating would be to compromise the insta-mapper accounts of the other contestants using this XSS, and get a better look at where they were and where they were going… and so we went from making an app to cheat at a charity race to directly attacking its competitors :/ It’s weird how comfortable I was with this.
The site had “connect-src” & “form-action” directives in the CSP headers, with values pointing to the site’s own origin. What on earth do they mean? I thought CSP was for blocking XSS?! Well… take a look at these:

  • "connect-src" limits the origins to which you can connect (via XHR, WebSockets, and EventSource).
  • "form-action" lists valid endpoints for submission from <form> tags.
  • "child-src" lists the URLs for workers and embedded frame contents. For example: child-src would enable embedding videos from YouTube but not from other origins.
  • "img-src" defines the origins from which images can be loaded.
  • "media-src" restricts the origins allowed to deliver video and audio.
  • "font-src" specifies the origins that can serve web fonts.
So utilising ALL of these directives, a site can, quite effectively, prevent itself from being used as a base for CSRF against another site, and can prevent data exfiltration from its own pages. Cool!
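
To make that concrete, here is a small Python sketch that assembles a Content-Security-Policy header value using every directive in the list, each locked to the page’s own origin. The build_csp helper is made up for illustration, and using 'self' for every directive is an assumption; a real policy would list whatever origins the site genuinely needs.

```python
# The six directives discussed above.
DIRECTIVES = (
    "connect-src",
    "form-action",
    "child-src",
    "img-src",
    "media-src",
    "font-src",
)


def build_csp(source="'self'"):
    """Return a CSP header value restricting every directive to `source`."""
    return "; ".join(f"{name} {source}" for name in DIRECTIVES)


if __name__ == "__main__":
    print("Content-Security-Policy: " + build_csp())
```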
Now, the insta-mapper app only uses two of these directives, and the others give us clues as to how we might still be able to get data out. We could, for example, inject an image hosted on a server we control, with the victim’s session cookie as the file-name, then simply scrape that from our access logs. But, as a thought experiment, let’s assume that insta-mapper used ALL of these directives… then what?
This would mean that, while we have XSS which can grab the session cookies, we can’t actually exfiltrate them out of the application to our own server, drats! Although, and this is the semi-obvious, yet not so obvious, bit… we hardly ever need to get data OUT of an application, we just have to COPY it somewhere we can see it. With that in mind, this is what the URL above does:
Makes an AJAX POST to the site’s registration page (same origin, totally allowed), creating a new account with a user/pass that we define… and registers a tracking device for that account with the name of document.cookie.
BOOM! All we have to do is send a victim (aka a nice person who has spent their weekend doing something good for charity) this link, log in to the account we made them set up, and look at the account’s device name, which will be the session cookie for the victim!
As an aside: if the site you’re pwning uses jQuery or you’re able to bring it in yourself, do yourself a favour and do it. The AJAX functions will make your payloads/attacks smaller and work better across different browsers, rather than including a mess of 10-year-old JS you copied from the web. It’s just as reliable as injecting an auto-submitting form these days and won’t redirect the browser 🙂
Update: I sent Google’s Mike West my thanks for his CSP write-up and pointed him to my post. He directed me to a blog/paper Michal Zalewski (@lcamtuf) published back in December of 2011, which talks about the many ways in which an attacker can perform content exfiltration. It looks like, at the time, CSP was really only an XSS defence… but it has since grown to include all of the above directives which, one by one, work to solve the data egress methods Michal talks about. We both came to the same conclusion in our posts though: same-origin content exfiltration is going to be damn-near impossible to protect against. I’d thoroughly recommend reading his paper.

Hunting bad regex with good regex.

On a pentest for a web-app a few months ago I saw something quite ridiculous. A regular expression, similar to the one below, was being used in client-side JavaScript to validate the format/appearance of a user-supplied email address:


If the address matched the regex, the validation passed, and the email address was sent to the server via AJAX. If not, the application threw a hissy-fit and asked the user to try again.
This regex is very basic, but does kinda represent a stripped-down version of an email address correctly. Let’s step through its key parts:

  1. It starts off okay: ([a-zA-Z0-9]+\.)* => “one or more alphanumerics followed by a full-stop, repeated zero or more times” (eg. i.have.a.huge.).
  2. Then: [a-zA-Z0-9]+ => “one or more alphanumeric characters” (eg.
  3. @ => A mandatory “@” symbol (eg.
  4. And then the first step repeats again: ([a-zA-Z0-9]+\.)* => (eg.

BUT WAIT ONE MOMENT BATMAN! What the bloody hell is that at the end of this regex? A cheeky (.*) !!!

([a-zA-Z0-9]+\.)*[a-zA-Z0-9]+@([a-zA-Z0-9]+\.)*.* <----- ಠ_ಠ

The unescaped ( . ) followed by a ( * ) means “match anything, zero or more times”. Anything.
I imagine that, at some point in time, the regex originally looked something like this and matched predefined TLDs:


…but a lazy developer got tired of manually adding all the fancy TLDs available these days ( .williamhill is a legit TLD! ) and decided to just slap this dot-star in there instead to match anything. “Future-proofing”, they probably thought. What harm could come of that, eh?
A LOT. Now the regex matches all manner of naughty input. E.g:
daniel@company.<script src="foo"> , daniel@company.' OR 1=1--
Sure enough, starting the input with a valid-looking email address and then inputting anything under the sun in the TLD portion bypassed the client-side validation check and submitted the form. Even better, our lazy dev must have copy-pasted the ludicrous new regex into the server-side validation and BOOM! This was a stored XSS attack (who needs to output-encode letters, numbers and full-stops, right? ;P). Our lazy dev earned (him|her)self the new moniker of “regex noob” that day.
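
You can see the problem for yourself with a few lines of Python. The BAD pattern is the regex from above; STRICTER is just one hypothetical tightening (a purely alphabetic final label) for comparison, not the fix the application used, and the sketch assumes the validator matched the whole input. An unanchored match would be even more forgiving.

```python
import re

# The regex from the application, trailing dot-star and all.
BAD = re.compile(r"([a-zA-Z0-9]+\.)*[a-zA-Z0-9]+@([a-zA-Z0-9]+\.)*.*")

# Hypothetical tightening: at least one "label." then an alphabetic TLD.
STRICTER = re.compile(
    r"([a-zA-Z0-9]+\.)*[a-zA-Z0-9]+@([a-zA-Z0-9]+\.)+[a-zA-Z]{2,}"
)


def passes(pattern, value):
    """True if the whole of `value` matches `pattern`."""
    return pattern.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in (
        "daniel@company.com",
        'daniel@company.<script src="foo">',
        "daniel@company.' OR 1=1--",
    ):
        print(candidate, passes(BAD, candidate), passes(STRICTER, candidate))
```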
I found this so absurd that I wondered if bad regexes like this were actually more common than I’d have expected, and OH.MY.GOD. When you start actively looking for bad regexes in applications, not just web, they are everywhere!
The problem is, though, that today even basic webpages can often pull in over 10MB of JavaScript, which will mostly be “corner-rounding” rubbish-script. So tracking down the potential .* can be tedious.
So, I took a look at what combinations of regex meta-characters could lead a poorly thought-out regular expression to inadvertently match malicious strings. This is the list I came up with:

[^ <- character blacklist logic, obviously bad
.+ <- anything, one or more times
.* <- anything, zero or more times
.{ <- anything, a designated amount of times
\D+ <- not a digit, one or more times
\D* <- not a digit, zero or more times
\D{ <- not a digit, a designated amount of times
\S+ <- not a whitespace char, one or more times
\S* <- not a whitespace char, zero or more times
\S{ <- not a whitespace char, a designated amount of times
\W+ <- not a word char, one or more times
\W* <- not a word char, zero or more times
\W{ <- not a word char, a designated amount of times
.)+ <- as above, optional group close
.)* <- as above, optional group close
.){ <- as above, optional group close
\D)+ <- as above, optional group close
\D)* <- as above, optional group close
\D){ <- as above, optional group close
\S)+ <- as above, optional group close
\S)* <- as above, optional group close
\S){ <- as above, optional group close
\W)+ <- as above, optional group close
\W)* <- as above, optional group close
\W){ <- as above, optional group close

The keen-eyed of you will have noticed that the list includes things like ( \W+ ) which is the equivalent of ( [^a-zA-Z0-9_] ) or “not a word character”. If you’re wondering how on earth it’d be possible to execute arbitrary JavaScript code without a word character, check this out:
Anyway, assuming we have some data that may contain regexes, and that these regexes may contain ‘bad’ sequences of meta-characters, the below regex matches all of the scenarios in the list to try and hunt those sequences out.


Simply throwing this regex as a search query against your Burp history/data parsing tool will often find interesting results in input validation areas of applications.
And if you wanted to push the boat out, you could begin the regex with ( ((^|[^\\])(\\\\)* ); this would ensure that any matches did not have escaping backslash characters in front, like ( \.* ) or ( \\\\\\\\\\\\\\\W+ ).
You can stop reading here, copy that regex, and start trying to see if you find anything interesting with it. Or carry on to see some caveats of this regex, another bad example and how it can be exploited 🙂
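
If you would rather script the hunt than paste a regex into a search box, here is a Python sketch built directly from the list above. To be clear, this is a reconstruction from that list and not necessarily the exact regex used on the job; the names TOKEN, LOOSE, STRICT and hunt are invented for the example, and other regex engines may need the pattern adjusting.

```python
import re

# One risky token: "[^", or one of . \D \S \W, optionally followed by a
# group close ")", then the start of a quantifier: + * or {
TOKEN = r"(?P<tok>\[\^|(?:\.|\\[DSW])\)?[+*{])"

# Matches every token in the list, escaped or not.
LOOSE = re.compile(TOKEN)

# Stricter variant: require start-of-text or a non-backslash, then an even
# number of backslashes, so escaped tokens such as \.* are skipped.
STRICT = re.compile(r"(?:^|[^\\])(?:\\\\)*" + TOKEN)


def hunt(text, strict=True):
    """Return the risky meta-character sequences found in `text`."""
    pattern = STRICT if strict else LOOSE
    return [m.group("tok") for m in pattern.finditer(text)]


if __name__ == "__main__":
    sample = r"([a-zA-Z0-9]+\.)*[a-zA-Z0-9]+@([a-zA-Z0-9]+\.)*.*"
    print(hunt(sample, strict=False))  # also flags the escaped \.)* groups
    print(hunt(sample))                # only the real offender
```

The loose form is noisier but works almost anywhere; the strict form trades a little portability for fewer false positives.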
Part 2
Some caveats to success using regex to match regex:

  1. Because of the complex nature of regex, you would need a large series of regexes to attempt to capture every possible place where bad sequences of meta-characters could exist in another regex; so things ARE missed using the above.
  2. Because of the different implementations of regex in JS, Java etc., trying to create a one-size-fits-all regex that can be used within all tools (ie. a browser or Burp) and scenarios is nigh on impossible. Burp’s Java implementation, for example, hates complicated regexes and has fewer features than, say, Perl’s; meanwhile MySQL regexes don’t even support lookarounds. This limits the effectiveness of any regex you could design, as you need to stick with basic regex constructs and not get fancy.

Imagine a web application using this regex for email appearance validation:


The problem here is, again, that our regex noob has not escaped his ( . ) characters. However, at first glance, it might not appear exploitable as it would seem that we can only insert a single arbitrary character in defined parts of the email address (the places where a literal . should be).
The regex allows the attacker to insert any number of alphanumeric characters, followed by any character, followed by any number of alphanumeric characters again, then has an @, and finally repeats the pre-@ phase at the tail end. So normal-looking emails work,
as well as, wrongly, these: daniel$reece@admin| and d!a"n£i$e%l@h^i&b*u(r)
What it does not allow, however, is multiple non-alphanumeric characters next to one another, so:
dan<script src=""> won’t work.
How is this exploitable then? Easy:


Most modern browsers will see that the above is an illegal way to reference a src in a script tag, as there are no quotation marks, but will just fix it for you in the DOM. A friend of mine pointed out that, if the application also accepted URL-encoded input, the following would allow those quotes to still be sent:


This is a slightly trickier regex issue to find and exploit. We can’t simply match all (.)’s in source-code as practically everything would be a false positive, and ( .)? ) isn’t likely to be an issue in 99% of cases, so it is purposefully not matched by our regex-hunting regex.
So, what does this mean? Well, you can often automate the process of finding that “needle in a haystack” vulnerability, but you won’t do it every time 🙂 and that’s the reason we all have jobs!