Gotta Captcha’m All – Automating Image (and Audio!) Captchas.

A captcha serves one purpose: to ensure that a human, and not a machine, has performed a task.
In web applications, they attempt to prevent attackers from creating automated bits of code to brute-force forms, fuzz user input or cause a denial of service.
It’s very much a non-trivial task these days to differentiate the man from the machine using these image (and sometimes audio) “challenges”, as the logical steps a human brain takes to decipher characters from a captcha can almost always be replicated, often more effectively, in code. The types of people you deploy a captcha to shield yourself against are unlikely to be thwarted by something that can be programmatically broken. You’re often just adding another hurdle with a captcha. Some people like hurdles.
With this in mind, if you have chosen to use a captcha to protect a mission-critical application from attack… I am of the opinion you’re already a little bit screwed. A captcha is suitable for stopping a casual WordPress blog like this from being overrun by spam comments from knock-off Barbour jacket merchants, nothing more.
On a recent test, a mission-critical application for a bank was indeed vulnerable to a nasty DoS, caused by using the ‘RadCaptcha’ captcha system, which is built into the commercial ‘Telerik’ .NET framework. It’s a particularly crappy captcha. A previous pentest from another company had already highlighted this, but without a demonstration of how it could be broken the bank were reluctant to swap it out.
For the rest of this post, I’ll detail some of the steps I took, and tools I used, to create a PoC for bypassing Telerik RadCaptcha. At the end of it you should have a reasonable idea of how to incorporate captcha-beating functionality into your own scripts. The secondary take-home should be to not use RadCaptcha.
A RadCaptcha-protected form typically incorporates both an image captcha and an alternative audio captcha for the visually impaired.
The image captcha looks like this:


…and the audio captcha sounds like this:


Right, let’s break these bad boys. Starting with that image captcha.
Here are some of the problems with it:

  • The characters are evenly spaced (the image can be perfectly divided into five segments of 55 pixels in width, each containing one character). In an ideal world the characters would be unevenly spaced, to make it harder for a program to determine where each character starts and ends. Most OCR tools designed to break captchas won’t have a problem figuring this out, but if they did, we could programmatically chop the image into five segments and perform OCR on each character separately.
  • The image only seems to have two colours: white and a shade of grey. When character edges are coloured similarly to their backgrounds, OCR tools can struggle to find those edges; with this much contrast, they don’t have to.
  • The “dust” effect is terrible. Tiny speckles, none of which obstruct the characters in any way.
  • No other effects. No lines scrawling all over the text, nothing.
  • Character warping. There basically is none. It just looks like a quirky font :/
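The fixed-width chopping mentioned in the first point can be sketched in a few lines. This is a minimal illustration, assuming the image has already been loaded as a list of pixel rows; the function name and the toy data are made up for the example.

```python
def split_into_segments(pixels, count=5):
    """Split a 2D list of pixel rows into `count` equal-width column blocks."""
    width = len(pixels[0])
    if width % count != 0:
        raise ValueError("image width is not evenly divisible")
    step = width // count  # 55 for a 275-pixel-wide captcha
    return [[row[i * step:(i + 1) * step] for row in pixels]
            for i in range(count)]


# Toy 2-row "image", 10 pixels wide, so each of the five segments is 2 pixels wide.
image = [list(range(10)), list(range(10, 20))]
segments = split_into_segments(image)
print(len(segments), segments[0])
```

Each segment could then be written out and passed to the OCR tool on its own.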

Step 1 – While the image is basically made of two colours anyway, let’s convert it to a greyscale .pnm file. Most OCR tools like working with .pnm files but don’t include the ability to do the conversion themselves:
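To show what that conversion amounts to, here is a rough sketch that turns RGB pixels into grey values and serialises them as a plain-text PGM (one of the PNM family). In practice you would most likely reach for an existing image converter; the function names, the luminance weights and the toy pixels here are assumptions for illustration only.

```python
def to_grey(rgb_rows):
    """Convert rows of (r, g, b) tuples to rows of 0-255 grey values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_rows]


def to_pgm(grey_rows):
    """Serialise grey rows as a plain-text (P2) PGM file body."""
    height, width = len(grey_rows), len(grey_rows[0])
    body = "\n".join(" ".join(str(v) for v in row) for row in grey_rows)
    return f"P2\n{width} {height}\n255\n{body}\n"


# Toy 2x2 image: white, black, mid-grey and pure red.
rgb = [[(255, 255, 255), (0, 0, 0)], [(128, 128, 128), (255, 0, 0)]]
pgm_text = to_pgm(to_grey(rgb))
```

Writing `pgm_text` to a file gives something a PNM-aware tool can read.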

Step 2 – Install gocr (one of many free OCR tools for *nix) and read the manual:

Step 3 – Win:

The -d 50 tells gocr to attempt to remove clusters of pixels less than 50 pixels in size (the “dust”). This completely removes the effect.
-C a-zA-Z0-9 defines the character set to use, which should aid accuracy.
-a 85 specifies the certainty level we want for a character. If the output from this command contains fewer than 5 characters, we know gocr was less than 85% certain about one or more of them, so we can skip that captcha and grab another one. Although you can ramp it up to about 95 with RadCaptcha and never miss a character (doh).
-m 16 tells gocr to work in a mode whereby it won’t attempt to separate overlapping characters. Since there won’t ever be any in RadCaptcha, this could improve things.
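Pulling those flags together, a hedged sketch of how a script might wrap gocr looks like this. The flags are the ones described above plus -i for the input file; the function names, the file path and the five-character check are assumptions, and the code treats any reading that isn’t exactly five alphanumeric characters as a miss (gocr, as far as I understand its manual, prints a placeholder such as an underscore for characters it can’t recognise).

```python
import subprocess


def gocr_command(pnm_path, dust=50, charset="a-zA-Z0-9", certainty=85, mode=16):
    """Build the gocr argument list using the flags discussed above."""
    return ["gocr", "-d", str(dust), "-C", charset, "-a", str(certainty),
            "-m", str(mode), "-i", pnm_path]


def accept(ocr_output, expected_length=5):
    """Return the reading only if it is exactly `expected_length` alphanumerics."""
    text = "".join(ocr_output.split())  # drop newlines and stray spaces
    if len(text) == expected_length and text.isalnum():
        return text
    return None  # uncertain reading: skip this captcha and fetch another


def solve(pnm_path):
    """Run gocr (must be installed) and return the captcha text or None."""
    result = subprocess.run(gocr_command(pnm_path), capture_output=True,
                            text=True, check=True)
    return accept(result.stdout)
```

A caller would loop: fetch a captcha, convert it, call `solve`, and request a fresh captcha whenever it returns `None`.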
Done. We can turn this process into a one-liner and integrate it into any tool we want to attack a RadCaptcha form. Absurdly easy. Daniel 1 – Telerik 0.
Now for that audio captcha.
We didn’t really have to design a system to break the image captcha; we just used off-the-shelf tools that were actually designed for the job. Here though, we need to construct a process for defeating the audio captcha ourselves (since the closest off-the-shelf tools for this are for clear audio recognition, and they don’t much like captchas).
Let’s pick apart its bad bits and have a think. Listen to it one more time 😛

  • The voice uses the NATO phonetic alphabet. This makes each letter last longer, and creates a signature that may be easier to detect.
  • By getting my hands on a copy of the Telerik framework, I could see that the system works by keeping one audio recording for each character A-Z, 0-9. The framework stitches combinations of these .WAV files together, adds some noise and then dumps the result as a captcha. The fact each character has only one recording is obviously poor. It’s the equivalent of having no character warping in an image captcha.

Here’s the process I ended up going with:
Firstly: Create some baseline files:

  1. Obtain enough of the audio captcha files, so that we have all letters A-Z and all digits 0-9 somewhere in at least one of them.
  2. Remove the “noise” effect from these captcha files.
  3. Cut out each character from the captcha files programmatically by detecting the small silences in-between characters, and save them into their own file. e.g. 9.wav, alfa.wav, bravo.wav etc.

Note: If you have access to a Telerik installation, you could just rip the raw character sound files out of the framework and use those. Although I tried this and actually found it made the process less accurate. (It’s also less hacker-like and you lose cool points.)
Then: To perform the character recognition from a captcha:

  1. Take an audio captcha with unknown characters.
  2. Strip the noise.
  3. Split the .WAV on the silence in the same way as before to separate the unknown characters into individual files.
  4. Use an audio fingerprinting tool to match the similarity of our unknown character files against each of the baseline files (alfa.wav etc).
  5. On a reasonably high match, store the matching character and process the next one, etc etc.
  6. Script this whole process. WIN.

OK, so let’s create those baseline files for starters:
Firstly, save copies of captchas containing A – Z and 0 – 9. This should be as simple as refreshing the protected page a number of times and saving the .WAVs.
We’ll use the *nix tool sox for most of the audio processing. It’s apparently “the Swiss Army knife of sound processing programs”. That sounds good, I’m sold.

To remove the noise from a captcha we first need to create a “noise profile” for it in sox. We can later use this profile to tell sox how to effectively negate the noise and output a “clean” version of the captcha.
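As a rough sketch, the two sox invocations can be built as argument lists like this. noiseprof and noisered are real sox effects; the file names, the assumption that the first half-second of the captcha is noise only, and the 0.21 reduction amount are illustrative guesses that would need tuning against real captchas.

```python
def noise_profile_command(captcha_wav, profile_path, noise_seconds=0.5):
    """sox args: sample the first `noise_seconds` and write a noise profile."""
    return ["sox", captcha_wav, "-n", "trim", "0", str(noise_seconds),
            "noiseprof", profile_path]


def noise_reduce_command(captcha_wav, clean_wav, profile_path, amount=0.21):
    """sox args: subtract the profiled noise and write a cleaned copy."""
    return ["sox", captcha_wav, clean_wav, "noisered", profile_path, str(amount)]


# Hypothetical file names, for illustration only.
profile_cmd = noise_profile_command("captcha.wav", "noise.prof")
reduce_cmd = noise_reduce_command("captcha.wav", "captcha_clean.wav", "noise.prof")
```

Each list can be handed to `subprocess.run` once sox is installed.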


Now we need to split the captcha at each moment of silence in-between characters, so we can get just the characters we need out of it.
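The idea behind silence splitting is simple enough to show on raw samples. The sketch below is an illustration of the technique rather than a replacement for sox (whose silence effect can do this on real files); the threshold, the minimum gap length and the toy sample values are assumptions.

```python
def split_on_silence(samples, threshold=0.02, min_silence=3):
    """Return (start, end) index pairs for loud runs separated by silence.

    A run ends once `min_silence` consecutive samples fall at or below
    `threshold`; `end` is exclusive, so samples[start:end] is the character.
    """
    segments = []
    start = None  # index where the current loud run began
    quiet = 0     # consecutive quiet samples since the last loud one
    for i, sample in enumerate(samples):
        if abs(sample) > threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence:
                segments.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:  # audio ended mid-run
        segments.append((start, len(samples) - quiet))
    return segments


# Two "characters" separated by four quiet samples.
audio = [0, 0, 0.5, 0.6, 0, 0, 0, 0, 0.4, 0.3, 0.2, 0, 0]
pieces = split_on_silence(audio)
```

Short dips inside a word (fewer than `min_silence` quiet samples) don’t trigger a split, which is what keeps a two-syllable word like “foxtrot” in one piece.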

Rename the files to something more sensible (in this case foxtrot.wav), and you should have a bunch of files like this:


Repeat this enough times and bingo! We have all the characters of the alphabet (in NATO phonetic form) and all digits, de-noised and contained in their own files.
OK, so now we want to be able to use these files to detect the characters in a new captcha programmatically. Let’s grab a new captcha (RadCaptcha_Audio_1bc2eaa5.wav) and perform pretty much the same process on that file as we did to generate our base files: strip the noise and separate at the silence. This will give us the unknown characters from a captcha in separate files.

To compare these unknown character audio files with our baseline ones and determine which characters make up the captcha, I found a pretty good audio fingerprinting perl script (
Either download the above .zip and extract the script, or grab it from the above URL.
Quickly install its dependencies:

Audio similarity tools, I have learned, don’t like comparing 1-second audio files like my foxtrot.wav and a potential character from an audio captcha. Amazingly, we can defeat this problem by just stretching the audio files. So we do this to all our [A-Z0-9] .WAV files and make them at least 5 seconds apiece (the longer the better).
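One way to do the stretching is sox’s stretch effect, which lengthens audio by a given factor. The sketch below only builds the command; the choice of stretch over other effects, the factor of 5 and the file names are assumptions. What matters is applying the same factor to the baseline files and the unknown ones.

```python
def stretch_command(src_wav, dst_wav, factor=5):
    """sox args: lengthen `src_wav` by `factor` and write it to `dst_wav`."""
    return ["sox", src_wav, dst_wav, "stretch", str(factor)]


# Hypothetical names: a roughly 1-second baseline clip becomes about 5 seconds.
cmd = stretch_command("foxtrot.wav", "foxtrot_long.wav")
```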


The base files won’t sound like numbers or the NATO phonetic letters anymore, but as long as we do the same thing to the unknown characters from a captcha, that doesn’t matter.


They sound quite similar right? Good. They should, the unknown character is also foxtrot, or “F“.
Time to test the perl script:

The files match with over 99% certainty. Seems like a success to me!
All we do now is create a loop to check each unknown character against our base files, and move on to the next character when we get a match somewhere around the 80% mark. It’s not terrifically efficient, but it works.
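That loop might look something like the sketch below. The `similarity` argument is a stand-in for whatever fingerprinting tool does the real comparison (here a toy function that compares lists position by position), and the baseline names follow the alfa/bravo/9 naming used earlier so that the first letter of each name is the captcha character; all of that is assumption for the sake of a runnable example.

```python
def identify(unknown, baselines, similarity, threshold=0.80):
    """Return the character of the first baseline scoring >= threshold, else None."""
    for name, clip in baselines.items():
        if similarity(unknown, clip) >= threshold:
            # "foxtrot" -> "F", "alfa" -> "A", "9" -> "9"
            return name[0].upper()
    return None


def solve(unknown_clips, baselines, similarity, threshold=0.80):
    """Identify every clip in order; give up (None) if any clip is unmatched."""
    chars = [identify(clip, baselines, similarity, threshold)
             for clip in unknown_clips]
    return None if None in chars else "".join(chars)


def toy_similarity(a, b):
    """Toy stand-in: fraction of positions where two equal-length lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


baselines = {"foxtrot": [1, 2, 3, 4, 5], "9": [9, 9, 9, 9, 9], "alfa": [5, 4, 3, 2, 1]}
unknown = [[1, 2, 3, 4, 5], [9, 9, 9, 9, 0], [5, 4, 3, 2, 1]]
answer = solve(unknown, baselines, toy_similarity)
```

Stopping at the first baseline over the threshold mirrors the “move on when we get a match” approach; picking the highest-scoring baseline instead would be a small change if false matches turn out to be a problem.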
Daniel 2 – Telerik 0
It’s worth stringing processes like this together on a test if they don’t take you more than a few hours. If a developer has made a conscious decision to include a captcha somewhere, they have obviously put it there to add security. If you can prove that it doesn’t really work… that’s a finding, regardless of whether you then find other vulnerabilities with the form.
– hiburn8