If you think that posting a pixelated photo of your newly obtained driver’s license on Instagram puts you in no danger, we have to warn you: threat actors can “depixelate” the license details you thought the pixels had thoroughly secured.
Dan Petro, Lead Researcher at Bishop Fox, showed why black bars are the best tool for concealing sensitive information, and wrote a tool named Unredacter to prove the point. Security researchers from Jumpsec Labs, working on the same subject, had issued an open “depixelation” challenge, and Petro took it up.
In a post published by Bishop Fox he explained the whole trick in extensive detail and worked through the given task.
Jumpsec Labs’ challenge
In reality one can find a whole bunch of tools on the Internet for un-redacting sensitive information that successfully get rid of blurring, swirling and pixelation. But we, of course, won’t name any names here and will instead tell you how “depixelation” works according to the research by Dan Petro from Bishop Fox.
On GitHub there is a tool called Depix that tries to look up which permutations of pixels could have produced certain pixelated blocks, given a De Bruijn sequence rendered in the correct font. Petro admitted he personally likes the theory a lot, but a fellow researcher from Jumpsec argued that in practice it might not work as well as one would like. He notes that in real life you will likely run into obstacles like noise and minor variations, and these can significantly affect the tool’s results. Jumpsec then challenged anyone to un-redact their pixelated image.
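For readers unfamiliar with the term: a De Bruijn sequence of order n over an alphabet is a cyclic string in which every possible run of n characters appears exactly once, which is presumably why it makes a convenient sample text for a lookup approach. The sketch below generates one with the standard recursive construction. It only illustrates the concept and is not taken from Depix's code; the function name de_bruijn is our own.

```python
def de_bruijn(alphabet, n):
    """Return a cyclic De Bruijn sequence of order n over the given alphabet."""
    k = len(alphabet)
    a = [0] * (k * n)   # working buffer of symbol indices
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


seq = de_bruijn("ab", 3)
# Read cyclically, every 3-letter combination of "a" and "b" should occur once.
wrapped = seq + seq[:2]
windows = {wrapped[i:i + 3] for i in range(len(seq))}
print(seq, len(windows))
```

With a two-letter alphabet and n = 3 there are 8 possible combinations, so the sequence has 8 characters and the set of windows has 8 entries.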
How Pixelation Works
The researcher explained the principle of pixelation as follows:
One divides an image into a grid of a given block size. For each block, the redacted image’s color is set to the average color of the original over that same area. In this way the image information within each block gets “smeared”.
But while some information is lost in the process, some of it remains. Because the method is so simple, it is implemented in much the same way across image editors such as Photoshop and GIMP.
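The block-averaging step described above can be sketched in a few lines. This is a minimal illustration that assumes a grayscale image stored as a list of rows; the function name pixelate and the sample values are ours, not part of any tool mentioned here.

```python
def pixelate(image, block):
    """Replace every block x block tile with the average value of that tile."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            mean = sum(image[r][c] for r in rows for c in cols) / (len(rows) * len(cols))
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


# Hypothetical 4x4 grayscale image (0 = black, 255 = white).
image = [
    [0, 0, 255, 255],
    [0, 255, 255, 255],
    [255, 255, 0, 0],
    [255, 255, 0, 0],
]
redacted = pixelate(image, 2)
print(redacted[0])  # [63.75, 63.75, 255.0, 255.0]
```

The top-left tile held three black pixels and one white one, and its average of 63.75 still leaks that proportion. That leftover information is what a depixelation attack works with.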
How to Solve the Jumpsec Challenge Text
Having written his tool, Dan Petro decided to use it on the given task. Before describing the actual process he explained the difficulties a potential threat actor might “encounter” in solving it.
First, he says, the redaction process is essentially local. In cryptographic terms this means there is no diffusion. Because changing one pixel somewhere in the image affects only the redacted block it belongs to, the pixelated text can be attacked one character at a time instead of having to be guessed as a whole.
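The locality claim is easy to check on a toy example. The sketch below assumes a grayscale image stored as a list of rows and uses our own helper names (pixelate, changed_blocks): it flips a single pixel and lists which redacted blocks change as a result.

```python
def pixelate(image, block):
    """Replace every block x block tile with the average value of that tile."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            mean = sum(image[r][c] for r in rows for c in cols) / (len(rows) * len(cols))
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


def changed_blocks(a, b, block):
    """Return the (block_row, block_col) positions where two images differ."""
    return {
        (r // block, c // block)
        for r in range(len(a))
        for c in range(len(a[0]))
        if a[r][c] != b[r][c]
    }


original = [[0] * 8 for _ in range(4)]     # 4 rows x 8 columns, all black
modified = [row[:] for row in original]
modified[1][5] = 255                       # change exactly one pixel

diff = changed_blocks(pixelate(original, 4), pixelate(modified, 4), 4)
print(diff)  # {(0, 1)}
```

Only the block containing the edited pixel changes; the neighboring block is untouched. That absence of diffusion is what lets an attacker test one guess at a time.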
The tool runs a recursive depth-first search over the characters, trying to match each guess against the corresponding part of the redacted text. In principle we guess, for example, the letter “b”, pixelate that letter and look at how closely it matches the redacted text. Then we guess again, for example the letter “d”, and so on. The researcher adds that this may not look so hard at first sight but, as already mentioned, a potential threat actor may “come across” certain difficulties.
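A toy version of this guess, pixelate and compare loop is sketched below. Everything in it is an assumption made for illustration: the three-letter "font" in GLYPHS is invented, the image is a single row of pixels, and guesses are accepted only on an exact match, whereas a real attack would need a tolerance for noise. It is not Unredacter's code.

```python
# Hypothetical one-row "font": each character is three columns of intensity.
GLYPHS = {
    "a": [8, 0, 4],
    "b": [0, 6, 2],
    "c": [4, 0, 8],
}
BLOCK = 2  # hypothetical pixelation block width


def render(text):
    """Concatenate the glyph columns of every character in text."""
    return [value for ch in text for value in GLYPHS[ch]]


def pixelate_row(row, block=BLOCK):
    """Average each run of `block` columns (the last run may be shorter)."""
    return [sum(row[i:i + block]) / len(row[i:i + block])
            for i in range(0, len(row), block)]


def search(target, width, guess=""):
    """Depth-first search for every text whose pixelation equals target."""
    row = render(guess)
    if len(row) > width:
        return []
    # Only blocks fully covered by the guess so far can be judged yet.
    full = len(row) // BLOCK
    if pixelate_row(row)[:full] != target[:full]:
        return []  # prune: this prefix already contradicts the redaction
    if len(row) == width:
        return [guess] if pixelate_row(row) == target else []
    matches = []
    for ch in GLYPHS:
        matches += search(target, width, guess + ch)
    return matches


secret_row = render("cab")
target = pixelate_row(secret_row)
print(search(target, len(secret_row)))  # ['cab']
```

The pruning step is what keeps the search cheap: a wrong first letter is rejected before any longer guess built on it is tried.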
Bleed-over and whitespace
The first difficulty is character bleed-over. The characters of the pixelated text, it turns out, don’t line up 1:1 with the blocks of the redaction. According to the researcher, this means that a correct guess for a character might still look wrong, because the blocks on its right-most edge also contain part of the following character. He illustrated this with a detailed example, but to put it simply: because of bleed-over one might mistake one character for another.
Another problem, which follows directly from the first, is whitespace. This is the case where a character ends in completely blank white space. When that happens, the shared pixelated block is determined entirely by the next character, which makes the guess harder.
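Both effects can be seen with a little arithmetic. The sketch below assumes hypothetical numbers throughout: glyphs 3 pixels wide, blocks 2 pixels wide, a single row of pixels, and 0 standing for the blank background.

```python
BLOCK = 2        # hypothetical block width in pixels
GLYPH_WIDTH = 3  # hypothetical glyph width in pixels


def blocks_touched(start, width, block=BLOCK):
    """Indices of the blocks overlapped by columns start .. start + width - 1."""
    return set(range(start // block, (start + width - 1) // block + 1))


first = blocks_touched(0, GLYPH_WIDTH)             # first character
second = blocks_touched(GLYPH_WIDTH, GLYPH_WIDTH)  # second character
shared = first & second
print(first, second, shared)  # {0, 1} {1, 2} {1}


def shared_block_value(last_col_of_first, first_col_of_second):
    """Average of the two columns that fall into the shared block."""
    return (last_col_of_first + first_col_of_second) / 2


# Bleed-over: the same first character gives a different shared block
# depending on which character follows it.
print(shared_block_value(6, 8), shared_block_value(6, 2))  # 7.0 4.0

# Whitespace: if the first character ends in a blank column (0), the shared
# block depends only on the next character.
print(shared_block_value(0, 8), shared_block_value(0, 2))  # 4.0 1.0
```

Block 1 belongs to both characters, so a guess for the first character cannot be scored on that block until the second character is guessed as well.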
Fonts “difficulties”
The next problems can, on the whole, be grouped as font “difficulties”: variable-width fonts, font inconsistency and the pixelation offset. If a threat actor already spends significant time guessing characters because of bleed-over and whitespace, these additional “obstacles” can cost them several more hours.
A variable-width font is one where the amount of horizontal space a letter takes up depends on the letter itself. For example, an “m” takes up more space than an “i”. For threat actors this means that each guessed character has a cascading effect on everything to the right of it, so characters cannot be evaluated individually but only with the others in mind.
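The cascading effect comes down to a running sum: each character starts where the widths of all the characters before it end. The widths in this sketch are made up for illustration and do not come from any real font.

```python
from itertools import accumulate

# Hypothetical glyph widths in pixels.
WIDTHS = {"i": 1, "l": 1, "a": 3, "m": 5}


def start_columns(text):
    """Column at which each character of text begins."""
    widths = [WIDTHS[ch] for ch in text]
    return [0] + list(accumulate(widths))[:-1]


print(start_columns("mail"))  # [0, 5, 8, 9]
print(start_columns("iail"))  # [0, 1, 4, 5]
```

Swapping only the first letter moves every later character to a different column, and therefore into different pixelation blocks, so an early wrong guess invalidates everything that follows it.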
Font inconsistency means that the exact same font can render as slightly different images. A close-up of Sans Serif text looks different in Firefox than it does in GIMP. It is quite a relief for threat actors if you use a standard rendering program, but if the text was rendered by an unusual one, their guesses might fail.
As for the offset, consider a static pixelation grid: there are 64 distinct locations where the text can sit relative to that grid. Specialists call this the x and y “offset”. The offset chosen significantly changes the resulting pixelated image, so the same text can produce many different redactions.
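A count of 64 positions is what an 8 by 8 pixel block would give (8 horizontal shifts times 8 vertical shifts); the block size is our assumption, since it is not stated here. The sketch below enumerates those offsets and then shows, on a hypothetical one-row image with a smaller block, how a shift of a single pixel changes the pixelated result.

```python
from itertools import product

BLOCK = 8  # assumed block size that yields 64 offsets
offsets = list(product(range(BLOCK), repeat=2))  # every (x, y) shift
print(len(offsets))  # 64


def pixelate_row(row, block, offset=0):
    """Pixelate one row after shifting it right by `offset` blank columns."""
    padded = [0] * offset + row
    return [sum(padded[i:i + block]) / len(padded[i:i + block])
            for i in range(0, len(padded), block)]


row = [8, 0, 4, 4]  # hypothetical pixel intensities
print(pixelate_row(row, 2, offset=0))  # [4.0, 4.0]
print(pixelate_row(row, 2, offset=1))  # [4.0, 2.0, 4.0]
```

An attacker who does not know the offset has to repeat the whole character search for each candidate position.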
With this knowledge in hand, the researcher successfully solved the task set by Jumpsec. In closing he once more advised users who need to conceal sensitive information to use only black bars that cover the whole text. But be careful: do not simply edit a Word document so that it shows black text on a black background, because anyone can see what is behind such a “black bar” just by highlighting it. Redact your text as an image.