Didier Stevens Releases base64dump.py --stats to Reverse Custom BASE64 Encodings in Malicious JPEGs
SANS handler Didier Stevens introduces a new --stats option for his base64dump.py tool, enabling analysts to detect and reverse custom BASE64 encodings used in steganographic malware delivery, demonstrated on a malicious JPEG hiding a PE executable.

SANS Internet Storm Center handler Didier Stevens has added a --stats option to his base64dump.py tool to help malware analysts reverse custom BASE64 encodings, which are commonly used in steganographic payload delivery. He demonstrated the update on a malicious JPEG file that concealed a Windows executable behind a modified BASE64 alphabet and string reversal. Decoding such a payload would otherwise require access to the attacker's original extraction script.
Stevens began his analysis by examining the suspicious JPEG with his byte-stats.py tool, which revealed that nearly half of the file's content (45.65%) consisted of BASE64 characters, with the longest contiguous BASE64 string spanning almost one million characters. However, when he ran the file through base64dump.py with standard settings, the longest detected BASE64 string was only 1,000 characters long and did not decode into anything recognizable, a clear sign that a custom encoding was in play.
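The kind of measurement described above can be approximated in a few lines of Python. This is a minimal sketch and not the byte-stats.py implementation: the function name base64_profile is hypothetical, and treating '=' as a BASE64 character is an assumption.

```python
import re

# Standard BASE64 alphabet plus the padding character (assumption: '=' is counted).
BASE64_CHARS = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="


def base64_profile(data: bytes) -> tuple[float, int]:
    """Return (share of bytes that are BASE64 characters, longest contiguous run)."""
    if not data:
        return 0.0, 0
    allowed = set(BASE64_CHARS)
    # Count every byte that belongs to the BASE64 character set.
    count = sum(1 for b in data if b in allowed)
    # Find all uninterrupted runs of BASE64 characters and keep the longest.
    runs = re.findall(rb"[A-Za-z0-9+/=]+", data)
    longest = max((len(run) for run in runs), default=0)
    return count / len(data), longest
```

A high share combined with one very long run suggests an embedded encoded payload rather than incidental text. Note that any substituted character outside the standard alphabet, such as '#', splits a run in this sketch.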
To identify the custom encoding, Stevens used the new --stats option, which provides a character frequency analysis of detected BASE64 strings. The statistics showed that the letter 'A' appeared significantly less frequently than other BASE64 characters, and when the minimum string length was increased, 'A' disappeared entirely. That is unusual for an encoded executable: 'A' encodes six zero bits, so the runs of null bytes typical of PE files normally make it one of the most common characters. The '#' character, meanwhile, was the most frequent non-BASE64 character in the file, suggesting that '#' had replaced 'A' in the alphabet.
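The reasoning behind this step can be illustrated with a small frequency check. The sketch below is not the --stats implementation; alphabet_stats is a hypothetical helper that reports which standard BASE64 characters never occur in a string and which foreign characters occur most often.

```python
from collections import Counter

STANDARD_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
)


def alphabet_stats(text: str, top: int = 3):
    """Return (standard characters that never occur, most common foreign characters)."""
    counts = Counter(text)
    # Standard alphabet characters that are absent are substitution candidates.
    missing = [c for c in STANDARD_ALPHABET if counts[c] == 0]
    # Characters outside the alphabet (ignoring '=' padding) may be the replacements.
    foreign = Counter(
        {c: n for c, n in counts.items() if c not in STANDARD_ALPHABET and c != "="}
    )
    return missing, foreign.most_common(top)
```

If exactly one standard character is missing and one foreign character dominates, as with 'A' and '#' here, a one-for-one substitution is a reasonable first hypothesis.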
Even after replacing '#' with 'A', the decoded output remained garbled. Stevens then noticed that the BASE64 string began with ==, the padding normally found at the end of a BASE64 string, and ended with qVT. That is TVq reversed, and TVq is the characteristic start of a BASE64-encoded Windows PE executable (the MZ header). This indicated that the entire encoded string had been reversed before being embedded in the JPEG. After reversing the string with his translate.py tool, the decoded payload matched the hash of the executable previously extracted by fellow handler Xavier in an earlier diary entry.
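Both transformations can be undone with the Python standard library alone. The following is a minimal sketch rather than Stevens' workflow, which used base64dump.py and translate.py; decode_custom and its parameter names are hypothetical.

```python
import base64


def decode_custom(encoded: str, substitute: str = "#", original: str = "A") -> bytes:
    """Undo a one-character alphabet substitution and a full string reversal."""
    # Restore the standard alphabet, then reverse so the padding ends up last.
    restored = encoded.replace(substitute, original)[::-1]
    return base64.b64decode(restored)


# Build a tiny stand-in payload: the first bytes of a typical PE header.
payload = b"MZ\x90\x00" + bytes(6)
standard = base64.b64encode(payload).decode()
# Apply the obfuscation described above: swap 'A' for '#' and reverse.
custom = standard.replace("A", "#")[::-1]
decoded = decode_custom(custom)
```

In this example the obfuscated string starts with == and ends with qVT, mirroring the clues in the malicious JPEG, and decoded begins with MZ. With a real sample, comparing a hash of the decoded bytes against a known extraction confirms the result.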
The --stats option provides analysts with a statistical fingerprint of the encoding characters, making it easier to spot substitutions and other transformations without needing to reverse-engineer the attacker's extraction code. This is particularly valuable in incident response scenarios where only the steganographic carrier file is available.
Steganographic malware delivery (hiding executables inside images, audio files, or other media) remains a popular technique among threat actors because it can bypass network security controls that inspect file types or scan for known signatures. The Evil MSI Background technique, which inspired this analysis, involves embedding malicious payloads in JPEG files used as MSI installer backgrounds, a method that has been observed in recent phishing campaigns.
Stevens' update to base64dump.py is available on his blog at DidierStevens.com. The tool is part of a broader suite of open-source utilities he maintains for malware analysis, including byte-stats.py and translate.py. The SANS Internet Storm Center diary entry provides the full technical walkthrough, including command-line examples and output screenshots.