Burnyard: Open-Source Tool Keeps Malware Analysis Local to Avoid Alerting Attackers

Submitting a suspicious file to VirusTotal or MalwareBazaar places a copy of that file on a platform other people can search. Analysts across the industry rely on these services to get a quick verdict on whether a binary is dangerous. The convenience carries a condition many overlook. Once a sample reaches a public repository, the person who wrote it can locate it there. Skilled operators watch these platforms for the hashes of their own tools, and a match tells them their campaign has been detected. Files tied to a targeted intrusion can also carry sensitive material from the victim, which then sits on a third-party system.

Burnyard, a research project from The Ohio State University, takes aim at this condition. It runs suspicious binaries on the analyst's own hardware and keeps each sample local for the duration of the analysis. The tool performs dynamic analysis through user-space emulation, executing a sample one instruction at a time and intercepting every system call and Windows API call the program issues. A custom hook framework records each event with its decoded parameters and return value, producing a chronological trace in CSV form. That trace becomes the input to a classifier, which assigns the sample a label of benign or one of 43 known malware families. A transformer-based language model adds a plain-language description of the observed behavior.
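As a rough illustration of the trace step described above, the sketch below records intercepted calls in order and writes them out as CSV. It is not Burnyard's hook framework. The `TraceRecorder` class, the column names (`seq`, `api`, `params`, `retval`), and the two sample events are assumptions made up for this example.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class TraceRecorder:
    """Hypothetical recorder: collects intercepted calls in the order they occur."""
    events: list = field(default_factory=list)

    def record(self, api: str, params: dict, retval: int) -> None:
        # The sequence number preserves chronological order in the trace.
        self.events.append({
            "seq": len(self.events),
            "api": api,
            "params": ";".join(f"{k}={v}" for k, v in params.items()),
            "retval": hex(retval),
        })

    def to_csv(self) -> str:
        # Column names are assumptions, not the project's actual schema.
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=["seq", "api", "params", "retval"])
        writer.writeheader()
        writer.writerows(self.events)
        return buf.getvalue()


# Two invented events standing in for what a hook might capture.
recorder = TraceRecorder()
recorder.record(
    "CreateFileW",
    {"lpFileName": r"C:\Users\a\doc.txt", "dwDesiredAccess": "GENERIC_WRITE"},
    0x1A4,
)
recorder.record("WriteFile", {"hFile": "0x1a4", "nNumberOfBytesToWrite": 4096}, 1)
print(recorder.to_csv())
```

A flat, ordered table like this is easy to hand to a downstream classifier, which is presumably one reason to choose CSV as the trace format.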

The emulation layer operates at the instruction level and avoids the hypervisor stack that a sandbox depends on. Burnyard supports Windows (PE), Linux (ELF), and macOS (Mach-O) binaries across several CPU architectures. A supplied root filesystem provides the libraries, directories, and registry stubs a binary expects at runtime, so the analyst does not need a full installation of the target operating system. The design allows deployment on commodity hardware with no network connection. The team ran its evaluation on a Dell OptiPlex Micro 3050 with a 7th-generation Intel i5 processor and 16 GB of memory.
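Supporting several binary formats implies some way of telling them apart before emulation starts. The sketch below shows one common approach, checking the magic bytes at the start of the file. The function name `detect_format` is hypothetical, and the article does not say how Burnyard itself identifies formats.

```python
import struct

# Mach-O magic numbers as read big-endian from the first four bytes:
# 32-bit, 64-bit, their byte-swapped forms, and the fat/universal header.
MACHO_MAGICS = {0xFEEDFACE, 0xFEEDFACF, 0xCEFAEDFE, 0xCFFAEDFE, 0xCAFEBABE}


def detect_format(header: bytes) -> str:
    """Guess the executable format from the leading bytes of a file."""
    if header[:4] == b"\x7fELF":
        return "ELF"
    if header[:2] == b"MZ":
        # "MZ" only marks the DOS header; a stricter check would follow
        # the e_lfanew offset and confirm the "PE\0\0" signature.
        return "PE"
    if len(header) >= 4:
        magic = struct.unpack(">I", header[:4])[0]
        if magic in MACHO_MAGICS:
            # 0xCAFEBABE is shared with Java class files, so this is a guess.
            return "Mach-O"
    return "unknown"


print(detect_format(b"\x7fELF\x02\x01\x01"))        # ELF
print(detect_format(b"MZ\x90\x00"))                 # PE
print(detect_format(bytes.fromhex("cffaedfe")))     # Mach-O
```

To use it on a real file, read the first few bytes in binary mode and pass them in. Nothing is executed or uploaded at this stage.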

The authors timed Burnyard against VirusTotal and Sophos Intelix across 100 samples for each operating system category. For Windows samples, Burnyard averaged 22.41 seconds, compared with 32.36 seconds for VirusTotal and 182.88 seconds for Intelix. For Linux samples, Burnyard averaged 5.47 seconds, against 16.27 seconds for VirusTotal and 80.85 seconds for Intelix. The three figures are not like-for-like, because each platform's time covers different work. VirusTotal sends each sample to more than 70 engines, most of which perform static scanning, and its reported time reflects that aggregate response. Intelix provisions a dedicated sandbox for every submission and absorbs the cost of starting, running, and tearing down that environment. Burnyard's figure covers its local pipeline from metadata extraction through emulation and classification.
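The reported averages can be turned into ratios with a few lines of arithmetic. The sketch below uses only the six figures quoted above; the `speedups` helper is a name invented for this example. On those numbers, VirusTotal took roughly 1.4 times as long as Burnyard on Windows samples and about 3 times as long on Linux samples, while Intelix took roughly 8 and 15 times as long. The ratios inherit the caveat that the three times cover different work.

```python
# Average analysis time in seconds, as reported for 100 samples per category.
timings = {
    "Windows": {"Burnyard": 22.41, "VirusTotal": 32.36, "Intelix": 182.88},
    "Linux": {"Burnyard": 5.47, "VirusTotal": 16.27, "Intelix": 80.85},
}


def speedups(row: dict, baseline: str = "Burnyard") -> dict:
    """Return how many times longer each platform took than the baseline."""
    base = row[baseline]
    return {name: round(t / base, 2) for name, t in row.items() if name != baseline}


for os_name, row in timings.items():
    print(os_name, speedups(row))
```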

The classification pipeline covers 44 classes, comprising 43 malware families and one benign class. Families with larger sample counts, including Adware.Neoreklami, GCleaner, WannaCry, Socks5Systemz, and CobaltStrike, reach high recall. Families with thin training data, including QNAPCrypt with 10 samples, salty with 15, REvil with 21, and RemcosRAT with 22, reach lower recall. The errors cluster among families that share behavior. LockBit and Hive trade places because both produce encryption-heavy file operations. A group of remote access trojans, among them WarZoneRAT, njrat, nanocore, and netwire, overlap on process injection, keylogging-related calls, and command-and-control traffic. WannaCry stays well separated on the strength of its SMB-based spread.
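Recall for a family is the share of that family's samples the classifier labels correctly. The sketch below computes it per class on a handful of toy labels. The labels are invented to mimic a LockBit and Hive mix-up and are not Burnyard's results; `per_class_recall` is a hypothetical helper.

```python
from collections import Counter


def per_class_recall(y_true: list, y_pred: list) -> dict:
    """Recall per class: correct predictions divided by true samples of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Toy labels invented for illustration; NOT taken from the evaluation.
y_true = ["LockBit", "LockBit", "LockBit", "Hive", "Hive", "WannaCry", "WannaCry"]
y_pred = ["LockBit", "Hive", "LockBit", "LockBit", "Hive", "WannaCry", "WannaCry"]

recall = per_class_recall(y_true, y_pred)
for family, value in recall.items():
    print(f"{family}: {value:.2f}")
```

The same arithmetic shows why thin classes are fragile. With only 10 test samples for a family, each single miss costs 10 percentage points of recall.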

There is a catch worth sitting with. The head-to-head tests measure speed, and speed is the part Burnyard wins on. They skip the harder question of whether it gets the answer right. Nobody checked Burnyard's verdicts against the ones VirusTotal and Intelix hand back, so we still do not know if all three agree on what a given file is. Emulation comes with a weakness of its own. A careful piece of malware can sense when it is running inside a stripped-down environment. It watches the clock, it probes for API calls that should exist, and when something feels off, it goes quiet and hides what it really does. There is a second snag underneath that one, and the authors themselves flag it. When the emulator lacks a system or API call the binary wants, the binary can stall partway through, and the trace ends early, leaving a partial picture of what the program actually does.
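One way to reason about the coverage problem is to compare what a binary imports with what the emulator can handle, and to flag traces that stop without a clean exit. The sketch below is a heuristic under stated assumptions, not something the article attributes to Burnyard. The helper names and the example import and hook lists are invented, and a missing exit call is only a hint that a trace may be partial.

```python
def unhooked_imports(imports, hooked):
    """Return the imported API names the emulator has no hook for."""
    return sorted(set(imports) - set(hooked))


def trace_looks_partial(trace_apis, exit_apis=("ExitProcess", "TerminateProcess")):
    """Heuristic: an empty trace, or one not ending in an exit call, may be cut short."""
    return not trace_apis or trace_apis[-1] not in exit_apis


# Invented example data: a binary's import list and the emulator's hook set.
imports = ["CreateFileW", "WriteFile", "CryptEncrypt", "ExitProcess"]
hooked = ["CreateFileW", "WriteFile", "ExitProcess"]

print(unhooked_imports(imports, hooked))
print(trace_looks_partial(["CreateFileW", "WriteFile"]))
print(trace_looks_partial(["CreateFileW", "WriteFile", "ExitProcess"]))
```

A check like this would not catch deliberate evasion, where the malware exits cleanly after detecting the emulator. It only surfaces the simpler case of a run that stopped because a call was unsupported.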

None of this sinks the idea. Burnyard is chasing something people want. Air-gapped sites, government labs, and privacy-sensitive shops all need a way to study malware that keeps the file on a local disk and the whole setup in a closet. A used desktop pulling that off is a real result. The job from here is to prove the verdict it produces holds up next to the tools analysts already lean on.