Trojan Detection Sharpened by Behavioral Feature Selection, Not Bigger Models

Malware analysts spend a lot of time deciding which signals from a sandbox run are worth keeping. A sample executed in a controlled environment can generate hundreds of measurable attributes covering file structure, registry edits, process behavior, and network traffic. Most of those attributes add noise. A recent study works through this problem in detail, and the part that earns attention from working defenders is the feature selection, not the deep learning model attached to it.

The team built a detection framework for Windows-based IoT and industrial IoT gateways. They assembled 3,000 Windows executables, ran each one through the ANY.RUN sandbox, and recorded behavioral, static, and network-level data for every sample. The samples were labeled benign, suspicious, or malicious. From the raw output, they pulled an initial pool of 146 features and reduced it to a working set of 33. A custom neural network they call TrDNN then classified the samples, and they compared it against ten common machine learning and deep learning models. The authors report strong classification results. For a cybersecurity reader, the more useful material sits in how the 33 features were chosen and what those features say about current Trojan tradecraft.

The retained features map to the stages of a Trojan compromise. Persistence shows up through registry autorun keys, scheduled tasks, Windows service installation, and startup-folder edits. Execution and evasion appear through process injection into trusted processes such as explorer.exe and svchost.exe, memory-allocation calls, hidden-window execution, and User Account Control tampering. Command-and-control activity comes through in low-jitter beaconing intervals, HTTP POST and PUT patterns that point to data exfiltration, encrypted outbound bursts, and traffic concentrated on a small number of endpoints. Binary-level signals round it out, including PE header anomalies, high section entropy, and unsigned executables sitting in system directories.
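Two of these signals, low-jitter beaconing and high section entropy, reduce to short calculations. The sketch below is illustrative and is not taken from the study: the function names are hypothetical, jitter is approximated as the coefficient of variation of the gaps between outbound connections, and entropy is plain Shannon entropy over a section's bytes.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def beacon_jitter(timestamps):
    """Coefficient of variation of the gaps between connection times.

    Values near 0 mean the connections are evenly spaced, which is the
    pattern expected from automated beaconing. Returns None when there
    are too few connections to judge.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        return None
    average_gap = mean(gaps)
    if average_gap == 0:
        return None
    return pstdev(gaps) / average_gap


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

A process that connects out every 60 seconds scores a jitter of 0.0, while interactive traffic with irregular gaps scores well above that. Packed or encrypted sections tend toward the 8.0 ceiling. Any cutoff applied to either value would need tuning against local data.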

The exclusions are equally informative. The team dropped privilege-token manipulation, generic HTTP communication chains, and abuse of living-off-the-land binaries such as PowerShell and regsvr32. These behaviors carry real weight in an investigation, but they also appear across ransomware, worms, and red-team tooling, which lowers their value for separating Trojans from everything else. That reasoning is a reminder that a signal can matter in an investigation and still be a poor discriminator for any one threat type.
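The exclusion logic can be made concrete with a toy calculation. The sketch below is an assumption-laden illustration, not the study's selection procedure: the family labels, feature names, and sample data are invented, and the score is simply how often a feature fires in the target family minus its highest firing rate in any other family.

```python
from collections import defaultdict


def firing_rates(samples, feature):
    """Fraction of samples in each family in which the feature is present."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for family, features in samples:
        totals[family] += 1
        if feature in features:
            hits[family] += 1
    return {family: hits[family] / totals[family] for family in totals}


def specificity_gap(samples, feature, target="trojan"):
    """Target-family firing rate minus the highest rate in any other family.

    A value near 1.0 means the feature is close to unique to the target.
    Zero or negative means other families trigger it at least as often.
    """
    rates = firing_rates(samples, feature)
    other_rates = [rate for family, rate in rates.items() if family != target]
    return rates.get(target, 0.0) - max(other_rates, default=0.0)


# Hypothetical toy data: (family label, set of observed behaviors).
SAMPLES = [
    ("trojan", {"low_jitter_beacon", "lolbin_use"}),
    ("trojan", {"low_jitter_beacon"}),
    ("ransomware", {"lolbin_use"}),
    ("ransomware", {"lolbin_use"}),
    ("worm", {"lolbin_use"}),
    ("worm", set()),
]
```

On this toy data, specificity_gap(SAMPLES, "low_jitter_beacon") is 1.0 and specificity_gap(SAMPLES, "lolbin_use") is -0.5. The second feature fires often, but it fires more often outside the target family, which is the situation the exclusions describe.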

The researchers ran the framework as a continuous monitoring loop driven by the Windows command line, using built-in utilities such as tasklist, netstat, and wmic to enumerate processes, extract the 33 features, and pass them to the trained model. They report stable operation on a standard enterprise workstation with an Intel Core i7 processor and 32 GB of RAM, with no GPU or specialized hardware. The loop runs on a three-minute cycle, which they settled on after stress testing. That setup matters for environments with operator workstations, human-machine interfaces, and supervisory systems, where Windows is common and spare compute is limited.
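A loop of this kind can be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not the researchers' implementation: it covers only the process-enumeration step through tasklist with its CSV output option, the function names are hypothetical, and the score callback stands in for feature extraction and the trained model.

```python
import csv
import io
import subprocess
import time

CYCLE_SECONDS = 180  # the three-minute cycle described in the study


def parse_tasklist_csv(text):
    """Turn `tasklist /FO CSV /NH` output into a list of name/PID records."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        # Expected columns: image name, PID, session name, session number, memory.
        if len(row) < 2 or not row[1].isdigit():
            continue
        records.append({"name": row[0], "pid": int(row[1])})
    return records


def snapshot_processes():
    """Run tasklist once (Windows only) and return the parsed process list."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_tasklist_csv(result.stdout)


def monitor(score, cycles=None):
    """Call score(processes) once per cycle; cycles=None runs until interrupted.

    `score` is a placeholder for feature extraction plus model inference.
    """
    completed = 0
    while cycles is None or completed < cycles:
        score(snapshot_processes())
        completed += 1
        time.sleep(CYCLE_SECONDS)
```

Calling monitor(print, cycles=1) on a Windows host would print one process snapshot and then wait out the cycle. A fuller version would add netstat parsing for the network features in the same pattern.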

The researchers are direct about the constraints. The dataset is moderate in size and comes from a single sandbox source, which raises the question of how well the model generalizes to samples it has never seen. Trojans engineered to stay dormant may never surface during a given monitoring window, since the system depends on observing live behavior. Sophisticated malware that detects sandbox conditions can suppress its activity and feed the model misleading data. The platform constraint carries the most operational weight: the pipeline targets Windows, while many IoT devices run embedded Linux, real-time operating systems, or microcontroller firmware.

The transferable lesson runs deeper than one model. Strong detection came from disciplined, domain-informed feature work that isolated behaviors specific to Trojan activity. Defenders can apply that thinking to their own pipelines: identify the signals tied to a threat's lifecycle, discard the ones that fire across every category, and keep the detection logic understandable to the analysts who maintain it.