CVE-2021-42694
Description
Unicode Standard homoglyph definitions allow deceptive source code identifiers visually identical to targets, enabling supply-chain attacks via adversarial identifier injection.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
Vulnerability
The Unicode Specification through version 14.0 defines homoglyphs: distinct characters that render identically, or nearly identically, to one another [1]. For example, Latin "a" (U+0061) and Cyrillic "а" (U+0430) look the same in most fonts but are different code points. Any software that accepts Unicode identifiers is affected unless it applies mitigations. An adversary can define function or variable names that substitute homoglyphs for the original characters, so that they look exactly like legitimate identifiers in source code but are treated as different names by compilers and interpreters [2][4]. Separately, Unicode bidirectional control characters such as Right-to-Left Override (U+202E) can visually reorder tokens, making commented-out code appear executable, as demonstrated in JavaScript examples [4]; that related technique is tracked under its own identifier, CVE-2021-42574.
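The homoglyph mechanism can be illustrated with a minimal Python sketch. The identifier `scope` and the helper `describe` are hypothetical names chosen for illustration and do not come from the cited references; the sketch only shows that two strings that look alike can be distinct identifiers.

```python
import unicodedata

latin = "scope"          # every letter is Latin
spoofed = "s\u0441ope"   # second letter is U+0441, which renders like Latin "c"


def describe(identifier: str) -> list[str]:
    """Return the Unicode name of each character in an identifier (illustrative helper)."""
    return [unicodedata.name(ch) for ch in identifier]


# The two strings render alike in most fonts but are different identifiers.
print(latin == spoofed)              # False
print(spoofed.isidentifier())        # True: Python accepts it as a name
print(describe(spoofed)[1])          # CYRILLIC SMALL LETTER ES

# NFKC normalization leaves the Cyrillic letter unchanged, so it does not
# merge the spoofed name with the Latin one.
print(unicodedata.normalize("NFKC", spoofed) == latin)  # False
```

Printing the character names, rather than the characters themselves, is what exposes the substitution, since the rendered glyphs give a reviewer nothing to go on.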
Exploitation
To exploit this vulnerability, an attacker needs the ability to contribute or modify source code in an upstream software dependency, typically through a public repository or package manager [1]. The attacker inserts homoglyph identifiers into the dependency, relying on the fact that human code reviewers see the same glyphs as in the legitimate identifier [2][4]. No special network position or authentication is required beyond ordinary contribution access; the attack depends on deceiving reviewers during code review. The attacker may also use Unicode bidirectional control characters to visually rearrange lines of code, so that comments appear to be active code or the displayed control flow differs from what actually executes [4].
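Because the control characters mentioned above are invisible in most editors, a reviewer usually needs tooling to see them. The following is a rough sketch of such a scan, not a vetted tool; the function name `find_bidi_controls` and the sample input are hypothetical, and the character set lists the bidirectional embedding, override, and isolate controls.

```python
# Bidirectional formatting characters that can reorder displayed text.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def find_bidi_controls(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code point) for each bidi control character in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits


# Hypothetical contributed snippet containing a Right-to-Left Override.
sample = "a = 1\nb = '\u202e x'\n"
print(find_bidi_controls(sample))  # [(2, 6, 'U+202E')]
```

A check like this could run on changed files in a pull request and fail the build on any hit, leaving legitimate uses (for example, right-to-left text in string literals) to be allowed case by case.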
Impact
Successful exploitation can lead to arbitrary code injection in downstream software that includes the compromised dependency. Depending on the context of the injected code, the attacker may achieve full application compromise [1][4]. The impact spans confidentiality, integrity, and availability: injected code could steal credentials, modify data, or enable remote code execution. The attack exploits trust in human-readable identifiers and is particularly dangerous because the deceptive code appears legitimate in most development environments [2][4].
Mitigation
The Unicode Consortium provides guidance in Unicode Technical Standard #39 (UTS #39, Unicode Security Mechanisms) and Unicode Technical Report #36 (UTR #36, Unicode Security Considerations) [1]. Recommended mitigations include using identifier security profiles that restrict the allowed Unicode characters, normalizing identifiers (e.g., NFKC), and implementing visual spoofing detection such as confusable and mixed-script checks. Normalization alone is not sufficient, because it does not merge homoglyphs from different scripts (Latin "a" and Cyrillic "а" remain distinct after NFKC). Developers should use tools that flag homoglyph characters and control characters during code review and in CI/CD pipelines. As of Unicode 14.0.0 (September 2021), the character definitions that enable homoglyphs remain in the standard, so software vendors must apply the recommended filters themselves [3]. No single patch exists for the Unicode standard itself; individual applications and build systems must enforce proper identifier validation.
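As a rough illustration of the kind of CI check described above, the sketch below flags non-ASCII identifiers in Python source and reports which scripts they mix. It is a heuristic under stated assumptions, not an implementation of the UTS #39 profiles: the Python standard library does not expose the Unicode Script property, so `rough_script` (a hypothetical helper) approximates it from the first word of each character's Unicode name. The function names and the sample source are also hypothetical.

```python
import io
import tokenize
import unicodedata


def rough_script(ch: str) -> str:
    """Approximate a character's script from the first word of its Unicode name.

    Heuristic only: this is not the Unicode Script property used by UTS #39.
    """
    if ch.isdigit() or ch == "_":
        return "COMMON"
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]


def suspicious_identifiers(source: str) -> list[tuple[int, str, list[str]]]:
    """Return (line, identifier, scripts) for every non-ASCII identifier in source."""
    findings = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.NAME or tok.string.isascii():
            continue
        scripts = sorted({rough_script(ch) for ch in tok.string} - {"COMMON"})
        findings.append((tok.start[0], tok.string, scripts))
    return findings


# Hypothetical dependency code: the second function name uses Cyrillic U+0430.
sample = (
    "def is_admin(user):\n"
    "    return False\n"
    "\n"
    "def is_\u0430dmin(user):\n"
    "    return True\n"
)
for line, name, scripts in suspicious_identifiers(sample):
    print(line, ascii(name), scripts)  # 4 'is_\u0430dmin' ['CYRILLIC', 'LATIN']
```

Reporting the identifier through `ascii()` makes the substituted code point visible in logs. A project that legitimately uses non-Latin identifiers would need an allow-list or a proper confusables check rather than this blanket non-ASCII rule.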
AI Insight generated on May 26, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected products
- The Unicode Consortium / Unicode Specification (source: description)
- Range: <=14.0
Patches
No patches discovered yet.
Vulnerability mechanics
No source-code context for this CVE — mechanics is only generated when we can read the actual fix diff. Without that, the four sections (root cause, attack vector, affected code, fix) would be speculation rather than analysis.
References
- security.gentoo.org/glsa/202210-09 (mitre, vendor-advisory)
- www.kb.cert.org/vuls/id/999008 (mitre, third-party-advisory)
- www.openwall.com/lists/oss-security/2021/11/01/1 (mitre, mailing-list)
- www.openwall.com/lists/oss-security/2021/11/01/6 (mitre, mailing-list)
- www.unicode.org/versions/Unicode14.0.0/ (mitre)
- cwe.mitre.org/data/definitions/1007.html (mitre)
- trojansource.codes (mitre)
- www.scyon.nl/post/trojans-in-your-source-code (mitre)
- www.unicode.org/reports/tr36/ (mitre)
- www.unicode.org/reports/tr39/ (mitre)