VYPR
Moderate severity · NVD Advisory · Published Dec 2, 2018 · Updated Dec 18, 2025

CVE-2018-19787

Description

An issue was discovered in lxml before 4.2.5. lxml/html/clean.py in the lxml.html.clean module does not remove javascript: URLs that use escaping, allowing a remote attacker to conduct XSS attacks, as demonstrated by "j a v a s c r i p t:" in Internet Explorer. This is a similar issue to CVE-2014-3146.

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

lxml before 4.2.5 fails to sanitize escaped javascript: URLs in lxml.html.clean, enabling XSS attacks.

Vulnerability

The lxml library before version 4.2.5 contains a cross-site scripting (XSS) vulnerability in the lxml.html.clean module, specifically in lxml/html/clean.py. The module does not remove javascript: URLs that use escaping. The advisory demonstrates the problem with "j a v a s c r i p t:", which Internet Explorer interprets as a valid javascript: scheme; the fix commit's tests cover URL-escaped variants such as "javascrip%20t:". This is a similar issue to CVE-2014-3146 [1].

Exploitation

An attacker can exploit this vulnerability by submitting HTML that contains an escaped javascript: URL, for example in a link's href attribute, to a web application that relies on lxml.html.clean to sanitize untrusted markup. Because the cleaner leaves the escaped URL in place, the application then serves it to other users. The payload takes effect for victims whose browser treats the escaped form as a javascript: scheme; the advisory demonstrates this with Internet Explorer. The attacker needs no privileges beyond the ability to submit content that the application sanitizes and displays [1][2].

Impact

Successful exploitation allows a remote attacker to conduct XSS attacks. The attacker can execute arbitrary JavaScript in the context of the victim's browser session, potentially stealing cookies, session tokens, or performing actions on behalf of the user. The impact is limited to the browser's security context and the web application using lxml for HTML sanitization [1][2].

Mitigation

The vulnerability is fixed in lxml version 4.2.5; the fix commit is dated September 9, 2018. Users should upgrade to lxml 4.2.5 or later. Ubuntu provided an update for Ubuntu 12.04 ESM as USN-3841-2 [1][4]. As of the CVE publication date, the only known workaround short of upgrading was to stop using lxml.html.clean on untrusted HTML content [2][4].
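A quick way to see whether an environment carries the fixed release is to compare the installed version against 4.2.5. The following is a minimal sketch, not an official check: the helper name is_patched is hypothetical, and the only lxml attribute used is etree.LXML_VERSION, a tuple of integers. A distribution package that backports the fix without changing the upstream version number would be reported as unpatched by this comparison.

```python
def is_patched(version: tuple) -> bool:
    """Return True if an lxml version tuple is at or above 4.2.5.

    Hypothetical helper: compares only major, minor, and patch numbers.
    """
    return tuple(version[:3]) >= (4, 2, 5)


try:
    from lxml import etree  # third-party; may not be installed
except ImportError:
    etree = None

if etree is not None:
    # etree.LXML_VERSION is a tuple of ints such as (4, 2, 5, 0).
    print("lxml at or above 4.2.5:", is_patched(etree.LXML_VERSION))
else:
    print("lxml is not installed in this environment")
```

The comparison relies on Python's element-wise tuple ordering, so (4, 2, 4, 0) is treated as vulnerable and (4, 2, 5, 0) or anything later as fixed.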

AI Insight generated on May 22, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected packages

Versions sourced from the GitHub Security Advisory.

Package       Affected versions   Patched versions
lxml (PyPI)   < 4.2.5             4.2.5

Affected products

64

Patches

1
6be1d081b49c

Fix: make the cleaner also remove javascript URLs that use escaping.

https://github.com/lxml/lxml · Stefan Behnel · Sep 9, 2018 · via ghsa
2 files changed · +6 -5
  • src/lxml/html/clean.py · +3 -2 · modified
    @@ -8,9 +8,10 @@
     import copy
     try:
         from urlparse import urlsplit
    +    from urllib import unquote_plus
     except ImportError:
         # Python 3
    -    from urllib.parse import urlsplit
    +    from urllib.parse import urlsplit, unquote_plus
     from lxml import etree
     from lxml.html import defs
     from lxml.html import fromstring, XHTML_NAMESPACE
    @@ -482,7 +483,7 @@ def _kill_elements(self, doc, condition, iterate=None):
     
         def _remove_javascript_link(self, link):
             # links like "j a v a s c r i p t:" might be interpreted in IE
    -        new = _substitute_whitespace('', link)
    +        new = _substitute_whitespace('', unquote_plus(link))
             if _is_javascript_scheme(new):
                 # FIXME: should this be None to delete?
                 return ''
    
  • src/lxml/html/tests/test_clean.txt · +3 -3 · modified
    @@ -18,7 +18,7 @@
     ...   <body onload="evil_function()">
     ...     <!-- I am interpreted for EVIL! -->
     ...     <a href="javascript:evil_function()">a link</a>
    -...     <a href="j\x01a\x02v\x03a\x04s\x05c\x06r\x07i\x0Ep t:evil_function()">a control char link</a>
    +...     <a href="j\x01a\x02v\x03a\x04s\x05c\x06r\x07i\x0Ep t%20:evil_function()">a control char link</a>
     ...     <a href="data:text/html;base64,PHNjcmlwdD5hbGVydCgidGVzdCIpOzwvc2NyaXB0Pg==">data</a>
     ...     <a href="#" onclick="evil_function()">another link</a>
     ...     <p onclick="evil_function()">a paragraph</p>
    @@ -51,7 +51,7 @@
       <body onload="evil_function()">
         <!-- I am interpreted for EVIL! -->
         <a href="javascript:evil_function()">a link</a>
    -    <a href="javascrip t:evil_function()">a control char link</a>
    +    <a href="javascrip t%20:evil_function()">a control char link</a>
         <a href="data:text/html;base64,PHNjcmlwdD5hbGVydCgidGVzdCIpOzwvc2NyaXB0Pg==">data</a>
         <a href="#" onclick="evil_function()">another link</a>
         <p onclick="evil_function()">a paragraph</p>
    @@ -84,7 +84,7 @@
       <body onload="evil_function()">
         <!-- I am interpreted for EVIL! -->
         <a href="javascript:evil_function()">a link</a>
    -    <a href="javascrip%20t:evil_function()">a control char link</a>
    +    <a href="javascrip%20t%20:evil_function()">a control char link</a>
         <a href="data:text/html;base64,PHNjcmlwdD5hbGVydCgidGVzdCIpOzwvc2NyaXB0Pg==">data</a>
         <a href="#" onclick="evil_function()">another link</a>
         <p onclick="evil_function()">a paragraph</p>
    

Vulnerability mechanics

Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
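The one-line change to _remove_javascript_link in the fix commit can be illustrated with a standalone sketch. It uses only the standard library and does not import lxml. The two regular expressions and both function names are simplified, hypothetical stand-ins for the module's private helpers, so the real patterns in clean.py may differ. The only point shown is the order of operations: the patched code URL-decodes the link with unquote_plus before stripping whitespace and testing the scheme.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical, simplified stand-ins for the private helpers in clean.py.
_strip_whitespace = re.compile(r"[\s\x00-\x1f]+").sub
_is_js_scheme = re.compile(r"^javascript:", re.IGNORECASE).search


def is_js_link_before_fix(link: str) -> bool:
    """Pre-patch order: strip whitespace and control characters only."""
    return bool(_is_js_scheme(_strip_whitespace("", link)))


def is_js_link_after_fix(link: str) -> bool:
    """Patched order: URL-decode first, then strip, then test the scheme."""
    return bool(_is_js_scheme(_strip_whitespace("", unquote_plus(link))))


samples = [
    "javascript:evil_function()",
    "j a v a s c r i p t:evil_function()",
    "javascrip%20t:evil_function()",
    "https://example.com/a%20b",
]
for href in samples:
    print(repr(href), is_js_link_before_fix(href), is_js_link_after_fix(href))
```

In this sketch the literal-whitespace form is caught by both versions, while the %20 form is caught only after decoding. That matches the commit message's description of removing javascript URLs that use escaping.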

References

10
