VYPR
Medium severity · 6.1 · NVD Advisory · Published May 14, 2014 · Updated May 6, 2026

CVE-2014-3146

Description

Incomplete blacklist vulnerability in the lxml.html.clean module in lxml before 3.3.5 allows remote attackers to conduct cross-site scripting (XSS) attacks via control characters in the link scheme to the clean_html function.
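
The gap can be reproduced with the standard library alone. The following is a minimal sketch, assuming Python 3 and the pre-3.3.5 patterns that appear as removed lines in fix commit e86b294f1f81. `old_is_blocked` is a hypothetical helper name, not part of lxml, and the obfuscated link is the one used in the test added by that commit.

```python
import re

# Pre-3.3.5 patterns, copied from the removed lines of the fix commit.
old_scheme_re = re.compile(
    r'\s*(?:javascript|jscript|livescript|vbscript|data|about|mocha):', re.I)
old_substitute_whitespace = re.compile(r'\s+').sub


def old_is_blocked(link):
    """Hypothetical helper: True if the old check would blank this link."""
    new = old_substitute_whitespace('', link)
    return old_scheme_re.search(new) is not None


plain = "javascript:evil_function()"
# Control characters \x01-\x07 and \x0E are not matched by \s, so they
# survive the whitespace stripping and break up the scheme name.
obfuscated = "j\x01a\x02v\x03a\x04s\x05c\x06r\x07i\x0Ep t:evil_function()"

print(old_is_blocked(plain))       # True
print(old_is_blocked(obfuscated))  # False
```

The second link passes the old check unchanged, which is the incomplete-blacklist behavior the description refers to.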

Affected packages

Versions sourced from the GitHub Security Advisory.

Package        Affected versions    Patched versions
lxml (PyPI)    < 3.3.5              3.3.5

Affected products

  • Lxml/Lxml · 95 versions
    • cpe:2.3:a:lxml:lxml:2.1.2:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.1.3:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.1.4:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.2:-:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.2:alpha1:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.2:beta1:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.2:beta2:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.2:beta3:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.2:beta4:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.2.1:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.2.2:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.2.3:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.2.4:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:*:*:*:*:*:*:*:* (range: <=3.3.4)
    • cpe:2.3:a:lxml:lxml:0.5:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:0.5.1:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:0.6:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:0.7:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:0.8:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:0.9:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:0.9.1:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:0.9.2:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:1.0:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:1.0.1:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:1.0.2:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:1.0.3:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:1.0.4:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:1.1:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:1.1.1:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:1.1.2:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:1.2:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:1.2.1:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:1.3:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:1.3.1:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:1.3.2:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:1.3.3:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:1.3.4:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:1.3.5:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:1.3.6:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.0:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.0.1:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.0.2:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.0.3:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.0.4:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.0.5:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.0.6:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.0.7:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.0.8:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.0.9:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.0.10:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.0.11:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.1:alpha1:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.1:beta1:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.1:beta2:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.1:beta3:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.1.1:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.2.4:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.2.5:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.3.0:-:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.3.0:beta1:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.3.0:beta2:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.3.0:beta3:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.3.0:beta4:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.3.0:beta5:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.3.1:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.3.2:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.3.3:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.2.5:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.2.6:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.2.7:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.2.8:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.3:-:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.3:alpha1:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.3:alpha2:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.3:beta1:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.3.1:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.3.2:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.3.3:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.3.4:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.3.5:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:2.3.6:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.0:-:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.0:alpha1:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.0:alpha2:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.0:beta1:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.0.1:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.0.2:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.1:beta1:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.1.0:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.1.1:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.1.2:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.2.0:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.2.1:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.2.2:*:*:*:*:*:*:*
    • cpe:2.3:a:lxml:lxml:3.2.3:*:*:*:*:*:*:*

Patches

3f3082e0a678

Merge branch lxml-4.2 into master.

https://github.com/lxml/lxml · Stefan Behnel · Sep 9, 2018 · via ghsa
6 files changed · +29 -14
  • CHANGES.txt +9 -3 modified
    @@ -3,14 +3,20 @@ lxml changelog
     ==============
     
     4.3.0 (2018-??-??)
    -==================
    -
     Features added
     --------------
    -
     * The module ``lxml.sax`` is compiled using Cython in order to speed it up.
     
     
    +4.2.5 (2018-09-09)
    +==================
    +
    +Bugs fixed
    +----------
    +
    +* Javascript URLs that used URL escaping were not removed by the HTML cleaner.
    +  Security problem found by Omar Eissa.
    +
     
     4.2.4 (2018-08-03)
     ==================
    
  • doc/main.txt +7 -3 modified
    @@ -157,8 +157,8 @@ Index <http://pypi.python.org/pypi/lxml/>`_ (PyPI).  It has the source
     that compiles on various platforms.  The source distribution is signed
     with `this key <pubkey.asc>`_.
     
    -The latest version is `lxml 4.2.4`_, released 2018-08-03
    -(`changes for 4.2.4`_).  `Older versions <#old-versions>`_
    +The latest version is `lxml 4.2.5`_, released 2018-09-09
    +(`changes for 4.2.5`_).  `Older versions <#old-versions>`_
     are listed below.
     
     Please take a look at the
    @@ -250,7 +250,9 @@ See the websites of lxml
     ..
        and the `latest in-development version <http://lxml.de/dev/>`_.
     
    -.. _`PDF documentation`: lxmldoc-4.2.4.pdf
    +.. _`PDF documentation`: lxmldoc-4.2.5.pdf
    +
    +* `lxml 4.2.5`_, released 2018-09-09 (`changes for 4.2.5`_)
     
     * `lxml 4.2.4`_, released 2018-08-03 (`changes for 4.2.4`_)
     
    @@ -272,6 +274,7 @@ See the websites of lxml
     
     * `older releases <http://lxml.de/3.7/#old-versions>`_
     
    +.. _`lxml 4.2.5`: /files/lxml-4.2.5.tgz
     .. _`lxml 4.2.4`: /files/lxml-4.2.4.tgz
     .. _`lxml 4.2.3`: /files/lxml-4.2.3.tgz
     .. _`lxml 4.2.2`: /files/lxml-4.2.2.tgz
    @@ -282,6 +285,7 @@ See the websites of lxml
     .. _`lxml 4.0.0`: /files/lxml-4.0.0.tgz
     .. _`lxml 3.8.0`: /files/lxml-3.8.0.tgz
     
    +.. _`changes for 4.2.5`: /changes-4.2.5.html
     .. _`changes for 4.2.4`: /changes-4.2.4.html
     .. _`changes for 4.2.3`: /changes-4.2.3.html
     .. _`changes for 4.2.2`: /changes-4.2.2.html
    
  • doc/rest2html.py +1 -1 modified
    @@ -38,7 +38,7 @@ def pygments_directive(name, arguments, options, content, lineno,
                            content_offset, block_text, state, state_machine):
         try:
             lexer = get_lexer_by_name(arguments[0])
    -    except ValueError, e:
    +    except ValueError:
             # no lexer found - use the text one instead of an exception
             lexer = TextLexer()
         # take an arbitrary option if more than one is given
    
  • src/lxml/html/clean.py +3 -2 modified
    @@ -8,9 +8,10 @@
     import copy
     try:
         from urlparse import urlsplit
    +    from urllib import unquote_plus
     except ImportError:
         # Python 3
    -    from urllib.parse import urlsplit
    +    from urllib.parse import urlsplit, unquote_plus
     from lxml import etree
     from lxml.html import defs
     from lxml.html import fromstring, XHTML_NAMESPACE
    @@ -477,7 +478,7 @@ def _kill_elements(self, doc, condition, iterate=None):
     
         def _remove_javascript_link(self, link):
             # links like "j a v a s c r i p t:" might be interpreted in IE
    -        new = _substitute_whitespace('', link)
    +        new = _substitute_whitespace('', unquote_plus(link))
             if _is_javascript_scheme(new):
                 # FIXME: should this be None to delete?
                 return ''
    
  • src/lxml/html/tests/test_clean.txt +3 -3 modified
    @@ -18,7 +18,7 @@
     ...   <body onload="evil_function()">
     ...     <!-- I am interpreted for EVIL! -->
     ...     <a href="javascript:evil_function()">a link</a>
    -...     <a href="j\x01a\x02v\x03a\x04s\x05c\x06r\x07i\x0Ep t:evil_function()">a control char link</a>
    +...     <a href="j\x01a\x02v\x03a\x04s\x05c\x06r\x07i\x0Ep t%20:evil_function()">a control char link</a>
     ...     <a href="data:text/html;base64,PHNjcmlwdD5hbGVydCgidGVzdCIpOzwvc2NyaXB0Pg==">data</a>
     ...     <a href="#" onclick="evil_function()">another link</a>
     ...     <p onclick="evil_function()">a paragraph</p>
    @@ -51,7 +51,7 @@
       <body onload="evil_function()">
         <!-- I am interpreted for EVIL! -->
         <a href="javascript:evil_function()">a link</a>
    -    <a href="javascrip t:evil_function()">a control char link</a>
    +    <a href="javascrip t%20:evil_function()">a control char link</a>
         <a href="data:text/html;base64,PHNjcmlwdD5hbGVydCgidGVzdCIpOzwvc2NyaXB0Pg==">data</a>
         <a href="#" onclick="evil_function()">another link</a>
         <p onclick="evil_function()">a paragraph</p>
    @@ -84,7 +84,7 @@
       <body onload="evil_function()">
         <!-- I am interpreted for EVIL! -->
         <a href="javascript:evil_function()">a link</a>
    -    <a href="javascrip%20t:evil_function()">a control char link</a>
    +    <a href="javascrip%20t%20:evil_function()">a control char link</a>
         <a href="data:text/html;base64,PHNjcmlwdD5hbGVydCgidGVzdCIpOzwvc2NyaXB0Pg==">data</a>
         <a href="#" onclick="evil_function()">another link</a>
         <p onclick="evil_function()">a paragraph</p>
    
  • tools/manylinux/build-wheels.sh +6 -2 modified
    @@ -24,12 +24,16 @@ build_wheel() {
                 -w /io/$WHEELHOUSE
     }
     
    -assert_importable() {
    +run_tests() {
         # Install packages and test
         for PYBIN in /opt/python/*/bin/; do
             ${PYBIN}/pip install $PACKAGE --no-index -f /io/$WHEELHOUSE
     
    +        # check import as a quick test
             (cd $HOME; ${PYBIN}/python -c 'import lxml.etree, lxml.objectify')
    +
    +        # run tests
    +        (cd $HOME; ${PYBIN}/python /io/test.py)
         done
     }
     
    @@ -74,5 +78,5 @@ show_wheels() {
     prepare_system
     build_wheels
     repair_wheels
    -assert_importable
    +run_tests
     show_wheels
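
The `clean.py` change in this merge can be tried in isolation. What follows is a minimal sketch, assuming Python 3 and the regular expressions introduced by commit e86b294f1f81. The standalone `remove_javascript_link` function and its `unquote` flag are hypothetical, added here only to compare the check with and without `unquote_plus`. The input is the link from the updated test above.

```python
import re
from urllib.parse import unquote_plus

# Patterns as introduced by e86b294f1f81 (lxml 3.3.5).
is_javascript_scheme = re.compile(
    r'(?:javascript|jscript|livescript|vbscript|data|about|mocha):',
    re.I).search
substitute_whitespace = re.compile(r'[\s\x00-\x08\x0B\x0C\x0E-\x19]+').sub


def remove_javascript_link(link, unquote=True):
    """Hypothetical stand-in for the method changed in clean.py.

    unquote=False mimics the check before this merge,
    unquote=True mimics the check after it.
    """
    candidate = unquote_plus(link) if unquote else link
    new = substitute_whitespace('', candidate)
    if is_javascript_scheme(new):
        return ''
    return link


link = "j\x01a\x02v\x03a\x04s\x05c\x06r\x07i\x0Ep t%20:evil_function()"

# Before: "%20" sits between the scheme name and the colon, so no match.
print(remove_javascript_link(link, unquote=False) == link)  # True
# After: "%20" is decoded to a space and stripped, so the link is blanked.
print(repr(remove_javascript_link(link)))                   # ''
```

As in the diff, the decoded string is used only for the check; a link that passes is returned in its original, still-escaped form.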
    
86e81ab393ba

changelog

https://github.com/lxml/lxml · Stefan Behnel · Apr 17, 2014 · via ghsa
1 file changed · +10 -0
  • CHANGES.txt +10 -0 modified
    @@ -2,6 +2,16 @@
     lxml changelog
     ==============
     
    +3.3.5 (???)
    +==================
    +
    +Bugs fixed
    +----------
    +
    +* HTML cleaning could fail to strip javascript links that mix control
    +  characters into the link scheme.
    +
    +
     3.3.4 (2014-04-03)
     ==================
     
    
e86b294f1f81

strip control characters before looking for evil text content in Cleaner

https://github.com/lxml/lxml · Stefan Behnel · Apr 17, 2014 · via ghsa
2 files changed · +13 -5
  • src/lxml/html/clean.py +5 -4 modified
    @@ -70,9 +70,10 @@
     
     # All kinds of schemes besides just javascript: that can cause
     # execution:
    -_javascript_scheme_re = re.compile(
    -    r'\s*(?:javascript|jscript|livescript|vbscript|data|about|mocha):', re.I)
    -_substitute_whitespace = re.compile(r'\s+').sub
    +_is_javascript_scheme = re.compile(
    +    r'(?:javascript|jscript|livescript|vbscript|data|about|mocha):',
    +    re.I).search
    +_substitute_whitespace = re.compile(r'[\s\x00-\x08\x0B\x0C\x0E-\x19]+').sub
     # FIXME: should data: be blocked?
     
     # FIXME: check against: http://msdn2.microsoft.com/en-us/library/ms537512.aspx
    @@ -466,7 +467,7 @@ def _kill_elements(self, doc, condition, iterate=None):
         def _remove_javascript_link(self, link):
             # links like "j a v a s c r i p t:" might be interpreted in IE
             new = _substitute_whitespace('', link)
    -        if _javascript_scheme_re.search(new):
    +        if _is_javascript_scheme(new):
                 # FIXME: should this be None to delete?
                 return ''
             return link
    
  • src/lxml/html/tests/test_clean.txt +8 -1 modified
    @@ -1,3 +1,4 @@
    +>>> import re
     >>> from lxml.html import fromstring, tostring
     >>> from lxml.html.clean import clean, clean_html, Cleaner
     >>> from lxml.html import usedoctest
    @@ -17,6 +18,7 @@
     ...   <body onload="evil_function()">
     ...     <!-- I am interpreted for EVIL! -->
     ...     <a href="javascript:evil_function()">a link</a>
    +...     <a href="j\x01a\x02v\x03a\x04s\x05c\x06r\x07i\x0Ep t:evil_function()">a control char link</a>
     ...     <a href="data:text/html;base64,PHNjcmlwdD5hbGVydCgidGVzdCIpOzwvc2NyaXB0Pg==">data</a>
     ...     <a href="#" onclick="evil_function()">another link</a>
     ...     <p onclick="evil_function()">a paragraph</p>
    @@ -33,7 +35,7 @@
     ...   </body>
     ... </html>'''
     
    ->>> print(doc)
    +>>> print(re.sub('[\x00-\x07\x0E]', '', doc))
     <html>
       <head>
         <script type="text/javascript" src="evil-site"></script>
    @@ -49,6 +51,7 @@
       <body onload="evil_function()">
         <!-- I am interpreted for EVIL! -->
         <a href="javascript:evil_function()">a link</a>
    +    <a href="javascrip t:evil_function()">a control char link</a>
         <a href="data:text/html;base64,PHNjcmlwdD5hbGVydCgidGVzdCIpOzwvc2NyaXB0Pg==">data</a>
         <a href="#" onclick="evil_function()">another link</a>
         <p onclick="evil_function()">a paragraph</p>
    @@ -81,6 +84,7 @@
       <body onload="evil_function()">
         <!-- I am interpreted for EVIL! -->
         <a href="javascript:evil_function()">a link</a>
    +    <a href="javascrip%20t:evil_function()">a control char link</a>
         <a href="data:text/html;base64,PHNjcmlwdD5hbGVydCgidGVzdCIpOzwvc2NyaXB0Pg==">data</a>
         <a href="#" onclick="evil_function()">another link</a>
         <p onclick="evil_function()">a paragraph</p>
    @@ -104,6 +108,7 @@
       </head>
       <body>
         <a href="">a link</a>
    +    <a href="">a control char link</a>
         <a href="">data</a>
         <a href="#">another link</a>
         <p>a paragraph</p>
    @@ -123,6 +128,7 @@
       </head>
       <body>
         <a href="">a link</a>
    +    <a href="">a control char link</a>
         <a href="">data</a>
         <a href="#">another link</a>
         <p>a paragraph</p>
    @@ -146,6 +152,7 @@
       </head>
       <body>
         <a href="">a link</a>
    +    <a href="">a control char link</a>
         <a href="">data</a>
         <a href="#">another link</a>
         <p>a paragraph</p>
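
The effect of this commit can be checked without lxml installed. This is a minimal sketch, assuming Python 3. The two regular expressions are copied from the added lines of the `clean.py` hunk above, while the standalone `remove_javascript_link` function is a hypothetical stand-in for the `_remove_javascript_link` method.

```python
import re

# Patched patterns: the scheme check is now a bound search function, and
# the "whitespace" class also covers the listed control-character ranges.
is_javascript_scheme = re.compile(
    r'(?:javascript|jscript|livescript|vbscript|data|about|mocha):',
    re.I).search
substitute_whitespace = re.compile(r'[\s\x00-\x08\x0B\x0C\x0E-\x19]+').sub


def remove_javascript_link(link):
    """Hypothetical stand-in: blank the link if it hides a script scheme."""
    new = substitute_whitespace('', link)
    if is_javascript_scheme(new):
        return ''
    return link


obfuscated = "j\x01a\x02v\x03a\x04s\x05c\x06r\x07i\x0Ep t:evil_function()"

print(repr(remove_javascript_link(obfuscated)))      # ''
print(remove_javascript_link("http://example.com/"))  # http://example.com/
```

With the control characters stripped before the scheme check, the obfuscated link collapses to `javascript:evil_function()` and is blanked, matching the `<a href="">a control char link</a>` lines in the expected test output.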
    

