CVE-2018-3740
Description
A specially crafted HTML fragment can cause Sanitize gem for Ruby to allow non-whitelisted attributes to be used on a whitelisted HTML element.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
The Sanitize gem for Ruby before 4.6.3, when used with libxml2 >= 2.9.2, can let non-whitelisted attributes through on whitelisted HTML elements, potentially leading to XSS.
Vulnerability
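Sanitize is an allowlist-based HTML and CSS sanitizer for Ruby [1]. CVE-2018-3740 describes a flaw where a specially crafted HTML fragment can cause Sanitize to allow non-whitelisted attributes on a whitelisted HTML element. According to the fix commit, the root cause is that libxml2 >= 2.9.2 does not escape comments within some attribute values (an attempt to preserve server-side includes), so an unescaped double quote inside such a comment can end the attribute value early when the output is serialized. The fix adds Sanitize's own escaping for the affected attributes in the CleanElement transformer. Per the GitHub Security Advisory, versions >= 3.0.0 and < 4.6.3 are affected.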
Exploitation
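An attacker places an HTML comment containing a double quote inside the value of a whitelisted attribute: href, src, or action on any element, or name on an a element. The fix commit's tests use the input <a href='examp<!--" onmouseover=alert(1)>-->le.com'>foo</a>. When Sanitize processes such a fragment, the text after the quote is emitted as a separate attribute and the attribute allowlist is bypassed. No special privileges or network position are required; the attacker only needs to supply the crafted HTML input to an application using Sanitize.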
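The breakout can be illustrated without Sanitize or libxml2. The sketch below is an assumption-laden stand-in: `serialize_unescaped` is a hypothetical helper that writes an attribute value verbatim, mimicking the unescaped serialization described in the fix commit. Only the payload string comes from the commit's tests.

```ruby
# Illustration only: this does not call Sanitize, Nokogiri, or libxml2.
# serialize_unescaped is a hypothetical helper that writes the attribute
# value verbatim, without escaping double quotes.
def serialize_unescaped(tag, attr, value)
  %(<#{tag} #{attr}="#{value}">foo</#{tag}>)
end

# Payload taken from the fix commit's test suite.
payload = %q{examp<!--" onmouseover=alert(1)>-->le.com}

html = serialize_unescaped('a', 'href', payload)
puts html

# The first double quote inside the comment terminates the href value, so
# the text after it sits in the start tag as additional markup.
href_value = html[/href="([^"]*)"/, 1]
puts href_value
```

A parser reading the serialized string sees an href that ends at the comment opener, followed by what looks like an onmouseover attribute.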
Impact
Successful exploitation allows an attacker to inject arbitrary attributes into the sanitized HTML output. Because the injected attributes never pass through the allowlist, event-handler attributes such as onmouseover can be injected, which can lead to cross-site scripting (XSS). The scope of the compromise depends on the context where the sanitized output is used.
Mitigation
The issue is fixed in Sanitize version 4.6.3 [4]. Users should update to this version or later. GitLab also released version 11.0.1, which includes the fix [2]. No workarounds are available; updating is the recommended mitigation.
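A version string can be checked against the affected range from the Affected packages table (>= 3.0.0, < 4.6.3) with RubyGems' own version classes. This is a minimal sketch; `affected_sanitize?` is a hypothetical helper name, and the version strings in the loop are arbitrary samples.

```ruby
require 'rubygems'

# Affected range as listed in the advisory table: >= 3.0.0, < 4.6.3.
affected = Gem::Requirement.new('>= 3.0.0', '< 4.6.3')

# Hypothetical helper: true when the version string falls in the range.
def affected_sanitize?(requirement, version_string)
  requirement.satisfied_by?(Gem::Version.new(version_string))
end

%w[2.1.0 4.6.2 4.6.3 5.0.0].each do |v|
  status = affected_sanitize?(affected, v) ? 'in affected range' : 'outside affected range'
  puts "sanitize #{v}: #{status}"
end
```

In a running application, the loaded version could presumably be read from `Gem.loaded_specs['sanitize']` and passed to the same check.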
AI Insight generated on May 22, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected packages
Versions sourced from the GitHub Security Advisory.
| Package | Affected versions | Patched versions |
|---|---|---|
| sanitize (RubyGems) | >= 3.0.0, < 4.6.3 | 4.6.3 |
Affected products
- Ryan Grove/sanitize (ruby gem) · v5 · Range: < 4.6.3
Patches
01629a162e44 fix: Prevent code injection due to improper escaping in libxml2 >= 2.9.2
3 files changed · +149 −20
lib/sanitize/transformers/clean_element.rb · +74 −19 · modified
@@ -18,6 +18,31 @@ class Sanitize; module Transformers; class CleanElement
   # http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#embedding-custom-non-visible-data-with-the-data-*-attributes
   REGEX_DATA_ATTR = /\Adata-(?!xml)[a-z_][\w.\u00E0-\u00F6\u00F8-\u017F\u01DD-\u02AF-]*\z/u
 
+  # Attributes that need additional escaping on `<a>` elements due to unsafe
+  # libxml2 behavior.
+  UNSAFE_LIBXML_ATTRS_A = Set.new(%w[
+    name
+  ])
+
+  # Attributes that need additional escaping on all elements due to unsafe
+  # libxml2 behavior.
+  UNSAFE_LIBXML_ATTRS_GLOBAL = Set.new(%w[
+    action
+    href
+    src
+  ])
+
+  # Mapping of original characters to escape sequences for characters that
+  # should be escaped in attributes affected by unsafe libxml2 behavior.
+  UNSAFE_LIBXML_ESCAPE_CHARS = {
+    ' ' => '%20',
+    '"' => '%22'
+  }
+
+  # Regex that matches any single character that needs to be escaped in
+  # attributes affected by unsafe libxml2 behavior.
+  UNSAFE_LIBXML_ESCAPE_REGEX = /[ "]/
+
   def initialize(config)
     @add_attributes = config[:add_attributes]
     @attributes = config[:attributes].dup
@@ -92,31 +117,61 @@ def call(env)
       node.attribute_nodes.each do |attr|
         attr_name = attr.name.downcase
 
-        if attr_whitelist.include?(attr_name)
-          # The attribute is whitelisted.
+        unless attr_whitelist.include?(attr_name)
+          # The attribute isn't whitelisted.
+
+          if allow_data_attributes && attr_name.start_with?('data-')
+            # Arbitrary data attributes are allowed. If this is a data
+            # attribute, continue.
+            next if attr_name =~ REGEX_DATA_ATTR
+          end
+
+          # Either the attribute isn't a data attribute or arbitrary data
+          # attributes aren't allowed. Remove the attribute.
+          attr.unlink
+          next
+        end
+
+        # The attribute is whitelisted.
 
-          # Remove any attributes that use unacceptable protocols.
-          if @protocols.include?(name) && @protocols[name].include?(attr_name)
-            attr_protocols = @protocols[name][attr_name]
+        # Remove any attributes that use unacceptable protocols.
+        if @protocols.include?(name) && @protocols[name].include?(attr_name)
+          attr_protocols = @protocols[name][attr_name]
 
-            if attr.value =~ REGEX_PROTOCOL
-              attr.unlink unless attr_protocols.include?($1.downcase)
-            else
-              attr.unlink unless attr_protocols.include?(:relative)
+          if attr.value =~ REGEX_PROTOCOL
+            unless attr_protocols.include?($1.downcase)
+              attr.unlink
+              next
             end
-          end
-        else
-          # The attribute isn't whitelisted.
 
-          if allow_data_attributes && attr_name.start_with?('data-')
-            # Arbitrary data attributes are allowed. Verify that the attribute
-            # is a valid data attribute.
-            attr.unlink unless attr_name =~ REGEX_DATA_ATTR
           else
-            # Either the attribute isn't a data attribute, or arbitrary data
-            # attributes aren't allowed. Remove the attribute.
-            attr.unlink
+            unless attr_protocols.include?(:relative)
+              attr.unlink
+              next
+            end
           end
+
+          # Leading and trailing whitespace around URLs is ignored at parse
+          # time. Stripping it here prevents it from being escaped by the
+          # libxml2 workaround below.
+          attr.value = attr.value.strip
+        end
+
+        # libxml2 >= 2.9.2 doesn't escape comments within some attributes, in an
+        # attempt to preserve server-side includes. This can result in XSS since
+        # an unescaped double quote can allow an attacker to inject a
+        # non-whitelisted attribute.
+        #
+        # Sanitize works around this by implementing its own escaping for
+        # affected attributes, some of which can exist on any element and some
+        # of which can only exist on `<a>` elements.
+        #
+        # The relevant libxml2 code is here:
+        # <https://github.com/GNOME/libxml2/commit/960f0e275616cadc29671a218d7fb9b69eb35588>
+        if UNSAFE_LIBXML_ATTRS_GLOBAL.include?(attr_name) ||
+            (name == 'a' && UNSAFE_LIBXML_ATTRS_A.include?(attr_name))
+
+          attr.value = attr.value.gsub(UNSAFE_LIBXML_ESCAPE_REGEX, UNSAFE_LIBXML_ESCAPE_CHARS)
         end
       end
     end
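The escaping step added by this patch can be tried in isolation. The sketch below copies the four constants' values from the diff and applies them to plain strings; the `escape_unsafe` lambda is a hypothetical stand-in for the patched branch of `CleanElement#call`, which in the real code operates on Nokogiri attribute nodes.

```ruby
require 'set'

# Values copied from the constants introduced by the patch.
unsafe_attrs_a = Set.new(%w[name])
unsafe_attrs_global = Set.new(%w[action href src])
escape_chars = { ' ' => '%20', '"' => '%22' }
escape_regex = /[ "]/

# Hypothetical stand-in for the patched branch: works on plain strings
# instead of Nokogiri attribute nodes.
escape_unsafe = lambda do |element, attr_name, value|
  if unsafe_attrs_global.include?(attr_name) ||
     (element == 'a' && unsafe_attrs_a.include?(attr_name))
    value.gsub(escape_regex, escape_chars)
  else
    value
  end
end

payload = %q{examp<!--" onmouseover=alert(1)>-->le.com}

escaped_href = escape_unsafe.call('a', 'href', payload)
div_name = escape_unsafe.call('div', 'name', payload)

puts escaped_href
puts div_name
```

With the quote and space percent-encoded, the value can no longer terminate the attribute. The name attribute on a div is left alone, which matches the commit's own test expectations.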
test/test_clean_element.rb · +11 −1 · modified
@@ -234,7 +234,7 @@
 
     it 'should not choke on valueless attributes' do
       @s.fragment('foo <a href>foo</a> bar')
-        .must_equal 'foo <a href="" rel="nofollow">foo</a> bar'
+        .must_equal 'foo <a href rel="nofollow">foo</a> bar'
     end
 
     it 'should downcase attribute names' do
@@ -300,6 +300,16 @@
       }).must_equal input
     end
 
+    it "should not allow relative URLs when relative URLs aren't whitelisted" do
+      input = '<a href="/foo/bar">Link</a>'
+
+      Sanitize.fragment(input,
+        :elements => ['a'],
+        :attributes => {'a' => ['href']},
+        :protocols => {'a' => {'href' => ['http']}}
+      ).must_equal '<a>Link</a>'
+    end
+
     it 'should allow relative URLs containing colons when the colon is not in the first path segment' do
       input = '<a href="/wiki/Special:Random">Random Page</a>'
 
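The relative-URL test exercises the protocol check in `CleanElement#call`. The sketch below reproduces that decision on plain strings. It rests on two assumptions: `protocol_regex` is a simplified version of the gem's REGEX_PROTOCOL (the real one also matches entity-encoded colons), and `keep_attr` is a hypothetical helper name.

```ruby
# Simplified, assumed protocol regex: captures the text before the first
# colon, provided no "/" or "#" appears before that colon.
protocol_regex = /\A([^\/#]*?):/

# Hypothetical helper mirroring the patched decision on plain strings.
# Returns true when the attribute value would be kept.
keep_attr = lambda do |value, allowed_protocols|
  if value =~ protocol_regex
    allowed_protocols.include?($1.downcase)
  else
    # No protocol found, so the URL is relative.
    allowed_protocols.include?(:relative)
  end
end

puts keep_attr.call('/foo/bar', ['http'])
puts keep_attr.call('/foo/bar', ['http', :relative])
puts keep_attr.call('http://example.com/', ['http'])
puts keep_attr.call('javascript:alert(1)', ['http'])
puts keep_attr.call('/wiki/Special:Random', ['http', :relative])
```

The first call corresponds to the new test case: with only http allowed and :relative absent, the href is dropped and the output is a bare a element.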
test/test_malicious_html.rb · +64 −0 · modified
@@ -125,4 +125,68 @@
         must_equal '&lt;alert("XSS");//&lt;'
     end
   end
+
+  # libxml2 >= 2.9.2 doesn't escape comments within some attributes, in an
+  # attempt to preserve server-side includes. This can result in XSS since an
+  # unescaped double quote can allow an attacker to inject a non-whitelisted
+  # attribute. Sanitize works around this by implementing its own escaping for
+  # affected attributes.
+  #
+  # The relevant libxml2 code is here:
+  # <https://github.com/GNOME/libxml2/commit/960f0e275616cadc29671a218d7fb9b69eb35588>
+  describe 'unsafe libxml2 server-side includes in attributes' do
+    tag_configs = [
+      {
+        tag_name: 'a',
+        escaped_attrs: %w[ action href src name ],
+        unescaped_attrs: []
+      },
+
+      {
+        tag_name: 'div',
+        escaped_attrs: %w[ action href src ],
+        unescaped_attrs: %w[ name ]
+      }
+    ]
+
+    before do
+      @s = Sanitize.new({
+        elements: %w[ a div ],
+
+        attributes: {
+          all: %w[ action href src name ]
+        }
+      })
+    end
+
+    tag_configs.each do |tag_config|
+      tag_name = tag_config[:tag_name]
+
+      tag_config[:escaped_attrs].each do |attr_name|
+        input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]
+
+        it 'should escape unsafe characters in attributes' do
+          @s.fragment(input).must_equal(%[<#{tag_name} #{attr_name}="examp<!--%22%20onmouseover=alert(1)>-->le.com">foo</#{tag_name}>])
+        end
+
+        it 'should round-trip to the same output' do
+          output = @s.fragment(input)
+          @s.fragment(output).must_equal(output)
+        end
+      end
+
+      tag_config[:unescaped_attrs].each do |attr_name|
+        input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]
+
+        it 'should not escape characters unnecessarily' do
+          @s.fragment(input).must_equal(input)
+        end
+
+        it 'should round-trip to the same output' do
+          output = @s.fragment(input)
+          @s.fragment(output).must_equal(output)
+        end
+      end
+    end
+  end
 end
93feeb38e218 Various minor efficiency tweaks to improve perf on large documents.
4 files changed · +50 −43
lib/sanitize/transformers/clean_cdata.rb · +1 −3 · modified
@@ -3,11 +3,9 @@
 class Sanitize; module Transformers
 
   CleanCDATA = lambda do |env|
-    return if env[:is_whitelisted]
-
     node = env[:node]
 
-    if node.cdata?
+    if node.type == Nokogiri::XML::Node::CDATA_SECTION_NODE
       node.replace(Nokogiri::XML::Text.new(node.text, node.document))
     end
   end
lib/sanitize/transformers/clean_comment.rb · +5 −2 · modified
@@ -3,8 +3,11 @@
 class Sanitize; module Transformers
 
   CleanComment = lambda do |env|
-    return if env[:is_whitelisted]
-    env[:node].unlink if env[:node].comment?
+    node = env[:node]
+
+    if node.type == Nokogiri::XML::Node::COMMENT_NODE
+      node.unlink unless env[:is_whitelisted]
+    end
   end
 
 end; end
lib/sanitize/transformers/clean_doctype.rb · +5 −2 · modified
@@ -3,8 +3,11 @@
 class Sanitize; module Transformers
 
   CleanDoctype = lambda do |env|
-    return if env[:is_whitelisted]
-    env[:node].unlink if env[:node].type == Nokogiri::XML::Node::DTD_NODE
+    node = env[:node]
+
+    if node.type == Nokogiri::XML::Node::DTD_NODE
+      node.unlink unless env[:is_whitelisted]
+    end
   end
 
 end; end
lib/sanitize/transformers/clean_element.rb · +39 −36 · modified
@@ -1,5 +1,7 @@
 # encoding: utf-8
 
+require 'set'
+
 class Sanitize; module Transformers; class CleanElement
 
   # Matches a valid HTML5 data attribute name. The unicode ranges included here
@@ -24,21 +26,28 @@ class Sanitize; module Transformers; class CleanElement
   REGEX_PROTOCOL = /\A([^\/#]*?)(?:\:|&#0*58|&#x0*3a)/i
 
   def initialize(config)
-    @config = config
-
-    # For faster lookups.
     @add_attributes = config[:add_attributes]
-    @allowed_elements = Set.new(config[:elements])
-    @attributes = config[:attributes]
+    @attributes = config[:attributes].dup
+    @elements = Set.new(config[:elements])
     @protocols = config[:protocols]
     @remove_all_contents = false
     @remove_element_contents = Set.new
-    @whitespace_elements = Hash.new
+    @whitespace_elements = {}
+
+    if @attributes.include?(:all)
+      @attributes[:all] = Set.new(@attributes[:all])
+    end
+
+    @attributes.each do |element_name, attrs|
+      unless element_name == :all
+        @attributes[element_name] = Set.new(attrs).merge(@attributes[:all] || [])
+      end
+    end
 
-    # Converting :whitespace_element into a Hash for backwards compatibility.
+    # Backcompat: if :whitespace_elements is an array, convert it to a hash.
     if config[:whitespace_elements].is_a?(Array)
       config[:whitespace_elements].each do |element|
-        @whitespace_elements[element] = { :before => ' ', :after => ' ' }
+        @whitespace_elements[element] = {:before => ' ', :after => ' '}
       end
     else
       @whitespace_elements = config[:whitespace_elements]
@@ -55,10 +64,10 @@ def call(env)
     name = env[:node_name]
     node = env[:node]
 
-    return if env[:is_whitelisted] || !node.element?
+    return if node.type != Nokogiri::XML::Node::ELEMENT_NODE || env[:is_whitelisted]
 
     # Delete any element that isn't in the config whitelist.
-    unless @allowed_elements.include?(name)
+    unless @elements.include?(name)
       # Elements like br, div, p, etc. need to be replaced with whitespace in
       # order to preserve readability.
       if @whitespace_elements.include?(name)
@@ -77,21 +86,33 @@ def call(env)
       return
     end
 
-    attr_whitelist = Set.new((@attributes[name] || []) +
-        (@attributes[:all] || []))
-
-    allow_data_attributes = attr_whitelist.include?(:data)
+    attr_whitelist = @attributes[name] || @attributes[:all]
 
-    if attr_whitelist.empty?
+    if attr_whitelist.nil?
       # Delete all attributes from elements with no whitelisted attributes.
       node.attribute_nodes.each {|attr| attr.unlink }
     else
+      allow_data_attributes = attr_whitelist.include?(:data)
+
       # Delete any attribute that isn't allowed on this element.
       node.attribute_nodes.each do |attr|
         attr_name = attr.name.downcase
 
-        unless attr_whitelist.include?(attr_name)
-          # The attribute isn't explicitly whitelisted.
+        if attr_whitelist.include?(attr_name)
+          # The attribute is whitelisted.
+
+          # Remove any attributes that use unacceptable protocols.
+          if @protocols.include?(name) && @protocols[name].include?(attr_name)
+            attr_protocols = @protocols[name][attr_name]
+
+            if attr.value.to_s.downcase =~ REGEX_PROTOCOL
+              attr.unlink unless attr_protocols.include?($1.downcase)
+            else
+              attr.unlink unless attr_protocols.include?(:relative)
+            end
+          end
+        else
+          # The attribute isn't whitelisted.
 
           if allow_data_attributes && attr_name.start_with?('data-')
             # Arbitrary data attributes are allowed. Verify that the attribute
@@ -104,28 +125,10 @@ def call(env)
           end
         end
       end
-
-      # Delete remaining attributes that use unacceptable protocols.
-      if @protocols.has_key?(name)
-        protocol = @protocols[name]
-
-        node.attribute_nodes.each do |attr|
-          attr_name = attr.name.downcase
-          next false unless protocol.has_key?(attr_name)
-
-          del = if attr.value.to_s.downcase =~ REGEX_PROTOCOL
-            !protocol[attr_name].include?($1.downcase)
-          else
-            !protocol[attr_name].include?(:relative)
-          end
-
-          attr.unlink if del
-        end
-      end
     end
 
     # Add required attributes.
-    if @add_attributes.has_key?(name)
+    if @add_attributes.include?(name)
       @add_attributes[name].each {|key, val| node[key] = val }
     end
   end
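The main efficiency change in this commit is that the per-element attribute allowlists are built once in `initialize` instead of on every node. A minimal sketch of that precomputation, using a made-up `config_attributes` hash in the shape of Sanitize's :attributes option (the element and attribute names are illustrative assumptions):

```ruby
require 'set'

# Hypothetical config in the same shape as Sanitize's :attributes option.
config_attributes = {
  :all => %w[class],
  'a'  => %w[href]
}

# Mirrors the initialize logic in the diff: dup the hash, convert each
# list to a Set once, and fold the :all list into each element's Set.
attributes = config_attributes.dup
attributes[:all] = Set.new(attributes[:all]) if attributes.include?(:all)

attributes.each do |element_name, attrs|
  next if element_name == :all
  attributes[element_name] = Set.new(attrs).merge(attributes[:all] || [])
end

# Per-node lookup becomes a hash fetch with a fallback to :all.
lookup = ->(name) { attributes[name] || attributes[:all] }

a_attrs = lookup.call('a').to_a.sort
div_attrs = lookup.call('div').to_a

p a_attrs
p div_attrs
```

Because the hash is duplicated before conversion, the caller's config keeps its original arrays. When neither the element nor :all has an entry, the lookup returns nil, which is why the patched code tests `attr_whitelist.nil?` rather than `empty?`.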
Vulnerability mechanics
Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References
- github.com/advisories/GHSA-7f42-p84j-f58p (ghsa, ADVISORY)
- nvd.nist.gov/vuln/detail/CVE-2018-3740 (ghsa, ADVISORY)
- www.debian.org/security/2018/dsa-4358 (ghsa, vendor-advisory, x_refsource_DEBIAN, WEB)
- about.gitlab.com/2018/06/25/security-release-gitlab-11-dot-0-dot-1-released (ghsa, WEB)
- about.gitlab.com/2018/06/25/security-release-gitlab-11-dot-0-dot-1-released/ (mitre, x_refsource_CONFIRM)
- github.com/rgrove/sanitize/commit/01629a162e448a83d901456d0ba8b65f3b03d46e (ghsa, x_refsource_CONFIRM, WEB)
- github.com/rgrove/sanitize/commit/93feeb38e21864146bb29191792b971dbe1ec62e (ghsa, WEB)
- github.com/rgrove/sanitize/issues/176 (ghsa, x_refsource_CONFIRM, WEB)
- github.com/rubysec/ruby-advisory-db/blob/master/gems/sanitize/CVE-2018-3740.yml (ghsa, WEB)