VYPR
High severity · NVD Advisory · Published Mar 30, 2018 · Updated Aug 5, 2024

CVE-2018-3740

Description

A specially crafted HTML fragment can cause Sanitize gem for Ruby to allow non-whitelisted attributes to be used on a whitelisted HTML element.

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

Sanitize gem for Ruby allows non-whitelisted attributes on whitelisted HTML elements, potentially leading to XSS.

Vulnerability

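Sanitize is an allowlist-based HTML and CSS sanitizer for Ruby [1]. CVE-2018-3740 describes a flaw in which a specially crafted HTML fragment causes Sanitize to emit non-whitelisted attributes on a whitelisted HTML element. According to the fix commit, the root cause is that libxml2 >= 2.9.2 does not escape comments within some attributes, in an attempt to preserve server-side includes, so an unescaped double quote inside such a value can end the attribute early and introduce a new one. The fix adds extra escaping in the CleanElement transformer for action, href, and src on any element and for name on a elements. The GitHub Security Advisory lists versions >= 3.0.0 and < 4.6.3 as affected.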

Exploitation

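An attacker supplies an HTML fragment in which the value of a whitelisted attribute contains a comment-like sequence with a double quote, for example href='examp<!--" onmouseover=alert(1)>-->le.com' (the payload used in the fix commit's tests). When Sanitize serializes the fragment with an affected libxml2, the quote is left unescaped and the text after it is emitted as a new attribute, bypassing the attribute allowlist. No special privileges or network position are required; the attacker only needs to supply the crafted HTML input to an application using Sanitize.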
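The break-out can be illustrated without Sanitize or libxml2. The following is a minimal sketch, assuming a deliberately naive serializer (the hypothetical `naive_serialize` below) that writes an attribute value back between double quotes without escaping it. It is a simplified model of the behaviour described in the fix commit, not libxml2's actual code. The payload string is the one used in the fix commit's tests.

```ruby
# Attribute value taken from the fix commit's test suite.
PAYLOAD = %q[examp<!--" onmouseover=alert(1)>-->le.com]

# ASSUMPTION: hypothetical, deliberately naive serializer that writes the
# value between double quotes without escaping anything. This only models
# the effect described in the fix commit; it is not libxml2 code.
def naive_serialize(tag, attr, value)
  %[<#{tag} #{attr}="#{value}">foo</#{tag}>]
end

markup = naive_serialize('a', 'href', PAYLOAD)

# A double-quoted attribute value ends at the next double quote, so a parser
# reading this markup sees a much shorter href than the one supplied.
href_as_reparsed = markup[/href="([^"]*)"/, 1]

# Everything between that quote and the first ">" is read as more attributes.
injected = markup[/href="[^"]*"([^>]*)>/, 1].strip

puts markup
puts href_as_reparsed
puts injected
```

In this model the text following the stray quote is no longer part of the href value, which is how an attribute that was never on the allowlist can end up in the output.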

Impact

Successful exploitation allows an attacker to inject arbitrary attributes into sanitized HTML output. Injected event-handler attributes (e.g., onmouseover, onload) or an href with a javascript: URL can lead to cross-site scripting (XSS) when the output is rendered in a browser. The scope of the compromise depends on the context where the sanitized output is used.
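For context on the javascript: case, whitelisted URL attributes are normally checked against a protocol allowlist in CleanElement. The sketch below reproduces that check on plain strings. `REGEX_PROTOCOL` is copied from the diff in the Patches section; the helper name `protocol_allowed?` is hypothetical and the real transformer works on Nokogiri attribute nodes rather than strings.

```ruby
# Copied from lib/sanitize/transformers/clean_element.rb as shown in the diff.
REGEX_PROTOCOL = /\A([^\/#]*?)(?:\:|&#0*58|&#x0*3a)/i

# Hypothetical helper: mirrors the branch structure of the protocol check.
# `allowed_protocols` is an array of lowercase protocol names, optionally
# containing the symbol :relative to permit URLs without a protocol.
def protocol_allowed?(value, allowed_protocols)
  match = REGEX_PROTOCOL.match(value)

  if match
    # The value starts with something that looks like a protocol.
    allowed_protocols.include?(match[1].downcase)
  else
    # No protocol found, so the value is treated as a relative URL.
    allowed_protocols.include?(:relative)
  end
end

puts protocol_allowed?('http://example.com/', ['http'])
puts protocol_allowed?('/foo/bar', ['http'])
puts protocol_allowed?('JavaScript:alert(1)', ['http', :relative])
```

An attribute that is injected through the serialization flaw appears only in the serialized output, so a check of this kind has no opportunity to inspect it.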

Mitigation

The issue is fixed in Sanitize version 4.6.3 [4], the patched version listed by the GitHub Security Advisory. Users should update to this version or later. GitLab also released version 11.0.1, which includes the fix [2]. No workarounds are available; updating is the recommended mitigation.
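To check whether a given Sanitize version falls inside the range listed by the GitHub Security Advisory (>= 3.0.0, < 4.6.3), RubyGems' own version classes can be used. This is a minimal sketch; the constant and method names are illustrative only.

```ruby
require 'rubygems' # Gem::Version and Gem::Requirement ship with Ruby

# Affected range as listed by the GitHub Security Advisory.
AFFECTED_RANGE = Gem::Requirement.new('>= 3.0.0', '< 4.6.3')

# Hypothetical helper: true if the version string is inside the affected range.
def affected_by_cve_2018_3740?(version)
  AFFECTED_RANGE.satisfied_by?(Gem::Version.new(version))
end

%w[2.1.0 3.0.0 4.6.2 4.6.3].each do |version|
  puts "#{version}: #{affected_by_cve_2018_3740?(version)}"
end
```

In a Bundler-managed project, a constraint such as `gem 'sanitize', '>= 4.6.3'` in the Gemfile keeps the dependency out of the affected range.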

AI Insight generated on May 22, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected packages

Versions sourced from the GitHub Security Advisory.

Package    Ecosystem   Affected versions     Patched versions
sanitize   RubyGems    >= 3.0.0, < 4.6.3     4.6.3

Affected products

2
  • ghsa-coords
    Range: >= 3.0.0, < 4.6.3
  • Ryan Grove/sanitize (ruby gem) v5
    Range: < 4.6.3

Patches

2
01629a162e44

fix: Prevent code injection due to improper escaping in libxml2 >= 2.9.2

https://github.com/rgrove/sanitize · Ryan Grove · Mar 20, 2018 · via ghsa
3 files changed · +149 -20
  • lib/sanitize/transformers/clean_element.rb +74 -19 modified
    @@ -18,6 +18,31 @@ class Sanitize; module Transformers; class CleanElement
       # http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#embedding-custom-non-visible-data-with-the-data-*-attributes
       REGEX_DATA_ATTR = /\Adata-(?!xml)[a-z_][\w.\u00E0-\u00F6\u00F8-\u017F\u01DD-\u02AF-]*\z/u
     
    +  # Attributes that need additional escaping on `<a>` elements due to unsafe
    +  # libxml2 behavior.
    +  UNSAFE_LIBXML_ATTRS_A = Set.new(%w[
    +    name
    +  ])
    +
    +  # Attributes that need additional escaping on all elements due to unsafe
    +  # libxml2 behavior.
    +  UNSAFE_LIBXML_ATTRS_GLOBAL = Set.new(%w[
    +    action
    +    href
    +    src
    +  ])
    +
    +  # Mapping of original characters to escape sequences for characters that
    +  # should be escaped in attributes affected by unsafe libxml2 behavior.
    +  UNSAFE_LIBXML_ESCAPE_CHARS = {
    +    ' ' => '%20',
    +    '"' => '%22'
    +  }
    +
    +  # Regex that matches any single character that needs to be escaped in
    +  # attributes affected by unsafe libxml2 behavior.
    +  UNSAFE_LIBXML_ESCAPE_REGEX = /[ "]/
    +
       def initialize(config)
         @add_attributes          = config[:add_attributes]
         @attributes              = config[:attributes].dup
    @@ -92,31 +117,61 @@ def call(env)
           node.attribute_nodes.each do |attr|
             attr_name = attr.name.downcase
     
    -        if attr_whitelist.include?(attr_name)
    -          # The attribute is whitelisted.
    +        unless attr_whitelist.include?(attr_name)
    +          # The attribute isn't whitelisted.
    +
    +          if allow_data_attributes && attr_name.start_with?('data-')
    +            # Arbitrary data attributes are allowed. If this is a data
    +            # attribute, continue.
    +            next if attr_name =~ REGEX_DATA_ATTR
    +          end
    +
    +          # Either the attribute isn't a data attribute or arbitrary data
    +          # attributes aren't allowed. Remove the attribute.
    +          attr.unlink
    +          next
    +        end
    +
    +        # The attribute is whitelisted.
     
    -          # Remove any attributes that use unacceptable protocols.
    -          if @protocols.include?(name) && @protocols[name].include?(attr_name)
    -            attr_protocols = @protocols[name][attr_name]
    +        # Remove any attributes that use unacceptable protocols.
    +        if @protocols.include?(name) && @protocols[name].include?(attr_name)
    +          attr_protocols = @protocols[name][attr_name]
     
    -            if attr.value =~ REGEX_PROTOCOL
    -              attr.unlink unless attr_protocols.include?($1.downcase)
    -            else
    -              attr.unlink unless attr_protocols.include?(:relative)
    +          if attr.value =~ REGEX_PROTOCOL
    +            unless attr_protocols.include?($1.downcase)
    +              attr.unlink
    +              next
                 end
    -          end
    -        else
    -          # The attribute isn't whitelisted.
     
    -          if allow_data_attributes && attr_name.start_with?('data-')
    -            # Arbitrary data attributes are allowed. Verify that the attribute
    -            # is a valid data attribute.
    -            attr.unlink unless attr_name =~ REGEX_DATA_ATTR
               else
    -            # Either the attribute isn't a data attribute, or arbitrary data
    -            # attributes aren't allowed. Remove the attribute.
    -            attr.unlink
    +            unless attr_protocols.include?(:relative)
    +              attr.unlink
    +              next
    +            end
               end
    +
    +          # Leading and trailing whitespace around URLs is ignored at parse
    +          # time. Stripping it here prevents it from being escaped by the
    +          # libxml2 workaround below.
    +          attr.value = attr.value.strip
    +        end
    +
    +        # libxml2 >= 2.9.2 doesn't escape comments within some attributes, in an
    +        # attempt to preserve server-side includes. This can result in XSS since
    +        # an unescaped double quote can allow an attacker to inject a
    +        # non-whitelisted attribute.
    +        #
    +        # Sanitize works around this by implementing its own escaping for
    +        # affected attributes, some of which can exist on any element and some
    +        # of which can only exist on `<a>` elements.
    +        #
    +        # The relevant libxml2 code is here:
    +        # <https://github.com/GNOME/libxml2/commit/960f0e275616cadc29671a218d7fb9b69eb35588>
    +        if UNSAFE_LIBXML_ATTRS_GLOBAL.include?(attr_name) ||
    +            (name == 'a' && UNSAFE_LIBXML_ATTRS_A.include?(attr_name))
    +
    +          attr.value = attr.value.gsub(UNSAFE_LIBXML_ESCAPE_REGEX, UNSAFE_LIBXML_ESCAPE_CHARS)
             end
           end
         end
    
  • test/test_clean_element.rb +11 -1 modified
    @@ -234,7 +234,7 @@
     
         it 'should not choke on valueless attributes' do
           @s.fragment('foo <a href>foo</a> bar')
    -        .must_equal 'foo <a href="" rel="nofollow">foo</a> bar'
    +        .must_equal 'foo <a href rel="nofollow">foo</a> bar'
         end
     
         it 'should downcase attribute names' do
    @@ -300,6 +300,16 @@
           }).must_equal input
         end
     
    +    it "should not allow relative URLs when relative URLs aren't whitelisted" do
    +      input = '<a href="/foo/bar">Link</a>'
    +
    +      Sanitize.fragment(input,
    +        :elements   => ['a'],
    +        :attributes => {'a' => ['href']},
    +        :protocols  => {'a' => {'href' => ['http']}}
    +      ).must_equal '<a>Link</a>'
    +    end
    +
         it 'should allow relative URLs containing colons when the colon is not in the first path segment' do
           input = '<a href="/wiki/Special:Random">Random Page</a>'
     
    
  • test/test_malicious_html.rb +64 -0 modified
    @@ -125,4 +125,68 @@
             must_equal '&lt;alert("XSS");//&lt;'
         end
       end
    +
    +  # libxml2 >= 2.9.2 doesn't escape comments within some attributes, in an
    +  # attempt to preserve server-side includes. This can result in XSS since an
    +  # unescaped double quote can allow an attacker to inject a non-whitelisted
    +  # attribute. Sanitize works around this by implementing its own escaping for
    +  # affected attributes.
    +  #
    +  # The relevant libxml2 code is here:
    +  # <https://github.com/GNOME/libxml2/commit/960f0e275616cadc29671a218d7fb9b69eb35588>
    +  describe 'unsafe libxml2 server-side includes in attributes' do
    +    tag_configs = [
    +      {
    +        tag_name: 'a',
    +        escaped_attrs: %w[ action href src name ],
    +        unescaped_attrs: []
    +      },
    +
    +      {
    +        tag_name: 'div',
    +        escaped_attrs: %w[ action href src ],
    +        unescaped_attrs: %w[ name ]
    +      }
    +    ]
    +
    +    before do
    +      @s = Sanitize.new({
    +        elements: %w[ a div ],
    +
    +        attributes: {
    +          all: %w[ action href src name ]
    +        }
    +      })
    +    end
    +
    +    tag_configs.each do |tag_config|
    +      tag_name = tag_config[:tag_name]
    +
    +      tag_config[:escaped_attrs].each do |attr_name|
    +        input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]
    +
    +        it 'should escape unsafe characters in attributes' do
    +          @s.fragment(input).must_equal(%[<#{tag_name} #{attr_name}="examp<!--%22%20onmouseover=alert(1)>-->le.com">foo</#{tag_name}>])
    +        end
    +
    +        it 'should round-trip to the same output' do
    +          output = @s.fragment(input)
    +          @s.fragment(output).must_equal(output)
    +        end
    +      end
    +
    +      tag_config[:unescaped_attrs].each do |attr_name|
    +        input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]
    +
    +        it 'should not escape characters unnecessarily' do
    +          @s.fragment(input).must_equal(input)
    +        end
    +
    +        it 'should round-trip to the same output' do
    +          output = @s.fragment(input)
    +          @s.fragment(output).must_equal(output)
    +        end
    +      end
    +    end
    +  end
     end
    
93feeb38e218

Various minor efficiency tweaks to improve perf on large documents.

https://github.com/rgrove/sanitize · Ryan Grove · May 19, 2014 · via ghsa
4 files changed · +50 -43
  • lib/sanitize/transformers/clean_cdata.rb +1 -3 modified
    @@ -3,11 +3,9 @@
     class Sanitize; module Transformers
     
       CleanCDATA = lambda do |env|
    -    return if env[:is_whitelisted]
    -
         node = env[:node]
     
    -    if node.cdata?
    +    if node.type == Nokogiri::XML::Node::CDATA_SECTION_NODE
           node.replace(Nokogiri::XML::Text.new(node.text, node.document))
         end
       end
    
  • lib/sanitize/transformers/clean_comment.rb +5 -2 modified
    @@ -3,8 +3,11 @@
     class Sanitize; module Transformers
     
       CleanComment = lambda do |env|
    -    return if env[:is_whitelisted]
    -    env[:node].unlink if env[:node].comment?
    +    node = env[:node]
    +
    +    if node.type == Nokogiri::XML::Node::COMMENT_NODE
    +        node.unlink unless env[:is_whitelisted]
    +    end
       end
     
     end; end
    
  • lib/sanitize/transformers/clean_doctype.rb +5 -2 modified
    @@ -3,8 +3,11 @@
     class Sanitize; module Transformers
     
       CleanDoctype = lambda do |env|
    -    return if env[:is_whitelisted]
    -    env[:node].unlink if env[:node].type == Nokogiri::XML::Node::DTD_NODE
    +    node = env[:node]
    +
    +    if node.type == Nokogiri::XML::Node::DTD_NODE
    +        node.unlink unless env[:is_whitelisted]
    +    end
       end
     
     end; end
    
  • lib/sanitize/transformers/clean_element.rb +39 -36 modified
    @@ -1,5 +1,7 @@
     # encoding: utf-8
     
    +require 'set'
    +
     class Sanitize; module Transformers; class CleanElement
     
       # Matches a valid HTML5 data attribute name. The unicode ranges included here
    @@ -24,21 +26,28 @@ class Sanitize; module Transformers; class CleanElement
       REGEX_PROTOCOL = /\A([^\/#]*?)(?:\:|&#0*58|&#x0*3a)/i
     
       def initialize(config)
    -    @config = config
    -
    -    # For faster lookups.
         @add_attributes          = config[:add_attributes]
    -    @allowed_elements        = Set.new(config[:elements])
    -    @attributes              = config[:attributes]
    +    @attributes              = config[:attributes].dup
    +    @elements                = Set.new(config[:elements])
         @protocols               = config[:protocols]
         @remove_all_contents     = false
         @remove_element_contents = Set.new
    -    @whitespace_elements     = Hash.new
    +    @whitespace_elements     = {}
    +
    +    if @attributes.include?(:all)
    +      @attributes[:all] = Set.new(@attributes[:all])
    +    end
    +
    +    @attributes.each do |element_name, attrs|
    +      unless element_name == :all
    +        @attributes[element_name] = Set.new(attrs).merge(@attributes[:all] || [])
    +      end
    +    end
     
    -    # Converting :whitespace_element into a Hash for backwards compatibility.
    +    # Backcompat: if :whitespace_elements is an array, convert it to a hash.
         if config[:whitespace_elements].is_a?(Array)
           config[:whitespace_elements].each do |element|
    -        @whitespace_elements[element] = { :before => ' ', :after => ' ' }
    +        @whitespace_elements[element] = {:before => ' ', :after => ' '}
           end
         else
           @whitespace_elements = config[:whitespace_elements]
    @@ -55,10 +64,10 @@ def call(env)
         name = env[:node_name]
         node = env[:node]
     
    -    return if env[:is_whitelisted] || !node.element?
    +    return if node.type != Nokogiri::XML::Node::ELEMENT_NODE || env[:is_whitelisted]
     
         # Delete any element that isn't in the config whitelist.
    -    unless @allowed_elements.include?(name)
    +    unless @elements.include?(name)
           # Elements like br, div, p, etc. need to be replaced with whitespace in
           # order to preserve readability.
           if @whitespace_elements.include?(name)
    @@ -77,21 +86,33 @@ def call(env)
           return
         end
     
    -    attr_whitelist = Set.new((@attributes[name] || []) +
    -        (@attributes[:all] || []))
    -
    -    allow_data_attributes = attr_whitelist.include?(:data)
    +    attr_whitelist = @attributes[name] || @attributes[:all]
     
    -    if attr_whitelist.empty?
    +    if attr_whitelist.nil?
           # Delete all attributes from elements with no whitelisted attributes.
           node.attribute_nodes.each {|attr| attr.unlink }
         else
    +      allow_data_attributes = attr_whitelist.include?(:data)
    +
           # Delete any attribute that isn't allowed on this element.
           node.attribute_nodes.each do |attr|
             attr_name = attr.name.downcase
     
    -        unless attr_whitelist.include?(attr_name)
    -          # The attribute isn't explicitly whitelisted.
    +        if attr_whitelist.include?(attr_name)
    +          # The attribute is whitelisted.
    +
    +          # Remove any attributes that use unacceptable protocols.
    +          if @protocols.include?(name) && @protocols[name].include?(attr_name)
    +            attr_protocols = @protocols[name][attr_name]
    +
    +            if attr.value.to_s.downcase =~ REGEX_PROTOCOL
    +              attr.unlink unless attr_protocols.include?($1.downcase)
    +            else
    +              attr.unlink unless attr_protocols.include?(:relative)
    +            end
    +          end
    +        else
    +          # The attribute isn't whitelisted.
     
               if allow_data_attributes && attr_name.start_with?('data-')
                 # Arbitrary data attributes are allowed. Verify that the attribute
    @@ -104,28 +125,10 @@ def call(env)
               end
             end
           end
    -
    -      # Delete remaining attributes that use unacceptable protocols.
    -      if @protocols.has_key?(name)
    -        protocol = @protocols[name]
    -
    -        node.attribute_nodes.each do |attr|
    -          attr_name = attr.name.downcase
    -          next false unless protocol.has_key?(attr_name)
    -
    -          del = if attr.value.to_s.downcase =~ REGEX_PROTOCOL
    -            !protocol[attr_name].include?($1.downcase)
    -          else
    -            !protocol[attr_name].include?(:relative)
    -          end
    -
    -          attr.unlink if del
    -        end
    -      end
         end
     
         # Add required attributes.
    -    if @add_attributes.has_key?(name)
    +    if @add_attributes.include?(name)
           @add_attributes[name].each {|key, val| node[key] = val }
         end
       end
    

Vulnerability mechanics

Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
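
The workaround added in commit 01629a162e44 can be sketched on plain strings. The constants below mirror those in the diff (with shortened names), and the escaping call is the same `gsub` with a hash. The helper `escape_unsafe_attr` is hypothetical: the real transformer assigns the result to a Nokogiri attribute node, which this sketch does not model.

```ruby
require 'set'

# Attributes that need extra escaping only on <a> elements (from the diff).
UNSAFE_ATTRS_A = Set.new(%w[name])

# Attributes that need extra escaping on all elements (from the diff).
UNSAFE_ATTRS_GLOBAL = Set.new(%w[action href src])

# Characters to escape and their replacements (from the diff).
ESCAPE_CHARS = { ' ' => '%20', '"' => '%22' }.freeze
ESCAPE_REGEX = /[ "]/

# Hypothetical helper: returns the value as the workaround would rewrite it,
# or the unchanged value if the attribute is not one of the affected ones.
def escape_unsafe_attr(element_name, attr_name, value)
  affected = UNSAFE_ATTRS_GLOBAL.include?(attr_name) ||
             (element_name == 'a' && UNSAFE_ATTRS_A.include?(attr_name))

  affected ? value.gsub(ESCAPE_REGEX, ESCAPE_CHARS) : value
end

payload = %q[examp<!--" onmouseover=alert(1)>-->le.com]

puts escape_unsafe_attr('a', 'href', payload)
puts escape_unsafe_attr('div', 'name', payload)
```

With the double quote and the space percent-encoded, the value no longer contains a character that can end the attribute, which matches the expected output in the fix commit's test for escaped attributes. The name attribute on a div is left untouched, as in the "should not escape characters unnecessarily" test.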

References

9
