VYPR
High severity · NVD Advisory · Published Apr 21, 2021 · Updated Aug 3, 2024

CVE-2021-28965

Description

The REXML gem before 3.2.5 in Ruby before 2.6.7, 2.7.x before 2.7.3, and 3.x before 3.0.1 does not properly address XML round-trip issues. An incorrect document can be produced after parsing and serializing.

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

REXML in Ruby incorrectly handles XML round-trips, allowing crafted XML to be parsed and serialized into an invalid document.

Vulnerability

The REXML gem before 3.2.5, including the copies bundled with Ruby before 2.6.7, 2.7.x before 2.7.3, and 3.x before 3.0.1, does not properly address XML round-trip issues. An attacker can craft an XML document that, after parsing and re-serialization, yields an incorrect document that may not comply with the XML specification. The issue specifically affects the handling of DOCTYPE and notation declarations, where the parser accepted certain invalid constructs (e.g., an invalid name or garbage after an external ID) and the serializer could generate incorrect output [1][2][3].
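
One way to observe the parser-side behavior is to probe how the installed REXML treats a DOCTYPE with stray characters after the system literal, an input taken from the test cases added by the fix. The sketch below is illustrative only: `doctype_accepted?` is a hypothetical helper, not part of REXML, and the result for the malformed input depends on the installed version (patched releases are expected to raise `REXML::ParseException`).

```ruby
require "rexml/document"

# Hypothetical helper (not a REXML API): reports whether REXML parses
# the given string without raising a parse error.
def doctype_accepted?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException
  false
end

# A DOCTYPE with a well-formed external ID.
well_formed = %(<!DOCTYPE r SYSTEM "r.dtd"><r/>)

# Stray characters after the system literal, as in the fix's test suite.
malformed = %(<!DOCTYPE r SYSTEM 'r.dtd'x'><r/>)

puts "well-formed accepted: #{doctype_accepted?(well_formed)}"
puts "malformed accepted:   #{doctype_accepted?(malformed)}"
```

If the second line reports that the malformed input was accepted, the installed REXML is likely older than 3.2.5.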

Exploitation

An attacker exploits the flaw by supplying a crafted XML document to an application that parses it with a vulnerable REXML and then re-serializes it. No authentication or special privileges are required; the application only needs to process the attacker's input. Typical vectors are user-supplied XML such as file uploads, SOAP messages, or XML-based configuration files. The attacker submits XML with invalid DOCTYPE or notation syntax, and the document that the application re-serializes is structurally incorrect [1][2][3].

Impact

Successful exploitation produces an incorrect XML document after parsing and serialization (a "round-trip" issue). Consumers of the generated XML may reject or misinterpret it, so the core impact is on data integrity and, where applications depend on valid serialization, on availability. The vulnerability does not directly lead to remote code execution or privilege escalation, but it can disrupt XML processing in security-sensitive contexts where correct XML structure is assumed [4].
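
Where re-serialized XML is handed to other consumers, a cheap sanity check is to confirm that a second parse-and-serialize pass reproduces the first. The following is a minimal sketch under that assumption; `round_trip_stable?` is a hypothetical helper name, and a stable result does not prove the input is harmless, only that serialization has reached a fixed point.

```ruby
require "rexml/document"

# Hypothetical helper: parse and serialize twice, then compare the results.
# Returns false when the two serializations differ, or when either the
# input or the re-serialized output fails to parse.
def round_trip_stable?(xml)
  first = REXML::Document.new(xml).to_s
  second = REXML::Document.new(first).to_s
  first == second
rescue REXML::ParseException
  false
end

sample = %(<!DOCTYPE r SYSTEM "r.dtd"><r a="1">text</r>)
puts "stable: #{round_trip_stable?(sample)}"
```

An application could refuse to forward a document for which this check returns false, treating it the same way as a parse error.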

Mitigation

Fixed versions: REXML gem 3.2.5 and Ruby 2.6.7, 2.7.3, and 3.0.1. Upgrade to one of these versions or later; for Ruby installations, the latest patch release of the series is recommended. If an immediate upgrade is not possible, avoid processing untrusted XML with the vulnerable REXML parser, or reject or strip DOCTYPE declarations before parsing. These measures only reduce exposure; no known workaround fully mitigates the flaw without patching. The fixed versions were released in April 2021 [1][4].
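
The two measures above can be sketched in a few lines of Ruby. This is an illustration under stated assumptions, not an official check: `rexml_patched?` and `reject_doctype!` are hypothetical helper names, the guard assumes the input is already decoded to a UTF-8 string, and it is deliberately conservative (it also rejects documents that merely mention `<!DOCTYPE` inside a comment or CDATA section).

```ruby
require "rubygems" # Gem::Version; already loaded by default in modern Ruby

# First fixed REXML release, per the advisory.
REXML_FIXED_VERSION = Gem::Version.new("3.2.5")

# Hypothetical helper: true when the given REXML version string is
# 3.2.5 or later.
def rexml_patched?(version)
  Gem::Version.new(version) >= REXML_FIXED_VERSION
end

# Hypothetical interim guard: refuse any input that contains a DOCTYPE
# declaration instead of trying to repair it. Returns the input unchanged
# when no DOCTYPE marker is found.
def reject_doctype!(xml)
  if xml.match?(/<!DOCTYPE/i)
    raise ArgumentError, "DOCTYPE declarations are not accepted"
  end
  xml
end

puts rexml_patched?("3.2.4") # a version below the fix
puts rexml_patched?("3.2.5") # the first fixed version
```

In an application, the installed version is available as `REXML::VERSION` after `require "rexml/document"`, so `rexml_patched?(REXML::VERSION)` could gate a startup warning.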

AI Insight generated on May 21, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected packages

Versions sourced from the GitHub Security Advisory.

Package | Affected versions | Patched versions
rexml (RubyGems) | < 3.2.5 | 3.2.5

Affected products

16

Patches

7
3c137eb11955

Fix a parser bug that some data may be ignored before DOCTYPE

https://github.com/ruby/rexml · Sutou Kouhei · Feb 27, 2021 · via ghsa
3 files changed · +27 -8
  • lib/rexml/parsers/baseparser.rb +8 -7 modified
    @@ -195,11 +195,9 @@ def pull_event
             return [ :end_document ] if empty?
             return @stack.shift if @stack.size > 0
             #STDERR.puts @source.encoding
    -        @source.read if @source.buffer.size<2
             #STDERR.puts "BUFFER = #{@source.buffer.inspect}"
             if @document_status == nil
    -          #@source.consume( /^\s*/um )
    -          word = @source.match( /^((?:\s+)|(?:<[^>]*>))/um )
    +          word = @source.match( /\A((?:\s+)|(?:<[^>]*>))/um )
               word = word[1] unless word.nil?
               #STDERR.puts "WORD = #{word.inspect}"
               case word
    @@ -257,18 +255,16 @@ def pull_event
                   @stack << [ :end_doctype ]
                 end
                 return args
    -          when /^\s+/
    +          when /\A\s+/
               else
                 @document_status = :after_doctype
    -            @source.read if @source.buffer.size<2
    -            md = @source.match(/\s*/um, true)
                 if @source.encoding == "UTF-8"
                   @source.buffer.force_encoding(::Encoding::UTF_8)
                 end
               end
             end
             if @document_status == :in_doctype
    -          md = @source.match(/\s*(.*?>)/um)
    +          md = @source.match(/\A\s*(.*?>)/um)
               case md[1]
               when SYSTEMENTITY
                 match = @source.match( SYSTEMENTITY, true )[1]
    @@ -349,7 +345,11 @@ def pull_event
                 return [ :end_doctype ]
               end
             end
    +        if @document_status == :after_doctype
    +          @source.match(/\A\s*/um, true)
    +        end
             begin
    +          @source.read if @source.buffer.size<2
               if @source.buffer[0] == ?<
                 if @source.buffer[1] == ?/
                   @nsstack.shift
    @@ -392,6 +392,7 @@ def pull_event
                   unless md
                     raise REXML::ParseException.new("malformed XML: missing tag start", @source)
                   end
    +              @document_status = :in_element
                   prefixes = Set.new
                   prefixes << md[2] if md[2]
                   @nsstack.unshift(curr_ns=Set.new)
    
  • test/parser/test_ultra_light.rb +0 -1 modified
    @@ -16,7 +16,6 @@ def test_entity_declaration
                            nil,
                            [:entitydecl, "name", "value"]
                          ],
    -                     [:text, "\n"],
                          [:start_element, :parent, "root", {}],
                          [:text, "\n"],
                        ],
    
  • test/parse/test_processing_instruction.rb +19 -0 modified
    @@ -20,6 +20,25 @@ def test_no_name
     <??>
             DETAIL
           end
    +
    +      def test_garbage_text
    +        # TODO: This should be parse error.
    +        # Create test/parse/test_document.rb or something and move this to it.
    +        doc = parse(<<-XML)
    +x<?x y
    +<!--?><?x -->?>
    +<r/>
    +        XML
    +        pi = doc.children[1]
    +        assert_equal([
    +                       "x",
    +                       "y\n<!--",
    +                     ],
    +                     [
    +                       pi.target,
    +                       pi.content,
    +                     ])
    +      end
         end
       end
     end
    
9b311e59ae05

Fix a bug that invalid document declaration may be accepted

https://github.com/ruby/rexml · Sutou Kouhei · Feb 23, 2021 · via ghsa
3 files changed · +326 -95
  • lib/rexml/parsers/baseparser.rb +126 -74 modified
    @@ -50,7 +50,6 @@ class BaseParser
     
           DOCTYPE_START = /\A\s*<!DOCTYPE\s/um
           DOCTYPE_END = /\A\s*\]\s*>/um
    -      DOCTYPE_PATTERN = /\s*<!DOCTYPE\s+(.*?)(\[|>)/um
           ATTRIBUTE_PATTERN = /\s*(#{QNAME_STR})\s*=\s*(["'])(.*?)\4/um
           COMMENT_START = /\A<!--/u
           COMMENT_PATTERN = /<!--(.*?)-->/um
    @@ -69,7 +68,6 @@ class BaseParser
           STANDALONE = /\bstandalone\s*=\s*["'](.*?)['"]/um
     
           ENTITY_START = /\A\s*<!ENTITY/
    -      IDENTITY = /^([!\*\w\-]+)(\s+#{NCNAME_STR})?(\s+["'](.*?)['"])?(\s+['"](.*?)["'])?/u
           ELEMENTDECL_START = /\A\s*<!ELEMENT/um
           ELEMENTDECL_PATTERN = /\A\s*(<!ELEMENT.*?)>/um
           SYSTEMENTITY = /\A\s*(%.*?;)\s*$/um
    @@ -101,8 +99,9 @@ class BaseParser
           ENTITYDECL = /\s*(?:#{GEDECL})|(?:#{PEDECL})/um
     
           NOTATIONDECL_START = /\A\s*<!NOTATION/um
    -      PUBLIC = /\A\s*<!NOTATION\s+#{NAME}\s+(PUBLIC)\s+#{PUBIDLITERAL}(?:\s+#{SYSTEMLITERAL})?\s*>/um
    -      SYSTEM = /\A\s*<!NOTATION\s+#{NAME}\s+(SYSTEM)\s+#{SYSTEMLITERAL}\s*>/um
    +      EXTERNAL_ID_PUBLIC = /\A\s*PUBLIC\s+#{PUBIDLITERAL}\s+#{SYSTEMLITERAL}\s*/um
    +      EXTERNAL_ID_SYSTEM = /\A\s*SYSTEM\s+#{SYSTEMLITERAL}\s*/um
    +      PUBLIC_ID = /\A\s*PUBLIC\s+#{PUBIDLITERAL}\s*/um
     
           EREFERENCE = /&(?!#{NAME};)/
     
    @@ -225,24 +224,37 @@ def pull_event
               when INSTRUCTION_START
                 return process_instruction
               when DOCTYPE_START
    -            md = @source.match( DOCTYPE_PATTERN, true )
    +            base_error_message = "Malformed DOCTYPE"
    +            @source.match(DOCTYPE_START, true)
                 @nsstack.unshift(curr_ns=Set.new)
    -            identity = md[1]
    -            close = md[2]
    -            identity =~ IDENTITY
    -            name = $1
    -            raise REXML::ParseException.new("DOCTYPE is missing a name") if name.nil?
    -            pub_sys = $2.nil? ? nil : $2.strip
    -            long_name = $4.nil? ? nil : $4.strip
    -            uri = $6.nil? ? nil : $6.strip
    -            args = [ :start_doctype, name, pub_sys, long_name, uri ]
    -            if close == ">"
    +            name = parse_name(base_error_message)
    +            if @source.match(/\A\s*\[/um, true)
    +              id = [nil, nil, nil]
    +              @document_status = :in_doctype
    +            elsif @source.match(/\A\s*>/um, true)
    +              id = [nil, nil, nil]
                   @document_status = :after_doctype
    -              @source.read if @source.buffer.size<2
    -              md = @source.match(/^\s*/um, true)
    -              @stack << [ :end_doctype ]
                 else
    -              @document_status = :in_doctype
    +              id = parse_id(base_error_message,
    +                            accept_external_id: true,
    +                            accept_public_id: false)
    +              if id[0] == "SYSTEM"
    +                # For backward compatibility
    +                id[1], id[2] = id[2], nil
    +              end
    +              if @source.match(/\A\s*\[/um, true)
    +                @document_status = :in_doctype
    +              elsif @source.match(/\A\s*>/um, true)
    +                @document_status = :after_doctype
    +              else
    +                message = "#{base_error_message}: garbage after external ID"
    +                raise REXML::ParseException.new(message, @source)
    +              end
    +            end
    +            args = [:start_doctype, name, *id]
    +            if @document_status == :after_doctype
    +              @source.match(/\A\s*/um, true)
    +              @stack << [ :end_doctype ]
                 end
                 return args
               when /^\s+/
    @@ -313,27 +325,24 @@ def pull_event
                 end
                 return [ :attlistdecl, element, pairs, contents ]
               when NOTATIONDECL_START
    -            md = nil
    -            if @source.match( PUBLIC )
    -              md = @source.match( PUBLIC, true )
    -              pubid = system = nil
    -              pubid_literal = md[3]
    -              pubid = pubid_literal[1..-2] if pubid_literal # Remove quote
    -              system_literal = md[4]
    -              system = system_literal[1..-2] if system_literal # Remove quote
    -              vals = [md[1], md[2], pubid, system]
    -            elsif @source.match( SYSTEM )
    -              md = @source.match( SYSTEM, true )
    -              system = nil
    -              system_literal = md[3]
    -              system = system_literal[1..-2] if system_literal # Remove quote
    -              vals = [md[1], md[2], nil, system]
    -            else
    -              details = notation_decl_invalid_details
    -              message = "Malformed notation declaration: #{details}"
    +            base_error_message = "Malformed notation declaration"
    +            unless @source.match(/\A\s*<!NOTATION\s+/um, true)
    +              if @source.match(/\A\s*<!NOTATION\s*>/um)
    +                message = "#{base_error_message}: name is missing"
    +              else
    +                message = "#{base_error_message}: invalid declaration name"
    +              end
    +              raise REXML::ParseException.new(message, @source)
    +            end
    +            name = parse_name(base_error_message)
    +            id = parse_id(base_error_message,
    +                          accept_external_id: true,
    +                          accept_public_id: true)
    +            unless @source.match(/\A\s*>/um, true)
    +              message = "#{base_error_message}: garbage before end >"
                   raise REXML::ParseException.new(message, @source)
                 end
    -            return [ :notationdecl, *vals ]
    +            return [:notationdecl, name, *id]
               when DOCTYPE_END
                 @document_status = :after_doctype
                 @source.match( DOCTYPE_END, true )
    @@ -488,6 +497,85 @@ def need_source_encoding_update?(xml_declaration_encoding)
             true
           end
     
    +      def parse_name(base_error_message)
    +        md = @source.match(/\A\s*#{NAME}/um, true)
    +        unless md
    +          if @source.match(/\A\s*\S/um)
    +            message = "#{base_error_message}: invalid name"
    +          else
    +            message = "#{base_error_message}: name is missing"
    +          end
    +          raise REXML::ParseException.new(message, @source)
    +        end
    +        md[1]
    +      end
    +
    +      def parse_id(base_error_message,
    +                   accept_external_id:,
    +                   accept_public_id:)
    +        if accept_external_id and (md = @source.match(EXTERNAL_ID_PUBLIC, true))
    +          pubid = system = nil
    +          pubid_literal = md[1]
    +          pubid = pubid_literal[1..-2] if pubid_literal # Remove quote
    +          system_literal = md[2]
    +          system = system_literal[1..-2] if system_literal # Remove quote
    +          ["PUBLIC", pubid, system]
    +        elsif accept_public_id and (md = @source.match(PUBLIC_ID, true))
    +          pubid = system = nil
    +          pubid_literal = md[1]
    +          pubid = pubid_literal[1..-2] if pubid_literal # Remove quote
    +          ["PUBLIC", pubid, nil]
    +        elsif accept_external_id and (md = @source.match(EXTERNAL_ID_SYSTEM, true))
    +          system = nil
    +          system_literal = md[1]
    +          system = system_literal[1..-2] if system_literal # Remove quote
    +          ["SYSTEM", nil, system]
    +        else
    +          details = parse_id_invalid_details(accept_external_id: accept_external_id,
    +                                             accept_public_id: accept_public_id)
    +          message = "#{base_error_message}: #{details}"
    +          raise REXML::ParseException.new(message, @source)
    +        end
    +      end
    +
    +      def parse_id_invalid_details(accept_external_id:,
    +                                   accept_public_id:)
    +        public = /\A\s*PUBLIC/um
    +        system = /\A\s*SYSTEM/um
    +        if (accept_external_id or accept_public_id) and @source.match(/#{public}/um)
    +          if @source.match(/#{public}(?:\s+[^'"]|\s*[\[>])/um)
    +            return "public ID literal is missing"
    +          end
    +          unless @source.match(/#{public}\s+#{PUBIDLITERAL}/um)
    +            return "invalid public ID literal"
    +          end
    +          if accept_public_id
    +            if @source.match(/#{public}\s+#{PUBIDLITERAL}\s+[^'"]/um)
    +              return "system ID literal is missing"
    +            end
    +            unless @source.match(/#{public}\s+#{PUBIDLITERAL}\s+#{SYSTEMLITERAL}/um)
    +              return "invalid system literal"
    +            end
    +            "garbage after system literal"
    +          else
    +            "garbage after public ID literal"
    +          end
    +        elsif accept_external_id and @source.match(/#{system}/um)
    +          if @source.match(/#{system}(?:\s+[^'"]|\s*[\[>])/um)
    +            return "system literal is missing"
    +          end
    +          unless @source.match(/#{system}\s+#{SYSTEMLITERAL}/um)
    +            return "invalid system literal"
    +          end
    +          "garbage after system literal"
    +        else
    +          unless @source.match(/\A\s*(?:PUBLIC|SYSTEM)\s/um)
    +            return "invalid ID type"
    +          end
    +          "ID type is missing"
    +        end
    +      end
    +
           def process_instruction
             match_data = @source.match(INSTRUCTION_PATTERN, true)
             unless match_data
    @@ -580,42 +668,6 @@ def parse_attributes(prefixes, curr_ns)
             end
             return attributes, closed
           end
    -
    -      def notation_decl_invalid_details
    -        name = /#{NOTATIONDECL_START}\s+#{NAME}/um
    -        public = /#{name}\s+PUBLIC/um
    -        system = /#{name}\s+SYSTEM/um
    -        if @source.match(/#{NOTATIONDECL_START}\s*>/um)
    -          return "name is missing"
    -        elsif not @source.match(/#{name}[\s>]/um)
    -          return "invalid name"
    -        elsif @source.match(/#{name}\s*>/um)
    -          return "ID type is missing"
    -        elsif not @source.match(/#{name}\s+(?:PUBLIC|SYSTEM)[\s>]/um)
    -          return "invalid ID type"
    -        elsif @source.match(/#{public}/um)
    -          if @source.match(/#{public}\s*>/um)
    -            return "public ID literal is missing"
    -          elsif not @source.match(/#{public}\s+#{PUBIDLITERAL}/um)
    -            return "invalid public ID literal"
    -          elsif @source.match(/#{public}\s+#{PUBIDLITERAL}[^\s>]/um)
    -            return "garbage after public ID literal"
    -          elsif not @source.match(/#{public}\s+#{PUBIDLITERAL}\s+#{SYSTEMLITERAL}/um)
    -            return "invalid system literal"
    -          elsif not @source.match(/#{public}\s+#{PUBIDLITERAL}\s+#{SYSTEMLITERAL}\s*>/um)
    -            return "garbage after system literal"
    -          end
    -        elsif @source.match(/#{system}/um)
    -          if @source.match(/#{system}\s*>/um)
    -            return "system literal is missing"
    -          elsif not @source.match(/#{system}\s+#{SYSTEMLITERAL}/um)
    -            return "invalid system literal"
    -          elsif not @source.match(/#{system}\s+#{SYSTEMLITERAL}\s*>/um)
    -            return "garbage after system literal"
    -          end
    -        end
    -        "end > is missing"
    -      end
         end
       end
     end
    
  • test/parse/test_document_type_declaration.rb +186 -7 modified
    @@ -5,17 +5,187 @@
     module REXMLTests
       class TestParseDocumentTypeDeclaration < Test::Unit::TestCase
         private
    -    def xml(internal_subset)
    -      <<-XML
    -<!DOCTYPE r SYSTEM "urn:x-rexml:test" [
    -#{internal_subset}
    -]>
    +    def parse(doctype)
    +      REXML::Document.new(<<-XML).doctype
    +#{doctype}
     <r/>
           XML
         end
     
    -    def parse(internal_subset)
    -      REXML::Document.new(xml(internal_subset)).doctype
    +    class TestName < self
    +      def test_valid
    +        doctype = parse(<<-DOCTYPE)
    +<!DOCTYPE r>
    +        DOCTYPE
    +        assert_equal("r", doctype.name)
    +      end
    +
    +      def test_garbage_plus_before_name_at_line_start
    +        exception = assert_raise(REXML::ParseException) do
    +          parse(<<-DOCTYPE)
    +<!DOCTYPE +
    +r SYSTEM "urn:x-rexml:test" [
    +]>
    +          DOCTYPE
    +        end
    +        assert_equal(<<-DETAIL.chomp, exception.to_s)
    +Malformed DOCTYPE: invalid name
    +Line: 5
    +Position: 51
    +Last 80 unconsumed characters:
    ++ r SYSTEM "urn:x-rexml:test" [ ]>  <r/> 
    +        DETAIL
    +      end
    +    end
    +
    +    class TestExternalID < self
    +      class TestSystem < self
    +        def test_left_bracket_in_system_literal
    +          doctype = parse(<<-DOCTYPE)
    +<!DOCTYPE r SYSTEM "urn:x-rexml:[test" [
    +]>
    +          DOCTYPE
    +          assert_equal([
    +                         "r",
    +                         "SYSTEM",
    +                         nil,
    +                         "urn:x-rexml:[test",
    +                       ],
    +                       [
    +                         doctype.name,
    +                         doctype.external_id,
    +                         doctype.public,
    +                         doctype.system,
    +                       ])
    +        end
    +
    +        def test_greater_than_in_system_literal
    +          doctype = parse(<<-DOCTYPE)
    +<!DOCTYPE r SYSTEM "urn:x-rexml:>test" [
    +]>
    +          DOCTYPE
    +          assert_equal([
    +                         "r",
    +                         "SYSTEM",
    +                         nil,
    +                         "urn:x-rexml:>test",
    +                       ],
    +                       [
    +                         doctype.name,
    +                         doctype.external_id,
    +                         doctype.public,
    +                         doctype.system,
    +                       ])
    +        end
    +
    +        def test_no_literal
    +          exception = assert_raise(REXML::ParseException) do
    +            parse(<<-DOCTYPE)
    +<!DOCTYPE r SYSTEM>
    +            DOCTYPE
    +          end
    +          assert_equal(<<-DETAIL.chomp, exception.to_s)
    +Malformed DOCTYPE: system literal is missing
    +Line: 3
    +Position: 26
    +Last 80 unconsumed characters:
    + SYSTEM>  <r/> 
    +          DETAIL
    +        end
    +
    +        def test_garbage_after_literal
    +          exception = assert_raise(REXML::ParseException) do
    +            parse(<<-DOCTYPE)
    +<!DOCTYPE r SYSTEM 'r.dtd'x'>
    +            DOCTYPE
    +          end
    +          assert_equal(<<-DETAIL.chomp, exception.to_s)
    +Malformed DOCTYPE: garbage after external ID
    +Line: 3
    +Position: 36
    +Last 80 unconsumed characters:
    +x'>  <r/> 
    +          DETAIL
    +        end
    +
    +        def test_single_quote
    +          doctype = parse(<<-DOCTYPE)
    +<!DOCTYPE r SYSTEM 'r".dtd'>
    +          DOCTYPE
    +          assert_equal("r\".dtd", doctype.system)
    +        end
    +
    +        def test_double_quote
    +          doctype = parse(<<-DOCTYPE)
    +<!DOCTYPE r SYSTEM "r'.dtd">
    +          DOCTYPE
    +          assert_equal("r'.dtd", doctype.system)
    +        end
    +      end
    +
    +      class TestPublic < self
    +        class TestPublicIDLiteral < self
    +          def test_content_double_quote
    +            exception = assert_raise(REXML::ParseException) do
    +              parse(<<-DOCTYPE)
    +<!DOCTYPE r PUBLIC 'double quote " is invalid' "r.dtd">
    +              DOCTYPE
    +            end
    +            assert_equal(<<-DETAIL.chomp, exception.to_s)
    +Malformed DOCTYPE: invalid public ID literal
    +Line: 3
    +Position: 62
    +Last 80 unconsumed characters:
    + PUBLIC 'double quote " is invalid' "r.dtd">  <r/> 
    +            DETAIL
    +          end
    +
    +          def test_single_quote
    +            doctype = parse(<<-DOCTYPE)
    +<!DOCTYPE r PUBLIC 'public-id-literal' "r.dtd">
    +            DOCTYPE
    +            assert_equal("public-id-literal", doctype.public)
    +          end
    +
    +          def test_double_quote
    +            doctype = parse(<<-DOCTYPE)
    +<!DOCTYPE r PUBLIC "public'-id-literal" "r.dtd">
    +            DOCTYPE
    +            assert_equal("public'-id-literal", doctype.public)
    +          end
    +        end
    +
    +        class TestSystemLiteral < self
    +          def test_garbage_after_literal
    +            exception = assert_raise(REXML::ParseException) do
    +              parse(<<-DOCTYPE)
    +<!DOCTYPE r PUBLIC 'public-id-literal' 'system-literal'x'>
    +              DOCTYPE
    +            end
    +            assert_equal(<<-DETAIL.chomp, exception.to_s)
    +Malformed DOCTYPE: garbage after external ID
    +Line: 3
    +Position: 65
    +Last 80 unconsumed characters:
    +x'>  <r/> 
    +           DETAIL
    +          end
    +
    +          def test_single_quote
    +            doctype = parse(<<-DOCTYPE)
    +<!DOCTYPE r PUBLIC "public-id-literal" 'system"-literal'>
    +            DOCTYPE
    +            assert_equal("system\"-literal", doctype.system)
    +          end
    +
    +          def test_double_quote
    +            doctype = parse(<<-DOCTYPE)
    +<!DOCTYPE r PUBLIC "public-id-literal" "system'-literal">
    +            DOCTYPE
    +            assert_equal("system'-literal", doctype.system)
    +          end
    +        end
    +      end
         end
     
         class TestMixed < self
    @@ -45,6 +215,15 @@ def test_notation_attlist
             assert_equal([REXML::NotationDecl, REXML::AttlistDecl],
                          doctype.children.collect(&:class))
           end
    +
    +      private
    +      def parse(internal_subset)
    +        super(<<-DOCTYPE)
    +<!DOCTYPE r SYSTEM "urn:x-rexml:test" [
    +#{internal_subset}
    +]>
    +        DOCTYPE
    +      end
         end
       end
     end
    
  • test/parse/test_notation_declaration.rb +14 -14 modified
    @@ -50,7 +50,7 @@ def test_invalid_name
     Line: 5
     Position: 74
     Last 80 unconsumed characters:
    - <!NOTATION '>  ]> <r/> 
    +'>  ]> <r/> 
             DETAIL
           end
     
    @@ -61,11 +61,11 @@ def test_no_id_type
               INTERNAL_SUBSET
             end
             assert_equal(<<-DETAIL.chomp, exception.to_s)
    -Malformed notation declaration: ID type is missing
    +Malformed notation declaration: invalid ID type
     Line: 5
     Position: 77
     Last 80 unconsumed characters:
    - <!NOTATION name>  ]> <r/> 
    +>  ]> <r/> 
             DETAIL
           end
     
    @@ -80,7 +80,7 @@ def test_invalid_id_type
     Line: 5
     Position: 85
     Last 80 unconsumed characters:
    - <!NOTATION name INVALID>  ]> <r/> 
    + INVALID>  ]> <r/> 
             DETAIL
           end
         end
    @@ -98,7 +98,7 @@ def test_no_literal
     Line: 5
     Position: 84
     Last 80 unconsumed characters:
    - <!NOTATION name SYSTEM>  ]> <r/> 
    + SYSTEM>  ]> <r/> 
               DETAIL
             end
     
    @@ -109,11 +109,11 @@ def test_garbage_after_literal
                 INTERNAL_SUBSET
               end
               assert_equal(<<-DETAIL.chomp, exception.to_s)
    -Malformed notation declaration: garbage after system literal
    +Malformed notation declaration: garbage before end >
     Line: 5
     Position: 103
     Last 80 unconsumed characters:
    - <!NOTATION name SYSTEM 'system-literal'x'>  ]> <r/> 
    +x'>  ]> <r/> 
               DETAIL
             end
     
    @@ -145,7 +145,7 @@ def test_content_double_quote
     Line: 5
     Position: 129
     Last 80 unconsumed characters:
    - <!NOTATION name PUBLIC 'double quote " is invalid' "system-literal">  ]> <r/> 
    + PUBLIC 'double quote " is invalid' "system-literal">  ]> <r/> 
                 DETAIL
               end
     
    @@ -172,11 +172,11 @@ def test_garbage_after_literal
                   INTERNAL_SUBSET
                 end
                 assert_equal(<<-DETAIL.chomp, exception.to_s)
    -Malformed notation declaration: garbage after system literal
    +Malformed notation declaration: garbage before end >
     Line: 5
     Position: 123
     Last 80 unconsumed characters:
    - <!NOTATION name PUBLIC 'public-id-literal' 'system-literal'x'>  ]> <r/> 
    +x'>  ]> <r/> 
                DETAIL
               end
     
    @@ -229,7 +229,7 @@ def test_no_literal
     Line: 5
     Position: 84
     Last 80 unconsumed characters:
    - <!NOTATION name PUBLIC>  ]> <r/> 
    + PUBLIC>  ]> <r/> 
             DETAIL
           end
     
    @@ -244,7 +244,7 @@ def test_literal_content_double_quote
     Line: 5
     Position: 128
     Last 80 unconsumed characters:
    - <!NOTATION name PUBLIC 'double quote \" is invalid in PubidLiteral'>  ]> <r/> 
    + PUBLIC 'double quote \" is invalid in PubidLiteral'>  ]> <r/> 
             DETAIL
           end
     
    @@ -255,11 +255,11 @@ def test_garbage_after_literal
               INTERNAL_SUBSET
             end
             assert_equal(<<-DETAIL.chomp, exception.to_s)
    -Malformed notation declaration: garbage after public ID literal
    +Malformed notation declaration: garbage before end >
     Line: 5
     Position: 106
     Last 80 unconsumed characters:
    - <!NOTATION name PUBLIC 'public-id-literal'x'>  ]> <r/> 
    +x'>  ]> <r/> 
             DETAIL
           end
     
    
f9d88e4948b4

Fix a bug that invalid document declaration may be generated

https://github.com/ruby/rexml · Sutou Kouhei · Feb 23, 2021 · via ghsa
2 files changed · +155 -35
  • lib/rexml/doctype.rb +50 -35 modified
    @@ -7,6 +7,44 @@
     require_relative 'xmltokens'
     
     module REXML
    +  class ReferenceWriter
    +    def initialize(id_type,
    +                   public_id_literal,
    +                   system_literal,
    +                   context=nil)
    +      @id_type = id_type
    +      @public_id_literal = public_id_literal
    +      @system_literal = system_literal
    +      if context and context[:prologue_quote] == :apostrophe
    +        @default_quote = "'"
    +      else
    +        @default_quote = "\""
    +      end
    +    end
    +
    +    def write(output)
    +      output << " #{@id_type}"
    +      if @public_id_literal
    +        if @public_id_literal.include?("'")
    +          quote = "\""
    +        else
    +          quote = @default_quote
    +        end
    +        output << " #{quote}#{@public_id_literal}#{quote}"
    +      end
    +      if @system_literal
    +        if @system_literal.include?("'")
    +          quote = "\""
    +        elsif @system_literal.include?("\"")
    +          quote = "'"
    +        else
    +          quote = @default_quote
    +        end
    +        output << " #{quote}#{@system_literal}#{quote}"
    +      end
    +    end
    +  end
    +
       # Represents an XML DOCTYPE declaration; that is, the contents of <!DOCTYPE
       # ... >.  DOCTYPES can be used to declare the DTD of a document, as well as
       # being used to declare entities used in the document.
    @@ -110,19 +148,17 @@ def clone
         #   Ignored
         def write( output, indent=0, transitive=false, ie_hack=false )
           f = REXML::Formatters::Default.new
    -      c = context
    -      if c and c[:prologue_quote] == :apostrophe
    -        quote = "'"
    -      else
    -        quote = "\""
    -      end
           indent( output, indent )
           output << START
           output << ' '
           output << @name
    -      output << " #{@external_id}" if @external_id
    -      output << " #{quote}#{@long_name}#{quote}" if @long_name
    -      output << " #{quote}#{@uri}#{quote}" if @uri
    +      if @external_id
    +        reference_writer = ReferenceWriter.new(@external_id,
    +                                               @long_name,
    +                                               @uri,
    +                                               context)
    +        reference_writer.write(output)
    +      end
           unless @children.empty?
             output << ' ['
             @children.each { |child|
    @@ -252,32 +288,11 @@ def initialize name, middle, pub, sys
         end
     
         def to_s
    -      c = nil
    -      c = parent.context if parent
    -      if c and c[:prologue_quote] == :apostrophe
    -        default_quote = "'"
    -      else
    -        default_quote = "\""
    -      end
    -      notation = "<!NOTATION #{@name} #{@middle}"
    -      if @public
    -        if @public.include?("'")
    -          quote = "\""
    -        else
    -          quote = default_quote
    -        end
    -        notation << " #{quote}#{@public}#{quote}"
    -      end
    -      if @system
    -        if @system.include?("'")
    -          quote = "\""
    -        elsif @system.include?("\"")
    -          quote = "'"
    -        else
    -          quote = default_quote
    -        end
    -        notation << " #{quote}#{@system}#{quote}"
    -      end
    +      context = nil
    +      context = parent.context if parent
    +      notation = "<!NOTATION #{@name}"
    +      reference_writer = ReferenceWriter.new(@middle, @public, @system, context)
    +      reference_writer.write(notation)
           notation << ">"
           notation
         end
    
  • test/test_doctype.rb +105 -0 modified
    @@ -77,6 +77,111 @@ def test_notations
         end
       end
     
    +  class TestDocType < Test::Unit::TestCase
    +    class TestExternalID < self
    +      class TestSystem < self
    +        class TestSystemLiteral < self
    +          def test_to_s
    +            doctype = REXML::DocType.new(["root", "SYSTEM", nil, "root.dtd"])
    +            assert_equal("<!DOCTYPE root SYSTEM \"root.dtd\">",
    +                         doctype.to_s)
    +          end
    +
    +          def test_to_s_apostrophe
    +            doctype = REXML::DocType.new(["root", "SYSTEM", nil, "root.dtd"])
    +            doc = REXML::Document.new
    +            doc << doctype
    +            doctype.parent.context[:prologue_quote] = :apostrophe
    +            assert_equal("<!DOCTYPE root SYSTEM 'root.dtd'>",
    +                         doctype.to_s)
    +          end
    +
    +          def test_to_s_single_quote_apostrophe
    +            doctype = REXML::DocType.new(["root", "SYSTEM", nil, "root'.dtd"])
    +            doc = REXML::Document.new
    +            doc << doctype
    +            # This isn't used.
    +            doctype.parent.context[:prologue_quote] = :apostrophe
    +            assert_equal("<!DOCTYPE root SYSTEM \"root'.dtd\">",
    +                         doctype.to_s)
    +          end
    +
    +          def test_to_s_double_quote
    +            doctype = REXML::DocType.new(["root", "SYSTEM", nil, "root\".dtd"])
    +            doc = REXML::Document.new
    +            doc << doctype
    +            # This isn't used.
    +            doctype.parent.context[:prologue_quote] = :apostrophe
    +            assert_equal("<!DOCTYPE root SYSTEM 'root\".dtd'>",
    +                         doctype.to_s)
    +          end
    +        end
    +      end
    +
    +      class TestPublic < self
    +        class TestPublicIDLiteral < self
    +          def test_to_s
    +            doctype = REXML::DocType.new(["root", "PUBLIC", "pub", "root.dtd"])
    +            assert_equal("<!DOCTYPE root PUBLIC \"pub\" \"root.dtd\">",
    +                         doctype.to_s)
    +          end
    +
    +          def test_to_s_apostrophe
    +            doctype = REXML::DocType.new(["root", "PUBLIC", "pub", "root.dtd"])
    +            doc = REXML::Document.new
    +            doc << doctype
    +            doctype.parent.context[:prologue_quote] = :apostrophe
    +            assert_equal("<!DOCTYPE root PUBLIC 'pub' 'root.dtd'>",
    +                         doctype.to_s)
    +          end
    +
    +          def test_to_s_apostrophe_include_apostrophe
    +            doctype = REXML::DocType.new(["root", "PUBLIC", "pub'", "root.dtd"])
    +            doc = REXML::Document.new
    +            doc << doctype
    +            # This isn't used.
    +            doctype.parent.context[:prologue_quote] = :apostrophe
    +            assert_equal("<!DOCTYPE root PUBLIC \"pub'\" 'root.dtd'>",
    +                         doctype.to_s)
    +          end
    +        end
    +
    +        class TestSystemLiteral < self
    +          def test_to_s
    +            doctype = REXML::DocType.new(["root", "PUBLIC", "pub", "root.dtd"])
    +            assert_equal("<!DOCTYPE root PUBLIC \"pub\" \"root.dtd\">",
    +                         doctype.to_s)
    +          end
    +
    +          def test_to_s_apostrophe
    +            doctype = REXML::DocType.new(["root", "PUBLIC", "pub", "root.dtd"])
    +            doc = REXML::Document.new
    +            doc << doctype
    +            doctype.parent.context[:prologue_quote] = :apostrophe
    +            assert_equal("<!DOCTYPE root PUBLIC 'pub' 'root.dtd'>",
    +                         doctype.to_s)
    +          end
    +
    +          def test_to_s_apostrophe_include_apostrophe
    +            doctype = REXML::DocType.new(["root", "PUBLIC", "pub", "root'.dtd"])
    +            doc = REXML::Document.new
    +            doc << doctype
    +            # This isn't used.
    +            doctype.parent.context[:prologue_quote] = :apostrophe
    +            assert_equal("<!DOCTYPE root PUBLIC 'pub' \"root'.dtd\">",
    +                         doctype.to_s)
    +          end
    +
    +          def test_to_s_double_quote
    +            doctype = REXML::DocType.new(["root", "PUBLIC", "pub", "root\".dtd"])
    +            assert_equal("<!DOCTYPE root PUBLIC \"pub\" 'root\".dtd'>",
    +                         doctype.to_s)
    +          end
    +        end
    +      end
    +    end
    +  end
    +
       class TestNotationDeclPublic < Test::Unit::TestCase
         def setup
           @name = "vrml"
    
f7bab8937513

Fix a bug that invalid element end may be accepted

https://github.com/ruby/rexml · Sutou Kouhei · Feb 23, 2021 · via ghsa
2 files changed · +14 -1
  • lib/rexml/parsers/baseparser.rb +1 -1 modified
    @@ -62,7 +62,7 @@ class BaseParser
           INSTRUCTION_START = /\A<\?/u
           INSTRUCTION_PATTERN = /<\?#{NAME}(\s+.*?)?\?>/um
           TAG_MATCH = /\A<((?>#{QNAME_STR}))/um
    -      CLOSE_MATCH = /^\s*<\/(#{QNAME_STR})\s*>/um
    +      CLOSE_MATCH = /\A\s*<\/(#{QNAME_STR})\s*>/um
     
           VERSION = /\bversion\s*=\s*["'](.*?)['"]/um
           ENCODING = /\bencoding\s*=\s*["'](.*?)['"]/um
    
  • test/parse/test_element.rb +13 -0 modified
    @@ -59,6 +59,19 @@ def test_garbage_less_than_before_root_element_at_line_start
     < <x/>
             DETAIL
           end
    +
    +      def test_garbage_less_than_slash_before_end_tag_at_line_start
    +        exception = assert_raise(REXML::ParseException) do
    +          parse("<x></\n</x>")
    +        end
    +        assert_equal(<<-DETAIL.chomp, exception.to_s)
    +Missing end tag for 'x'
    +Line: 2
    +Position: 10
    +Last 80 unconsumed characters:
    +</ </x>
    +        DETAIL
    +      end
         end
       end
     end
    
6a250d2cd119

Fix a bug that invalid element start may be accepted

https://github.com/ruby/rexml · Sutou Kouhei · Feb 23, 2021 · via ghsa
2 files changed · +14 -1
  • lib/rexml/parsers/baseparser.rb +1 -1 modified
    @@ -61,7 +61,7 @@ class BaseParser
           XMLDECL_PATTERN = /<\?xml\s+(.*?)\?>/um
           INSTRUCTION_START = /\A<\?/u
           INSTRUCTION_PATTERN = /<\?#{NAME}(\s+.*?)?\?>/um
    -      TAG_MATCH = /^<((?>#{QNAME_STR}))/um
    +      TAG_MATCH = /\A<((?>#{QNAME_STR}))/um
           CLOSE_MATCH = /^\s*<\/(#{QNAME_STR})\s*>/um
     
           VERSION = /\bversion\s*=\s*["'](.*?)['"]/um
    
  • test/parse/test_element.rb +13 -0 modified
    @@ -46,6 +46,19 @@ def test_empty_namespace_attribute_name
     
             DETAIL
           end
    +
    +      def test_garbage_less_than_before_root_element_at_line_start
    +        exception = assert_raise(REXML::ParseException) do
    +          parse("<\n<x/>")
    +        end
    +        assert_equal(<<-DETAIL.chomp, exception.to_s)
    +malformed XML: missing tag start
    +Line: 2
    +Position: 6
    +Last 80 unconsumed characters:
    +< <x/>
    +        DETAIL
    +      end
         end
       end
     end
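
The two one-character changes above (`^` to `\A` in `TAG_MATCH` and `CLOSE_MATCH`) matter because in Ruby `^` matches at the start of every line, while `\A` matches only at the start of the string. The following is a minimal sketch of that difference using the inputs from the two new tests. `NAME` is a simplified stand-in chosen for illustration, not REXML's actual `QNAME_STR`, and the assumption that the parser's remaining buffer is `"</\n</x>"` when the close tag is checked is inferred from the "unconsumed characters" shown in the test.

```ruby
# Assumption: a simplified element-name pattern, much narrower than
# REXML's real QNAME_STR. It is only here to keep the sketch self-contained.
NAME = '[A-Za-z_][\w.\-]*'

# Pre-fix style: ^ matches at the start of ANY line in the buffer.
TAG_LINE     = /^<((?>#{NAME}))/m
CLOSE_LINE   = /^\s*<\/(#{NAME})\s*>/m

# Post-fix style: \A matches only at the very start of the buffer.
TAG_STRING   = /\A<((?>#{NAME}))/m
CLOSE_STRING = /\A\s*<\/(#{NAME})\s*>/m

# Input from test_garbage_less_than_before_root_element_at_line_start.
start_input = "<\n<x/>"
# Assumed remaining buffer for "<x></\n</x>" once "<x>" has been consumed.
close_rest = "</\n</x>"

# The line-anchored patterns skip the stray "<" / "</" on the first line
# and match the well-formed tag that begins the second line.
p TAG_LINE.match(start_input)&.[](1)    # => "x"
p CLOSE_LINE.match(close_rest)&.[](1)   # => "x"

# The string-anchored patterns refuse to skip the garbage.
p TAG_STRING.match(start_input)         # => nil
p CLOSE_STRING.match(close_rest)        # => nil
```

With the line-anchored patterns the garbage before the newline is silently dropped, so the document that comes out of the parser is not the one that went in. The string-anchored patterns fail instead, and the parser can then raise `REXML::ParseException` as the new tests expect.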
    
2fe62e29094d

Fix a bug that invalid notation declaration may be accepted

https://github.com/ruby/rexml · Sutou Kouhei · Feb 19, 2021 · via ghsa
2 files changed · +234 -6
  • lib/rexml/parsers/baseparser.rb +53 -6 modified
    @@ -83,9 +83,6 @@ class BaseParser
           ATTDEF_RE = /#{ATTDEF}/
           ATTLISTDECL_START = /\A\s*<!ATTLIST/um
           ATTLISTDECL_PATTERN = /\A\s*<!ATTLIST\s+#{NAME}(?:#{ATTDEF})*\s*>/um
    -      NOTATIONDECL_START = /\A\s*<!NOTATION/um
    -      PUBLIC = /\A\s*<!NOTATION\s+(\w[\-\w]*)\s+(PUBLIC)\s+(["'])(.*?)\3(?:\s+(["'])(.*?)\5)?\s*>/um
    -      SYSTEM = /\A\s*<!NOTATION\s+(\w[\-\w]*)\s+(SYSTEM)\s+(["'])(.*?)\3\s*>/um
     
           TEXT_PATTERN = /\A([^<]*)/um
     
    @@ -103,6 +100,10 @@ class BaseParser
           GEDECL = "<!ENTITY\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>"
           ENTITYDECL = /\s*(?:#{GEDECL})|(?:#{PEDECL})/um
     
    +      NOTATIONDECL_START = /\A\s*<!NOTATION/um
    +      PUBLIC = /\A\s*<!NOTATION\s+#{NAME}\s+(PUBLIC)\s+#{PUBIDLITERAL}(?:\s+#{SYSTEMLITERAL})?\s*>/um
    +      SYSTEM = /\A\s*<!NOTATION\s+#{NAME}\s+(SYSTEM)\s+#{SYSTEMLITERAL}\s*>/um
    +
           EREFERENCE = /&(?!#{NAME};)/
     
           DEFAULT_ENTITIES = {
    @@ -315,12 +316,22 @@ def pull_event
                 md = nil
                 if @source.match( PUBLIC )
                   md = @source.match( PUBLIC, true )
    -              vals = [md[1],md[2],md[4],md[6]]
    +              pubid = system = nil
    +              pubid_literal = md[3]
    +              pubid = pubid_literal[1..-2] if pubid_literal # Remove quote
    +              system_literal = md[4]
    +              system = system_literal[1..-2] if system_literal # Remove quote
    +              vals = [md[1], md[2], pubid, system]
                 elsif @source.match( SYSTEM )
                   md = @source.match( SYSTEM, true )
    -              vals = [md[1],md[2],nil,md[4]]
    +              system = nil
    +              system_literal = md[3]
    +              system = system_literal[1..-2] if system_literal # Remove quote
    +              vals = [md[1], md[2], nil, system]
                 else
    -              raise REXML::ParseException.new( "error parsing notation: no matching pattern", @source )
    +              details = notation_decl_invalid_details
    +              message = "Malformed notation declaration: #{details}"
    +              raise REXML::ParseException.new(message, @source)
                 end
                 return [ :notationdecl, *vals ]
               when DOCTYPE_END
    @@ -569,6 +580,42 @@ def parse_attributes(prefixes, curr_ns)
             end
             return attributes, closed
           end
    +
    +      def notation_decl_invalid_details
    +        name = /#{NOTATIONDECL_START}\s+#{NAME}/um
    +        public = /#{name}\s+PUBLIC/um
    +        system = /#{name}\s+SYSTEM/um
    +        if @source.match(/#{NOTATIONDECL_START}\s*>/um)
    +          return "name is missing"
    +        elsif not @source.match(/#{name}[\s>]/um)
    +          return "invalid name"
    +        elsif @source.match(/#{name}\s*>/um)
    +          return "ID type is missing"
    +        elsif not @source.match(/#{name}\s+(?:PUBLIC|SYSTEM)[\s>]/um)
    +          return "invalid ID type"
    +        elsif @source.match(/#{public}/um)
    +          if @source.match(/#{public}\s*>/um)
    +            return "public ID literal is missing"
    +          elsif not @source.match(/#{public}\s+#{PUBIDLITERAL}/um)
    +            return "invalid public ID literal"
    +          elsif @source.match(/#{public}\s+#{PUBIDLITERAL}[^\s>]/um)
    +            return "garbage after public ID literal"
    +          elsif not @source.match(/#{public}\s+#{PUBIDLITERAL}\s+#{SYSTEMLITERAL}/um)
    +            return "invalid system literal"
    +          elsif not @source.match(/#{public}\s+#{PUBIDLITERAL}\s+#{SYSTEMLITERAL}\s*>/um)
    +            return "garbage after system literal"
    +          end
    +        elsif @source.match(/#{system}/um)
    +          if @source.match(/#{system}\s*>/um)
    +            return "system literal is missing"
    +          elsif not @source.match(/#{system}\s+#{SYSTEMLITERAL}/um)
    +            return "invalid system literal"
    +          elsif not @source.match(/#{system}\s+#{SYSTEMLITERAL}\s*>/um)
    +            return "garbage after system literal"
    +          end
    +        end
    +        "end > is missing"
    +      end
         end
       end
     end
    
  • test/parse/test_notation_declaration.rb +181 -0 modified
    @@ -23,10 +23,100 @@ def test_name
             doctype = parse("<!NOTATION name PUBLIC 'urn:public-id'>")
             assert_equal("name", doctype.notation("name").name)
           end
    +
    +      def test_no_name
    +        exception = assert_raise(REXML::ParseException) do
    +          parse(<<-INTERNAL_SUBSET)
    +<!NOTATION>
    +          INTERNAL_SUBSET
    +        end
    +        assert_equal(<<-DETAIL.chomp, exception.to_s)
    +Malformed notation declaration: name is missing
    +Line: 5
    +Position: 72
    +Last 80 unconsumed characters:
    + <!NOTATION>  ]> <r/> 
    +        DETAIL
    +      end
    +
    +      def test_invalid_name
    +        exception = assert_raise(REXML::ParseException) do
    +          parse(<<-INTERNAL_SUBSET)
    +<!NOTATION '>
    +          INTERNAL_SUBSET
    +        end
    +        assert_equal(<<-DETAIL.chomp, exception.to_s)
    +Malformed notation declaration: invalid name
    +Line: 5
    +Position: 74
    +Last 80 unconsumed characters:
    + <!NOTATION '>  ]> <r/> 
    +        DETAIL
    +      end
    +
    +      def test_no_id_type
    +        exception = assert_raise(REXML::ParseException) do
    +          parse(<<-INTERNAL_SUBSET)
    +<!NOTATION name>
    +          INTERNAL_SUBSET
    +        end
    +        assert_equal(<<-DETAIL.chomp, exception.to_s)
    +Malformed notation declaration: ID type is missing
    +Line: 5
    +Position: 77
    +Last 80 unconsumed characters:
    + <!NOTATION name>  ]> <r/> 
    +        DETAIL
    +      end
    +
    +      def test_invalid_id_type
    +        exception = assert_raise(REXML::ParseException) do
    +          parse(<<-INTERNAL_SUBSET)
    +<!NOTATION name INVALID>
    +          INTERNAL_SUBSET
    +        end
    +        assert_equal(<<-DETAIL.chomp, exception.to_s)
    +Malformed notation declaration: invalid ID type
    +Line: 5
    +Position: 85
    +Last 80 unconsumed characters:
    + <!NOTATION name INVALID>  ]> <r/> 
    +        DETAIL
    +      end
         end
     
         class TestExternalID < self
           class TestSystem < self
    +        def test_no_literal
    +          exception = assert_raise(REXML::ParseException) do
    +            parse(<<-INTERNAL_SUBSET)
    +<!NOTATION name SYSTEM>
    +            INTERNAL_SUBSET
    +          end
    +          assert_equal(<<-DETAIL.chomp, exception.to_s)
    +Malformed notation declaration: system literal is missing
    +Line: 5
    +Position: 84
    +Last 80 unconsumed characters:
    + <!NOTATION name SYSTEM>  ]> <r/> 
    +          DETAIL
    +        end
    +
    +        def test_garbage_after_literal
    +          exception = assert_raise(REXML::ParseException) do
    +            parse(<<-INTERNAL_SUBSET)
    +<!NOTATION name SYSTEM 'system-literal'x'>
    +            INTERNAL_SUBSET
    +          end
    +          assert_equal(<<-DETAIL.chomp, exception.to_s)
    +Malformed notation declaration: garbage after system literal
    +Line: 5
    +Position: 103
    +Last 80 unconsumed characters:
    + <!NOTATION name SYSTEM 'system-literal'x'>  ]> <r/> 
    +          DETAIL
    +        end
    +
             def test_single_quote
               doctype = parse(<<-INTERNAL_SUBSET)
     <!NOTATION name SYSTEM 'system-literal'>
    @@ -44,6 +134,21 @@ def test_double_quote
     
           class TestPublic < self
             class TestPublicIDLiteral < self
    +          def test_content_double_quote
    +            exception = assert_raise(REXML::ParseException) do
    +              parse(<<-INTERNAL_SUBSET)
    +<!NOTATION name PUBLIC 'double quote " is invalid' "system-literal">
    +              INTERNAL_SUBSET
    +            end
    +            assert_equal(<<-DETAIL.chomp, exception.to_s)
    +Malformed notation declaration: invalid public ID literal
    +Line: 5
    +Position: 129
    +Last 80 unconsumed characters:
    + <!NOTATION name PUBLIC 'double quote " is invalid' "system-literal">  ]> <r/> 
    +            DETAIL
    +          end
    +
               def test_single_quote
                 doctype = parse(<<-INTERNAL_SUBSET)
     <!NOTATION name PUBLIC 'public-id-literal' "system-literal">
    @@ -60,6 +165,21 @@ def test_double_quote
             end
     
             class TestSystemLiteral < self
    +          def test_garbage_after_literal
    +            exception = assert_raise(REXML::ParseException) do
    +              parse(<<-INTERNAL_SUBSET)
    +<!NOTATION name PUBLIC 'public-id-literal' 'system-literal'x'>
    +              INTERNAL_SUBSET
    +            end
    +            assert_equal(<<-DETAIL.chomp, exception.to_s)
    +Malformed notation declaration: garbage after system literal
    +Line: 5
    +Position: 123
    +Last 80 unconsumed characters:
    + <!NOTATION name PUBLIC 'public-id-literal' 'system-literal'x'>  ]> <r/> 
    +           DETAIL
    +          end
    +
               def test_single_quote
                 doctype = parse(<<-INTERNAL_SUBSET)
     <!NOTATION name PUBLIC "public-id-literal" 'system-literal'>
    @@ -96,5 +216,66 @@ def test_public_system
             end
           end
         end
    +
    +    class TestPublicID < self
    +      def test_no_literal
    +        exception = assert_raise(REXML::ParseException) do
    +          parse(<<-INTERNAL_SUBSET)
    +<!NOTATION name PUBLIC>
    +          INTERNAL_SUBSET
    +        end
    +        assert_equal(<<-DETAIL.chomp, exception.to_s)
    +Malformed notation declaration: public ID literal is missing
    +Line: 5
    +Position: 84
    +Last 80 unconsumed characters:
    + <!NOTATION name PUBLIC>  ]> <r/> 
    +        DETAIL
    +      end
    +
    +      def test_literal_content_double_quote
    +        exception = assert_raise(REXML::ParseException) do
    +          parse(<<-INTERNAL_SUBSET)
    +<!NOTATION name PUBLIC 'double quote " is invalid in PubidLiteral'>
    +          INTERNAL_SUBSET
    +        end
    +        assert_equal(<<-DETAIL.chomp, exception.to_s)
    +Malformed notation declaration: invalid public ID literal
    +Line: 5
    +Position: 128
    +Last 80 unconsumed characters:
    + <!NOTATION name PUBLIC 'double quote \" is invalid in PubidLiteral'>  ]> <r/> 
    +        DETAIL
    +      end
    +
    +      def test_garbage_after_literal
    +        exception = assert_raise(REXML::ParseException) do
    +          parse(<<-INTERNAL_SUBSET)
    +<!NOTATION name PUBLIC 'public-id-literal'x'>
    +          INTERNAL_SUBSET
    +        end
    +        assert_equal(<<-DETAIL.chomp, exception.to_s)
    +Malformed notation declaration: garbage after public ID literal
    +Line: 5
    +Position: 106
    +Last 80 unconsumed characters:
    + <!NOTATION name PUBLIC 'public-id-literal'x'>  ]> <r/> 
    +        DETAIL
    +      end
    +
    +      def test_literal_single_quote
    +        doctype = parse(<<-INTERNAL_SUBSET)
    +<!NOTATION name PUBLIC 'public-id-literal'>
    +        INTERNAL_SUBSET
    +        assert_equal("public-id-literal", doctype.notation("name").public)
    +      end
    +
    +      def test_literal_double_quote
    +        doctype = parse(<<-INTERNAL_SUBSET)
    +<!NOTATION name PUBLIC "public-id-literal">
    +        INTERNAL_SUBSET
    +        assert_equal("public-id-literal", doctype.notation("name").public)
    +      end
    +    end
       end
     end
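
The core of this commit is replacing the permissive `(["'])(.*?)\3` capture with patterns built from the XML grammar's `PubidLiteral` and `SystemLiteral` productions. The sketch below contrasts the two approaches. `OLD_PUBLIC` is copied from the removed line in the diff. Everything else (`NAME`, `PUBID_CHAR`, `PUBID_LITERAL`, `SYSTEM_LITERAL`, `parse_notation`, `unquote`) is hypothetical, written from the XML 1.0 productions rather than taken from REXML, and `NAME` in particular is a simplification.

```ruby
# Removed pattern, verbatim from the diff: any text between matching quotes.
OLD_PUBLIC = /\A\s*<!NOTATION\s+(\w[\-\w]*)\s+(PUBLIC)\s+(["'])(.*?)\3(?:\s+(["'])(.*?)\5)?\s*>/um

# Assumptions: hand-written approximations of the XML 1.0 productions.
NAME               = '[A-Za-z_:][\w:.\-]*'
PUBID_CHAR         = '[\x20\r\na-zA-Z0-9\-\'()+,./:=?;!*#@$_%]'
PUBID_CHAR_NO_APOS = '[\x20\r\na-zA-Z0-9\-()+,./:=?;!*#@$_%]'
PUBID_LITERAL      = "(?:\"#{PUBID_CHAR}*\"|'#{PUBID_CHAR_NO_APOS}*')"
SYSTEM_LITERAL     = %q{(?:"[^"]*"|'[^']*')}

NEW_PUBLIC = /\A\s*<!NOTATION\s+(#{NAME})\s+PUBLIC\s+(#{PUBID_LITERAL})(?:\s+(#{SYSTEM_LITERAL}))?\s*>/m
NEW_SYSTEM = /\A\s*<!NOTATION\s+(#{NAME})\s+SYSTEM\s+(#{SYSTEM_LITERAL})\s*>/m

# Strip the surrounding quote characters, as the patched parser does
# with literal[1..-2].
def unquote(literal)
  literal && literal[1..-2]
end

# Hypothetical helper: returns [name, id_type, public_id, system_id]
# or raises when the declaration does not fit the grammar.
def parse_notation(decl)
  if (md = NEW_PUBLIC.match(decl))
    [md[1], "PUBLIC", unquote(md[2]), unquote(md[3])]
  elsif (md = NEW_SYSTEM.match(decl))
    [md[1], "SYSTEM", nil, unquote(md[2])]
  else
    raise ArgumentError, "Malformed notation declaration: #{decl.strip}"
  end
end

bad = %q{<!NOTATION name PUBLIC 'double quote " is invalid'>}

# The old pattern accepts a public ID containing a double quote ...
p OLD_PUBLIC.match(bad)[4]   # => "double quote \" is invalid"

# ... while the grammar-based pattern rejects it.
begin
  parse_notation(bad)
rescue ArgumentError => e
  puts e.message
end

p parse_notation(%q{<!NOTATION name PUBLIC 'public-id-literal' "system-literal">})
# => ["name", "PUBLIC", "public-id-literal", "system-literal"]
```

Accepting the double quote is what made the round trip unsafe: once the original quote characters are discarded, writing that public ID back inside `"..."` yields a declaration that no longer parses the same way.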
    
a659c63e3741

Fix a bug that invalid notation declaration may be generated

https://github.com/ruby/rexml · Sutou Kouhei · Feb 19, 2021 · via ghsa
2 files changed · +118 -5
  • lib/rexml/doctype.rb +20 -4 modified
    @@ -255,13 +255,29 @@ def to_s
           c = nil
           c = parent.context if parent
           if c and c[:prologue_quote] == :apostrophe
    -        quote = "'"
    +        default_quote = "'"
           else
    -        quote = "\""
    +        default_quote = "\""
           end
           notation = "<!NOTATION #{@name} #{@middle}"
    -      notation << " #{quote}#{@public}#{quote}" if @public
    -      notation << " #{quote}#{@system}#{quote}" if @system
    +      if @public
    +        if @public.include?("'")
    +          quote = "\""
    +        else
    +          quote = default_quote
    +        end
    +        notation << " #{quote}#{@public}#{quote}"
    +      end
    +      if @system
    +        if @system.include?("'")
    +          quote = "\""
    +        elsif @system.include?("\"")
    +          quote = "'"
    +        else
    +          quote = default_quote
    +        end
    +        notation << " #{quote}#{@system}#{quote}"
    +      end
           notation << ">"
           notation
         end
    
  • test/test_doctype.rb +98 -1 modified
    @@ -89,11 +89,26 @@ def test_to_s
                        decl(@id, nil).to_s)
         end
     
    +    def test_to_s_pubid_literal_include_apostrophe
    +      assert_equal("<!NOTATION #{@name} PUBLIC \"#{@id}'\">",
    +                   decl("#{@id}'", nil).to_s)
    +    end
    +
         def test_to_s_with_uri
           assert_equal("<!NOTATION #{@name} PUBLIC \"#{@id}\" \"#{@uri}\">",
                        decl(@id, @uri).to_s)
         end
     
    +    def test_to_s_system_literal_include_apostrophe
    +      assert_equal("<!NOTATION #{@name} PUBLIC \"#{@id}\" \"system'literal\">",
    +                   decl(@id, "system'literal").to_s)
    +    end
    +
    +    def test_to_s_system_literal_include_double_quote
    +      assert_equal("<!NOTATION #{@name} PUBLIC \"#{@id}\" 'system\"literal'>",
    +                   decl(@id, "system\"literal").to_s)
    +    end
    +
         def test_to_s_apostrophe
           document = REXML::Document.new(<<-XML)
           <!DOCTYPE root SYSTEM "urn:x-test:sysid" [
    @@ -107,6 +122,49 @@ def test_to_s_apostrophe
                        notation.to_s)
         end
     
    +    def test_to_s_apostrophe_pubid_literal_include_apostrophe
    +      document = REXML::Document.new(<<-XML)
    +      <!DOCTYPE root SYSTEM "urn:x-test:sysid" [
    +        #{decl("#{@id}'", @uri).to_s}
    +      ]>
    +      <root/>
    +      XML
    +      # This isn't used for PubidLiteral because PubidChar includes '.
    +      document.context[:prologue_quote] = :apostrophe
    +      notation = document.doctype.notations[0]
    +      assert_equal("<!NOTATION #{@name} PUBLIC \"#{@id}'\" '#{@uri}'>",
    +                   notation.to_s)
    +    end
    +
    +    def test_to_s_apostrophe_system_literal_include_apostrophe
    +      document = REXML::Document.new(<<-XML)
    +      <!DOCTYPE root SYSTEM "urn:x-test:sysid" [
    +        #{decl(@id, "system'literal").to_s}
    +      ]>
    +      <root/>
    +      XML
    +      # This isn't used for SystemLiteral because SystemLiteral includes '.
    +      document.context[:prologue_quote] = :apostrophe
    +      notation = document.doctype.notations[0]
    +      assert_equal("<!NOTATION #{@name} PUBLIC '#{@id}' \"system'literal\">",
    +                   notation.to_s)
    +    end
    +
    +    def test_to_s_apostrophe_system_literal_include_double_quote
    +      document = REXML::Document.new(<<-XML)
    +      <!DOCTYPE root SYSTEM "urn:x-test:sysid" [
    +        #{decl(@id, "system\"literal").to_s}
    +      ]>
    +      <root/>
    +      XML
    +      # This isn't used for SystemLiteral because SystemLiteral includes ".
    +      # But quoted by ' because SystemLiteral includes ".
    +      document.context[:prologue_quote] = :apostrophe
    +      notation = document.doctype.notations[0]
    +      assert_equal("<!NOTATION #{@name} PUBLIC '#{@id}' 'system\"literal'>",
    +                   notation.to_s)
    +    end
    +
         private
         def decl(id, uri)
           REXML::NotationDecl.new(@name, "PUBLIC", id, uri)
    @@ -124,6 +182,16 @@ def test_to_s
                        decl(@id).to_s)
         end
     
    +    def test_to_s_include_apostrophe
    +      assert_equal("<!NOTATION #{@name} SYSTEM \"#{@id}'\">",
    +                   decl("#{@id}'").to_s)
    +    end
    +
    +    def test_to_s_include_double_quote
    +      assert_equal("<!NOTATION #{@name} SYSTEM '#{@id}\"'>",
    +                   decl("#{@id}\"").to_s)
    +    end
    +
         def test_to_s_apostrophe
           document = REXML::Document.new(<<-XML)
           <!DOCTYPE root SYSTEM "urn:x-test:sysid" [
    @@ -137,9 +205,38 @@ def test_to_s_apostrophe
                        notation.to_s)
         end
     
    +    def test_to_s_apostrophe_include_apostrophe
    +      document = REXML::Document.new(<<-XML)
    +      <!DOCTYPE root SYSTEM "urn:x-test:sysid" [
    +        #{decl("#{@id}'").to_s}
    +      ]>
    +      <root/>
    +      XML
    +      # This isn't used for SystemLiteral because SystemLiteral includes '.
    +      document.context[:prologue_quote] = :apostrophe
    +      notation = document.doctype.notations[0]
    +      assert_equal("<!NOTATION #{@name} SYSTEM \"#{@id}'\">",
    +                   notation.to_s)
    +    end
    +
    +    def test_to_s_apostrophe_include_double_quote
    +      document = REXML::Document.new(<<-XML)
    +      <!DOCTYPE root SYSTEM "urn:x-test:sysid" [
    +        #{decl("#{@id}\"").to_s}
    +      ]>
    +      <root/>
    +      XML
    +      # This isn't used for SystemLiteral because SystemLiteral includes ".
    +      # But quoted by ' because SystemLiteral includes ".
    +      document.context[:prologue_quote] = :apostrophe
    +      notation = document.doctype.notations[0]
    +      assert_equal("<!NOTATION #{@name} SYSTEM '#{@id}\"'>",
    +                   notation.to_s)
    +    end
    +
         private
         def decl(id)
    -      REXML::NotationDecl.new(@name, "SYSTEM", id, nil)
    +      REXML::NotationDecl.new(@name, "SYSTEM", nil, id)
         end
       end
     end
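
The serializer side of the fix comes down to choosing a quote character that does not occur inside the literal being written. The following sketch isolates that rule for a system literal. `quote_system_literal` is a hypothetical helper written for illustration, not a REXML method; its branches mirror the `if` / `elsif` / `else` in the diff above.

```ruby
# Hypothetical helper mirroring the quote choice in the patched to_s.
# default_quote stands in for the value derived from
# context[:prologue_quote] (apostrophe or double quote).
def quote_system_literal(literal, default_quote = "\"")
  quote =
    if literal.include?("'")
      "\""           # an apostrophe inside forces double quotes
    elsif literal.include?("\"")
      "'"            # a double quote inside forces apostrophes
    else
      default_quote  # otherwise honor the configured preference
    end
  "#{quote}#{literal}#{quote}"
end

puts quote_system_literal("root.dtd")         # "root.dtd"
puts quote_system_literal("root.dtd", "'")    # 'root.dtd'
puts quote_system_literal("root'.dtd", "'")   # "root'.dtd"
puts quote_system_literal("root\".dtd")       # 'root".dtd'
```

A public ID literal needs only the apostrophe check, since a double quote is not a valid `PubidChar` and is now rejected at parse time. A system literal containing both quote characters cannot be written as a valid `SystemLiteral` at all; the sketch, like the diff, does not special-case that input.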
    

Vulnerability mechanics

Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.

References

18
