VYPR
High severity · NVD Advisory · Published May 20, 2022 · Updated May 27, 2025

Improper Handling of Unexpected Data Type in Nokogiri

CVE-2022-29181

Description

Nokogiri is an open source XML and HTML library for Ruby. Nokogiri prior to version 1.13.6 does not type-check all inputs into the XML and HTML4 SAX parsers, allowing specially crafted untrusted inputs to cause illegal memory access errors (segfault) or reads from unrelated memory. Version 1.13.6 contains a patch for this issue. As a workaround, ensure the untrusted input is a String by calling #to_s or equivalent.

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

Nokogiri prior to 1.13.6 lacks type-checking in SAX parsers, allowing crafted input to cause segfaults or read from unintended memory.

Vulnerability

Nokogiri, an open-source XML and HTML library for Ruby, does not type-check all inputs passed to the XML::SAX::Parser, XML::SAX::ParserContext, HTML4::SAX::Parser, and HTML4::SAX::ParserContext constructor methods in any release before 1.13.6. In the C extension, for example, the in-memory parsing entry points rejected only nil before reading the argument as a String, and the IO entry point did not check that its argument responds to :read. Specially crafted untrusted inputs that are not Strings can therefore trigger illegal memory access errors (segfaults) or reads from unrelated memory, as described in the official advisory [2] and the fix commit [4].
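The Ruby-level half of the fix is a one-line change in lib/nokogiri/html4/sax/parser.rb: before 1.13.6, HTML4::SAX::Parser#parse_memory ran `raise ArgumentError unless data`, which rejects only nil and false, so any other object passed that check. The sketch below copies the old and new guard lines into two standalone lambdas to show which values each one admits. It does not load Nokogiri; OLD_GUARD, NEW_GUARD, and outcome are illustrative names, not part of the library.

```ruby
# Standalone copies of the guard line in HTML4::SAX::Parser#parse_memory,
# before and after the fix. Illustrative only; Nokogiri is not loaded.

# Before 1.13.6: `raise ArgumentError unless data` rejects only nil and false.
OLD_GUARD = lambda do |data|
  raise ArgumentError unless data
  :accepted
end

# 1.13.6 and later: `raise TypeError unless String === data`.
NEW_GUARD = lambda do |data|
  raise TypeError unless String === data
  :accepted
end

# Returns :accepted, or the class of the exception the guard raised.
def outcome(guard, value)
  guard.call(value)
rescue ArgumentError, TypeError => e
  e.class
end

[nil, "<p>ok</p>", 0xcafecafe, [1, 2], { "a" => 1 }].each do |value|
  puts "#{value.inspect}: old=#{outcome(OLD_GUARD, value)} new=#{outcome(NEW_GUARD, value)}"
end
```

Only the new guard rejects every non-String. The same commit adds `Check_Type(data, T_STRING)` in the C extension, which enforces that rule for callers that reach ParserContext.memory directly.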

Exploitation

An attacker exploits this flaw by getting the application to pass a non-String object (for example, an Array, an Integer, or another object type) to one of the affected SAX parser methods. No special network position, authentication, or user interaction is needed beyond the ability to supply input that the application forwards to Nokogiri without first converting it to a String. Because the C extension did not verify the argument's type, it read the Ruby object as if it had a String's memory layout, leading to invalid memory accesses [2][3].
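In practice, a non-String often reaches the parser through decoded structured input. The sketch below assumes a hypothetical web handler that reads XML from a JSON field: JSON lets the client choose the field's type, so code written as if the value were always a String can receive an Array or an Integer instead. require_string! is an illustrative helper, not a Nokogiri API.

```ruby
require "json"

# Hypothetical request body: the client, not the server, picks the JSON type
# of the "xml" field.
body = '{"xml": [1, 2, 3]}'
xml = JSON.parse(body)["xml"]
puts xml.class # => Array

# Illustrative boundary check (not a Nokogiri API): reject non-Strings before
# they can reach a SAX parser, whatever Nokogiri version is installed.
def require_string!(value, name)
  return value if value.is_a?(String)

  raise TypeError, "#{name} must be a String, got #{value.class}"
end

begin
  require_string!(xml, "xml")
rescue TypeError => e
  puts e.message # => xml must be a String, got Array
end
```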

Impact

Successful exploitation can crash the process with a segmentation fault (segfault), causing a denial of service (DoS). The flaw may also allow reads from unrelated memory, which could leak sensitive data held elsewhere in the process. The impact is confined to the process's own memory space; no reliable remote code execution (RCE) has been demonstrated, but information disclosure is possible [2][4].

Mitigation

Nokogiri 1.13.6 and later type-check these arguments and raise TypeError when one has the wrong type, for example data that is not a String or an IO argument that does not respond to :read. Users should upgrade to 1.13.6 or later immediately. As a workaround, convert untrusted input to a String by calling #to_s (or an equivalent method) before passing it to a SAX parser [2][4].
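A minimal sketch of the #to_s workaround, assuming the application passes either a String or an IO-like object to the SAX parser. sax_input is a hypothetical helper: it leaves Strings unchanged, keeps objects that respond to :read (mirroring the duck-type check the patch adds to the IO path), and converts everything else with #to_s. Whether coercing is preferable to rejecting the input outright is an application-level choice.

```ruby
require "stringio"

# Hypothetical helper (not part of Nokogiri) applying the advisory's workaround:
# make sure whatever reaches a SAX parser is a String or a readable IO.
def sax_input(untrusted)
  return untrusted if untrusted.is_a?(String)
  # Keep genuine IO-like objects (File, StringIO, ...). Patched Nokogiri checks
  # for a #read method on the IO path, so use the same duck-type test here.
  return untrusted if untrusted.respond_to?(:read)

  # Anything else (Integer, Array, Hash, nil, ...) becomes a String via #to_s.
  untrusted.to_s
end

p sax_input(0xcafecafe)                 # => "3405695742"
p sax_input(nil)                        # => ""
p sax_input(StringIO.new("<a/>")).class # => StringIO

# Usage with Nokogiri (requires the gem; MyHandler is a hypothetical
# Nokogiri::XML::SAX::Document subclass):
#   Nokogiri::XML::SAX::Parser.new(MyHandler.new).parse(sax_input(params["xml"]))
```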

AI Insight generated on May 21, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected packages

Versions sourced from the GitHub Security Advisory.

Package | Ecosystem | Affected versions | Patched versions
nokogiri | RubyGems | < 1.13.6 | 1.13.6
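To check whether a deployment already carries the fix, compare the installed version against 1.13.6. The sketch below uses RubyGems' Gem::Version so that, for example, 1.13.10 sorts above 1.13.6 (a plain string comparison would get this wrong). nokogiri_patched? is a hypothetical helper; in an application its argument would typically be Nokogiri::VERSION.

```ruby
require "rubygems" # Gem::Version (normally preloaded by Ruby)

# First release with the CVE-2022-29181 fix, per the GitHub Security Advisory.
PATCHED_NOKOGIRI = Gem::Version.new("1.13.6")

# Hypothetical helper: true if +version_string+ is at or above the fixed release.
def nokogiri_patched?(version_string)
  Gem::Version.new(version_string) >= PATCHED_NOKOGIRI
end

%w[1.13.5 1.13.6 1.13.10].each do |v|
  puts "#{v}: #{nokogiri_patched?(v) ? 'patched' : 'vulnerable'}"
end

# In an application (requires the nokogiri gem):
#   warn "Nokogiri #{Nokogiri::VERSION} is affected by CVE-2022-29181" unless nokogiri_patched?(Nokogiri::VERSION)
```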

Affected products (13)

Patches (2)
83cc451c3f29

fix: {HTML4,XML}::SAX::{Parser,ParserContext} check arg types

https://github.com/sparklemotion/nokogiri · Mike Dalessio · May 7, 2022 · via GHSA
10 files changed · +66 -12
  • ext/java/nokogiri/Html4SaxParserContext.java · +11 -0 · modified
    @@ -231,6 +231,13 @@ static EncodingType get(final int ordinal)
                  IRubyObject data,
                  IRubyObject encoding)
       {
    +    if (!(data instanceof RubyString)) {
    +      throw context.getRuntime().newTypeError("data must be kind_of String");
    +    }
    +    if (!(encoding instanceof RubyString)) {
    +      throw context.getRuntime().newTypeError("data must be kind_of String");
    +    }
    +
         Html4SaxParserContext ctx = Html4SaxParserContext.newInstance(context.runtime, (RubyClass) klass);
         ctx.setInputSourceFile(context, data);
         String javaEncoding = findEncodingName(context, encoding);
    @@ -247,6 +254,10 @@ static EncodingType get(final int ordinal)
                IRubyObject data,
                IRubyObject encoding)
       {
    +    if (!(encoding instanceof RubyFixnum)) {
    +      throw context.getRuntime().newTypeError("encoding must be kind_of String");
    +    }
    +
         Html4SaxParserContext ctx = Html4SaxParserContext.newInstance(context.runtime, (RubyClass) klass);
         ctx.setIOInputSource(context, data, context.nil);
         String javaEncoding = findEncodingName(context, encoding);
    
  • ext/java/nokogiri/internals/ParserContext.java · +7 -1 · modified
    @@ -58,6 +58,12 @@ public abstract class ParserContext extends RubyObject
         source = new InputSource();
         ParserContext.setUrl(context, source, url);
     
    +    Ruby ruby = context.getRuntime();
    +
    +    if (!(data.respondsTo("read"))) {
    +      throw ruby.newTypeError("must respond to :read");
    +    }
    +
         source.setByteStream(new IOInputStream(data));
         if (java_encoding != null) {
           source.setEncoding(java_encoding);
    @@ -73,7 +79,7 @@ public abstract class ParserContext extends RubyObject
         Ruby ruby = context.getRuntime();
     
         if (!(data instanceof RubyString)) {
    -      throw ruby.newArgumentError("must be kind_of String");
    +      throw ruby.newTypeError("must be kind_of String");
         }
     
         RubyString stringData = (RubyString) data;
    
  • ext/java/nokogiri/XmlSaxParserContext.java · +5 -2 · modified
    @@ -130,9 +130,12 @@ public class XmlSaxParserContext extends ParserContext
       parse_io(ThreadContext context,
                IRubyObject klazz,
                IRubyObject data,
    -           IRubyObject enc)
    +           IRubyObject encoding)
       {
    -    //int encoding = (int)enc.convertToInteger().getLongValue();
    +    // check the type of the unused encoding to match behavior of CRuby
    +    if (!(encoding instanceof RubyFixnum)) {
    +      throw context.getRuntime().newTypeError("encoding must be kind_of String");
    +    }
         final Ruby runtime = context.runtime;
         XmlSaxParserContext ctx = newInstance(runtime, (RubyClass) klazz);
         ctx.initialize(runtime);
    
  • ext/nokogiri/html4_sax_parser_context.c · +2 -3 · modified
    @@ -19,9 +19,8 @@ parse_memory(VALUE klass, VALUE data, VALUE encoding)
     {
       htmlParserCtxtPtr ctxt;
     
    -  if (NIL_P(data)) {
    -    rb_raise(rb_eArgError, "data cannot be nil");
    -  }
    +  Check_Type(data, T_STRING);
    +
       if (!(int)RSTRING_LEN(data)) {
         rb_raise(rb_eRuntimeError, "data cannot be empty");
       }
    
  • ext/nokogiri/xml_sax_parser_context.c · +10 -3 · modified
    @@ -2,6 +2,8 @@
     
     VALUE cNokogiriXmlSaxParserContext ;
     
    +static ID id_read;
    +
     static void
     deallocate(xmlParserCtxtPtr ctxt)
     {
    @@ -26,6 +28,10 @@ parse_io(VALUE klass, VALUE io, VALUE encoding)
       xmlParserCtxtPtr ctxt;
       xmlCharEncoding enc = (xmlCharEncoding)NUM2INT(encoding);
     
    +  if (!rb_respond_to(io, id_read)) {
    +    rb_raise(rb_eTypeError, "argument expected to respond to :read");
    +  }
    +
       ctxt = xmlCreateIOParserCtxt(NULL, NULL,
                                    (xmlInputReadCallback)noko_io_read,
                                    (xmlInputCloseCallback)noko_io_close,
    @@ -62,9 +68,8 @@ parse_memory(VALUE klass, VALUE data)
     {
       xmlParserCtxtPtr ctxt;
     
    -  if (NIL_P(data)) {
    -    rb_raise(rb_eArgError, "data cannot be nil");
    -  }
    +  Check_Type(data, T_STRING);
    +
       if (!(int)RSTRING_LEN(data)) {
         rb_raise(rb_eRuntimeError, "data cannot be empty");
       }
    @@ -278,4 +283,6 @@ noko_init_xml_sax_parser_context()
       rb_define_method(cNokogiriXmlSaxParserContext, "recovery", get_recovery, 0);
       rb_define_method(cNokogiriXmlSaxParserContext, "line", line, 0);
       rb_define_method(cNokogiriXmlSaxParserContext, "column", column, 0);
    +
    +  id_read = rb_intern("read");
     }
    
  • lib/nokogiri/html4/sax/parser.rb · +1 -1 · modified
    @@ -28,7 +28,7 @@ class Parser < Nokogiri::XML::SAX::Parser
             ###
             # Parse html stored in +data+ using +encoding+
             def parse_memory(data, encoding = "UTF-8")
    -          raise ArgumentError unless data
    +          raise TypeError unless String === data
               return if data.empty?
     
               ctx = ParserContext.memory(data, encoding)
    
  • test/html4/sax/test_parser_context.rb · +9 -0 · modified
    @@ -40,6 +40,15 @@ def test_from_file
               ctx.parse_with(parser)
               # end
             end
    +
    +        def test_graceful_handling_of_invalid_types
    +          assert_raises(TypeError) { ParserContext.new(0xcafecafe) }
    +          assert_raises(TypeError) { ParserContext.memory(0xcafecafe, "UTF-8") }
    +          assert_raises(TypeError) { ParserContext.io(0xcafecafe, 1) }
    +          assert_raises(TypeError) { ParserContext.io(StringIO.new("asdf"), "should be an index into ENCODINGS") }
    +          assert_raises(TypeError) { ParserContext.file(0xcafecafe, "UTF-8") }
    +          assert_raises(TypeError) { ParserContext.file("path/to/file", 0xcafecafe) }
    +        end
           end
         end
       end
    
  • test/html4/sax/test_parser.rb · +7 -1 · modified
    @@ -54,7 +54,7 @@ def test_parse_file_with_dir
             end
     
             def test_parse_memory_nil
    -          assert_raises(ArgumentError) do
    +          assert_raises(TypeError) do
                 @parser.parse_memory(nil)
               end
             end
    @@ -161,6 +161,12 @@ def test_parsing_dom_error_from_io
             def test_empty_processing_instruction
               @parser.parse_memory("<strong>this will segfault<?strong>")
             end
    +
    +        it "handles invalid types gracefully" do
    +          assert_raises(TypeError) { Nokogiri::HTML::SAX::Parser.new.parse(0xcafecafe) }
    +          assert_raises(TypeError) { Nokogiri::HTML::SAX::Parser.new.parse_memory(0xcafecafe) }
    +          assert_raises(TypeError) { Nokogiri::HTML::SAX::Parser.new.parse_io(0xcafecafe) }
    +        end
           end
         end
       end
    
  • test/xml/sax/test_parser_context.rb · +7 -0 · modified
    @@ -80,6 +80,13 @@ def test_recovery
               assert(pc.recovery)
             end
     
    +        def test_graceful_handling_of_invalid_types
    +          assert_raises(TypeError) { ParserContext.new(0xcafecafe) }
    +          assert_raises(TypeError) { ParserContext.memory(0xcafecafe) }
    +          assert_raises(TypeError) { ParserContext.io(0xcafecafe, 1) }
    +          assert_raises(TypeError) { ParserContext.io(StringIO.new("asdf"), "should be an index into ENCODINGS") }
    +        end
    +
             def test_from_io
               ctx = ParserContext.new(StringIO.new("fo"), "UTF-8")
               assert(ctx)
    
  • test/xml/sax/test_parser.rb · +7 -1 · modified
    @@ -71,6 +71,12 @@ class Nokogiri::SAX::TestCase
           end
         end
     
    +    it "handles invalid types gracefully" do
    +      assert_raises(TypeError) { Nokogiri::XML::SAX::Parser.new.parse(0xcafecafe) }
    +      assert_raises(TypeError) { Nokogiri::XML::SAX::Parser.new.parse_memory(0xcafecafe) }
    +      assert_raises(TypeError) { Nokogiri::XML::SAX::Parser.new.parse_io(0xcafecafe) }
    +    end
    +
         it :test_namespace_declaration_order_is_saved do
           parser.parse(<<~EOF)
             <root xmlns:foo='http://foo.example.com/' xmlns='http://example.com/'>
    @@ -261,7 +267,7 @@ def call_parse_io_with_encoding(encoding)
         end
     
         it :test_render_parse_nil_param do
    -      assert_raises(ArgumentError) { parser.parse_memory(nil) }
    +      assert_raises(TypeError) { parser.parse_memory(nil) }
         end
     
         it :test_bad_encoding_args do
    
db05ba9a1bd4

fix: {HTML4,XML}::SAX::{Parser,ParserContext} check arg types

https://github.com/sparklemotion/nokogiri · Mike Dalessio · May 7, 2022 · via GHSA
11 files changed · +68 -13
  • CHANGELOG.md · +2 -1 · modified
    @@ -28,6 +28,7 @@ This version of Nokogiri uses [`jar-dependencies`](https://github.com/mkristian/
     
     * [CRuby] UTF-16-encoded documents longer than ~4000 code points now serialize properly. Previously the serialized document was corrupted when it exceeded the length of libxml2's internal string buffer. [[#752](https://github.com/sparklemotion/nokogiri/issues/752)]
     * [HTML5] The Gumbo parser now correctly handles text at the end of `form` elements.
    +* `{HTML4,XML}::SAX::{Parser,ParserContext}` constructor methods now raise `TypeError` instead of segfaulting when an incorrect type is passed. (Thanks to [@agustingianni](https://github.com/agustingianni) from the Github Security Lab for reporting!)
     
     
     ### Improved
    @@ -36,7 +37,7 @@ This version of Nokogiri uses [`jar-dependencies`](https://github.com/mkristian/
     * Avoid compile-time conflict with system-installed `gumbo.h` on OpenBSD. [[#2464](https://github.com/sparklemotion/nokogiri/issues/2464)]
     * Remove calls to `vasprintf` in favor of platform-independent `rb_vsprintf`
     * Prefer `ruby_xmalloc` to `malloc` within the C extension. [[#2480](https://github.com/sparklemotion/nokogiri/issues/2480)] (Thanks, [@Garfield96](https://github.com/Garfield96)!)
    -* Installation from source on systems missing libiconv will once again generate a helpful error message (broken since v1.11.0). [#2505]
    +* Installation from source on systems missing libiconv will once again generate a helpful error message (broken since v1.11.0). [[#2505](https://github.com/sparklemotion/nokogiri/issues/2505)]
     
     
     ## 1.13.5 / 2022-05-04
    
  • ext/java/nokogiri/Html4SaxParserContext.java · +11 -0 · modified
    @@ -231,6 +231,13 @@ static EncodingType get(final int ordinal)
                  IRubyObject data,
                  IRubyObject encoding)
       {
    +    if (!(data instanceof RubyString)) {
    +      throw context.getRuntime().newTypeError("data must be kind_of String");
    +    }
    +    if (!(encoding instanceof RubyString)) {
    +      throw context.getRuntime().newTypeError("data must be kind_of String");
    +    }
    +
         Html4SaxParserContext ctx = Html4SaxParserContext.newInstance(context.runtime, (RubyClass) klass);
         ctx.setInputSourceFile(context, data);
         String javaEncoding = findEncodingName(context, encoding);
    @@ -247,6 +254,10 @@ static EncodingType get(final int ordinal)
                IRubyObject data,
                IRubyObject encoding)
       {
    +    if (!(encoding instanceof RubyFixnum)) {
    +      throw context.getRuntime().newTypeError("encoding must be kind_of String");
    +    }
    +
         Html4SaxParserContext ctx = Html4SaxParserContext.newInstance(context.runtime, (RubyClass) klass);
         ctx.setIOInputSource(context, data, context.nil);
         String javaEncoding = findEncodingName(context, encoding);
    
  • ext/java/nokogiri/internals/ParserContext.java · +7 -1 · modified
    @@ -60,6 +60,12 @@ public abstract class ParserContext extends RubyObject
         source = new InputSource();
         ParserContext.setUrl(context, source, url);
     
    +    Ruby ruby = context.getRuntime();
    +
    +    if (!(data.respondsTo("read"))) {
    +      throw ruby.newTypeError("must respond to :read");
    +    }
    +
         source.setByteStream(new IOInputStream(data));
         if (java_encoding != null) {
           source.setEncoding(java_encoding);
    @@ -75,7 +81,7 @@ public abstract class ParserContext extends RubyObject
         Ruby ruby = context.getRuntime();
     
         if (!(data instanceof RubyString)) {
    -      throw ruby.newArgumentError("must be kind_of String");
    +      throw ruby.newTypeError("must be kind_of String");
         }
     
         RubyString stringData = (RubyString) data;
    
  • ext/java/nokogiri/XmlSaxParserContext.java · +5 -2 · modified
    @@ -131,9 +131,12 @@ public class XmlSaxParserContext extends ParserContext
       parse_io(ThreadContext context,
                IRubyObject klazz,
                IRubyObject data,
    -           IRubyObject enc)
    +           IRubyObject encoding)
       {
    -    //int encoding = (int)enc.convertToInteger().getLongValue();
    +    // check the type of the unused encoding to match behavior of CRuby
    +    if (!(encoding instanceof RubyFixnum)) {
    +      throw context.getRuntime().newTypeError("encoding must be kind_of String");
    +    }
         final Ruby runtime = context.runtime;
         XmlSaxParserContext ctx = newInstance(runtime, (RubyClass) klazz);
         ctx.initialize(runtime);
    
  • ext/nokogiri/html4_sax_parser_context.c · +2 -3 · modified
    @@ -19,9 +19,8 @@ parse_memory(VALUE klass, VALUE data, VALUE encoding)
     {
       htmlParserCtxtPtr ctxt;
     
    -  if (NIL_P(data)) {
    -    rb_raise(rb_eArgError, "data cannot be nil");
    -  }
    +  Check_Type(data, T_STRING);
    +
       if (!(int)RSTRING_LEN(data)) {
         rb_raise(rb_eRuntimeError, "data cannot be empty");
       }
    
  • ext/nokogiri/xml_sax_parser_context.c · +10 -3 · modified
    @@ -2,6 +2,8 @@
     
     VALUE cNokogiriXmlSaxParserContext ;
     
    +static ID id_read;
    +
     static void
     deallocate(xmlParserCtxtPtr ctxt)
     {
    @@ -26,6 +28,10 @@ parse_io(VALUE klass, VALUE io, VALUE encoding)
       xmlParserCtxtPtr ctxt;
       xmlCharEncoding enc = (xmlCharEncoding)NUM2INT(encoding);
     
    +  if (!rb_respond_to(io, id_read)) {
    +    rb_raise(rb_eTypeError, "argument expected to respond to :read");
    +  }
    +
       ctxt = xmlCreateIOParserCtxt(NULL, NULL,
                                    (xmlInputReadCallback)noko_io_read,
                                    (xmlInputCloseCallback)noko_io_close,
    @@ -62,9 +68,8 @@ parse_memory(VALUE klass, VALUE data)
     {
       xmlParserCtxtPtr ctxt;
     
    -  if (NIL_P(data)) {
    -    rb_raise(rb_eArgError, "data cannot be nil");
    -  }
    +  Check_Type(data, T_STRING);
    +
       if (!(int)RSTRING_LEN(data)) {
         rb_raise(rb_eRuntimeError, "data cannot be empty");
       }
    @@ -278,4 +283,6 @@ noko_init_xml_sax_parser_context()
       rb_define_method(cNokogiriXmlSaxParserContext, "recovery", get_recovery, 0);
       rb_define_method(cNokogiriXmlSaxParserContext, "line", line, 0);
       rb_define_method(cNokogiriXmlSaxParserContext, "column", column, 0);
    +
    +  id_read = rb_intern("read");
     }
    
  • lib/nokogiri/html4/sax/parser.rb · +1 -1 · modified
    @@ -28,7 +28,7 @@ class Parser < Nokogiri::XML::SAX::Parser
             ###
             # Parse html stored in +data+ using +encoding+
             def parse_memory(data, encoding = "UTF-8")
    -          raise ArgumentError unless data
    +          raise TypeError unless String === data
               return if data.empty?
     
               ctx = ParserContext.memory(data, encoding)
    
  • test/html4/sax/test_parser_context.rb · +9 -0 · modified
    @@ -40,6 +40,15 @@ def test_from_file
               ctx.parse_with(parser)
               # end
             end
    +
    +        def test_graceful_handling_of_invalid_types
    +          assert_raises(TypeError) { ParserContext.new(0xcafecafe) }
    +          assert_raises(TypeError) { ParserContext.memory(0xcafecafe, "UTF-8") }
    +          assert_raises(TypeError) { ParserContext.io(0xcafecafe, 1) }
    +          assert_raises(TypeError) { ParserContext.io(StringIO.new("asdf"), "should be an index into ENCODINGS") }
    +          assert_raises(TypeError) { ParserContext.file(0xcafecafe, "UTF-8") }
    +          assert_raises(TypeError) { ParserContext.file("path/to/file", 0xcafecafe) }
    +        end
           end
         end
       end
    
  • test/html4/sax/test_parser.rb · +7 -1 · modified
    @@ -54,7 +54,7 @@ def test_parse_file_with_dir
             end
     
             def test_parse_memory_nil
    -          assert_raises(ArgumentError) do
    +          assert_raises(TypeError) do
                 @parser.parse_memory(nil)
               end
             end
    @@ -161,6 +161,12 @@ def test_parsing_dom_error_from_io
             def test_empty_processing_instruction
               @parser.parse_memory("<strong>this will segfault<?strong>")
             end
    +
    +        it "handles invalid types gracefully" do
    +          assert_raises(TypeError) { Nokogiri::HTML::SAX::Parser.new.parse(0xcafecafe) }
    +          assert_raises(TypeError) { Nokogiri::HTML::SAX::Parser.new.parse_memory(0xcafecafe) }
    +          assert_raises(TypeError) { Nokogiri::HTML::SAX::Parser.new.parse_io(0xcafecafe) }
    +        end
           end
         end
       end
    
  • test/xml/sax/test_parser_context.rb · +7 -0 · modified
    @@ -80,6 +80,13 @@ def test_recovery
               assert(pc.recovery)
             end
     
    +        def test_graceful_handling_of_invalid_types
    +          assert_raises(TypeError) { ParserContext.new(0xcafecafe) }
    +          assert_raises(TypeError) { ParserContext.memory(0xcafecafe) }
    +          assert_raises(TypeError) { ParserContext.io(0xcafecafe, 1) }
    +          assert_raises(TypeError) { ParserContext.io(StringIO.new("asdf"), "should be an index into ENCODINGS") }
    +        end
    +
             def test_from_io
               ctx = ParserContext.new(StringIO.new("fo"), "UTF-8")
               assert(ctx)
    
  • test/xml/sax/test_parser.rb · +7 -1 · modified
    @@ -73,6 +73,12 @@ class TestCase
               end
             end
     
    +        it "handles invalid types gracefully" do
    +          assert_raises(TypeError) { Nokogiri::XML::SAX::Parser.new.parse(0xcafecafe) }
    +          assert_raises(TypeError) { Nokogiri::XML::SAX::Parser.new.parse_memory(0xcafecafe) }
    +          assert_raises(TypeError) { Nokogiri::XML::SAX::Parser.new.parse_io(0xcafecafe) }
    +        end
    +
             it :test_namespace_declaration_order_is_saved do
               parser.parse(<<~EOF)
                 <root xmlns:foo='http://foo.example.com/' xmlns='http://example.com/'>
    @@ -263,7 +269,7 @@ def call_parse_io_with_encoding(encoding)
             end
     
             it :test_render_parse_nil_param do
    -          assert_raises(ArgumentError) { parser.parse_memory(nil) }
    +          assert_raises(TypeError) { parser.parse_memory(nil) }
             end
     
             it :test_bad_encoding_args do
    

Vulnerability mechanics

Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.

References (11)