Improper Handling of Unexpected Data Type in Nokogiri
Description
Nokogiri is an open source XML and HTML library for Ruby. Nokogiri prior to version 1.13.6 does not type-check all inputs into the XML and HTML4 SAX parsers, allowing specially crafted untrusted inputs to cause illegal memory access errors (segfault) or reads from unrelated memory. Version 1.13.6 contains a patch for this issue. As a workaround, ensure the untrusted input is a String by calling #to_s or equivalent.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
Nokogiri prior to 1.13.6 lacks type-checking in SAX parsers, allowing crafted input to cause segfaults or read from unintended memory.
Vulnerability
Nokogiri is an open-source XML and HTML library for Ruby. In versions prior to 1.13.6, it does not type-check all inputs to its SAX parsing entry points. These entry points are the XML::SAX::Parser and HTML4::SAX::Parser parse methods, and the XML::SAX::ParserContext and HTML4::SAX::ParserContext constructor methods (for example ParserContext.new, .memory and .io). A specially crafted untrusted input that is not a String can trigger illegal memory access errors (segfaults) or reads from unrelated memory, as described in the official advisory [2] and the fix commit [4].
Exploitation
An attacker exploits this vulnerability by causing a non-String object (for example an Integer, an Array, or a custom object) to reach one of the vulnerable SAX entry points. String inputs were already handled safely, so the attacker must be able to influence the type of the value the application passes, not just its contents. Beyond that, no special network position, authentication, or user interaction is required. In the C extension, the unpatched parse_memory functions rejected only nil before reading the argument with RSTRING_LEN. Any other non-String value was therefore treated as if it were a Ruby String structure. For an immediate value such as an Integer, this means dereferencing an address derived from the value itself, which leads to the memory access errors [2][3].
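The checks that the patch adds can be mirrored in plain Ruby to show which arguments are rejected. This is a minimal sketch. It assumes that the C-level Check_Type(data, T_STRING) and rb_respond_to(io, id_read) guards from the fix commit behave like an is_a?(String) test and a respond_to?(:read) test. The helper names check_memory_input! and check_io_input! are hypothetical and are not part of Nokogiri.

```ruby
require "stringio"

# Hypothetical helper, not a Nokogiri API. Mirrors the patched
# Check_Type(data, T_STRING) guard in parse_memory: anything that is not a
# String is rejected before its length or bytes are read.
def check_memory_input!(data)
  unless data.is_a?(String)
    raise TypeError, "wrong argument type #{data.class} (expected String)"
  end
  data
end

# Hypothetical helper, not a Nokogiri API. Mirrors the patched
# rb_respond_to(io, id_read) guard in parse_io: the source must respond to
# #read, because the parser's IO read callback pulls bytes through it.
def check_io_input!(io)
  raise TypeError, "argument expected to respond to :read" unless io.respond_to?(:read)
  io
end

# Usage: a String and a StringIO pass. An Integer such as the 0xcafecafe
# value used in the upstream tests raises TypeError instead of reaching C code.
check_memory_input!("<root/>")
check_io_input!(StringIO.new("<root/>"))
begin
  check_memory_input!(0xcafecafe)
rescue TypeError => e
  puts "rejected: #{e.message}"
end
```

The point of the guard is its position. The type is checked before any code interprets the object's memory, so a wrong type becomes an ordinary Ruby exception.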
Impact
Successful exploitation can crash the process with a segmentation fault, resulting in a denial-of-service (DoS) condition. The flaw may also cause reads from unrelated memory, which could potentially expose data from elsewhere in the process. The impact is confined to the affected process's memory space. The cited sources describe crashes and reads from unrelated memory, not remote code execution (RCE) [2][4].
Mitigation
Nokogiri 1.13.6 and later contain a patch that type-checks these inputs and raises a TypeError instead of crashing. String-based entry points reject anything that is not a String, and IO-based entry points reject objects that do not respond to :read. Users should upgrade to 1.13.6 or later. As a workaround on older versions, convert untrusted input passed to SAX parsers to a String by calling #to_s or an equivalent method before parsing [2][4].
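For applications that cannot upgrade yet, the workaround can be applied at the boundary where untrusted data enters. This is a minimal sketch. It assumes the input may arrive as any Ruby object, for example a value taken from parsed JSON. coerce_sax_input is a hypothetical helper, not part of Nokogiri, and it simply applies the #to_s conversion that the advisory recommends.

```ruby
# Hypothetical helper, not a Nokogiri API: apply the advisory's workaround by
# converting any untrusted value to a String before it reaches a SAX parser.
def coerce_sax_input(untrusted)
  # Strings (including subclasses) pass through unchanged.
  return untrusted if untrusted.is_a?(String)

  # Everything else goes through #to_s, so the parser only ever sees a String.
  # An Integer such as 0xcafecafe becomes "3405695742"; nil becomes "".
  untrusted.to_s
end

# Values that could reach the parser from untrusted sources.
[0xcafecafe, nil, [1, 2], "<root/>"].each do |value|
  input = coerce_sax_input(value)
  puts "#{value.inspect} -> #{input.inspect} (#{input.class})"
end
```

With Nokogiri loaded, the coerced value would then be passed to a call such as Nokogiri::XML::SAX::Parser.new(handler).parse_memory(input), where handler is an instance of your Nokogiri::XML::SAX::Document subclass. An empty string (for example one produced from nil) can still be refused as empty input. The parse_memory C functions raise a RuntimeError ("data cannot be empty") in that case, which is an ordinary exception rather than a crash.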
AI Insight generated on May 21, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected packages
Versions sourced from the GitHub Security Advisory.
| Package | Affected versions | Patched versions |
|---|---|---|
| nokogiri (RubyGems) | < 1.13.6 | 1.13.6 |
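To check whether an installed gem falls in the affected range, compare the upstream version against the first patched release. This is a minimal sketch using RubyGems' Gem::Version and Gem::Requirement, and nokogiri_patched? is a hypothetical helper name. It applies only to upstream RubyGems version strings. Distribution packages such as the SUSE builds listed under Affected products carry their own version numbers (for example 1.8.5-150000.3.9.1), which presumably include backported fixes, so compare those against the vendor's fixed version instead.

```ruby
require "rubygems"

# The advisory lists 1.13.6 as the first patched nokogiri release, so anything
# at or above it is treated as fixed.
PATCHED_NOKOGIRI = Gem::Requirement.new(">= 1.13.6")

# Hypothetical helper name, not a Nokogiri or RubyGems API.
def nokogiri_patched?(version)
  # Gem::Version orders prereleases (e.g. "1.13.6.rc1") before the final
  # release, so they are reported as not patched.
  PATCHED_NOKOGIRI.satisfied_by?(Gem::Version.new(version))
end

%w[1.13.5 1.13.6 1.14.0].each do |v|
  puts "#{v}: #{nokogiri_patched?(v) ? 'patched' : 'affected'}"
end
```

In an application where Nokogiri is already loaded, you can run the same check against Nokogiri::VERSION.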
Affected products
13 entries: 12 GHSA package coordinates (no CPE listed for any of them) and one repository range.

| Package (purl) | Distribution | Affected versions |
|---|---|---|
| pkg:gem/nokogiri | RubyGems | < 1.13.6 |
| pkg:rpm/opensuse/ruby3.2-rubygem-nokogiri | openSUSE Tumbleweed | < 1.13.9-1.7 |
| pkg:rpm/opensuse/rubygem-nokogiri | openSUSE Leap 15.3 | < 1.8.5-150000.3.9.1 |
| pkg:rpm/opensuse/rubygem-nokogiri | openSUSE Leap 15.4 | < 1.8.5-150400.14.3.1 |
| pkg:rpm/opensuse/rubygem-nokogiri | openSUSE Tumbleweed | < 1.13.6-1.1 |
| pkg:rpm/suse/rubygem-nokogiri | SUSE Linux Enterprise High Availability Extension 15 | < 1.8.5-150000.3.9.1 |
| pkg:rpm/suse/rubygem-nokogiri | SUSE Linux Enterprise High Availability Extension 15 SP1 | < 1.8.5-150000.3.9.1 |
| pkg:rpm/suse/rubygem-nokogiri | SUSE Linux Enterprise High Availability Extension 15 SP2 | < 1.8.5-150000.3.9.1 |
| pkg:rpm/suse/rubygem-nokogiri | SUSE Linux Enterprise Module for Basesystem 15 SP3 | < 1.8.5-150000.3.9.1 |
| pkg:rpm/suse/rubygem-nokogiri | SUSE Linux Enterprise Module for Basesystem 15 SP4 | < 1.8.5-150400.14.3.1 |
| pkg:rpm/suse/rubygem-nokogiri | SUSE OpenStack Cloud Crowbar 8 | < 1.6.1-5.6.1 |
| pkg:rpm/suse/rubygem-nokogiri | SUSE OpenStack Cloud Crowbar 9 | < 1.6.1-5.6.1 |
| sparklemotion/nokogiri (repository, v5 range) | | < 1.13.6 |
Patches
83cc451c3f29 fix: {HTML4,XML}::SAX::{Parser,ParserContext} check arg types
10 files changed · +66 −12
ext/java/nokogiri/Html4SaxParserContext.java (+11 −0, modified)

```diff
@@ -231,6 +231,13 @@ static EncodingType get(final int ordinal)
                IRubyObject data,
                IRubyObject encoding)
   {
+    if (!(data instanceof RubyString)) {
+      throw context.getRuntime().newTypeError("data must be kind_of String");
+    }
+    if (!(encoding instanceof RubyString)) {
+      throw context.getRuntime().newTypeError("data must be kind_of String");
+    }
+
     Html4SaxParserContext ctx = Html4SaxParserContext.newInstance(context.runtime, (RubyClass) klass);
     ctx.setInputSourceFile(context, data);
     String javaEncoding = findEncodingName(context, encoding);
@@ -247,6 +254,10 @@ static EncodingType get(final int ordinal)
              IRubyObject data,
              IRubyObject encoding)
   {
+    if (!(encoding instanceof RubyFixnum)) {
+      throw context.getRuntime().newTypeError("encoding must be kind_of String");
+    }
+
     Html4SaxParserContext ctx = Html4SaxParserContext.newInstance(context.runtime, (RubyClass) klass);
     ctx.setIOInputSource(context, data, context.nil);
     String javaEncoding = findEncodingName(context, encoding);
```
ext/java/nokogiri/internals/ParserContext.java (+7 −1, modified)

```diff
@@ -58,6 +58,12 @@ public abstract class ParserContext extends RubyObject
     source = new InputSource();
     ParserContext.setUrl(context, source, url);
 
+    Ruby ruby = context.getRuntime();
+
+    if (!(data.respondsTo("read"))) {
+      throw ruby.newTypeError("must respond to :read");
+    }
+
     source.setByteStream(new IOInputStream(data));
     if (java_encoding != null) {
       source.setEncoding(java_encoding);
@@ -73,7 +79,7 @@ public abstract class ParserContext extends RubyObject
     Ruby ruby = context.getRuntime();
 
     if (!(data instanceof RubyString)) {
-      throw ruby.newArgumentError("must be kind_of String");
+      throw ruby.newTypeError("must be kind_of String");
     }
 
     RubyString stringData = (RubyString) data;
```
ext/java/nokogiri/XmlSaxParserContext.java (+5 −2, modified)

```diff
@@ -130,9 +130,12 @@ public class XmlSaxParserContext extends ParserContext
   parse_io(ThreadContext context,
            IRubyObject klazz,
            IRubyObject data,
-           IRubyObject enc)
+           IRubyObject encoding)
   {
-    //int encoding = (int)enc.convertToInteger().getLongValue();
+    // check the type of the unused encoding to match behavior of CRuby
+    if (!(encoding instanceof RubyFixnum)) {
+      throw context.getRuntime().newTypeError("encoding must be kind_of String");
+    }
     final Ruby runtime = context.runtime;
     XmlSaxParserContext ctx = newInstance(runtime, (RubyClass) klazz);
     ctx.initialize(runtime);
```
ext/nokogiri/html4_sax_parser_context.c (+2 −3, modified)

```diff
@@ -19,9 +19,8 @@ parse_memory(VALUE klass, VALUE data, VALUE encoding)
 {
   htmlParserCtxtPtr ctxt;
 
-  if (NIL_P(data)) {
-    rb_raise(rb_eArgError, "data cannot be nil");
-  }
+  Check_Type(data, T_STRING);
+
   if (!(int)RSTRING_LEN(data)) {
     rb_raise(rb_eRuntimeError, "data cannot be empty");
   }
```
ext/nokogiri/xml_sax_parser_context.c (+10 −3, modified)

```diff
@@ -2,6 +2,8 @@
 
 VALUE cNokogiriXmlSaxParserContext ;
 
+static ID id_read;
+
 static void
 deallocate(xmlParserCtxtPtr ctxt)
 {
@@ -26,6 +28,10 @@ parse_io(VALUE klass, VALUE io, VALUE encoding)
   xmlParserCtxtPtr ctxt;
   xmlCharEncoding enc = (xmlCharEncoding)NUM2INT(encoding);
 
+  if (!rb_respond_to(io, id_read)) {
+    rb_raise(rb_eTypeError, "argument expected to respond to :read");
+  }
+
   ctxt = xmlCreateIOParserCtxt(NULL, NULL,
                                (xmlInputReadCallback)noko_io_read,
                                (xmlInputCloseCallback)noko_io_close,
@@ -62,9 +68,8 @@ parse_memory(VALUE klass, VALUE data)
 {
   xmlParserCtxtPtr ctxt;
 
-  if (NIL_P(data)) {
-    rb_raise(rb_eArgError, "data cannot be nil");
-  }
+  Check_Type(data, T_STRING);
+
   if (!(int)RSTRING_LEN(data)) {
     rb_raise(rb_eRuntimeError, "data cannot be empty");
   }
@@ -278,4 +283,6 @@ noko_init_xml_sax_parser_context()
   rb_define_method(cNokogiriXmlSaxParserContext, "recovery", get_recovery, 0);
   rb_define_method(cNokogiriXmlSaxParserContext, "line", line, 0);
   rb_define_method(cNokogiriXmlSaxParserContext, "column", column, 0);
+
+  id_read = rb_intern("read");
 }
```
lib/nokogiri/html4/sax/parser.rb (+1 −1, modified)

```diff
@@ -28,7 +28,7 @@ class Parser < Nokogiri::XML::SAX::Parser
         ###
         # Parse html stored in +data+ using +encoding+
         def parse_memory(data, encoding = "UTF-8")
-          raise ArgumentError unless data
+          raise TypeError unless String === data
           return if data.empty?
 
           ctx = ParserContext.memory(data, encoding)
```
test/html4/sax/test_parser_context.rb (+9 −0, modified)

```diff
@@ -40,6 +40,15 @@ def test_from_file
           ctx.parse_with(parser)
           # end
         end
+
+        def test_graceful_handling_of_invalid_types
+          assert_raises(TypeError) { ParserContext.new(0xcafecafe) }
+          assert_raises(TypeError) { ParserContext.memory(0xcafecafe, "UTF-8") }
+          assert_raises(TypeError) { ParserContext.io(0xcafecafe, 1) }
+          assert_raises(TypeError) { ParserContext.io(StringIO.new("asdf"), "should be an index into ENCODINGS") }
+          assert_raises(TypeError) { ParserContext.file(0xcafecafe, "UTF-8") }
+          assert_raises(TypeError) { ParserContext.file("path/to/file", 0xcafecafe) }
+        end
       end
     end
   end
```
test/html4/sax/test_parser.rb (+7 −1, modified)

```diff
@@ -54,7 +54,7 @@ def test_parse_file_with_dir
         end
 
         def test_parse_memory_nil
-          assert_raises(ArgumentError) do
+          assert_raises(TypeError) do
             @parser.parse_memory(nil)
           end
         end
@@ -161,6 +161,12 @@ def test_parsing_dom_error_from_io
         def test_empty_processing_instruction
           @parser.parse_memory("<strong>this will segfault<?strong>")
         end
+
+        it "handles invalid types gracefully" do
+          assert_raises(TypeError) { Nokogiri::HTML::SAX::Parser.new.parse(0xcafecafe) }
+          assert_raises(TypeError) { Nokogiri::HTML::SAX::Parser.new.parse_memory(0xcafecafe) }
+          assert_raises(TypeError) { Nokogiri::HTML::SAX::Parser.new.parse_io(0xcafecafe) }
+        end
       end
     end
   end
```
test/xml/sax/test_parser_context.rb (+7 −0, modified)

```diff
@@ -80,6 +80,13 @@ def test_recovery
         assert(pc.recovery)
       end
 
+      def test_graceful_handling_of_invalid_types
+        assert_raises(TypeError) { ParserContext.new(0xcafecafe) }
+        assert_raises(TypeError) { ParserContext.memory(0xcafecafe) }
+        assert_raises(TypeError) { ParserContext.io(0xcafecafe, 1) }
+        assert_raises(TypeError) { ParserContext.io(StringIO.new("asdf"), "should be an index into ENCODINGS") }
+      end
+
       def test_from_io
         ctx = ParserContext.new(StringIO.new("fo"), "UTF-8")
         assert(ctx)
```
test/xml/sax/test_parser.rb (+7 −1, modified)

```diff
@@ -71,6 +71,12 @@ class Nokogiri::SAX::TestCase
         end
       end
 
+      it "handles invalid types gracefully" do
+        assert_raises(TypeError) { Nokogiri::XML::SAX::Parser.new.parse(0xcafecafe) }
+        assert_raises(TypeError) { Nokogiri::XML::SAX::Parser.new.parse_memory(0xcafecafe) }
+        assert_raises(TypeError) { Nokogiri::XML::SAX::Parser.new.parse_io(0xcafecafe) }
+      end
+
       it :test_namespace_declaration_order_is_saved do
         parser.parse(<<~EOF)
           <root xmlns:foo='http://foo.example.com/' xmlns='http://example.com/'>
@@ -261,7 +267,7 @@ def call_parse_io_with_encoding(encoding)
       end
 
       it :test_render_parse_nil_param do
-        assert_raises(ArgumentError) { parser.parse_memory(nil) }
+        assert_raises(TypeError) { parser.parse_memory(nil) }
       end
 
       it :test_bad_encoding_args do
```
db05ba9a1bd4 fix: {HTML4,XML}::SAX::{Parser,ParserContext} check arg types
11 files changed · +68 −13
CHANGELOG.md (+2 −1, modified)

```diff
@@ -28,6 +28,7 @@ This version of Nokogiri uses [`jar-dependencies`](https://github.com/mkristian/
 * [CRuby] UTF-16-encoded documents longer than ~4000 code points now serialize properly. Previously the serialized document was corrupted when it exceeded the length of libxml2's internal string buffer. [[#752](https://github.com/sparklemotion/nokogiri/issues/752)]
 * [HTML5] The Gumbo parser now correctly handles text at the end of `form` elements.
+* `{HTML4,XML}::SAX::{Parser,ParserContext}` constructor methods now raise `TypeError` instead of segfaulting when an incorrect type is passed. (Thanks to [@agustingianni](https://github.com/agustingianni) from the Github Security Lab for reporting!)
 
 ### Improved
 
@@ -36,7 +37,7 @@ This version of Nokogiri uses [`jar-dependencies`](https://github.com/mkristian/
 * Avoid compile-time conflict with system-installed `gumbo.h` on OpenBSD. [[#2464](https://github.com/sparklemotion/nokogiri/issues/2464)]
 * Remove calls to `vasprintf` in favor of platform-independent `rb_vsprintf`
 * Prefer `ruby_xmalloc` to `malloc` within the C extension. [[#2480](https://github.com/sparklemotion/nokogiri/issues/2480)] (Thanks, [@Garfield96](https://github.com/Garfield96)!)
-* Installation from source on systems missing libiconv will once again generate a helpful error message (broken since v1.11.0). [#2505]
+* Installation from source on systems missing libiconv will once again generate a helpful error message (broken since v1.11.0). [[#2505](https://github.com/sparklemotion/nokogiri/issues/2505)]
 
 ## 1.13.5 / 2022-05-04
```
ext/java/nokogiri/Html4SaxParserContext.java (+11 −0, modified)

```diff
@@ -231,6 +231,13 @@ static EncodingType get(final int ordinal)
                IRubyObject data,
                IRubyObject encoding)
   {
+    if (!(data instanceof RubyString)) {
+      throw context.getRuntime().newTypeError("data must be kind_of String");
+    }
+    if (!(encoding instanceof RubyString)) {
+      throw context.getRuntime().newTypeError("data must be kind_of String");
+    }
+
     Html4SaxParserContext ctx = Html4SaxParserContext.newInstance(context.runtime, (RubyClass) klass);
     ctx.setInputSourceFile(context, data);
     String javaEncoding = findEncodingName(context, encoding);
@@ -247,6 +254,10 @@ static EncodingType get(final int ordinal)
              IRubyObject data,
              IRubyObject encoding)
   {
+    if (!(encoding instanceof RubyFixnum)) {
+      throw context.getRuntime().newTypeError("encoding must be kind_of String");
+    }
+
     Html4SaxParserContext ctx = Html4SaxParserContext.newInstance(context.runtime, (RubyClass) klass);
     ctx.setIOInputSource(context, data, context.nil);
     String javaEncoding = findEncodingName(context, encoding);
```
ext/java/nokogiri/internals/ParserContext.java (+7 −1, modified)

```diff
@@ -60,6 +60,12 @@ public abstract class ParserContext extends RubyObject
     source = new InputSource();
     ParserContext.setUrl(context, source, url);
 
+    Ruby ruby = context.getRuntime();
+
+    if (!(data.respondsTo("read"))) {
+      throw ruby.newTypeError("must respond to :read");
+    }
+
     source.setByteStream(new IOInputStream(data));
     if (java_encoding != null) {
       source.setEncoding(java_encoding);
@@ -75,7 +81,7 @@ public abstract class ParserContext extends RubyObject
     Ruby ruby = context.getRuntime();
 
     if (!(data instanceof RubyString)) {
-      throw ruby.newArgumentError("must be kind_of String");
+      throw ruby.newTypeError("must be kind_of String");
     }
 
     RubyString stringData = (RubyString) data;
```
ext/java/nokogiri/XmlSaxParserContext.java (+5 −2, modified)

```diff
@@ -131,9 +131,12 @@ public class XmlSaxParserContext extends ParserContext
   parse_io(ThreadContext context,
            IRubyObject klazz,
            IRubyObject data,
-           IRubyObject enc)
+           IRubyObject encoding)
   {
-    //int encoding = (int)enc.convertToInteger().getLongValue();
+    // check the type of the unused encoding to match behavior of CRuby
+    if (!(encoding instanceof RubyFixnum)) {
+      throw context.getRuntime().newTypeError("encoding must be kind_of String");
+    }
     final Ruby runtime = context.runtime;
     XmlSaxParserContext ctx = newInstance(runtime, (RubyClass) klazz);
     ctx.initialize(runtime);
```
ext/nokogiri/html4_sax_parser_context.c (+2 −3, modified)

```diff
@@ -19,9 +19,8 @@ parse_memory(VALUE klass, VALUE data, VALUE encoding)
 {
   htmlParserCtxtPtr ctxt;
 
-  if (NIL_P(data)) {
-    rb_raise(rb_eArgError, "data cannot be nil");
-  }
+  Check_Type(data, T_STRING);
+
   if (!(int)RSTRING_LEN(data)) {
     rb_raise(rb_eRuntimeError, "data cannot be empty");
   }
```
ext/nokogiri/xml_sax_parser_context.c (+10 −3, modified)

```diff
@@ -2,6 +2,8 @@
 
 VALUE cNokogiriXmlSaxParserContext ;
 
+static ID id_read;
+
 static void
 deallocate(xmlParserCtxtPtr ctxt)
 {
@@ -26,6 +28,10 @@ parse_io(VALUE klass, VALUE io, VALUE encoding)
   xmlParserCtxtPtr ctxt;
   xmlCharEncoding enc = (xmlCharEncoding)NUM2INT(encoding);
 
+  if (!rb_respond_to(io, id_read)) {
+    rb_raise(rb_eTypeError, "argument expected to respond to :read");
+  }
+
   ctxt = xmlCreateIOParserCtxt(NULL, NULL,
                                (xmlInputReadCallback)noko_io_read,
                                (xmlInputCloseCallback)noko_io_close,
@@ -62,9 +68,8 @@ parse_memory(VALUE klass, VALUE data)
 {
   xmlParserCtxtPtr ctxt;
 
-  if (NIL_P(data)) {
-    rb_raise(rb_eArgError, "data cannot be nil");
-  }
+  Check_Type(data, T_STRING);
+
   if (!(int)RSTRING_LEN(data)) {
     rb_raise(rb_eRuntimeError, "data cannot be empty");
   }
@@ -278,4 +283,6 @@ noko_init_xml_sax_parser_context()
   rb_define_method(cNokogiriXmlSaxParserContext, "recovery", get_recovery, 0);
   rb_define_method(cNokogiriXmlSaxParserContext, "line", line, 0);
   rb_define_method(cNokogiriXmlSaxParserContext, "column", column, 0);
+
+  id_read = rb_intern("read");
 }
```
lib/nokogiri/html4/sax/parser.rb (+1 −1, modified)

```diff
@@ -28,7 +28,7 @@ class Parser < Nokogiri::XML::SAX::Parser
         ###
         # Parse html stored in +data+ using +encoding+
         def parse_memory(data, encoding = "UTF-8")
-          raise ArgumentError unless data
+          raise TypeError unless String === data
           return if data.empty?
 
           ctx = ParserContext.memory(data, encoding)
```
test/html4/sax/test_parser_context.rb (+9 −0, modified)

```diff
@@ -40,6 +40,15 @@ def test_from_file
           ctx.parse_with(parser)
           # end
         end
+
+        def test_graceful_handling_of_invalid_types
+          assert_raises(TypeError) { ParserContext.new(0xcafecafe) }
+          assert_raises(TypeError) { ParserContext.memory(0xcafecafe, "UTF-8") }
+          assert_raises(TypeError) { ParserContext.io(0xcafecafe, 1) }
+          assert_raises(TypeError) { ParserContext.io(StringIO.new("asdf"), "should be an index into ENCODINGS") }
+          assert_raises(TypeError) { ParserContext.file(0xcafecafe, "UTF-8") }
+          assert_raises(TypeError) { ParserContext.file("path/to/file", 0xcafecafe) }
+        end
       end
     end
   end
```
test/html4/sax/test_parser.rb (+7 −1, modified)

```diff
@@ -54,7 +54,7 @@ def test_parse_file_with_dir
         end
 
         def test_parse_memory_nil
-          assert_raises(ArgumentError) do
+          assert_raises(TypeError) do
             @parser.parse_memory(nil)
           end
         end
@@ -161,6 +161,12 @@ def test_parsing_dom_error_from_io
         def test_empty_processing_instruction
           @parser.parse_memory("<strong>this will segfault<?strong>")
         end
+
+        it "handles invalid types gracefully" do
+          assert_raises(TypeError) { Nokogiri::HTML::SAX::Parser.new.parse(0xcafecafe) }
+          assert_raises(TypeError) { Nokogiri::HTML::SAX::Parser.new.parse_memory(0xcafecafe) }
+          assert_raises(TypeError) { Nokogiri::HTML::SAX::Parser.new.parse_io(0xcafecafe) }
+        end
       end
     end
   end
```
test/xml/sax/test_parser_context.rb (+7 −0, modified)

```diff
@@ -80,6 +80,13 @@ def test_recovery
         assert(pc.recovery)
       end
 
+      def test_graceful_handling_of_invalid_types
+        assert_raises(TypeError) { ParserContext.new(0xcafecafe) }
+        assert_raises(TypeError) { ParserContext.memory(0xcafecafe) }
+        assert_raises(TypeError) { ParserContext.io(0xcafecafe, 1) }
+        assert_raises(TypeError) { ParserContext.io(StringIO.new("asdf"), "should be an index into ENCODINGS") }
+      end
+
       def test_from_io
         ctx = ParserContext.new(StringIO.new("fo"), "UTF-8")
         assert(ctx)
```
test/xml/sax/test_parser.rb (+7 −1, modified)

```diff
@@ -73,6 +73,12 @@ class TestCase
         end
       end
 
+      it "handles invalid types gracefully" do
+        assert_raises(TypeError) { Nokogiri::XML::SAX::Parser.new.parse(0xcafecafe) }
+        assert_raises(TypeError) { Nokogiri::XML::SAX::Parser.new.parse_memory(0xcafecafe) }
+        assert_raises(TypeError) { Nokogiri::XML::SAX::Parser.new.parse_io(0xcafecafe) }
+      end
+
       it :test_namespace_declaration_order_is_saved do
         parser.parse(<<~EOF)
           <root xmlns:foo='http://foo.example.com/' xmlns='http://example.com/'>
@@ -263,7 +269,7 @@ def call_parse_io_with_encoding(encoding)
       end
 
       it :test_render_parse_nil_param do
-        assert_raises(ArgumentError) { parser.parse_memory(nil) }
+        assert_raises(TypeError) { parser.parse_memory(nil) }
       end
 
       it :test_bad_encoding_args do
```
Vulnerability mechanics
Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References
- github.com/advisories/GHSA-xh29-r2w5-wx8m (ADVISORY; ghsa)
- nvd.nist.gov/vuln/detail/CVE-2022-29181 (ADVISORY; ghsa)
- securitylab.github.com/advisories/GHSL-2022-031_GHSL-2022-032_Nokogiri (ADVISORY; ghsa, x_refsource_MISC)
- seclists.org/fulldisclosure/2022/Dec/23 (WEB; ghsa)
- github.com/rubysec/ruby-advisory-db/blob/master/gems/nokogiri/CVE-2022-29181.yml (WEB; ghsa)
- github.com/sparklemotion/nokogiri/commit/83cc451c3f29df397caa890afc3b714eae6ab8f7 (WEB; ghsa, x_refsource_MISC)
- github.com/sparklemotion/nokogiri/commit/db05ba9a1bd4b90aa6c76742cf6102a7c7297267 (WEB; ghsa, x_refsource_MISC)
- github.com/sparklemotion/nokogiri/releases/tag/v1.13.6 (WEB; ghsa, x_refsource_MISC)
- github.com/sparklemotion/nokogiri/security/advisories/GHSA-xh29-r2w5-wx8m (WEB; ghsa, x_refsource_CONFIRM)
- security.gentoo.org/glsa/202208-29 (WEB; ghsa)
- support.apple.com/kb/HT213532 (WEB; ghsa)