Incorrect handling of invalid surrogate pair characters in ujson
Description
UltraJSON before 5.4.0 improperly decodes lone and invalid surrogate characters in JSON strings, potentially leading to key confusion and dictionary value overwriting.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
UltraJSON before 5.4.0 improperly decodes lone and invalid surrogate characters in JSON strings, potentially leading to key confusion and dictionary value overwriting.
Vulnerability
Description
UltraJSON versions prior to 5.4.0 contain a vulnerability in their string decoding logic that mishandles JSON-escaped surrogate characters not forming a proper surrogate pair. The root cause is the decoder's incorrect treatment of lone high surrogates (e.g., \uD800) and invalid sequences, which led to dropped characters or improper pairing with subsequent surrogates [1][4]. This behavior deviates from the JSON specification, which expects such malformed input to be preserved or rejected.
Exploitation
Exploitation requires only that an application parses JSON from an untrusted source using an affected UltraJSON version. An attacker can craft a JSON payload containing lone surrogate escape sequences (e.g., "\uD800" or "\uD800hello") that, when decoded, corrupts the resulting string. This corruption can lead to key confusion or value overwriting in dictionaries by causing different keys to collide or by altering parsed values [2][4]. No authentication or special network position is necessary beyond delivering the malicious JSON to the parser.
Impact
Successful exploitation allows an attacker to manipulate the structure of parsed Python dictionaries, potentially overriding intended dictionary entries or causing logical errors in downstream processing. This could lead to security bypasses, data integrity violations, or unexpected application behavior [2][4]. The advisory notes that both string corruption and dictionary manipulation are possible outcomes.
Mitigation
The vulnerability is fixed in UltraJSON version 5.4.0, which now decodes lone surrogates consistently with Python's standard library json module by preserving them in the output [1][4]. No known workarounds exist; users who cannot upgrade are advised to switch to an alternative JSON library (e.g., orjson) or restrict input sources [3].
AI Insight generated on May 21, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected packages
Versions sourced from the GitHub Security Advisory.
| Package | Affected versions | Patched versions |
|---|---|---|
ujsonPyPI | < 5.4.0 | 5.4.0 |
Affected products
9- ghsa-coords8 versionspkg:pypi/ujsonpkg:rpm/opensuse/python-ujson&distro=openSUSE%20Leap%2015.3pkg:rpm/opensuse/python-ujson&distro=openSUSE%20Leap%2015.4pkg:rpm/opensuse/python-ujson&distro=openSUSE%20Tumbleweedpkg:rpm/suse/python-ujson&distro=SUSE%20Linux%20Enterprise%20Module%20for%20Development%20Tools%2015%20SP3pkg:rpm/suse/python-ujson&distro=SUSE%20Linux%20Enterprise%20Module%20for%20Development%20Tools%2015%20SP4pkg:rpm/suse/python-ujson&distro=SUSE%20Linux%20Enterprise%20Module%20for%20Package%20Hub%2015%20SP3pkg:rpm/suse/python-ujson&distro=SUSE%20Linux%20Enterprise%20Module%20for%20Package%20Hub%2015%20SP4
< 5.4.0+ 7 more
- (no CPE)range: < 5.4.0
- (no CPE)range: < 1.35-150100.3.5.1
- (no CPE)range: < 1.35-150100.3.5.1
- (no CPE)range: < 5.10.0-1.5
- (no CPE)range: < 1.35-150100.3.5.1
- (no CPE)range: < 1.35-150100.3.5.1
- (no CPE)range: < 1.35-150100.3.5.1
- (no CPE)range: < 1.35-150100.3.5.1
Patches
167ec07183342Merge pull request #555 from JustAnotherArchivist/fix-decode-surrogates-2
4 files changed · +37 −58
lib/ultrajsondec.c+25 −45 modified@@ -41,7 +41,6 @@ Numeric decoder derived from from TCL library #include <assert.h> #include <string.h> #include <limits.h> -#include <wchar.h> #include <stdlib.h> #include <errno.h> #include <stdint.h> @@ -58,8 +57,8 @@ struct DecoderState { char *start; char *end; - wchar_t *escStart; - wchar_t *escEnd; + JSUINT32 *escStart; + JSUINT32 *escEnd; int escHeap; int lastType; JSUINT32 objDepth; @@ -361,14 +360,12 @@ static const JSUINT8 g_decoderLookup[256] = static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_string ( struct DecoderState *ds) { int index; - wchar_t *escOffset; - wchar_t *escStart; + JSUINT32 *escOffset; + JSUINT32 *escStart; size_t escLen = (ds->escEnd - ds->escStart); JSUINT8 *inputOffset; JSUTF16 ch = 0; -#if WCHAR_MAX >= 0x10FFFF JSUINT8 *lastHighSurrogate = NULL; -#endif JSUINT8 oct; JSUTF32 ucs; ds->lastType = JT_INVALID; @@ -380,11 +377,11 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_string ( struct DecoderState *ds if (ds->escHeap) { - if (newSize > (SIZE_MAX / sizeof(wchar_t))) + if (newSize > (SIZE_MAX / sizeof(JSUINT32))) { return SetError(ds, -1, "Could not reserve memory block"); } - escStart = (wchar_t *)ds->dec->realloc(ds->escStart, newSize * sizeof(wchar_t)); + escStart = (JSUINT32 *)ds->dec->realloc(ds->escStart, newSize * sizeof(JSUINT32)); if (!escStart) { ds->dec->free(ds->escStart); @@ -394,18 +391,18 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_string ( struct DecoderState *ds } else { - wchar_t *oldStart = ds->escStart; - if (newSize > (SIZE_MAX / sizeof(wchar_t))) + JSUINT32 *oldStart = ds->escStart; + if (newSize > (SIZE_MAX / sizeof(JSUINT32))) { return SetError(ds, -1, "Could not reserve memory block"); } - ds->escStart = (wchar_t *) ds->dec->malloc(newSize * sizeof(wchar_t)); + ds->escStart = (JSUINT32 *) ds->dec->malloc(newSize * sizeof(JSUINT32)); if (!ds->escStart) { return SetError(ds, -1, "Could not reserve memory block"); } ds->escHeap = 1; - memcpy(ds->escStart, oldStart, escLen * sizeof(wchar_t)); + memcpy(ds->escStart, oldStart, escLen * sizeof(JSUINT32)); } ds->escEnd = ds->escStart + newSize; @@ -438,14 +435,14 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_string ( struct DecoderState *ds inputOffset ++; switch (*inputOffset) { - case '\\': *(escOffset++) = L'\\'; inputOffset++; continue; - case '\"': *(escOffset++) = L'\"'; inputOffset++; continue; - case '/': *(escOffset++) = L'/'; inputOffset++; continue; - case 'b': *(escOffset++) = L'\b'; inputOffset++; continue; - case 'f': *(escOffset++) = L'\f'; inputOffset++; continue; - case 'n': *(escOffset++) = L'\n'; inputOffset++; continue; - case 'r': *(escOffset++) = L'\r'; inputOffset++; continue; - case 't': *(escOffset++) = L'\t'; inputOffset++; continue; + case '\\': *(escOffset++) = '\\'; inputOffset++; continue; + case '\"': *(escOffset++) = '\"'; inputOffset++; continue; + case '/': *(escOffset++) = '/'; inputOffset++; continue; + case 'b': *(escOffset++) = '\b'; inputOffset++; continue; + case 'f': *(escOffset++) = '\f'; inputOffset++; continue; + case 'n': *(escOffset++) = '\n'; inputOffset++; continue; + case 'r': *(escOffset++) = '\r'; inputOffset++; continue; + case 't': *(escOffset++) = '\t'; inputOffset++; continue; case 'u': { @@ -494,24 +491,20 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_string ( struct DecoderState *ds inputOffset ++; } -#if WCHAR_MAX >= 0x10FFFF if ((ch & 0xfc00) == 0xdc00 && lastHighSurrogate == inputOffset - 6 * sizeof(*inputOffset)) { // Low surrogate immediately following a high surrogate // Overwrite existing high surrogate with combined character *(escOffset-1) = (((*(escOffset-1) - 0xd800) <<10) | (ch - 0xdc00)) + 0x10000; } else -#endif { - *(escOffset++) = (wchar_t) ch; + *(escOffset++) = (JSUINT32) ch; } -#if WCHAR_MAX >= 0x10FFFF if ((ch & 0xfc00) == 0xd800) { lastHighSurrogate = inputOffset; } -#endif break; } @@ -523,7 +516,7 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_string ( struct DecoderState *ds case 1: { - *(escOffset++) = (wchar_t) (*inputOffset++); + *(escOffset++) = (JSUINT32) (*inputOffset++); break; } @@ -537,7 +530,7 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_string ( struct DecoderState *ds } ucs |= (*inputOffset++) & 0x3f; if (ucs < 0x80) return SetError (ds, -1, "Overlong 2 byte UTF-8 sequence detected when decoding 'string'"); - *(escOffset++) = (wchar_t) ucs; + *(escOffset++) = (JSUINT32) ucs; break; } @@ -560,7 +553,7 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_string ( struct DecoderState *ds } if (ucs < 0x800) return SetError (ds, -1, "Overlong 3 byte UTF-8 sequence detected when encoding string"); - *(escOffset++) = (wchar_t) ucs; + *(escOffset++) = (JSUINT32) ucs; break; } @@ -584,20 +577,7 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_string ( struct DecoderState *ds if (ucs < 0x10000) return SetError (ds, -1, "Overlong 4 byte UTF-8 sequence detected when decoding 'string'"); -#if WCHAR_MAX == 0xffff - if (ucs >= 0x10000) - { - ucs -= 0x10000; - *(escOffset++) = (wchar_t) (ucs >> 10) + 0xd800; - *(escOffset++) = (wchar_t) (ucs & 0x3ff) + 0xdc00; - } - else - { - *(escOffset++) = (wchar_t) ucs; - } -#else - *(escOffset++) = (wchar_t) ucs; -#endif + *(escOffset++) = (JSUINT32) ucs; break; } } @@ -810,14 +790,14 @@ JSOBJ JSON_DecodeObject(JSONObjectDecoder *dec, const char *buffer, size_t cbBuf /* FIXME: Base the size of escBuffer of that of cbBuffer so that the unicode escaping doesn't run into the wall each time */ struct DecoderState ds; - wchar_t escBuffer[(JSON_MAX_STACK_BUFFER_SIZE / sizeof(wchar_t))]; + JSUINT32 escBuffer[(JSON_MAX_STACK_BUFFER_SIZE / sizeof(JSUINT32))]; JSOBJ ret; ds.start = (char *) buffer; ds.end = ds.start + cbBuffer; ds.escStart = escBuffer; - ds.escEnd = ds.escStart + (JSON_MAX_STACK_BUFFER_SIZE / sizeof(wchar_t)); + ds.escEnd = ds.escStart + (JSON_MAX_STACK_BUFFER_SIZE / sizeof(JSUINT32)); ds.escHeap = 0; ds.prv = dec->prv; ds.dec = dec;
lib/ultrajson.h+1 −2 modified@@ -54,7 +54,6 @@ tree doesn't have cyclic references. #define __ULTRAJSON_H__ #include <stdio.h> -#include <wchar.h> // Max decimals to encode double floating point numbers with #ifndef JSON_DOUBLE_MAX_DECIMALS @@ -318,7 +317,7 @@ EXPORTFUNCTION char *JSON_EncodeObject(JSOBJ obj, JSONObjectEncoder *enc, char * typedef struct __JSONObjectDecoder { - JSOBJ (*newString)(void *prv, wchar_t *start, wchar_t *end); + JSOBJ (*newString)(void *prv, JSUINT32 *start, JSUINT32 *end); void (*objectAddKey)(void *prv, JSOBJ obj, JSOBJ name, JSOBJ value); void (*arrayAddItem)(void *prv, JSOBJ obj, JSOBJ value); JSOBJ (*newTrue)(void *prv);
python/JSONtoObj.c+11 −2 modified@@ -59,9 +59,18 @@ static void Object_arrayAddItem(void *prv, JSOBJ obj, JSOBJ value) return; } -static JSOBJ Object_newString(void *prv, wchar_t *start, wchar_t *end) +/* +Check that Py_UCS4 is the same as JSUINT32, else Object_newString will fail. +Based on Linux's check in vbox_vmmdev_types.h. +This should be replaced with + _Static_assert(sizeof(Py_UCS4) == sizeof(JSUINT32)); +when C11 is made mandatory (CPython 3.11+, PyPy ?). +*/ +typedef char assert_py_ucs4_is_jsuint32[1 - 2*!(sizeof(Py_UCS4) == sizeof(JSUINT32))]; + +static JSOBJ Object_newString(void *prv, JSUINT32 *start, JSUINT32 *end) { - return PyUnicode_FromWideChar (start, (end - start)); + return PyUnicode_FromKindAndData (PyUnicode_4BYTE_KIND, (Py_UCS4 *) start, (end - start)); } static JSOBJ Object_newTrue(void *prv)
tests/test_ujson.py+0 −9 modified@@ -1,4 +1,3 @@ -import ctypes import datetime as dt import decimal import io @@ -515,10 +514,6 @@ def test_encode_surrogate_characters(): assert ujson.dumps({"\ud800": "\udfff"}, ensure_ascii=False, sort_keys=True) == out2 -@pytest.mark.xfail( - hasattr(sys, "pypy_version_info") and os.name == "nt", - reason="This feature needs fixing! See #552", -) @pytest.mark.parametrize( "test_input, expected", [ @@ -543,10 +538,6 @@ def test_encode_surrogate_characters(): ], ) def test_decode_surrogate_characters(test_input, expected): - # FIXME Wrong output (combined char) on platforms with 16-bit wchar_t - if test_input == '"\uD83D\uDCA9"' and ctypes.sizeof(ctypes.c_wchar) == 2: - pytest.skip("Raw surrogate pairs are not supported with 16-bit wchar_t") - assert ujson.loads(test_input) == expected assert ujson.loads(test_input.encode("utf-8", "surrogatepass")) == expected
Vulnerability mechanics
Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References
8- github.com/advisories/GHSA-wpqr-jcpx-745rghsaADVISORY
- lists.fedoraproject.org/archives/list/package-announce%40lists.fedoraproject.org/message/NAU5N4A7EUK2AMUCOLYDD5ARXAJYZBD2/mitrevendor-advisoryx_refsource_FEDORA
- lists.fedoraproject.org/archives/list/package-announce%40lists.fedoraproject.org/message/OPPU5FZP3LCTXYORFH7NHUMYA5X66IA7/mitrevendor-advisoryx_refsource_FEDORA
- nvd.nist.gov/vuln/detail/CVE-2022-31116ghsaADVISORY
- github.com/ultrajson/ultrajson/commit/67ec07183342589d602e0fcf7bb1ff3e19272687ghsax_refsource_MISCWEB
- github.com/ultrajson/ultrajson/security/advisories/GHSA-wpqr-jcpx-745rghsax_refsource_CONFIRMWEB
- lists.fedoraproject.org/archives/list/package-announce@lists.fedoraproject.org/message/NAU5N4A7EUK2AMUCOLYDD5ARXAJYZBD2ghsaWEB
- lists.fedoraproject.org/archives/list/package-announce@lists.fedoraproject.org/message/OPPU5FZP3LCTXYORFH7NHUMYA5X66IA7ghsaWEB
News mentions
0No linked articles in our index yet.