VYPR
High severityNVD Advisory· Published Jul 5, 2022· Updated Apr 22, 2025

Incorrect handling of invalid surrogate pair characters in ujson

CVE-2022-31116

Description

UltraJSON before 5.4.0 improperly decodes lone and invalid surrogate characters in JSON strings, potentially leading to key confusion and dictionary value overwriting.

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

UltraJSON before 5.4.0 improperly decodes lone and invalid surrogate characters in JSON strings, potentially leading to key confusion and dictionary value overwriting.

Vulnerability

Description

UltraJSON versions prior to 5.4.0 contain a vulnerability in their string decoding logic that mishandles JSON-escaped surrogate characters not forming a proper surrogate pair. The root cause is the decoder's incorrect treatment of lone high surrogates (e.g., \uD800) and invalid sequences, which led to dropped characters or improper pairing with subsequent surrogates [1][4]. This behavior deviates from the JSON specification, which expects such malformed input to be preserved or rejected.

Exploitation

Exploitation requires only that an application parses JSON from an untrusted source using an affected UltraJSON version. An attacker can craft a JSON payload containing lone surrogate escape sequences (e.g., "\uD800" or "\uD800hello") that, when decoded, corrupts the resulting string. This corruption can lead to key confusion or value overwriting in dictionaries by causing different keys to collide or by altering parsed values [2][4]. No authentication or special network position is necessary beyond delivering the malicious JSON to the parser.

Impact

Successful exploitation allows an attacker to manipulate the structure of parsed Python dictionaries, potentially overriding intended dictionary entries or causing logical errors in downstream processing. This could lead to security bypasses, data integrity violations, or unexpected application behavior [2][4]. The advisory notes that both string corruption and dictionary manipulation are possible outcomes.

Mitigation

The vulnerability is fixed in UltraJSON version 5.4.0, which now decodes lone surrogates consistently with Python's standard library json module by preserving them in the output [1][4]. No known workarounds exist; users who cannot upgrade are advised to switch to an alternative JSON library (e.g., orjson) or restrict input sources [3].

AI Insight generated on May 21, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected packages

Versions sourced from the GitHub Security Advisory.

PackageAffected versionsPatched versions
ujsonPyPI
< 5.4.05.4.0

Affected products

9

Patches

1
67ec07183342

Merge pull request #555 from JustAnotherArchivist/fix-decode-surrogates-2

https://github.com/ultrajson/ultrajsonHugo van KemenadeJun 20, 2022via ghsa
4 files changed · +37 58
  • lib/ultrajsondec.c+25 45 modified
    @@ -41,7 +41,6 @@ Numeric decoder derived from from TCL library
     #include <assert.h>
     #include <string.h>
     #include <limits.h>
    -#include <wchar.h>
     #include <stdlib.h>
     #include <errno.h>
     #include <stdint.h>
    @@ -58,8 +57,8 @@ struct DecoderState
     {
       char *start;
       char *end;
    -  wchar_t *escStart;
    -  wchar_t *escEnd;
    +  JSUINT32 *escStart;
    +  JSUINT32 *escEnd;
       int escHeap;
       int lastType;
       JSUINT32 objDepth;
    @@ -361,14 +360,12 @@ static const JSUINT8 g_decoderLookup[256] =
     static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_string ( struct DecoderState *ds)
     {
       int index;
    -  wchar_t *escOffset;
    -  wchar_t *escStart;
    +  JSUINT32 *escOffset;
    +  JSUINT32 *escStart;
       size_t escLen = (ds->escEnd - ds->escStart);
       JSUINT8 *inputOffset;
       JSUTF16 ch = 0;
    -#if WCHAR_MAX >= 0x10FFFF
       JSUINT8 *lastHighSurrogate = NULL;
    -#endif
       JSUINT8 oct;
       JSUTF32 ucs;
       ds->lastType = JT_INVALID;
    @@ -380,11 +377,11 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_string ( struct DecoderState *ds
     
         if (ds->escHeap)
         {
    -      if (newSize > (SIZE_MAX / sizeof(wchar_t)))
    +      if (newSize > (SIZE_MAX / sizeof(JSUINT32)))
           {
             return SetError(ds, -1, "Could not reserve memory block");
           }
    -      escStart = (wchar_t *)ds->dec->realloc(ds->escStart, newSize * sizeof(wchar_t));
    +      escStart = (JSUINT32 *)ds->dec->realloc(ds->escStart, newSize * sizeof(JSUINT32));
           if (!escStart)
           {
             ds->dec->free(ds->escStart);
    @@ -394,18 +391,18 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_string ( struct DecoderState *ds
         }
         else
         {
    -      wchar_t *oldStart = ds->escStart;
    -      if (newSize > (SIZE_MAX / sizeof(wchar_t)))
    +      JSUINT32 *oldStart = ds->escStart;
    +      if (newSize > (SIZE_MAX / sizeof(JSUINT32)))
           {
             return SetError(ds, -1, "Could not reserve memory block");
           }
    -      ds->escStart = (wchar_t *) ds->dec->malloc(newSize * sizeof(wchar_t));
    +      ds->escStart = (JSUINT32 *) ds->dec->malloc(newSize * sizeof(JSUINT32));
           if (!ds->escStart)
           {
             return SetError(ds, -1, "Could not reserve memory block");
           }
           ds->escHeap = 1;
    -      memcpy(ds->escStart, oldStart, escLen * sizeof(wchar_t));
    +      memcpy(ds->escStart, oldStart, escLen * sizeof(JSUINT32));
         }
     
         ds->escEnd = ds->escStart + newSize;
    @@ -438,14 +435,14 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_string ( struct DecoderState *ds
             inputOffset ++;
             switch (*inputOffset)
             {
    -          case '\\': *(escOffset++) = L'\\'; inputOffset++; continue;
    -          case '\"': *(escOffset++) = L'\"'; inputOffset++; continue;
    -          case '/':  *(escOffset++) = L'/';  inputOffset++; continue;
    -          case 'b':  *(escOffset++) = L'\b'; inputOffset++; continue;
    -          case 'f':  *(escOffset++) = L'\f'; inputOffset++; continue;
    -          case 'n':  *(escOffset++) = L'\n'; inputOffset++; continue;
    -          case 'r':  *(escOffset++) = L'\r'; inputOffset++; continue;
    -          case 't':  *(escOffset++) = L'\t'; inputOffset++; continue;
    +          case '\\': *(escOffset++) = '\\'; inputOffset++; continue;
    +          case '\"': *(escOffset++) = '\"'; inputOffset++; continue;
    +          case '/':  *(escOffset++) = '/';  inputOffset++; continue;
    +          case 'b':  *(escOffset++) = '\b'; inputOffset++; continue;
    +          case 'f':  *(escOffset++) = '\f'; inputOffset++; continue;
    +          case 'n':  *(escOffset++) = '\n'; inputOffset++; continue;
    +          case 'r':  *(escOffset++) = '\r'; inputOffset++; continue;
    +          case 't':  *(escOffset++) = '\t'; inputOffset++; continue;
     
               case 'u':
               {
    @@ -494,24 +491,20 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_string ( struct DecoderState *ds
                   inputOffset ++;
                 }
     
    -#if WCHAR_MAX >= 0x10FFFF
                 if ((ch & 0xfc00) == 0xdc00 && lastHighSurrogate == inputOffset - 6 * sizeof(*inputOffset))
                 {
                   // Low surrogate immediately following a high surrogate
                   // Overwrite existing high surrogate with combined character
                   *(escOffset-1) = (((*(escOffset-1) - 0xd800) <<10) | (ch - 0xdc00)) + 0x10000;
                 }
                 else
    -#endif
                 {
    -              *(escOffset++) = (wchar_t) ch;
    +              *(escOffset++) = (JSUINT32) ch;
                 }
    -#if WCHAR_MAX >= 0x10FFFF
                 if ((ch & 0xfc00) == 0xd800)
                 {
                   lastHighSurrogate = inputOffset;
                 }
    -#endif
                 break;
               }
     
    @@ -523,7 +516,7 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_string ( struct DecoderState *ds
     
           case 1:
           {
    -        *(escOffset++) = (wchar_t) (*inputOffset++);
    +        *(escOffset++) = (JSUINT32) (*inputOffset++);
             break;
           }
     
    @@ -537,7 +530,7 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_string ( struct DecoderState *ds
             }
             ucs |= (*inputOffset++) & 0x3f;
             if (ucs < 0x80) return SetError (ds, -1, "Overlong 2 byte UTF-8 sequence detected when decoding 'string'");
    -        *(escOffset++) = (wchar_t) ucs;
    +        *(escOffset++) = (JSUINT32) ucs;
             break;
           }
     
    @@ -560,7 +553,7 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_string ( struct DecoderState *ds
             }
     
             if (ucs < 0x800) return SetError (ds, -1, "Overlong 3 byte UTF-8 sequence detected when encoding string");
    -        *(escOffset++) = (wchar_t) ucs;
    +        *(escOffset++) = (JSUINT32) ucs;
             break;
           }
     
    @@ -584,20 +577,7 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_string ( struct DecoderState *ds
     
             if (ucs < 0x10000) return SetError (ds, -1, "Overlong 4 byte UTF-8 sequence detected when decoding 'string'");
     
    -#if WCHAR_MAX == 0xffff
    -        if (ucs >= 0x10000)
    -        {
    -          ucs -= 0x10000;
    -          *(escOffset++) = (wchar_t) (ucs >> 10) + 0xd800;
    -          *(escOffset++) = (wchar_t) (ucs & 0x3ff) + 0xdc00;
    -        }
    -        else
    -        {
    -          *(escOffset++) = (wchar_t) ucs;
    -        }
    -#else
    -        *(escOffset++) = (wchar_t) ucs;
    -#endif
    +        *(escOffset++) = (JSUINT32) ucs;
             break;
           }
         }
    @@ -810,14 +790,14 @@ JSOBJ JSON_DecodeObject(JSONObjectDecoder *dec, const char *buffer, size_t cbBuf
       /*
       FIXME: Base the size of escBuffer of that of cbBuffer so that the unicode escaping doesn't run into the wall each time */
       struct DecoderState ds;
    -  wchar_t escBuffer[(JSON_MAX_STACK_BUFFER_SIZE / sizeof(wchar_t))];
    +  JSUINT32 escBuffer[(JSON_MAX_STACK_BUFFER_SIZE / sizeof(JSUINT32))];
       JSOBJ ret;
     
       ds.start = (char *) buffer;
       ds.end = ds.start + cbBuffer;
     
       ds.escStart = escBuffer;
    -  ds.escEnd = ds.escStart + (JSON_MAX_STACK_BUFFER_SIZE / sizeof(wchar_t));
    +  ds.escEnd = ds.escStart + (JSON_MAX_STACK_BUFFER_SIZE / sizeof(JSUINT32));
       ds.escHeap = 0;
       ds.prv = dec->prv;
       ds.dec = dec;
    
  • lib/ultrajson.h+1 2 modified
    @@ -54,7 +54,6 @@ tree doesn't have cyclic references.
     #define __ULTRAJSON_H__
     
     #include <stdio.h>
    -#include <wchar.h>
     
     // Max decimals to encode double floating point numbers with
     #ifndef JSON_DOUBLE_MAX_DECIMALS
    @@ -318,7 +317,7 @@ EXPORTFUNCTION char *JSON_EncodeObject(JSOBJ obj, JSONObjectEncoder *enc, char *
     
     typedef struct __JSONObjectDecoder
     {
    -  JSOBJ (*newString)(void *prv, wchar_t *start, wchar_t *end);
    +  JSOBJ (*newString)(void *prv, JSUINT32 *start, JSUINT32 *end);
       void (*objectAddKey)(void *prv, JSOBJ obj, JSOBJ name, JSOBJ value);
       void (*arrayAddItem)(void *prv, JSOBJ obj, JSOBJ value);
       JSOBJ (*newTrue)(void *prv);
    
  • python/JSONtoObj.c+11 2 modified
    @@ -59,9 +59,18 @@ static void Object_arrayAddItem(void *prv, JSOBJ obj, JSOBJ value)
       return;
     }
     
    -static JSOBJ Object_newString(void *prv, wchar_t *start, wchar_t *end)
    +/*
    +Check that Py_UCS4 is the same as JSUINT32, else Object_newString will fail.
    +Based on Linux's check in vbox_vmmdev_types.h.
    +This should be replaced with
    +  _Static_assert(sizeof(Py_UCS4) == sizeof(JSUINT32));
    +when C11 is made mandatory (CPython 3.11+, PyPy ?).
    +*/
    +typedef char assert_py_ucs4_is_jsuint32[1 - 2*!(sizeof(Py_UCS4) == sizeof(JSUINT32))];
    +
    +static JSOBJ Object_newString(void *prv, JSUINT32 *start, JSUINT32 *end)
     {
    -  return PyUnicode_FromWideChar (start, (end - start));
    +  return PyUnicode_FromKindAndData (PyUnicode_4BYTE_KIND, (Py_UCS4 *) start, (end - start));
     }
     
     static JSOBJ Object_newTrue(void *prv)
    
  • tests/test_ujson.py+0 9 modified
    @@ -1,4 +1,3 @@
    -import ctypes
     import datetime as dt
     import decimal
     import io
    @@ -515,10 +514,6 @@ def test_encode_surrogate_characters():
         assert ujson.dumps({"\ud800": "\udfff"}, ensure_ascii=False, sort_keys=True) == out2
     
     
    -@pytest.mark.xfail(
    -    hasattr(sys, "pypy_version_info") and os.name == "nt",
    -    reason="This feature needs fixing! See #552",
    -)
     @pytest.mark.parametrize(
         "test_input, expected",
         [
    @@ -543,10 +538,6 @@ def test_encode_surrogate_characters():
         ],
     )
     def test_decode_surrogate_characters(test_input, expected):
    -    # FIXME Wrong output (combined char) on platforms with 16-bit wchar_t
    -    if test_input == '"\uD83D\uDCA9"' and ctypes.sizeof(ctypes.c_wchar) == 2:
    -        pytest.skip("Raw surrogate pairs are not supported with 16-bit wchar_t")
    -
         assert ujson.loads(test_input) == expected
         assert ujson.loads(test_input.encode("utf-8", "surrogatepass")) == expected
     
    

Vulnerability mechanics

Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.

References

8

News mentions

0

No linked articles in our index yet.