VYPR
Medium severity (6.5) · NVD Advisory · Published Jun 19, 2026 · Updated Jun 19, 2026

UltraJSON: Malformed/Truncated UTF-8 Accepted and Silently Rewritten in ujson.dumps()

CVE-2026-54911

Description

Summary

ujson.dumps() (as well as ujson.dump() and ujson.encode()) accepts a reject_bytes=False option. When it is set, the encoder may accept malformed or truncated UTF-8 byte sequences and silently rewrite them into different Unicode characters instead of rejecting them. This leads to input validation bypass and data integrity issues.

Details

The expected behavior is that, for any bytes object x, either ujson.dumps(x, reject_bytes=False) raises an exception or x == ujson.loads(ujson.dumps(x, reject_bytes=False)).encode(errors="surrogatepass") holds. In practice, some inputs that should have been rejected are silently rewritten into other strings:

  • Invalid continuation bytes are replaced with valid ones: b'\xcf\x13' -> b'\xcf\x93'
  • An unterminated sequence is completed with a continuation byte that was never in the input: b'\xc3' -> b'\xc3\x80'
  • ... or causes a read past the end of the string: b'\xf0\x90\x94' -> b"\xf0\x90\x94\x80inxcontrib'"
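
The round-trip property above can be written as a small check. The following is a minimal sketch, not part of UltraJSON: roundtrip_holds, strict_dumps, and lossy_dumps are hypothetical names, and the two stand-in encoders are built on the standard library json module so the example runs without ujson installed.

```python
import json


def roundtrip_holds(x: bytes, dumps, loads) -> bool:
    """Return True if `dumps` either rejects x or round-trips it exactly."""
    try:
        encoded = dumps(x)
    except Exception:
        return True  # rejecting malformed input is acceptable behavior
    return loads(encoded).encode("utf-8", errors="surrogatepass") == x


def strict_dumps(x: bytes) -> str:
    # Hypothetical stand-in for a correct encoder: the decode step raises
    # UnicodeDecodeError on malformed UTF-8 but lets encoded surrogates through.
    return json.dumps(x.decode("utf-8", errors="surrogatepass"))


def lossy_dumps(x: bytes) -> str:
    # Hypothetical stand-in for an encoder that silently rewrites bad input.
    return json.dumps(x.decode("utf-8", errors="replace"))


samples = [b"\xcf\x13", b"\xc3", b"\xf0\x90\x94", b"\xcf\x93"]
print([roundtrip_holds(s, strict_dumps, json.loads) for s in samples])
print([roundtrip_holds(s, lossy_dumps, json.loads) for s in samples])
```

To exercise UltraJSON itself, the same check could be called with dumps=lambda b: ujson.dumps(b, reject_bytes=False) and loads=ujson.loads, assuming ujson is installed.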

Impact

An application relying on reject_bytes=False for UTF-8 handling may experience:

  • Data integrity issues
  • Validation bypass, if said validation occurs before serialisation

Remediation

The missing/broken UTF-8 validation checks were added/fixed in https://github.com/ultrajson/ultrajson/commit/169eaf36b1116fece5034ee79a7a0ef3f6deedcf. We recommend upgrading to UltraJSON 5.13.0.

Workarounds

Decoding bytes to strings in Python before passing them to ujson.dumps() avoids this issue.
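
A minimal sketch of this workaround follows. The helper names bytes_to_text and dumps_bytes are hypothetical, and the dumps parameter defaults to the standard library encoder only so that the example runs without ujson; passing ujson.dumps instead applies the same idea to UltraJSON.

```python
import json


def bytes_to_text(raw: bytes) -> str:
    """Strictly decode UTF-8; raises UnicodeDecodeError on malformed input."""
    return raw.decode("utf-8")


def dumps_bytes(raw: bytes, dumps=json.dumps) -> str:
    # Decoding happens in Python, so the JSON encoder only ever sees a str
    # and its own bytes handling is never reached.
    return dumps(bytes_to_text(raw))


print(dumps_bytes(b"\xcf\x93"))

try:
    dumps_bytes(b"\xc3")  # truncated two-byte sequence from the advisory
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

Strict decoding also rejects UTF-8-encoded surrogates. If those must survive the round trip, decoding with errors="surrogatepass" is one option, at the cost of producing strings that contain lone surrogates.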

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

Affected products

1

Patches

Vulnerability mechanics

Root cause

"Missing UTF-8 validation in the encoder allows malformed byte sequences to be silently rewritten into different Unicode characters."

Attack vector

An attacker provides a bytes object containing malformed or truncated UTF-8 byte sequences to `ujson.dumps()` with `reject_bytes=False`. Instead of raising an error, the encoder silently rewrites the invalid bytes into different Unicode characters: for example, `b'\xcf\x13'` becomes `b'\xcf\x93'` and `b'\xc3'` becomes `b'\xc3\x80'`. This bypasses any input validation that occurs before serialization and compromises data integrity [ref_id=1].
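
The bypass can be illustrated without running a vulnerable build. In the sketch below, FORBIDDEN and prefilter_allows are hypothetical (a made-up denylist check standing in for whatever validation an application performs), and the rewrite is hard-coded from the example in this advisory rather than produced by ujson.

```python
# Hypothetical character that a pre-serialization filter is meant to block.
FORBIDDEN = "\u03d3"


def prefilter_allows(raw: bytes) -> bool:
    """Hypothetical validation step that inspects the raw bytes."""
    return FORBIDDEN.encode("utf-8") not in raw


# Rewrite reported in the advisory, hard-coded here for illustration.
ADVISORY_REWRITES = {b"\xcf\x13": b"\xcf\x93"}

payload = b"\xcf\x13"
passed_filter = prefilter_allows(payload)   # the malformed payload is not on the denylist
after_encoding = ADVISORY_REWRITES[payload]  # what the affected encoder is reported to emit
smuggled = after_encoding.decode("utf-8") == FORBIDDEN

print(passed_filter, smuggled)
```

The filter sees bytes that do not match the forbidden character, while the serialized output does contain it, which is why validation performed before serialization cannot be relied on here.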

Affected code

The vulnerability resides in `src/ujson/lib/ultrajsonenc.c` in the `Buffer_EscapeStringValidated` function, which handles UTF-8 validation when `reject_bytes=False` is passed to `ujson.dumps()`, `ujson.dump()`, or `ujson.encode()`. The encoder lacked checks for invalid continuation bytes, unterminated sequences, overlong sequences, and codepoints above U+10FFFF. The decoder in `src/ujson/lib/ultrajsondec.c` also had minor error message inconsistencies.

What the fix does

The patch adds missing UTF-8 validation checks in `Buffer_EscapeStringValidated` [patch_id=6627426]. For 2-byte sequences, it now verifies the continuation byte matches `0b10xx_xxxx` and checks the remaining length is at least 2 bytes. For 3-byte and 4-byte sequences, similar continuation-byte and length checks are added, plus a new check that 4-byte sequences do not encode codepoints above U+10FFFF. These changes ensure that malformed byte sequences are rejected with an error instead of being silently rewritten.
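
The checks described above can be re-expressed in Python to make the logic concrete. This is an illustrative sketch, not a translation of the C patch: validate_utf8 and MIN_CODEPOINT are hypothetical names, and the overlong check is included because the advisory lists overlong sequences among the missing checks. Surrogate codepoints are deliberately not rejected, since the expected round trip uses errors="surrogatepass".

```python
# Smallest codepoint that legitimately needs a sequence of each length.
MIN_CODEPOINT = {2: 0x80, 3: 0x800, 4: 0x10000}


def validate_utf8(data: bytes) -> None:
    """Raise ValueError on malformed or truncated UTF-8 sequences."""
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:  # single-byte (ASCII) character
            i += 1
            continue
        if 0xC0 <= lead <= 0xDF:
            size, cp = 2, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:
            size, cp = 3, lead & 0x0F
        elif 0xF0 <= lead <= 0xF7:
            size, cp = 4, lead & 0x07
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        # Length check: the whole sequence must fit in the remaining input.
        if n - i < size:
            raise ValueError(f"truncated sequence at offset {i}")
        # Continuation check: each following byte must match 0b10xx_xxxx.
        for b in data[i + 1 : i + size]:
            if (b & 0xC0) != 0x80:
                raise ValueError(f"invalid continuation byte at offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        if cp > 0x10FFFF:
            raise ValueError(f"codepoint above U+10FFFF at offset {i}")
        if cp < MIN_CODEPOINT[size]:
            raise ValueError(f"overlong sequence at offset {i}")
        i += size


for sample in (b"\xcf\x93", b"\xcf\x13", b"\xc3", b"\xf0\x90\x94"):
    try:
        validate_utf8(sample)
        print(sample, "accepted")
    except ValueError as exc:
        print(sample, "rejected:", exc)
```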

Preconditions

  • config: The application must call ujson.dumps(), ujson.dump(), or ujson.encode() with reject_bytes=False
  • input: The attacker must be able to supply a bytes object containing malformed UTF-8 sequences to the serialization call

Generated on Jun 19, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.

References

4
