Apache Fory, Apache Fory: Python RCE via unguarded pickle fallback serializer in pyfory
Description
Deserialization of untrusted data in python in pyfory versions 0.12.0 through 0.12.2, or the legacy pyfury versions from 0.1.0 through 0.10.3: allows arbitrary code execution. An application is vulnerable if it reads pyfory serialized data from untrusted sources. An attacker can craft a data stream that selects pickle-fallback serializer during deserialization, leading to the execution of pickle.loads, which is vulnerable to remote code execution.
Users are recommended to upgrade to pyfory version 0.12.3 or later, which has removed pickle fallback serializer and thus fixes this issue.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
Deserialization of untrusted data in pyfory (0.12.0-0.12.2) or legacy pyfury (0.1.0-0.10.3) allows arbitrary code execution via pickle fallback.
CVE-2025-61622 is a deserialization vulnerability in the Python serialization library pyfory (and legacy pyfury). The vulnerability stems from a pickle fallback serializer that can be triggered when deserializing untrusted data. An attacker can craft a data stream that selects pickle during deserialization, leading to execution of pickle.loads, which is inherently unsafe [3].
Exploitation
To exploit, the attacker must cause an application to deserialize untrusted pyfory data. No special privileges are needed; the attack vector is network-based if the application reads serialized data from untrusted sources. The pickle fallback was intended for compatibility but introduced a code execution risk [1][3].
Impact
Successful exploitation results in arbitrary code execution in the context of the application, potentially leading to full system compromise, data exfiltration, or further lateral movement.
Mitigation
The issue is fixed in pyfory version 0.12.3, which removes the pickle fallback serializer. Users are strongly advised to upgrade. Additionally, applications should avoid deserializing data from untrusted sources [2][4].
- GitHub - apache/fory: A blazingly fast multi-language serialization framework for idiomatic domain objects, schema IDL, and cross-language data exchange.
- feat(python): drop-in replacement for pickle serialization (#2629) · apache/fory@379b948
- NVD - CVE-2025-61622
- feat(python): drop-in replacement for pickle serialization by chaokunyang · Pull Request #2629 · apache/fory
AI Insight generated on May 19, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected packages
Versions sourced from the GitHub Security Advisory.
| Package | Affected versions | Patched versions |
|---|---|---|
pyforyPyPI | >= 0.12.0, < 0.12.3 | 0.12.3 |
pyfuryPyPI | >= 0.1.0, <= 0.10.3 | — |
Affected products
1- Apache Software Foundation/Apache Foryv5Range: 0.1.0
Patches
1379b948ecae5feat(python): drop-in replacement for pickle serialization (#2629)
8 files changed · +239 −285
python/pyfory/_fory.py+4 −52 modified@@ -18,7 +18,6 @@ import enum import logging import os -import warnings from abc import ABC, abstractmethod from typing import Union, Iterable, TypeVar @@ -37,9 +36,6 @@ except ImportError: np = None -from cloudpickle import Pickler - -from pickle import Unpickler logger = logging.getLogger(__name__) @@ -105,8 +101,6 @@ class Fory: "serialization_context", "require_type_registration", "buffer", - "pickler", - "unpickler", "_buffer_callback", "_buffers", "metastring_resolver", @@ -160,17 +154,6 @@ def __init__( self.type_resolver.initialize() self.buffer = Buffer.allocate(32) - if not require_type_registration: - warnings.warn( - "Type registration is disabled, unknown types can be deserialized which may be insecure.", - RuntimeWarning, - stacklevel=2, - ) - self.pickler = Pickler(self.buffer) - self.unpickler = None - else: - self.pickler = _PicklerStub() - self.unpickler = _UnpicklerStub() self._buffer_callback = None self._buffers = None self._unsupported_callback = None @@ -237,9 +220,7 @@ def _serialize( ) -> Union[Buffer, bytes]: self._buffer_callback = buffer_callback self._unsupported_callback = unsupported_callback - if buffer is not None: - self.pickler = Pickler(buffer) - else: + if buffer is None: self.buffer.writer_index = 0 buffer = self.buffer if self.language == Language.XLANG: @@ -493,21 +474,11 @@ def read_buffer_object(self, buffer) -> Buffer: def handle_unsupported_write(self, buffer, obj): if self._unsupported_callback is None or self._unsupported_callback(obj): - buffer.write_bool(True) - self.pickler.dump(obj) - else: - buffer.write_bool(False) + raise NotImplementedError(f"{type(obj)} is not supported for write") def handle_unsupported_read(self, buffer): - in_band = buffer.read_bool() - if in_band: - unpickler = self.unpickler - if unpickler is None: - self.unpickler = unpickler = Unpickler(buffer) - return unpickler.load() - else: - assert self._unsupported_objects is not None - return next(self._unsupported_objects) + assert self._unsupported_objects is not None + return next(self._unsupported_objects) def write_ref_pyobject(self, buffer, value, typeinfo=None): if self.ref_resolver.write_ref_or_null(buffer, value): @@ -525,7 +496,6 @@ def reset_write(self): self.type_resolver.reset_write() self.serialization_context.reset_write() self.metastring_resolver.reset_write() - self.pickler.clear_memo() self._buffer_callback = None self._unsupported_callback = None @@ -535,7 +505,6 @@ def reset_read(self): self.type_resolver.reset_read() self.serialization_context.reset_read() self.metastring_resolver.reset_write() - self.unpickler = None self._buffers = None self._unsupported_objects = None @@ -562,20 +531,3 @@ def throw_depth_limit_exceeded_exception(self): "1", "true", } - - -class _PicklerStub: - def dump(self, o): - raise ValueError( - f"Type {type(o)} is not registered, " - f"pickle is not allowed when type registration enabled, " - f"Please register the type or pass unsupported_callback" - ) - - def clear_memo(self): - pass - - -class _UnpicklerStub: - def load(self): - raise ValueError("pickle is not allowed when type registration enabled, Please register the type or pass unsupported_callback")
python/pyfory/_registry.py+51 −48 modified@@ -25,7 +25,7 @@ from typing import TypeVar, Union from enum import Enum -from pyfory._serialization import ENABLE_FORY_CYTHON_SERIALIZATION +from pyfory import ENABLE_FORY_CYTHON_SERIALIZATION from pyfory import Language from pyfory.error import TypeUnregisteredError @@ -35,9 +35,6 @@ NDArraySerializer, PyArraySerializer, DynamicPyArraySerializer, - _PickleStub, - PickleStrongCacheStub, - PickleCacheStub, NoneSerializer, BooleanSerializer, ByteSerializer, @@ -56,15 +53,16 @@ SetSerializer, EnumSerializer, SliceSerializer, - PickleCacheSerializer, - PickleStrongCacheSerializer, - PickleSerializer, DataClassSerializer, DataClassStubSerializer, StatefulSerializer, ReduceSerializer, FunctionSerializer, ObjectSerializer, + TypeSerializer, + MethodSerializer, + UnsupportedSerializer, + NativeFuncMethodSerializer, ) from pyfory.meta.metastring import MetaStringEncoder, MetaStringDecoder from pyfory.meta.meta_compressor import DeflaterMetaCompressor @@ -78,6 +76,7 @@ Float64Type, load_class, is_struct_type, + record_class_factory, ) from pyfory._fory import ( DYNAMIC_TYPE_ID, @@ -171,6 +170,7 @@ class TypeResolver: "_meta_shared_typeinfo", "meta_share", "serialization_context", + "_internal_py_serializer_map", ) def __init__(self, fory, meta_share=False): @@ -199,35 +199,42 @@ def __init__(self, fory, meta_share=False): self.typename_decoder = MetaStringDecoder("$", "_") self.meta_compressor = DeflaterMetaCompressor() self.meta_share = meta_share + self._internal_py_serializer_map = {} def initialize(self): - self._initialize_xlang() + self._initialize_common() if self.fory.language == Language.PYTHON: self._initialize_py() + else: + self._initialize_xlang() self.serialization_context = self.fory.serialization_context def _initialize_py(self): register = functools.partial(self._register_type, internal=True) - register( - _PickleStub, - type_id=PickleSerializer.PICKLE_TYPE_ID, - serializer=PickleSerializer, - ) - register( - PickleStrongCacheStub, - type_id=97, - serializer=PickleStrongCacheSerializer(self.fory), - ) - register( - PickleCacheStub, - type_id=98, - serializer=PickleCacheSerializer(self.fory), - ) register(type(None), serializer=NoneSerializer) register(tuple, serializer=TupleSerializer) register(slice, serializer=SliceSerializer) + register(np.ndarray, serializer=NDArraySerializer) + register(array.array, serializer=DynamicPyArraySerializer) + self._internal_py_serializer_map = { + ReduceSerializer: (self._stub_cls("__Reduce__"), self._next_type_id()), + TypeSerializer: (self._stub_cls("__Type__"), self._next_type_id()), + MethodSerializer: (self._stub_cls("__Method__"), self._next_type_id()), + NativeFuncMethodSerializer: (self._stub_cls("__NativeFunction__"), self._next_type_id()), + } + for serializer, (stub_cls, type_id) in self._internal_py_serializer_map.items(): + register(stub_cls, serializer=serializer, type_id=type_id) + + @staticmethod + def _stub_cls(name: str): + return record_class_factory(name, []) def _initialize_xlang(self): + register = functools.partial(self._register_type, internal=True) + register(array.array, type_id=DYNAMIC_TYPE_ID, serializer=DynamicPyArraySerializer) + register(np.ndarray, type_id=DYNAMIC_TYPE_ID, serializer=NDArraySerializer) + + def _initialize_common(self): register = functools.partial(self._register_type, internal=True) register(None, type_id=TypeId.NA, serializer=NoneSerializer) register(bool, type_id=TypeId.BOOL, serializer=BooleanSerializer) @@ -258,7 +265,6 @@ def _initialize_xlang(self): type_id=typeid, serializer=PyArraySerializer(self.fory, ftype, typeid), ) - register(array.array, type_id=DYNAMIC_TYPE_ID, serializer=DynamicPyArraySerializer) if np: # overwrite pyarray with same type id. # if pyarray are needed, one must annotate that value with XXXArrayType @@ -274,7 +280,6 @@ def _initialize_xlang(self): type_id=typeid, serializer=Numpy1DArraySerializer(self.fory, ftype, dtype), ) - register(np.ndarray, type_id=DYNAMIC_TYPE_ID, serializer=NDArraySerializer) register(list, type_id=TypeId.LIST, serializer=ListSerializer) register(set, type_id=TypeId.SET, serializer=SetSerializer) register(dict, type_id=TypeId.MAP, serializer=MapSerializer) @@ -447,7 +452,7 @@ def __register_type( self._named_type_to_typeinfo[(namespace, typename)] = typeinfo self._ns_type_to_typeinfo[(ns_meta_bytes, type_meta_bytes)] = typeinfo self._types_info[cls] = typeinfo - if type_id > 0 and (self.language == Language.PYTHON or not TypeId.is_namespaced_type(type_id)): + if type_id is not None and type_id != 0 and (self.language == Language.PYTHON or not TypeId.is_namespaced_type(type_id)): if type_id not in self._type_id_to_typeinfo or not internal: self._type_id_to_typeinfo[type_id] = typeinfo self._types_info[cls] = typeinfo @@ -500,12 +505,12 @@ def get_typeinfo(self, cls, create=True): if self.language == Language.PYTHON: if isinstance(serializer, EnumSerializer): type_id = TypeId.NAMED_ENUM - elif type(serializer) is PickleSerializer: - type_id = PickleSerializer.PICKLE_TYPE_ID elif isinstance(serializer, FunctionSerializer): type_id = TypeId.NAMED_EXT - elif isinstance(serializer, (ObjectSerializer, StatefulSerializer, ReduceSerializer)): + elif isinstance(serializer, (ObjectSerializer, StatefulSerializer)): type_id = TypeId.NAMED_EXT + elif self._internal_py_serializer_map.get(type(serializer)) is not None: + type_id = self._internal_py_serializer_map.get(type(serializer))[1] if not self.require_registration: if isinstance(serializer, DataClassSerializer): type_id = TypeId.NAMED_STRUCT @@ -552,35 +557,33 @@ def _create_serializer(self, cls): serializer = DataClassStubSerializer(self.fory, cls, xlang=not self.fory.is_py) elif issubclass(cls, enum.Enum): serializer = EnumSerializer(self.fory, cls) + elif ("builtin_function_or_method" in str(cls) or "cython_function_or_method" in str(cls)) and "<locals>" not in str(cls): + serializer = NativeFuncMethodSerializer(self.fory, cls) + elif cls is type(self.initialize): + # Handle bound method objects + serializer = MethodSerializer(self.fory, cls) + elif issubclass(cls, type): + # Handle Python type objects and metaclass such as numpy._DTypeMeta(i.e. np.dtype) + serializer = TypeSerializer(self.fory, cls) + elif cls is array.array: + # Handle array.array objects with DynamicPyArraySerializer + # Note: This will use DynamicPyArraySerializer for all array.array objects + serializer = DynamicPyArraySerializer(self.fory, cls) elif (hasattr(cls, "__reduce__") and cls.__reduce__ is not object.__reduce__) or ( hasattr(cls, "__reduce_ex__") and cls.__reduce_ex__ is not object.__reduce_ex__ ): # Use ReduceSerializer for objects that have custom __reduce__ or __reduce_ex__ methods # This has higher precedence than StatefulSerializer and ObjectSerializer # Only use it for objects with custom reduce methods, not default ones from the object - module_name = getattr(cls, "__module__", "") - if module_name.startswith("pandas.") or module_name == "builtins" or cls.__name__ in ("type", "function", "method"): - # Exclude pandas, built-ins, and certain system types - serializer = PickleSerializer(self.fory, cls) - else: - serializer = ReduceSerializer(self.fory, cls) + serializer = ReduceSerializer(self.fory, cls) elif hasattr(cls, "__getstate__") and hasattr(cls, "__setstate__"): # Use StatefulSerializer for objects that support __getstate__ and __setstate__ - # But exclude certain types that have incompatible state methods - module_name = getattr(cls, "__module__", "") - if module_name.startswith("pandas."): - # Pandas objects have __getstate__/__setstate__ but use incompatible pickle formats - serializer = PickleSerializer(self.fory, cls) - else: - serializer = StatefulSerializer(self.fory, cls) - elif ( - cls is not type - and (hasattr(cls, "__dict__") or hasattr(cls, "__slots__")) - and not (np and (issubclass(cls, np.dtype) or cls is type(np.dtype))) - ): + serializer = StatefulSerializer(self.fory, cls) + elif hasattr(cls, "__dict__") or hasattr(cls, "__slots__"): serializer = ObjectSerializer(self.fory, cls) else: - serializer = PickleSerializer(self.fory, cls) + # c-extension types will go to here + serializer = UnsupportedSerializer(self.fory, cls) return serializer def is_registered_by_name(self, cls):
python/pyfory/_serialization.pyx+7 −37 modified@@ -30,7 +30,6 @@ from typing import TypeVar, Union, Iterable from pyfory._util import get_bit, set_bit, clear_bit from pyfory import _fory as fmod from pyfory._fory import Language -from pyfory._fory import _PicklerStub, _UnpicklerStub, Pickler, Unpickler from pyfory._fory import _ENABLE_TYPE_REGISTRATION_FORCIBLY from pyfory.lib import mmh3 from pyfory.meta.metastring import Encoding @@ -792,8 +791,6 @@ cdef class Fory: cdef readonly MetaStringResolver metastring_resolver cdef readonly SerializationContext serialization_context cdef Buffer buffer - cdef public object pickler # pickle.Pickler - cdef public object unpickler # Optional[pickle.Unpickler] cdef object _buffer_callback cdef object _buffers # iterator cdef object _unsupported_callback @@ -841,18 +838,6 @@ cdef class Fory: self.serialization_context = SerializationContext(fory=self, scoped_meta_share_enabled=compatible) self.type_resolver.initialize() self.buffer = Buffer.allocate(32) - if not require_type_registration: - warnings.warn( - "Type registration is disabled, unknown types can be deserialized " - "which may be insecure.", - RuntimeWarning, - stacklevel=2, - ) - self.pickler = Pickler(self.buffer) - else: - self.pickler = _PicklerStub() - self.unpickler = _UnpicklerStub() - self.unpickler = None self._buffer_callback = None self._buffers = None self._unsupported_callback = None @@ -907,9 +892,7 @@ cdef class Fory: self, obj, Buffer buffer, buffer_callback=None, unsupported_callback=None): self._buffer_callback = buffer_callback self._unsupported_callback = unsupported_callback - if buffer is not None: - self.pickler = Pickler(self.buffer) - else: + if buffer is None: self.buffer.writer_index = 0 buffer = self.buffer if self.language == Language.XLANG: @@ -1053,8 +1036,6 @@ cdef class Fory: cpdef inline _deserialize( self, Buffer buffer, buffers=None, unsupported_objects=None): - if not self.require_type_registration: - self.unpickler = Unpickler(buffer) if unsupported_objects is not None: self._unsupported_objects = iter(unsupported_objects) if self.language == Language.XLANG: @@ -1217,22 +1198,13 @@ cdef class Fory: buffer.reader_index += size return buf - cpdef inline handle_unsupported_write(self, Buffer buffer, obj): + cpdef handle_unsupported_write(self, buffer, obj): if self._unsupported_callback is None or self._unsupported_callback(obj): - buffer.write_bool(True) - self.pickler.dump(obj) - else: - buffer.write_bool(False) + raise NotImplementedError(f"{type(obj)} is not supported for write") - cpdef inline handle_unsupported_read(self, Buffer buffer): - cdef c_bool in_band = buffer.read_bool() - if in_band: - if self.unpickler is None: - self.unpickler.buffer = Unpickler(buffer) - return self.unpickler.load() - else: - assert self._unsupported_objects is not None - return next(self._unsupported_objects) + cpdef handle_unsupported_read(self, buffer): + assert self._unsupported_objects is not None + return next(self._unsupported_objects) cpdef inline write_ref_pyobject( self, Buffer buffer, value, TypeInfo typeinfo=None): @@ -1261,7 +1233,6 @@ cdef class Fory: self.type_resolver.reset_write() self.metastring_resolver.reset_write() self.serialization_context.reset_write() - self.pickler.clear_memo() self._unsupported_callback = None cpdef inline reset_read(self): @@ -1271,7 +1242,6 @@ cdef class Fory: self.metastring_resolver.reset_read() self.serialization_context.reset_read() self._buffers = None - self.unpickler = None self._unsupported_objects = None cpdef inline reset(self): @@ -1733,7 +1703,7 @@ cdef class CollectionSerializer(Serializer): ref_resolver.set_read_object(ref_id, obj) self._add_element(collection_, i, obj) self.fory.dec_depth() - + cpdef _add_element(self, object collection_, int64_t index, object element): raise NotImplementedError
python/pyfory/_serializer.py+3 −5 modified@@ -19,7 +19,7 @@ import logging import platform import time -from abc import ABC, abstractmethod +from abc import ABC from typing import Dict from pyfory._fory import NOT_NULL_INT64_FLAG @@ -74,13 +74,11 @@ def write(self, buffer, value): def read(self, buffer): raise NotImplementedError - @abstractmethod def xwrite(self, buffer, value): - pass + raise NotImplementedError - @abstractmethod def xread(self, buffer): - pass + raise NotImplementedError @classmethod def support_subclass(cls) -> bool:
python/pyfory/serializer.py+170 −119 modified@@ -18,26 +18,24 @@ import array import builtins import dataclasses +import importlib +import inspect import itertools import marshal import logging import os -import pickle import types import typing from typing import List import warnings -from weakref import WeakValueDictionary -import pyfory.lib.mmh3 from pyfory.buffer import Buffer from pyfory.codegen import ( gen_write_nullable_basic_stmts, gen_read_nullable_basic_stmts, compile_function, ) from pyfory.error import TypeNotCompatibleError -from pyfory.lib.collection import WeakIdentityKeyDictionary from pyfory.resolver import NULL_FLAG, NOT_NULL_VALUE_FLAG from pyfory import Language @@ -139,82 +137,31 @@ def read(self, buffer): return None -class _PickleStub: - pass +class TypeSerializer(Serializer): + """Serializer for Python type objects (classes).""" - -class PickleStrongCacheStub: - pass - - -class PickleCacheStub: - pass - - -class PickleStrongCacheSerializer(Serializer): - """If we can't create weak ref to object, use this cache serializer instead. - clear cache by threshold to avoid memory leak.""" - - __slots__ = "_cached", "_clear_threshold", "_counter" - - def __init__(self, fory, clear_threshold: int = 1000): - super().__init__(fory, PickleStrongCacheStub) - self._cached = {} - self._clear_threshold = clear_threshold + def __init__(self, fory, cls): + super().__init__(fory, cls) + self.cls = cls def write(self, buffer, value): - serialized = self._cached.get(value) - if serialized is None: - serialized = pickle.dumps(value) - self._cached[value] = serialized - buffer.write_bytes_and_size(serialized) - if len(self._cached) == self._clear_threshold: - self._cached.clear() + # Serialize the type by its module and name + module_name = getattr(value, "__module__", "") + type_name = getattr(value, "__name__", "") + buffer.write_string(module_name) + buffer.write_string(type_name) def read(self, buffer): - return pickle.loads(buffer.read_bytes_and_size()) + module_name = buffer.read_string() + type_name = buffer.read_string() - def xwrite(self, buffer, value): - raise NotImplementedError - - def xread(self, buffer): - raise NotImplementedError - - -class PickleCacheSerializer(Serializer): - __slots__ = "_cached", "_reverse_cached" - - def __init__(self, fory): - super().__init__(fory, PickleCacheStub) - self._cached = WeakIdentityKeyDictionary() - self._reverse_cached = WeakValueDictionary() - - def write(self, buffer, value): - cache = self._cached.get(value) - if cache is None: - serialized = pickle.dumps(value) - value_hash = pyfory.lib.mmh3.hash_buffer(serialized)[0] - cache = value_hash, serialized - self._cached[value] = cache - buffer.write_int64(cache[0]) - buffer.write_bytes_and_size(cache[1]) - - def read(self, buffer): - value_hash = buffer.read_int64() - value = self._reverse_cached.get(value_hash) - if value is None: - value = pickle.loads(buffer.read_bytes_and_size()) - self._reverse_cached[value_hash] = value + # Import the module and get the type + if module_name and module_name != "builtins": + module = __import__(module_name, fromlist=[type_name]) + return getattr(module, type_name) else: - size = buffer.read_int32() - buffer.skip(size) - return value - - def xwrite(self, buffer, value): - raise NotImplementedError - - def xread(self, buffer): - raise NotImplementedError + # Handle built-in types + return getattr(builtins, type_name, type) class PandasRangeIndexSerializer(Serializer): @@ -292,9 +239,7 @@ def xread(self, buffer): "1", ) -# Moved from L32 to here, after all Serializer base classes and specific serializers -# like ListSerializer, MapSerializer, PickleSerializer are defined or imported -# and before DataClassSerializer which uses StructFieldSerializerVisitor from _struct. + from pyfory._struct import _get_hash, _sort_fields, StructFieldSerializerVisitor @@ -707,12 +652,18 @@ def write(self, buffer, value: array.array): def read(self, buffer): typecode = buffer.read_string() data = buffer.read_bytes_and_size() - arr = array.array(typecode, []) + arr = array.array(typecode[0], []) # Take first character arr.frombytes(data) return arr class DynamicPyArraySerializer(Serializer): + """Serializer for dynamic Python arrays that handles any typecode.""" + + def __init__(self, fory, cls): + super().__init__(fory, cls) + self._serializer = ReduceSerializer(fory, cls) + def xwrite(self, buffer, value): itemsize, ftype, type_id = typecode_dict[value.typecode] view = memoryview(value) @@ -733,11 +684,10 @@ def xread(self, buffer): return arr def write(self, buffer, value): - buffer.write_varuint32(PickleSerializer.PICKLE_TYPE_ID) - self.fory.handle_unsupported_write(buffer, value) + self._serializer.write(buffer, value) def read(self, buffer): - return self.fory.handle_unsupported_read(buffer) + return self._serializer.read(buffer) if np: @@ -772,6 +722,7 @@ def __init__(self, fory, ftype, dtype): super().__init__(fory, ftype) self.dtype = dtype self.itemsize, self.format, self.typecode, self.type_id = _np_dtypes_dict[self.dtype] + self._serializer = ReduceSerializer(fory, np.ndarray) def xwrite(self, buffer, value): assert value.itemsize == self.itemsize @@ -793,11 +744,10 @@ def xread(self, buffer): return np.frombuffer(data, dtype=self.dtype) def write(self, buffer, value): - buffer.write_int8(PickleSerializer.PICKLE_TYPE_ID) - self.fory.handle_unsupported_write(buffer, value) + self._serializer.write(buffer, value) def read(self, buffer): - return self.fory.handle_unsupported_read(buffer) + return self._serializer.read(buffer) class NDArraySerializer(Serializer): @@ -816,11 +766,32 @@ def xread(self, buffer): raise NotImplementedError("Multi-dimensional array not supported currently") def write(self, buffer, value): - buffer.write_int8(PickleSerializer.PICKLE_TYPE_ID) - self.fory.handle_unsupported_write(buffer, value) + # Serialize numpy ND array using native format + dtype = value.dtype + fory = self.fory + fory.serialize_ref(buffer, dtype) + buffer.write_varuint32(len(value.shape)) + for dim in value.shape: + buffer.write_varuint32(dim) + if dtype.kind == "O": + buffer.write_varint32(len(value)) + for item in value: + fory.serialize_ref(buffer, item) + else: + data = value.tobytes() + buffer.write_bytes_and_size(data) def read(self, buffer): - return self.fory.handle_unsupported_read(buffer) + fory = self.fory + dtype = fory.deserialize_ref(buffer) + ndim = buffer.read_varuint32() + shape = tuple(buffer.read_varuint32() for _ in range(ndim)) + if dtype.kind == "O": + length = buffer.read_varint32() + items = [fory.deserialize_ref(buffer) for _ in range(length)] + return np.array(items, dtype=object) + data = buffer.read_bytes_and_size() + return np.frombuffer(data, dtype=dtype).reshape(shape) class BytesSerializer(CrossLanguageCompatibleSerializer): @@ -927,39 +898,58 @@ def write(self, buffer, value): # Handle different __reduce__ return formats if isinstance(reduce_result, str): # Case 1: Just a global name (simple case) - self.fory.serialize_ref(buffer, ("global", reduce_result, None, None, None)) + reduce_data = ("global", reduce_result) elif isinstance(reduce_result, tuple): if len(reduce_result) == 2: # Case 2: (callable, args) callable_obj, args = reduce_result - self.fory.serialize_ref(buffer, ("callable", callable_obj, args, None, None)) + reduce_data = ("callable", callable_obj, args) elif len(reduce_result) == 3: # Case 3: (callable, args, state) callable_obj, args, state = reduce_result - self.fory.serialize_ref(buffer, ("callable", callable_obj, args, state, None)) + reduce_data = ("callable", callable_obj, args, state) elif len(reduce_result) == 4: # Case 4: (callable, args, state, listitems) callable_obj, args, state, listitems = reduce_result - self.fory.serialize_ref(buffer, ("callable", callable_obj, args, state, listitems)) + reduce_data = ("callable", callable_obj, args, state, listitems) elif len(reduce_result) == 5: # Case 5: (callable, args, state, listitems, dictitems) callable_obj, args, state, listitems, dictitems = reduce_result - self.fory.serialize_ref(buffer, ("callable", callable_obj, args, state, listitems, dictitems)) + reduce_data = ("callable", callable_obj, args, state, listitems, dictitems) else: raise ValueError(f"Invalid __reduce__ result length: {len(reduce_result)}") else: raise ValueError(f"Invalid __reduce__ result type: {type(reduce_result)}") + buffer.write_varuint32(len(reduce_data)) + fory = self.fory + for item in reduce_data: + fory.serialize_ref(buffer, item) def read(self, buffer): - reduce_data = self.fory.deserialize_ref(buffer) + reduce_data_num_items = buffer.read_varuint32() + assert reduce_data_num_items <= 6, buffer + reduce_data = [None] * 6 + fory = self.fory + for i in range(reduce_data_num_items): + reduce_data[i] = fory.deserialize_ref(buffer) if reduce_data[0] == "global": # Case 1: Global name global_name = reduce_data[1] # Import and return the global object - module_name, obj_name = global_name.rsplit(".", 1) - module = __import__(module_name, fromlist=[obj_name]) - return getattr(module, obj_name) + if "." in global_name: + module_name, obj_name = global_name.rsplit(".", 1) + module = __import__(module_name, fromlist=[obj_name]) + return getattr(module, obj_name) + else: + # Handle case where global_name doesn't contain a dot + # This might be a built-in type or a simple name + try: + import builtins + + return getattr(builtins, global_name) + except AttributeError: + raise ValueError(f"Cannot resolve global name: {global_name}") elif reduce_data[0] == "callable": # Case 2-5: Callable with args and optional state/items callable_obj = reduce_data[1] @@ -1018,33 +1008,41 @@ class FunctionSerializer(CrossLanguageCompatibleSerializer): def _serialize_function(self, buffer, func): """Serialize a function by capturing all its components.""" # Get function metadata - is_method = hasattr(func, "__self__") - if is_method: + instance = getattr(func, "__self__", None) + if instance is not None and not inspect.ismodule(instance): # Handle bound methods - self_obj = func.__self__ + self_obj = instance func_name = func.__name__ # Serialize as a tuple (is_method, self_obj, method_name) - buffer.write_bool(True) # is a method + buffer.write_int8(0) # is a method # For the 'self' object, we need to use fory's serialization self.fory.serialize_ref(buffer, self_obj) buffer.write_string(func_name) return + import types # Regular function or lambda code = func.__code__ name = func.__name__ - defaults = func.__defaults__ - closure = func.__closure__ - globals_dict = func.__globals__ module = func.__module__ qualname = func.__qualname__ + if "<locals>" not in qualname and module != "__main__": + buffer.write_int8(1) # Not a method + buffer.write_string(name) + buffer.write_string(module) + return + # Serialize function metadata - buffer.write_bool(False) # Not a method + buffer.write_int8(2) # Not a method buffer.write_string(name) buffer.write_string(module) buffer.write_string(qualname) + defaults = func.__defaults__ + closure = func.__closure__ + globals_dict = func.__globals__ + # Instead of trying to serialize the code object in parts, use marshal # which is specifically designed for code objects marshalled_code = marshal.dumps(code) @@ -1112,16 +1110,21 @@ def _serialize_function(self, buffer, func): def _deserialize_function(self, buffer): """Deserialize a function from its components.""" - import sys # Check if it's a method - is_method = buffer.read_bool() - if is_method: + func_type_id = buffer.read_int8() + if func_type_id == 0: # Handle bound methods self_obj = self.fory.deserialize_ref(buffer) method_name = buffer.read_string() return getattr(self_obj, method_name) + if func_type_id == 1: + name = buffer.read_string() + module = buffer.read_string() + mod = importlib.import_module(module) + return getattr(mod, name) + # Regular function or lambda name = buffer.read_string() module = buffer.read_string() @@ -1169,7 +1172,7 @@ def _deserialize_function(self, buffer): # Create a globals dictionary with module's globals as the base func_globals = {} try: - mod = sys.modules.get(module) + mod = importlib.import_module(module) if mod: func_globals.update(mod.__dict__) except (KeyError, AttributeError): @@ -1197,12 +1200,10 @@ def _deserialize_function(self, buffer): return func def xwrite(self, buffer, value): - """Serialize a function for cross-language compatibility.""" - self._serialize_function(buffer, value) + raise NotImplementedError() def xread(self, buffer): - """Deserialize a function for cross-language compatibility.""" - return self._deserialize_function(buffer) + raise NotImplementedError() def write(self, buffer, value): """Serialize a function for Python-only mode.""" @@ -1213,20 +1214,56 @@ def read(self, buffer): return self._deserialize_function(buffer) -class PickleSerializer(Serializer): - PICKLE_TYPE_ID = 96 +class NativeFuncMethodSerializer(Serializer): + def write(self, buffer, func): + name = func.__name__ + buffer.write_string(name) + obj = getattr(func, "__self__", None) + if obj is None or inspect.ismodule(obj): + buffer.write_bool(True) + module = func.__module__ + buffer.write_string(module) + else: + buffer.write_bool(False) + self.fory.serialize_ref(buffer, obj) - def xwrite(self, buffer, value): - raise NotImplementedError + def read(self, buffer): + name = buffer.read_string() + if buffer.read_bool(): + module = buffer.read_string() + mod = importlib.import_module(module) + return getattr(mod, name) + else: + obj = self.fory.deserialize_ref(buffer) + return getattr(obj, name) - def xread(self, buffer): - raise NotImplementedError + +class MethodSerializer(Serializer): + """Serializer for bound method objects.""" + + def __init__(self, fory, cls): + super().__init__(fory, cls) + self.cls = cls def write(self, buffer, value): - self.fory.handle_unsupported_write(buffer, value) + # Serialize bound method as (instance, method_name) + instance = value.__self__ + method_name = value.__func__.__name__ + + self.fory.serialize_ref(buffer, instance) + buffer.write_string(method_name) def read(self, buffer): - return self.fory.handle_unsupported_read(buffer) + instance = self.fory.deserialize_ref(buffer) + method_name = buffer.read_string() + + return getattr(instance, method_name) + + def xwrite(self, buffer, value): + return self.write(buffer, value) + + def xread(self, buffer): + return self.read(buffer) class ObjectSerializer(Serializer): @@ -1316,3 +1353,17 @@ def xwrite(self, buffer, value): def xread(self, buffer): value = buffer.read_varuint32() return NonExistEnum(value=value) + + +class UnsupportedSerializer(Serializer): + def write(self, buffer, value): + self.fory.handle_unsupported_write(value) + + def read(self, buffer): + return self.fory.handle_unsupported_read(buffer) + + def xwrite(self, buffer, value): + raise NotImplementedError(f"{self.type_} is not supported for xwrite") + + def xread(self, buffer): + raise NotImplementedError(f"{self.type_} is not supported for xread")
python/pyfory/_struct.py+0 −9 modified@@ -93,14 +93,11 @@ def visit_customized(self, field_name, type_, types_path=None): return None def visit_other(self, field_name, type_, types_path=None): - from pyfory.serializer import PickleSerializer # Local import - if is_subclass(type_, enum.Enum): return self.fory.type_resolver.get_serializer(type_) if type_ not in basic_types and not is_py_array_type(type_): return None serializer = self.fory.type_resolver.get_serializer(type_) - assert not isinstance(serializer, (PickleSerializer,)) return serializer @@ -202,14 +199,11 @@ def visit_customized(self, field_name, type_, types_path=None): self._hash = self._compute_field_hash(self._hash, hash_value) def visit_other(self, field_name, type_, types_path=None): - from pyfory.serializer import PickleSerializer # Local import - typeinfo = self.fory.type_resolver.get_typeinfo(type_, create=False) if typeinfo is None: id_ = 0 else: serializer = typeinfo.serializer - assert not isinstance(serializer, (PickleSerializer,)) id_ = typeinfo.type_id assert id_ is not None, serializer if TypeId.is_namespaced_type(typeinfo.type_id): @@ -256,14 +250,11 @@ def visit_customized(self, field_name, type_, types_path=None): return [typeinfo.type_id] def visit_other(self, field_name, type_, types_path=None): - from pyfory.serializer import PickleSerializer # Local import - if is_subclass(type_, enum.Enum): return [self.fory.type_resolver.get_typeinfo(type_).type_id] if type_ not in basic_types and not is_py_array_type(type_): return None, None typeinfo = self.fory.type_resolver.get_typeinfo(type_) - assert not isinstance(typeinfo.serializer, (PickleSerializer,)) return [typeinfo.type_id]
python/pyfory/tests/test_serializer.py+4 −14 modified@@ -454,13 +454,16 @@ def xread(self, buffer): assert isinstance(fory.deserialize(fory.serialize(A.B.C())), A.B.C) -def test_pickle_fallback(): +def test_np_types(): fory = Fory(language=Language.PYTHON, ref_tracking=True, require_type_registration=False) o1 = [1, True, np.dtype(np.int32)] data1 = fory.serialize(o1) new_o1 = fory.deserialize(data1) assert o1 == new_o1 + +def test_pandas_dataframe(): + fory = Fory(language=Language.PYTHON, ref_tracking=True, require_type_registration=False) df = pd.DataFrame({"a": list(range(10))}) df2 = fory.deserialize(fory.serialize(df)) assert df2.equals(df) @@ -545,19 +548,6 @@ def test_duplicate_serialize(): assert ser_de(fory, EnumClass.E4) == EnumClass.E4 -@dataclass(unsafe_hash=True) -class CacheClass1: - f1: int - - -def test_cache_serializer(): - fory = Fory(language=Language.PYTHON, ref_tracking=True) - fory.register_type(CacheClass1, serializer=pyfory.PickleStrongCacheSerializer(fory)) - assert ser_de(fory, CacheClass1(1)) == CacheClass1(1) - fory.register_type(CacheClass1, serializer=pyfory.PickleCacheSerializer(fory)) - assert ser_de(fory, CacheClass1(1)) == CacheClass1(1) - - def test_pandas_range_index(): fory = Fory(language=Language.PYTHON, ref_tracking=True, require_type_registration=False) fory.register_type(pd.RangeIndex, serializer=pyfory.PandasRangeIndexSerializer(fory))
python/pyproject.toml+0 −1 modified@@ -50,7 +50,6 @@ classifiers = [ ] keywords = ["fory", "serialization", "multi-language", "fast", "row-format", "jit", "codegen", "polymorphic", "zero-copy"] dependencies = [ - "cloudpickle", ] [project.optional-dependencies]
Vulnerability mechanics
Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References
7- github.com/advisories/GHSA-538v-3wq9-4h3rghsaADVISORY
- lists.apache.org/thread/vfn9hp9qt06db5yo1gmj3l114o3o2csdghsavendor-advisoryWEB
- nvd.nist.gov/vuln/detail/CVE-2025-61622ghsaADVISORY
- www.openwall.com/lists/oss-security/2025/09/29/3ghsaWEB
- github.com/apache/fory/commit/379b948ecae5c3b849e5bdb3997978c9a163e40bghsaWEB
- github.com/apache/fory/pull/2629ghsaWEB
- github.com/apache/fory/releases/tag/v0.12.3ghsaWEB
News mentions
0No linked articles in our index yet.