High severity · NVD Advisory · Published Oct 11, 2023 · Updated Sep 18, 2024

vantage6's Pickle serialization is insecure

CVE-2023-23930

Description

vantage6 is a privacy-preserving federated learning infrastructure. Versions prior to 4.0.0 use pickle as the default serialization module. Pickle has known security issues: deserializing untrusted pickle data can execute arbitrary code. All users of vantage6 who post tasks with the default serialization are affected. Version 4.0.0 contains a patch. Users may specify JSON serialization as a workaround.
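
The JSON workaround can be pictured with a minimal sketch. This is not vantage6 code: the helper names `frame_json_payload` and `load_json_only` are hypothetical, and the `json.` prefix framing is inferred from the pre-patch client and its tests in the diff further down, where input sent with `data_format='json'` is prefixed with the format name and a period. The receiving side here is an assumption about how one might refuse anything that is not explicitly marked as JSON, so that `pickle.loads` is never reached.

```python
import base64
import json

JSON_PREFIX = b"json."  # format marker the pre-4.0 client prepends to JSON input


def frame_json_payload(input_: dict) -> str:
    """Serialize input as prefixed JSON and base64-encode it for transport."""
    raw = JSON_PREFIX + json.dumps(input_).encode()
    return base64.b64encode(raw).decode()


def load_json_only(payload: str) -> dict:
    """Decode a payload, rejecting anything not explicitly marked as JSON.

    Hypothetical receiver: it never falls back to pickle when the
    format prefix is missing.
    """
    raw = base64.b64decode(payload)
    if not raw.startswith(JSON_PREFIX):
        raise ValueError("refusing payload without 'json.' prefix")
    return json.loads(raw[len(JSON_PREFIX):].decode())


encoded = frame_json_payload({"method": "test-task"})
decoded = load_json_only(encoded)
```

In the pre-patch client the equivalent choice is made by passing `data_format='json'` to `post_task`; leaving the argument at its default selects pickle.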

Affected packages

Versions sourced from the GitHub Security Advisory.

Package            Affected versions    Patched versions
vantage6 (PyPI)    < 4.0.2              4.0.2
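
A quick way to compare a version string against the patched release is sketched below. It assumes the `4.0.2` threshold from the table above and plain `MAJOR.MINOR.PATCH` version strings; the names `PATCHED`, `parse_version` and `is_affected` are hypothetical, and pre-release suffixes are ignored rather than ordered correctly.

```python
import re

PATCHED = (4, 0, 2)  # first patched release according to the table above


def parse_version(version: str) -> tuple:
    """Extract the leading MAJOR.MINOR.PATCH numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """Return True if the version sorts before the patched release."""
    # Note: any suffix such as 'rc1' is dropped before comparing.
    return parse_version(version) < PATCHED
```

The installed version can be read with the standard library call `importlib.metadata.version("vantage6")` and passed to `is_affected`.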

Patches

e62f03bacf22

Merge pull request from GHSA-5m22-cfq9-86x6

https://github.com/vantage6/vantage6 · Frank Martin · Jun 12, 2023 · via ghsa
15 files changed · +91 -529
  • docs/algorithms/classic_tutorial.rst +0 -26 modified
    @@ -519,29 +519,3 @@ address (harbor2.vantage6.ai) and the project name (demo).
     .. note::
         Reach out to us on `Discord <https://discord.gg/yAyFf6Y>`__ if you want to
         use our registries (harbor.vantage6.ai and harbor2.vantage6.ai).
    -
    -Cross-language serialization
    -----------------------------
    -
    -It is possible that a vantage6 algorithm is developed in one programming
    -language, but you would like to run the task from another language. For
    -these use-cases, the Python algorithm wrapper and client support
    -cross-language serialization. By default, input to the algorithms and
    -output back to the client are serialized using pickle. However, it is
    -possible to define a different serialization format.
    -
    -Input and output serialization can be specified as follows:
    -
    -.. code:: python
    -
    -   client.post_task(
    -       name='mytask',
    -       image='harbor2.vantage6.ai/testing/v6-test-py',
    -       collaboration_id=COLLABORATION_ID,
    -       organization_ids=ORGANIZATION_IDS,
    -       data_format='json', # Specify input format to the algorithm
    -       input_={
    -           'method': 'column_names',
    -           'kwargs': {'data_format': 'json'}, # Specify output format
    -       }
    -   )
    
  • docs/user/pyclient.rst +2 -6 modified
    @@ -85,8 +85,6 @@ new user:
        #        Human readable description
        #    input : dict
        #        Algorithm input
    -   #    data_format : str, optional
    -   #        IO data format used, by default LEGACY
        #    database: str, optional
        #        Name of the database to use. This should match the key
        #        in the node configuration files. If not specified the
    @@ -396,8 +394,7 @@ us create a task that runs the master algorithm of the
                                          name="an-awesome-task",
                                          image="harbor2.vantage6.ai/demo/average",
                                          description='',
    -                                     input=input_,
    -                                     data_format='json')
    +                                     input=input_)
     
     Note that the ``kwargs`` we specified in the ``input_`` are specific to
     this algorithm: this algorithm expects an argument ``column_name`` to be
    @@ -431,8 +428,7 @@ master algorithm will normally do:
                                          name="an-awesome-task",
                                          image="harbor2.vantage6.ai/demo/average",
                                          description='',
    -                                     input=input_,
    -                                     data_format='json')
    +                                     input=input_)
     
     **Inspecting the results**
     
    
  • vantage6-client/tests/test_client.py +12 -37 modified
    @@ -1,6 +1,6 @@
     import base64
     import json
    -import pickle
    +
     from unittest import TestCase
     from unittest.mock import patch, MagicMock
     
    @@ -26,48 +26,20 @@
     
     class TestClient(TestCase):
     
    -    def test_post_task_legacy_method(self):
    -        post_input = TestClient.post_task_on_mock_client(SAMPLE_INPUT, 'legacy')
    -        decoded_input = base64.b64decode(post_input)
    -        decoded_input = pickle.loads(decoded_input)
    -        assert {'method': 'test-task'} == decoded_input
    -
    -    def test_post_json_task(self):
    -        post_input = TestClient.post_task_on_mock_client(SAMPLE_INPUT, 'json')
    -        decoded_input = base64.b64decode(post_input)
    -        assert b'json.{"method": "test-task"}' == decoded_input
    -
    -    def test_post_pickle_task(self):
    -        post_input = TestClient.post_task_on_mock_client(SAMPLE_INPUT, 'pickle')
    +    def test_post_task(self):
    +        post_input = TestClient.post_task_on_mock_client(SAMPLE_INPUT)
             decoded_input = base64.b64decode(post_input)
    +        assert b'{"method": "test-task"}' == decoded_input
     
    -        assert b'pickle.' == decoded_input[0:7]
    -
    -        assert {'method': 'test-task'} == pickle.loads(decoded_input[7:])
    -
    -    def test_get_legacy_results(self):
    -        mock_result = pickle.dumps(1)
    -
    -        results = TestClient._receive_results_on_mock_client(mock_result)
    -
    -        assert results == [{'result': 1}]
    -
    -    def test_get_json_results(self):
    -        mock_result = b'json.' + json.dumps({'some_key': 'some_value'}).encode()
    +    def test_get_results(self):
    +        mock_result = json.dumps({'some_key': 'some_value'}).encode()
     
             results = TestClient._receive_results_on_mock_client(mock_result)
     
             assert results == [{'result': {'some_key': 'some_value'}}]
     
    -    def test_get_pickle_results(self):
    -        mock_result = b'pickle.' + pickle.dumps([1, 2, 3, 4, 5])
    -
    -        results = TestClient._receive_results_on_mock_client(mock_result)
    -
    -        assert results == [{'result': [1, 2, 3, 4, 5]}]
    -
         @staticmethod
    -    def post_task_on_mock_client(input_, serialization: str) -> dict[str, any]:
    +    def post_task_on_mock_client(input_) -> dict[str, any]:
             mock_requests = MagicMock()
             mock_requests.get.return_value.status_code = 200
             mock_requests.post.return_value.status_code = 200
    @@ -76,8 +48,11 @@ def post_task_on_mock_client(input_, serialization: str) -> dict[str, any]:
             with patch.multiple('vantage6.client', requests=mock_requests, jwt=mock_jwt):
                 client = TestClient.setup_client()
     
    -            client.post_task(name=TASK_NAME, image=TASK_IMAGE, collaboration_id=COLLABORATION_ID,
    -                             organization_ids=ORGANIZATION_IDS, input_=input_, data_format=serialization)
    +            client.post_task(
    +                name=TASK_NAME, image=TASK_IMAGE,
    +                collaboration_id=COLLABORATION_ID,
    +                organization_ids=ORGANIZATION_IDS, input_=input_
    +            )
     
                 # In a request.post call, json is provided with the keyword argument 'json'
                 # call_args provides a tuple with positional arguments followed by a dict with positional arguments
    
  • vantage6-client/tests/test_deserialization.py +1 -16 modified
    @@ -1,7 +1,5 @@
    -import pickle
     from pathlib import Path
     from vantage6.tools import deserialization
    -from vantage6.tools.data_format import DataFormat
     
     SIMPLE_TARGET_DATA = {'key': 'value'}
     
    @@ -12,19 +10,6 @@ def test_deserialize_json(tmp_path: Path):
         json_path.write_text(data)
     
         with json_path.open('r') as f:
    -        result = deserialization.deserialize(f, DataFormat.JSON)
    +        result = deserialization.deserialize(f)
     
             assert SIMPLE_TARGET_DATA == result
    -
    -
    -def test_deserialize_pickle(tmp_path: Path):
    -    data = {'key': 'value'}
    -
    -    pickle_path = tmp_path / 'picklefile.pkl'
    -
    -    with pickle_path.open('wb') as f:
    -        pickle.dump(data, f)
    -
    -    with pickle_path.open('rb') as f:
    -        result = deserialization.deserialize(f, DataFormat.PICKLE)
    -        assert SIMPLE_TARGET_DATA == result
    
  • vantage6-client/tests/test_docker_wrapper.py +29 -90 modified
    @@ -1,13 +1,10 @@
     import json
    -import pickle
     from pathlib import Path
     from unittest.mock import patch, MagicMock
     
     import pandas as pd
    -from pytest import raises
     
     from vantage6.tools import wrapper
    -from vantage6.tools.exceptions import DeserializationException
     
     MODULE_NAME = 'algorithm_module'
     DATA = 'column1,column2\n1,2'
    @@ -16,107 +13,49 @@
     JSON_FORMAT = 'json'
     SEPARATOR = '.'
     SAMPLE_DB = pd.DataFrame([[1, 2]], columns=['column1', 'column2'])
    -PICKLE_FORMAT = 'pickle'
     
     MOCK_SPARQL_ENDPOINT = 'sparql://some_triplestore'
     
     
    -def test_old_pickle_input_wrapper(tmp_path):
    -    """
    -    Testing if wrapper still parses legacy input.
    -    """
    -    input_file = tmp_path / 'input.pkl'
    -
    -    with input_file.open('wb') as f:
    -        pickle.dump(INPUT_PARAMETERS, f)
    +# def test_json_input_without_format_raises_deserializationexception(tmp_path):
    +#     """
    +#     It should only be possible to provide json input if it is preceded by the
    +#     string "json." in unicode. Otherwise a `DeserializationException` should
    +#     be thrown.
    +#     """
    +#     input_file = tmp_path / 'input.json'
     
    -    output_file = run_docker_wrapper_with_echo_db(input_file, tmp_path)
    -    assert file_echoes_db(output_file)
    +#     with input_file.open('wb') as f:
    +#         f.write(json.dumps(INPUT_PARAMETERS).encode())
     
    +#     with raises(DeserializationException):
    +#         run_docker_wrapper_with_echo_db(input_file, tmp_path)
     
    -def test_json_input_without_format_raises_deserializationexception(tmp_path):
    -    """
    -    It should only be possible to provide json input if it is preceded by the
    -    string "json." in unicode. Otherwise a `DeserializationException` should
    -    be thrown.
    -    """
    -    input_file = tmp_path / 'input.json'
    -
    -    with input_file.open('wb') as f:
    -        f.write(json.dumps(INPUT_PARAMETERS).encode())
    -
    -    with raises(DeserializationException):
    -        run_docker_wrapper_with_echo_db(input_file, tmp_path)
    -
    -
    -def test_json_input_with_format_succeeds(tmp_path):
    -    input_file = tmp_path / 'input.txt'
    -
    -    with input_file.open('wb') as f:
    -        f.write(f'JSON{SEPARATOR}'.encode())
    -        f.write(json.dumps(INPUT_PARAMETERS).encode())
     
    -    output_file = run_docker_wrapper_with_echo_db(input_file, tmp_path)
    -    assert file_echoes_db(output_file)
    +# def test_json_input_with_format_succeeds(tmp_path):
    +#     input_file = tmp_path / 'input.txt'
     
    +#     with input_file.open('wb') as f:
    +#         f.write(json.dumps(INPUT_PARAMETERS).encode())
     
    -def test_pickle_input_with_format_succeeds(tmp_path):
    -    input_file = create_pickle_input(tmp_path)
    -    output_file = run_docker_wrapper_with_echo_db(input_file, tmp_path)
    -    assert file_echoes_db(output_file)
    +#     output_file = run_docker_wrapper_with_echo_db(input_file, tmp_path)
    +#     assert file_echoes_db(output_file)
     
     
    -def test_wrapper_serializes_pickle_output(tmp_path):
    -    input_parameters = {
    -        'method': 'hello_world',
    -        'output_format': PICKLE_FORMAT
    -    }
    -    input_file = create_pickle_input(tmp_path, input_parameters)
    -
    -    output_file = run_docker_wrapper_with_echo_db(input_file, tmp_path)
    -
    -    with output_file.open('rb') as f:
    -        # Check whether the output starts with `pickle.` to indicate the pickle
    -        # data format
    -        assert f.read(len(PICKLE_FORMAT) + 1).decode() == f'{PICKLE_FORMAT}.'
    -
    -        result = pickle.loads(f.read())
    -        pd.testing.assert_frame_equal(SAMPLE_DB, result)
    -
    -
    -def test_wrapper_serializes_json_output(tmp_path):
    -    input_parameters = {'method': 'hello_world', 'output_format': JSON_FORMAT}
    -    input_file = create_pickle_input(tmp_path, input_parameters)
    -
    -    output_file = run_docker_wrapper_with_echo_db(input_file, tmp_path)
    -
    -    with output_file.open('rb') as f:
    -        # Check whether the data is preceded by json format string
    -        assert f.read(len(JSON_FORMAT) + 1).decode() == f'{JSON_FORMAT}.'
    -
    -        # Since the echo_db algorithm was triggered, output will be table that
    -        # can be read by pandas.
    -        result = pd.read_json(f.read())
    -        pd.testing.assert_frame_equal(SAMPLE_DB, result)
    -
    -
    -def create_pickle_input(tmp_path, input_parameters=None):
    -    if input_parameters is None:
    -        input_parameters = INPUT_PARAMETERS
    -
    -    input_file = tmp_path / 'input.pkl'
    -    with input_file.open('wb') as f:
    -        f.write(f'PICKLE{SEPARATOR}'.encode())
    -        f.write(pickle.dumps(input_parameters))
    -    return input_file
    +# def test_wrapper_serializes_json_output(tmp_path):
    +#     input_parameters = {'method': 'hello_world', 'output_format': JSON_FORMAT}
    +#     input_file = create_pickle_input(tmp_path, input_parameters)
     
    +#     output_file = run_docker_wrapper_with_echo_db(input_file, tmp_path)
     
    -def file_echoes_db(output_file):
    -    with output_file.open('rb') as f:
    -        result = pickle.load(f)
    -        target = SAMPLE_DB
    +#     with output_file.open('rb') as f:
    +#         # Check whether the data is preceded by json format string
    +#         assert f.read(len(JSON_FORMAT) + 1).decode() == f'{JSON_FORMAT}.'
     
    -        return target.equals(result)
    +#         # Since the echo_db algorithm was triggered, output will be table that
    +#         # can be read by pandas.
    +#         result = pd.read_json(f.read())
    +#         pd.testing.assert_frame_equal(SAMPLE_DB, result)
     
     
     def run_docker_wrapper_with_echo_db(input_file, tmp_path):
    @@ -169,7 +108,7 @@ def test_sparql_docker_wrapper_passes_dataframe(
         input_args = {'query': 'select *'}
     
         with input_file.open('wb') as f:
    -        pickle.dump(input_args, f)
    +        json.dumps(input_args, f)
     
         with token_file.open('w') as f:
             f.write(TOKEN)
    
  • vantage6-client/tests/test_serialization.py +3 -26 modified
    @@ -1,14 +1,8 @@
    -import pickle
    -
     from pytest import mark
     
     from vantage6.tools import serialization
     import pandas as pd
     
    -from vantage6.tools.data_format import DataFormat
    -
    -JSON = 'json'
    -
     
     @mark.parametrize("data,target", [
         # Default serialization
    @@ -17,28 +11,11 @@
         ({'hello': 'goodbye'}, '{"hello": "goodbye"}'),
     
         # Pandas serialization
    -    (pd.DataFrame([[1, 2, 3]], columns=['one', 'two', 'three']), '{"one":{"0":1},"two":{"0":2},"three":{"0":3}}'),
    +    (pd.DataFrame([[1, 2, 3]], columns=['one', 'two', 'three']),
    +     '{"one":{"0":1},"two":{"0":2},"three":{"0":3}}'),
         (pd.Series([1, 2, 3]), '{"0":1,"1":2,"2":3}')
     ])
     def test_json_serialization(data, target):
    -    result = serialization.serialize(data, DataFormat.JSON)
    +    result = serialization.serialize(data)
     
         assert target == result.decode()
    -
    -
    -@mark.parametrize("data", [
    -    ({'key': 'value'}),
    -    (123),
    -    ([1, 2, 3]),
    -])
    -def test_pickle_serialization(data):
    -    pickled = serialization.serialize(data, DataFormat.PICKLE)
    -
    -    assert data == pickle.loads(pickled)
    -
    -
    -def test_pickle_serialization_pandas():
    -    data = pd.DataFrame([1, 2, 3])
    -    pickled = serialization.serialize(data, DataFormat.PICKLE)
    -
    -    pd.testing.assert_frame_equal(data, pickle.loads(pickled))
    
  • vantage6-client/vantage6/client/deserialization.py +0 -115 removed
    @@ -1,115 +0,0 @@
    -"""
    -Module for deserialization of algorithm results.
    -
    -TODO: Merge with `vantage6.tools.deserialization` in `vantage6-toolkit` and move to `vantage6-common`
    -"""
    -
    -import json
    -import logging
    -import pickle
    -from .exceptions import DeserializationException
    -
    -_DATA_FORMAT_SEPARATOR = '.'
    -_MAX_FORMAT_STRING_LENGTH = 10
    -
    -logger = logging.getLogger(__name__)
    -
    -_deserializers = {}
    -
    -
    -def deserialize(file, data_format):
    -    """
    -    Lookup data_format in deserializer mapping and return the associated
    -    :param file:
    -    :param data_format:
    -    :return:
    -    """
    -    try:
    -        return _deserializers[data_format.lower()](file)
    -    except KeyError:
    -        raise Exception(f'Deserialization of {data_format} has not been implemented.')
    -
    -
    -def deserializer(data_format):
    -    """
    -    Register function as deserializer by adding it to the `_deserializers` map with key `data_format`.
    -
    -    :param data_format:
    -    :return:
    -    """
    -
    -    def decorator_deserializer(func):
    -        # Register deserialization function
    -        _deserializers[data_format] = func
    -
    -        # Return function without modifications so it can also be run without retrieving it from `_deserializers`.
    -        return func
    -
    -    return decorator_deserializer
    -
    -
    -@deserializer('json')
    -def deserialize_json(file):
    -    return json.loads(file)
    -
    -
    -@deserializer('pickle')
    -def deserialize_pickle(file):
    -    return pickle.loads(file)
    -
    -
    -def unpack_legacy_results(result):
    -    return pickle.loads(result.get("result"))
    -
    -
    -def load_data(input_bytes: bytes):
    -    """
    -    Try to read the specified data format and deserialize the rest of the stream accordingly. If this fails, assume
    -    the data format is pickle.
    -
    -    :param input_bytes:
    -    :return:
    -    """
    -    try:
    -        input_data = _read_formatted(input_bytes)
    -    except DeserializationException:
    -        logger.info('No data format specified. Assuming input data is pickle format')
    -        try:
    -            input_data = pickle.loads(input_bytes)
    -        except pickle.UnpicklingError:
    -            raise DeserializationException('Could not deserialize input')
    -    return input_data
    -
    -
    -def _read_formatted(input_bytes):
    -    data_format = str.join('', list(_read_data_format(input_bytes)))
    -    return deserialize(input_bytes[len(data_format) + 1:], data_format)
    -
    -
    -def _read_data_format(input_bytes):
    -    """
    -    Try to read the prescribed data format. The data format should be specified as follows: DATA_FORMAT.ACTUAL_BYTES.
    -    This function will attempt to read the string before the period. It will fail if the file is not in the right
    -    format.
    -
    -    :param input_bytes: Input file received from vantage infrastructure.
    -    :return:
    -    """
    -    success = False
    -
    -    for i in range(_MAX_FORMAT_STRING_LENGTH):
    -        try:
    -            char = input_bytes[i:i+1].decode()
    -        except UnicodeDecodeError:
    -            # We aren't reading a unicode string
    -            raise DeserializationException('No data format specified')
    -
    -        if char == _DATA_FORMAT_SEPARATOR:
    -            success = True
    -            break
    -        else:
    -            yield char
    -
    -    if not success:
    -        # The file didn't have a format prepended
    -        raise DeserializationException('No data format specified')
    
  • vantage6-client/vantage6/client/__init__.py +12 -32 modified
    @@ -5,7 +5,6 @@
     client (client used by master algorithms) and the user client are derived.
     """
     import logging
    -import pickle
     import time
     import typing
     import jwt
    @@ -23,7 +22,7 @@
     from vantage6.common.globals import APPNAME
     from vantage6.common.encryption import RSACryptor, DummyCryptor
     from vantage6.common import WhoAmI
    -from vantage6.client import serialization, deserialization
    +from vantage6.tools import serialization, deserialization
     from vantage6.client.filter import post_filtering
     from vantage6.client.utils import print_qr_code, LogLevel
     
    @@ -438,9 +437,8 @@ def refresh_token(self) -> None:
         # TODO BvB 23-01-23 remove this method in v4+. It is only here for
         # backwards compatibility
         def post_task(self, name: str, image: str, collaboration_id: int,
    -                  input_='', description='',
    -                  organization_ids: list = None,
    -                  data_format=LEGACY, database: str = 'default') -> dict:
    +                  input_='', description='', organization_ids: list = None,
    +                  database: str = 'default') -> dict:
             """Post a new task at the server
     
             It will also encrypt `input_` for each receiving organization.
    @@ -461,11 +459,6 @@ def post_task(self, name: str, image: str, collaboration_id: int,
             organization_ids : list, optional
                 Ids of organizations (within the collaboration) that need to
                 execute this task, by default None
    -        data_format : str, optional
    -            Type of data format to use to send and receive
    -            data. possible values: 'json', 'pickle', 'legacy'. 'legacy'
    -            will use pickle serialization. Default is 'legacy'., by default
    -            LEGACY
             database : str, optional
                 Database label to use for the task, by default 'default'
     
    @@ -484,13 +477,8 @@ def post_task(self, name: str, image: str, collaboration_id: int,
             if organization_ids is None:
                 organization_ids = []
     
    -        if data_format == LEGACY:
    -            serialized_input = pickle.dumps(input_)
    -        else:
    -            # Data will be serialized to bytes in the specified data format.
    -            # It will be prepended with 'DATA_FORMAT.' in unicode.
    -            serialized_input = data_format.encode() + b'.' \
    -                + serialization.serialize(input_, data_format)
    +        # Data will be serialized in JSON.
    +        serialized_input = serialization.serialize(input_)
     
             organization_json_list = []
             for org_id in organization_ids:
    @@ -1871,7 +1859,6 @@ def list(self, initiator: int = None, initiating_user: int = None,
             @post_filtering(iterable=False)
             def create(self, collaboration: int, organizations: list, name: str,
                        image: str, description: str, input: dict,
    -                   data_format: str = LEGACY,
                        database: str = 'default') -> dict:
                 """Create a new task
     
    @@ -1890,8 +1877,6 @@ def create(self, collaboration: int, organizations: list, name: str,
                     Human readable description
                 input : dict
                     Algorithm input
    -            data_format : str, optional
    -                IO data format used, by default LEGACY
                 database: str, optional
                     Database name to be used at the node
     
    @@ -1902,7 +1887,7 @@ def create(self, collaboration: int, organizations: list, name: str,
                 """
                 return self.parent.post_task(name, image, collaboration, input,
                                              description, organizations,
    -                                         data_format, database)
    +                                         database)
     
             def delete(self, id_: int) -> dict:
                 """Delete a task
    @@ -2242,18 +2227,13 @@ def get_results(self, task_id: int):
             )
     
             res = []
    -        # Encryption is not done at the client level for the container.
    -        # Although I am not completely sure that the format is always
    -        # a pickle.
    -        # for result in results:
    -        #     self._decrypt_result(result)
    -        #     res.append(result.get("result"))
    -        #
             try:
    -            res = [pickle.loads(base64s_to_bytes(result.get("result")))
    -                   for result in results if result.get("result")]
    +            res = [
    +                json_lib.loads(base64s_to_bytes(result.get("result")).decode())
    +                for result in results if result.get("result")
    +            ]
             except Exception as e:
    -            self.log.error('Unable to unpickle result')
    +            self.log.error('Unable to load results')
                 self.log.debug(e)
     
             return res
    @@ -2351,7 +2331,7 @@ def post_task(self, name: str, image: str, collaboration_id: int,
             """
             self.log.debug("post task without encryption (is handled by proxy)")
     
    -        serialized_input = bytes_to_base64s(pickle.dumps(input_))
    +        serialized_input = bytes_to_base64s(serialization.serialize(input_))
     
             organization_json_list = []
             for org_id in organization_ids:
    
  • vantage6-client/vantage6/client/serialization.py +0 -45 removed
    @@ -1,45 +0,0 @@
    -import json
    -import pickle
    -
    -_serializers = {}
    -
    -
    -def serialize(data, data_format) -> bytes:
    -    """
    -    Serialize data using the specified format
    -    :param data: the data to be serialized
    -    :param data_format: the desired data format. Valid options are 'json', 'pickle'.
    -    :return: a bytes-like object in the specified serialization format
    -    """
    -    try:
    -        return _serializers[data_format.lower()](data)
    -    except KeyError:
    -        raise Exception(f'Serialization of {data_format} has not been implemented.')
    -
    -
    -def serializer(data_format):
    -    """
    -    Register function as serializer by adding it to the `_serializers` map with key `data_format`.
    -
    -    :param data_format:
    -    :return:
    -    """
    -
    -    def decorator_serializer(func):
    -        # Register deserialization function
    -        _serializers[data_format] = func
    -
    -        # Return function without modifications so it can also be run without retrieving it from `_serializers`.
    -        return func
    -
    -    return decorator_serializer
    -
    -
    -@serializer('json')
    -def serialize_json(file) -> bytes:
    -    return json.dumps(file).encode()
    -
    -
    -@serializer('pickle')
    -def serialize_pickle(file) -> bytes:
    -    return pickle.dumps(file)
    
  • vantage6-client/vantage6/tools/data_format.py +0 -16 removed
    @@ -1,16 +0,0 @@
    -"""
    -Class DataFormat
    -
    -This Enum contains all the possible dataformats that can be used to serialize
    -or deserialize the data to and from the algorithm wrapper.
    -
    -When serialization to an additional data format is implemented it should be
    -added here.
    -"""
    -from enum import Enum
    -
    -
    -# TODO: Should ideally be shared with the client as well
    -class DataFormat(Enum):
    -    JSON = 'json'
    -    PICKLE = 'pickle'
    
  • vantage6-client/vantage6/tools/deserialization.py +11 -48 modified
    @@ -1,56 +1,19 @@
     import json
    -import pickle
    +from typing import BinaryIO
     
    -from vantage6.tools.data_format import DataFormat
     
    -_deserializers = {}
    -
    -
    -def deserialize(file, data_format: DataFormat):
    -    """
    -    Lookup data_format in deserializer mapping and return the associated
    -    function.
    -
    -    :param file:
    -    :param data_format:
    -    :return:
    -    """
    -    try:
    -        return _deserializers[data_format](file)
    -    except KeyError:
    -        raise Exception(
    -            f'Deserialization of {data_format} has not been implemented.'
    -        )
    -
    -
    -def deserializer(data_format):
    +def deserialize(file: BinaryIO):
         """
    -    Register function as deserializer by adding it to the `_deserializers` map
    -    with key `data_format`.
    +    Deserialize data from a file using JSON
     
    -    These functions should receive a file-like as input and provide the data as
    -    output in the format specified with the decorator.
    +    Parameters
    +    ----------
    +    file: BinaryIO
    +        The file to deserialize the data from
     
    -    :param data_format:
    -    :return:
    +    Returns
    +    -------
    +    str
    +        The deserialized data
         """
    -
    -    def decorator_deserializer(func):
    -        # Register deserialization function
    -        _deserializers[data_format] = func
    -
    -        # Return function without modifications so it can also be run without
    -        # retrieving it from `_deserializers`.
    -        return func
    -
    -    return decorator_deserializer
    -
    -
    -@deserializer(DataFormat.JSON)
    -def deserialize_json(file):
         return json.load(file)
    -
    -
    -@deserializer(DataFormat.PICKLE)
    -def deserialize_pickle(file):
    -    return pickle.load(file)
    
  • vantage6-client/vantage6/tools/docker_wrapper.py +1 -1 modified
    @@ -5,4 +5,4 @@
         sparql_wrapper,
         parquet_wrapper,
         multidb_wrapper
    -)
    \ No newline at end of file
    +)
    
  • vantage6-client/vantage6/tools/mock_client.py +5 -4 modified
    @@ -1,8 +1,10 @@
     import pandas
    -import pickle
    +import json
     
     from importlib import import_module
     
    +from vantage6.tools import serialization
    +
     
     class ClientMockProtocol:
         """
    @@ -78,7 +80,7 @@ def create_new_task(self, input_: dict,
     
                 idx = 999  # we dont need this now
                 results.append(
    -                {"id": idx, "result": pickle.dumps(result)}
    +                {"id": idx, "result": serialization.serialize(result)}
                 )
     
             id_ = len(self.tasks)
    @@ -123,8 +125,7 @@ def get_results(self, task_id: int) -> list[dict]:
             task = self.tasks[task_id]
             results = []
             for result in task.get("results"):
    -            print(result)
    -            res = pickle.loads(result.get("result"))
    +            res = json.loads(result.get("result"))
                 results.append(res)
     
             return results
    
  • vantage6-client/vantage6/tools/serialization.py +12 -65 modified
    @@ -1,73 +1,20 @@
     import json
    -import pickle
     
    -import pandas as pd
    -from vantage6.tools.data_format import DataFormat
    -from vantage6.tools.util import info
     
    -_serializers = {}
    -
    -
    -def serialize(data, data_format: DataFormat):
    -    """
    -    Look up serializer for `data_format` and use this to serialize `data`.
    -
    -    :param data:
    -    :param data_format:
    -    :return:
    +# TODO BvB 2023-02-03: I feel this function could be given a better name. And
    +# it might not have to be in a separate file.
    +def serialize(data: any) -> bytes:
         """
    -    return _serializers[data_format](data)
    +    Serialize data using the specified format
     
    +    Parameters
    +    ----------
    +    data: any
    +        The data to be serialized
     
    -def serializer(data_format: DataFormat):
    +    Returns
    +    -------
    +    bytes
    +        A JSON-serialized and then encoded bytes object representing the data
         """
    -    Register function as serializer by adding it to the `_serializers` map with
    -    key `data_format`. This function should ideally support a multitude of
    -    python objects.
    -
    -    There are two ways to extend serialization functionality:
    -
    -    1. Create and register a new serialization function for a previously
    -       unsupported serialization format.
    -    2. Implement support for additional objects within an existing serializer
    -       function.
    -
    -    :param data_format:
    -    :return:
    -    """
    -
    -    def decorator_serializer(func):
    -        # Register serialization function
    -        _serializers[data_format] = func
    -
    -        # Return function without modifications so it can also be run without
    -        # retrieving it from `_serializers`.
    -        return func
    -
    -    return decorator_serializer
    -
    -
    -@serializer(DataFormat.JSON)
    -def serialize_to_json(data):
    -    info(f'Serializing type {type(data)} to json')
    -
    -    if isinstance(data, pd.DataFrame) | isinstance(data, pd.Series):
    -        return _serialize_pandas(data)
    -
    -    return _default_serialization(data)
    -
    -
    -def _default_serialization(data):
    -    info('Using default json serialization')
         return json.dumps(data).encode()
    -
    -
    -def _serialize_pandas(data):
    -    info('Running pandas json serialization')
    -    return data.to_json().encode()
    -
    -
    -@serializer(DataFormat.PICKLE)
    -def serialize_to_pickle(data):
    -    info('Serializing to pickle')
    -    return pickle.dumps(data)
    
  • vantage6-node/vantage6/node/docker/task_manager.py +3 -2 modified
    @@ -2,7 +2,6 @@
     to be cleaned at some point. """
     import logging
     import os
    -import pickle
     import docker.errors
     import json
     
    @@ -285,11 +284,13 @@ def _run_algorithm(self) -> list[dict]:
                 )
     
             # try reading docker input
    +        # FIXME BvB 2023-02-03: why do we read docker input here? It is never
    +        # really used below. Should it?
             deserialized_input = None
             if self.docker_input:
                 self.log.debug("Deserialize input")
                 try:
    -                deserialized_input = pickle.loads(self.docker_input)
    +                deserialized_input = json.loads(self.docker_input)
                 except Exception:
                     pass
     
    

Vulnerability mechanics
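
The patch replaces the `pickle.loads` and `pickle.load` calls on task input and results with JSON parsing. The underlying risk is that a pickle stream can instruct the loader to call a callable chosen by whoever produced the bytes. The sketch below illustrates this with a deliberately harmless expression; the class `Marker` is hypothetical and exists only for the demonstration, while `serialize` mirrors the patched `vantage6.tools.serialization.serialize` shown in the diff.

```python
import json
import pickle


class Marker:
    """Object whose pickle stream tells the loader to call `eval`."""

    def __reduce__(self):
        # The stream stores "call eval('6 * 7')" instead of object state.
        # A real attacker would pick a harmful callable and argument.
        return (eval, ("6 * 7",))


payload = pickle.dumps(Marker())

# The expression is evaluated on the receiving side, during loading.
result = pickle.loads(payload)


def serialize(data) -> bytes:
    """JSON-only serialization, as in the patched module."""
    return json.dumps(data).encode()


# JSON parsing only builds dicts, lists, strings, numbers, booleans and None;
# nothing in the input can name a function to be called.
round_trip = json.loads(serialize({"method": "test-task"}).decode())
```

Here `result` is the value computed by `eval`, not a `Marker` instance, which shows that loading the bytes was enough to run code. One consequence of the JSON-only design is that `json.dumps` raises `TypeError` for objects it cannot represent, so inputs and results have to be converted to plain JSON types first.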

