High severity · NVD Advisory · Published Jan 30, 2024 · Updated May 29, 2025

Remote code execution

CVE-2024-21649

Description

vantage6 is a technology for managing and deploying privacy-enhancing technologies such as Federated Learning (FL) and Multi-Party Computation (MPC). Prior to version 4.2.0, authenticated users could inject code into algorithm environment variables, resulting in remote code execution. The vulnerability is patched in version 4.2.0.
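
The fix commit listed under Patches encodes environment variable values on the node and decodes them inside the algorithm container. The following is a minimal sketch of that round trip, not the vantage6 implementation itself. The function names `encode_env_value` and `decode_env_value` are hypothetical, the value of `STRING_ENCODING` is an assumption (the diff only imports the constant), and the sample payload is purely illustrative.

```python
import base64

STRING_ENCODING = "utf-8"         # assumed value; the diff only imports this constant
ENV_VAR_EQUALS_REPLACEMENT = "!"  # value taken from the patch to globals.py


def encode_env_value(value: str) -> str:
    """Node side: base32-encode a value, then replace the '=' padding."""
    encoded = base64.b32encode(value.encode(STRING_ENCODING))
    return encoded.decode(STRING_ENCODING).replace("=", ENV_VAR_EQUALS_REPLACEMENT)


def decode_env_value(encoded: str) -> str:
    """Algorithm side: restore the '=' padding, then base32-decode."""
    padded = encoded.replace(ENV_VAR_EQUALS_REPLACEMENT, "=")
    return base64.b32decode(padded.encode(STRING_ENCODING)).decode(STRING_ENCODING)


# Illustrative payload containing quotes, semicolons and parentheses.
payload = "'); import os; os.system('id'); ('"
encoded = encode_env_value(payload)

# The encoded form only uses the base32 alphabet plus the '!' replacement,
# so no quote or shell metacharacter reaches the container environment.
assert set(encoded) <= set("ABCDEFGHIJKLMNOPQRSTUVWXYZ234567!")
assert decode_env_value(encoded) == payload
```

Base32 is a reasonable fit here because its alphabet is limited to uppercase letters and the digits 2 to 7; the only remaining special character, the `=` padding, is swapped for `!` as in the patch.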

Affected packages

Versions sourced from the GitHub Security Advisory.

Package            Affected versions    Patched versions
vantage6 (PyPI)    < 4.2.0              4.2.0

Patches

1
eac19db73714

Merge pull request from GHSA-w9h2-px87-74vx

https://github.com/vantage6/vantage6 · Bart van Beusekom · Jan 18, 2024 · via ghsa
11 files changed · +204 -48
  • docs/algorithms/code_details.rst · +7 -5 · modified
    @@ -167,16 +167,18 @@ interaction between the algorithm and the node. The wrappers are responsible
     for reading the input data from the data source and supplying it to the algorithm.
     They also take care of writing the results back to the data source.
     
    -As algorithm developer, you do not have to worry about the wrappers. The only
    -thing you have to make sure is that the following line is present at the end of
    +As algorithm developer, you do not have to worry about the wrappers. The main
    +point you have to make sure is that the following line is present at the end of
     your ``Dockerfile``:
     
     .. code:: docker
     
    -    CMD python -c "from vantage6.algorithm.tools.wrap import wrap_algorithm; wrap_algorithm('${PKG_NAME}')"
    +    CMD python -c "from vantage6.algorithm.tools.wrap import wrap_algorithm; wrap_algorithm()"
     
    -where ``${PKG_NAME}`` is the name of your algorithm package. The ``wrap_algorithm``
    -function will wrap your algorithm.
    +The ``wrap_algorithm`` function will wrap your algorithm to ensure that the
    +vantage6 algorithm tools are available to it. Note that the ``wrap_algorithm``
    +function will also read the ``PKG_NAME`` environment variable from the
    +``Dockerfile`` so make sure that this variable is set correctly.
     
     For R, the command is slightly different:
     
    
  • docs/algorithms/develop.rst · +10 -3 · modified
    @@ -97,14 +97,21 @@ as follows:
     
     .. code:: python
     
    -   import os
    +   from vantage6.algorithm.tools.util import get_env_var
     
        def my_function():
    -       input_file = os.environ["INPUT_FILE"]
    -       token_file = os.environ["DEFAULT_DATABASE_URI"]
    +       input_file = get_env_var("INPUT_FILE")
    +       token_file = get_env_var("DEFAULT_DATABASE_URI")
     
            # do something with the input file and database URI
     
    +.. note::
    +
    +   The ``get_env_var`` function is used here rather than the standard
    +   ``os.environ`` dictionary because the environment variables are encoded
    +   for security purposes. The ``get_env_var`` function will decode the
    +   environment variable for you.
    +
     The environment variables that you specify in the node configuration file
     can be used in the exact same manner. You can view all environment variables
     that are available to your algorithm by ``print(os.environ)``.
    
  • vantage6-algorithm-tools/vantage6/algorithm/tools/decorators.py · +24 -24 · modified
    @@ -10,7 +10,7 @@
     
     from vantage6.algorithm.client import AlgorithmClient
     from vantage6.algorithm.tools.mock_client import MockAlgorithmClient
    -from vantage6.algorithm.tools.util import info, error, warn
    +from vantage6.algorithm.tools.util import info, error, warn, get_env_var
     from vantage6.algorithm.tools.wrappers import load_data
     from vantage6.algorithm.tools.preprocessing import preprocess_data
     
    @@ -89,12 +89,12 @@ def decorator(*args, mock_client: MockAlgorithmClient = None,
                 if mock_client is not None:
                     return func(mock_client, *args, **kwargs)
                 # read server address from the environment
    -            host = os.environ["HOST"]
    -            port = os.environ["PORT"]
    -            api_path = os.environ["API_PATH"]
    +            host = get_env_var("HOST")
    +            port = get_env_var("PORT")
    +            api_path = get_env_var("API_PATH")
     
                 # read token from the environment
    -            token_file = os.environ["TOKEN_FILE"]
    +            token_file = get_env_var("TOKEN_FILE")
                 info("Reading token")
                 with open(token_file) as fp:
                     token = fp.read().strip()
    @@ -184,7 +184,7 @@ def decorator(*args, mock_data: list[pd.DataFrame] = None,
     
                     # do any data preprocessing here
                     info(f"Applying preprocessing for database '{label}'")
    -                env_prepro = os.environ.get(f"{label.upper()}_PREPROCESSING")
    +                env_prepro = get_env_var(f"{label.upper()}_PREPROCESSING")
                     if env_prepro is not None:
                         preprocess = json.loads(env_prepro)
                         data_ = preprocess_data(data_, preprocess)
    @@ -291,7 +291,7 @@ def decorator(*args, **kwargs) -> callable:
             >>> def my_algorithm(metadata: RunMetaData, <other arguments>):
             >>>     pass
             """
    -        token_file = os.environ["TOKEN_FILE"]
    +        token_file = get_env_var("TOKEN_FILE")
             info("Reading token")
             with open(token_file) as fp:
                 token = fp.read().strip()
    @@ -304,10 +304,10 @@ def decorator(*args, **kwargs) -> callable:
                 node_id=payload["node_id"],
                 collaboration_id=payload["collaboration_id"],
                 organization_id=payload["organization_id"],
    -            temporary_directory=Path(os.environ["TEMPORARY_FOLDER"]),
    -            output_file=Path(os.environ["OUTPUT_FILE"]),
    -            input_file=Path(os.environ["INPUT_FILE"]),
    -            token_file=Path(os.environ["TOKEN_FILE"])
    +            temporary_directory=Path(get_env_var("TEMPORARY_FOLDER")),
    +            output_file=Path(get_env_var("OUTPUT_FILE")),
    +            input_file=Path(get_env_var("INPUT_FILE")),
    +            token_file=Path(get_env_var("TOKEN_FILE"))
             )
             return func(metadata, *args, **kwargs)
         return decorator
    @@ -336,11 +336,11 @@ def get_ohdsi_metadata(label: str) -> OHDSIMetaData:
         for var in expected_env_vars:
             _check_environment_var_exists_or_exit(f'{label_}_DB_PARAM_{var}')
     
    -    tmp = Path(os.environ["TEMPORARY_FOLDER"])
    +    tmp = Path(get_env_var("TEMPORARY_FOLDER"))
         metadata = OHDSIMetaData(
    -        database=os.environ[f"{label_}_DB_PARAM_CDM_DATABASE"],
    -        cdm_schema=os.environ[f"{label_}_DB_PARAM_CDM_SCHEMA"],
    -        results_schema=os.environ[f"{label_}_DB_PARAM_RESULTS_SCHEMA"],
    +        database=get_env_var(f"{label_}_DB_PARAM_CDM_DATABASE"),
    +        cdm_schema=get_env_var(f"{label_}_DB_PARAM_CDM_SCHEMA"),
    +        results_schema=get_env_var(f"{label_}_DB_PARAM_RESULTS_SCHEMA"),
             incremental_folder=tmp / "incremental",
             cohort_statistics_folder=tmp / "cohort_statistics",
             export_folder=tmp / "export"
    @@ -399,10 +399,10 @@ def _create_omop_database_connection(label: str) -> callable:
             _check_environment_var_exists_or_exit(f'{label_}_DB_PARAM_{var}')
     
         info("Reading OHDSI environment variables")
    -    dbms = os.environ[f"{label_}_DB_PARAM_DBMS"]
    -    uri = os.environ[f"{label_}_DATABASE_URI"]
    -    user = os.environ[f"{label_}_DB_PARAM_USER"]
    -    password = os.environ[f"{label_}_DB_PARAM_PASSWORD"]
    +    dbms = get_env_var(f"{label_}_DB_PARAM_DBMS")
    +    uri = get_env_var(f"{label_}_DATABASE_URI")
    +    user = get_env_var(f"{label_}_DB_PARAM_USER")
    +    password = get_env_var(f"{label_}_DB_PARAM_PASSWORD")
         info(f' - dbms: {dbms}')
         info(f' - uri: {uri}')
         info(f' - user: {user}')
    @@ -441,21 +441,21 @@ def _get_data_from_label(label: str) -> pd.DataFrame:
             Data from the database
         """
         # Load the input data from the input file - this may e.g. include the
    -    database_uri = os.environ[f"{label.upper()}_DATABASE_URI"]
    +    database_uri = get_env_var(f"{label.upper()}_DATABASE_URI")
         info(f"Using '{database_uri}' with label '{label}' as database")
     
         # Get the database type from the environment variable, this variable is
         # set by the vantage6 node based on its configuration file.
    -    database_type = os.environ.get(
    +    database_type = get_env_var(
             f"{label.upper()}_DATABASE_TYPE", "csv").lower()
     
         # Load the data based on the database type. Try to provide environment
         # variables that should be available for some data types.
         return load_data(
             database_uri,
             database_type,
    -        query=os.environ.get(f"{label.upper()}_QUERY"),
    -        sheet_name=os.environ.get(f"{label.upper()}_SHEET_NAME")
    +        query=get_env_var(f"{label.upper()}_QUERY"),
    +        sheet_name=get_env_var(f"{label.upper()}_SHEET_NAME")
         )
     
     
    @@ -470,7 +470,7 @@ def _get_user_database_labels() -> list[str]:
         """
         # read the labels that the user requested, which is a comma
         # separated list of labels.
    -    labels = os.environ["USER_REQUESTED_DATABASE_LABELS"]
    +    labels = get_env_var("USER_REQUESTED_DATABASE_LABELS")
         return labels.split(',')
     
     
    
  • vantage6-algorithm-tools/vantage6/algorithm/tools/util.py · +31 -0 · modified
    @@ -1,4 +1,7 @@
     import sys
    +import os
    +import base64
    +from vantage6.common.globals import STRING_ENCODING, ENV_VAR_EQUALS_REPLACEMENT
     
     
     def info(msg: str) -> None:
    @@ -35,3 +38,31 @@ def error(msg: str) -> None:
             Error message to be printed
         """
         sys.stdout.write(f"error > {msg}\n")
    +
    +
    +def get_env_var(var_name: str, default: str | None = None) -> str:
    +    """
    +    Get the value of an environment variable. Environment variables are encoded
    +    by the node so they need to be decoded here.
    +
    +    Note that this decoding follows the reverse of the encoding in the node:
    +    first replace '=' back and then decode the base32 string.
    +
    +    Parameters
    +    ----------
    +    var_name : str
    +        Name of the environment variable
    +
    +    Returns
    +    -------
    +    var_value : str | None
    +        Value of the environment variable, or default value if not found
    +    """
    +    try:
    +        encoded_env_var_value = \
    +            os.environ[var_name].replace(
    +                ENV_VAR_EQUALS_REPLACEMENT, "="
    +            ).encode(STRING_ENCODING)
    +        return base64.b32decode(encoded_env_var_value).decode(STRING_ENCODING)
    +    except KeyError:
    +        return default
    
  • vantage6-algorithm-tools/vantage6/algorithm/tools/wrap.py · +7 -4 · modified
    @@ -6,11 +6,11 @@
     
     from vantage6.common.client import deserialization
     from vantage6.common import serialization
    -from vantage6.algorithm.tools.util import info, error
    +from vantage6.algorithm.tools.util import info, error, get_env_var
     from vantage6.algorithm.tools.exceptions import DeserializationException
     
     
    -def wrap_algorithm(module: str, log_traceback: bool = True) -> None:
    +def wrap_algorithm(log_traceback: bool = True) -> None:
         """
         Wrap an algorithm module to provide input and output handling for the
         vantage6 infrastructure.
    @@ -41,10 +41,13 @@ def wrap_algorithm(module: str, log_traceback: bool = True) -> None:
             default False. Algorithm developers should set this to False if
             the error messages may contain sensitive information. By default True.
         """
    +    # get the module name from the environment variable. Note that this env var
    +    # is set in the Dockerfile and is therefore not encoded.
    +    module = os.environ.get("PKG_NAME")
         info(f"wrapper for {module}")
     
         # read input from the mounted input file.
    -    input_file = os.environ["INPUT_FILE"]
    +    input_file = get_env_var("INPUT_FILE")
         info(f"Reading input file {input_file}")
         input_data = load_input(input_file)
     
    @@ -54,7 +57,7 @@ def wrap_algorithm(module: str, log_traceback: bool = True) -> None:
     
         # write output from the method to mounted output file. Which will be
         # transferred back to the server by the node-instance.
    -    output_file = os.environ["OUTPUT_FILE"]
    +    output_file = get_env_var("OUTPUT_FILE")
         info(f"Writing output to {output_file}")
     
         _write_output(output, output_file)
    
  • vantage6-common/vantage6/common/globals.py · +3 -0 · modified
    @@ -36,3 +36,6 @@
     
     # The basics image can be used (mainly by the UI) to collect column names
     BASIC_PROCESSING_IMAGE = 'harbor2.vantage6.ai/algorithms/basics'
    +
    +# Character to replace '=' with in encoded environment variables
    +ENV_VAR_EQUALS_REPLACEMENT = "!"
    
  • vantage6-common/vantage6/common/serialization.py · +2 -1 · modified
    @@ -1,4 +1,5 @@
     import json
    +from vantage6.common.globals import STRING_ENCODING
     
     
     # TODO BvB 2023-02-03: I feel this function could be given a better name. And
    @@ -17,4 +18,4 @@ def serialize(data: any) -> bytes:
         bytes
             A JSON-serialized and then encoded bytes object representing the data
         """
    -    return json.dumps(data).encode()
    +    return json.dumps(data).encode(STRING_ENCODING)
    
  • vantage6-node/vantage6/node/docker/docker_manager.py · +3 -3 · modified
    @@ -485,8 +485,8 @@ def get_result(self) -> Result:
                             self.active_tasks.remove(task)
                             break
                     except AlgorithmContainerNotFound:
    -                    self.log.exception(f'Failed to find container for '
    -                                       f'result {task.result_id}')
    +                    self.log.exception('Failed to find container for '
    +                                       'algorithm with run_id %s', task.run_id)
                         self.failed_tasks.append(task)
                         self.active_tasks.remove(task)
                         break
    @@ -517,7 +517,7 @@ def get_result(self) -> Result:
             else:
                 # at least one task failed to start
                 finished_task = self.failed_tasks.pop()
    -            logs = 'Container failed'
    +            logs = 'Container failed. Check node logs for details'
                 results = b''
     
             return Result(
    
  • vantage6-node/vantage6/node/docker/task_manager.py · +105 -4 · modified
    @@ -4,18 +4,21 @@
     import os
     import docker.errors
     import json
    +import base64
     
     from pathlib import Path
     
    -from vantage6.common.globals import APPNAME
    +from vantage6.common.globals import (
    +    APPNAME, ENV_VAR_EQUALS_REPLACEMENT, STRING_ENCODING
    +)
     from vantage6.common.docker.addons import (
         remove_container_if_exists, remove_container, pull_if_newer,
         running_in_docker
     )
     from vantage6.common.docker.network_manager import NetworkManager
     from vantage6.common.task_status import TaskStatus
     from vantage6.node.util import get_parent_id
    -from vantage6.node.globals import ALPINE_IMAGE
    +from vantage6.node.globals import ALPINE_IMAGE, ENV_VARS_NOT_SETTABLE_BY_NODE
     from vantage6.node.docker.vpn_manager import VPNManager
     from vantage6.node.docker.squid import Squid
     from vantage6.node.docker.docker_base import DockerBaseManager
    @@ -124,7 +127,7 @@ def is_finished(self) -> bool:
             """
             try:
                 self.container.reload()
    -        except docker.errors.NotFound:
    +        except (docker.errors.NotFound, AttributeError):
                 self.log.error("Container not found")
                 self.log.debug(f"- task id: {self.task_id}")
                 self.log.debug(f"- result id: {self.task_id}")
    @@ -530,12 +533,110 @@ def _setup_environment_vars(self, algorithm_env: dict,
                 db_labels.append(label)
             environment_variables['DB_LABELS'] = ','.join(db_labels)
     
    -        self.log.debug(f"environment: {environment_variables}")
     
             # Load additional environment variables
             if algorithm_env:
                 environment_variables = \
                     {**environment_variables, **algorithm_env}
                 self.log.info('Custom environment variables are loaded!')
                 self.log.debug(f"custom environment: {algorithm_env}")
    +
    +        # validate whether environment variables don't contain any illegal
    +        # characters
    +        self._validate_environment_variables(environment_variables)
    +
    +        # print the environment before encoding it so that the user can see
    +        # what is passed to the container
    +        self.log.debug(f"environment: {environment_variables}")
    +
    +        # encode environment variables to prevent special characters from being
    +        # possibly code injection
    +        environment_variables = self._encode_environment_variables(
    +            environment_variables)
    +
             return environment_variables
    +
    +    def _validate_environment_variables(self,
    +                                        environment_variables: dict) -> None:
    +        """
    +        Check whether environment variables don't contain any illegal
    +        characters
    +
    +        Parameters
    +        ----------
    +        environment_variables: dict
    +            Environment variables required to run algorithm
    +
    +        Raises
    +        ------
    +        PermanentAlgorithmStartFail
    +            If environment variables contain illegal characters
    +        """
    +        msg = None
    +        for key in environment_variables:
    +            if not key.isidentifier():
    +                msg = (
    +                    f"Environment variable '{key}' is invalid: environment "
    +                    " variable names should only contain number, letters and "
    +                    " underscores, and start with a letter."
    +                )
    +            elif key in ENV_VARS_NOT_SETTABLE_BY_NODE:
    +                msg = (
    +                    f"Environment variable '{key}' cannot be set: this "
    +                    "variable is set in the algorithm Dockerfile and cannot "
    +                    "be overwritten."
    +                )
    +            if msg:
    +                self.status = TaskStatus.FAILED
    +                self.log.error(msg)
    +                raise PermanentAlgorithmStartFail(msg)
    +
    +    def _encode_environment_variables(self, environment_variables: dict) \
    +            -> dict:
    +        """
    +        Encode environment variable values to ensure that special characters
    +        are not interpretable as code while transferring them to the algorithm
    +        container.
    +
    +        Parameters
    +        ----------
    +        environment_variables: dict
    +            Environment variables required to run algorithm
    +
    +        Returns
    +        -------
    +        dict:
    +            Environment variables with encoded values
    +        """
    +        def _encode(string: str) -> str:
    +            """ Encode env var value
    +
    +            We first encode to bytes, then to b32 and then decode to a string.
    +            Finally, '=' is replaced by less sensitve characters to prevent
    +            issues with interpreting the encoded string in the env var value.
    +
    +            Parameters
    +            ----------
    +            string: str
    +                String to be encoded
    +
    +            Returns
    +            -------
    +            str:
    +                Encoded string
    +
    +            Examples
    +            --------
    +            >>> _encode("abc")
    +            'MFRGG!!!'
    +            """
    +            return base64.b32encode(
    +                string.encode(STRING_ENCODING)
    +            ).decode(STRING_ENCODING).replace('=', ENV_VAR_EQUALS_REPLACEMENT)
    +
    +        self.log.debug("Encoding environment variables")
    +
    +        encoded_environment_variables = {}
    +        for key, val in environment_variables.items():
    +            encoded_environment_variables[key] = _encode(val)
    +        return encoded_environment_variables
    
  • vantage6-node/vantage6/node/globals.py · +4 -0 · modified
    @@ -46,3 +46,7 @@
     #   SQUID RELATED CONSTANTS
     #
     SQUID_IMAGE = "harbor2.vantage6.ai/infrastructure/squid"
    +
    +# Environment variables that should be set in the Dockerfile and that may not
    +# be overwritten by the user.
    +ENV_VARS_NOT_SETTABLE_BY_NODE = ["PKG_NAME"]
    
  • vantage6/vantage6/cli/node/start.py · +8 -4 · modified
    @@ -10,10 +10,7 @@
     
     from colorama import Fore, Style
     
    -from vantage6.common import (
    -    warning, error, info, debug,
    -    get_database_config
    -)
    +from vantage6.common import warning, error, info, debug, get_database_config
     from vantage6.common.globals import (
         APPNAME,
         DEFAULT_DOCKER_REGISTRY,
    @@ -241,6 +238,13 @@ def cli_node_start(name: str, config: str, system_folders: bool, image: str,
         db_labels = [db['label'] for db in ctx.databases]
         for label in db_labels:
     
    +        # check that label contains only valid characters
    +        if not label.isidentifier():
    +            error(f"Database label {Fore.RED}{label}{Style.RESET_ALL} contains"
    +                  " invalid characters. Only letters, numbers, and underscores"
    +                  " are allowed, and it cannot start with a number.")
    +            exit(1)
    +
             db_config = get_database_config(ctx.databases, label)
             uri = db_config['uri']
             db_type = db_config['type']
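
Besides encoding values, the patch above rejects environment variable names that are not valid identifiers and names reserved for the algorithm Dockerfile. The check can be sketched roughly as follows. This is an illustration under stated assumptions: `validate_env_names` and `InvalidEnvironmentVariable` are hypothetical names (the patch raises `PermanentAlgorithmStartFail` from a method on the task manager), and only the `PKG_NAME` entry comes from the diff.

```python
# Mirrors ENV_VARS_NOT_SETTABLE_BY_NODE from the patch to node/globals.py.
PROTECTED_ENV_VARS = ["PKG_NAME"]


class InvalidEnvironmentVariable(ValueError):
    """Hypothetical stand-in for PermanentAlgorithmStartFail."""


def validate_env_names(env: dict) -> None:
    """Raise if any variable name is malformed or reserved."""
    for key in env:
        if not key.isidentifier():
            raise InvalidEnvironmentVariable(
                f"Environment variable '{key}' is not a valid identifier"
            )
        if key in PROTECTED_ENV_VARS:
            raise InvalidEnvironmentVariable(
                f"Environment variable '{key}' is set in the Dockerfile "
                "and cannot be overwritten"
            )


# A well-formed, unreserved name passes silently.
validate_env_names({"INPUT_FILE": "/mnt/data/input"})

# Names with shell metacharacters, a leading digit, or a reserved name fail.
for bad in ("A;B", "1ABC", "PKG_NAME"):
    try:
        validate_env_names({bad: "x"})
    except InvalidEnvironmentVariable as exc:
        print(exc)
```

Note that `str.isidentifier()` follows Python's identifier rules, so it also accepts a leading underscore and non-ASCII letters; a stricter policy would need an explicit pattern instead.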
    
