VYPR
Moderate severity · NVD Advisory · Published May 14, 2021 · Updated Aug 3, 2024

Interpreter crash from `tf.io.decode_raw`

CVE-2021-29614

Description

TensorFlow is an end-to-end open source platform for machine learning. The implementation of `tf.io.decode_raw` produces incorrect results and crashes the Python interpreter when `fixed_length` is combined with a datatype wider than one byte.

The implementation of the padded version (https://github.com/tensorflow/tensorflow/blob/1d8903e5b167ed0432077a3db6e462daf781d1fe/tensorflow/core/kernels/decode_padded_raw_op.cc) is buggy due to a confusion about pointer arithmetic rules. First, the code computes (https://github.com/tensorflow/tensorflow/blob/1d8903e5b167ed0432077a3db6e462daf781d1fe/tensorflow/core/kernels/decode_padded_raw_op.cc#L61) the width of each output element by dividing the `fixed_length` value by the size of the type argument. The `fixed_length` argument is also used to determine the size needed for the output tensor (https://github.com/tensorflow/tensorflow/blob/1d8903e5b167ed0432077a3db6e462daf781d1fe/tensorflow/core/kernels/decode_padded_raw_op.cc#L63-L79). This is followed by the reencoding code (https://github.com/tensorflow/tensorflow/blob/1d8903e5b167ed0432077a3db6e462daf781d1fe/tensorflow/core/kernels/decode_padded_raw_op.cc#L85-L94).

The erroneous code is the last line of that block, `out_data += fixed_length`. Because `out_data` is a `T*`, this moves the pointer by `fixed_length * sizeof(T)` bytes, whereas at most `fixed_length` bytes were copied from the input. As a result, parts of the input are not decoded into the output. Furthermore, because the pointer advances far more than intended, the code quickly writes outside the bounds of the backing data. This OOB write crashes the interpreter in the advisory's reproducer, but more severe attacks can be mounted too, given that this gadget allows writing to periodically placed locations in memory.

The fix will be included in TensorFlow 2.5.0. We will also cherry-pick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
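
The pointer arithmetic described above can be modelled without TensorFlow. The following is a minimal sketch, not the kernel itself: the function names and the example parameters (four 1-byte inputs, a 2-byte output type, `fixed_length=4`) are assumptions chosen for illustration. It computes the byte offset at which each input's data would be written, before and after the fix, and reports the first write that falls outside the output buffer.

```python
def output_offsets(num_inputs, fixed_length, type_size, buggy):
    """Byte offset where the kernel starts writing each input's data."""
    width = fixed_length // type_size  # output elements per input string
    # out_data is a T*, so `out_data += n` advances n * sizeof(T) bytes.
    # The pre-fix code advanced by fixed_length elements instead of width.
    step_elements = fixed_length if buggy else width
    return [i * step_elements * type_size for i in range(num_inputs)]


def first_oob_write(num_inputs, fixed_length, type_size, input_len, buggy):
    """Index of the first input whose copy lands outside the output buffer."""
    buffer_bytes = num_inputs * fixed_length  # size of the output tensor
    copied = min(input_len, fixed_length)     # bytes memcpy'd per input
    offsets = output_offsets(num_inputs, fixed_length, type_size, buggy)
    for i, offset in enumerate(offsets):
        if offset + copied > buffer_bytes:
            return i
    return None


# Hypothetical parameters: four 1-byte strings, 2-byte type, fixed_length=4.
print(output_offsets(4, 4, 2, buggy=True))                   # [0, 8, 16, 24]
print(output_offsets(4, 4, 2, buggy=False))                  # [0, 4, 8, 12]
print(first_oob_write(4, 4, 2, input_len=1, buggy=True))     # 2
print(first_oob_write(4, 4, 2, input_len=1, buggy=False))    # None
```

In this model the buggy stride is `sizeof(T)` times too large, so the third input is already written past the end of the 16-byte buffer, while the corrected stride keeps every write in bounds.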

Affected packages

Versions sourced from the GitHub Security Advisory.

Package                 Affected versions     Patched versions
tensorflow (PyPI)       < 2.1.4               2.1.4
tensorflow (PyPI)       >= 2.2.0, < 2.2.3     2.2.3
tensorflow (PyPI)       >= 2.3.0, < 2.3.3     2.3.3
tensorflow (PyPI)       >= 2.4.0, < 2.4.2     2.4.2
tensorflow-cpu (PyPI)   < 2.1.4               2.1.4
tensorflow-cpu (PyPI)   >= 2.2.0, < 2.2.3     2.2.3
tensorflow-cpu (PyPI)   >= 2.3.0, < 2.3.3     2.3.3
tensorflow-cpu (PyPI)   >= 2.4.0, < 2.4.2     2.4.2
tensorflow-gpu (PyPI)   < 2.1.4               2.1.4
tensorflow-gpu (PyPI)   >= 2.2.0, < 2.2.3     2.2.3
tensorflow-gpu (PyPI)   >= 2.3.0, < 2.3.3     2.3.3
tensorflow-gpu (PyPI)   >= 2.4.0, < 2.4.2     2.4.2

Patches

1
698e01511f62

Fix `tf.io.decode_raw` bugs and update documentation.

https://github.com/tensorflow/tensorflow · Mihai Maruseac · Apr 30, 2021 · via ghsa
2 files changed · +43 -27
  • tensorflow/core/kernels/decode_padded_raw_op.cc +12 -9 modified
    @@ -19,6 +19,7 @@ limitations under the License.
     #include "tensorflow/core/framework/common_shape_fns.h"
     #include "tensorflow/core/framework/op.h"
     #include "tensorflow/core/framework/op_kernel.h"
    +#include "tensorflow/core/framework/op_requires.h"
     #include "tensorflow/core/framework/shape_inference.h"
     
     namespace tensorflow {
    @@ -83,14 +84,13 @@ class DecodePaddedRawOp : public OpKernel {
         // can copy the memory directly.
         if (!convert_data_endianness_ || sizeof(T) == 1) {
           for (int64 i = 0; i < flat_in.size(); ++i) {
    -        const T* in_data = reinterpret_cast<const T*>(flat_in(i).data());
    -
    -        if (flat_in(i).size() > fixed_length) {
    -          memcpy(out_data, in_data, fixed_length);
    -        } else {
    -          memcpy(out_data, in_data, flat_in(i).size());
    -        }
    -        out_data += fixed_length;
    +        const auto to_copy =
    +            std::min(flat_in(i).size(), static_cast<size_t>(fixed_length));
    +        memcpy(out_data, flat_in(i).data(), to_copy);
    +        // Note: increase out_data by width since it's already of type T* so
    +        // each shift amount is implicitly multiplied by sizeof(T) according to
    +        // pointer arithmetic rules.
    +        out_data += width;
           }
         } else {
           // Otherwise, the data is not in the host's byte order, and rather than a
    @@ -105,7 +105,10 @@ class DecodePaddedRawOp : public OpKernel {
                  p_in += sizeof(T), p_out += sizeof(T)) {
               std::reverse_copy(p_in, p_in + sizeof(T), p_out);
             }
    -        out_data += fixed_length;
    +        // Note: increase out_data by width since it's already of type T* so
    +        // each shift amount is implicitly multiplied by sizeof(T) according to
    +        // pointer arithmetic rules.
    +        out_data += width;
           }
         }
       }
    
  • tensorflow/python/ops/parsing_ops.py +31 -18 modified
    @@ -850,8 +850,8 @@ def decode_raw(input_bytes,
                    name=None):
       r"""Convert raw bytes from input tensor into numeric tensors.
     
    -  The input tensor is interpreted as a sequence of bytes. These bytes are then
    -  decoded as numbers in the format specified by `out_type`.
    +  Every component of the input tensor is interpreted as a sequence of bytes.
    +  These bytes are then decoded as numbers in the format specified by `out_type`.
     
       >>> tf.io.decode_raw(tf.constant("1"), tf.uint8)
       <tf.Tensor: shape=(1,), dtype=uint8, numpy=array([49], dtype=uint8)>
    @@ -909,22 +909,35 @@ def decode_raw(input_bytes,
       >>> tf.io.decode_raw(tf.constant(["1212"]), tf.uint16, fixed_length=4)
       <tf.Tensor: shape=(1, 2), dtype=uint16, numpy=array([[12849, 12849]], ...
     
    -  Note: There is currently a bug in `fixed_length` that can result in data loss:
    -
    -  >>> # truncated to length of type as it matches fixed_length
    -  >>> tf.io.decode_raw(tf.constant(["1212"]), tf.uint16, fixed_length=2)
    -  <tf.Tensor: shape=(1, 1), dtype=uint16, numpy=array([[12849]], dtype=uint16)>
    -  >>> # ignores the second component
    -  >>> tf.io.decode_raw(tf.constant(["12","34"]), tf.uint16, fixed_length=2)
    -  <tf.Tensor: shape=(2, 1), dtype=uint16, numpy=
    -  array([[12849],
    -         [    0]], dtype=uint16)>
    -  >>> tf.io.decode_raw(tf.constant(["12","34"]), tf.uint16, fixed_length=4)
    -  <tf.Tensor: shape=(2, 2), dtype=uint16, numpy=
    -  array([[12849,     0],
    -         [    0,     0]], dtype=uint16)>
    -
    -  This will be fixed on a future release of TensorFlow.
    +  If the input value is larger than `fixed_length`, it is truncated:
    +
    +  >>> x=''.join([chr(1), chr(2), chr(3), chr(4)])
    +  >>> tf.io.decode_raw(x, tf.uint16, fixed_length=2)
    +  <tf.Tensor: shape=(1,), dtype=uint16, numpy=array([513], dtype=uint16)>
    +  >>> hex(513)
    +  '0x201'
    +
    +  If `little_endian` and `fixed_length` are specified, truncation to the fixed
    +  length occurs before endianness conversion:
    +
    +  >>> x=''.join([chr(1), chr(2), chr(3), chr(4)])
    +  >>> tf.io.decode_raw(x, tf.uint16, fixed_length=2, little_endian=False)
    +  <tf.Tensor: shape=(1,), dtype=uint16, numpy=array([258], dtype=uint16)>
    +  >>> hex(258)
    +  '0x102'
    +
    +  If input values all have the same length, then specifying `fixed_length`
    +  equal to the size of the strings should not change output:
    +
    +  >>> x = ["12345678", "87654321"]
    +  >>> tf.io.decode_raw(x, tf.int16)
    +  <tf.Tensor: shape=(2, 4), dtype=int16, numpy=
    +  array([[12849, 13363, 13877, 14391],
    +         [14136, 13622, 13108, 12594]], dtype=int16)>
    +  >>> tf.io.decode_raw(x, tf.int16, fixed_length=len(x[0]))
    +  <tf.Tensor: shape=(2, 4), dtype=int16, numpy=
    +  array([[12849, 13363, 13877, 14391],
    +         [14136, 13622, 13108, 12594]], dtype=int16)>
     
       Args:
         input_bytes:
    

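The values in the updated docstring examples can be checked by hand. The sketch below is a pure-Python stand-in for the documented `fixed_length` behaviour on a single input string, using the standard `struct` module. The helper name `decode_fixed` is hypothetical, and zero-padding of inputs shorter than `fixed_length` is an assumption of this sketch rather than something shown in the diff.

```python
import struct


def decode_fixed(raw, fmt, fixed_length, little_endian=True):
    """Decode one byte string the way the fixed docstring describes.

    fmt is a single struct type code, e.g. "H" (uint16) or "h" (int16).
    """
    size = struct.calcsize("<" + fmt)  # standard size of the type in bytes
    if fixed_length % size != 0:
        raise ValueError("fixed_length must be a multiple of the type size")
    # Truncate to fixed_length first; zero-padding short inputs is an
    # assumption of this sketch.
    data = raw[:fixed_length].ljust(fixed_length, b"\x00")
    order = "<" if little_endian else ">"
    count = fixed_length // size       # number of decoded elements
    return list(struct.unpack(order + fmt * count, data))


x = bytes([1, 2, 3, 4])
print(decode_fixed(x, "H", 2))                       # [513], i.e. 0x0201
print(decode_fixed(x, "H", 2, little_endian=False))  # [258], i.e. 0x0102
print(decode_fixed(b"12345678", "h", 8))  # [12849, 13363, 13877, 14391]
print(decode_fixed(b"87654321", "h", 8))  # [14136, 13622, 13108, 12594]
```

Truncation happens before the byte order is applied, which is why both calls on `x` only ever see the bytes `01 02`.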