VYPR
Low severity · NVD Advisory · Published May 14, 2021 · Updated Aug 3, 2024

Null pointer dereference in `StringNGrams`

CVE-2021-29541

Description

TensorFlow is an end-to-end open source platform for machine learning. An attacker can trigger a null pointer dereference in `tf.raw_ops.StringNGrams` because the implementation (https://github.com/tensorflow/tensorflow/blob/1cdd4da14282210cc759e468d9781741ac7d01bf/tensorflow/core/kernels/string_ngrams_op.cc#L67-L74) does not fully validate the `data_splits` argument. As a result, `ngrams_data` (https://github.com/tensorflow/tensorflow/blob/1cdd4da14282210cc759e468d9781741ac7d01bf/tensorflow/core/kernels/string_ngrams_op.cc#L106-L110) is a null pointer whenever the output is computed to have zero or negative size, and later writes to the output tensor dereference that null pointer. The fix will be included in TensorFlow 2.5.0. We will also cherry-pick this commit onto TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3, and TensorFlow 2.1.4, as these versions are also affected and still in the supported range.
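
To make the missing checks concrete, the sketch below restates the stricter `data_splits` rules introduced by the patch (first value 0, non-decreasing values, last value equal to the data size) as a standalone C++ function. `CheckSplits` is a hypothetical helper written for illustration only and is not part of TensorFlow; the real kernel reports these failures through `OP_REQUIRES`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not a TensorFlow API). Returns an empty string when
// `splits` describes a valid partition of `data_size` elements, otherwise a
// short reason for the rejection.
std::string CheckSplits(const std::vector<int64_t>& splits,
                        int64_t data_size) {
  // As in the patch, validation only applies when splits are specified.
  if (splits.empty()) return "";
  if (splits.front() != 0) return "first split must be 0";

  int64_t prev = splits.front();
  for (std::size_t i = 1; i < splits.size(); ++i) {
    // Each split must lie in [prev, data_size], so values never decrease
    // and never point past the end of the data.
    if (splits[i] < prev || splits[i] > data_size) {
      return "split must be in [previous split, data size]";
    }
    prev = splits[i];
  }

  if (prev != data_size) return "last split must equal data size";
  return "";
}
```

For example, `CheckSplits({0, 2, 1}, 2)` returns a non-empty reason because the values decrease, whereas a check that only tests each value against the range `[0, data_size]` would accept that input.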

Affected packages

Versions sourced from the GitHub Security Advisory.

| Package | Ecosystem | Affected versions | Patched versions |
|---|---|---|---|
| tensorflow | PyPI | < 2.1.4 | 2.1.4 |
| tensorflow | PyPI | >= 2.2.0, < 2.2.3 | 2.2.3 |
| tensorflow | PyPI | >= 2.3.0, < 2.3.3 | 2.3.3 |
| tensorflow | PyPI | >= 2.4.0, < 2.4.2 | 2.4.2 |
| tensorflow-cpu | PyPI | < 2.1.4 | 2.1.4 |
| tensorflow-cpu | PyPI | >= 2.2.0, < 2.2.3 | 2.2.3 |
| tensorflow-cpu | PyPI | >= 2.3.0, < 2.3.3 | 2.3.3 |
| tensorflow-cpu | PyPI | >= 2.4.0, < 2.4.2 | 2.4.2 |
| tensorflow-gpu | PyPI | < 2.1.4 | 2.1.4 |
| tensorflow-gpu | PyPI | >= 2.2.0, < 2.2.3 | 2.2.3 |
| tensorflow-gpu | PyPI | >= 2.3.0, < 2.3.3 | 2.3.3 |
| tensorflow-gpu | PyPI | >= 2.4.0, < 2.4.2 | 2.4.2 |

Affected products

1

Patches

1
ba424dd8f16f

Enhance validation of ngram op and handle case of 0 tokens.

https://github.com/tensorflow/tensorflow · Mihai Maruseac · Apr 22, 2021 · via ghsa
2 files changed · +75 -11
  • tensorflow/core/kernels/string_ngrams_op.cc +41 -11 modified
    @@ -61,16 +61,28 @@ class StringNGramsOp : public tensorflow::OpKernel {
         OP_REQUIRES_OK(context, context->input("data_splits", &splits));
         const auto& splits_vec = splits->flat<SPLITS_TYPE>();
     
    -    // Validate that the splits are valid indices into data
    +    // Validate that the splits are valid indices into data, only if there are
    +    // splits specified.
         const int input_data_size = data->flat<tstring>().size();
         const int splits_vec_size = splits_vec.size();
    -    for (int i = 0; i < splits_vec_size; ++i) {
    -      bool valid_splits = splits_vec(i) >= 0;
    -      valid_splits = valid_splits && (splits_vec(i) <= input_data_size);
    -      OP_REQUIRES(
    -          context, valid_splits,
    -          errors::InvalidArgument("Invalid split value ", splits_vec(i),
    -                                  ", must be in [0,", input_data_size, "]"));
    +    if (splits_vec_size > 0) {
    +      int prev_split = splits_vec(0);
    +      OP_REQUIRES(context, prev_split == 0,
    +                  errors::InvalidArgument("First split value must be 0, got ",
    +                                          prev_split));
    +      for (int i = 1; i < splits_vec_size; ++i) {
    +        bool valid_splits = splits_vec(i) >= prev_split;
    +        valid_splits = valid_splits && (splits_vec(i) <= input_data_size);
    +        OP_REQUIRES(context, valid_splits,
    +                    errors::InvalidArgument(
    +                        "Invalid split value ", splits_vec(i), ", must be in [",
    +                        prev_split, ", ", input_data_size, "]"));
    +        prev_split = splits_vec(i);
    +      }
    +      OP_REQUIRES(context, prev_split == input_data_size,
    +                  errors::InvalidArgument(
    +                      "Last split value must be data size. Expected ",
    +                      input_data_size, ", got ", prev_split));
         }
     
         int num_batch_items = splits_vec.size() - 1;
    @@ -174,13 +186,31 @@ class StringNGramsOp : public tensorflow::OpKernel {
             ngram->append(left_pad_);
             ngram->append(separator_);
           }
    +      // Only output first num_tokens - 1 pairs of data and separator
           for (int n = 0; n < num_tokens - 1; ++n) {
             ngram->append(data[data_start_index + n]);
             ngram->append(separator_);
           }
    -      ngram->append(data[data_start_index + num_tokens - 1]);
    -      for (int n = 0; n < right_padding; ++n) {
    -        ngram->append(separator_);
    +      // Handle case when there are no tokens or no right padding as these can
    +      // result in consecutive separators.
    +      if (num_tokens > 0) {
    +        // If we have tokens, then output last and then pair each separator with
    +        // the right padding that follows, to ensure ngram ends either with the
    +        // token or with the right pad.
    +        ngram->append(data[data_start_index + num_tokens - 1]);
    +        for (int n = 0; n < right_padding; ++n) {
    +          ngram->append(separator_);
    +          ngram->append(right_pad_);
    +        }
    +      } else {
    +        // If we don't have tokens, then the last item inserted into the ngram
    +        // has been the separator from the left padding loop above. Hence,
    +        // output right pad and separator and make sure to finish with a
    +        // padding, not a separator.
    +        for (int n = 0; n < right_padding - 1; ++n) {
    +          ngram->append(right_pad_);
    +          ngram->append(separator_);
    +        }
             ngram->append(right_pad_);
           }
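
The second hunk changes how a single n-gram is assembled when a window contains no tokens. A minimal standalone sketch of that assembly logic follows; `BuildNgram` and its parameter list are hypothetical and chosen for illustration, and the code models the patched loop rather than reproducing the kernel's actual method.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper modeled on the patched loop (not TensorFlow's API).
// Builds one n-gram from `left_padding` left pads, `num_tokens` tokens taken
// from `data` starting at `start`, and `right_padding` right pads.
std::string BuildNgram(const std::vector<std::string>& data, int start,
                       int num_tokens, int left_padding, int right_padding,
                       const std::string& sep, const std::string& left_pad,
                       const std::string& right_pad) {
  std::string ngram;
  for (int n = 0; n < left_padding; ++n) {
    ngram += left_pad;
    ngram += sep;
  }
  // Only the first num_tokens - 1 tokens are followed by a separator.
  for (int n = 0; n < num_tokens - 1; ++n) {
    ngram += data[start + n];
    ngram += sep;
  }
  if (num_tokens > 0) {
    // Emit the last token, then pair each separator with the right pad that
    // follows it, so the n-gram ends with a token or a pad.
    ngram += data[start + num_tokens - 1];
    for (int n = 0; n < right_padding; ++n) {
      ngram += sep;
      ngram += right_pad;
    }
  } else {
    // No tokens: the string currently ends with a separator from the left
    // padding loop, so emit pad-then-separator pairs and finish with a pad.
    // `data` is never indexed on this path.
    for (int n = 0; n < right_padding - 1; ++n) {
      ngram += right_pad;
      ngram += sep;
    }
    ngram += right_pad;
  }
  return ngram;
}
```

Called with no tokens, two left pads, and one right pad, for instance `BuildNgram({}, 0, 0, 2, 1, "|", "L", "R")`, the sketch yields `L|L|R`, which is the first expected value in `TestNoTokens`.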
     
    
  • tensorflow/core/kernels/string_ngrams_op_test.cc +34 -0 modified
    @@ -542,6 +542,40 @@ TEST_F(NgramKernelTest, TestEmptyInput) {
       assert_int64_equal(expected_splits, *GetOutput(1));
     }
     
    +TEST_F(NgramKernelTest, TestNoTokens) {
    +  MakeOp("|", {3}, "L", "R", -1, false);
    +  // Batch items are:
    +  // 0:
    +  // 1: "a"
    +  AddInputFromArray<tstring>(TensorShape({1}), {"a"});
    +  AddInputFromArray<int64>(TensorShape({3}), {0, 0, 1});
    +  TF_ASSERT_OK(RunOpKernel());
    +
    +  std::vector<tstring> expected_values(
    +      {"L|L|R", "L|R|R",             // no input in first split
    +       "L|L|a", "L|a|R", "a|R|R"});  // second split
    +  std::vector<int64> expected_splits({0, 2, 5});
    +
    +  assert_string_equal(expected_values, *GetOutput(0));
    +  assert_int64_equal(expected_splits, *GetOutput(1));
    +}
    +
    +TEST_F(NgramKernelTest, TestNoTokensNoPad) {
    +  MakeOp("|", {3}, "", "", 0, false);
    +  // Batch items are:
    +  // 0:
    +  // 1: "a"
    +  AddInputFromArray<tstring>(TensorShape({1}), {"a"});
    +  AddInputFromArray<int64>(TensorShape({3}), {0, 0, 1});
    +  TF_ASSERT_OK(RunOpKernel());
    +
    +  std::vector<tstring> expected_values({});
    +  std::vector<int64> expected_splits({0, 0, 0});
    +
    +  assert_string_equal(expected_values, *GetOutput(0));
    +  assert_int64_equal(expected_splits, *GetOutput(1));
    +}
    +
     TEST_F(NgramKernelTest, ShapeFn) {
       ShapeInferenceTestOp op("StringNGrams");
       INFER_OK(op, "?;?", "[?];[?]");
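
The expected splits in the two new tests follow from how many n-grams each batch item yields. The sketch below shows that arithmetic under two assumptions: a negative pad width means padding with `ngram_width - 1` elements on each side, and short sequences are not preserved. `EffectivePadWidth`, `NumNgrams`, and `OutputSplits` are hypothetical names for illustration, not the kernel's functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: a negative pad_width requests the maximum useful padding,
// and padding is never wider than ngram_width - 1.
int EffectivePadWidth(int ngram_width, int pad_width) {
  int requested = pad_width < 0 ? ngram_width - 1 : pad_width;
  return std::min(requested, ngram_width - 1);
}

// Number of n-grams of width `ngram_width` in a sequence of `length` tokens
// padded on both sides. Clamped at 0 so short sequences yield nothing.
int NumNgrams(int length, int ngram_width, int pad_width) {
  int padded = length + 2 * EffectivePadWidth(ngram_width, pad_width);
  return std::max(0, padded - ngram_width + 1);
}

// Running totals of n-grams per batch item, in the same row-splits form as
// the op's second output. Returning an empty vector for empty input is a
// choice made for this sketch only.
std::vector<int64_t> OutputSplits(const std::vector<int64_t>& data_splits,
                                  int ngram_width, int pad_width) {
  std::vector<int64_t> out;
  if (data_splits.empty()) return out;
  out.push_back(0);
  for (std::size_t i = 1; i < data_splits.size(); ++i) {
    int length = static_cast<int>(data_splits[i] - data_splits[i - 1]);
    out.push_back(out.back() + NumNgrams(length, ngram_width, pad_width));
  }
  return out;
}
```

With `data_splits = {0, 0, 1}` and width 3, this gives `{0, 2, 5}` for `pad_width = -1` and `{0, 0, 0}` for `pad_width = 0`, matching `TestNoTokens` and `TestNoTokensNoPad`. The second case is a zero-size output, the situation in which the description says `ngrams_data` is a null pointer.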
    
