Null pointer dereference in `StringNGrams`
Description
TensorFlow is an end-to-end open source platform for machine learning. An attacker can trigger a dereference of a null pointer in tf.raw_ops.StringNGrams. This is because the implementation(https://github.com/tensorflow/tensorflow/blob/1cdd4da14282210cc759e468d9781741ac7d01bf/tensorflow/core/kernels/string_ngrams_op.cc#L67-L74) does not fully validate the data_splits argument. This would result in ngrams_data(https://github.com/tensorflow/tensorflow/blob/1cdd4da14282210cc759e468d9781741ac7d01bf/tensorflow/core/kernels/string_ngrams_op.cc#L106-L110) to be a null pointer when the output would be computed to have 0 or negative size. Later writes to the output tensor would then cause a null pointer dereference. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
Affected packages
Versions sourced from the GitHub Security Advisory.
| Package | Affected versions | Patched versions |
|---|---|---|
tensorflowPyPI | < 2.1.4 | 2.1.4 |
tensorflowPyPI | >= 2.2.0, < 2.2.3 | 2.2.3 |
tensorflowPyPI | >= 2.3.0, < 2.3.3 | 2.3.3 |
tensorflowPyPI | >= 2.4.0, < 2.4.2 | 2.4.2 |
tensorflow-cpuPyPI | < 2.1.4 | 2.1.4 |
tensorflow-cpuPyPI | >= 2.2.0, < 2.2.3 | 2.2.3 |
tensorflow-cpuPyPI | >= 2.3.0, < 2.3.3 | 2.3.3 |
tensorflow-cpuPyPI | >= 2.4.0, < 2.4.2 | 2.4.2 |
tensorflow-gpuPyPI | < 2.1.4 | 2.1.4 |
tensorflow-gpuPyPI | >= 2.2.0, < 2.2.3 | 2.2.3 |
tensorflow-gpuPyPI | >= 2.3.0, < 2.3.3 | 2.3.3 |
tensorflow-gpuPyPI | >= 2.4.0, < 2.4.2 | 2.4.2 |
Affected products
1- Range: < 2.1.4
Patches
1ba424dd8f16fEnhance validation of ngram op and handle case of 0 tokens.
2 files changed · +75 −11
tensorflow/core/kernels/string_ngrams_op.cc+41 −11 modified@@ -61,16 +61,28 @@ class StringNGramsOp : public tensorflow::OpKernel { OP_REQUIRES_OK(context, context->input("data_splits", &splits)); const auto& splits_vec = splits->flat<SPLITS_TYPE>(); - // Validate that the splits are valid indices into data + // Validate that the splits are valid indices into data, only if there are + // splits specified. const int input_data_size = data->flat<tstring>().size(); const int splits_vec_size = splits_vec.size(); - for (int i = 0; i < splits_vec_size; ++i) { - bool valid_splits = splits_vec(i) >= 0; - valid_splits = valid_splits && (splits_vec(i) <= input_data_size); - OP_REQUIRES( - context, valid_splits, - errors::InvalidArgument("Invalid split value ", splits_vec(i), - ", must be in [0,", input_data_size, "]")); + if (splits_vec_size > 0) { + int prev_split = splits_vec(0); + OP_REQUIRES(context, prev_split == 0, + errors::InvalidArgument("First split value must be 0, got ", + prev_split)); + for (int i = 1; i < splits_vec_size; ++i) { + bool valid_splits = splits_vec(i) >= prev_split; + valid_splits = valid_splits && (splits_vec(i) <= input_data_size); + OP_REQUIRES(context, valid_splits, + errors::InvalidArgument( + "Invalid split value ", splits_vec(i), ", must be in [", + prev_split, ", ", input_data_size, "]")); + prev_split = splits_vec(i); + } + OP_REQUIRES(context, prev_split == input_data_size, + errors::InvalidArgument( + "Last split value must be data size. Expected ", + input_data_size, ", got ", prev_split)); } int num_batch_items = splits_vec.size() - 1; @@ -174,13 +186,31 @@ class StringNGramsOp : public tensorflow::OpKernel { ngram->append(left_pad_); ngram->append(separator_); } + // Only output first num_tokens - 1 pairs of data and separator for (int n = 0; n < num_tokens - 1; ++n) { ngram->append(data[data_start_index + n]); ngram->append(separator_); } - ngram->append(data[data_start_index + num_tokens - 1]); - for (int n = 0; n < right_padding; ++n) { - ngram->append(separator_); + // Handle case when there are no tokens or no right padding as these can + // result in consecutive separators. + if (num_tokens > 0) { + // If we have tokens, then output last and then pair each separator with + // the right padding that follows, to ensure ngram ends either with the + // token or with the right pad. + ngram->append(data[data_start_index + num_tokens - 1]); + for (int n = 0; n < right_padding; ++n) { + ngram->append(separator_); + ngram->append(right_pad_); + } + } else { + // If we don't have tokens, then the last item inserted into the ngram + // has been the separator from the left padding loop above. Hence, + // output right pad and separator and make sure to finish with a + // padding, not a separator. + for (int n = 0; n < right_padding - 1; ++n) { + ngram->append(right_pad_); + ngram->append(separator_); + } ngram->append(right_pad_); }
tensorflow/core/kernels/string_ngrams_op_test.cc+34 −0 modified@@ -542,6 +542,40 @@ TEST_F(NgramKernelTest, TestEmptyInput) { assert_int64_equal(expected_splits, *GetOutput(1)); } +TEST_F(NgramKernelTest, TestNoTokens) { + MakeOp("|", {3}, "L", "R", -1, false); + // Batch items are: + // 0: + // 1: "a" + AddInputFromArray<tstring>(TensorShape({1}), {"a"}); + AddInputFromArray<int64>(TensorShape({3}), {0, 0, 1}); + TF_ASSERT_OK(RunOpKernel()); + + std::vector<tstring> expected_values( + {"L|L|R", "L|R|R", // no input in first split + "L|L|a", "L|a|R", "a|R|R"}); // second split + std::vector<int64> expected_splits({0, 2, 5}); + + assert_string_equal(expected_values, *GetOutput(0)); + assert_int64_equal(expected_splits, *GetOutput(1)); +} + +TEST_F(NgramKernelTest, TestNoTokensNoPad) { + MakeOp("|", {3}, "", "", 0, false); + // Batch items are: + // 0: + // 1: "a" + AddInputFromArray<tstring>(TensorShape({1}), {"a"}); + AddInputFromArray<int64>(TensorShape({3}), {0, 0, 1}); + TF_ASSERT_OK(RunOpKernel()); + + std::vector<tstring> expected_values({}); + std::vector<int64> expected_splits({0, 0, 0}); + + assert_string_equal(expected_values, *GetOutput(0)); + assert_int64_equal(expected_splits, *GetOutput(1)); +} + TEST_F(NgramKernelTest, ShapeFn) { ShapeInferenceTestOp op("StringNGrams"); INFER_OK(op, "?;?", "[?];[?]");
Vulnerability mechanics
Generated by null/stub on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References
7- github.com/advisories/GHSA-xqfj-35wv-m3crghsaADVISORY
- nvd.nist.gov/vuln/detail/CVE-2021-29541ghsaADVISORY
- github.com/pypa/advisory-database/tree/main/vulns/tensorflow-cpu/PYSEC-2021-469.yamlghsaWEB
- github.com/pypa/advisory-database/tree/main/vulns/tensorflow-gpu/PYSEC-2021-667.yamlghsaWEB
- github.com/pypa/advisory-database/tree/main/vulns/tensorflow/PYSEC-2021-178.yamlghsaWEB
- github.com/tensorflow/tensorflow/commit/ba424dd8f16f7110eea526a8086f1a155f14f22bghsax_refsource_MISCWEB
- github.com/tensorflow/tensorflow/security/advisories/GHSA-xqfj-35wv-m3crghsax_refsource_CONFIRMWEB
News mentions
0No linked articles in our index yet.