VYPR
High severity · 7.5 · NVD Advisory · Published Mar 9, 2026 · Updated Apr 17, 2026

CVE-2026-0846

Description

A vulnerability in the filestring() function of the nltk.util module in nltk version 3.9.2 allows arbitrary file read due to improper validation of input paths. The function directly opens files specified by user input without sanitization, enabling attackers to access sensitive system files by providing absolute paths or traversal paths. This vulnerability can be exploited locally or remotely, particularly in scenarios where the function is used in web APIs or other interfaces that accept user-supplied input.
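
The exposure described above can be sketched without nltk itself. In the sketch below, `legacy_filestring` is a stand-in that mirrors the pre-patch logic shown in the fix diff, and `handle_request` is a hypothetical web-style handler invented for illustration; neither name comes from nltk. It assumes the application joins user input onto a base directory and does nothing else to validate it.

```python
import os
import tempfile


def legacy_filestring(f):
    """Stand-in mirroring the pre-patch logic; not the nltk function itself."""
    if hasattr(f, "read"):
        return f.read()
    elif isinstance(f, str):
        # The path is opened exactly as given, with no validation.
        with open(f) as infile:
            return infile.read()
    else:
        raise ValueError("Must be called with a filename or file-like object")


def handle_request(user_supplied_name, docs_dir):
    """Hypothetical handler: joins user input onto a base directory unchecked."""
    return legacy_filestring(os.path.join(docs_dir, user_supplied_name))


with tempfile.TemporaryDirectory() as root:
    docs = os.path.join(root, "docs")
    os.mkdir(docs)

    # A file the handler was never meant to serve.
    secret = os.path.join(root, "secret.txt")
    with open(secret, "w") as fh:
        fh.write("topsecret")

    # Traversal path: "docs/../secret.txt" resolves outside docs_dir.
    leaked_via_traversal = handle_request("../secret.txt", docs)

    # Absolute path: os.path.join discards docs_dir when the second
    # argument is absolute, so the base directory offers no protection.
    leaked_via_absolute = handle_request(secret, docs)
```

Both calls return the contents of `secret.txt`, which is the arbitrary-read behaviour the description refers to.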

Affected packages

Versions sourced from the GitHub Security Advisory.

Package      Affected versions  Patched versions
nltk (PyPI)  < 3.9.3            3.9.3

Affected products

  • cpe:2.3:a:nltk:nltk:3.9.2:*:*:*:*:*:*:*

Patches

b2e1164bf892

Merge pull request #3485 from HyperPS/fix-filestring-sandbox-update

https://github.com/nltk/nltk · Steven Bird · Jan 9, 2026 · via ghsa
2 files changed · +125 −5
  • nltk/test/test_filestring_sandbox.py · +72 −0 · added
    @@ -0,0 +1,72 @@
    +import io
    +import os
    +
    +import pytest
    +
    +from nltk.util import filestring
    +
    +
    +def test_reads_allowed_file(tmp_path):
    +    """filestring should read files inside allowed_dir"""
    +    allowed_dir = tmp_path / "allowed"
    +    allowed_dir.mkdir()
    +
    +    f = allowed_dir / "example.txt"
    +    f.write_text("hello world")
    +
    +    output = filestring(str(f), allowed_dir=str(allowed_dir))
    +    assert output == "hello world"
    +
    +
    +def test_rejects_parent_traversal(tmp_path):
    +    """filestring should block ../ traversal attempts"""
    +    allowed = tmp_path / "allowed"
    +    allowed.mkdir()
    +
    +    secret = tmp_path / "secret.txt"
    +    secret.write_text("topsecret")
    +
    +    # simulate ../ traversal
    +    traversal_path = str(allowed / ".." / "secret.txt")
    +
    +    with pytest.raises(PermissionError):
    +        filestring(traversal_path, allowed_dir=str(allowed))
    +
    +
    +def test_rejects_symlink_escape(tmp_path):
    +    """filestring should block symlink pointing outside allowed_dir"""
    +    allowed = tmp_path / "allowed"
    +    allowed.mkdir()
    +
    +    outside = tmp_path / "outside.txt"
    +    outside.write_text("hidden-data")
    +
    +    link = allowed / "link.txt"
    +
    +    # On Windows, symlink creation may require admin — skip cleanly if not allowed
    +    try:
    +        link.symlink_to(outside)
    +    except (OSError, NotImplementedError):
    +        pytest.skip("Symlink creation not supported on this platform")
    +
    +    with pytest.raises(PermissionError):
    +        filestring(str(link), allowed_dir=str(allowed))
    +
    +
    +def test_preserves_file_like_objects():
    +    """filestring should maintain legacy behavior for stream-like objects"""
    +    stream = io.StringIO("stream-data")
    +    assert filestring(stream) == "stream-data"
    +
    +
    +def test_encoding_fallback(tmp_path):
    +    """filestring should tolerate decoding errors when reading files"""
    +    allowed = tmp_path / "allowed"
    +    allowed.mkdir()
    +
    +    f = allowed / "latin1.txt"
    +    f.write_bytes(b"caf\xe9")  # invalid UTF-8 sequence
    +
    +    output = filestring(str(f), allowed_dir=str(allowed))
    +    assert isinstance(output, str)
    +    assert "caf" in output  # partial decode allowed via errors="ignore"
    
  • nltk/util.py · +53 −5 · modified
    @@ -216,14 +216,62 @@ def re_show(regexp, string, left="{", right="}"):
     
     
     # recipe from David Mertz
    -def filestring(f):
    +def filestring(f, allowed_dir=None):
    +    """
    +    Read a file path or file-like object into a string.
    +
    +    Security (opt-in):
    +    - If `allowed_dir` is provided, enforce sandbox restrictions:
    +        * Resolve realpath()
    +        * Prevent ../ traversal
    +        * Prevent symlink escape
    +    - If `allowed_dir` is None, old behavior is preserved (for backward compatibility).
    +
    +    Notes:
    +    - File-like objects (`.read()`) are always allowed.
    +    - TOCTOU race conditions cannot be fully eliminated if an attacker can modify
    +      the filesystem concurrently, though realpath() and commonpath() reduce common bypasses.
    +    """
    +
    +    # file-like object: preserve legacy behavior
         if hasattr(f, "read"):
             return f.read()
    -    elif isinstance(f, str):
    -        with open(f) as infile:
    +
    +    # path input
    +    if isinstance(f, str):
    +        # sandbox mode enabled only when allowed_dir provided
    +        if allowed_dir is not None:
    +            base = os.path.realpath(os.path.abspath(allowed_dir))
    +
    +            # ensure allowed_dir exists and is a directory
    +            if not os.path.isdir(base):
    +                raise ValueError(
    +                    f"allowed_dir must be an existing directory: {allowed_dir!r}"
    +                )
    +
    +            full = os.path.realpath(os.path.abspath(f))
    +
    +            # robust "is inside" check using commonpath; handle cross-drive case
    +            try:
    +                inside = os.path.commonpath([base, full]) == base
    +            except ValueError:
    +                # different drives (Windows) -> not inside
    +                inside = False
    +
    +            if not inside:
    +                raise PermissionError(
    +                    f"Access blocked: '{full}' is outside allowed_dir '{base}'"
    +                )
    +
    +            # safe read with UTF-8-first fallback
    +            with open(full, encoding="utf-8", errors="ignore") as infile:
    +                return infile.read()
    +
    +        # no sandbox: legacy behavior (backward compatible)
    +        with open(f, encoding="utf-8", errors="ignore") as infile:
                 return infile.read()
    -    else:
    -        raise ValueError("Must be called with a filename or file-like object")
    +
    +    raise ValueError("filestring() expects a filename or a file-like object")
     
     
     ##########################################################################
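
The containment check in the diff above can be sketched on its own. The helper name `is_inside` and the directory names are hypothetical; the logic follows the patch's `realpath()` plus `commonpath()` approach, and the example contrasts it with a naive string-prefix test, which the patch does not use.

```python
import os
import tempfile


def is_inside(base_dir, candidate):
    """Return True if candidate resolves to a path within base_dir.

    Hypothetical helper following the patch's realpath + commonpath check.
    """
    base = os.path.realpath(os.path.abspath(base_dir))
    full = os.path.realpath(os.path.abspath(candidate))
    try:
        return os.path.commonpath([base, full]) == base
    except ValueError:
        # Raised on Windows when the paths are on different drives.
        return False


with tempfile.TemporaryDirectory() as root:
    allowed = os.path.join(root, "allowed")
    sibling = os.path.join(root, "allowed_evil")  # shares a string prefix
    os.mkdir(allowed)
    os.mkdir(sibling)

    outside_path = os.path.join(sibling, "x.txt")

    results = {
        "inside": is_inside(allowed, os.path.join(allowed, "example.txt")),
        "traversal": is_inside(allowed, os.path.join(allowed, "..", "secret.txt")),
        "sibling_prefix": is_inside(allowed, outside_path),
        # A plain startswith() test wrongly accepts the sibling directory.
        "naive_prefix": os.path.realpath(outside_path).startswith(
            os.path.realpath(allowed)
        ),
    }
```

Because `commonpath()` compares whole path components, `allowed_evil` is rejected even though its name begins with `allowed`. Per the diff, the check in `filestring()` is opt-in: it runs only when a caller passes `allowed_dir`, for example `filestring(path, allowed_dir="/srv/corpora")` with a directory of the caller's choosing, and a call without it still opens any path it is given.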
    
