Critical severity · NVD Advisory · Published Feb 4, 2026 · Updated Feb 4, 2026

Unstructured has Path Traversal via Malicious MSG Attachment that Allows Arbitrary File Write

CVE-2025-64712

Description

The unstructured library provides open-source components for ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and many more. Prior to version 0.18.18, a path traversal vulnerability in the partition_msg function allows an attacker to write or overwrite arbitrary files on the filesystem when processing malicious MSG files with attachments. This issue has been patched in version 0.18.18.
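As an illustration of the bug class only, the sketch below shows how joining an attacker-controlled attachment name onto an output directory can resolve outside that directory. The directory `/tmp/attachments` and the helpers `naive_target` and `is_inside` are hypothetical; the advisory does not show the library's pre-patch write path, so this is not a reproduction of it. `posixpath` is used so the results are the same on every platform.

```python
import posixpath

# Hypothetical output directory, chosen only for this illustration.
BASE_DIR = "/tmp/attachments"


def naive_target(base_dir: str, attachment_name: str) -> str:
    """Path a writer would use if it trusted the attachment name as-is."""
    return posixpath.normpath(posixpath.join(base_dir, attachment_name))


def is_inside(base_dir: str, target: str) -> bool:
    """True if `target` is located at or below `base_dir`."""
    return posixpath.commonpath([base_dir, target]) == base_dir


benign = naive_target(BASE_DIR, "document.pdf")
traversal = naive_target(BASE_DIR, "../../../etc/passwd")
absolute = naive_target(BASE_DIR, "/etc/passwd")

for name, target in [("benign", benign), ("traversal", traversal), ("absolute", absolute)]:
    print(f"{name:10s} -> {target}  inside base: {is_inside(BASE_DIR, target)}")
```

Both malicious names resolve to `/etc/passwd`: the `..` segments climb out of the base directory, and an absolute second argument makes `join` discard the base entirely.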

Affected packages

Versions sourced from the GitHub Security Advisory.

Package              Affected versions   Patched versions
unstructured (PyPI)  < 0.18.18           0.18.18
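One way to check an environment against the patched version is sketched below. The helper names (`release_tuple`, `is_patched`, `installed_unstructured_is_patched`) are hypothetical, and the parser assumes plain numeric release versions: it ignores pre-release or development suffixes, so a dedicated version parser is the safer choice where those may appear.

```python
from __future__ import annotations

import re
from importlib import metadata

# First patched release, as listed in this advisory.
PATCHED = (0, 18, 18)


def release_tuple(version: str) -> tuple[int, ...]:
    """Parse the leading numeric components of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_patched(version: str) -> bool:
    """True if `version` is at or above the patched release."""
    return release_tuple(version) >= PATCHED


def installed_unstructured_is_patched() -> bool | None:
    """Check the installed package; None if it is not installed."""
    try:
        return is_patched(metadata.version("unstructured"))
    except metadata.PackageNotFoundError:
        return None
```

Calling `installed_unstructured_is_patched()` returns `False` for an affected install, `True` for 0.18.18 or later, and `None` when the package is absent.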

Patches

b01d35b2373f

fix: sanitize MSG attachment filenames to prevent path traversal (GHS… (#4117)

4 files changed · +168 -2
  • CHANGELOG.md +4 -0 modified
    @@ -1,3 +1,7 @@
    +## 0.18.18
    +
    +### Fixes
    +- **Prevent path traversal in email MSG attachment filenames** Fixed a security vulnerability (GHSA-gm8q-m8mv-jj5m) where malicious attachment filenames containing path traversal sequences could write files outside the intended directory. The fix normalizes both Unix and Windows path separators before sanitizing filenames, preventing cross-platform path traversal attacks in `partition_msg` functions
     ## 0.18.17
     
     ### Enhancement
    
  • test_unstructured/partition/test_msg.py +147 -0 modified
    @@ -309,6 +309,153 @@ def test_partition_msg_raises_TypeError_for_invalid_languages():
     # ================================================================================================
     
     
    +class DescribeMsgAttachmentFilenameSanitization:
    +    """Unit-test suite for filename sanitization in MSG attachments (GHSA-gm8q-m8mv-jj5m)."""
    +
    +    def it_sanitizes_path_traversal_attempts(self, request: FixtureRequest):
    +        from unstructured.partition.msg import _AttachmentPartitioner
    +
    +        attachment = Mock()
    +        attachment.file_name = "../../../etc/passwd"
    +        attachment.file_bytes = b"malicious content"
    +        attachment.last_modified = None
    +
    +        opts = Mock()
    +        opts.metadata_last_modified = None
    +
    +        partitioner = _AttachmentPartitioner(attachment, opts)
    +
    +        assert partitioner._attachment_file_name == "passwd"
    +
    +    def it_sanitizes_absolute_unix_paths(self, request: FixtureRequest):
    +        from unstructured.partition.msg import _AttachmentPartitioner
    +
    +        attachment = Mock()
    +        attachment.file_name = "/etc/passwd"
    +        attachment.file_bytes = b"malicious content"
    +        attachment.last_modified = None
    +
    +        opts = Mock()
    +        opts.metadata_last_modified = None
    +
    +        partitioner = _AttachmentPartitioner(attachment, opts)
    +
    +        assert partitioner._attachment_file_name == "passwd"
    +
    +    def it_sanitizes_absolute_windows_paths(self, request: FixtureRequest):
    +        from unstructured.partition.msg import _AttachmentPartitioner
    +
    +        attachment = Mock()
    +        attachment.file_name = "C:\\Windows\\System32\\config\\sam"
    +        attachment.file_bytes = b"malicious content"
    +        attachment.last_modified = None
    +
    +        opts = Mock()
    +        opts.metadata_last_modified = None
    +
    +        partitioner = _AttachmentPartitioner(attachment, opts)
    +
    +        assert partitioner._attachment_file_name == "sam"
    +
    +    def it_removes_null_bytes_from_filenames(self, request: FixtureRequest):
    +        from unstructured.partition.msg import _AttachmentPartitioner
    +
    +        attachment = Mock()
    +        attachment.file_name = "file\x00.txt"
    +        attachment.file_bytes = b"content"
    +        attachment.last_modified = None
    +
    +        opts = Mock()
    +        opts.metadata_last_modified = None
    +
    +        partitioner = _AttachmentPartitioner(attachment, opts)
    +
    +        assert partitioner._attachment_file_name == "file.txt"
    +        assert "\x00" not in partitioner._attachment_file_name
    +
    +    def it_handles_dot_and_dotdot_filenames(self, request: FixtureRequest):
    +        from unstructured.partition.msg import _AttachmentPartitioner
    +
    +        opts = Mock()
    +        opts.metadata_last_modified = None
    +
    +        # Test single dot
    +        attachment1 = Mock()
    +        attachment1.file_name = "."
    +        attachment1.file_bytes = b"content"
    +        attachment1.last_modified = None
    +        partitioner1 = _AttachmentPartitioner(attachment1, opts)
    +        assert partitioner1._attachment_file_name == "unknown"
    +
    +        # Test double dot
    +        attachment2 = Mock()
    +        attachment2.file_name = ".."
    +        attachment2.file_bytes = b"content"
    +        attachment2.last_modified = None
    +        partitioner2 = _AttachmentPartitioner(attachment2, opts)
    +        assert partitioner2._attachment_file_name == "unknown"
    +
    +    def it_handles_missing_filename(self, request: FixtureRequest):
    +        from unstructured.partition.msg import _AttachmentPartitioner
    +
    +        attachment = Mock()
    +        attachment.file_name = None
    +        attachment.file_bytes = b"content"
    +        attachment.last_modified = None
    +
    +        opts = Mock()
    +        opts.metadata_last_modified = None
    +
    +        partitioner = _AttachmentPartitioner(attachment, opts)
    +
    +        assert partitioner._attachment_file_name == "unknown"
    +
    +    def it_allows_valid_filenames_through(self, request: FixtureRequest):
    +        from unstructured.partition.msg import _AttachmentPartitioner
    +
    +        attachment = Mock()
    +        attachment.file_name = "document.pdf"
    +        attachment.file_bytes = b"content"
    +        attachment.last_modified = None
    +
    +        opts = Mock()
    +        opts.metadata_last_modified = None
    +
    +        partitioner = _AttachmentPartitioner(attachment, opts)
    +
    +        assert partitioner._attachment_file_name == "document.pdf"
    +
    +    def it_handles_complex_path_traversal_with_mixed_separators(self, request: FixtureRequest):
    +        from unstructured.partition.msg import _AttachmentPartitioner
    +
    +        attachment = Mock()
    +        attachment.file_name = "..\\../\\..\\etc/passwd"
    +        attachment.file_bytes = b"malicious content"
    +        attachment.last_modified = None
    +
    +        opts = Mock()
    +        opts.metadata_last_modified = None
    +
    +        partitioner = _AttachmentPartitioner(attachment, opts)
    +
    +        assert partitioner._attachment_file_name == "passwd"
    +
    +    def it_handles_empty_string_filename(self, request: FixtureRequest):
    +        from unstructured.partition.msg import _AttachmentPartitioner
    +
    +        attachment = Mock()
    +        attachment.file_name = ""
    +        attachment.file_bytes = b"content"
    +        attachment.last_modified = None
    +
    +        opts = Mock()
    +        opts.metadata_last_modified = None
    +
    +        partitioner = _AttachmentPartitioner(attachment, opts)
    +
    +        assert partitioner._attachment_file_name == "unknown"
    +
    +
     class DescribeMsgPartitionerOptions:
         """Unit-test suite for `unstructured.partition.msg.MsgPartitionerOptions` objects."""
     
    
  • unstructured/partition/msg.py +16 -1 modified
    @@ -279,8 +279,23 @@ def _attachment_file_name(self) -> str:
             """The original name of the attached file, no path.
     
             This value is 'unknown' if it is not present in the MSG file (not expected).
    +        The filename is sanitized to prevent path traversal attacks.
             """
    -        return self._attachment.file_name or "unknown"
    +        raw_filename = self._attachment.file_name or "unknown"
    +
    +        # Sanitize the filename to prevent path traversal attacks
    +        # Remove any path components for both Unix and Windows paths
    +        # Use both separators to handle cross-platform attacks
    +        safe_filename = os.path.basename(raw_filename.replace("\\", "/"))
    +
    +        # Remove null bytes and other control characters
    +        safe_filename = safe_filename.replace("\0", "")
    +
    +        # If the filename becomes empty after sanitization, use a default
    +        if not safe_filename or safe_filename in (".", ".."):
    +            safe_filename = "unknown"
    +
    +        return safe_filename
     
         @lazyproperty
         def _attachment_last_modified(self) -> str | None:
    
  • unstructured/__version__.py +1 -1 modified
    @@ -1 +1 @@
    -__version__ = "0.18.17"  # pragma: no cover
    +__version__ = "0.18.18"  # pragma: no cover
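
The sanitization steps in the `msg.py` hunk can be tried in isolation with the sketch below. It is a standalone copy of the patched logic for illustration, not the library's API: the function name `sanitize_attachment_filename` is hypothetical, and in the patch this logic lives inside the `_attachment_file_name` property. The checks mirror inputs used in the commit's tests.

```python
from __future__ import annotations

import os


def sanitize_attachment_filename(raw_filename: str | None) -> str:
    """Reduce an attachment name to a bare filename (hypothetical helper)."""
    raw = raw_filename or "unknown"

    # Treat backslashes as separators too, then drop every directory part.
    safe = os.path.basename(raw.replace("\\", "/"))

    # Strip null bytes, which can truncate paths in lower-level APIs.
    safe = safe.replace("\0", "")

    # Names that are empty or refer to a directory fall back to the default.
    if not safe or safe in (".", ".."):
        safe = "unknown"

    return safe


samples = [
    "../../../etc/passwd",
    "/etc/passwd",
    "C:\\Windows\\System32\\config\\sam",
    "..\\../\\..\\etc/passwd",
    "file\x00.txt",
    "..",
    "",
    None,
    "document.pdf",
]
for sample in samples:
    print(repr(sample), "->", repr(sanitize_attachment_filename(sample)))
```

Normalizing backslashes before calling `os.path.basename` is what makes the check cross-platform: on POSIX systems a backslash is an ordinary filename character, so a Windows-style path would otherwise pass through unchanged.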
    
