Apache Tika PDF parser module: XXE vulnerability in PDFParser's handling of XFA
Description
A critical XXE vulnerability in the Apache Tika PDF parser module (tika-parser-pdf-module), affecting Apache Tika 1.13 through 3.2.1 inclusive on all platforms, allows an attacker to carry out XML External Entity injection via a crafted XFA file inside a PDF. An attacker may be able to read sensitive data or trigger malicious requests to internal resources or third-party servers. Note that tika-parser-pdf-module is used as a dependency in several Tika packages, including at least tika-parsers-standard-modules, tika-parsers-standard-package, tika-app, tika-grpc, and tika-server-standard.
Users are recommended to upgrade to version 3.2.2, which fixes this issue.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
Critical XXE vulnerability in Apache Tika's tika-parser-pdf-module allows reading sensitive data via a crafted XFA file in a PDF; update to 3.2.2.
CVE-2025-54988 is a critical XML External Entity (XXE) injection vulnerability in the Apache Tika PDF parser module (tika-parser-pdf-module). The flaw arises from improper handling of XFA (XML Forms Architecture) files embedded within PDFs, allowing an attacker to define malicious external entities that the parser processes without restriction [4].
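The root problem described above is an XML parser that honours DOCTYPE declarations and resolves external entities in untrusted XFA content. As a general illustration only (this is not Tika's actual fix), the JDK's `DocumentBuilderFactory` can be configured along the lines of the OWASP XXE guidance so that any document carrying a DOCTYPE is rejected before an entity is ever resolved. The class name `HardenedXmlParsing` and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: a generic XXE-hardening sketch, not Apache Tika's code.
public final class HardenedXmlParsing {

    /** Factory that refuses DOCTYPEs and, as defence in depth, external entities. */
    public static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Fail fast on any <!DOCTYPE ...>, so no entity can be declared at all.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defence in depth in case DOCTYPEs are ever re-enabled.
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf;
    }

    public static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = hardenedFactory().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors and stays silent otherwise (no stderr noise).
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** True if the hardened parser refuses the document with a fatal parse error. */
    public static boolean isRejected(String xml) {
        try {
            parse(xml);
            return false;
        } catch (SAXParseException e) {
            return true; // e.g. "DOCTYPE is disallowed ..."
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("unexpected failure", e);
        }
    }

    public static void main(String[] args) {
        String payload = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE xdp [<!ENTITY ext SYSTEM \"file:///nonexistent/secret.txt\">]>"
                + "<xdp>&ext;</xdp>";
        System.out.println("external-entity payload rejected: " + isRejected(payload));
    }
}
```

Rejecting the DOCTYPE outright is the simplest defence because XFA form data has no legitimate need for entity declarations; the per-entity features are kept only as a fallback.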
Exploitation requires only a crafted PDF containing a malicious XFA file; no authentication is required when the application passes user-supplied documents to the parser. The XXE can be used to read arbitrary files from the server's filesystem (e.g., configuration files, credentials) or to send HTTP requests to internal resources or third-party servers (SSRF) [4]. The vulnerable module is included in multiple Tika packages, such as tika-parsers-standard-modules, tika-app, and tika-server-standard, which widens the potential attack surface.
The impact includes sensitive data exfiltration and potential pivoting to internal networks via SSRF. Apache Tika 3.2.2 (released 2025-08-06) fixes the issue [1]. Users on versions 1.13 through 3.2.1 should upgrade immediately; no workarounds are documented.
AI Insight generated on May 19, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected packages
Versions sourced from the GitHub Security Advisory.
| Package | Ecosystem | Affected versions | Patched versions |
|---|---|---|---|
| org.apache.tika:tika-parser-pdf-module | Maven | >= 1.13, < 3.2.2 | 3.2.2 |
| org.apache.tika:tika-parsers | Maven | >= 1.13, < 2.0.0-ALPHA | 2.0.0-ALPHA |
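To check a deployed version against the first row (>= 1.13, < 3.2.2), a small comparison over dotted numeric components is enough for ordinary release numbers. This is a simplified sketch with hypothetical names (`TikaVersionCheck`, `isAffected`): it ignores any qualifier after a `-` (such as `ALPHA` or `SNAPSHOT`), so it does not reproduce Maven's full version ordering, where qualified versions sort before the plain release.

```java
import java.util.Arrays;

// Hypothetical helper; simplified version ordering, not Maven's ComparableVersion.
public final class TikaVersionCheck {

    // "2.0.0-ALPHA" -> [2, 0, 0]; the qualifier after '-' is ignored (assumption).
    static int[] parse(String version) {
        String core = version.split("-", 2)[0];
        return Arrays.stream(core.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    // Compares component by component, treating missing components as 0.
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** True if the version lies in [1.13, 3.2.2), the range listed for tika-parser-pdf-module. */
    public static boolean isAffected(String version) {
        return compare(version, "1.13") >= 0 && compare(version, "3.2.2") < 0;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.12", "1.13", "3.2.1", "3.2.2"}) {
            System.out.println(v + " affected: " + isAffected(v));
        }
    }
}
```

For example, `isAffected("3.2.1")` returns `true` and `isAffected("3.2.2")` returns `false`. Numeric comparison matters here: a plain string comparison would wrongly order `3.10.0` before `3.2.2`.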
Affected products
- Apache Software Foundation / Apache Tika PDF parser module (v5); range: 1.13
Patches
- 2b52257304f4: TIKA-4459 -- force stream to zip file to handle encrypted od* documents correctly (#2291)
2 files changed · +20 −52
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-miscoffice-module/src/main/java/org/apache/tika/parser/odf/OpenDocumentParser.java (+13 −52, modified)

```diff
@@ -20,16 +20,13 @@
 import java.io.IOException;
 import java.io.InputStream;
-import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Collections;
 import java.util.Enumeration;
 import java.util.HashSet;
-import java.util.List;
 import java.util.Set;
 import java.util.zip.ZipEntry;
 import java.util.zip.ZipFile;
-import java.util.zip.ZipInputStream;
 
 import org.apache.commons.io.IOUtils;
 import org.apache.commons.io.input.CloseShieldInputStream;
@@ -40,7 +37,6 @@
 import org.apache.tika.config.Field;
 import org.apache.tika.exception.EncryptedDocumentException;
 import org.apache.tika.exception.TikaException;
-import org.apache.tika.exception.WriteLimitReachedException;
 import org.apache.tika.extractor.EmbeddedDocumentUtil;
 import org.apache.tika.io.TikaInputStream;
 import org.apache.tika.metadata.Metadata;
@@ -134,21 +130,21 @@ public void parse(InputStream stream, ContentHandler baseHandler, Metadata metad
         // Open the Zip stream
         // Use a File if we can, and an already open zip is even better
         ZipFile zipFile = null;
-        ZipInputStream zipStream = null;
+        TikaInputStream tmpTis = null;
         if (stream instanceof TikaInputStream) {
             TikaInputStream tis = (TikaInputStream) stream;
             Object container = ((TikaInputStream) stream).getOpenContainer();
             if (container instanceof ZipFile) {
                 zipFile = (ZipFile) container;
-            } else if (tis.hasFile()) {
-                zipFile = new ZipFile(tis.getFile());
             } else {
-                zipStream = new ZipInputStream(stream);
+                zipFile = new ZipFile(tis.getFile());
+                tis.setOpenContainer(zipFile);
             }
         } else {
-            zipStream = new ZipInputStream(stream);
+            tmpTis = TikaInputStream.get(stream);
+            tmpTis.setOpenContainer(new ZipFile(tmpTis.getFile()));
+            zipFile = (ZipFile) tmpTis.getOpenContainer();
         }
-
         // Prepare to handle the content
         XHTMLContentHandler xhtml = new XHTMLContentHandler(baseHandler, metadata);
         xhtml.startDocument();
@@ -157,19 +153,13 @@ public void parse(InputStream stream, ContentHandler baseHandler, Metadata metad
         EndDocumentShieldingContentHandler handler = new EndDocumentShieldingContentHandler(xhtml);
         try {
-            if (zipFile != null) {
-                try {
-                    handleZipFile(zipFile, metadata, context, handler, embeddedDocumentUtil);
-                } finally {
-                    //Do we want to close silently == catch an exception here?
-                    zipFile.close();
-                }
-            } else {
-                try {
-                    handleZipStream(zipStream, metadata, context, handler, embeddedDocumentUtil);
-                } finally {
-                    //Do we want to close silently == catch an exception here?
-                    zipStream.close();
+            try {
+                handleZipFile(zipFile, metadata, context, handler, embeddedDocumentUtil);
+            } finally {
+                //Do we want to close silently == catch an exception here?
+                if (tmpTis != null) {
+                    //tmpTis handles closing of the open zip container
+                    tmpTis.close();
                 }
             }
         } catch (SAXException e) {
@@ -194,35 +184,6 @@ public boolean isExtractMacros() {
         return extractMacros;
     }
 
-    private void handleZipStream(ZipInputStream zipStream, Metadata metadata, ParseContext context,
-                                 EndDocumentShieldingContentHandler handler,
-                                 EmbeddedDocumentUtil embeddedDocumentUtil)
-            throws IOException, TikaException, SAXException {
-        ZipEntry entry = zipStream.getNextEntry();
-        if (entry == null) {
-            throw new IOException("No entries found in ZipInputStream");
-        }
-        List<SAXException> exceptions = new ArrayList<>();
-        do {
-            try {
-                handleZipEntry(entry, zipStream, metadata, context, handler,
-                        embeddedDocumentUtil);
-            } catch (SAXException e) {
-                WriteLimitReachedException.throwIfWriteLimitReached(e);
-                if (e.getCause() instanceof EncryptedDocumentException) {
-                    throw (EncryptedDocumentException)e.getCause();
-                } else {
-                    exceptions.add(e);
-                }
-            }
-            entry = zipStream.getNextEntry();
-        } while (entry != null);
-
-        if (exceptions.size() > 0) {
-            throw exceptions.get(0);
-        }
-    }
-
     private void handleZipFile(ZipFile zipFile, Metadata metadata, ParseContext context,
                                EndDocumentShieldingContentHandler handler, EmbeddedDocumentUtil embeddedDocumentUtil)
```
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-miscoffice-module/src/test/java/org/apache/tika/parser/odf/ODFParserTest.java (+7 −0, modified)

```diff
@@ -25,6 +25,7 @@
 import java.io.IOException;
 import java.io.InputStream;
 import java.nio.charset.StandardCharsets;
+import java.nio.file.Files;
 import java.nio.file.Path;
 import java.nio.file.Paths;
 import java.util.Arrays;
@@ -415,6 +416,12 @@ public void testEncryptedODTFile() throws Exception {
             getRecursiveMetadata(p, false);
         });
 
+        assertThrows(EncryptedDocumentException.class, () -> {
+            try (InputStream is = Files.newInputStream(p)) {
+                getRecursiveMetadata(is, false);
+            }
+        });
+
         List<Metadata> metadataList = getRecursiveMetadata(p, true);
         assertEquals("true", metadataList.get(0).get(TikaCoreProperties.IS_ENCRYPTED));
     }
```
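The OpenDocumentParser change removes the streaming `ZipInputStream` path: input that is not already file-backed is wrapped in a `TikaInputStream`, whose `getFile()` spools it to a temporary file, and the result is opened as a `ZipFile`. `ZipFile` reads the archive's central directory, whereas `ZipInputStream` walks local entry headers in stream order. The sketch below shows the same spool-then-open pattern using only the JDK, without Tika's classes; `SpoolToZipFile`, `entryNames`, and `sampleZip` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical JDK-only illustration of "spool the stream to a file, then open it as a ZipFile".
public final class SpoolToZipFile {

    /** Copies an arbitrary stream to a temp file, opens it as a ZipFile, and lists entry names. */
    public static List<String> entryNames(InputStream stream) throws IOException {
        Path tmp = Files.createTempFile("spooled-", ".zip");
        try {
            // createTempFile already created the file, so overwrite it.
            Files.copy(stream, tmp, StandardCopyOption.REPLACE_EXISTING);
            // ZipFile gives random access to every entry via the central directory.
            try (ZipFile zip = new ZipFile(tmp.toFile())) {
                List<String> names = new ArrayList<>();
                for (ZipEntry entry : Collections.list(zip.entries())) {
                    names.add(entry.getName());
                }
                return names;
            }
        } finally {
            // The ZipFile is closed before the temp file is removed.
            Files.deleteIfExists(tmp);
        }
    }

    /** Demo helper: builds a tiny in-memory ZIP with two ODF-style entries. */
    public static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("mimetype"));
            zos.write("application/vnd.oasis.opendocument.text".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("content.xml"));
            zos.write("<office:document-content/>".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(entryNames(new ByteArrayInputStream(sampleZip())));
    }
}
```

Cleaning up in a `finally` block plays the same role as the patch's `tmpTis.close()`, which, per the patch comment, also closes the open zip container.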
Vulnerability mechanics
Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References
- github.com/advisories/GHSA-p72g-pv48-7w9x (GHSA, advisory)
- lists.apache.org/thread/8xn3rqy6kz5b3l1t83kcofkw0w4mmj1w (GHSA, vendor advisory, web)
- nvd.nist.gov/vuln/detail/CVE-2025-54988 (GHSA, advisory)
- www.openwall.com/lists/oss-security/2025/08/20/2 (GHSA, web)
- www.openwall.com/lists/oss-security/2025/08/20/3 (GHSA, web)
- archive.apache.org/dist/tika/3.2.2/CHANGES-3.2.2.txt (GHSA, web)
- github.com/apache/tika/commit/2b52257304f4d3cde2b8463657380bdb936d9ef2 (GHSA, web)
- github.com/apache/tika/pull/2291 (GHSA, web)
- issues.apache.org/jira/browse/TIKA-4459 (GHSA, web)
- lists.apache.org/thread/stn9oh7rfn9yv76n1srxr9w56oy04p72 (GHSA, web)
- lists.debian.org/debian-lts-announce/2025/10/msg00030.html (GHSA, web)