Unable to read PDF files larger than ~2 GB due to integer overflow in RandomAccessFileOrArray #1582

Description

@hanschua

Describe the bug

Reading a 5 GB PDF fails with an InvalidPdfException. RandomAccessFileOrArray casts the long file pointer to an int, which overflows for any offset beyond Integer.MAX_VALUE (about 2 GB).

To Reproduce

Code to reproduce the issue is below. I cannot attach the 5 GB file (it is full of images), but I can provide it if necessary.

  1. Sample Code
// forceRead=false, plainRandomAccess=false
RandomAccessFileOrArray f = new RandomAccessFileOrArray("5GB.pdf", false, false);
try (PdfReader reader = new PdfReader(f)) {
}
  2. Error Encountered
Exception in thread "main" org.openpdf.text.exceptions.InvalidPdfException: Rebuild failed: Position out of bounds; Original message: PDF startxref not found.
	at org.openpdf.text.pdf.PdfReader.readPdfPartial(PdfReader.java:1382)
	at org.openpdf.text.pdf.PdfReader.<init>(PdfReader.java:285)

Expected behavior

OpenPDF should be able to read the 5 GB PDF file.

Screenshots

  1. Problematic code in RandomAccessFileOrArray: rf.getFilePointer() returns a long, but the result is cast to int.
    https://github.com/LibrePDF/OpenPDF/blob/master/openpdf-core/src/main/java/org/openpdf/text/pdf/RandomAccessFileOrArray.java#L350
    public int getFilePointer() throws IOException {
        insureOpen();
        int n = isBack ? 1 : 0;
        if (arrayIn == null) {
            return (int) (plainRandomAccess ? trf.getFilePointer() : rf.getFilePointer()) - n - startOffset;
        } else {
            return arrayInPtr - n - startOffset;
        }
    }
  2. Integer Overflow
    Image
  3. Stacktrace
    Image
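
The narrowing cast quoted in item 1 can be reproduced in isolation. This is a minimal sketch, not OpenPDF code: the class and method names (FilePointerOverflowDemo, narrowedPointer, widePointer) are hypothetical. narrowedPointer mirrors the arithmetic of the quoted getFilePointer(), and widePointer only illustrates what the same computation yields when it stays in long; it is not a proposed patch.

```java
final class FilePointerOverflowDemo {

    // Same arithmetic as the quoted getFilePointer(): the long position
    // is narrowed to int before the offsets are subtracted.
    static int narrowedPointer(long rawPointer, boolean isBack, int startOffset) {
        int n = isBack ? 1 : 0;
        return (int) rawPointer - n - startOffset;
    }

    // Hypothetical long-based variant (an assumption, not the library's API):
    // no narrowing, so positions beyond Integer.MAX_VALUE survive.
    static long widePointer(long rawPointer, boolean isBack, long startOffset) {
        int n = isBack ? 1 : 0;
        return rawPointer - n - startOffset;
    }

    public static void main(String[] args) {
        long threeGiB = 3L << 30; // 3221225472
        long fiveGiB = 5L << 30;  // 5368709120

        // The cast keeps only the low 32 bits of the position.
        System.out.println(narrowedPointer(threeGiB, false, 0)); // -1073741824
        System.out.println(narrowedPointer(fiveGiB, false, 0));  // 1073741824

        // Staying in long preserves the real position.
        System.out.println(widePointer(fiveGiB, false, 0));      // 5368709120
    }
}
```

A position of 3 GiB becomes negative, while a position of 5 GiB silently wraps to a plausible-looking 1 GiB. Either outcome would be consistent with the "Position out of bounds" and "startxref not found" messages above, although the screenshots are needed to confirm which value the reader actually saw.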

System

  • OS: Windows
  • Used font: Default
  • OpenPDF version: 3.0.3

Additional context

Attempt via RandomAccessFileOrArray with InputStream

new PdfReader(new FileInputStream("5GB.pdf"));

Error (Expected due to byte array limit)

Exception in thread "main" java.lang.OutOfMemoryError: Required array length 2147483639 + 9 is too large
	at java.base/jdk.internal.util.ArraysSupport.hugeLength(ArraysSupport.java:914)
	at java.base/jdk.internal.util.ArraysSupport.newLength(ArraysSupport.java:907)
	at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:100)
	at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:132)
	at org.openpdf.text.pdf.RandomAccessFileOrArray.InputStreamToArray(RandomAccessFileOrArray.java:172)
	at org.openpdf.text.pdf.RandomAccessFileOrArray.<init>(RandomAccessFileOrArray.java:150)
	at org.openpdf.text.pdf.PdfReader.<init>(PdfReader.java:257)
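
The "byte array limit" mentioned above follows from Java arrays being indexed by int. As a rough sketch, a caller could check a file's size before choosing a path that buffers the whole file in memory. The class and method names here (ArrayLimitCheck, fitsInByteArray) are hypothetical, and the threshold of Integer.MAX_VALUE - 8 is taken from the 2147483639 in the error message; the actual usable limit also depends on the JVM and the available heap.

```java
final class ArrayLimitCheck {

    // 2147483639, the length reported in the OutOfMemoryError above.
    static final long SOFT_MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

    // Returns true if a file of this size could, in principle, be held
    // in a single byte[]; heap size is not considered.
    static boolean fitsInByteArray(long fileSize) {
        return fileSize >= 0 && fileSize <= SOFT_MAX_ARRAY_LENGTH;
    }

    public static void main(String[] args) {
        long fiveGiB = 5L << 30;
        System.out.println(fitsInByteArray(1024L));   // true
        System.out.println(fitsInByteArray(fiveGiB)); // false
    }
}
```

For a 5 GB file the check fails, which is why only the paths that avoid full in-memory buffering (forceRead = false) are candidates for reading it.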

Attempt via RandomAccessFileOrArray with forceRead = true, plainRandomAccess = true

Error (Expected due to byte array limit)

Exception in thread "main" java.lang.OutOfMemoryError: Required array length 2147483639 + 9 is too large
	at java.base/jdk.internal.util.ArraysSupport.hugeLength(ArraysSupport.java:914)
	at java.base/jdk.internal.util.ArraysSupport.newLength(ArraysSupport.java:907)
	at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:100)
	at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:132)
	at org.openpdf.text.pdf.RandomAccessFileOrArray.InputStreamToArray(RandomAccessFileOrArray.java:172)
	at org.openpdf.text.pdf.RandomAccessFileOrArray.<init>(RandomAccessFileOrArray.java:131)

Attempt via RandomAccessFileOrArray with forceRead = false, plainRandomAccess = true

Error (took 59 minutes; same as with plainRandomAccess = false)

Exception in thread "main" org.openpdf.text.exceptions.InvalidPdfException: Rebuild failed: Position out of bounds; Original message: PDF startxref not found.
	at org.openpdf.text.pdf.PdfReader.readPdfPartial(PdfReader.java:1382)
	at org.openpdf.text.pdf.PdfReader.<init>(PdfReader.java:285)
