Ring Us Now on
+613 (03) 5798 1464

We are available 24/7 to take your call or perform trouble-shooting tasks.

   

PDF::Extract's Home Page

Version 3.01 is now available

Extracting and serving dynamically selected portions of a 24-page PDF document is a snap with our Perl PDF::Extract module.

[Demo form: select up to 24 pages to download from one of two sources, "Dot2dot (a PDF Document)" or "No PDF Document (raise an error)".]

PDF::Extract is a group of methods that let you quickly extract pages from an existing PDF document into a new PDF document.

PDF::Extract can create a new PDF document that can be:

  • Assigned to a scalar variable with getPDFExtract.
  • Saved to disk with savePDFExtract.
  • Printed to STDOUT as a PDF web document with servePDFExtract.
  • Served from a cache with fastServePDFExtract.

These four main methods can be called with or without arguments. Either way, they will not work unless they know the location of the original PDF document and the pages to extract. Both must be supplied, in the call itself or beforehand (for example to the constructor), because there are no default values.
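
As a rough illustration of the first two methods, the sketch below sets the source document once in the constructor and passes the pages with each call. The file name original.pdf and the page lists are made up. It also assumes that method arguments use the same keys as the constructor (PDFDoc, PDFPages), that savePDFExtract returns a true value on success, and that getVars("PDFError") reports the last error. Check all of these against the POD of your installed version.

```perl
use strict;
use warnings;
use PDF::Extract;

# Hypothetical input file; the module must know its location before any method works.
my $pdf = PDF::Extract->new( PDFDoc => 'original.pdf' );

# Grab pages 1 and 3 into a scalar (pages are space-separated, as in the demo code).
my $extract = $pdf->getPDFExtract( PDFPages => '1 3' );
if ($extract) {
    printf "extract is %d bytes\n", length $extract;
}
else {
    # Assumption: the module records its last error under "PDFError".
    my $error = $pdf->getVars('PDFError') // 'unknown error';
    warn "no extract produced: $error\n";
}

# Save page 2 to disk; where the file ends up depends on the module's settings.
$pdf->savePDFExtract( PDFPages => '2' )
    or warn "could not save the extract\n";
```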

This Perl module is platform independent.
It does not depend on any other modules.

Download the latest package 3.01 HERE
You can read the pod documentation HERE

The official PDF::Extract Forum is HERE
You can contact me HERE

A more realistic demo of the module can be found HERE

This is the Perl code that connects these demo web forms to the PDF::Extract module:

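use strict;
use warnings;
use CGI;
use PDF::Extract;

my $cgi = CGI->new;
(my $path = $0) =~ s/.\w+\.pl$//;  # directory of this script: strip the trailing "/name.pl"

# PDFDoc and PDFPages come straight from the demo form.
my $pdf = PDF::Extract->new(
    PDFErrorPage => "$path/PDFError.html",                # error page for failed extractions
    PDFDoc       => "$path/" . $cgi->param("PDFDoc"),      # document chosen on the form
    PDFPages     => join( " ", $cgi->param("PDFPages") ),  # selected pages, space-separated
);
$pdf->servePDFExtract;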
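
The demo code above hands form values directly to the module. One possible way to check them first is sketched below. It is not part of PDF::Extract: the helper names clean_pages and clean_doc_name are hypothetical, the 24-page limit is taken from the demo document described earlier, and the file-name rule (a bare name ending in .pdf) is an assumption about how the demo files are named.

```perl
use strict;
use warnings;

# Assumed limit: the demo document has 24 pages.
my $MAX_PAGE = 24;

# Keep only whole page numbers between 1 and $MAX_PAGE, drop duplicates,
# and return them as the space-separated string the demo code builds.
sub clean_pages {
    my @requested = @_;
    my %seen;
    my @pages = grep { !$seen{$_}++ }
                grep { $_ >= 1 && $_ <= $MAX_PAGE }
                map  { $_ + 0 }
                grep { defined($_) && /\A\d+\z/ } @requested;
    return join ' ', @pages;
}

# Accept a bare file name ending in .pdf; reject anything containing a path.
sub clean_doc_name {
    my ($name) = @_;
    return undef unless defined $name;
    return $name =~ /\A(\w[\w-]*\.pdf)\z/i ? $1 : undef;
}

print clean_pages( 3, '07', 'x', 30, 3, 12 ), "\n";    # prints "3 7 12"
print defined( clean_doc_name('../etc/passwd') ) ? "accepted\n" : "rejected\n";    # prints "rejected"
```

The cleaned values could then replace the raw $cgi->param(...) results when building the PDF::Extract object.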
     

BUGS

Jon Schaeffer reported a bug in which some font resources are not found in the extracted PDF. The source of the bug has not yet been found. If you run into it, please email a one-page original PDF from which PDF::Extract produces an extract that shows the bug.
Please report any other bugs you find.

NOTES

This version of PDF::Extract has been designed to produce output to the PDF Standard as defined in the PDF Reference Seventh Edition.

However, some third-party PDF applications require a non-standard feature of PDF documents, namely the sequential numbering of objects starting at zero.

PDF::Extract treats a PDF file as a flat file, for speed of processing, and consequently knows nothing of PDF objects. Objects extracted remain exactly as they were in the original document. These objects are not renumbered. There will be gaps in the object number sequence. This is allowed in the specification. Only the catalog and page tree objects are altered.
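
To see the gaps described above, you can scan an extract for object headers. The sketch below is a minimal illustration, assuming objects are introduced by plain-text "N G obj" headers. The helper names object_numbers and numbering_gaps are made up, and the sample string is a hand-written fragment rather than real PDF::Extract output. A naive text scan like this can be fooled by stream data that happens to contain the same pattern.

```perl
use strict;
use warnings;

# Return the sorted, de-duplicated object numbers declared in a PDF byte string.
sub object_numbers {
    my ($pdf_bytes) = @_;
    my %seen;
    $seen{$1} = 1 while $pdf_bytes =~ /\b(\d+)\s+\d+\s+obj\b/g;
    my @sorted = sort { $a <=> $b } keys %seen;
    return @sorted;
}

# Return the object numbers missing between the lowest and highest given.
sub numbering_gaps {
    my @nums = @_;
    return () unless @nums;
    my %have = map { $_ => 1 } @nums;
    return grep { !$have{$_} } $nums[0] .. $nums[-1];
}

# Hand-written fragment standing in for an extract.
my $sample = "1 0 obj\n<< >>\nendobj\n4 0 obj\n<< >>\nendobj\n7 0 obj\n<< >>\nendobj\n";

my @nums = object_numbers($sample);
my @gaps = numbering_gaps(@nums);
print "objects: @nums\n";    # prints "objects: 1 4 7"
print "gaps:    @gaps\n";    # prints "gaps:    2 3 5 6"
```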

Chris Gamache sent in this workaround:
Using the release version of PDF::Extract I can perform, without error, this normalization routine...

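use PDF::API2;

# Round-trip a PDF held in a scalar through PDF::API2 and return the rewritten document.
sub normalize_pdf {
    my $sPDF = shift;                         # PDF content as a string
    my $pdf  = PDF::API2->openScalar($sPDF);  # parse it with PDF::API2
    $sPDF = $pdf->stringify;                  # write it back out as a string
    $pdf->end;
    return $sPDF;
}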
     
