Generate, Parse, and Convert Documents in PHP with Paperdoc

Paperdoc is a PHP library by Zerarka Mohamed Ali Akram for generating, parsing, and converting documents across multiple file formats through a single unified API. Rather than juggling several format-specific packages, you get one consistent interface for everything from PDFs to spreadsheets. It handles both modern formats (PDF, DOCX, XLSX, PPTX, HTML, CSV, Markdown) and legacy formats (DOC, XLS, PPT), and all of them work bidirectionally: you can parse existing files and generate new ones in any of them.

Main Features

Document Generation — Create new documents from scratch in any supported format. You work with a structured document model (sections, paragraphs, tables, etc.) that maps cleanly to whichever output format you choose:

use Paperdoc\Support\DocumentManager;
use Paperdoc\Document\Style\TextStyle;

// Create a new PDF document with a title
$doc = DocumentManager::create('pdf', 'My Paperdoc Demo Document');

// Define a reusable text style: bold, colored, font size 14
$boldStyle = TextStyle::make()
    ->setBold()
    ->setColor('#f9332b')
    ->setFontSize(14);

// Add a heading and two paragraphs to a section
$section = $doc->openSection();
$section->addHeading('Generate, Parse, and Convert Documents in PHP with Paperdoc');
$section->addParagraph('Paperdoc is a PHP library by Zerarka Mohamed Ali Akram for generating, parsing, and converting documents across multiple file formats through a single unified API.');
$section->addParagraph(
    'Convert PDF, HTML, CSV, DOCX, XLSX, PPTX, Markdown and more.',
    $boldStyle
);

// Write the document to disk
DocumentManager::save($doc, 'output/paperdoc.pdf');

Parsing — Load existing files into a normalized in-memory model. This gives you a consistent data structure to work with regardless of the source format:

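// Load an existing file into the normalized document model
$doc = DocumentManager::open('paperdoc.pdf');

// Walk every section and print each paragraph on its own line
foreach ($doc->getSections() as $section) {
    foreach ($section->getParagraphs() as $paragraph) {
        echo $paragraph->getText(), PHP_EOL;
    }
}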
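
Because the model is the same for every source format, the loop above can be wrapped in a small reusable helper. The following is a minimal sketch, not part of Paperdoc: the function name extractPlainText is hypothetical, and it assumes only the three accessors shown above (getSections(), getParagraphs(), getText()).

```php
<?php

/**
 * Collect the text of every paragraph in a parsed document.
 *
 * Hypothetical helper: it relies only on getSections(),
 * getParagraphs(), and getText() as used in the article.
 */
function extractPlainText(object $doc): string
{
    $lines = [];

    foreach ($doc->getSections() as $section) {
        foreach ($section->getParagraphs() as $paragraph) {
            $text = trim((string) $paragraph->getText());

            // Skip empty paragraphs so the output has no blank runs.
            if ($text !== '') {
                $lines[] = $text;
            }
        }
    }

    // One paragraph per line.
    return implode(PHP_EOL, $lines);
}
```

With Paperdoc installed, something like extractPlainText(DocumentManager::open('paperdoc.pdf')) should return the document's text as a single string, ready for search indexing or logging.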

Format Conversion — Transform documents between any two supported formats in a single call:

DocumentManager::convert('paperdoc.pdf', 'paperdoc.docx', 'docx');
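
Since the third argument repeats the target file's extension, one option is to derive it from the path. This is a sketch under assumptions: formatFromPath is a hypothetical helper, and the format list mirrors the formats named in this article. Only 'pdf', 'docx', 'html', and 'md' appear as identifiers in the examples here; the remaining identifiers are guesses based on file extensions.

```php
<?php

/**
 * Derive a target format identifier from a file path's extension.
 *
 * Hypothetical helper, not part of Paperdoc. Identifiers other than
 * 'pdf', 'docx', 'html', and 'md' are assumptions.
 */
function formatFromPath(string $path): string
{
    $supported = ['pdf', 'docx', 'xlsx', 'pptx', 'html', 'csv', 'md', 'doc', 'xls', 'ppt'];

    // pathinfo() returns an empty string when the path has no extension.
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    if (!in_array($extension, $supported, true)) {
        throw new InvalidArgumentException("Unsupported target format: '{$extension}'");
    }

    return $extension;
}

// With Paperdoc installed, the call could then look like:
// DocumentManager::convert($source, $target, formatFromPath($target));
```

Failing early on an unknown extension keeps a typo in the target path from reaching the converter.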

Rendering to string — Rather than writing to disk, you can render a document directly to a string, which is useful for returning responses or piping output:

$html = DocumentManager::renderAs($doc, 'html');

OCR Processing — Extract text from scanned documents using Tesseract, making it possible to work with image-based or non-machine-readable PDFs. You'll need the Tesseract binary installed on your system and the path configured in config/paperdoc.php.

AI Augmentation — An optional Neuron AI integration adds LLM-powered capabilities such as document summarization, translation, and structured data extraction.

Thumbnails — Generate preview images of any document via thumbnail() or thumbnailDataUri() for inline use. Note that high-quality rendering requires third-party binaries depending on the format — LibreOffice for Office files, and Imagick or Ghostscript for PDFs.
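
Both OCR and thumbnails depend on tools outside PHP, so it can help to check what is available before enabling those features. The sketch below is plain PHP and not part of Paperdoc: findBinary is a hypothetical helper, and the binary names (tesseract, soffice, gs) are common defaults that may differ on your system. Imagick is a PHP extension rather than a binary, so it is checked with extension_loaded().

```php
<?php

/**
 * Return the full path of an executable found on PATH, or null.
 *
 * Hypothetical helper, not part of Paperdoc.
 */
function findBinary(string $name): ?string
{
    $path = getenv('PATH') ?: '';

    // On Windows, executables usually carry an extension such as .exe.
    $suffixes = PHP_OS_FAMILY === 'Windows' ? ['.exe', '.bat', '.cmd', ''] : [''];

    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue;
        }

        foreach ($suffixes as $suffix) {
            $candidate = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name . $suffix;

            if (is_file($candidate) && is_executable($candidate)) {
                return $candidate;
            }
        }
    }

    return null;
}

// Assumed binary names: Tesseract (OCR), LibreOffice, Ghostscript.
foreach (['tesseract', 'soffice', 'gs'] as $binary) {
    echo str_pad($binary, 10), findBinary($binary) ?? 'not found', PHP_EOL;
}

echo str_pad('imagick', 10), extension_loaded('imagick') ? 'loaded' : 'not loaded', PHP_EOL;
```

A path found this way is a candidate for the Tesseract setting in config/paperdoc.php.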

Batch Processing — Open and process multiple documents in a single call:

$docs = DocumentManager::openBatch([
    'file1.pdf',
    'file2.docx',
    'file3.xlsx',
]);
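
In practice the file list often comes from a directory rather than a hard-coded array. Here is a minimal sketch of building that list in plain PHP: filterByExtension is a hypothetical helper and incoming/ is a placeholder directory, neither of which comes from Paperdoc.

```php
<?php

/**
 * Keep only the paths whose extension is in the allowed list.
 *
 * Hypothetical helper, not part of Paperdoc. Matching is
 * case-insensitive; $extensions should be lowercase.
 */
function filterByExtension(array $paths, array $extensions): array
{
    return array_values(array_filter(
        $paths,
        fn (string $path): bool => in_array(
            strtolower(pathinfo($path, PATHINFO_EXTENSION)),
            $extensions,
            true
        )
    ));
}

// glob() can return false on error, so fall back to an empty list.
$candidates = glob('incoming/*') ?: [];
$files = filterByExtension($candidates, ['pdf', 'docx', 'xlsx']);

// With Paperdoc installed:
// $docs = DocumentManager::openBatch($files);
```

Filtering first keeps stray files such as logs or archives out of the batch.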

Laravel Integration

Paperdoc ships with first-party Laravel support including a ServiceProvider, Facade, and Artisan commands, targeting Laravel 11+. The Facade is registered automatically via package auto-discovery, giving you a clean Paperdoc interface:

use Paperdoc\Facades\Paperdoc;
 
$doc = Paperdoc::create('md', 'Paperdoc + Laravel');
$doc->openSection()
    ->addParagraph('Paperdoc also works great with Laravel!');
 
Paperdoc::save($doc, storage_path('app/private/articles/paperdoc-and-laravel.md'));
 
Paperdoc::convert('articles/paperdoc-and-laravel.md', 'articles/paperdoc-laravel.pdf', 'pdf');
 
$html = Paperdoc::renderAs($doc, 'html');
 
$docs = Paperdoc::openBatch([
    'file1.pdf',
    'file2.docx',
    'file3.xlsx',
]);