PDFlib TET – PDF Text & Image Extraction Engine

Extract text, images, metadata and structured content from PDFs with exceptional accuracy.

PDFlib Text and Image Extraction Toolkit (TET) is a professional, high-performance library that enables developers to extract meaningful content from PDF documents. Whether you're building search systems, content pipelines, AI/ML data processing workflows, or automated document analysis tools, TET provides the precision and control needed for reliable PDF extraction at scale.

Greatstone Software is an authorised UK distributor of TET, offering competitive pricing, expert product guidance, and fast licence delivery for all deployment environments.

Prices start from £970.00. A free trial is available.

PDFlib TET 6.0 licence prices (licence type: 1 - 4 licences):

IBM AIX: £6,345.00
IBM i5/iSeries: £6,345.00
Linux: £2,775.00
OS X Desktop: £970.00
Windows Desktop: £970.00
Windows Server: £2,775.00

Email [email protected] for up to 60% volume discount.

Overview

PDFlib TET is a specialised extraction toolkit designed to read and interpret the internal structure of PDF files, making it possible to retrieve:

Text in any language, script or encoding
Embedded images and graphics
Content position, geometry and structural relationships
Document metadata, XMP data and page information
Reading order and logical layout where available

Unlike simple PDF-to-text converters, TET operates at a deep structural level, delivering high-fidelity extraction even from complex, irregular or poorly formed PDFs. This makes it ideal for enterprise systems that depend on accurate content retrieval, indexing or transformation.

TET is widely used across industries that need to analyse or repurpose information contained in PDF documents, including:

Search platforms & content indexing engines
Digital archives & library systems
Machine learning and natural-language processing workflows
Financial and legal document processing
Invoice, receipt and statement extraction
Healthcare and regulatory document parsing
Large-scale text mining & analytics
Metadata extraction for content management systems

If your workflow requires you to understand what’s inside a PDF (text, images, fonts, reading order), TET is built to extract that content accurately and reliably.

Features

High-Accuracy Text Extraction

Extracts plain text, structured text or positional text

Handles Unicode, ligatures and complex scripts

Works with PDFs using advanced or embedded font formats

Supports both physical layout extraction and logical reading order

Ideal for search, indexing, classification and data processing.
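
To make the difference between physical layout and logical reading order concrete, here is a minimal Python sketch. It does not use the TET API: the `Word` type, the `reading_order` function and the `line_tolerance` parameter are hypothetical names chosen for illustration, and the coordinate convention (origin at the bottom left of the page) is an assumption. The sketch groups positioned words into lines by vertical position, then reads each line left to right.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A word and the position of its lower-left corner (hypothetical type)."""
    text: str
    x: float
    y: float


def reading_order(words, line_tolerance=2.0):
    """Return text lines from top to bottom, each read left to right.

    A word joins the current line when its y value differs from the first
    word of that line by at most `line_tolerance`.
    """
    lines = []  # each entry is the list of words on one line
    # Assumption: larger y means higher on the page (origin at bottom left).
    for word in sorted(words, key=lambda w: -w.y):
        if lines and abs(lines[-1][0].y - word.y) <= line_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x))
            for line in lines]


# Words in an arbitrary order, as they might be stored inside a PDF page.
scattered = [
    Word("world", 60.0, 700.0),
    Word("line", 60.0, 680.5),
    Word("Hello", 10.0, 700.0),
    Word("Second", 10.0, 680.0),
]
print(reading_order(scattered))
```

A simple top-to-bottom, left-to-right sort like this breaks down on multi-column pages, which is where a dedicated extraction engine earns its keep.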

Image Extraction

Extract embedded images in original resolution and format

Supports JPEG, PNG, TIFF, CCITT and other common image types

Save or process images programmatically

Perfect for OCR pipelines, AI training datasets, or document analysis workflows.

Structural Content Analysis

Retrieve text positions, bounding boxes and typographic details

Extract hierarchical text structures (pages, blocks, lines, words, glyphs)

Supports Tagged PDF structure for semantic extraction

Essential for applications requiring layout-aware understanding.
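
The hierarchy described above can be pictured with a small Python data model. This is an illustrative sketch only and not TET's API or output format: `Box`, `Word` and `Line` are hypothetical types, and only the word and line levels are shown. The same pattern would repeat for blocks and pages, with each level's bounding box computed as the union of its children's boxes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box: lower-left (x0, y0), upper-right (x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def union(self, other):
        """Smallest box that contains both self and other."""
        return Box(min(self.x0, other.x0), min(self.y0, other.y0),
                   max(self.x1, other.x1), max(self.y1, other.y1))


@dataclass
class Word:
    text: str
    box: Box


@dataclass
class Line:
    words: list = field(default_factory=list)

    @property
    def text(self):
        return " ".join(w.text for w in self.words)

    @property
    def box(self):
        # The line's box is the union of its words' boxes.
        # Assumes the line holds at least one word.
        result = self.words[0].box
        for w in self.words[1:]:
            result = result.union(w.box)
        return result


line = Line([
    Word("Invoice", Box(10, 700, 52, 712)),
    Word("total", Box(56, 700, 80, 710)),
])
print(line.text, line.box)
```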

Metadata & Document Information

Extract detailed document attributes such as:

XMP metadata

Document info (author, subject, keywords, etc.)

Page size, rotation and attributes

Font information and resource usage

Useful for document indexing and compliance workflows.

Multilingual & Global Script Support

TET handles virtually every script and writing system, including:

Latin, Cyrillic, Greek

Arabic, Hebrew

Chinese, Japanese, Korean

Indic scripts