PDFlib TET – PDF Text & Image Extraction Engine

Extract text, images, metadata and structured content from PDFs with exceptional accuracy.

PDFlib Text and Image Extraction Toolkit (TET) is a professional, high-performance library that enables developers to extract meaningful content from PDF documents. Whether you're building search systems, content pipelines, AI/ML data processing workflows, or automated document analysis tools, TET provides the precision and control needed for reliable PDF extraction at scale.

Greatstone Software is an authorised UK distributor of TET, offering competitive pricing, expert product guidance, and fast licence delivery for all deployment environments.

PDFlib TET 6.0 pricing (licence type: 1 - 4 licences)

  • IBM AIX: £6,445.00
  • IBM i5/iSeries: £6,445.00
  • Linux: £2,820.00
  • OS X Desktop: £985.00
  • Windows Desktop: £985.00
  • Windows Server: £2,820.00

Email [email protected] for up to 60% volume discount.

Overview

PDFlib TET is a specialised extraction toolkit designed to read and interpret the internal structure of PDF files, making it possible to retrieve:

  • Text in any language, script or encoding
  • Embedded images and graphics
  • Content position, geometry and structural relationships
  • Document metadata, XMP data and page information
  • Reading order and logical layout where available

Unlike simple PDF-to-text converters, TET operates at a deep structural level, delivering high-fidelity extraction even from complex, irregular or poorly formed PDFs. This makes it ideal for enterprise systems that depend on accurate content retrieval, indexing or transformation.

TET is widely used across industries that need to analyse or repurpose information contained in PDF documents, including:

  • Search platforms & content indexing engines
  • Digital archives & library systems
  • Machine learning and natural-language processing workflows
  • Financial and legal document processing
  • Invoice, receipt and statement extraction
  • Healthcare and regulatory document parsing
  • Large-scale text mining & analytics
  • Metadata extraction for content management systems

If your workflow requires you to understand what's inside a PDF (text, images, fonts, reading order), TET is built to deliver that content accurately and reliably.

Features

High-Accuracy Text Extraction

  • Extracts plain text, structured text or positional text
  • Handles Unicode, ligatures and complex scripts
  • Works with PDFs using advanced or embedded font formats
  • Supports both physical layout extraction and logical reading order

Ideal for search, indexing, classification and data processing.

Image Extraction

  • Extracts embedded images in original resolution and format
  • Supports JPEG, PNG, TIFF, CCITT and other common image types
  • Lets you save or process images programmatically

Perfect for OCR pipelines, AI training datasets, or document analysis workflows.

Structural Content Analysis

  • Retrieves text positions, bounding boxes and typographic details
  • Extracts hierarchical text structures (pages, blocks, lines, words, glyphs)
  • Supports Tagged PDF structure for semantic extraction

Essential for applications requiring layout-aware understanding.

Metadata & Document Information

Extract detailed document attributes such as:

  • XMP metadata
  • Document info (author, subject, keywords, etc.)
  • Page size, rotation and attributes
  • Font information and resource usage

Useful for document indexing and compliance workflows.

Multilingual & Global Script Support

TET handles virtually every script and writing system, including:

  • Latin, Cyrillic, Greek
  • Arabic, Hebrew
  • Chinese, Japanese, Korean
  • Indic scripts