
What is PDFlib TET and How is it Used?

18 Feb 22

Without the right toolkit, extracting content from PDF documents can be extremely difficult. PDFlib TET (Text and Image Extraction Toolkit) is built for exactly this task, and it is one of many useful software developer components offered by Greatstone International. In this post, we explain what PDFlib TET is and look at some of its primary uses.

What is PDFlib TET?

PDFlib TET, part of the PDFlib suite, is a toolkit that allows developers to extract text, imagery, and metadata from PDF documents. Text is extracted as Unicode strings, together with detailed colour, glyph and font information and its position on the page, whilst raster images are extracted in popular image formats, including TIFF and JPEG.

How is it used?

Core uses of PDFlib TET include:

  • Conversion – It can convert PDF documents to TETML, an XML-based format containing text, metadata, and resource information.
  • Processing – When used alongside PDFlib PDI, TET can process PDF documents based on their contents, for example, splitting PDFs at document headings.
  • Indexing – It allows developers to implement a PDF indexer for a search engine.
  • Repurposing – Extracted text and images can be repurposed in other documents or applications.
  • Inspection – It enables developers to dynamically check if a target location on the page is empty before inserting a barcode or stamp.
  • Analysis – With sophisticated content analysis algorithms, TET can determine word boundaries, group text into columns, detect table structures and remove redundant items.
  • Querying – Developers can query PDF document details such as XMP metadata, font lists, page size and document information fields.
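To illustrate the inspection use case, here is a minimal Python sketch of the geometric test involved. It does not call TET itself: it assumes you have already collected glyph bounding boxes (for example, from the glyph position information TET reports) as (llx, lly, urx, ury) tuples in page coordinates. The function names rects_overlap and area_is_empty are hypothetical, and the check only considers text, not images or vector graphics.

```python
from typing import Iterable, Tuple

# A rectangle in PDF page coordinates: (llx, lly, urx, ury),
# i.e. lower-left and upper-right corners, in points.
# Assumption: glyph boxes were gathered beforehand from TET's output.
Rect = Tuple[float, float, float, float]


def rects_overlap(a: Rect, b: Rect) -> bool:
    """Return True if rectangles a and b share any interior area.

    Rectangles that merely touch along an edge do not count as overlapping.
    """
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def area_is_empty(target: Rect, glyph_boxes: Iterable[Rect]) -> bool:
    """Return True if no glyph bounding box intersects the target area."""
    return not any(rects_overlap(target, box) for box in glyph_boxes)


if __name__ == "__main__":
    # Hypothetical glyph boxes near the top of a page.
    glyphs = [(72.0, 700.0, 80.0, 712.0), (80.0, 700.0, 88.0, 712.0)]
    # Hypothetical spot near the bottom of the page for a barcode.
    barcode_spot = (400.0, 50.0, 550.0, 100.0)
    print(area_is_empty(barcode_spot, glyphs))  # True
```

In practice you would probably add a safety margin around the target area, and also check images and graphics if the stamp must not cover anything at all.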

Speak to the PDFlib experts

It is one thing to understand PDFlib TET and what it can do, but how can you begin using it? Speak to the team at Greatstone International and we will help you get started.
