GroupDocs.Search for Java is an enterprise-ready search library for the Java platform. It covers everything from simple to sophisticated search scenarios. You can create, merge, and update multiple indexes, then query them with simple, Boolean, fuzzy, regular expression (regex), and other query types to find the data you need in files, documents, and emails. It gives developers the building blocks for fast, robust, intelligent, and versatile search applications.
GroupDocs.Search for Java supports the following formats:
Microsoft Office Formats
- Excel: XLS, XLSX, XLSM, XLT, XLTX, XLAM, CSV, TSV, XLTM, XLSB, XLA
- PowerPoint: POT, POTX, PPS, PPSX, PPT, PPTX, PPTM, PPSM, POTM
- Word: DOC, DOCX, DOCM, DOT, DOTX, DOTM
- Diagram: VSD, VSS
- Microsoft Compiled HTML: CHM
- Project: MPP
- OneNote: ONE
OpenDocument & Other Formats
- Portable Document Format: PDF
- OpenDocument: ODT, OTT, ODS, OTS, ODP
- Email: PST, OST, MSG, EML, EMLX
- Web File Formats: XHTML, MHT, XML, HTM, HTML, MHTML
- Audio: MP3, WAV
- Video: AVI, MOV, QT, FLV, ASF
- Text: TXT
- Rich Text Format: RTF
- Markdown Documentation File: MD
- Images: TIFF, EMF, BMP, GIF, JP2, PNG, WEBP, WMF, JPG, PSD
- Others: TORRENT, DJVU, EPUB, FB2, ZIP, DCM
Search & Indexing
Developers can use the GroupDocs.Search for Java API to perform intelligent search and indexing. An index parses documents and stores the extracted data so that searches are fast and precise.
- Load Index: Load an existing index.
- Create Index: Create an index folder and add documents to it for indexing.
- Update Index: Update an index whenever documents are added, modified, or deleted, so that search results stay current.
- Add Documents to Index: Add documents to an existing index asynchronously.
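To make the indexing operations above concrete, here is a minimal in-memory inverted index in plain Java (JDK only). It is not the GroupDocs.Search API. The class `TinyIndex` and its methods are hypothetical, and the sketch only models adding, updating, removing, and searching documents. Loading an index from disk and asynchronous indexing are out of scope.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory inverted index; NOT the GroupDocs.Search API.
class TinyIndex {
    // term -> ids of the documents containing that term
    private final Map<String, Set<String>> postings = new HashMap<>();
    // document id -> indexed text, kept so a document can be un-indexed later
    private final Map<String, String> documents = new HashMap<>();

    // Lowercases the text and splits it on runs of non-letter, non-digit characters.
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
    }

    // Indexes a new document under the given id.
    void add(String id, String text) {
        documents.put(id, text);
        for (String term : tokenize(text)) {
            if (term.isEmpty()) continue; // leading separators yield an empty token
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // Handles a deleted document: drops every posting that points at it.
    void remove(String id) {
        String old = documents.remove(id);
        if (old == null) return;
        for (String term : tokenize(old)) {
            Set<String> ids = postings.get(term);
            if (ids != null) {
                ids.remove(id);
                if (ids.isEmpty()) postings.remove(term);
            }
        }
    }

    // Handles a modified document: removes the old postings and re-indexes it.
    void update(String id, String newText) {
        remove(id);
        add(id, newText);
    }

    // Returns the ids of documents containing the term, in sorted order.
    Set<String> search(String term) {
        return new TreeSet<>(postings.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of()));
    }
}
```

An inverted index maps each term to the documents that contain it. A search is therefore a single map lookup rather than a scan of every file. Updating a modified document here simply removes its old postings and adds it again.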
Merge Several Indices to Improve Search Effectiveness
Use GroupDocs.Search for Java to combine several indexes into one. An index that is updated frequently accumulates delta indices and becomes slow and inefficient to search. With the GroupDocs.Search for Java API, developers can quickly merge these delta indices into a single composite index. The merged index contains all the information from the delta indices and leaves the delta indices themselves intact. Searching one composite index is more efficient than searching many separate ones.
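The idea behind merging can be sketched with a simple term-to-documents map. This is a hypothetical JDK-only sketch, not the GroupDocs.Search API. `IndexMerger.merge` unions the posting lists of several delta indexes into a new composite index and leaves the inputs unmodified. A real index also has to merge stored text, settings, and deletions, which this sketch ignores.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of merging delta indexes; NOT the GroupDocs.Search API.
class IndexMerger {
    // Each index is modeled as term -> set of document ids.
    // Returns a new composite index; the input maps and sets are never modified.
    static Map<String, Set<String>> merge(List<Map<String, Set<String>>> deltas) {
        Map<String, Set<String>> composite = new TreeMap<>();
        for (Map<String, Set<String>> delta : deltas) {
            for (Map.Entry<String, Set<String>> e : delta.entrySet()) {
                // Union the posting lists for the same term across all deltas,
                // copying into a fresh set so the delta's own set is untouched.
                composite.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
            }
        }
        return composite;
    }
}
```

After merging, a query touches one map instead of one map per delta. This is the efficiency gain the paragraph describes.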
Generate HTML Markup with Stored Text in Index
GroupDocs.Search for Java can store the text of indexed documents in the index itself. This cached text can then be used to quickly generate HTML markup with the search results highlighted. That is more efficient than extracting the text from the original files again, and it works even when the source files are no longer available. The cached text can be stored at different compression levels, letting you balance disk space against indexing time.
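Here is a rough sketch of the two ideas in this paragraph, using only the JDK rather than the GroupDocs.Search API. Extracted text is stored compressed with `java.util.zip.Deflater` at a chosen level, from 0 (no compression) to 9 (best compression). It is then restored and turned into HTML, with each occurrence of a search term wrapped in a `<span>`. The class name `CachedTextSketch` and the `hit` CSS class are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch of cached text + highlighting; NOT the GroupDocs.Search API.
class CachedTextSketch {
    // Compresses extracted text at the given zlib level (0 = none ... 9 = best).
    static byte[] store(String text, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restores the cached text; the original source file is never read.
    static String load(byte[] stored) {
        Inflater inflater = new Inflater();
        inflater.setInput(stored);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // Guard against looping forever on truncated input.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated cached text");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt cached text", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Wraps each case-insensitive occurrence of term in a span, HTML-escaping everything.
    static String highlight(String text, String term) {
        Matcher m = Pattern.compile(Pattern.quote(term),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(text);
        StringBuilder html = new StringBuilder("<p>");
        int last = 0;
        while (m.find()) {
            html.append(escape(text.substring(last, m.start())))
                .append("<span class=\"hit\">").append(escape(m.group())).append("</span>");
            last = m.end();
        }
        html.append(escape(text.substring(last))).append("</p>");
        return html.toString();
    }

    // Minimal HTML escaping; '&' must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Higher levels generally produce smaller output but take longer to compress. That is the disk-space-versus-indexing-time trade-off described above.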
Use Fuzzy & Regex Search to Get Related Documents
Fuzzy and regex search let developers find documents that do not exactly match the query text. A fuzzy search also returns documents containing words similar to the query, and a regex search returns documents containing any word that matches a pattern. For example, a fuzzy search with GroupDocs.Search for Java for the query “Greatstone” returns documents containing the word “greatstone” as well as documents containing similar terms such as “Great Stone”. The results depend on the level of fuzziness you specify.
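To illustrate how fuzziness and patterns widen a match, this JDK-only sketch compares a query against a list of indexed terms. It is not the GroupDocs.Search API, and all names are hypothetical. Fuzzy matching uses Levenshtein edit distance, with `maxEdits` standing in for the fuzziness level. Regex matching uses `java.util.regex.Pattern`. GroupDocs.Search may use a different similarity measure internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch of fuzzy and regex term matching; NOT the GroupDocs.Search API.
class FuzzyRegexSketch {
    // Levenshtein distance: the minimum number of single-character insertions,
    // deletions and substitutions turning a into b (two-row dynamic programming).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Fuzzy search: keeps every term within maxEdits of the query, ignoring case.
    static List<String> fuzzyMatch(String query, List<String> terms, int maxEdits) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (editDistance(q, term.toLowerCase(Locale.ROOT)) <= maxEdits) hits.add(term);
        }
        return hits;
    }

    // Regex search: keeps every term that the pattern matches in full, ignoring case.
    static List<String> regexMatch(String regex, List<String> terms) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (p.matcher(term).matches()) hits.add(term);
        }
        return hits;
    }
}
```

With `maxEdits = 1`, a query for “Greatstone” matches both “greatstone” and “Great Stone” (one inserted space) but not “grindstone”, which is three substitutions away. The regex `great\s?stone` finds the same two terms.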
Recognise Search Queries Typed in a Different Keyboard Layout
GroupDocs.Search for Java can recognise search queries that were typed while the active keyboard layout did not match the language of the query. It recognises 100 languages and 184 keyboard layouts.
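Layout recognition comes down to asking what the user meant to type, since the same physical keys produce different letters under different layouts. The sketch below maps keys of the US QWERTY layout to the letters of the standard Russian layout, so that `ghbdtn` is reinterpreted as `привет` (“hello”). It is a hypothetical JDK-only illustration, not the GroupDocs.Search API, and covers a single layout pair. The Russian letters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of keyboard-layout remapping; NOT the GroupDocs.Search API.
class LayoutConverter {
    // US QWERTY keys, row by row, including the punctuation keys that carry
    // letters in the Russian layout.
    private static final String QWERTY = "qwertyuiop[]" + "asdfghjkl;'" + "zxcvbnm,.";
    // The Russian letters on the same physical keys, in the same order.
    private static final String RUSSIAN =
          "\u0439\u0446\u0443\u043A\u0435\u043D\u0433\u0448\u0449\u0437\u0445\u044A"
        + "\u0444\u044B\u0432\u0430\u043F\u0440\u043E\u043B\u0434\u0436\u044D"
        + "\u044F\u0447\u0441\u043C\u0438\u0442\u044C\u0431\u044E";

    private static final Map<Character, Character> QWERTY_TO_RUSSIAN = new HashMap<>();
    static {
        if (QWERTY.length() != RUSSIAN.length()) {
            throw new IllegalStateException("layout tables are misaligned");
        }
        for (int i = 0; i < QWERTY.length(); i++) {
            QWERTY_TO_RUSSIAN.put(QWERTY.charAt(i), RUSSIAN.charAt(i));
        }
    }

    // Re-reads a query as if it had been typed with the Russian layout active.
    // Characters without a mapping (digits, spaces, ...) are kept unchanged.
    static String qwertyToRussian(String typed) {
        StringBuilder sb = new StringBuilder(typed.length());
        for (char c : typed.toLowerCase(Locale.ROOT).toCharArray()) {
            char mapped = QWERTY_TO_RUSSIAN.getOrDefault(c, c);
            sb.append(mapped);
        }
        return sb.toString();
    }
}
```

A search engine could run the query both as typed and as converted, keeping whichever form matches indexed terms.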
Search with Morphological Word Forms
The GroupDocs.Search for Java API lets developers search for all forms of a word at once. A query for a noun or verb also finds its other forms, such as the singular and plural of a noun, or the root, third-person singular, and simple past forms of a verb.
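Word-form search is typically driven by a dictionary that groups inflected forms under a common lemma. This hypothetical sketch uses plain JDK code, not the GroupDocs.Search API or its built-in dictionary. It relies on a tiny hand-written lexicon and expands a query word to every form of its lemma. Searching for “child” therefore also matches “children”, and searching for “ran” also matches “running”.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of word-form expansion; NOT the GroupDocs.Search API.
class WordFormsSketch {
    // Tiny hand-written lexicon: lemma -> all of its forms, irregular ones included.
    private static final Map<String, List<String>> LEXICON = Map.of(
        "run", List.of("run", "runs", "ran", "running"),
        "child", List.of("child", "children"),
        "index", List.of("index", "indexes", "indices", "indexed", "indexing"));

    // Expands a query word to every form sharing its lemma.
    // Words missing from the lexicon expand only to themselves.
    static Set<String> expand(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (List<String> forms : LEXICON.values()) {
            if (forms.contains(w)) return new TreeSet<>(forms);
        }
        return new TreeSet<>(List.of(w));
    }

    // True if any word of the text is a form of the query word.
    static boolean matches(String text, String queryWord) {
        Set<String> forms = expand(queryWord);
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (forms.contains(token)) return true;
        }
        return false;
    }
}
```

A production system would rely on a full morphological dictionary or analyzer for each language rather than a hand-written table. The lookup-and-expand step stays the same.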