Recoll

Recoll is a desktop search tool that provides full-text search in a GUI with few mandatory external dependencies. It runs on many Unix-like operating systems and is mostly independent of the desktop environment. Recoll has been ported to OS/2, and is planned for integration into the OS/2-based ArcaOS.

Recoll was designed not to require a permanent daemon. By default, it updates its index at scheduled intervals (for example, through cron jobs). If desired, the indexing task can instead run as a file-system monitoring daemon for real-time index updates; on Linux systems, this mode makes use of inotify.
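
The two indexing modes described above can be sketched as a small Python wrapper. This is an illustration, not part of Recoll: it assumes the recollindex command is on the PATH and that its -m option selects the real-time monitoring mode, as described in the Recoll manual. The function names indexer_command and run_indexer are hypothetical.

```python
import subprocess


def indexer_command(realtime=False):
    """Build the command line for one of the two indexing modes.

    Assumption: "recollindex" performs a batch index update, and
    "recollindex -m" starts the file-system monitoring indexer.
    """
    command = ["recollindex"]
    if realtime:
        command.append("-m")
    return command


def run_indexer(realtime=False):
    """Run the indexer and return its exit status (hypothetical helper)."""
    return subprocess.run(indexer_command(realtime), check=False).returncode
```

For periodic batch updates, a crontab entry such as "0 3 * * * recollindex" (a nightly run at 03:00, chosen here only as an example) would play the same role as calling run_indexer() on a schedule.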

Features

 * Qt GUI.
 * Xapian backend.
 * Indexes the contents of many document types: text, HTML, email stores of all kinds, OpenDocument, Microsoft Office and Office Open XML, AbiWord, KWord, Gaim, LyX, Scribus, PDF, WordPerfect, PostScript, RTF, TeX, DVI, DjVu, MP3 and other audio file formats, JPEG and other image file formats.
 * Recursively processes embedded documents (email attachments, zip archives) to arbitrary depths.
 * Query facilities with Boolean searches, wildcards, phrases, proximity, and filters on file types and directory trees.
 * GUI Boolean search build tool.
 * Xesam query language support.
 * Word stemming is performed at query time (you can switch stemming language after indexing).
 * Multiple indexes are selectable at query time (e.g., personal + system indexes).
 * Natively based on Unicode. Supports many languages and character sets, including good support for East Asian texts (CJK).
 * MD5 document hashes for the elimination of duplicates in results.
 * Batch and real-time indexing modes.
 * Python API.
 * GNOME Shell search provider, web interface, and Firefox history extensions.
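
The duplicate elimination mentioned in the feature list can be illustrated with a minimal Python sketch. It is not Recoll's implementation: it only shows the general idea of hashing each document's content with MD5 and keeping the first result for each distinct hash. The function name drop_duplicates and the (name, content) input format are assumptions made for the example.

```python
import hashlib


def drop_duplicates(results):
    """Return the names of results whose content has not been seen before.

    `results` is an iterable of (name, content_bytes) pairs; documents
    with identical content share an MD5 digest and are reported once.
    """
    seen = set()
    unique = []
    for name, content in results:
        digest = hashlib.md5(content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(name)
    return unique
```

MD5 serves here only as a content fingerprint, not as a security measure; two results are treated as duplicates exactly when their contents are byte-for-byte identical.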

File types indexed natively

 * Text.
 * HTML.
 * Maildir, MH, and mailbox (Mozilla, Thunderbird, and Evolution). To index Evolution's local copies of IMAP mail, .cache must be removed from the skippedNames list in the Local Parameters pane of the GUI indexing preferences.
 * Gaim and purple log files.
 * Scribus files.
 * Man pages (needs Groff).
 * MHTML (MIME HTML) web archive format (support based on the mail filter).
 * All the following need Python 3:
    * Dia diagrams.
    * Excel and PowerPoint (pre-Office Open XML).
    * Tar archives. Tar file indexing is disabled by default, because tar archives don't typically contain the kind of documents that people search for; it needs to be enabled explicitly by adding the line "application/x-tar = execm rcltar" under an "[index]" section in the $HOME/.recoll/mimeconf file.
    * Zip archives.
    * Konqueror web archive format (uses the tarfile Python standard library module).
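
Because archives can contain other archives, indexing them calls for the recursive processing of embedded documents noted in the feature list. The following Python sketch shows that idea for zip archives only, using the standard zipfile module. It is an illustration rather than Recoll's actual handler: the function names and the "|" separator used to build the path of an embedded document are arbitrary choices for this example.

```python
import io
import zipfile


def walk_zip(data, prefix=""):
    """Yield (path, content) for every file in a zip, descending into nested zips."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            member = archive.read(info)
            path = prefix + info.filename
            if zipfile.is_zipfile(io.BytesIO(member)):
                # The member is itself an archive: recurse to arbitrary depth.
                yield from walk_zip(member, prefix=path + "|")
            else:
                yield path, member


def make_zip(files):
    """Build an in-memory zip from a {name: bytes} mapping (helper for the demo)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()
```

A real indexer would hand each yielded document to the handler for its file type instead of returning raw bytes. Note that zip-based formats such as OpenDocument files would also be detected as archives by this simple check, so a practical version would dispatch on the file type first.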

File types indexed with external helpers

 * PDF files.
 * MS-Word files.
 * WordPerfect files.
 * RTF files.
 * Image and audio file tags.
 * AbiWord files.
 * FB2, EPUB, and CHM ebooks.
 * KWord files.
 * Microsoft Office traditional and Open XML files.
 * OpenOffice files.
 * SVG files.
 * Okular annotations files.
 * HWP files (without page numbering).