https://www.mandalka.name/investigateix/
InvestigateIX
InvestigateIX is a live system that lets investigative journalists set up their own open-source search engine on an encrypted external device. It supports search, overview, analysis, exploration, document discovery, text mining and document mining across large collections of documents, files and data.
Search in a large collection of documents
your own internal search engine
search in many documents and files (full text search, explorative search and interactive filters)
analyze documents (document mining and text mining, aggregated overviews, viewer, wordlists and visualizations with wordclouds and trend charts)
structure investigations (semantic wiki for tagging documents, annotations and structured notes, and a user interface for managing named entities like persons, organizations, locations or concepts)
automatic import of many different document formats:
automatic text recognition (OCR) for images and graphical files like JPG or PNG, e.g. scanned or photographed documents, and for scans inside PDF files
secure: on an encrypted external device
unhosted: works offline, with no need for an internet connection or for cloud services that spy on you
empowering journalists and activists: without need for an admin or computer specialist
easy to set up
easy to use
cheap to use: one old standard PC or laptop, or even a netbook, is enough
free open source software based on Debian GNU/Linux (operating system), Apache Solr (enterprise search), Open Semantic Search (search engine and user interface) and Cryptsetup for dm-crypt and LUKS (disk encryption)
extendable: using open standards and offering a powerful plugin interface, it lets you write your own search tools or data enrichment plugins with a few lines of code
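
To give a feel for what a data enrichment plugin of a few lines could look like, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the documented plugin API: the class name EnhanceExtractEmail, the process(parameters, data) signature and the field names content_txt and email_ss are all hypothetical and should be checked against the Open Semantic Search documentation before use. The idea is simply that a plugin receives a document's extracted fields, adds new ones, and hands the result back for indexing.

```python
import re

# Hypothetical plugin: the class name, the process() signature and the
# field names used below are illustrative assumptions, not a documented API.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class EnhanceExtractEmail:
    """Adds every email address found in a document's text as an extra field."""

    def process(self, parameters=None, data=None):
        # Fall back to empty dicts so the plugin can be called without arguments.
        parameters = {} if parameters is None else parameters
        data = {} if data is None else data

        # "content_txt" is an assumed name for the extracted full text.
        text = data.get("content_txt", "")

        # Deduplicate and sort so the stored values are stable across runs.
        emails = sorted(set(EMAIL_PATTERN.findall(text)))
        if emails:
            # "email_ss" is an assumed name for a multi-valued string field.
            data["email_ss"] = emails

        return parameters, data


if __name__ == "__main__":
    plugin = EnhanceExtractEmail()
    _, doc = plugin.process(
        data={"content_txt": "Contact alice@example.org or bob@example.com."}
    )
    print(doc["email_ss"])
```

Keeping the plugin a plain class with a single method that returns the enriched fields means it can be tried out on a dictionary, as in the last lines, without a running search engine.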