Heurist

Heurist is an Open Source online database builder and CMS publisher designed for Humanities research data and collections, including data on people, organisations, places, events, artefacts, documents, media, bibliographic records, contemporary stories and other data which is rich in text and classification data, richly interlinked, and often heterogeneous.

Heurist was originally designed by Ian Johnson (from 2005) and developed by the (now disbanded) Arts eResearch unit (AeR) at the University of Sydney. It continues to be actively developed within the Faculty of Arts and Social Sciences (version 6 released 2021). Free web services for building research databases are available at https://heuristplus.sydney.edu.au/ and https://heurist.Huma-Num.fr. New Heurist servers can be set up using installation packages downloadable from the project web site (http://HeuristNetwork.org). The source is available at https://github.com/HeuristNetwork/heurist.

Heurist was developed to overcome three problems identified as common to researchers in the Humanities (and others):


 * the technical expertise required to set up rich heterogeneous databases with relationships between entities, and to publish data selectively to the web
 * the fragmentation of research data across many separate poorly-connected or incompatible databases
 * problems of sustainability due to the ad hoc nature of custom database development requiring individual maintenance of each database

It aims to tackle these issues by:


 * providing a web service supporting the on-demand creation, management and population of new databases through a web interface, and the creation of CMS web sites embedded directly in the databases which have direct access to the database content.
 * allowing the storage and interlinking of a wide variety of research data, notes, annotations and digital attachments in a single shared database, while providing individual ‘views’ on this data and workgroup-owned and private areas for research in progress.
 * centralising the update and maintenance of thousands of databases, and automatically updating database formats with newer software versions to ensure backward compatibility (from ~2010). Data can also be dumped in a reloadable archival format.

Methodology
Heurist is written in PHP and JavaScript, on top of a fixed MySQL/MariaDB data structure: all Heurist databases have the same underlying MySQL structure, because the schema of the domain is encoded directly in the database as editable data. Entities/record types, fields, vocabularies and terms are defined through data within the database rather than being hardcoded in the software or database structure. Heurist uses a key-value pair approach linked to a primary data table instantiating typed entities, allowing variant data structures and repeating value fields (minimum cardinality 0 or 1, maximum 1 or many) with maintained order. Relationships between entities are implemented as record pointer fields (equivalent to a Foreign Key) and Relationship Marker fields (constraining the creation of relationship records linking any two records/entities).
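
The key-value approach described above can be illustrated with a minimal sketch. The table and column names used here (record_types, field_defs, records, details, seq) are hypothetical simplifications chosen for illustration, not Heurist's actual schema, and an in-memory SQLite database stands in for MySQL/MariaDB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Schema definitions are stored as data, not as table structure.
CREATE TABLE record_types (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE field_defs   (id INTEGER PRIMARY KEY, name TEXT, kind TEXT);
-- Primary table: one row per typed entity.
CREATE TABLE records (
    id      INTEGER PRIMARY KEY,
    type_id INTEGER REFERENCES record_types(id)
);
-- Key-value table: any number of ordered values per record and field.
CREATE TABLE details (
    record_id INTEGER REFERENCES records(id),
    field_id  INTEGER REFERENCES field_defs(id),
    seq       INTEGER,   -- preserves the order of repeated values
    value     TEXT
);
""")
conn.executemany("INSERT INTO record_types VALUES (?, ?)",
                 [(1, "Person"), (2, "Organisation")])
conn.executemany("INSERT INTO field_defs VALUES (?, ?, ?)",
                 [(1, "Name", "text"),
                  (2, "Alternate name", "text"),
                  (3, "Member of", "pointer")])
conn.executemany("INSERT INTO records VALUES (?, ?)", [(10, 1), (20, 2)])
conn.executemany("INSERT INTO details VALUES (?, ?, ?, ?)", [
    (20, 1, 0, "Example Society"),
    (10, 1, 0, "Jane Example"),
    (10, 2, 0, "J. Example"),   # repeating field, first value
    (10, 2, 1, "Jane E."),      # repeating field, second value
    (10, 3, 0, "20"),           # pointer field: value is the target record id
])

def field_values(record_id, field_name):
    """Return all values of one field for one record, in stored order."""
    rows = conn.execute(
        """SELECT d.value
             FROM details d JOIN field_defs f ON f.id = d.field_id
            WHERE d.record_id = ? AND f.name = ?
            ORDER BY d.seq""",
        (record_id, field_name)).fetchall()
    return [row[0] for row in rows]

def follow_pointer(record_id, field_name):
    """Resolve a pointer field to the names of the records it targets."""
    names = []
    for target in field_values(record_id, field_name):
        names.extend(field_values(int(target), "Name"))
    return names
```

In this sketch, adding a new record type or field is an INSERT into the definition tables rather than a change to the table structure, which is the property that lets every database share one physical schema.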

Heurist has the following field types, all of which can have multiple cardinality:


 * Numeric (integer or decimal)
 * Text (single line or memo)
 * Term lists (values from a controlled hierarchically organised vocabulary)
 * Date / time fields (including fuzzy dates and several alternative calendars)
 * Geographic (point, line, polygon, circle, rectangle)
 * Pointer fields allowing lookup of another record in the database (type constrained or unconstrained)
 * Relationship marker fields allowing the creation of typed, constrained, directional, dated and annotated relationships between records
 * File fields - uploaded to server or remote files referenced through a URL (including tiled images and IIIF)
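
How a field type combines with a cardinality constraint can be sketched as follows. The FieldDef class, its attribute names and the validate function are hypothetical and written only for illustration; the vocabulary is a flat set here, whereas Heurist's term lists are hierarchical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FieldDef:
    name: str
    kind: str                          # e.g. "numeric", "text", "term"
    min_count: int = 0                 # 0 = optional, 1 = required
    max_count: Optional[int] = None    # None = unlimited repeats
    vocabulary: frozenset = frozenset()  # allowed terms for "term" fields

def validate(field, values):
    """Return a list of problems; an empty list means the values are valid."""
    problems = []
    if len(values) < field.min_count:
        problems.append(f"{field.name}: at least {field.min_count} value(s) required")
    if field.max_count is not None and len(values) > field.max_count:
        problems.append(f"{field.name}: at most {field.max_count} value(s) allowed")
    for value in values:
        if field.kind == "numeric":
            try:
                float(value)
            except (TypeError, ValueError):
                problems.append(f"{field.name}: {value!r} is not numeric")
        elif field.kind == "term" and value not in field.vocabulary:
            problems.append(f"{field.name}: {value!r} is not in the vocabulary")
    return problems

# Example definitions: a required single-value field and a repeatable term field.
title = FieldDef("Title", "text", min_count=1, max_count=1)
period = FieldDef("Period", "term", vocabulary=frozenset({"Medieval", "Modern"}))
```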

Heurist provides several modes of data visualisation and export based on filtered subsets of the database: export in CSV, JSON, XML, KML, GeoJSON, GEXF for Gephi, and IIIF manifests; tabular listing; user-defined reporting using Smarty; interactive maps and timelines (for items with geographic or time fields); simple network diagrams; and crosstabulation. Widgets for these visualisations can be embedded in the CMS website generated from the database, or in standalone web pages or iframes in an external website.
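
As an illustration of exporting a filtered subset, the sketch below turns a few in-memory records into a GeoJSON FeatureCollection. The record layout, the to_geojson function and the sample data are assumptions made for this example; they do not reproduce Heurist's export code or its output format.

```python
import json

# Hypothetical records: "geo" holds a (kind, coordinates) pair or None.
# Coordinates follow the GeoJSON convention of [longitude, latitude].
records = [
    {"id": 1, "type": "Place", "title": "Sydney", "geo": ("point", [151.21, -33.87])},
    {"id": 2, "type": "Person", "title": "Jane Example", "geo": None},
    {"id": 3, "type": "Place", "title": "Paris", "geo": ("point", [2.35, 48.86])},
]

# Only geometry kinds with a direct GeoJSON equivalent are mapped here.
GEOJSON_TYPES = {"point": "Point", "line": "LineString", "polygon": "Polygon"}

def to_geojson(records, predicate):
    """Build a FeatureCollection from the records that pass the filter."""
    features = []
    for rec in records:
        if rec["geo"] is None or not predicate(rec):
            continue
        kind, coords = rec["geo"]
        if kind not in GEOJSON_TYPES:
            continue  # e.g. circles would need converting to polygons first
        features.append({
            "type": "Feature",
            "id": rec["id"],
            "geometry": {"type": GEOJSON_TYPES[kind], "coordinates": coords},
            "properties": {"title": rec["title"], "recordType": rec["type"]},
        })
    return {"type": "FeatureCollection", "features": features}

places = to_geojson(records, lambda rec: rec["type"] == "Place")
print(json.dumps(places, indent=2))
```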

Databases can be populated through form-based data entry, CSV import via a wizard which matches existing records and normalises data by extracting and linking entities based on selected columns, Zotero bibliography synchronisation, KML import, media uploads and indexing.
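
The matching and normalisation step of such a CSV import can be sketched as below. The import_rows function, the column names and the matching rule (trimmed, case-insensitive name comparison) are assumptions for illustration only; the actual wizard lets the user choose the columns and matching criteria.

```python
import csv
import io

def import_rows(csv_text, org_ids):
    """Import person rows, linking each one to an organisation record.

    org_ids maps a normalised organisation name to an existing record id.
    It is updated in place whenever a new organisation record is created.
    """
    people = []
    next_id = max(org_ids.values(), default=0) + 1
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row["organisation"].strip().casefold()
        if key not in org_ids:
            # No existing record matches, so extract a new entity.
            org_ids[key] = next_id
            next_id += 1
        # Store a link to the organisation record instead of the raw text.
        people.append({"name": row["name"].strip(),
                       "organisation_id": org_ids[key]})
    return people

sample = (
    "name,organisation\n"
    "Jane Example,Example Society\n"
    "John Sample,example society \n"
    "Ann Other,New Institute\n"
)
existing = {"example society": 7}
imported = import_rows(sample, existing)
```

Here the first two rows are matched to the existing organisation record despite differences in case and spacing, while the third row causes a new organisation record to be created and linked.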

Other functions include wizards to build simple or facetted searches, personal and shared saved searches, search expansion rules to pull in related records, workgroup ownership of records, group notifications, blogging, a bookmarklet for capturing web references, WYSIWYG formatted text, user and workgroup tags.

For developers there is an API and all the export formats are available as live feeds. XML output can be transformed through XSLT stored in records within the database (temporarily unavailable, due to be reinstated 2022). Heurist source code is available under GNU GPL from the GitHub repository at https://github.com/HeuristNetwork/heurist and can be installed on any LAMP server, including virtual servers in the NeCTAR Research cloud, Amazon AWS and virtual servers from most ISPs. It has also been successfully installed on Windows servers.

Applicability
Heurist was conceived as a digital knowledgebase for managing heterogeneous data with rich interlinking, in small to medium collections (typically <500K records), often rich in media, textual and categorisation data, such as those typically found in the Arts and Humanities, and in personal research spaces. It is not suitable for large, structured, homogeneous, numerical datasets typical of the Sciences.

Heurist allows management of information with spatial and temporal components. Spatial components include the ability to enter georeferenced points, polygons etc. directly into an editor, as well as the ability to upload spatial data such as KML and Shapefiles. Spatial data is displayed on a map view within the database. Temporal components include the ability to enter dates as calendar dates, ranges, fuzzy dates or radiocarbon dates, with confidence levels. Dates are displayed on a timeline generally linked to the map display.
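
One way to think about fuzzy dates on a timeline is as a range with an attached confidence level. The FuzzyDate class below is a hypothetical sketch working in whole years; it is not Heurist's internal date representation and ignores alternative calendars and radiocarbon calibration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FuzzyDate:
    earliest: int                 # earliest plausible year
    latest: int                   # latest plausible year
    confidence: str = "certain"   # free-text confidence label

    @classmethod
    def around(cls, year, margin, confidence="approximate"):
        """Build a date known only to within +/- margin years."""
        return cls(year - margin, year + margin, confidence)

    def overlaps(self, start, end):
        """True if any part of the range falls inside the window [start, end]."""
        return self.earliest <= end and self.latest >= start

# A timeline window of 1924 to 1930 would show the first date but not the second.
circa_1920 = FuzzyDate.around(1920, 5)
exact_1900 = FuzzyDate(1900, 1900)
```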

As of the end of 2021, Heurist supports a couple of hundred projects on the public servers, ranging from large ERC (Europe), AHRC (UK), ANR (France) and ARC (Australia) projects to many small personal projects such as PhD research, primarily in Humanities disciplines.

Example applications
A more extensive list of examples can be found at http://HeuristNetwork.org/Projects

Recent projects (last 5 years)
tbc

Older projects
These projects remain active (end 2021)


 * Beyond 1914 (beyond1914.sydney.edu.au) and Expert Nation (ExpertNation.org) - records of university staff and students involved in WWI (University of Sydney and nationwide, respectively). Developed 2013 & 2016. A website for the University of Adelaide runs off the same database.
 * Virtual Museum of Balinese Paintings (balipaintings.org) - research into 20thC Balinese paintings which links to works scattered across multiple collections in various countries. Developed ~2010.
 * Digital Harlem (DigitalHarlem.org) - search and mapping of events (mostly recorded in legal records) in Harlem from 1915 to 1930. First developed in 2003 and transferred to Heurist ~2013.

Past projects
These projects are complete or no longer active.


 * Heurist was used as the database to manage the cultural heritage information for the nomination of the Bahrain Pearling Trail as a World Heritage Site, which was successfully inscribed on the UNESCO World Heritage List in 2012. Cultural Heritage Managers at the former Ministry of Culture in Bahrain (now the Bahrain Authority for Culture & Antiquities) used Heurist to collate, analyse and manage the vast array of data associated with the nomination. This data included spatial polygons defining the properties to be included in the World Heritage Site, details of the properties (including timelines and history of ownership), details of people associated with the properties (including anthropological interviews with informants), and associated photographs, documents and plans, including architectural plans and legal documents. These items were all cross-referenced with relationships defining how they were associated with each other. This database was referred to in the Nomination file, accepted by UNESCO in 2012.
 * Federated Archaeological Information Management System - generation of database schemas and interoperability with Android field data collection system. Development of a new version of FAIMS in 2021/2022 has incorporated some of the application building and data management functions originally offered by the Heurist integration, while changes in format will require reprogramming of interoperability.
 * the Dictionary of Sydney - born digital in Heurist from 2006 (first published 2009), the public web site was generated directly from the Heurist database until ~2016, when the project was transferred to the State Library of NSW and converted to their internal systems.
 * the Australian Broadcasting Corporation Gallipoli project [1] - events stored in Heurist and generated as XML for input to the visualisation
 * Early Agricultural Remnants and Technical Heritage (EARTH) Programme [2] - database of photographic and video recordings of agricultural practice