Tagsistant

Tagsistant is a semantic file system for the Linux kernel, written in C and based on FUSE. Unlike traditional file systems that use hierarchies of directories to locate objects, Tagsistant introduces the concept of tags.

Design and differences with hierarchical file systems
In computing, a file system is a type of data store that can be used to store, retrieve and update files. Each file can be uniquely located by its path. The user must know the path in advance to access a file, and the path does not necessarily include any information about the content of the file.

Tagsistant uses a complementary approach based on tags. The user can create a set of tags and apply those tags to files, directories and other objects (devices, pipes, ...). The user can then search for all the objects that match a subset of tags, called a query. This approach is well suited to managing user content such as pictures, audio recordings, movies and text documents, but is incompatible with system files (such as libraries, commands and configurations), where an unambiguous path is a security requirement that prevents access to the wrong content.

The tags/ directory
A Tagsistant file system features four main directories:


 * archive/
 * relations/
 * stats/
 * tags/

Tags are created as subdirectories of the tags/ directory and can be used in queries complying with this syntax:



where a subquery is an unlimited list of tags, concatenated as directories:



The portion of a path delimited by  and   is the actual query. The  operator joins the results of different sub-queries in one single list. The  operator ends the query.

To be returned as a result of the following query:



an object must be tagged as both  and   or as both   and. Any object tagged as  or , but not as   will not be retrieved.
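The matching rule described above amounts to a union of intersections: an object is returned when it carries every tag of at least one subquery. The following Python sketch illustrates that rule only. The tag names t1, t2 and t4, the object names and the in-memory dictionary are hypothetical; Tagsistant itself is written in C and resolves queries through its SQL database.

```python
from typing import Dict, Iterable, Set


def evaluate_query(subqueries: Iterable[Iterable[str]],
                   tagging: Dict[str, Set[str]]) -> Set[str]:
    """Return the objects that carry all tags of at least one subquery."""
    results: Set[str] = set()
    for subquery in subqueries:
        required = set(subquery)
        if not required:
            continue  # an empty subquery selects nothing in this sketch
        for obj, tags in tagging.items():
            # Subset test: every required tag must be applied to the object.
            if required <= tags:
                results.add(obj)
    return results


# Hypothetical objects and tags, used only for illustration.
tagging = {
    "a.jpg": {"t1", "t2"},
    "b.jpg": {"t1", "t4"},
    "c.jpg": {"t2", "t4"},
    "d.jpg": {"t1"},
}

# Two subqueries: (t1 and t2) or (t1 and t4).
matches = evaluate_query([["t1", "t2"], ["t1", "t4"]], tagging)
print(sorted(matches))
```

Because each subquery is treated as a set, the order in which its tags are listed does not affect the result.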

The query syntax deliberately violates the POSIX file system semantics by allowing a path token to be a descendant of itself, like in  where   appears twice. As a consequence a recursive scan of a Tagsistant file system will exit with an error or endlessly loop, as done by UNIX  :

This drawback is balanced by the possibility to list the tags inside a query in any order. The query  is completely equivalent to   and   is equivalent to.

The  element has the precise purpose of restoring the POSIX semantics: the path   refers to a traditional directory and a recursive scan of this path will complete correctly.

The reasoner and the relations/ directory
Tagsistant features a simple reasoner which expands the results of a query by including objects tagged with related tags. A relation between two tags can be established inside the relations/ directory following a three-level pattern:



The  element can be includes or is_equivalent. To include the rock tag in the music tag, the UNIX command  can be used:



The reasoner can recursively resolve relations, allowing the creation of complex structures:



The web of relations created inside the relations/ directory constitutes a basic form of ontology.
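The recursive resolution described in this section can be pictured as a graph traversal. The Python sketch below is an illustration under stated assumptions, not Tagsistant's actual reasoner: relations are given as hypothetical (tag, relation, tag) triples, includes is treated as one-directional, and is_equivalent is assumed to be symmetric. The tags music and rock come from the text above; hard_rock and rock_music are invented for the example.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Relation = Tuple[str, str, str]


def build_graph(relations: Iterable[Relation]) -> Dict[str, Set[str]]:
    """Turn (tag, relation, tag) triples into an adjacency map."""
    graph: Dict[str, Set[str]] = {}
    for left, rel, right in relations:
        if rel == "includes":
            graph.setdefault(left, set()).add(right)
        elif rel == "is_equivalent":
            # Assumption: equivalence is followed in both directions.
            graph.setdefault(left, set()).add(right)
            graph.setdefault(right, set()).add(left)
        else:
            raise ValueError(f"unknown relation: {rel}")
    return graph


def expand_tag(tag: str, graph: Dict[str, Set[str]]) -> Set[str]:
    """Return the tag plus every tag reachable through relations."""
    seen = {tag}
    queue = deque([tag])
    while queue:
        current = queue.popleft()
        for related in graph.get(current, ()):
            if related not in seen:  # also guards against cycles
                seen.add(related)
                queue.append(related)
    return seen


relations = [
    ("music", "includes", "rock"),
    ("rock", "includes", "hard_rock"),        # hypothetical tag
    ("rock", "is_equivalent", "rock_music"),  # hypothetical tag
]
graph = build_graph(relations)
print(sorted(expand_tag("music", graph)))
```

In this model a query for music would also consider objects tagged rock, hard_rock or rock_music, while a query for hard_rock is not widened because nothing is included in it.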

Autotagging plugins
Tagsistant features an autotagging plugin stack which gets called when a file or a symlink is written. Each plugin is called if its declared MIME type matches that of the object being written.

The list of working plugins released with Tagsistant 0.6 is limited to:


 * text/html: tags the file with each word in  and   elements and with document, webpage and html too
 * image/jpeg: tags the file with each Exif tag
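The dispatch described above, where each plugin runs only when its declared MIME type matches, can be sketched as follows. This is a hedged Python illustration: the Plugin class, the tagger functions and run_plugins are hypothetical names, the MIME type is passed in by the caller rather than detected, and the real plugins are C modules whose interface is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Plugin:
    mime_type: str                       # MIME type the plugin declares
    tagger: Callable[[str], Set[str]]    # returns the tags to apply


def html_tagger(path: str) -> Set[str]:
    # The fixed tags named in the text; word extraction is omitted here.
    return {"document", "webpage", "html"}


def jpeg_tagger(path: str) -> Set[str]:
    # Placeholder: a real plugin would read the Exif tags from the file.
    return set()


PLUGINS: List[Plugin] = [
    Plugin("text/html", html_tagger),
    Plugin("image/jpeg", jpeg_tagger),
]


def run_plugins(mime_type: str, path: str, plugins: List[Plugin]) -> Set[str]:
    """Call every plugin whose declared MIME type matches the object's."""
    tags: Set[str] = set()
    for plugin in plugins:
        if plugin.mime_type == mime_type:
            tags |= plugin.tagger(path)
    return tags


print(sorted(run_plugins("text/html", "page.html", PLUGINS)))
```

An object whose MIME type matches no plugin simply receives no automatic tags in this sketch.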

The repository
Each Tagsistant file system has a corresponding repository containing an  directory where the objects are actually saved and a   file holding tagging information as an SQLite database. If the MySQL database engine was specified with the  argument, the   file will be empty. Another file named  is a GLib ini store with the repository configuration.

Tagsistant 0.6 is compatible with the MySQL and SQLite dialects of SQL for tag reasoning and tagging resolution. While porting its logic to other SQL dialects is possible, differences in basic constructs (especially the INTERSECT SQL keyword) must be considered.
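To show why INTERSECT matters for tag resolution, the following Python sketch runs the same two-tag lookup in SQLite twice: once with INTERSECT and once with a GROUP BY/HAVING formulation that a dialect lacking INTERSECT could use instead. The tagging table, its columns and the sample rows are hypothetical and do not reflect Tagsistant's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (object, tag) pair.
conn.execute("CREATE TABLE tagging (object TEXT, tag TEXT)")
conn.executemany(
    "INSERT INTO tagging (object, tag) VALUES (?, ?)",
    [("a.jpg", "t1"), ("a.jpg", "t2"), ("b.jpg", "t1"), ("c.jpg", "t2")],
)

# Objects carrying both tags, expressed with INTERSECT.
rows = conn.execute(
    """
    SELECT object FROM tagging WHERE tag = ?
    INTERSECT
    SELECT object FROM tagging WHERE tag = ?
    ORDER BY object
    """,
    ("t1", "t2"),
).fetchall()
intersect_result = [row[0] for row in rows]

# The same lookup without INTERSECT: keep objects matching both tags.
rows = conn.execute(
    """
    SELECT object FROM tagging WHERE tag IN (?, ?)
    GROUP BY object
    HAVING COUNT(DISTINCT tag) = 2
    ORDER BY object
    """,
    ("t1", "t2"),
).fetchall()
grouped_result = [row[0] for row in rows]

print(intersect_result, grouped_result)
conn.close()
```

Both statements select the objects tagged with every requested tag; which form a port would need depends on the target dialect.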

The archive/ and stats/ directories
The archive/ directory has been introduced to provide a quick way to access objects without using tags. Objects are listed with their inode number prefixed.

The stats/ directory features some read-only files containing usage statistics. A file  holds both compile time information and current repository configuration.

Main criticisms
It has been highlighted that relying on an external database to store tags and tagging information could cause the complete loss of metadata if the database gets corrupted.

It has been highlighted that using a flat namespace tends to overcrowd the tags/ directory. This could be mitigated by introducing triple tags.