Pipeline Pilot

Pipeline Pilot is a desktop software program sold by Dassault Systèmes for processing and analyzing data. It was originally used for its basic ETL (extract, transform, and load) and analytics capabilities, which have broadened over time.

The program lets users design data workflows through a graphical user interface. It is a visual, dataflow programming tool and has been used in cheminformatics, QSAR, next-generation sequencing, image analysis, and text analytics.

History
Pipeline Pilot was created by SciTegic. Accelrys acquired SciTegic and Pipeline Pilot in 2004. Accelrys was itself purchased by Dassault Systèmes in 2014 and renamed BIOVIA.

The product expanded from an initial focus on chemistry to include general extract, transform, and load (ETL), analytical, and data processing capabilities.

Overview
Pipeline Pilot is part of a class of software products that provide user interfaces for manipulating and analyzing data. Like other graphical ETL products, it enables users to pull from different data sources, such as CSV files, text files, and databases.
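As a rough illustration of what pulling from different data sources involves, the following Python sketch reads records from CSV text and from a database table and combines them on a shared key. It is not Pipeline Pilot code: the field names (`sample_id`, `name`, `value`), the in-memory SQLite table, and the sample rows are all made up for the example.

```python
import csv
import io
import sqlite3

# Source 1: CSV data (held in a string here instead of a file on disk).
csv_text = "sample_id,name\n1,sample_a\n2,sample_b\n"
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Source 2: a database table (an in-memory SQLite database for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (sample_id INTEGER, value REAL)")
conn.execute("INSERT INTO measurements VALUES (?, ?)", (1, 0.25))
values_by_id = {
    row[0]: row[1]
    for row in conn.execute("SELECT sample_id, value FROM measurements")
}
conn.close()

# Combine both sources on sample_id; samples with no measurement get None.
merged = [
    {
        "sample_id": int(row["sample_id"]),
        "name": row["name"],
        "value": values_by_id.get(int(row["sample_id"])),
    }
    for row in csv_rows
]
```

In a graphical ETL tool, each of these three steps would typically be a separate building block on the canvas rather than hand-written code.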

The graphical user interface, called the Pipeline Pilot Professional Client, allows users to drag and drop discrete data processing units called "components". Components can load, filter, join, or manipulate data; they can also build regression models, train neural networks, or render datasets into PDF reports. Pipeline Pilot follows a component paradigm in which each component is a node in a workflow. Mathematically, a workflow is a directed graph: components are the nodes, and "pipes" (the graph edges) connect them and carry data from node to node, where operations are performed on it.
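The directed-graph model can be sketched in a few lines of Python. This is only an analogy under simplifying assumptions, not how Pipeline Pilot is implemented: components are plain generator functions, pipes are entries in a dictionary of edges, and the graph is a single linear chain. All names (`load_records`, `keep_heavy`, `add_label`, `run`) are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Record = Dict[str, object]
Component = Callable[[Iterable[Record]], Iterator[Record]]

def load_records(_upstream: Iterable[Record]) -> Iterator[Record]:
    """Source node: emits records (hard-coded here instead of read from a file)."""
    yield {"name": "ethanol", "mw": 46.07}
    yield {"name": "benzene", "mw": 78.11}
    yield {"name": "caffeine", "mw": 194.19}

def keep_heavy(records: Iterable[Record]) -> Iterator[Record]:
    """Filter node: passes only records with a molecular weight above 50."""
    for record in records:
        if record["mw"] > 50:
            yield record

def add_label(records: Iterable[Record]) -> Iterator[Record]:
    """Manipulator node: adds an upper-case label to each record."""
    for record in records:
        yield {**record, "label": str(record["name"]).upper()}

# Nodes of the graph (components) and its edges (pipes: upstream -> downstream).
nodes: Dict[str, Component] = {
    "load": load_records,
    "filter": keep_heavy,
    "label": add_label,
}
pipes: Dict[str, str] = {"load": "filter", "filter": "label"}

def run(start: str) -> List[Record]:
    """Follow the pipes from the start node, feeding each node's output downstream."""
    stream: Iterable[Record] = iter(())
    current: Optional[str] = start
    while current is not None:
        stream = nodes[current](stream)
        current = pipes.get(current)
    return list(stream)

result = run("load")
```

Because each component is a generator, records move through the chain one at a time, which mirrors the idea of data flowing along pipes from node to node. A graph with branches or merges would need a more general traversal than this sketch provides.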

Users can choose from pre-installed components or create their own, and link them into workflows called "protocols". A protocol is a set of linked components that can be saved, reused, and shared. Users can mix components provided with the software by BIOVIA with their own custom components. Connections between components are called "pipes" and are drawn in the software as a pipe joining the two components; data flows from left to right along the pipes. In this way, Pipeline Pilot can visually condense a series of data manipulations that involves many components.
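The idea of a protocol as a saved, reusable chain of components can be illustrated with a small Python sketch. The JSON layout, the `REGISTRY` of components, and the `drop_missing` and `scale` functions are assumptions made for this example; they do not reflect Pipeline Pilot's actual protocol format or component names.

```python
import json
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]

def drop_missing(records: Iterable[Record], field: str) -> Iterator[Record]:
    """Component: discard records where the given field is missing or None."""
    return (r for r in records if r.get(field) is not None)

def scale(records: Iterable[Record], field: str, factor: float) -> Iterator[Record]:
    """Component: multiply a numeric field by a constant factor."""
    return ({**r, field: r[field] * factor} for r in records)

# Registry of available components, mixing "built-in" and custom ones.
REGISTRY = {"drop_missing": drop_missing, "scale": scale}

# A protocol is an ordered list of component names plus their parameters.
protocol = [
    {"component": "drop_missing", "params": {"field": "dose"}},
    {"component": "scale", "params": {"field": "dose", "factor": 1000}},
]

def run_protocol(steps: List[dict], records: Iterable[Record]) -> List[Record]:
    """Pass the records through each component of the protocol in order."""
    stream: Iterable[Record] = iter(records)
    for step in steps:
        stream = REGISTRY[step["component"]](stream, **step["params"])
    return list(stream)

# Saving and sharing: the protocol is plain data, so it round-trips through JSON.
saved = json.dumps(protocol)
reloaded = json.loads(saved)

output = run_protocol(reloaded, [{"id": 1, "dose": 0.5}, {"id": 2, "dose": None}])
```

Keeping the protocol as data separate from the component code is what makes it easy to store and hand to another user, who only needs the same components available to rerun it.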

Component collections
Pipeline Pilot offers a number of add-ons called "collections". A collection is a group of specialized functions, such as processing genetic information or analyzing polymers, that is offered to end users for an additional licensing fee.

Custom scripts
Pipeline Pilot is often used to process one or more large (1 TB+) or complex datasets. Early in the product's development, its developers created a scripting language called "PilotScript" that lets end users write basic scripts and incorporate them into a Pipeline Pilot protocol. Later releases added support for a variety of other programming languages, including Python, .NET, MATLAB, Perl, SQL, Java, VBScript, and R. The product also provides APIs for several programming languages that can be used without the program's graphical user interface.

The syntax of PilotScript is based on PL/SQL. It can be used in components such as the Custom Manipulator (PilotScript) or the Custom Filter (PilotScript). As an example, a short script in such a component can add a property named "Hello" to each record passing through it in a Pipeline Pilot protocol, with the string "Hello World!" as the property's value.
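The per-record behavior described above can be mimicked in Python. This is an analogy only, not PilotScript and not Pipeline Pilot's Python API: records are modeled as dictionaries, and the function name `add_hello` is hypothetical.

```python
from typing import Dict, Iterable, Iterator

Record = Dict[str, object]

def add_hello(records: Iterable[Record]) -> Iterator[Record]:
    """Add a property named "Hello" to every record passing through."""
    for record in records:
        updated = dict(record)  # copy so the incoming record is left unchanged
        updated["Hello"] = "Hello World!"
        yield updated

records = [{"id": 1}, {"id": 2}]
result = list(add_hello(records))
```

The script body runs once for each record, which is why a single assignment is enough to tag every record in the stream.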