Business process discovery

Business process discovery (BPD), related to business process management and process mining, is a set of techniques that manually or automatically construct a representation of an organisation's current business processes and their major process variations. These techniques use data recorded in the organisation's existing working methods, documentation, and the technology systems that run its business processes. The data required for process discovery is called an event log. Any record of data that contains a case id (a unique identifier used to group the activities belonging to the same case), an activity name (a description of the activity taking place), and a timestamp qualifies as an event log and can be used to discover the underlying process model. An event log can also contain additional information about the process, such as the resources executing each activity, the type or nature of the events, or other relevant details. Process discovery aims to obtain a process model that describes the event log as closely as possible. The process model is a graphical representation of the process (as a Petri net, BPMN diagram, activity diagram, state diagram, etc.). Event logs used for discovery can contain noise, irregular information, and inconsistent or incorrect timestamps. Process discovery is challenging both because of such noise and because an event log captures only part of the actual process hidden behind the system: discovery algorithms must rely solely on this limited sample of behaviour to construct a model as close as possible to the actual process.
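The three mandatory event-log attributes described above are enough to turn raw records into traces, the ordered activity sequences per case that discovery algorithms consume. The following is a minimal sketch, assuming the log is a list of Python dicts with hypothetical keys `case_id`, `activity`, `timestamp` (ISO 8601 strings) and an optional `resource`; real tools typically read standardised formats such as XES or CSV exports instead.

```python
from collections import defaultdict
from datetime import datetime

# A toy event log (hypothetical values). Each record carries the three
# mandatory attributes plus an optional "resource" attribute.
# Records are deliberately not in time order.
event_log = [
    {"case_id": "c2", "activity": "register", "timestamp": "2024-01-02T09:00", "resource": "Ann"},
    {"case_id": "c1", "activity": "register", "timestamp": "2024-01-01T08:00", "resource": "Bob"},
    {"case_id": "c1", "activity": "repair",   "timestamp": "2024-01-01T10:30", "resource": "Eve"},
    {"case_id": "c2", "activity": "close",    "timestamp": "2024-01-02T13:40", "resource": "Ann"},
    {"case_id": "c1", "activity": "close",    "timestamp": "2024-01-01T12:00", "resource": "Bob"},
    {"case_id": "c2", "activity": "repair",   "timestamp": "2024-01-02T11:15", "resource": "Eve"},
]

def to_traces(log):
    """Group events by case id and order each group by timestamp.

    Returns a dict mapping each case id to its trace (list of activity names).
    """
    grouped = defaultdict(list)
    for event in log:
        ts = datetime.fromisoformat(event["timestamp"])
        grouped[event["case_id"]].append((ts, event["activity"]))
    return {case: [act for _, act in sorted(events)] for case, events in grouped.items()}

traces = to_traces(event_log)
print(traces)
```

Sorting by timestamp inside each case is where inconsistent or incorrect timestamps, one of the noise sources mentioned above, would silently distort the resulting traces.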

Process discovery techniques
Various algorithms have been developed over the years for discovering a process model from an event log:
 * α-algorithm - The α-algorithm was the first process discovery algorithm that could adequately deal with concurrency. Taking an event log as input, it derives various "relations" between the activities occurring in the log and uses them to produce a Petri net that represents the log. Although the α-algorithm should not be considered a mining technique that can be used in practice, it provides a good introduction to the topic and served as the basis for many other process discovery techniques.
 * Heuristic mining – Heuristic mining algorithms use a representation similar to causal nets. Moreover, these algorithms take frequencies of events and sequences into account when constructing a process model. The basic idea is that infrequent paths should not be incorporated into the model.
 * Genetic process mining - The α-algorithm and the techniques for heuristic and fuzzy mining construct process models in a direct, deterministic manner. Genetic process mining instead uses genetic algorithms, a search technique that mimics the natural process of evolution in biological systems. These algorithms search the space of candidate solutions by evaluating existing candidates and by creating new ones through mutation or by combining existing candidates. Such approaches are not deterministic and depend on randomisation to find new alternatives.
 * Region-based mining - In the context of Petri nets, researchers have studied the so-called synthesis problem, i.e., constructing a system model from a description of its behavior. State-based regions can be used to construct a Petri net from a transition system; this technique finds "General Excitation Regions" and builds Petri nets from them. Language-based regions can be used to construct a Petri net from a prefix-closed language; this technique uses algebraic constraints derived from the event log to determine the places that allow the behavior observed in the log.
 * Inductive miner - A range of inductive process discovery techniques exists for process trees, which guarantee soundness by construction. The inductive mining framework is also highly extensible and allows for many variants of the basic approach. It is considered one of the leading process discovery approaches because of its flexibility, formal guarantees, and scalability.
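Two of the ideas above can be made concrete on traces: the ordering relations (often called the "footprint") that the α-algorithm derives from the log before building its Petri net, and the frequency-aware dependency measure used in heuristic mining. The sketch below covers only these first steps, not the α-algorithm's construction of places or a full heuristic miner. The formula `(|a>b| - |b>a|) / (|a>b| + |b>a| + 1)` is the dependency measure as it is commonly stated for the Heuristics Miner; the activity names and logs are hypothetical.

```python
from collections import Counter
from itertools import product

def directly_follows(traces):
    """Count how often activity a is directly followed by activity b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

def footprint(traces):
    """Classify every ordered activity pair with the alpha-algorithm log relations:
    '->' causality, '<-' reverse causality, '||' parallel, '#' never adjacent."""
    df = directly_follows(traces)
    activities = sorted({a for trace in traces for a in trace})
    relations = {}
    for a, b in product(activities, repeat=2):
        ab, ba = df[(a, b)] > 0, df[(b, a)] > 0
        if ab and not ba:
            relations[(a, b)] = "->"
        elif ba and not ab:
            relations[(a, b)] = "<-"
        elif ab and ba:
            relations[(a, b)] = "||"
        else:
            relations[(a, b)] = "#"
    return relations

def dependency(traces, a, b):
    """Heuristics-Miner-style dependency measure, strictly between -1 and 1.
    Unlike the footprint, it weighs how often each ordering occurs."""
    df = directly_follows(traces)
    ab, ba = df[(a, b)], df[(b, a)]
    return (ab - ba) / (ab + ba + 1)

# Example log: three cases over activities a..e.
log = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "e", "d"]]
rel = footprint(log)
print(rel[("a", "b")], rel[("b", "c")], rel[("b", "e")])  # -> || #

# Nine cases with a before b and one (possibly noisy) case with b before a.
noisy = [["a", "b"]] * 9 + [["b", "a"]]
print(round(dependency(noisy, "a", "b"), 3))  # 0.727
```

On the noisy log, the footprint would classify a and b as parallel because both orderings occur at least once. The dependency measure still reports a fairly strong a -> b dependency (8/11), which illustrates why heuristic mining can keep infrequent paths out of the model.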

Application
Business process discovery complements and builds upon the work in many other fields:
 * Process discovery is one of the three main types of process mining. The other two are conformance checking and model extension/enhancement. All of these techniques aim to extract process-related knowledge from event logs. In process discovery, there is no prior process model; the model is discovered from the event log. Conformance checking looks for differences between a given process model and an event log, which makes it possible to quantify compliance and analyze discrepancies. Enhancement takes an a priori model and improves or extends it using information from the event log, e.g., to show bottlenecks.
 * Business process discovery is the next level of understanding in the emerging field of business analytics, which allows organizations to view, analyze and adjust the underlying structure and processes that go into day-to-day operations. Discovery includes gathering information about all components of a business process, including technology, people, departmental procedures and protocols.
 * Business process discovery creates a process master that complements business process analysis (BPA). BPA tools and methodologies are well suited to top-down hierarchical process decomposition and to the analysis of to-be processes. BPD provides a bottom-up analysis that complements this top-down view, yielding a complete business process organized hierarchically by BPA.
 * Business intelligence (BI) provides organizations with reporting and analytics on their data. However, BI has no process model, process awareness or process analytics. BPD complements BI by providing an explicit process view of current operations, together with analytics on that process model that help organizations identify and act upon business process inefficiencies or anomalies.
 * Web analytics are a limited example of BPD: they reconstruct a web user's process as the user interacts with a website. However, these analytics are limited to the process contained within the session, seen from the user's perspective, and only with respect to the web-based system and process.
 * Business triage provides a framework for categorizing the processes identified by business process analysis (BPA) based on their relative importance to achieving a stated, measurable goal or outcome. Using the same categories employed by military and disaster medical services, business processes are categorized as:
    * Essential/critical (red process) - Process essential for achieving outcomes/goals
    * Important/urgent (yellow process) - Process that speeds up achieving outcomes/goals
    * Optional/supportive (green process) - Process not needed to achieve outcomes/goals

Under business triage, resources are allocated by process category: first to red processes, then to yellow processes, and finally to green processes. If resources become limited, they are withheld first from green processes, then from yellow processes. Resources are withheld from red processes only if failing to achieve the outcomes/goals is acceptable.
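Conformance checking, described in the first item of the list above, compares an existing model with an event log. The following is a deliberately simplified sketch: the "model" here is assumed to be just a set of allowed directly-follows arcs plus allowed start and end activities (all names hypothetical). Established conformance checking techniques, such as token-based replay or alignments on Petri nets, are considerably more precise.

```python
# Simplified model: allowed start/end activities and allowed directly-follows arcs.
# This is an illustrative stand-in, not a Petri-net-based conformance checker.
model = {
    "start": {"register"},
    "end": {"close"},
    "arcs": {("register", "repair"), ("repair", "test"),
             ("test", "close"), ("test", "repair")},
}

def deviations(trace, model):
    """Return a list of human-readable deviations of one trace from the model."""
    if not trace:
        return ["empty trace"]
    issues = []
    if trace[0] not in model["start"]:
        issues.append(f"unexpected start: {trace[0]}")
    if trace[-1] not in model["end"]:
        issues.append(f"unexpected end: {trace[-1]}")
    for a, b in zip(trace, trace[1:]):
        if (a, b) not in model["arcs"]:
            issues.append(f"disallowed step: {a} -> {b}")
    return issues

def trace_fitness(traces, model):
    """Fraction of traces that fit the model without any deviation."""
    fitting = sum(1 for t in traces if not deviations(t, model))
    return fitting / len(traces)

log = [
    ["register", "repair", "test", "close"],
    ["register", "repair", "test", "repair", "test", "close"],
    ["register", "close"],  # skips repair and test
]
print(trace_fitness(log, model))
for trace in log:
    print(trace, deviations(trace, model))
```

The fraction of fitting traces quantifies compliance, while the per-trace deviation list supports analysis of the discrepancies.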

The purpose and example
A small example illustrates the kind of business process discovery technology that is required today. Automated business process discovery tools capture the required data and transform it into a structured dataset for the actual diagnosis. A major challenge is grouping users' repetitive actions into meaningful events. Next, these tools propose probabilistic process models; probabilistic behavior is essential for the analysis and diagnosis of the processes. The following shows an example in which a probabilistic repair process is recovered from user actions. The "as-is" process model shows exactly where the pain is in this business. Five percent faulty repairs is a bad sign, but worse, the repetitive fixes needed to complete those repairs are cumbersome.
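The two steps described above, grouping repetitive user actions into events and estimating a probabilistic process model, can be sketched as follows. The sketch assumes that "grouping" simply merges runs of identical consecutive actions, a crude stand-in for real event abstraction. It also assumes that the probabilistic model is a set of transition probabilities P(next activity | current activity) estimated from directly-follows counts with artificial `[start]` and `[end]` nodes. The sessions and activity names are hypothetical and are not the data behind the repair example.

```python
from collections import Counter, defaultdict
from itertools import groupby

def collapse_repeats(actions):
    """Merge runs of identical consecutive user actions into a single event.
    A deliberately crude stand-in for real event abstraction."""
    return [action for action, _ in groupby(actions)]

def transition_probabilities(traces):
    """Estimate P(next activity | current activity) from observed traces,
    using artificial '[start]' and '[end]' nodes."""
    counts = defaultdict(Counter)
    for trace in traces:
        path = ["[start]"] + trace + ["[end]"]
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    return {
        a: {b: n / sum(nexts.values()) for b, n in nexts.items()}
        for a, nexts in counts.items()
    }

# Hypothetical raw user sessions, including accidental repeated clicks.
raw_sessions = [
    ["open", "open", "repair", "close"],
    ["open", "repair", "repair", "close"],
    ["open", "repair", "fix_again", "close"],
    ["open", "repair", "close"],
]
traces = [collapse_repeats(s) for s in raw_sessions]
probs = transition_probabilities(traces)
print(probs["repair"])  # {'close': 0.75, 'fix_again': 0.25}
```

Note that collapsing runs also hides genuine immediate repetitions, such as a repair that really was performed twice in a row. Deciding which repetitions are meaningful is exactly the grouping challenge mentioned above.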



A deeper analysis of the "as-is" process data may reveal which faulty parts are responsible for the overall behavior in this example. It may also lead to the discovery of subgroups of repairs that actually need management focus for improvement.
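One simple way to look for such subgroups is to aggregate a process outcome per value of a case attribute. The sketch below assumes each case records a hypothetical `part` attribute and a hypothetical `fix_iterations` count. It computes, per part, the share of cases that needed more than one fix. The values are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical case attributes (illustrative values, not real data).
cases = [
    {"case_id": 1, "part": "P1", "fix_iterations": 1},
    {"case_id": 2, "part": "P1", "fix_iterations": 1},
    {"case_id": 3, "part": "P1", "fix_iterations": 1},
    {"case_id": 4, "part": "P2", "fix_iterations": 3},
    {"case_id": 5, "part": "P2", "fix_iterations": 2},
    {"case_id": 6, "part": "P3", "fix_iterations": 1},
    {"case_id": 7, "part": "P3", "fix_iterations": 2},
    {"case_id": 8, "part": "P3", "fix_iterations": 1},
    {"case_id": 9, "part": "P3", "fix_iterations": 1},
]

def repeat_fix_rate_by_part(cases):
    """For each part, the share of cases that needed more than one fix."""
    totals = defaultdict(int)
    repeats = defaultdict(int)
    for case in cases:
        totals[case["part"]] += 1
        if case["fix_iterations"] > 1:
            repeats[case["part"]] += 1
    return {part: repeats[part] / totals[part] for part in totals}

rates = repeat_fix_rate_by_part(cases)
# List parts from the highest to the lowest repeat-fix rate.
for part, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{part}: {rate:.0%} of cases needed repeated fixes")
```

In practice, such a breakdown would only point to candidate subgroups; whether a part is really responsible would still need to be confirmed, for example by checking case volumes and other attributes.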



In this case, it would become clear that the faulty parts are also responsible for the repetitive fixes. Similar applications have been documented, such as a healthcare insurance provider that earned back the return on investment (ROI) of its business process analysis within four months by precisely understanding its claims-handling process and discovering the faulty parts.

History

 * Business intelligence (BI) emerged more than 20 years ago and is critical for reporting what is happening within an organization's systems. Yet current BI applications and data mining technologies are not always suited to analyzing unstructured data and the human dynamics of business processes at the level of detail required.
 * Six Sigma and other quantitative approaches to business process improvement have been employed for over a decade with varying degrees of success. A major limitation of these approaches is the availability of accurate data on which to base the analysis. With BPD, many Six Sigma organizations are finding that they can effectively extend their analysis into major business processes.
 * Process mining (PM): according to researchers at Eindhoven University of Technology, process mining emerged as a scientific discipline around 1990, when techniques like the Alpha algorithm made it possible to extract process models (typically represented as Petri nets) from event logs. Critics have pointed out that process mining is no more than a set of algorithms that solves a specific and simple business problem: business process discovery, plus auxiliary evaluation methods. Today, there are over 100 process mining algorithms that are able to discover process models that also include concurrency, e.g., genetic process discovery techniques, heuristic mining algorithms, region-based mining algorithms, and fuzzy mining algorithms.

Process models
The process discovery techniques applied to event logs provide a graphical representation of a process. The result of a process discovery algorithm is generally a process model together with statistics about the cases in the event log. The representation and accuracy of the discovered model depend both on the technique used for discovery and on the type of visualization chosen.


 * Directly-follows graph: A directly-follows graph (DFG) is the simplest representation of a process model. Each node represents an activity, and each arc represents a directly-follows relationship: an arc from activity a to activity b means that, in the event log, a is directly followed by b in at least one case. Typically, a DFG also has a source node and a sink node that mark where cases start and end.
 * Petri nets: Petri nets provide a higher-level representation of process models and allow for a compact representation of concurrent behaviour. Petri nets can describe different types of relations between the activities in a process: sequential, parallel, choice, and loop execution. The notion of token flow has been adopted by most graphical process modeling languages (BPMN, UML activity diagrams, etc.).
 * BPMN: The BPMN 2.0 (Business Process Model and Notation) standard is widely used and allows compact, understandable process models to be built. In addition to the flat control-flow perspective, subprocesses, data flows and resources can be integrated within one BPMN diagram. This makes BPMN very attractive to both process miners and business users, since the control-flow perspective can be integrated with the data and resource perspectives discovered from event logs.
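The token-flow semantics that underlie Petri nets (and, through them, the other notations listed above) can be shown with a small "token game". The following is a minimal sketch, assuming unit arc weights and distinct input places per transition. The net, with hypothetical transitions `register`, `check`, `notify`, `repair`, `reject` and `close` and places `start`, `p1` to `p5` and `end`, encodes a parallel split and join and a choice between `repair` and `reject`. The `replay` method is only a yes/no check, not the token-based replay used in conformance checking, which also counts missing and remaining tokens.

```python
from collections import Counter

class PetriNet:
    """Minimal place/transition net with unit arc weights: firing a transition
    consumes one token from each input place and produces one in each output place."""

    def __init__(self, transitions):
        # transitions: name -> (tuple of input places, tuple of output places)
        self.transitions = transitions

    def enabled(self, marking, t):
        inputs, _ = self.transitions[t]
        return all(marking[p] >= 1 for p in inputs)

    def fire(self, marking, t):
        if not self.enabled(marking, t):
            raise ValueError(f"transition {t!r} is not enabled")
        inputs, outputs = self.transitions[t]
        new = Counter(marking)
        for p in inputs:
            new[p] -= 1
        for p in outputs:
            new[p] += 1
        return +new  # unary plus drops places left with zero tokens

    def replay(self, trace, initial, final):
        """True if the trace can fire from the initial marking and ends in the final one."""
        marking = Counter(initial)
        for t in trace:
            if not self.enabled(marking, t):
                return False
            marking = self.fire(marking, t)
        return marking == Counter(final)

net = PetriNet({
    "register": (("start",), ("p1", "p2")),  # parallel split
    "check":    (("p1",), ("p3",)),
    "notify":   (("p2",), ("p4",)),
    "repair":   (("p3", "p4"), ("p5",)),     # parallel join; choice with "reject"
    "reject":   (("p3", "p4"), ("p5",)),
    "close":    (("p5",), ("end",)),
})
start, end = {"start": 1}, {"end": 1}
print(net.replay(["register", "check", "notify", "repair", "close"], start, end))  # True
print(net.replay(["register", "notify", "check", "reject", "close"], start, end))  # True
print(net.replay(["register", "check", "repair", "close"], start, end))            # False
```

The first two traces show that `check` and `notify` may occur in either order (concurrency). The third fails because `repair` needs a token in `p4`, which only `notify` produces.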