User:Ottomachin/Real world modelling and OWL

At a recent meeting the workshop leader reminded us that our primary task was "to create a model of the real world in OWL". This task statement can be decomposed into two activities:
 * model the real world
 * express the model in the Web Ontology Language, OWL (although OWL2 now seems to have reverted to being a "virtual" language, in so far as it now has several syntaxes, and the number seems to be growing!).

The purpose of this memo is to show that there is:
 * a preferred method for creating models of the real world, ie. ER (Entity-Relationship) modelling (now subsumed in UML)
 * a preferred method for publication of these models in OWL, ie. systematic translation.

Apart from simple common sense, another justification for decomposing the primary task into two subtasks can be found in the Protégé user documentation:

This statement, as well as confirming the two subtasks of modelling and conversion to OWL, also hints at other dimensions of these two subtasks, ie. that for the first, we should probably use some tool; and that for the second, at least one requirement is that it can be done easily.

Greek Philosophy
Modelling the real world is a well-known and well-understood activity. From at least the time of the ancient Greek philosophers, the ontological (aspects of being) and the epistemological (aspects of knowledge) have been studied, together with their application to understanding the real world. Plato (428 BC) described the:

His student Aristotle (384 BC), perhaps the most famous of all ancient Greek philosophers and the father of logic, is known for his categories of being and his considerations of the "bare particulars" (objects in themselves) and their "inherence relations" (their properties and relationships). In various ontological theories of the "categories of being" it can be seen that the common core categories are category, property and relationship. Aristotle was also aware of questions such as whether or not an object should be considered as just the totality of its attributes and relationships (ie. "bundle theory", which, while perhaps not philosophically convincing, does in fact have some utility in the actual practice of modelling).

Modern Philosophy
A modern philosophical dictionary summarises these philosophical concepts:

This is very close to a common modern definition often given for an entity, which is a "Person, Place, Event, Concept or Thing". The dictionary goes on to quote a more modern philosopher, Charles S. Peirce, who succinctly defines the broad notion of an object as follows:

and the dictionary then continues to a final key summary:

Entity, Attribute and Relationship
In other words, the world consists of entities, and these entities may have attributes and may participate in relationships. The consequence of all of this philosophy is that we can be quite confident that, if we wish to create a description or model of the real world, we can do so by using just these three concepts:
 * entity (class, category, type, object)
 * attribute (property, quality)
 * relationship (relation, association)

In fact, if we should embark upon modelling the world using some other technique, we must answer the question: what is the philosophical and theoretical basis for that technique? If there is no cogent answer, then we must accept that we are being arbitrary and ad hoc in our modelling, ie. we are just making it up as we go along.
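As a rough illustration only, the three concepts of entity, attribute and relationship can be written down as a tiny data structure. The sketch below is in Python; the names Attribute, Entity, Relationship and Model, and the Person/Department example, are hypothetical and chosen purely for illustration. They are not taken from any ER tool or standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Attribute:
    """A property or quality of an entity, e.g. a person's name."""
    name: str
    datatype: str = "string"


@dataclass
class Entity:
    """A class, category or type of thing, e.g. Person."""
    name: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass(frozen=True)
class Relationship:
    """A named association from one entity to another."""
    name: str
    source: str
    target: str


@dataclass
class Model:
    """A model of the world built from only the three concepts above."""
    entities: Dict[str, Entity] = field(default_factory=dict)
    relationships: List[Relationship] = field(default_factory=list)

    def add_entity(self, name: str, *attributes: Attribute) -> Entity:
        entity = Entity(name, list(attributes))
        self.entities[name] = entity
        return entity

    def relate(self, name: str, source: str, target: str) -> Relationship:
        # A relationship may only connect entities already in the model.
        for end in (source, target):
            if end not in self.entities:
                raise KeyError(f"unknown entity: {end}")
        relationship = Relationship(name, source, target)
        self.relationships.append(relationship)
        return relationship


# Hypothetical example model.
model = Model()
model.add_entity("Person", Attribute("name"), Attribute("age", "integer"))
model.add_entity("Department", Attribute("title"))
model.relate("works_for", "Person", "Department")
```

Nothing beyond these three kinds of thing is needed to state the example, which is the point being made in this section.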

It is a common refrain in ontological texts that: "Anyone can say anything about anything".

But just because we can, doesn't mean we should.

Data Modelling
As the field of "Data Processing" evolved from the mid-20th century, software engineers realised that functional requirements could be categorised as being either "data requirements" or "process requirements". The data requirements analysts eventually began to realise that the data should in fact constitute a structural picture of the business, and that the technological details of implementation should be subordinate to the necessity for realism in this picture of the business. This culminated in 1976 with the seminal paper by the father of Entity-Relationship modelling, Peter Chen, which describes the goal of creating models of the real world and comes to the same fundamental concepts as the philosophers who had gone before him:

the paper then contrasts the then existing common data model methodologies:

Other authors also had a similar programme, eg. Kent

Comparison of the concepts from the various domains of knowledge representation shows the synonymy amongst the terms used:

It is thus clear that the ER modelling technique is purpose-built to enable modelling of the real world. We are also fortunate in that there exists a wealth of supporting training, experience, best practice and academic literature, which has been built up over the last three decades, not to mention the philosophical considerations over millennia. The technique is sound. Unlike OWL, for instance, there are no semantic paradoxes remaining in its usage.

Modelling Tools
Neither are we limited to pencils and backs of envelopes as our modelling tools. There are many ER tools available (although it is important that the tool is a proper ER tool and not just a record modelling tool in disguise - ERWin being a notable failure in this regard). It can also be demonstrated that a small subset of the extensive number of UML class diagram elements is sufficient to be completely analogous to ER diagrams, and so the new generation of UML CASE (Computer-Aided Software Engineering) tools can be used in place of ER CASE tools. In describing the purpose of ontologies, Protégé user documentation says:

The UML class diagrams are exactly "structural" in nature with "information" as their focus. The article continues to give a description "in practical terms" of what the steps are in the development of an ontology, which could just as easily be instructions for building an ER or UML model:

although the subsequent steps are more closely related to the entry of instance data into a database.

The OWL2 primer also gives (though very grudgingly) support to the concept that in many cases, an ontology is very closely related to a data model:

This is even more certainly the case in our primary field of interest, "interoperability", where it is our key objective to discover and relate together various structures of data. The benefits of using visual modelling tools have been known for many years; one of the primary benefits of ER modelling has always been its visual nature:

Translation
Once we have a model the next step is to then create a representation of the model in OWL. As previously mentioned, we would like this process to be as easy as possible (amongst other requirements!):

Another paper also describes the need for a model and for automatic translation of models into OWL:

The paper then describes some of the other benefits of automatic translation of models into OWL:

It has long been obvious that most of the processes of software engineering should, wherever possible, be automated and "untouched by human hand". "Model Driven" architecture, design and programming are all in vogue. It is well understood that automation of these processes has many benefits; the automated processes are then:
 * manageable
 * auditable
 * reversible
 * repairable
 * repeatable
 * shareable
 * efficient
 * standardised
 * self documented

The following figure shows a categorisation of various approaches to the automatic generation of ontologies:

Figure 1 Classification of database-to-ontology mapping approaches.

and also a survey of some tools:

Figure 2 Features of different database-to-ontology mapping tools

Note that DataGenie has now been superseded by DataMaster. Note also that the focus of this memo is the creation of new ontologies rather than mapping to existing ones.

Methods
There are several options for creating ontologies from UML models. The first consideration is the difference between ontology Classes and Individuals. This is analogous to the difference between table definitions and row instances in a relational database. Thus one immediately obvious approach would be to generate a relational database from a UML model, then translate from the database into OWL. This approach would also allow for any required instance data to be inserted into the relational tables and then exported into OWL. The D2R tool supports the export of both tables and rows from a relational database into OWL classes and OWL individuals. This tool creates proper OWL classes, whereas DataMaster and some other tools create definitions of tables and columns wrapped in RDF/XML, usually in Relational.OWL.

Figure 3 Schema represented in OWL and Relational.OWL

Although obviously useful for certain purposes, Relational.OWL is not what we would want, as it models a database and not the domain, ie. it is not a model of the real world.
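To make the distinction concrete, the following Python sketch shows what a domain-oriented translation of a schema might look like: each table becomes an OWL class, each plain column a datatype property, and each foreign key an object property, emitted as Turtle text. The schema dictionary, the example.org namespace and the function name schema_to_turtle are all hypothetical; this is not the output format of D2R, DataMaster or any other tool mentioned here, only an illustration of modelling the domain rather than the database.

```python
PREFIXES = """@prefix : <http://example.org/model#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> ."""

# Hypothetical two-table schema: columns map to XSD datatype names,
# foreign keys map a relationship name to the referenced table.
SCHEMA = {
    "Person": {
        "columns": {"name": "string", "employee_number": "integer"},
        "foreign_keys": {"works_for": "Department"},
    },
    "Department": {
        "columns": {"title": "string"},
        "foreign_keys": {},
    },
}


def schema_to_turtle(schema):
    """Return Turtle text describing the domain, not the tables as such."""
    lines = [PREFIXES, ""]
    for table, spec in schema.items():
        # Table definition -> OWL class.
        lines.append(f":{table} a owl:Class .")
        # Plain column -> datatype property on that class.
        for column, xsd_type in spec["columns"].items():
            lines.append(
                f":{table}_{column} a owl:DatatypeProperty ; "
                f"rdfs:domain :{table} ; rdfs:range xsd:{xsd_type} ."
            )
        # Foreign key -> object property between two classes.
        for relation, target in spec["foreign_keys"].items():
            lines.append(
                f":{relation} a owl:ObjectProperty ; "
                f"rdfs:domain :{table} ; rdfs:range :{target} ."
            )
    return "\n".join(lines)


turtle = schema_to_turtle(SCHEMA)
print(turtle)
```

A Relational.OWL style export would instead describe Person as a table with columns, which is exactly the difference noted above.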

It is also possible to create OWL ontologies directly from a UML model. The obvious path is to first export the UML model into an .XMI file and then have an automatic translator read the .XMI file and generate the OWL ontology. Thanks to the XMI standard, this approach should give the translator a measure of independence from the UML CASE tool used; however, it is well known that vendor standardisation on XMI is unfortunately by no means perfect. There are many groups working on tools for generation of OWL from UML. The OMG (Object Management Group) has also recently published a set of standards, metamodels, profiles and .XMI files to support UML to OWL mapping. An example using the ODM (Ontology Definition Metamodel) exists on the Eclipse site.
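The XMI route can be sketched in a few lines of Python. The fragment below is a simplified, hand-written document in the general style of an XMI export, not the output of any particular CASE tool; the namespace URI, the element layout and the function name xmi_to_turtle are assumptions made for illustration, and real exports differ between vendors and versions, as noted above. The sketch maps UML classes to OWL classes, owned attributes to datatype properties and generalizations to subclass axioms, and is in no way an implementation of the ODM.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the xmi:* attributes; real files may use another.
XMI_NS = "http://schema.omg.org/spec/XMI/2.1"

# Hand-written, simplified XMI-style fragment (hypothetical model).
XMI_TEXT = f"""
<xmi:XMI xmlns:xmi="{XMI_NS}" xmlns:uml="http://example.org/assumed-uml-ns">
  <uml:Model xmi:id="m1" name="Demo">
    <packagedElement xmi:type="uml:Class" xmi:id="c1" name="Person">
      <ownedAttribute xmi:id="a1" name="name"/>
    </packagedElement>
    <packagedElement xmi:type="uml:Class" xmi:id="c2" name="Employee">
      <generalization xmi:id="g1" general="c1"/>
    </packagedElement>
  </uml:Model>
</xmi:XMI>
"""


def xmi_to_turtle(xmi_text):
    """Translate UML classes in the fragment into Turtle statements."""
    root = ET.fromstring(xmi_text)
    type_key = f"{{{XMI_NS}}}type"
    id_key = f"{{{XMI_NS}}}id"
    classes = [
        element for element in root.iter("packagedElement")
        if element.get(type_key) == "uml:Class"
    ]
    # Generalizations refer to the parent by xmi:id, so index names by id.
    name_by_id = {c.get(id_key): c.get("name") for c in classes}

    statements = []
    for uml_class in classes:
        name = uml_class.get("name")
        statements.append(f":{name} a owl:Class .")
        for attribute in uml_class.findall("ownedAttribute"):
            statements.append(
                f":{name}_{attribute.get('name')} a owl:DatatypeProperty ; "
                f"rdfs:domain :{name} ."
            )
        for generalization in uml_class.findall("generalization"):
            parent = name_by_id[generalization.get("general")]
            statements.append(f":{name} rdfs:subClassOf :{parent} .")
    return statements


statements = xmi_to_turtle(XMI_TEXT)
print("\n".join(statements))
```

Because the translator reads only the exchange file, the same code could in principle be reused across CASE tools, subject to the XMI variations already mentioned.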

UML Class models translate well to the ontology Classes; but if any individuals were also required then additional Object modelling may be necessary. An alternative hybrid approach, where instance data is sourced from other datastores, eg. spreadsheets or databases, would not be difficult and would probably be preferable.

Fortunately, in general, we would not expect to have great need of very much instance data, eg. we certainly would expect to model a class for PERSON, but it is rather unlikely that we would also wish to model Jack, Jill, Tom, Dick and Harry. Any instance data, which is of interest, is most likely to be "lookup data" such as "reference data" eg. country_codes, currency_codes etc. or perhaps "master data" eg. product_code, department_code etc. and rarely, if at all, "transaction data". Lookup data occurs in much lower volumes than transaction data. If transaction level data is required, it is almost certain to be sourced from operational databases, and would require migration rather than modelling.
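As an illustration of the hybrid approach for lookup data, the Python sketch below reads a small table of country codes and emits one OWL individual per row as Turtle text. The CSV content, the Country class name, the naming pattern for individuals and the function names are hypothetical; the prefixes :, owl: and rdfs: are assumed to be declared elsewhere in the generated file.

```python
import csv
import io

# Hypothetical reference data, as it might be exported from a spreadsheet.
LOOKUP_CSV = """code,label
GB,United Kingdom
FR,France
"""


def turtle_string(text):
    """Quote a label as a Turtle string literal (minimal escaping only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def rows_to_individuals(csv_text, class_name):
    """Emit one OWL individual of the given class per row of lookup data."""
    statements = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        statements.append(
            f":{class_name}_{row['code']} a owl:NamedIndividual , :{class_name} ; "
            f"rdfs:label {turtle_string(row['label'])} ."
        )
    return statements


individuals = rows_to_individuals(LOOKUP_CSV, "Country")
print("\n".join(individuals))
```

The class itself (Country) would still come from the model; only the low-volume rows are sourced from the datastore.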

It is worth noting that the JC3IEDM, an ER model, has recently been translated into OWL. Not only is this an example of exactly the kind of task which will become more common in the future; there should also be lessons learned and even some technology arising from this project.

Conclusion
The purpose of this article has been to show that there is:
 * a preferred method for creating models of the real world, ie. ER (Entity-Relationship) modelling
 * a preferred method for representation of these models in OWL, ie. automated and systematic translation

In order to create correct models, it is necessary to use a proven technique. In order to avoid the errors and idiosyncrasies that arise from hand coding, and to efficiently achieve standardised, auditable, repeatable, reversible, repairable ontologies, a systematic automated approach is necessary.