Semantic Data Modeling

Intro

Johan ter Bekke developed a data modeling approach based on semantic principles, resulting in inherently specified data structures. A single word or data item can hardly convey meaning to humans, but it gains meaning in combination with its context. In a database environment the context of data items is mainly defined by structure: a data item or object can have properties ("horizontal structure"), but can also have relationships ("vertical structure") with other objects. In the relational approach vertical structure is defined by explicit referential constraints, whereas in the semantic approach structure is defined inherently: a property itself may coincide with a reference to another object. This has important consequences for the semantic data manipulation language.

Semantic Data Modeling Principles

The objective of data modeling is to design a data structure for a database that fits some relevant world as well as possible; that world is often related to an organization with some information need. In general a data model relates to a part of the existing world, but it may also relate to an imaginary or abstract world. According to the seminal work of Smith and Smith (1977), three abstractions are very important for data modeling: classification, aggregation and generalization.

Classification is used to model instance_of relations, aggregation to model has_a relations and generalization to model is_a relations. In semantic data modeling all three abstractions lead to a type definition (which can be base or composite). In contrast with many other data models, the semantic data model is based on only one fundamental notion: the type concept.

Notably, the Xplain meta data model itself also requires all three abstractions. Many other data modeling techniques (such as the relational model) do not support all three abstractions and are therefore limited in their modeling capabilities.

A semantic data model can be represented graphically in an Abstraction Hierarchy diagram showing the types (as boxes) and their inter-relations (as lines). See the warehouse example. The diagram is hierarchical in the sense that a type which references other types is always placed above the types it references. This simple notation principle makes the diagrams very easy to read and understand, even for non-data modelers.
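
The placement rule above can be sketched in a few lines of Python. This is only an illustration, not Xplain notation: the type names in `TYPES` are hypothetical and do not reproduce the warehouse example, and the sketch assumes that an attribute refers to a composite type exactly when it carries that type's name. Every name that has no definition of its own is treated as a base type at level 0.

```python
# Hypothetical model: each composite type maps to its list of attributes.
# An attribute that carries the name of another type is a reference to it.
TYPES = {
    "order line": ["order", "article", "quantity"],
    "order": ["customer", "date"],
    "article": ["description", "price"],
    "customer": ["name", "address"],
}


def hierarchy_levels(types):
    """Return {composite type: level}; a type sits one level above the
    highest type it references, base types count as level 0."""
    levels = {}

    def level(name, path=()):
        if name in path:
            raise ValueError(f"cyclic reference through {name!r}")
        if name not in types:
            return 0  # base type (assumption: anything without a definition)
        if name not in levels:
            levels[name] = 1 + max(
                (level(attr, path + (name,)) for attr in types[name]),
                default=0,
            )
        return levels[name]

    for name in types:
        level(name)
    return levels


if __name__ == "__main__":
    levels = hierarchy_levels(TYPES)
    # Print the diagram rows from top to bottom.
    for name in sorted(levels, key=levels.get, reverse=True):
        print(levels[name], name)
```

Because a referencing type is always placed strictly above what it references, reading the rows from top to bottom goes from the most composite types down to the most elementary ones.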

Data Integrity

Data model specifications imply validity of certain integrity rules. Two inherent integrity rules are recognized for type definitions in a semantic data model:

  • relatability: Each attribute in a type definition is related to one and only one equally named type, while each type may correspond with various attributes in other types.
  • convertibility: Each type definition is unique: there are no type definitions carrying the same name or the same collection of attributes.

It is important to realize that these two integrity rules require neither separate specification nor declaration by procedures - they are inherent in the type definitions in a semantic data model.
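
To make the two rules concrete, the following Python sketch checks them explicitly for a small set of type definitions. In a semantic data model no such check has to be written, since the rules are inherent; the function `check_model`, its input format and the example type names are all assumptions made for illustration only.

```python
def check_model(base_types, composite_types):
    """Check relatability and convertibility for a list of type definitions.

    base_types: list of base type names.
    composite_types: list of (name, [attribute names]) pairs.
    Returns a list of violation messages (empty if the model is valid).
    """
    violations = []

    # Convertibility, part 1: no two types may carry the same name.
    known = set()
    for name in list(base_types) + [n for n, _ in composite_types]:
        if name in known:
            violations.append(f"convertibility: duplicate type name '{name}'")
        known.add(name)

    first_with = {}
    for name, attrs in composite_types:
        # Convertibility, part 2: no two types may have the same
        # collection of attributes.
        other = first_with.setdefault(frozenset(attrs), name)
        if other != name:
            violations.append(
                f"convertibility: '{name}' has the same attributes as '{other}'"
            )
        # Relatability: every attribute must correspond to an equally
        # named type.
        for attr in attrs:
            if attr not in known:
                violations.append(
                    f"relatability: attribute '{attr}' of '{name}' "
                    "has no equally named type"
                )
    return violations


if __name__ == "__main__":
    base = ["name", "address", "date"]
    model = [
        ("customer", ["name", "address"]),
        ("order", ["customer", "date"]),
        ("invoice", ["date", "customer"]),  # same attributes as 'order'
    ]
    for message in check_model(base, model):
        print(message)
```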

Additional Data Restrictions

In addition to the restrictions inherent in the data model, there is often a need for additional, more complex restrictions on data states that cannot be specified in the data model itself. These additional restrictions can be specified as

Declarative Data Derivations

The assert command - as explained in static restriction - can be used to specify derivable attributes. This is extremely useful for complex data derivations, such as those needed for user applications (e.g. total order amount), reports (e.g. grouping per x, Year-To-Date and totals) and data analysis (e.g. top 10 products).

An assert command derives only one attribute or a single variable at a time. For complex derivations one has to define multiple assertions building on each other. This principle ensures modularity (thus easy to understand, test and maintain) and re-usability (of intermediate derived data) for other queries.

Comments

Assertions are designed to support system-maintained derived (attribute) values. They may also include a value restriction. For example, if an organization does not allow more than 100 employees per department:

assert department its empnumber (0..100) =
count employee per department.
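
For readers unfamiliar with the notation, the effect of this assertion can be approximated in Python as follows. This is a rough analogy under stated assumptions, not how an Xplain system implements it: the function name `assert_empnumber`, the dictionary layout of the employee records and the choice to report 0 for a department without employees are all illustrative.

```python
from collections import Counter


def assert_empnumber(departments, employees, lo=0, hi=100):
    """Derive the number of employees per department and enforce the
    value restriction lo..hi on the derived attribute."""
    counts = Counter(emp["department"] for emp in employees)
    derived = {}
    for dept in departments:
        number = counts.get(dept, 0)  # assumption: no employees means 0
        if not lo <= number <= hi:
            raise ValueError(
                f"department {dept!r}: empnumber {number} outside {lo}..{hi}"
            )
        derived[dept] = number
    return derived


if __name__ == "__main__":
    departments = ["sales", "research"]
    employees = [
        {"name": "Ann", "department": "sales"},
        {"name": "Bob", "department": "sales"},
        {"name": "Cy", "department": "sales"},
    ]
    print(assert_empnumber(departments, employees))
```

The derivation and the restriction are stated together, so a state that violates the range is rejected at the point where the derived value is computed.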

Since derived attributes can be used in the calculation of other derived attributes, assertions can be interdependent. The evaluation order of interdependent assertions can be derived automatically from their specification because all calculations must be specified in terms of predefined attributes (or single variables specified through the 'value' statement). This prevents the problems associated with the procedural specification of rules, where programmers have to think in terms of scenarios. In this way the semantic language offers a fundamental contribution to the specification of more intelligent active database systems.
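
The automatic ordering described above amounts to a topological sort of the dependencies between assertions. The Python sketch below shows the idea with the standard-library `graphlib` module (Python 3.9 or later); the attribute names in `DEPENDS_ON` are hypothetical, and the sketch makes no claim about how an actual Xplain system determines the order.

```python
from graphlib import TopologicalSorter

# Hypothetical assertions: each derived attribute maps to the attributes
# its calculation is specified in terms of.
DEPENDS_ON = {
    "customer its revenue": {"order its total"},
    "order its total": {"order line its amount"},
    "order line its amount": {"order line its quantity", "article its price"},
}


def evaluation_order(depends_on):
    """Return the derived attributes in an order in which every attribute
    is evaluated after everything it depends on."""
    order = TopologicalSorter(depends_on).static_order()
    # Stored (non-derived) attributes need no evaluation, so drop them.
    return [attr for attr in order if attr in depends_on]


if __name__ == "__main__":
    for attr in evaluation_order(DEPENDS_ON):
        print(attr)
```

A circular set of definitions has no valid order; `TopologicalSorter` reports that case by raising `graphlib.CycleError`.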