Generalization and Specialization

Besides aggregation, generalization is another building principle of data modeling. For example, a student administration may be interested in different kinds of students: students with a job and students without one. We could model this as follows:

type student = name, address, town, birth_date, faculty, employer.
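As a rough illustration outside the type notation above, the single type might be mirrored by a Python dataclass in which the employer is optional. This is a hypothetical sketch: the class name `StudentRecord`, the use of `None` for a missing employer, and the sample data are assumptions for illustration. It shows that a record with `employer=None` cannot, by itself, say whether the student has no job or whether the employer was simply never entered.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class StudentRecord:
    """Hypothetical single 'student' type that includes an employer field."""
    name: str
    address: str
    town: str
    birth_date: date
    faculty: str
    employer: Optional[str]  # None must mean both "not relevant" and "not entered"


# Assumed scenario: the first student has no job; the second one has a job,
# but the employer was never entered. The stored records cannot tell the
# two situations apart.
no_job = StudentRecord("Ann", "Main St 1", "Springfield", date(2001, 5, 3), "Mathematics", None)
not_entered = StudentRecord("Bob", "Canal 7", "Springfield", date(2000, 1, 9), "Physics", None)
```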

The problem with this solution is that the attribute “student its employer” is not relevant at all for students who have no employer. If this model is implemented, the staff of the student administration department cannot be sure whether the attribute “student its employer” is relevant. When they insert new instances of “student”, the data model as such does not tell them what to do. They can make two kinds of errors: they can leave the employer field empty even though it is relevant, or they can fill in some employer although it is not relevant. Therefore it is better to design a data model in which the type “student” has only the properties that are relevant to all students, which implies that there is no attribute “student its employer”. We call this generalization: the intersection of the properties of “student” and “working student” defines the type “student”, and the type “working student” is a specialization of “student” with the additional property “working student its employer”:

type student = name, address, town, birth_date, faculty.
type working student = [student], employer.

The square brackets specify that for each instance of “student” there is at most one instance of “working student”, and that each instance of “working student” refers to exactly one instance of “student”. With generalization/specialization it is clear to users which attribute fields are mandatory for registration. There is no longer any doubt about the relevance of attribute fields in an application: each attribute field must be filled in correctly, and NULL values are not allowed. The absence of NULL values is one of the requirements for applying a semantic data manipulation language. A more thorough discussion of data modeling, with design examples, can be found in the articles and books of Johan ter Bekke.
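The generalized model and the square-bracket constraint can be sketched in Python as follows. This is not how an Xplain system enforces them; it is a hypothetical in-memory illustration in which the names `Student`, `WorkingStudent`, `require_filled` and `Registry`, as well as the sample data, are assumptions. `WorkingStudent` holds a reference to one `Student` (playing the role of `[student]`), the registry keys working students by that reference so each student has at most one, and every field is checked for a missing value. Treating an empty string as missing is an additional assumption.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class Student:
    """Properties relevant to all students (the generalization)."""
    name: str
    address: str
    town: str
    birth_date: date
    faculty: str


@dataclass(frozen=True)
class WorkingStudent:
    """Specialization: refers to one Student and adds a mandatory employer."""
    student: Student  # plays the role of [student]
    employer: str


def require_filled(instance) -> None:
    """Reject None or "" in any field (treating "" as missing is an assumption)."""
    for f in fields(instance):
        value = getattr(instance, f.name)
        if value is None or value == "":
            raise ValueError(f"{type(instance).__name__}.{f.name} must be filled in")


class Registry:
    """Hypothetical in-memory store that checks the [student] constraints."""

    def __init__(self) -> None:
        self.students: set[Student] = set()
        # Keyed by the referred student, so each student has at most one entry.
        self.working: dict[Student, WorkingStudent] = {}

    def add_student(self, s: Student) -> None:
        require_filled(s)
        self.students.add(s)

    def add_working_student(self, w: WorkingStudent) -> None:
        require_filled(w)
        # The referred student must exist.
        if w.student not in self.students:
            raise ValueError("a working student must refer to a registered student")
        # At most one working student per student.
        if w.student in self.working:
            raise ValueError("this student already has a working-student instance")
        self.working[w.student] = w


reg = Registry()
ann = Student("Ann", "Main St 1", "Springfield", date(2001, 5, 3), "Mathematics")
reg.add_student(ann)
reg.add_working_student(WorkingStudent(ann, "Acme Ltd"))
```

A reference field is used here instead of a Python subclass because, in the definition above, a working student refers to an existing student instance. A subclass would make each working student a separate student object and would not, by itself, rule out two working-student entries for the same student.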

See also Classification and Aggregation