M E M O Nr. 132 ER Modelling from First Relational Principles Ernst-Erich Doberkat Eugenio G. Omodeo Februar 2003 Internes Memorandum des Lehrstuhls für Software-Technologie Prof. Dr. Ernst-Erich Doberkat Fachbereich Informatik Universität Dortmund Baroper Straße 301 D-44227 Dortmund ISSN 0933-7725 ER Modelling from First Relational Principles Ernst-Erich Doberkat1 Chair for Software-Technology Universita¨t Dortmund Dortmund, Germany doberkat@acm.org Eugenio G. Omodeo Dipartimento di Informatica Universita` degli Studi di L’Aquila L’Aquila, Italy omodeo@di.univaq.it February 7, 2003 1Corresponding author. Postal address: Universita¨t Dortmund, D-44221 Dortmund, Tel. +49-231- 755.2780, Fax +49-231-755.2061 Abstract Entity-Relationship modelling is a popular technique for data modelling. Despite its popu- larity and wide spread use, it lacks a firm semantic foundation. We propose a translation of an ER-model into relation algebra, suggesting that this kind of algebra does provide suitable mechanisms for establishing a formal semantics of Entity-Relationship modelling. The work reported on here deals first with the techniques necessary for the translation, thus construct- ing a static view of an ER-model in an abstract setting of what might be called logic without variables. We then undertake a detailed analysis of the insertion and deletion operations for an ER-model represented in terms of the relation calculus. Keywords: Entity-Relationship models (static and dynamic view), relation algebra, re- lational foundations of data modelling. Page 1 ER Modeling from First Relational Principles 1 Introduction: Goals of the present study The structure of complex data may be specified through an Entity-Relationship model. This technique for modelling is rather popular, in part because it permits the visualization of the relationship between data, making the structure of their interrelationship easily grasped. The technique is based on Codd’s seminal paper [5], it is available in many variants, and it is one of the ancestors of the popular design method UML, see e.g. [16]. Despite its popularity, this approach to data modelling lacks a firm formal foundation: it appeals rather to the intuition of a modeler, neglecting the necessity of formally arguing about an ER-model. There are some proposals for a formal semantics of various versions of the model (section 2 provides a brief overview) which are mainly based on algebraic formalisms like algebraic specifications. A semantics based purely on a calculus coming from logic has not been investigated yet to the authors’ knowledge. 1.1 Motivation The goal of the present study is a systematic investigation into the semantic foundations of ER modelling using relational algebra. Using relational algebra (called map algebra in [3]) as a formalization of set theory without variables [20] is intuitively appealing: The naive semantics given to an ER-model is essentially set based — entities are modelled through sets, relations through sets of n-ary tuples, and attributes as functions. It is this naive approach which is formalized here. It will investigate which tools for a relation calculus are necessary for an adequate representation of ER models. Conversely, it provides in this way an assessment for the power of relation algebra based methods for modelling concepts that are in practical use. This means that we have to find a sufficiently rich formal model which has the essential properties of ER models in practical use. It means also that we ought to find mathematical assumptions with which an adequate modelling is feasible (e.g., we will find it convenient from this point of view to represent entities and relations equally as binary relations — which of course does not imply that we suggest treating entities and relations on an equal footing when it comes to modelling data specific for some application). These questions imply our pursuing two goals: 1. The representation of ER modelling from first relational principles, comparable to con- structing a computing system from bits and bytes (one does not do this in practice, of course, either, but the investigations into these concepts yields a firm and sound basis for any kind of practical construction). 2. The actual proof that these relational methods are sufficiently powerful for representing adequately the conceptual foundations for a practically oriented method. The proof for this is given by performing the actual construction, and by providing the actually needed tools and devices. Both goals complement each other, their realization requires delving deeply into formal details, as will be experienced shortly. They require the translation of an ER model into the target language of a relational calculus. Our reference on relation algebra is [9] (the most fundamental reference of all is [20]), and we briefly report on it in Sect. 4. Let us recall here that this calculus, grown out of Boole’s “algebra of thought” just a few decades after the latter had been conceived (cf. [18, 19]), is an E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 2 ER Modeling from First Relational Principles enriched form of it. Today it looks like a simplified version of Codd’s relational algebra which also happens to be a denominator common to a number of taxonomic languages proposed for knowledge representation by several authors. Relation algebra can hence be viewed both as an assembly language for knowledge representation and as a common ancestor of the many systems of algebraic logic available today. It has also been used for decomposing relations in a database according to functional dependencies in [13], but these methods have not yet be utilized for a systematic investigation of the dynamic behavior of a data base. We separate the static structure (the topology) of the ER model from its dynamic counter- part, and we show first how to model the static view using relation algebra. This is obviously not enough, because the dynamic nature of an ER model cannot be described using the static structure alone. Let us have a brief look at abstract data types for just conveying the flavor of our arguments. 1.2 The ADT view An abstract data type (ADT) encapsulates data and the operations (usually called methods) on it. This notion of an ADT is fundamental in object oriented software construction, classes may be considered as special cases of ADTs. This notion is fundamental because it supports data abstraction and permits keeping data and their operation in one physically well defined place. ADTs serve as templates, they are instantiated, and the instances of an ADT are the living capsules data and operations are kept in. The state of an instance is just the collection of specific values the data of this instance are having. The approach Design by Contract, so forcefully advocated by Bertrand Meyer [14], and realized in his language Eiffel, goes one step beyond, associating with each ADT specific properties called invariants. Operations on an (instance of an) ADT have to respect these invariants in the sense that each operation that starts on an instance which satisfies the invariant leaves the instance in a state which also satisfies it. Each method m of an ADT is associated with a precondition prem and a postcondition postm indicating a contract: entering m such that prem is satisfied guarantees leaving m with postm satisfied. In Hoare’s notation of predicate transforms, {inv ∧ prem} m {inv ∧ postm}. Actually, Design by Contract entails more, because inheritance comes into the game through rather involved co- and contravariant rules relating methods from subclasses to super classes, but this will not concern us here. Call an ADT proper iff it has invariants and pre- as well as postconditions, and if the Design by Contract rules are imposed on its methods. An ER model M may be considered as an ADT. The data to be stored in an instance are composed of the data stored in the entities, relations and attributes, and the invariant is provided by the conditions imposed on the model’s validity (see Definition 5). We should look for three families of operations: • initializing an instance of M, • inserting elements into entities and relations, • deleting elements from entities and relations. E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 3 ER Modeling from First Relational Principles Note that we do not talk about operations but rather about families of them; this is so since an operation like inserting an element into a relation R usually entails other operations (like inserting elements into the domain, and into the codomain of R); there may be more subtle dependencies as well, as we will see. The invariant to be maintained by these operations constitute the validity of the model; this means that the model before and after one of these families of operations has to conform to the model’s declaration. The post conditions are in every case empty, because the operations are all geared towards maintaining the ADT’s invariant. Then M is shown to form in fact a proper ADT. 1.3 Overview What needs to be done then is to formulate the invariant and the precondition using the language we have chosen for our formalization. After we discuss the version of ER modelling we want to work with in Section 3, we introduce relation algebra in Section 4, there we will also provide some abbreviations that are helpful for the discussions to follow. Section 8 deals with a formulation of the preconditions for insertions. For reasons of managing the complexity, this is reduced first into the bare bones version of an ER model which does not entertain attributes, leading to the notion of a weakly valid ER model. It is shown under which conditions weak validity is maintained. Attributes are added to the discussions at that point, introducing the notion of a valid model, and strengthening the preconditions towards keeping validity invariant. A very similar procedere is observed when discussing deletions in Section 9, which quite surprisingly turns out to be easier to handle than insertions. This is mainly due to the fact that most of the interesting properties are downward closed: if a relational expression W observes it, then all relational expressions V ⊆ W do, too. Section 11 proposes further investigations along the lines suggested here, discussing for example how model checking as a technique to ascertain properties of an ER model could be incorporated. Preliminary versions of this work could be presented at the RelMis 2001 workshop at ETAPS 2001 Genova [15], and at the RelMics’6-TARSKI workshop at the University of Tilburg [8]. Acknowledgements. This research was in part supported through grants from the Ex- change Programme for Scientists between Italy and Germany from the Italian Ministry of Foreign Affairs & Deutscher Akademischer Austauschdienst, from which both authors were funded at different times during Fall 2000 and Summer 2001. The first author’s work was ad- ditionally partially funded through a grant from Progetto speciale I.N.D.A.M./GNIM “Nuovi paradigmi di calcolo: Linguaggi e Modelli” while visiting what is now the Dipartimento di Informatica at the University of L’Aquila. The second author was partly funded by the MURST/MIUR 40% project Aggregate- and number-reasoning for computing: from deci- sion algorithms to constraint programming with multisets, sets, and maps. This work also benefited from the collaboration fostered by the European COST Action 274 (TARSKI, cf. http://www.tarski.org). The referees’ comments and suggestions were very helpful and are appreciated. E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 4 ER Modeling from First Relational Principles 2 Related Work The present study stands in line with other approaches to provide a firmly based semantics for ER-models. They are mainly based on algebraic modelling techniques and capitalize on the semantic framework that come with them. The work being done on query algebras for data base models or on the semantics of UML class diagrams pursues different goals, hence is not mentioned here. Hettler [11] gives a translation of these models into the specification language SPEC- TRUM, essentially modelling entities as records with attributes as entries, but not taking inheritance into account. The report [4] transforms an ER-diagram into an attributed graph signature, and the integrity constraints into first-order logic formulas. The ER-models con- sidered do not take inheritance explicitly into account. The paper focusses on the dynamic aspects — viz., transactions – through homomorphisms, hence showing how transactions may be caught through an algebraic framework. The formal semantics of an extended ER-model is investigated in [10, 12] from a database point of view, proposing the semantics of a database signature as the set of all interpretations; this work does not mention algebraic specifications explicitly. In [6] it is shown how to generate an algebraic specification from an ER-model, hereby carrying the model based semantics of such a specification over to ER-models. [7] generalizes this approach somewhat by proposing the colimit of a diagram extracted from an ER-model as the categorial semantics of the model. Relational algebra has been used for decomposing relations in a database according to functional dependencies in [13]; these methods, however, have not yet be utilized for a sys- tematic investigation of the dynamic behavior of a data base. 3 Entity Relationship Models We assume the reader to be familiar with Entity Relationship modelling [5] as a popular and wide spread technique for data modelling. Many variants have been discussed (Thalheim’s encyclopedic book [21] provides an overview). We will fix now for the rest of the paper the model which will stand in the center of our attention. 3.1 The variant to be considered We will restrict ourselves to a rather basic variant in which • All relations are binary, and the only cardinality restriction that may be imposed on a relation is that it is left- or right- unique. • Inheritance is restricted to single inheritance. • Relations are assumed to be total. In fact, in the presence of inheritance non-total relations may be transformed into total ones by introducing additional entities for the domain, and for the range, respectively, as will be demonstrated in Sect. 6.1. • Attributes are defined on entities only. This version of ER modelling is a bit more restrictive than the one investigated in [6]. The restrictions can be removed or refined at the cost of a more complicated technical development. E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 5 ER Modeling from First Relational Principles We feel, however, that the methods we develop here provide a way of modelling these more complicated situations. ER modelling may be embedded into the UML when modelling classes before methods are incorporated, thus capturing the static aspects of a class hierarchy. In terms of the UML, we restrict ourselves here to representing these classes when only simple generalization is permitted, associations have to be binary and are without attributes, and multiplicity restrictions may only express left- or right-uniqueness. For practical use, these are of course serious restrictions, for our fundamental investigation, however, they are not, since they permit us to clarify the basic mechanisms. An added layer of technical development will certainly show how to technically cope with additional properties along the lines we are developing here. 3.2 The process model We are given an instance M of an ER model which is valid, so all constraints formulated in the declaration of the model are satisfied. We want to investigate change, namely we want to investigate conditions under which insertions and deletions into M lead to a valid model again. In order to investigate this for insertions, we assume that we have complete information about the items to be inserted. Thus, if E is an entity, we know the items δ+E to be inserted into E, yielding E ∪ δ+E as the new version of this entity. Similarly, we know for relations R the tuples δ+R to be inserted, and we know for attributes α the changes in δ+α. What we want to know is, under which conditions for E, δ+E,R, δ+R and α, δ+α the invariance of validity of the instance is maintained. The question arises mutatis mutandis for deletions. Note that the assumption that the change sets δ+ and δ− are given does not address the problem of constructing them. When insertion is done interactively, and is not done with care, situations may arise when an infinite sequence of insertions may be necessary; this can be demonstrated through easily found examples. We bypass these complications by postulating that complete information is available from the outset. 4 Relational Calculus Very much like any logical formalism, relation calculus consists of a symbolic language, an intended semantics, a collection of logical axiom schemes (which, according to the intended semantics, are valid, i.e. true in any legal interpretation), and a collection of inference rules. Making use of a formalism means describing a universe U of discourse by means of proper axioms stated in the symbolic language. More axioms means fewer interpretations, and therefore a larger collection of derivable consequences; as an extreme case, in absence of proper axioms, only valid sentences —which are a bit too vacuous to be really useful— are derivable. Relation calculus was designed to ease reasoning about dyadic relations — maps, as we will call them— over an unspecified, yet fixed, universe U . Its language is entirely equational, and ground (i.e., devoid of individual variables); we will therefore concentrate mainly on syntax and intended semantics, and summarize the logical axioms and inference rules in Def. 1. Relation algebras formalize axiomatically the usual operations on binary relations (like composition or forming the converse), so that binary relations appear as one of several models that are possible for these algebras. We will provide a very brief introduction to these algebras, and we will fix some notations for the reader’s convenience. E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 6 ER Modeling from First Relational Principles 4.1 Relation Algebra A relation algebra is defined as a Boolean algebra with additional properties that are imposed because a composition relation is available. The version of relation algebras we want to use is defined below; for variants and further developments the reader is encouraged to consult [3, Ch. 2] or [2, Ch. 1]. Definition 1 〈Ø, 1l,∩, · , ι, ;, ·^〉 is called a relation algebra iff 1. 〈Ø, 1l,∩, · 〉 is a Boolean algebra with smallest element Ø, largest element 1l, intersection (meet) ∩, and complementation · ; the associated order relation and union (join), and difference, are denoted by ⊆, ∪, and \, resp. 2. ; is a binary associative operation on the Boolean algebra with ι as the left- and right- neutral element, 3. ·^ is a unary idempotent operation on the Boolean algebra, 4. the following properties hold: (a) (P ;Q)^ = Q^;P ^, (b) (P ∩ Q)^ = P ^ ∩ Q^, (c) P ; (Q1 ∩Q2) ⊆ P ;Q1 ∩ P ;Q2 (∩-subdistributivity), (d) P ; (Q1 ∪Q2) = P ;Q1 ∪ P ;Q2 (∪-distributivity), 5. P ⊆ Q implies P ;R ⊆ Q;R, 6. (P ;Q) ∩ R = Ø implies (P ^;R) ∩ Q = Ø (Schro¨der’s Rule). Relation algebra consists of map equalities P = Q, where P and Q are map expressions: Definition 2 Map expressions are terms of the signature according to the table below, where we have added union ∪ as an associative operation, and difference \ (which will be treated as left-associative operators) for convenience: Symbol Ø 1l ι ri ∩ ; ·^ · ∪ \ Degree 0 0 0 0 2 2 1 1 2 2 Priority 5 6 7 2 2 2 Here ri is one of the countably many map letters which we assume to be available. Map letters are used to customizing relation algebra by attaching additional properties through additional axioms for the relation algebra, as we will see in the sequel. An interpretation I over a universe U maps each map expression to a subset of the Cartesian square U2 ≡Def U × U such that e.g. ØI = ∅ (P ∩ Q)I = P I ∩ QI , 1lI = U2 (P ;Q)I = P I;QI ιI = ∆ (Q^)I = ( QI )^ E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 7 ER Modeling from First Relational Principles Here ∆ is the diagonal {〈a, a〉| a ∈ U} of U , and the operations on the right-hand side are the familiar ones manipulating relations over sets. Hence e.g. ∪-distributivity translates into the set equality R;(S1 ∪S2) = R;S1 ∪R;S2 which is familiar for the relations R ⊆ A×B and S1, S2 ⊆ B × C for sets A,B and C. Adding new axioms through fixing properties of map letters has the effect of restricting interpretations: they have to satisfy the additional properties for the interpretation of the map letters, which in turn also have to be provided. For conciseness, we use some abbreviations which are listed in the table below. Notation Expression Note Coll(R) R ⊆ ι RI represents a collection Total(R) R;1l = 1l RI is a (left−) total relation dom(R) R;1l ∩ ι domain of R img(R) 1l;R ∩ ι range/image of R RUniq(R) Coll(R^;R) RI is a partial map LUniq(R) RUniq(R^) (R^)I is a partial map NonVoid(R) Total(1l;R) RI 6= ∅ Snglt(R) NonVoid(R) & RUniq(1l;R) & RUniq(R^) RI is a singleton DomSub(R,S) R;1l ⊆ S;1l domain containment ImgSub(R,S) 1l;R ⊆ 1l;S range containment Const(R) Snglt(R) & Coll(R) RI represents {〈a, a〉} ×n+1j=1 Rj (×nj=1Rj) ;Rn+1 For example, Coll(E) says that EI is supposed to consist of pairs of the form 〈a, a〉 for some a ∈ U , Total(E) indicates that EI is (left-) total, hence that for each a ∈ U there is some b ∈ U with 〈a, b〉 ∈ EI . The reader is invited to formulate these expressions in terms of set-theoretic relations. Some identities and inequalities will be particularly helpful in the sequel; we collect them here for easier reference, and point the reader to [20] and to [17]. Lemma 1 Let P,Q and R be map expressions, then 1. RUniq(R) ⇔ R;(P ∩Q) = R;P ∩ R;Q, 2. (P ∪ Q)^ = P ^ ∪ Q^, 3. (P ;Q) \ (R;Q) ⊆ (P \ R);Q 4. P,Q ⊆ ι, then (a) P ^;Q = Ø, provided P ∩Q = Ø, (b) (P \ Q);1l = (P ;1l) \ (Q;1l), 5. Q;R ⊆ S ⇔ Q^;S ⊆ R ⇔ S;R^ ⊆ Q (Schro¨der’s Cycle Rules), 6. P ^ = ( P )^ .  Map Letters We assume that we have a countably infinite provision of map letters r1, r2, . . . at our disposal of which we reserve the first T for system purposes. E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 8 ER Modeling from First Relational Principles 4.2 Projections and Flat Tuples Tuples of length > 2 forcibly enter into the study of database systems, if only because the operation of inserting an entity e into a database often causes the simultaneous assignment of a tuple of values to the attributes of e. To cope with this while avoiding the complications that would result from the treatment of relations of arity greater than 2, we want the universe U of discourse to include A∗—viz., the set of all finite-length sequences whose components are entities drawn from A and are in some sense “atomic”. We will assume for simplicity that U = A∪ A∗ and A∩ A∗ = ∅, and, to avoid triviality, that A 6= ∅. We indicate by  the null tuple in A∗, and put A+ =Den A∗ \ {}. Two operations on non-null tuples t0t1 · · · tm are essential, namely head isolation —which determines the first component t0 of any given such tuple— and tail extraction—which deter- mines the sub-tuple t1 · · · tm resulting from removal of the first component. Let us designate these operations by λ and %, resp. Moreover, let us represent by υ and ε the collection A of atomic entities and the null tuple: more precisely, υ= = {〈a, b〉 ∈ A2 |a = b} and ε= = {〈, 〉} in our intended interpretation =. We can state that • ε designates a singleton and diagonal map, by the condition Const(ε); • υ represents a non-empty collection A of entities distinct from the entity, , represented by ε, by means of the conditions Coll(υ) & NonVoid(υ) & ε ∩ υ = Ø, • λ, % designate functions λ= and %= whose common domain A+ is the complement of A∪ {} in U , by means of the conditions RUniq(λ) & RUniq(%) & λ ; 1l = % ; 1l = (υ ∪ ε) ; 1l • %(p) belongs to A∗ for all p, by means of the condition υ ∩1l ; %=Ø; • for all a in A and all q in A∗ there is a tuple p with λ(p) = a and %(p) = q, by means of the condition υ ; 1l ; υ = λ^ ; % • the function p 7→ 〈λ(p), %(p)〉 is injective, by the condition Coll ( λ ; λ^ ∩ % ; %^ ) . To sum all these conditions up, let us introduce the notation HeadTail(L,R, Y,E) ≡Def RUniq(L) & RUniq(R) & Const(E) & Coll(Y ) & NonVoid(Y ) & E ∩ Y = Ø & Y ∩ 1l ;R = Ø & L ; 1l = R ; 1l = (E ∪Y ) ; 1l & Y ; 1l ; Y = L^ ; R & Coll ( L ;L^ ∩ R ;R^ ) , so that we can concisely state everything by the single requirement HeadTail(λ, %, υ, ε). E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 9 ER Modeling from First Relational Principles After noticing that (×i−1j=1 % ) ; λ designates the operation of extracting the i-th component of a tuple, let us also provide a handy characterization of h-tuples: ith(L,R) ≡Def ×i−1j=1R ; L h − tuples(R) ≡Def img(R) ∩ dom(hth(R,R) ) \ dom( (h + 1)th(R,R) ). (The notation ith and h− tuples is a little sloppy but rather expressive, so that we trust the reader will appreciate it over a more formal one). Thus, under assumption that HeadTail(λ, %, υ, ε) holds, the map expression h − tuples(%) represents the collection of all tuples t1 · · · th in A∗ for any natural number h. Notice that for the sake of completeness we should also include in our specification of λ, %, υ, ε an induction principle reflecting our assumption that U is the smallest superset of A which is closed w.r.t. tuple formation. One way of stating induction is by means of an equality scheme ensuring that P = 1l follows from ( (ε ∪ υ);1l ∪ (λ;P ∩ %;P ) ) \ P = Ø for all P , namely: 1l; ( (ε ∪ υ);1l ∪ λ;P ∩ %;P \ P ) ;1l ∪ P = 1l (cf. [9, Sect. 6], and also [1, p. 175, Law S8]). 5 Preparations Now let an ER model M be given. All information concerning M can be found in a declaration which represents the static information about the model, and which permits stating the validity of an instantiation for M. Separating concerns, we concentrate for the time being on entities and relations, attributes will be added later on, as the technical development evolves. 5.1 Graphical representation ER models are usually presented through graphical representations where entities are shown as boxes, and relations as diamonds. Every diamond represents a dyadic relation — a map in our terminology — and that no diamond bears any attributes. We need to regard one parameter of a relation as first, and the other one as second, which we do according to the numbering stipulation detailed in Fig. 1 Uniqueness of names is assumed to its fullest extent; viz., we are forbidding synonymy not only between (entity-)boxes, between diamonds and between ovals, but also between a box and a diamond etc. E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 10 ER Modeling from First Relational Principles 1 2 3 4 Figure 1: Numbering counter clockwise Digression on a little anomaly that may be caused by IsA An IsA-edge connecting box E to box F indicates that the set E of entities designated by E is included at all times in the one, F, designated by F ; the IsA-relation is discussed in greater detail in Sect. 5.4. Unless this inclusion could be satisfied, at least occasionally, as a strict inclusion, distinguish- ing between E and F would hardly make any sense, and the ER-model would be somehow redundant. Notice, however, that the following situation implies that E and F designate the same entity: IsA E F 1 1 (Indeed, F = E ensues from |E| = |F| and E ⊆ F when F is known to be finite). Generalizations of this situation are easily found, e.g. IsA E F 1 G 1 1 1 (where F = E ensues from |F| ≤ |G| ≤ |E| and E ⊆ F in view of the finiteness of F—note that G = E is not entailed, though!). Admitting slightly defective ER-models of the above kind would cause some marginal conceptual difficulties in the part of this research which regards dynamics which will be discussed in Sections 8 and 9. However, since we have the reasonable expectation that any similar anomaly can be revealed by a simple semantic check, so as to be then corrected, we assume that this check is available to us and we do not digress any further on this issue. E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 11 ER Modeling from First Relational Principles 5.2 The Basic Model If E is the domain of relation R with F as its co-domain, then we will assume that E and F are tight in the sense of the following definition: Definition 3 We call E the tight domain of R in M if M requires that for each e ∈ E there exists an element f in the codomain of R such that 〈e, f〉 ∈ R. In diagrams this is indicated through a dot. E R The definition of tight codomain is symmetric. Introducing another shorthand notation, E •— R —•F indicates that both E and F are tight. In what follows, entities and relations will be considered an element of a fixed (but anony- mous) relation algebra. An entity E is then represented through Coll(E), hence consists of pairs the first and the second component of which agree. This representation permits us to represent entities and relations in a uniform manner, thus allowing us to work in a uniform framework. This is of course a technical assumption which is not mirrored in the conceptual use of ER-modelling; we will have to pay a price for it in our exposition, since we now have always to distinguish “proper” relations from those that represent entities. Tightness, as expressed through E •— R —•F translates to DotDot(E,R, F ) ≡Def Coll(E) & Coll(F ) & DomSub(E,R) & ImgSub(F,R). Either relation •— or —• may be strengthened to • 1— and 1—•, resp., indicating uniqueness. Thus E • 1— R means in additional to E •— R that 〈x, y〉 ∈ R ∧ 〈x′, y〉 ∈ R ⇒ x = x′ holds, which may be translated conveniently into LUniq(R). Similarly, R 1—•F, which means 〈x, y〉 ∈ R ∧ 〈x, y′〉 ∈ R ⇒ y = y′ is translated into RUniq(R). Note that either of these conditions depends only on the relation, not on the domain or the codomain. The different way a relation relates to its domain and its codomain may be captured through the suitable combination of macros which are comprehensively listed in the table below. Situation Characterization E • 1— R 1—•F DotDot(E,R, F ) & LUniq(R) & RUniq(R) E • 1— R —•F DotDot(E,R, F ) & LUniq(R) E •— R 1—•F DotDot(E,R, F ) & RUniq(R) E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 12 ER Modeling from First Relational Principles 5.3 Adding Place Holders It may sometimes happen that information is incomplete: an element x is inserted into entity E, and E •— R —•F holds, but there is no y in F so that 〈x, y〉 is to be inserted into R. This then would violate the condition E;1l ⊆ R;1l. There may even occur some unpleasant situations when place holders are not admitted. Consider Fig. 2, where E and F are assumed to be different entities. Insert one into E, then 〈one, two〉 into R; then two must be new to F . Insert it into F , then it will be inserted into F ′ which requires the insertion of a pair 〈three, two〉 into S; three must be new to E. In this way a loop is created which will not terminate. FE R 1 F‘E S 1 1 IsA= Figure 2: Possible Circularity For enabling insertions also under somewhat problematic conditions, we postulate the existence of place holders which are collected in a relation P , so that in the situation considered 〈x, ∗〉 with ∗ ∈ P would be inserted into R. We assume that Coll(P ) holds, and that the entities are free of place holders, thus E ∩P = ∅ is true for each entity E (note that this implies both 1l;E ∩ 1l;P = ∅ and E;1l ∩ P ;1l = ∅ by Lemma 1). Let Entity(P,E) ≡Def Coll(E) & E ∩ P = Ø denote that E is an entity. Constraints on place holders We need some constraints on the use of place holders that reflect their adequate use. 1. No placeholder occurs twice as the first or the second component of a pair in a relation R. Put NoTwice(P,R) ≡Def P ∩ (R ∩ R;ι);1l = Ø, then NoTwice(P,R) & NoTwice(P,R^) should hold, 2. No placeholder occurs in two different relations R,S as the first components of a pair, which is formulated as NoBoth(P,R, S) ≡Def P ∩ R;1l ∩ S;1l = Ø. E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 13 ER Modeling from First Relational Principles 3. No placeholder occurs both as the first component in relation R and as the second component in relation S, hence NoFirstSecond(P,R, S) ≡Def NoBoth(P,R, S^). 4. No pair in a relation has place holders on both sides, thus NoSamePair(P,R) ≡Def R ∩ P ;1l ∩ 1l;P = Ø. 5. The situation 〈∗, y〉 and 〈x, y〉 with x 6= ∗ (and, for symmetry, in the second component) does not occur; this is captured through NoDoubleFirst(P,R) ≡Def R;R^ ∩ P ;1l ∩ 1l;P = Ø and NoDoubleSecond(P,R) ≡Def R^;R ∩ 1l;P ∩ P ;1l = Ø Summing up: If {R1, . . . , Rk} are the identifiers for all the relations in play, the conjunction PlaceHolder(P, {R1, . . . , Rk}) should hold, where PlaceHolder(P, {R1, . . . , Rk}) ≡Def &ki=1NoTwice(P,Ri) & NoTwice(P,R^i ) & &ki=1&kj=i+1NoBoth(P,Ri, Rj) & &ki=1&kj=i+1NoFirstSecond(P,Ri, Rj) & &ki=1NoSamePair(P,Ri) & &ki=1NoDoubleFirst(P,Ri) & &ki=1NoDoubleSecond(P,Ri) We reserve the map letter pi for place holders. 5.4 Inheritance Immediate inheritance between entities is given through the IsA- relation. Hence E IsA F translates into Inherits(P,E, F ) ≡Def Entity(P,E) & Entity(P, F ) & E ⊆ F. This is in accordance with the familiar view of generalization as inclusion. There are some restrictions to be observed concerning the IsA-relation, mainly acyclicity and single inheritance. This will be discussed now. E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 14 ER Modeling from First Relational Principles The IsA-graph is assumed to be acyclic; i.e., IsA^ ∩×ni=1IsA = Ø must hold for all n ≥ 0. Moreover, we assume that at most one IsA-edge exits from each node: IsA^ ; IsA ⊆ ι. The latter assumption reflects two facts: on the one hand, we are taking into account single inheritance only; on the other hand, we are forbidding shortcuts in inheritance chains, by requiring that ∀n ≥ 0 : IsA ∩×n+2i=1 IsA = Ø. Since an acyclic function defined on a finite set cannot be total, the IsA-graph will have nodes of out degree 0. Such “sink” nodes —which represent objects of maximal genericity— are usually called roots in the object-oriented approach. Clearly, there would be no loss of generality in assuming that there is exactly one root. 6 Renderings of ER-models in relation algebra This section will show how to translate an ER model M into a set of map equalities. This happens in two steps: first M is normalized by making all relations left total as well as right total, the second step iterates over the model and forms the equalities, which in turn are formulated through the macros formulated above. 6.1 Normalization Here is the preprocessing algorithm that converts any given ER-model (devoid of relation- attributes) into one that better fits our translation purposes: Phase 0: Detect all pairs E,F with E 6≡ F such that, due to the IsA-anomaly, E and F must designate the same class of entities. For every such pair, coalesce E with F . Phase 1 (Factorization): Make all relations both left- and right-total, by repeated use of the rewriting rules in Fig. 3 (where s, d ∈ {1, ∗}, and E ′, F ′ are fresh names). After this normalization we may (and do) assume that both domain and codomain are tight. This technical assumption helps avoiding the discussion of cumbersome special cases. Fig. 3 is slightly redundant for reasons of symmetry. 6.2 Translation Here is one way of translating into a set of map equalities an ER-model M resulting from the normalization process presented above: A. Introduce mutually distinct symbols pi, λ, %, ε which also are distinct from υ and from any identifier in M. Postulate that HeadTail(λ, %, υ, ε) & PlaceHolder(pi,R) E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 15 ER Modeling from First Relational Principles FE R FE R FE R FE R F‘ IsA s s s sd d d d FE R F‘ IsA s d E‘ IsA FE R s d Figure 3: Rewriting rules for normalization (cf. Sections 4.2 and 5.3), where R is the set of all attribute-, and relation-identifiers in M. We are as above regarding the elements of R ∪ {pi, λ, %, υ, ε} —as well as the entity identifiers in M— as symbols drawn from the alphabet of map letters ri. This iden- tification with map letters induces a linear ordering between identifiers which we will tacitly assume. B. For every IsA-edge between E and F in M, impose that dom(E)⊆dom(F ). For every entity identifier F which has no issuing IsA-edge, impose that dom(F )⊆υ. C. For every entity identifier F , let A1, . . . , Ah (with h > 0) be all of the distinct attribute identifiers that refer to F in M. Impose that &hi=1 Attr(Ai, F, υ) & Key({A1, . . . , Ah}, λ, %, pi) holds, where the following abbreviating notation is adopted (see also Sect. 4.2): Attr(B,E, Y ) ≡Def RUniq(B) & dom(B) ⊆ E & img(B) ⊆ Y , Key({B1, . . . , B`}, L,R, P ) ≡Def RUniq ( ⋂` j=1 jth(λ, %) ; B^j ) & &`j=1img(Bj) ∩ P = Ø . Moreover, for each mandatory attribute Ai, require that Attr(Ai, F, υ) & F⊆ dom(Ai); and, for each key Ai1 , . . . , Aik (where k > 0 and the Aij are mandatory attributes drawn from among A1, . . . , Ah), require that Key({Ai1 , . . . , Aik}, λ, %, pi). D. For every part E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 16 ER Modeling from First Relational Principles FE R s d of M, impose that dom(E) = dom(R) \ pi & dom(F ) = img(R) \ pi . Moreover, if d ≡ 1, then impose that RUniq(R); and, if s ≡ 1, then impose that RUniq ( R^ ) . 7 Map Letters We assume that we have countably many map letters r1, r2, . . . at our disposal, of which we reserve the first T initially for system purposes. We have reserved already pi for place holders, λ and ρ for head isolation, and for tail extraction, and υ resp. ε for the collection A of atomic entities and the null tuple, see Sect. 4.2. Some additional reservations will have to be made now. 7.1 Layout: blocks The map letters with indices beyond T will be used for the ER model under consider- ation in the following way. rT+1, . . . , rT+S will be reserved for entities, the next block rT+S+1, . . . , rT+S+B of B map letters will be reserved for relations, and finally we will re- serve the next block of A map letters for attributes. In case of an insertion or a deletion, we reserve the next block of S map letters for the δ+ resp. δ−-values for entities, the next block of size B for those values for relations, and finally the next block A map letters for at- tributes. We continue the sequence with the results, according to the following scheme (with Σ := S+B+A): if entity E corresponds to map letter rT+i with δ+E corresponding to rT+Σ+i, then E ∪ δ+E will be deposited at rT+2·Σ+i. In the same linear way — proceeding in a block wise fashion — we deposit the changed values for relations and attributes. The arrangement of map letters is indicated in Fig. 4, where corresponding map letters are represented as lying on a vertical line. 7.2 Keeping track We keep a record of the respective relations between entities and relations through a map Track : {T + S + 1, . . . , T + S + B} → {T + 1, . . . , T + S} × {T + 1, . . . , T + S} upon setting Track(t) = 〈i, j〉 ⇔ ri •— rt —• rj. Define for the relational index t ∈ {T + S + 1, . . . , T + S + B} that t ∈ LeftOne ⇔ ∃i, j : ri • 1— rt —• rj E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 17 ER Modeling from First Relational Principles System Entities Relations Attributes δEntities δRelations δAttributes Entitiesnew Relationsnew Attributesnew T+1 T+S T+S+B T+S+B+A T+Σ+1 T+2Σ+1 T+Σ+ S T+2Σ+ S ... ... ... ... Figure 4: Arrangement of Map Letters. The map letters 1 . . . T intended for system pur- poses are set graphically apart from the recurring blocks for entities, relations and attributes. Corresponding map letters are drawn below each other for greater clarity. and t ∈ RightOne ⇔ ∃i, j : ri •— rt 1—• rj . Through these sets we get access to left- and right-unique relations. Again, Track, LeftOne and RightOne can be shifted linearly along each Σ-block of indices. The reflexive and transitive closure IsA∗ of the inheritance relation is recorded through a reflexive and transitive relation Up on the set {T + 1, . . . , T + S}; note that this relation may be shifted linearly to the sets {T + k · Σ + 1, . . . , T + k · Σ + S}. The necessary properties of IsA∗ are described in 5.3. Attributes If entity E is represented by map letter ri with i ∈ {T + 1, . . . , T + S}, then Attributes(i) ⊆ {T + S + B + 1, . . . , T + S + B + A} is the set of map letters that are associated with E’s attributes. Clearly, {Attributes(i)| T + 1 ≤ i ≤ T + S} forms a partition of the set {T + S + B + 1, . . . , T + S + B + A}. The set Mandatory(i) ⊆ Attributes(i) contains the indices of all mandatory attributes (those attributes which are defined on all of E), and the set Key(i) ⊆ Mandatory(i) contains all indices of the key attributes. We assume having only one set of key attributes per entity. It would be easy to work with a varying number of sets of keys for each entity, but this would only complicate the notation without adding any new ideas. When we execute an insertion or a deletion, we change the contents assigned to the map letters by manipulating the extension of the corresponding data containers. Our block oriented scheme ensures that this process can be repeated without much ado by simply changing the base address where it all begins from T to T + 2 · Σ. E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 18 ER Modeling from First Relational Principles 8 Insertions: Validity This section formulates the validity of an ER model, first without taking attributes into account. This leads to the notion of weak validity. Conditions are formulated under which the weak validity of an ER model is preserved. Then we add attributes to our discussion, and the notion of validity is formulated. Again, conditions are given under which the attributes of the model arising from insertions satisfy the constraints, this time leading to the instance of a valid ER model. 8.1 Weak Validity An instance M of the ER model under consideration is weakly valid iff it satisfies all the constraints imposed on the entities and the relations laid down in the model’s declaration. This can be described now formally: Definition 4 The instance M is called weakly valid iff &T+S+1≤t≤T+S+BDotDot ( rpi1(Track(t)), rt, rpi2(Track(t)) ) & &t∈LeftOneLUniq(rt) & &t∈RightOneRUniq(rt) & &T+1≤i≤T+S Entity(pi, ri) & PlaceHolder(pi, {rT+S+1, . . . , rT+S+B}) & &〈i,j〉∈Up ri ⊆ rj Note that weak validity is formulated using a fixed base address T , which, however, has not been incorporated into the notation that is already cluttered enough. The definition above gives a formal description of the invariant we have to maintain when inserting a new element. It makes sure that each relation is tight, collects left unique resp. right unique relations, guarantees that the entities are at their proper place, relates placeholders and relations properly and checks that the IsA-relation is set properly. 8.2 Maintaining Weak Validity The insertions to be performed start from a weakly valid ER model and should of course maintain weak validity as an invariant; this issue has been hinted at in 1.2 already. We will need some preconditions. Before formulating them, however, we elaborate on the insertions proper. If E is an entity, and δ+E contains the insertions into E, then E ∪ δ+E will be formed, and this will be the new version of this entity. It is a bit more complicated with a relation R, since we cannot simply form R ∪ δ+R without running the risk of violating NoDoubleFirst(P,R ∪ δ+R) or NoDoubleSecond(P,R ∪ δ+R). Hence we have to clean up R by removing candidates for violations; they are easily seen to belong to (1l;pi ∩ δ+R;1l) ∪ (pi;1l ∩ 1l;δ+R). Thus we work with [ R, δ+R ] ≡Def R \ ( (1l;pi ∩ δ+R;1l) ∪ (pi;1l ∩ 1l;δ+R) ) instead of R and form [R, δ+R] ∪ δ+R as the new version of relation R. Occasionally we will replace the map letter pi by the free variable P ; the expression then will be denoted by [R, δ+R]P . E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 19 ER Modeling from First Relational Principles For describing under which conditions weak validity is maintained, we need to set the stage by providing some technical preparations. We first show under which conditions the property of domain resp. codomain containment after insertions remains intact. They will be put to use when invariance of weak validity (Prop. 1) and of validity (Prop. 2) under insertions are formulated. The following Lemma states under which conditions domain containment is maintained under insertions. It makes sure that among others domain containment should be true before the insertion, and that the inserted parts are related through domain containment, that the relation the domain of which is contained in the other one represents an entity, and that the same is true for the portion to be inserted; finally there is a technical condition that prevents overlapping parts. The Lemma states symmetrically a similar condition for codomain containment. Lemma 2 Let R be a relation, and assume Entity(P,E). Then these implications hold: 1. DomSub(E,R) Entity(P,E) DomSub(δ+E, δ+R) Entity(P, δ+E) (R ∩ 1l;P ∩ δ+R;1l) ;1l ∩ (E ∪ δ+E);1l = Ø DomSub(E ∪ δ+E, [R, δ+R]P ∪ δ+R) 2. ImgSub(F,R) Entity(P, F ) ImgSub(δ+F, δ+R) Entity(P, δ+F ) 1l; (R ∩ P ;1l ∩ 1l;δ+R) ∩ 1l;(F ∪ δ+F ) = Ø ImgSub(F ∪ δ+F, [R, δ+R]P ∪ δ+R) Proof: Clearly, (E ∪ δ+E);1l ⊆ (R ∪ δ+R);1l, and (R ∪ δ+R);1l = ([ R, δ+R ] P ∪ δ +R ) ;1l ∪ A, where A := ( R ∩ 1l;P ∩ δ+R;1l ) ;1l ∪ ( R ∩ P ;1l ∩ 1l;δ+R ) ;1l. Now (E ∪ δ+E);1l ∩A = (E ∪ δ+E);1l ∩ ( R ∩ P ;1l ∩ 1l;δ+R ) ;1l ⊆ (E ∪ δ+E);1l ∩ P ;1l = Ø. This establishes 1. In order to prove 2., 1l;(R ∪ δ+R) is decomposed similarly into 1l;( [ R, δ+R ] P ∪ δ +R) and a part that is shown to be disjoint from F ∪ δ+F .  E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 20 ER Modeling from First Relational Principles In a similar way, we can make sure that the new relation maintains its properties as a single-valued map. The condition states that the part which is inserted represents a map, and that there exists no interference between the old, and the new part that might destroy the properties of a map. This will be formulated and proved now, together with the corresponding property for the inverse of a map, which will also be used later. Lemma 3 Let R be a relation, then the following implications hold: 1. LUniq(R) LUniq(δ+R) [R, δ+R] ⊆ ι;δ+R δ+R ⊆ ι; [R, δ+R] LUniq([R, δ+R] ∪ δ+R) 2. RUniq(R) RUniq(δ+R) [R, δ+R] ⊆ δ+R;ι δ+R ⊆ [R, δ+R] ;ι RUniq([R, δ+R] ∪ δ+R) Proof: From ∪-distributivity it is inferred that ([ R, δ+R ] ∪ δ+R ) ; ([ R, δ+R ] ∪ δ+R )^ ⊆ ι, since Schro¨der’s Rule implies [ R, δ+R ] ;(δ+R)^ ⊆ ι, δ+R;( [ R, δ+R ] )^ ⊆ ι. This establishes property 1. The other property is proved similarly.  It may be noted that both implications above can be reversed. Inserting new elements may have an unfortunate effect on placeholders, as we have dis- cussed in Sect. 5.3. Hence we have to make sure — among others — that in the insertion’s result no placeholder appears twice as the first or the second component of a relation, and that no pair in a relation has placeholders on both sides. These properties are formulated now; for completeness, we state also conditions under which inheritance is preserved under insertions. Use in what follows as abbreviations Γ(A,B,C) ≡Def A ∩ (B ∩ (C;ι)) ;1l = Ø, Π(A,B,C) ≡Def A ∩ (B;1l ∩ C;1l) = Ø, Ψ(A,B,C) ≡Def A;B^ ∪ B;A^ ∪ B;B^ ⊆ C. Lemma 4 The following implications hold for any relation R: 1. NoTwice(P,R) NoTwice(P, δ+R, δ+R) Γ(P, [R, δ+R]P , δ+R) Γ(P, δ+R, [R, δ+R]P ) NoTwice(P, [R, δ+R]P ∪ δ+R) E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 21 ER Modeling from First Relational Principles 2. NoBoth(P,R, S) NoBoth(P, δ+R, δ+S) Π(P, δ+R, [S, δ+S]P ) Π(P, [R, δ+R]P , δ+S) NoBoth(P, [R, δ+R]P ∪ δ+R, [S, δ+S]P ∪ δ+S) 3. NoSamePair(P,R) NoSamePair(P, δ+R) NoSamePair(P, [R, δ+R]P ∪ δ+R) 4. NoDoubleFirst(P,R) NoDoubleFirst(P, δ+R) δ+R; [R, δ+R]^P ⊆ P ;1l ∩ 1l;P NoDoubleFirst(P, [R, δ+R]P ∪ δ+R) 5. NoDoubleSecond(P,R) NoDoubleSecond(P, δ+R) [R, δ+R]^P ;δ+R ⊆ 1l;P ∩ P ;1l NoDoubleSecond(P, [R, δ+R]P ∪ δ+R) 6. Inherits(P,E, F ) Entity(P, δ+E) Entity(P, δ+F ) δ+E ⊆ F ∪ δ+F Inherits(P,E ∪ δ+E,F ∪ δ+F ) Proof: The proofs depend on the algebraic laws imposed for a relational algebra. We give prototypical examples for proving these implications. Regarding 1., ∪-distributivity implies P ∩ (((X ∪ Y ) ∩ (X ∪ Y );ι);1l)) = (P ∩ ((X ∩ X;ι);1l))) ∪ (P ∩ ((X ∩ Y ;ι);1l))) ∪ (P ∩ ((Y ∩X;ι);1l))) ∪ (P ∩ ((Y ∩ Y ;ι);1l))) Matching this against the definition, and against Γ yields the result. In a similar way 2. is established. The distributive law for ∪ implies 3. directly. For establishing 4., put τ` := P ;1l ∩ 1l;P as an abbreviation, then the condition together with Schro¨der’s Rule yields δ+R; [ R, δ+R ]^ P ∩ τ` = Ø. Consequently, by ∪-distributivity, [ R, δ+R ] P ;δ +R^ ∩ τ` = Ø needs to be established. Schro¨der’s Rule again shows this to be equivalent to τ`;δ+R ⊆ [R, δ+R]P , which in turn may be seen from τ`;δ+R ⊆ P ;1l;δ+R ∩ 1l;P ;δ+R ⊆ P ;1l;1l ∩ 1l;1l;δ+R = P ;1l ∩ 1l;δ+R. E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 22 ER Modeling from First Relational Principles The inference 5. is established in a very similar way. Finally, 6. is obvious.  This Lemma concludes the technical preparations for a characterization of the conditions under which weak validity is preserved under insertions. We need to perform some index calculations on map letters. Define the set Related(t) as the smallest subset K of {T + 1, . . . , T + S + B} with these properties: • t ∈ K, • if u ∈ K and 〈u, v〉 ∈ Up, then v ∈ K, • if ri •— rj or rj —• ri, then i ∈ K iff j ∈ K. Thus if we want to insert something into, say, entity E, and E corresponds to map letter ri, then Related(i) contains the indices of exactly those entities and relations which are affected by this insertion. Now let an entity or a relation correspond to map letter rt. An insertion or a deletion is called local at t iff rs = Ø whenever s ∈ {T + Σ, . . . , T + 2 · Σ + 1} \ Related(t). Introducing this guard prevents the insertion or the deletion from violating the invariants for the model by letting properties creeping in that are not really controlled through our safety measures. From the instance M a new instance M′ is generated by performing the insertions. Put for each j ∈ {1, . . . S} rT+2·Σ+j := rT+j ∪ rT+Σ+j. This accounts for insertions into entities. As far as relations are concerned, we set for each index j ∈ {S + 1, . . . , B} rT+2·Σ+j := [rT+j, rT+Σ+j ] ∪ rT+Σ+j, accounting for the peculiar way we insert into a relation. Upon shifting the base address from T to T + 2 · Σ, the weak validity of M′ can be investigated: Proposition 1 Let M be a weakly valid ER model, assume that an insertion is local at some index t, then the ER model arising from the insertions is weakly valid, provided that the following conditions are all satisfied: 1. &s∈{1,...,S} Entity(pi, rT+Σ+s), 2. &s∈Related(t)∩{T+1,...,T+B} DomSub ( rΣ+pi1(Track(s)), rΣ+s ) & Entity ( rΣ+pi1(Track(s)) ) & ImgSub ( rΣ+pi2(Track(s)), rΣ+s ) & Entity ( rΣ+pi2(Track(s)) ) & (rs ∩ 1l;pi ∩ rΣ+s;1l) ;1l ∩ r2·Σ+pi1(Track(s));1l = Ø & 1l; (rs ∩ pi;1l ∩ 1l;rΣ+s) ∩ 1l;r2·Σ+pi1(Track(s)) = Ø 3. &s∈LeftOne∩Related(t) rΣ+s;r^Σ+s ⊆ ι & [rs, rΣ+s] ⊆ ι;rΣ+s & rΣ+s ⊆ ι; [rs, rΣ+s] 4. &s∈RightOne∩Related(t) r^Σ+s;rΣ+s ⊆ ι & [rs, rΣ+s] ⊆ rΣ+s;ι & rΣ+s ⊆ [rs, rΣ+s] ;ι 5. &s∈Related(t)∩{T+S+1,...,T+S+B} pi ∩ (rΣ+s ∩ rΣ+s;ι;1l) = Ø & pi ∩ ( r^Σ+s ∩ (r^Σ+s;ι);1l ) = Ø & Γ(pi, [rs, rΣ+s] , rΣ+s) & Γ(pi, [ r^s , r^Σ+s ] , r^Σ+s) & Γ(pi, rΣ+s, [rs, rΣ+s]) & Γ(pi, r^Σ+s, [ r^s , r^Σ+s ] ) E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 23 ER Modeling from First Relational Principles 6. &s∈Related(t)∩{T+S+1,...,T+S+B} &v∈Related(t)∩{s,...,T+S+B} & pi ∩ (rΣ+s;1l) ∩ (rΣ+v;1l) = Ø Π(pi, rΣ+s, [rv, rΣ+v]) & Π(pi, [rs, rΣ+s] , rv) 7. &s∈Related(t)∩{T+S+1,...,T+S+B} rΣ+s ∩ pi;1l ∩ 1l;pi = Ø 8. &s∈Related(t)∩{T+S+1,...,T+S+B} rΣ+s;r^Σ+s ∩ pi;1l ∩ 1l;pi = Ø & rΣ+s; [rs, rΣ+s]^ ⊆ pi;1l ∩ 1l;pi 9. &s∈Related(t)∩{T+S+1,...,T+S+B} r^Σ+s;rΣ+s ∩ 1l;pi ∩ pi;1l = Ø & [rs, rΣ+s]^ ;rΣ+s ⊆ 1l;pi ∩ pi;1l 10. &〈i,j〉∈Up∩Related(T )×Related(T ) Entity(pi, rΣ+i) & Entity(pi, rΣ+j) & rΣ+i ⊆ rj ∪ rΣ+j Proof: 0. This looks at first like a confusing bag of details. So let us sort them out by providing a table which permits stating a correspondence between the properties stated in the Proposition, and the properties of an ER model. Item# Property addressed 1 Entities are preserved 2 Domain properties are preserved 3 Left uniqueness 4 Right uniqueness 5 NoTwice 6 NoBoth 7 NoSamePair 8 NoDoubleFirst 9 NoDoubleSecond 10 Inheritance is preserved 1. Property 1. establishes together with the assumption &T+1≤j≤T+S Entity(rj) that &T+2·Σ+1≤j≤T+2·Σ+S Entity(rj) is true. 2. Take s ∈ Related(t), and assume that 〈i, j〉 = Track(s). Because M is weakly valid, we know that DotDot(ri, rs, rj) holds. In particular, DomSub(ri, rs) & Entity(ri) are true. This implies together with DomSub(rΣ+i, rΣ+s) & Entity(rΣ+i) & (rs ∩ 1l;pi ∩ rΣ+s;1l) ;1l ∩ r2·Σ+i;1l = Ø E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 24 ER Modeling from First Relational Principles through Lemma 2 (Property 1.) that DomSub(r2·Σ+i, r2·Σ+s) holds. In a similar way (by appealing to Lemma 2, part 2.), SImgSub(rΣ+j , rs) is established. Collecting things, we have established that DotDot(r2·Σ+i, r2·Σ+s, r2·Σ+j) is true. 3. Let s ∈ LeftOne ∩ Related(t), then we know from M′s validity that rs;rs ⊆ ι holds. From Lemma 3, Property 1. we now see that LUniq(r2·Σ+s) is true, provided 3. holds. In a very similar manner, RUniq(r2·Σ+s) is deduced from 4. for s ∈ RightOne ∩ Related(t). 4. From 5. we infer that &T+2·Σ+S+1≤j≤T+2·Σ+S+B NoTwice(pi, rj) & NoTwice ( pi, r^j ) holds (where we use Lemma 1 to establish that the identity [rT+j, rT+Σ+j ]^ = [ r^T+j, r^T+Σ+j ] holds. 5. In similar ways one establishes the desired properties, resorting to Lemma 4 for estab- lishing the necessary conditions.  To help the reader appreciate the content of the conditions above, we interpret the second and the last of them. Interpretations for the other conditions are quite similar and left to the reader. The first conditions states conditions under which E ∪ δ+E •— R ∪ δ+R —•F ∪ δ+F holds, i.e., under which conditions E ∪ δ+E and F ∪ δ+F remain the tight domain and the tight codomain, resp., of [R, δ+R] ∪ δ+R, provided E was the tight domain, and F was the tight codomain of R before the insertion, i.e., provided E •— R —•F holds. The conditions state that δ+E needs to be an entity such that dom ( δ+E ) ⊆ dom ( δ+R ) is true, hence each element to be inserted into E should be the first component of a pair to be inserted into R. In the same way δ+F is required to be an entity such that img ( δ+F ) ⊆ img ( δ+R ) holds. In addition we make sure that the required conditions on place holders are not violated, so that ( R ∩ 1l;P ∩ δ+R;1l ) ;1l ∩ (E ∪ δ+E);1l = Ø. 1l; ( R ∩ P ;1l ∩ 1l;δ+R ) ∩ 1l;(F ∪ δ+F ) = Ø holds, as we have discussed above. The last condition simply states that for E ∪ δ+E to inherit from R ∪ δ+R it is sufficient that E inherits from F , and that δ+E is a subset of F ∪δ+F , and that the new sets are entities indeed. E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 25 ER Modeling from First Relational Principles 8.3 Looking at Attributes We did neglect attributes for greater ease of discussion; now is the place for introducing conditions on them. Attributes are defined on entities (this is one of our restrictions, cf. Sect. 3.1), they come in different flavors, as we will discuss now. An attribute α on entity E is a partial map, so RUniq(α) should be satisfied, and its domain should be contained in (the domain of) E, thus dom(α) ⊆ dom(E) should hold. Moreover we assume attributes to have atomic values. This requirement will be modelled as follows: We assume our universe U to be structured as U = A∪A∗, where A 6= ∅ are the atomic values, and A∗ denotes the set of all words over the alphabet A, hence A∩A∗ = ∅ with  as the empty word; as usual, we put A+ := A∗ \ {}. We reserve a map letter ε ∈ {r1, . . . , rT } for representing  (hence Snglt(ε) & Coll(ε)) and permit only interpretations I that satisfy εI = {〈, 〉}. The atomic entities in A are modelled through the map letter υ with Coll(υ) & NonVoid(υ) & υ ∩ ε = Ø. In addition we postulate that pi ⊆ υ holds. Interpretations are restricted further by postulating that υI = {〈a, b〉 ∈ A2| a = b}. Moreover we assume the existence of canonic projections CAR and CDR separating the head from the tail of a non-empty word, hence CAR : { A+ → A t1 . . . tk 7→ t1 and CDR : { A+ → A t1 . . . tk 7→ t2 . . . tk These projections are represented through the map letters λ and ρ, corresponding to CAR and CDR, resp; their properties are discussed in Sect. 6.2. We abbreviate for later use the ith projection (hence the operation of extracting the ith component of a tuple) by Z(i) ≡Def (i = 1 ? CAR : Z(i−1);CDR) τ (i) ≡Def (i = 1 ? λ : τ (i−1);ρ), the latter abbreviation preparing for the use of map letters later on. E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 26 ER Modeling from First Relational Principles Returning to attributes: a mandatory attribute α on entity E is characterized through dom(α) = dom(E) & 1l;pi ∩ img(α) = Ø. If {α0, . . . αw} is a collection of key attributes on E, then Sect. 4.2 shows that this property means RUniq ( w ⋂ i=0 Z(i+1);α^i ) to hold. We proceed now in the same manner as in Sect. 8.2 by first formulating conditions which govern the preservation of the relevant properties, and begin with the property that being a map is preserved. The second part of the following Lemma reflects the postulate that insertions must preserve the domain: inserting into an attribute and inserting into its domain may not lead to the situation that the updated attribute has no longer the update of the domain as its domain. Lemma 5 The following properties hold: 1. RUniq(α)Ψ(α, δ+α, ι) RUniq(α ∪ δ+α) 2. dom(α) = dom(E) (δ+α \ α) ;1l = (δ+E;1l) \ (E;1l) dom(α ∪ δ+α) = dom(E ∪ δ+E) Proof: Both parts follows directly from ∪-distributivity. Note that the implication in the first part can be reversed.  The conditions laid down in Lemma 5 permit stating conditions under which some at- tribute conditions persist under insertion. The exception is a condition which permits being a member of a family of key attributes stable under insertions. The criterion is formulated in Lemma 6. It requires some preparations. Remember that in a relation algebra the equality ⋂ i∈I (A1,i ∪ A2,i) = ⋃ J⊆I   ⋂ j∈J A1,j ∩ ⋂ j /∈J A2,j   holds, whenever I is finite. This is so because a relation algebra is a Boolean algebra, in particular a distributive lattice. Abbreviate for the map expressions A0, . . . , Ak, B0, . . . , Bk and for J,K ⊆ {0, . . . , k} Λ(J, 〈A0, . . . , Ak〉, 〈B0, . . . , Bk〉) ≡Def ⋂ j∈J Aj; ( Z(j+1) )^ ∩ ⋂ j /∈J (Bj \ Aj); ( Z(j+1) )^ , Γ(J,K, 〈A0, . . . , Ak〉, 〈B0, . . . , Bk〉) ≡Def Λ(J, 〈A0, . . . , Ak〉, 〈B0, . . . , Bk〉) ; Λ(K, 〈A0, . . . , Ak〉, 〈B0, . . . , Bk〉)^ With these notations we may formulate: E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 27 ER Modeling from First Relational Principles Lemma 6 Invariance of a key under insertion is maintained by the following condition: &ki=0RUniq(αi) &ki=0RUniq(δ+αi) &J⊆{0,...,k}&K⊆{0,...,k} Γ(J,K, 〈α0, . . . , αk〉, 〈δ+α0, . . . , δ+αk〉) ⊆ ι RUniq ( ⋂k i=0 Z(i+1); (αi ∪ δ+αi)^ ) It should be noted that the formulation above requires 〈α0, . . . , αk〉 as well as 〈δ+α0 \ α0, . . . , δ+αk \ αk〉 to have the properties of key attributes. Proof: The distributive law (in the lattice), ∪-distributivity (with respect to composition), and Lemma 1 together show that ( k ⋂ i=0 Z(i+1); ( αi ∪ δ+αi )^ )^ ; ( k ⋂ i=0 Z(i+1); ( αi ∪ δ+αi )^ ) equals ⋃ J⊆{0,...,k} ⋃ K⊆{0,...,k} Γ(J,K, 〈α0, . . . , αk〉, 〈δ+α0, . . . , δ+αk〉). This implies the desired result.  The condition just formulated is exponential in the size of the key, consequently, it is not very convenient for practical purposes. On the other hand, it is exact, because a key can be extended if and only if the condition above is satisfied. It would be desirable to develop a more practical, if only sufficient condition for the invariance under insertions of the property being a key. Now call an ER model M valid iff it is weakly valid, and if the conditions on attributes that have been laid down in the model’s declaration are satisfied. Formally: Definition 5 The ER model M is called valid iff 1. M is weakly valid, 2. the attributes satisfy &T+1≤i≤T+S&j∈Attributes(i) RUniq(rj) & dom(rj) ⊆ dom(ri) & img(rj) ⊆ υ & &T+1≤i≤T+S&j∈Mandatory(i) dom(rj) = dom(ri) & 1l;pi ∩ img(rj) = Ø & &T+1≤i≤T+S let {i1, . . . , ij}=Key(i) in RUniq ( ⋂j `=1 Z(`+1);r^i` ) We will state now conditions under which the attributes of the changes ER model M ′ will cater for the model’s validity after the construction process is extended to attributes in the obvious way. Investigating validity requires us to exploit properties of the change sets δ+· for attributes in the context of their relations to the change sets for entities (note that we do for the time being without attributes on the relations on M). E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 28 ER Modeling from First Relational Principles Proposition 2 Suppose that the ER model M is valid, and that in addition to the prop- erties 1 – 10 from Proposition 1 the following properties are satisfied, when performing an insertion that is local at some index t: 1. &i∈Related(t)∩{T+1,...,T+B}&j∈Attributes(i) r^Σ+j;rΣ+j ⊆ ι & Ψ(rj, rΣ+j , ι) & 1l;rΣ+j ⊆ υ 2. &i∈Related(t)∩{T+1,...,T+B}&j∈Mandatory(i) (rΣ+j \ rj) ;1l = (rΣ+i;1l) \ (ri;1l) 3. &i∈Related(t)∩{T+1,...,T+B}&j∈Mandatory(i) 1l;pi ∩ 1l;rΣ+j = Ø 4. &i∈Related(t)∩{T+1,...,T+B} let Key(i) = {i0, . . . , ik} in &J⊆{0,...,k}&K⊆{0,...,k} Γ(J,K, 〈ri0 , . . . , rik〉, 〈rΣ+i0 , . . . , rΣ+ik〉) ⊆ ι Then M′ is a valid ER model. Proof: Lemma 5 makes sure that condition 1. implies that we indeed obtain attributes, and that by condition 2. mandatory attributes remain mandatory. Condition 3. caters for banning place holders from the image of mandatory attributes. The last condition helps together with Lemma 6 in ascertaining the properties of keys in the new model.  9 Deletions: Validity Deletions are treated in a similar fashion: we formulate conditions under which deletions maintain the validity of the ER model. We will again deal initially with entities and relations only, and then in a second step extend our considerations to attributes. This application of the principle of Separation of Concerns will again first formulate conditions under which weak validity is preserved, and then upgrade these conditions with the goal of finding criteria for unconstrained validity. We will use the same initial setup of map letters as in sect. 7, but now interpret the map letters between T + Σ + 1 and T + 2 · Σ as the place where we store the values to be deleted; they are now prefixed with δ−. If entity E corresponds to map letter rT+i with δ−E corresponding to rT+Σ+i, then E \ δ−E will be deposited at rT+2·Σ+i. In the same linear way, proceeding in a block wise fashion, we deposit the changed values for relations and attributes. The reader may wish to consult Fig. 4 again. 9.1 Weak validity The following observation shows that for maintaining weak validity we need not consider place holders separately, that left or right uniqueness of relations is of no concern, and that the defining property of key attributes remains intact, when deleting elements from the maps constituting the key. In other words, the properties of interest are downward closed. Lemma 7 The following implications hold: 1. R1 ⊆ R2 LUniq(R2) LUniq(R1) E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 29 ER Modeling from First Relational Principles 2. R1 ⊆ R2 RUniq(R2) RUniq(R1) 3. &ki=1R1,i ⊆ R2,i PlaceHolder(P, {R2,1, . . . , R2,k}) PlaceHolder(P, {R1,1, . . . , R1,k}) 4. &ki=1R1,i ⊆ R2,i RUniq ( ⋂n i=1 T (i+1);R^2,i ) RUniq ( ⋂n i=1 T (i+1);R^1,i ) Proof: Because the composition operator ; is monotone in both arguments, the first two assertions are immediate. The monotonicity of the converse operator (which sends R to R^) is used on top of that in establishing the third assertion. This is done by inspecting the auxiliary macros that constitute the conjunction defining the PlaceHolder-macro, and that are formulated in sect. 5.3. Monotonicity of both operations is finally used to establish the last implication.  Thanks to Lemma 7, the technical base for maintaining weak validity in the following statement (which corresponds to Lemma 4 for insertions), is rather easier to formulate: Lemma 8 The following implications hold: 1. DomSub(E,R) Entity(E) Entity(δ−E) DomSub(R, δ−E ∪ R \ δ−R) DomSub(E \ δ−E,R \ δ−R) 2. ImgSub(F,R) Entity(F ) Entity(δ−F ) DomSub(R^, δ−F^ ∪ (R \ δ−R)^) ImgSub(F \ δ−F,R \ δ−R) Proof: Only the first implication needs to be established, since the second follows by inver- sion. Since R;1l ⊆ ( δ−E ∪ (R \ δ−R) ) ;1l, an elementary calculation establishes (R;1l) \ (δ−E;1l) ⊆ (R \ δ−R);1l. Consequently, (E \ δ−E);1l = (E;1l) \ (δ−E;1l) ⊆ (R;1l) \ (δ−E;1l) (since E;1l ⊆ R;1l) ⊆ (R \ δ−R);1l. E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 30 ER Modeling from First Relational Principles This establishes the claim.  From the instance M a new instance M′ is generated by performing the deletions. This is very similar to the insertion discussed above: Put for each j ∈ {1, . . . , S + B} rT+2·Σ+j := rT+j \ rT+Σ+j. Upon shifting the base address from T to T + 2 · Σ, the weak validity of M′ can be investigated: Proposition 3 Let M be a weakly valid ER model, assume that a deletion is local at some index t, then the ER model arising from the deletions is weakly valid, provided the following conditions are all satisfied: 1. &s∈Related(t)∩{T+1,...,T+B} Entity ( rΣ+pi1(Track(s)) ) & Entity ( rΣ+pi2(Track(s)) ) & DomSub ( rs, rΣ+pi1(Track(s)) ∪ (rs \ rΣ+s) ) & DomSub ( r^s , r^Σ+pi2(Track(s)) ∪ (rs \ rΣ+s) ^ ) 2. &〈i,j〉∈Up∩Related(t)×Related(t) rΣ+j ⊆ rΣ+i Proof: Because condition 2. takes care of inheritance, and because of Lemma 7 we have to establish only DotDot ( r2·Σ+pi1(Track(s)), r2·Σ+s, r2·Σ+pi2(Track(s)) ) for all indices s ∈ Related(t). But this follows through a straightforward calculation from the assumption together with Lemma 8.  9.2 Adding Attributes Turning to attributes, we see that the functional character of attributes together with that of their domains is maintained when changing to a subset of each: Lemma 9 Let α be an attribute on entity E, then the following implications show how to maintain attribute conditions under deletion: 1. RUniq(α) RUniq(δ−α) Entity(E) dom(α) ⊆ dom(E) (α \ δ−α)^;δ−E = Ø dom(α \ δ−α) ⊆ dom(E \ δ−E) 2. RUniq(α) RUniq(δ−α) Entity(E) dom(α) ⊆ dom(E) (α \ δ−α)^;δ−E = Ø dom(δ−E) ⊆ dom(δ−α) dom(α \ δ−α) = dom(E \ δ−E) Proof: 1. Schro¨der’s Cycle Rule implies that (α\δ−α)^;δ−E = Ø is equivalent to dom(α \ δ−α)∩ dom(δ−E) = Ø, thus dom ( α \ δ−α ) ⊆ dom(E) \ dom ( δ−E ) = dom ( E \ δ−E ) . E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 31 ER Modeling from First Relational Principles This proves 1. 2. It remains to show that the domain of α \ δ−α contains E \ δ−E under the conditions from 2: Using Lemma 1, we see dom ( E \ δ−E ) = dom(E) \ dom ( δ−E ) ⊆ dom(α) \ dom ( δ−α ) ⊆ dom ( α \ δ−α )  Now we are able to state conditions under which deletions from an ER model maintain its validity: Proposition 4 Suppose that the ER model M is valid, and that in addition to the properties 1 and 2 from Proposition 3 the following properties are satisfied, when performing an insertion that is local at some index t: 1. &i∈Related(t)∩{T+1,...,T+B}&j∈Attributes(i) r^Σ+j;rΣ+j ⊆ ι & (rj \ rΣ+j)^;rΣ+i = Ø 2. &i∈Related(t)∩{T+1,...,T+B}&j∈Mandatory(i) rΣ+i;1l ⊆ rΣ+j;1l Proof: Since the weak validity of M′ is already being taken care of by Proposition 3, we have to cater for the integrity of the attributes. Condition 1. maintains together with Lemma 9 the condition under which the property of being an attribute is preserved, the second condition states when a mandatory attribute remains one; this also makes use of the same lemma. It was noted already in Lemma 7 that key attributes are not sensitive to deletions.  It comes as a surprise that deletions are so much easier to handle, than insertions. Some data structures (like binary search trees or heaps) are rather sensitive to deletions. Hence a deletion requires much there more attention in maintaining the invariant, than an insertion does (and consequently makes the analysis a much harder and more unpleasant undertaking). In the case of the data structures just mentioned, however, intrinsic properties of the keys do not enter the discussion, while in the case considered here the monotonicity together with the downward closeness of some properties played a simplifying role. 10 Implementation issues The reader may now wonder to what extent the ideas presented in this paper have led, or can lead, to concrete implementations. In its most classical version in which we have relied, relation calculus has a charming simplicity which makes it attractive not only as a theoretical grounding but also from a computational point of view. To become practically usable, however, relation calculus needs a platform of computerized methods, able to effect translations from higher-level formalisms into it, providing automated support to reasoning, supplying algebraic simplification techniques, and the like. Among various platforms which are being developed concurrently, let us mention the PROLOG-based Metamorpho system: the portion of it which is already publicly available1 1See under the URL http://costantini.dm.univaq.it/online.htm E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 32 ER Modeling from First Relational Principles supports various definitional extension mechanisms, namely flexible means to extend the native syntax of relation calculus. Some of the shortening definitions in this paper (cf., e.g., the one of PlaceHolder(P,Q) introduced in Sect. 5.3) are elaborate enough that they turned out to be ideal test cases for Metamorpho’s definitional apparatus. A user can load into Metamorpho one or several texts specifying the primitive operators and derived constructs of his/her version of relation calculus: thereby, the system acquires the ability to expand derived constructs in terms of the primitive ones, through abbreviating definitions (‘macros’, in a sense, of the kind illustrated before Lemma 1) which have been supplied by means of these files. The user can then load his/her selection of logical axioms for relation calculus from another file: these are laws of the kind shown in the fourth item of Def. 1. The rationale of this is: since many competing axiomatic systems exist for dyadic relation algebras, this essential component of the logical apparatus should be customizable, very much like the selection of basic language constructs. The current Metamorpho system does not possess any autonomous capabilities to carry out equational reasoning; hence, at least on a temporary basis, it will be interfaced with an existing theorem-prover which can assist—even with some degree of autonomy—the user who is performing (or simply re-checking) deductions of the kind we have highlighted several times while proving the lemmas of Sections 8 and 9. In the next step, the Metamorpho user will load proper axioms of a theory based on relation calculus. Unlike logical axioms, these are specific of the intended application. The formal statements of the properties of λ, %, υ, ε, by which we have set up in Sect. 4.2 the theory of flat tuples, are a typical example of proper axioms: it will be a subtheory of the theory associated with an ER-model in the way explained in Sect. 6. Like developing a sophisticated computer program, constructing a theory can at times be an overly complicated task, unless the environment provides mechanisms which support modularity conveniently: to cater for this, Metamorpho offers so-called ‘templates’ acting like procedures in the construction of theories. Fig. 5 shows an excerpt of the definition files which one submits to Metamorpho in prepa- ration for the treatment of ER-models. We do not enter into a full explanation here, because most of the notation introduced by this table agrees with what has been discussed at length already, both in its constructs and in their definitions (save for syntactic details and, perhaps, for minor semantic refinements). We could prolong this sequence of definitions, to culminate in a template definition ERmodel( IsAChains , Relations , DotDots , LeftUniques, RightUniques,Attributes , Keys , PlHolders , Left , Right , Individuals , NullTup, PN , LN , RN , YN , NN ) Θ: · · · The ERmodel template, when invoked with actual parameters, will generate a relational theory reflecting the semantics of a specific ER-model. An example of invocation is the following E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 33 ER Modeling from First Relational Principles mult(P ) =: P∩P ; ι lA(P ) =: 1l ; P rA(P ) =: P ; 1l RUniq(P,Q) ↔: mult(Q)∩ rA(P )=Ø LUniq(P,Q) ↔: RUniq(P,QS^) RXcl(P,Q) ↔: mult(Q)∩ lA(P )=Ø LXcl(P,Q) ↔: RXcl(P,Q^) th(L,R ‖ 1) =: L th(L,R ‖ i + 1) =: R ; th(L,R, i) succth(L,R‖N − 1) =: th(L,R,N) lBoth(S,Q) =: rA(S)∩ rA(Q) NoLLBoth(P,Q, S) ↔: lBoth(Q,S)∩ P=Ø NoLRBoth(P,Q, S) ↔: NoLLBoth(P,Q, S^) NoTogether(P,Q) ↔: rA(P )∩ lA(P )∩ Q=Ø NoTwice([P ]) Θ: true(P ) NoTwice([P,Q|T ]) Θ: [ RUniq(P,Q), LUniq(P,Q), NoTogether(P,Q), RXcl(P,Q), LXcl(P,Q) | NoTwice([P |T ]) ] NoLRBoth([P,Q]) Θ: true([P,Q]) NoLRBoth([P,Q, S|T ]) Θ: [ NoLLBoth(P,Q, S), NoLRBoth(P,Q, S), NoLRBoth(P, S,Q) | NoLRBoth([P,Q|T ])] NoBoth([P ]) Θ: true(P ) NoBoth([P,Q|T ]) Θ: [ NoLRBoth(P,Q,Q) | {NoLRBoth([P,Q|T ]), NoBoth([P |T ])} ] PlaceHolders([P,Q|T ]) Θ: {NoTwice([P,Q|T ]), NoBoth([P,Q|T ])} keyFunc([A], L,R ‖ I) =: succth(L,R, I) ; A^ keyFunc([A,B|T ], L,R ‖ J − 1) =: th(L,R, J) ; A^∩keyFunc([B|T ], L,R, J) Key([L,R|S]) ↔: RUniq(keyFunc(S,L,R, 0)) IsA([Y, P, F ]) Θ: [ι][F⊆Y, F∩P=Ø] IsA([Y, P,E, F |T ]) Θ: [ι][E⊆F | IsA([Y ∩P, F |T ])] IsAChains([Y, P ]) Θ: true([Y, P ]) IsAChains([Y, P,R|S]) Θ: [ true(Y ) | {IsA([Y, P |R]), IsAChains([Y, P |S])} ] DotDots([P ]) Θ: true(P ) DotDots([P,E,R, F |D]) Θ: [ R⊆rA(E−P ) ; (F−P ) | DotDots([P |D]) ] Figure 5: Some macros and templates submitted to Metamorpho for treatment of ER-models E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 34 ER Modeling from First Relational Principles (not very significant, indeed, as an ER-model): ERmodel( [[r11, r15], [r12, r13, r14]], -- IsA-chains [r6, r7, r8, r9, r10, r16], -- Relations and Attributes -- DotsDots: [r11, r10, r12, -- r10 (total on both sides) relates r11 with r12 r15, r16, r14], -- r16 (total on both sides) relates r15 with r14 [r16], -- LeftUniques (left-functional relations), [r10], -- RightUniques (functional relations). -- Attributes: [r9, r11, 0, -- r9 is an optional attribute on r11-entities r8, r15, 1, -- r8 is a mandatory attribute on r15-entities r7, r15, 1, -- r7 is a mandatory attribute of entity r15 r6, r14, 1], -- r6 is a mandatory attribute of entity r14 -- Keys: [[r7, r8], -- r7,r8 form a key of r15 [r6]], -- r6 is a key of r14 -- PlaceHolders,Left,Right,Individuals,NullTup: r5, -- r5 represents the collection of all placeholders . . . r1, r2, -- . . . in a universe with projections r1,r2 r3, -- . . . with individuals r3 r4, -- . . . and with void tuple r4 pi, λ, %, υ,  -- Corresponding external names ). When a text containing this single invocation is loaded into Metamorpho as proper axiom file (after the whole sequence of definitional macros and templates has been fed into it), a theory consisting of 5 definitions (namely, pi =: r5, λ =: r1, . . . , ε =: r4) and 133 equations will arise: a few of the equalities axiomatize the flat tuple domain, all others traslate the given ER-model. 11 Further Work The obvious line of attack for further work is removing some restrictions we imposed for technical reasons. This work was performed under some simplifying assumptions: we did assume that we work only with attributes on entities, and that we have a rather scant selection of cardinality restrictions. Both assumptions are not essential for our approach, and we feel that they should be removed. Another technical issue addresses the fact that we work with binary relations only. The discussions concerning projections shows, however, that it should not be too difficult to extend our set up for incorporating n-ary relations (although the notation then becomes slightly unbearable). From a modelling point of view, we work here in a somewhat untyped environment: we do not have sorts for different entities, but rather assume that one sort fits all. This is fairly problematic in applications, and not entirely practical. Introducing sorts is another step we feel should be undertaken (along with a more detailed comparison of both approaches). This together with further developing the Metamorpho system (cf. 10) will be some challenging tasks for the future. E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 35 ER Modeling from First Relational Principles References [1] R. Berghammer and G Schmidt. Relational specifications. In C. Rauszer, editor, Proc. XXXVIII Stefan Banach Seminar on Algebraic Methods in Logic and Their Computer Science Applications, volume 28 of Banach Center Publications, pages 167 – 190, Warsaw, 1993. Institute of Mathematics, Polish Academy of Sciences. [2] C. Brink, W. Kahl, and G. Schmidt, editors. Relational Methods in Computer Science. Advances in Computing Science. Springer-Verlag, Wien, New York, 1997. [3] D. Cantone, E. G. Omodeo, and A. Policriti. Set Theory for Computing. Springer-Verlag, 2001. [4] I. Claßen, M. Lo¨we, S. Waßerroth, and J. Wortmann. Static and dynamic semantics of entity-relationship models based on algebraic methods. Technical report, Department of Computer Science, Technical University, Berlin, 1994. [5] E. F. Codd. A relational model for large shared data banks. Communications of the ACM, 13(6):377–387, 1970. [6] E.-E. Doberkat. Generating an algebraic specification from an ER-model. International Journal of Software Engineering and Knowledge Engineering, 7(4):525–552, 1997. [7] E.-E. Doberkat. The categorial semantics of ER-models. Technical report, Chair for Software Technology, University of Dortmund, 1999. [8] E.-E. Doberkat and E. G. Omodeo. Algebraic semantics of ER-models in the context of the calculus of relations. Part II: Dynamic view. In H. M. de Swart, editor, Relational Methods in Computer Science, number 2561 in Lecture Notes in Computer Science, pages 50 – 65. Springer-Verlag, Dec. 2002. [9] A. Formisano, E.G. Omodeo, and M. Temperini. Goals and benchmarks for automated map reasoning. Journal of Symbolic Computation, 29(2):259–297, 2000. [10] M. Gogolla and U. Hohenstein. Towards a semantic view of an extended entity- relationship model. ACM Transactions on Database Systems, 16:369–416, 1991. [11] R. Hettler. Zur U¨bersetzung von E/R-Schemata nach SPECTRUM. Technical Report TUM I-9333, Technical University, Munich, 1993. [12] U. Hohenstein. Formale Semantik eines erweiterten Entity-Relationship Modells. B. G. Teubner, Stuttgart und Leipzig, 1993. [13] A. Jaoua, N. Belkhiter, H. Ounalli, and T. Moukam. Databases. In [2], pages 197 – 210. Springer-Verlag, 1997. [14] B. Meyer. Object-oriented Software Construction. Prentice-Hall, Englewood Cliffs, NJ, 5 edition, 1998. [15] E. G. Omodeo and E.-E. Doberkat. Algebraic semantics of ER-models in the context of the calculus of relations. Part I: Static view. Electronic Notes in Theoretical Computer Science, 2001. in print. E.-E. Doberkat & E. G. Omodeo February 7, 2003 Page 36 ER Modeling from First Relational Principles [16] M. Page-Jones. Fundamentals of Object-Orientied Design in UML. Dorset House Pub- lishing & Addison-Wesley, New York and Boston, 2000. [17] G. Schmidt, C. Hattensperger, and M. Winter. Heterogeneous relation algebra. In [2], pages 39 – 53. Springer-Verlag, 1997. [18] E. Schro¨der. Vorlesungen u¨ber die Algebra der Logik (exakte Logik), volume 2.1. B. Teubner, 1891. Reprinted by Chelsea Publishing Co., New York, 1966. [19] E. Schro¨der. Vorlesungen u¨ber die Algebra der Logik (exakte Logik), volume 3, Alge- bra und Logic der Relative, part 1. B. Teubner, Leipzig, 1895. Reprinted by Chelsea Publishing Co., New York, 1966. [20] A. Tarski and S. Givant. A formalization of set theory without variables, volume 41 of Colloquium Publications. American Mathematical Society, 1987. [21] B. Thalheim. Entity-Relationship Modeling: Foundations of Database Technology. Springer-Verlag, 2000. E.-E. Doberkat & E. G. Omodeo February 7, 2003 Interne Berichte des Lehrstuhls Software-Technologie (ISSN 0933-7725) /99/ T. Bühren, M. Cakir, E. Can, A. Dombrowski, G. Geist, V. Gruhn, M. Gürgrn, S. Handschumacher, M. Heller, C. Lüer, D. Peters, G. Vollmer, U. Wellen, J. von Werne Endbericht der Projektgruppe eCCo (PG 315) Electronic Commerce in der Versicherungsbranche Beispielhafte Unterstützung verteilter Geschäftsprozesse Februar 1999 /100/ A. Fronk, J. Pleumann, Der DoDL-Compiler August 1999 /101/ K. Alfert, E.-E. Doberkat, C. Kopka Towards Constructing a Flexible Multimedia Environment for Teaching the History of Art September 1999 /102/ E.-E. Doberkat An Note on a Categorial Semantics for ER-Models November 1999 /103/ Christoph Begall, Matthias Dorka, Adil Kassabi, Wilhelm Leibel, Sebastian Linz, Sascha Lüdecke, Andreas Schröder, Jens Schröder, Sebastian Schütte, Thomas Sparenberg, Christian Stücke, Martin Uebing, Klaus Alfert, Alexander Fronk, Ernst-Erich Doberkat Abschlußbericht der Projektgruppe PG-HEU (326) Oktober 1999 /104/ Corina Kopka Ein Vorgehensmodell für die Entwicklung multimedialer Lernsysteme März 2000 /105/ Stefan Austen, Wahid Bashirazad, Matthais Book, Traugott Dittmann, Bernhard Flechtker, Hassan Ghane, Stefan Göbel, Chris Haase, Christian Leifkes, Martin Mocker, Stefan Puls, Carsten Seidel, Volker Gruhn, Lothar Schöpe, Ursula Wellen Zwischenbericht der Projektgruppe IPSI April 2000 /106/ Ernst-Erich Doberkat Die Hofzwerge — Ein kurzes Tutorium zur objektorientierten Modellierung September 2000 /107/ Leonid Abelev, Carsten Brockmann, Pedro Calado, Michael Damatow, Michael Heinrichs, Oliver Kowalke, Daniel Link, Holger Lümkemann, Thorsten Niedzwetzki, Martin Otten, Michael Rittinghaus, Gerrit Rothmaier Volker Gruhn, Ursula Wellen Zwischenbericht der Projektgruppe Palermo November 2000 /108/ Stefan Austen, Wahid Bashirazad, Matthais Book, Traugott Dittmann, Bernhard Flechtker, Hassan Ghane, Stefan Göbel, Chris Haase, Christian Leifkes, Martin Mocker, Stefan Puls, Carsten Seidel, Volker Gruhn, Lothar Schöpe, Ursula Wellen Endbericht der Projektgruppe IPSI Februar 2001 /109/ Leonid Abelev, Carsten Brockmann, Pedro Calado, Michael Damatow, Michael Heinrichs, Oliver Kowalke, Daniel Link, Holger Lümkemann, Thorsten Niedzwetzki, Martin Otten, Michael Rittinghaus, Gerrit Rothmaier Volker Gruhn, Ursula Wellen Zwischenbericht der Projektgruppe Palermo Februar 2001 /110/ Eugenio G. Omodeo, Ernst-Erich Doberkat Algebraic semantics of ER-models from the standpoint of map calculus. Part I: Static view März 2001 /111/ Ernst-Erich Doberkat An Architecture for a System of Mobile Agents März 2001 /112/ Corina Kopka, Ursula Wellen Development of a Software Production Process Model for Multimedia CAL Systems by Applying Process Landsca- ping April 2001 /113/ Ernst-Erich Doberkat The Converse of a Probabilistic Relation Oktober 2002 /114/ Ernst-Erich Doberkat, Eugenio G. Omodeo Algebraic semantics of ER-models in the context of the calculus of relations. Part II: Dynamic view Juli 2001 /115/ Volker Gruhn, Lothar Schöpe (Eds.) Unterstützung von verteilten Softwareentwicklungsprozessen durch integrierte Planungs-,Workflow- undGroupware- Ansätze September 2001 /116/ Ernst-Erich Doberkat The Demonic Product of Probabilistic Relations September 2001 /117/ Klaus Alfert, Alexander Fronk, Frank Engelen Experiences in 3-Dimensional Visualization of Java Class Relations September 2001 /118/ Ernst-Erich Doberkat The Hierarchical Refinement of Probabilistic Relations November 2001 /119/ Markus Alvermann, Martin Ernst, Tamara Flatt, Urs Helmig, Thorsten Langer, Ingo Röpling, Clemens Schäfer, Nikolai Schreier, Olga Shtern Ursula Wellen, Dirk Peters, Volker Gruhn Project Group Chairware Intermediate Report November 2001 /120/ Volker Gruhn, Ursula Wellen Autonomies in a Software Process Landscape Januar 2002 /121/ Ernst-Erich Doberkat, Gregor Engels (Hrsg.) Ergebnisbericht des Jahres 2001 des Projektes “MuSofT – Multimedia in der SoftwareTechnik” Februrar 2002 /122/ Ernst-Erich Doberkat, Gregor Engels, Jan Hendrik Hausmann, Mark Lohmann, Christof Veltmann Anforderungen an eine eLearning-Plattform — Innovation und Integration — April 2002 /123/ Ernst-Erich Doberkat Pipes and Filters: Modelling a Software Architecture Through Relations Juni 2002 /124/ Volker Gruhn, Lothar Schöpe Integration von Legacy-Systemen mit Eletronic Commerce Anwendungen Juni 2002 /125/ Ernst-Erich Doberkat A Remark on A. Edalat’s Paper Semi-Pullbacks and Bisimulations in Categories of Markov-Processes Juli 2002 /126/ Alexander Fronk Towards the algebraic analysis of hyperlink structures August 2002 /127/ Markus Alvermann, Martin Ernst, Tamara Flatt, Urs Helmig, Thorsten Langer Ingo Röpling, Clemens Schäfer, Nikolai Schreier, Olga Shtern Ursula Wellen, Dirk Peters, Volker Gruhn Project Group Chairware Final Report August 2002 /128/ Timo Albert, Zahir Amiri, Dino Hasanbegovic, Narcisse Kemogne Kamdem, Christian Kotthoff, Dennis Müller, Matthias Niggemeier, Andre Pavlenko, Stefan Pinschke, Alireza Salemi, Bastian Schlich, Alexander Schmitz Volker Gruhn, Lothar Schöpe, Ursula Wellen Zwischenbericht der Projektgruppe Com42Bill (PG 411) September 2002 /129/ Alexander Fronk An Approach to Algebraic Semantics of Object-Oriented Languages Oktober 2002 /130/ Ernst-Erich Doberkat Semi-Pullbacks and Bisimulations in Categories of Stochastic Relations November 2002 /131 Yalda Ariana, Oliver Effner, Marcel Gleis, Martin Krzysiak, Jens Lauert, Thomas Louis, Carsten Röttgers, Kai Schwaighofer, Martin Testrot, Uwe Ulrich, Xingguang Yuan Prof. Dr. Volker Gruhn, Sami Beydeda Endbericht der PG nightshift: Dokumentation der verteilten Geschäftsprozesse im FBI und Umsetzung von Teilen dieser Prozesse im Rahmen eines FBI-Intranets basierend auf WAP- und Java-Technologie Februar 2003 /132/ Ernst-Erich Doberkat, Eugenio G. Omodeo ER Modelling from First Relational Principles Februar 2003