DB Systems - Data Modeling

June 5, 2016 | Author: hoangtucuagio | Category: Types, School Work

Share Embed Donate

Report this link

Short Description

DB Systems - Data Modeling...

Description

Faculty of Computer Science & Engineering Ho Chi Minh City University of Technology

Data modeling

Dr. Vo Thi Ngoc Chau ([email protected])

Semester 1 – 2014-2015

1

Main references [1] R. Elmasri, S. R. Navathe, Fundamentals of Database

Systems- 4th Edition, Pearson- Addison Wesley, 2003.

[2] H. G. Molina, J. D. Ullman, J. Widom, Database System Implementation, Prentice-Hall, 2000. [3] H. G. Molina, J. D. Ullman, J. Widom, Database Systems: The Complete Book, Prentice-Hall, 2002

[4] A. Silberschatz, H. F. Korth, S. Sudarshan, Database System Concepts –3rd Edition, McGraw-Hill, 1999. [5] R. Ramakrishnan, J. Gehrke, Database Management Systems- 2rd Edition, McGraw-Hill. [6] M. P. Papazoglou, S. Spaccapietra, Z. Tari, Advances in ObjectOriented Data Modeling, MIT Press, 2000. [7]. G. Simsion, Data Modeling: Theory and Practice, Technics Publications, LLC, 2007.

2

Content 

2.1. Concepts



2.2. Why is data modeling important?



2.3. Data modeling with entity-

relationship and relational models 

2.4. Dependencies and normalization



2.5. Conclusion 3

2.1. Concepts  Data

modeling

 Conceptual 

data model

Entity relationship model

 Representational 

Relational data model

 Database 

data model

design

Relational database design 4

Data modeling 

Modeling - Cambridge dictionary 

Model = a representation of something, either as a physical object which is usually smaller than the real object, or as a simple description

of the object which might be used in calculations 

Modeling = constructing a representation of something, either as a physical object which is usually smaller than the real object, or as a simple description of the object which might be used in calculations

5

Data modeling 

Data modeling – [7], pp. 31-32 

“formalizing and representing the data structures of reality” 



“ a representation of the things of significance to an enterprise and the relationships among those things” 



Shoval and Frumermann, 1994, p28

Hay, 1996a

“an attempt to capture the essence of things both concrete and abstract” 

Keuffel, 1996 6

Data modeling 

Data modeling – [7], pp. 31-32 

“an abstract representation of the data about entities, events, activities, and their associations within an organization” 



McFadden, Hoffer et al., 1999

“The core idea underlying all the definitions is the same: a data model is used for describing entities and their relationships within a core domain.” 

Topi and Ramesh, 2002 7

Data modeling 

Data modeling – [7], pp. 31-32 

“Data modeling is generally viewed as a design activity” 



“an activity that involves the creation of abstractions” 



Davydov, 1994

“the art and science of arranging the structure and relationship of data” 



Srinivasan and Te‟eni, 1990

McComb, 2004, p293

“data modeling is a design discipline” 

Simsion and Witt, 2005, p7

8

Conceptual data model 

A data model (aka semantic data model) 

Provides concepts close to the way many users perceive data



Provides the concepts essential for supporting the application environment at a very high nonsystem-specific level



Used for a conceptual schema of a database from data requirements



Example: entity-relationship model 9

Conceptual data model 

Characteristics of a conceptual data model 

Expressiveness 



Simplicity 



A small number of basic concepts that are distinct and orthogonal in their meaning

Formality 



Simple enough for an end user to use and understand  an easy diagrammatic notation

Minimality 



Distinctions between data, relationships, constraints

Concepts must be formally defined  state criteria for the validity of a schema in the model

Unique interpretation 

Complete and unambiguous semantics for each modeling construct

S. Navathe. Evolution of data modeling for databases. Communications of the ACM 35(9)(1992) 112-123.

10

Representational data model 

A data model (aka implementation/logical data model) 

Provides concepts understood by end-users but not too far removed from the way data is organized within the computer



Hides some details of data storage but able to be implemented on a computer system in a direct way (in some DBMS)



Example: relational data model 11

Database design 

Design the logical and physical structure of one or more databases to accommodate the information

needs of the users in an organization for a defined set of applications 

The overall database design activity has to undergo a systematic process called the design methodology,

whether the target database is managed by an RDBMS, ODBMS, or ORDBMS, ... 

The result of the design activity is a rigidly defined database schema that cannot easily be modified once the database is implemented.

12

Database design 

Goals: 

Satisfy the information content requirements of

the specified users and applications 

Provide a natural and easy-to-understand

structuring of the information 

Support processing requirements and any

performance objectives, such as response time, processing time, and storage space 13

Database design

Figure 2.1 Phases of database design and implementation for

large databases [1], pp. 368

14

Database design 

Six main phases of the overall database design and implementation process 

Requirements collection and analysis



Conceptual database design



Choice of a DBMS



Data model mapping (also called logical

database design) 

Physical database design



Database system implementation and tuning

15

Database design 

Phase 1: Requirements Collection and Analysis 

Identify the major application areas and user groups that will use the database or whole work will be affected by the database



Study and analyze existing documentation concerning the applications



Study the current operating environment and planned use of the information 







Analyze the types of transactions and their frequencies as well as of the flow of information within the system Study geographic characteristics regarding users, origin of transactions, destination of reports, … Specify the input and output data for the transactions

Collect written responses to sets of questions from the potential database users or user groups 

Users‟ priorities and the importance users place on various applications



Requirements from users and applications of the information system that will interact with the database system



Time-consuming but crucial to the success of the information system



Know and analyze the expectations of the users and the intended uses of 16 the database in as much detail as possible

Database design 

Phase 2: Conceptual Database Design 

Phase 2a: Conceptual Schema Design  



Examine the data requirements resulting from Phase 1 Produce a conceptual database schema in a DBMSindependent high-level data model

Phase 2b: Transaction Design 

Design the characteristics of known database transactions (applications) in a DBMS-independent way to ensure that the database schema will include all the information required by these transactions  Identify transactions‟ input/output and functional behavior



Group transactions into three categories: retrieval transactions, update transactions, mixed transactions

17

Database design 

Phase 2: Conceptual Database Design 

Phase 2a: Conceptual Schema Design 

Goal = a complete understanding of the database structure, meaning (semantics), interrelationships, and constraints

 Identify: entity types, relationship types, attributes, key attributes, cardinality and participation constraints on relationships, weak entity types, specialization/generalization hierarchies 

Approaches

 The centralized (or one-shot) schema design approach  The view integration approach

18

Database design 

Phase 2a: Conceptual Schema Design 

The centralized (or one-shot) schema design approach 







The requirements of the different applications and user groups from Phase 1 are merged into a single set of requirements before schema design begins. A single schema corresponding to the merged set of requirements is then designed. The DBA is responsible for deciding how to merge the requirements and for designing the conceptual schema for the whole database

Once the conceptual schema is designed and finalized, external schemas for the various user groups and applications can be specified by the DBA.

19

Database design 

Phase 2a: Conceptual Schema Design 

The view integration approach  

The requirements are not merged. A schema (or view) is designed for each user group or application based only on its own requirements.  Each user group or application



These schemas are merged or integrated into a global conceptual schema for the entire database.

 The DBA 

The individual views can be reconstructed as external schemas after view integration. 20

Database design 

Phase 3: Choice of a DBMS 

Technical factors  





Storage structures and access paths, the user and programmer interfaces, high-level query languages, development tools, architectural options, … supported by DBMS DBMS portability among different types of hardware

Non-technical factors 





Suitability & type of the DBMS for the task

…

Cost: software acquisition cost, maintenance cost, hardware acquisition cost, database creation and conversion cost, personnel cost, training cost, operating cost Availability of vendor services 21

Database design 

Phase 4: Data Model Mapping (Logical Database Design) 

Create a conceptual schema and external schemas in the data model of the selected DBMS



The result of this phase should be DDL statements in the language of the chosen DBMS that specify the conceptual and external level schemas of the database system. 22

Database design 

Phase 5: Physical Database Design 

Restricted to choosing the most appropriate structures for the database files from among the options offered by that DBMS



Choose specific storage structures and access paths for the database files 

Response time



Space utilization



Transaction throughput



Estimate record size and number of records in each database file



Estimate the update and retrieval patterns for the file cumulatively from all the transactions



Estimate the file growth, either in the record size because of new attributes or in the number of records

23

Database design 

Phase 6: Database System Implementation and Tuning 

Create the database schemas and (empty) database files 

Responsibility of the DBA in conjunction with the database designers



Reformat the data for loading into the new database if needed



Load/populate with the data if needed



Implement database transactions referring to the conceptual specifications of transaction, then write and test program code with embedded DML commands 



Responsibility of the application programmers

Database tuning continues as long as the database is in existence, as long as performance problems are discovered, and while the requirements keep changing. 24

Relational database design 

Two main approaches 

A top-down design technique  





Designing a conceptual schema in a high-level data model Mapping the conceptual schema into a relational database schema Analyzing each of the resulting relations based on the functional dependencies and assigned primary keys

A bottom-up design technique (relational synthesis) 



View relational database schema design strictly in terms of functional and other types of dependencies specified on the database attributes Synthesize the relation schemas applying a normalization algorithm

25

2.2. Why is data modeling important? 



Leverage 

Make programming simpler and cheaper



Poor data organization can be expensive to fix.

Conciseness 



Take more directly to the heart of the business requirements

Data quality 



Problems with data quality can be traced back to a lack of consistency in: 

Defining and interpreting data



Implementing mechanisms to enforce the definitions

A key role in achieving good data quality by establishing a common understanding of what is to be held in each table and column and how it is to be interpreted

G. C. Simsion, G. C. Witt, Data modeling essentials – 3rd edition, Elsevier Inc, 2005, pp. 8-10.

26

2.3. Data modeling with entity-relationship and relational models  Entity-relationship  Relational  Data

model

data model

model mapping

27

Database design Entity Relationship Model P. P-S. Chen. The EntityRelationship Model – Toward

a Unified View of Data. ACM Transactions on Database Systems 1(1)(March 1976) 9-36.

Figure 2.1 Phases of database design and implementation for

large databases [1], pp. 368

28

Entity Relationship Model

P. P-S. Chen. The Entity-Relationship Model – Toward a Unified View of Data. ACM Transactions on Database Systems 1(1)(March 1976) 9-36.



The entity-relationship model adopts the more natural view that the real world consists of entities and relationships.



The model can achieve a high degree of data independence and is based on set theory and relation theory.



The entity-relationship model can be used as a basis for a unified view of data.



A special diagrammatric technique, the entity-relationship diagram, is introduced as a tool for database design.

29

Entity Relationship Model

P. P-S. Chen. The Entity-Relationship Model – Toward a Unified View of Data. ACM Transactions on Database Systems 1(1)(March 1976) 9-36.

Fig. 1. Analysis of data models using multiple levels of logical views

30

Entity Relationship Model

P. P-S. Chen. The Entity-Relationship Model – Toward a Unified View of Data. ACM Transactions on Database Systems 1(1)(March 1976) 9-36.



The entity-relationship (ER) modeling concepts 

Entity types



Relationship types



Attributes



Keys



Structural constraints 31

Entity Relationship Model

P. P-S. Chen. The Entity-Relationship Model – Toward a Unified View of Data. ACM Transactions on Database Systems 1(1)(March 1976) 9-36.

Symbol

Meaning

Example

Employee

Entity type

Employees

Dependent

Weak entity type

Dependents of each employee

works_on

Relationship type

Employee works_on Project

dependents_of

Identifying relationship of the weak entity type

Dependents dependents_of Employees 32

Entity Relationship Model

P. P-S. Chen. The Entity-Relationship Model – Toward a Unified View of Data. ACM Transactions on Database Systems 1(1)(March 1976) 9-36.

Street

Symbol

Meaning

Example

Name

Attribute

Name of an employee

EmployeeID

Key Attribute

Distinct identifier of an employee

PhoneNumber

Multivalued Attribute

Phone numbers of an employee

Composite Attribute

Address (Street, District, City) of an employee

Derived Attribute

Age of an employee (derived from date of birth)

District

Address

Age

City

33

Entity Relationship Model

P. P-S. Chen. The Entity-Relationship Model – Toward a Unified View of Data. ACM Transactions on Database Systems 1(1)(March 1976) 9-36.

Constraints

34

Figure 3.2. An ER schema diagram for the COMPANY database [1], pp. 54.

35

a. List the entity types b. Is there a weak entity type? If so, give its name, partial key, and identifying relationship c.

What constraints do the partial key and the identifying relationship of the weak entity type specify in this diagram?

d. List the names of all relationship types, and specify the (min, max) constraint on each participation of an entity type in a relationship type.

Figure 3.18. An ER diagram for a BANK database schema [1], pp. 81.

36

Enhanced Entity Relationship Modeling 

Class/subclass relationships and type inheritance 





The relationship between a superclass and any one of its subclasses

Specialization 

Define a set of subclasses of an entity type



Establish additional specific attributes with each subclass



Establish additional specific relationship types between each subclass and other entity types or other subclasses

Generalization 

A reverse process of abstraction in which we suppress the differences among several entity types



Identify their common features



Generalize them into a single superclass of which the original entity types are special subclasses 37

partial specialization

total specialization

Figure 4.1. EER diagram notation to represent subclasses and specialization. [1], pp. 87.

38

Figure 4.3. Generalization [1], pp. 91.

39

- Disjoint (d): the subclasses of the specialization must be disjoint.

- Overlapping (o): the subclasses are not constrained to be disjoint, their sets of entities may overlap. Figure 4.5. EER diagram notation for overlapping (nondisjoint) specialization. [1], pp. 93.

40

Figure 4.7. A specialization lattice with multiple inheritance for a UNIVERSITY database [1], pp. 96. a. List superclasses, subclasses of each superclass

b. List class/subclass relationships c. Describe constraints on each specialization

41

Database design

A representational data

model supported by the chosen DBMS Relational Data Model E. F. Codd. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6)(June, 1970) 377-387.

Figure 2.1 Phases of database design and implementation for

large databases [1], pp. 368

42

Data model

E. F. Codd. Data models in database management, ACM, 1980.



A combination of three following components 

(1). A collection of data structure types (the building

blocks of any database that conforms to the model); 

(2). A collection of operators or inferencing rules, which can be applied to any valid instances of the data types listed in (1), to retrieve or derive data from any parts

of those structures in any combinations desired; 

(3). A collection of general integrity rules, which implicitly or explicitly define the set of consistent

database states or changes of state or both –-- these rules may sometimes be expressed as insert-updatedelete rules

43

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6)(June, 1970) 377-387. 

The term relation is used here in its accepted mathematical sense.



A relational database based on the relational data model is a collection of relations.



Given sets S1, S2, …, Sn (not necessarily distinct), R is a relation on these n sets if it is a set of ntuples each of which has its first element from S1, its second element from S2, and so on.





Sj is the jth domain of R.



R is said to have degree n.

More concisely, R is a subset of the Cartesian product S1xS2x…xSn.

44

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6)(June, 1970) 377-387.

Fig. 1. A relation of degree 4 A relation of degree 4, called supply, which reflects the shipments-in-progress of parts from specified suppliers to specified projects in specified quantities. 45

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6)(June, 1970) 377-387.



An n-ary relation R has the following properties: 

Each row represents an n-tuple of R.



The ordering of rows is immaterial.



All rows are distinct.



The ordering of columns is significant – it corresponds to the ordering S1, S2, …, Sn of the domains on which R is defined.



The significance of each column is partially conveyed by labeling it with the name of the corresponding domain.

46

Data model

E. F. Codd. Data models in database management, ACM, 1980.



A combination of three following components 

(1). A collection of data structure types (the building

blocks of any database that conforms to the model); 

(2). A collection of operators or inferencing rules, which can be applied to any valid instances of the data types listed in (1), to retrieve or derive data from any parts

of those structures in any combinations desired; 

(3). A collection of general integrity rules, which implicitly or explicitly define the set of consistent

database states or changes of state or both –-- these rules may sometimes be expressed as insert-updatedelete rules

47

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6)(June, 1970) 377-387.



Operations on relations 

Usual set operations: union, intersection, minus 



Projection () 



Select a subset of the tuples from a relation that satisfy a selection condition

Cartesian product (cross product) (x) 



Select certain columns of a relation (striking out the others)

Selection () 



Compatible relations

Combine tuples from two relations in a combinatorial fashion

Join (inner/outer join, equijoin, natural join) 

Combine related tuples from two relations into single tuples

49

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6)(June, 1970) 377-387.

50

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6)(June, 1970) 377-387.

T ←R :S

51

Data model

E. F. Codd. Data models in database management, ACM, 1980.



A combination of three following components 

(1). A collection of data structure types (the building

blocks of any database that conforms to the model); 

(2). A collection of operators or inferencing rules, which can be applied to any valid instances of the data types listed in (1), to retrieve or derive data from any parts

of those structures in any combinations desired; 

(3). A collection of general integrity rules, which implicitly or explicitly define the set of consistent

database states or changes of state or both –-- these rules may sometimes be expressed as insert-updatedelete rules

52

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6)(June, 1970) 377-387.



Constraints inherent in the data model 

Domain constraints



Key constraints



Constraints on nulls



Entity integrity constraints



Referential integrity constraints

53

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6)(June, 1970) 377-387.



Domain constraints 



Within each tuple, the value of each attribute A must be an atomic value from the domain dom(A).

Key constraints 1. Two distinct tuples in any state of the relation cannot have identical values for (all) the attributes in the key. 2. It is a minimal superkey – that is, a superkey from which we cannot remove any attributes and still have the uniqueness constraint in condition 1 hold. 

Superkey of the relation schema R specifies a uniqueness constraint that no two distinct tuples in any state r of R can have the same value for the superkey.

54

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6)(June, 1970) 377-387.



Key constraints 

A relation schema may have more than one key. 



Constraints on nulls 



Specify whether null values are or are not permitted on attributes

Entity integrity constraint 



Candidate keys: primary key and secondary keys

No primary key value can be null.

Referential integrity constraint 

Specify a referential integrity constraint between the two relation schemas R1 and R2 55

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6)(June, 1970) 377-387.



Referential integrity constraint 

A set of attributes FK (foreign key) in relation schema R1 is a foreign key of R1 that references relation R2 if it satisfies the following two rules: 

The attributes in FK have the same domain(s) as the primary key attributes PK of R2; the attributes FK are said to reference or refer to the relation R2.



A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value of PK for some tuple t2 in the

current state r2(R2) or is null. In the former case, we have t1[FK] = t2[PK], and we say that the tuple t1 references or refers to the tuple t2.

56

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6)(June, 1970) 377-387.

Relation Employee Relation schema: EMPLOYEE (FNAME, MINIT, LNAME, SSN,

BDATE, ADDRESS, SEX, SALARY, SUPERSSN, DNO) Attributes = FNAME, MINIT, LNAME, SSN, … Domain of Attribute SEX = {F, M}

Domain of SALARY = a set of positive integer numbers Primary key = SSN

Foreign key = SUPERSSN

57

The SQL standard has gone through a number of revisions. http://en.wikipedia.org/wiki/SQL (Accessed on 09/05/2012)



High-level database language on relational databases: SQL-86, SQL-89, SQL-92, SQL-99, …

58

SQL (structured query language) 

Data Definition Language (DDL): Create, Alter, Drop



Data Manipulation Language (DML): Select, Insert, Update, Delete



Data Control Language (DCL): Commit, Rollback, Grant, Revoke 59

SQL (structured query language) CREATE TABLE TableName {(colName dataType [NOT NULL] [UNIQUE]

[DEFAULT defaultOption] [CHECK searchCondition] [,...]} [PRIMARY KEY (listOfColumns),] {[UNIQUE (listOfColumns),] […,]}

{[FOREIGN KEY (listOfFKColumns) REFERENCES ParentTableName [(listOfCKColumns)], [ON UPDATE referentialAction] [ON DELETE referentialAction ]] [,…]}

{[CHECK (searchCondition)] [,…] })

60

SQL (structured query language) SELECT [DISTINCT | ALL] {* | [columnExpression [AS newName]] [,...] } FROM TableName [alias] [, ...] [WHERE condition] [GROUP BY columnList] [HAVING condition] [ORDER BY columnList] 61

Database design

Data model mapping from a conceptual schema based on the ER model to a relational database schema based on the relational data

model

Figure 2.1 Phases of database design and implementation for

large databases [1], pp. 368

62

Data model mapping

Table 7.1. Correspondence between ER and Relational Models [1], pp. 198.

63

Data model mapping 

Mapping of specialization or generalization 

Convert each specialization with m subclasses {S1, S2, …, Sm} and (generalized) superclass C, where the attributes of C are {k, a1, …, an} and k is the (primary) key, into relation schemas using one of the four following options: 

Multiple relations – Superclass and subclasses



Multiple relations – Subclass relations only (total)



Single relation with one type attribute (disjoint)



Single relation with multiple type attributes (overlapping)

64

Using Multiple relations – Subclass relations only (total) Figure 7.4, pp. 200.

Figure 4.3. Generalization [1], pp. 91.

65

Figure 7.4. Using Single relation with multiple type attributes with Boolean fields MFlag and PFlag. [1], pp. 200.

66

Data model mapping 

Mapping algorithms 



Automatically create a relational schema from a conceptual schema design

Notes on data model mapping 

Slightly different for other post-relational data models as compared to the relational data model



Constraints on databases 

Inherent model-based constraints



Schema-based constraints



Application-based constraints

67

68

69

70

71

Query 

Example: For every project located in „Stafford‟, retrieve the project number, the controlling department number and the department manager‟s last name, address and birthdate.



SQL query: SELECT P.NUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E WHERE P.DNUM=D.DNUMBER AND D.MGRSSN=E.SSN AND

P.PLOCATION=„STAFFORD‟; 



Relational algebra expression: PNUMBER, DNUM, LNAME, ADDRESS, BDATE(((σPLOCATION=‘STAFFORD’(PROJECT))

 DNUM=DNUMBER (DEPARTMENT)) 

MGRSSN=SSN

(EMPLOYEE))72

Query: For each project on which more than two employees work, retrieve the project number, project name, and the number of employees who work on that project. SELECT

PNUMBER, PNAME, COUNT (*) AS PROJ_NUMBER

FROM

PROJECT, WORKS_ON

WHERE

PNUMBER=PNO

GROUP BY

PNUMBER, PNAME

HAVING

COUNT (*) > 2

Temp (PNUMBER, PNAME, PROJ_NUMBER) ← PNUMBER, PNAME



(

PNUMBER=PNO(PROJECT

Result ←





COUNT *

X WORKS_ON))

PROJ_NUMBER > 2

(Temp) 73

Query: retrieve the last name and first name of each employee

whose salary is greater than the max salary in department 5.

LNAME, FNAME σSALARY > C EMPLOYEE

MAX SALARY σDNO = 5 EMPLOYEE

74

Data model mapping 

Informal measures of quality for relation schema design 

Semantics of the attributes 



Reducing the redundant values in tuples 



Storage space & update anomalies

Reducing the null values in tuples 



How to interpret the attribute values stored in a tuple of the relation – how the attribute values in a tuple relate to one another

Multiple interpretations of nulls

Disallowing the possibility of generating spurious tuples 

Join relations with equality conditions on attributes that are either primary keys or foreign keys in a way that guarantees that no spurious tuples are generated 75

2.4. Dependencies and normalization 

Functional dependency (FD) 

The single most important concept in relational schema design theory



A constraint between two sets of attributes from the database



A property of the semantics or meaning of the attributes.



Examples 

SSN  ENAME  The value of an employee‟s social security number (SSN) uniquely determines the employee name (ENAME)



PNUMBER  {PNAME, PLOCATION}



{SSN, PNUMBER}  HOURS

76

2.4. Dependencies and normalization 

Functional dependency (FD) 

A constraint between two sets of attributes from the database 



A constraint on the possible tuples that can form a relation state r of a relation schema R = {A1, A2, …, An} Denoted by XY

 X, Y: two sets of attributes that are subsets of R  For any two tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y]. The values of Y of a tuple in r depend on (are determined by) the values of X. The values of X of a tuple uniquely (functionally) determine the values of Y. If XY in R, this does not say whether or not YX in R. 77

2.4. Dependencies and normalization 

Armstrong‟s inference rules (IR1-3) and others (IR4-6)

78

2.4. Dependencies and normalization 

Closure F+ of a set of functional dependencies F specified on relation schema R F+, the closure of F, is formally the set of all dependencies that include F as well as all dependencies that can be inferred from F.

79

2.4. Dependencies and normalization 

Closure F+ of a set of functional dependencies F



F = {SSN→{ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER→{DNAME, DMGRSSN}}



F+ = {SSN→{ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER→{DNAME, DMGRSSN}, SSN→SSN, …,

DNUMBER→ DNAME, SSN→{DNAME, DMGRSSN}}

80

2.4. Dependencies and normalization 

Closure F+ of a set of functional dependencies F



F = {{SSN, PNUMBER}→HOURS, SSN→ENAME, PNUMBER→{PNAME, PLOCATION}}



F+ = ???

81

2.4. Dependencies and normalization 

Closure X+ of a set of attributes X under F where X is a set of attributes that appears as a left-hand side of a functional dependency in F

X+, the closure of X under F, is the set of attributes that are functionally determined by X based on F. 82

2.4. Dependencies and normalization 

Closure X+ of a set of attributes X under F where X is a set of attributes that appears as a left-hand side of a functional dependency in F

83

2.4. Dependencies and normalization 

Closure X+ of a set of attributes X under F where X is a set of attributes that appears as a left-hand side of a functional dependency in F 

F = {{SSN, PNUMBER}→HOURS, SSN→ENAME,

PNUMBER→{PNAME, PLOCATION}} 

{SSN, PNUMBER}+ = {SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION}



{SSN}+ = {SSN, ENAME}



{PNUMBER}+ = {PNUMBER, PNAME, PLOCATION}

84

2.4. Dependencies and normalization 

Closure X+ of a set of attributes X under F where X is a set of attributes that appears as a left-hand side of a functional dependency in F 

F = {{A, B}→{C}, {A}→{D, E}, {B}→{M}, {M}→{G, H},

{D}→{I, J}} 

{A, B}+ = ???



{A}+ = ???



{B}+ = ???



{M}+ = ???, and {D}+ = ???

85

2.4. Dependencies and normalization 

Identifying a key K of R based on a set of given functional dependencies F

86

2.4. Dependencies and normalization 

Identifying a key K of R based on a set of given functional dependencies F 

R = (A, B, C, D, E, G)



F = {CE, AEG, EAD}



K = ???

87

2.4. Dependencies and normalization 

Identifying a key K of R based on a set of given functional dependencies F 

R = (A, B, C, D, E, G)



F = {CE, AEG, EAD}



K = {A, B, C, D, E, G}



(K-{A})+ = {B, C, D, E, G, A}  K = {B, C, D, E, G}



(K-{B})+ = {C, D, E, G, A}  K = {B, C, D, E, G}



(K-{C})+ = {B, D, E, G, A}  K = {B, C, D, E, G}



(K-{D})+ = {B, C, E, G, A, D}  K = {B, C, E, G}



(K-{E})+ = {B, C, G, E, A, D}  K = {B, C, G}



(K-{G})+ = {B, C, E, A, D, G}  K = {B, C}

88

2.4. Dependencies and normalization 

Identifying a key K of R based on a set of given functional dependencies F 

R = (A, B, C, E, G)



F = {AB, BCE, EGA}



K = ???

89

2.4. Dependencies and normalization 

Identifying all keys K of R based on a set of given functional dependencies F 

Find N = U - fFright(f) N is a set of the attributes not in the right side of any functional dependency in F. U is a set of all the attributes of R.



Find N+, the closure of N under F



If N+ = U then there is only one key K = N.



Otherwise, find D = fFright(f) - fFleft(f) D is a set of the attributes only in the right side of the functional dependencies in F.



Find L = U – (N+D) L is a set of the attributes neither in the right side nor functionally determined by N.



Let Li be any subset of L. Find (NLi)+ under F.



If (NLi)+ = U then K = NLi.



Ki, Kj such that Ki  Kj, remove Kj.

90

2.4. Dependencies and normalization 

Identifying all keys K of R based on a set of given functional dependencies F 

R = (A, B, C, E, G)



F = {AB, BCE, EGA}



All keys K ???

91

2.4. Dependencies and normalization 

R = (A, B, C, E, G)



F = {AB, BCE, EGA}



N = U - fFright(f) = {C, G}



N+ = {C, G}



D = fFright(f) - fFleft(f) = 



L = U – (N+D) = {A, B, E}



Subsets of L: L1 = {A}, L2 = {B}, L3 = {E}, L4 = {A, B}, L5 = {A, E}, L6 = {B, E}, L7 = {A, B, E}



(NL1)+ = {C, G, A, B, E}  K1 = {C, G, A}



(NL2)+ = {C, G, B, E, A}  K2 = {C, G, B}



(NL3)+ = {C, G, E, A, B}  K3 = {C, G, E}

92

2.4. Dependencies and normalization 

Identifying all keys K of R based on a set of given functional dependencies F 

R = (A, B, C, E, G)



F = {ABC, CGE, BG, EA}



All keys K ???

93

2.4. Dependencies and normalization 

Membership of a functional dependency X→Y with respect to a set of functional dependencies F, i.e. X→Y is inferred from F, denoted by F╞ X→Y 

A functional dependency X→Y is inferred from a set of dependencies F specified on R if X→Y holds in every legal relation state r of R; that is, whenever r satisfies all the dependencies in F, X→Y also holds in r.

94

2.4. Dependencies and normalization 

Membership of a functional dependency X→Y with respect to a set of functional dependencies F, i.e. X→Y is inferred from F, denoted by F╞ X→Y 

Check if F╞ X→Y 

Find X+, the closure of X under F



If Y  X+ then F╞ X→Y 95

2.4. Dependencies and normalization 

Membership of a functional dependency X→Y with respect to a set of functional dependencies F, i.e. X→Y is inferred from F, denoted by F╞ X→Y 

F = {A → C, AC → D, E → AD, E → H}



F ╞ A → CD 



A+ under F = {A, C, D}  {C, D}  F ╞ A → CD

F ╞ E → AH ???

96

2.4. Dependencies and normalization 

A set of functional dependencies F is said to cover another set of functional dependencies E if every FD in E is also in F+; that is, if every dependency in E can be inferred from F. Alternatively, E is covered by F.



Two sets of functional dependencies E and F are equivalent if E+ = F+; that is, every dependency in E can be inferred from F and every dependency in F can be inferred from E; that is also, E covers F and F covers E.

97

2.4. Dependencies and normalization 

F = {A → C, AC → D, E → AD, E → H}



G = {A → CD, E → AH}

Are F and G equivalent? * Check if F covers G. * Check if G covers F.

98

2.4. Dependencies and normalization 

The minimal cover F of a set of functional dependencies E is a set of functional dependencies that satisfies the property that every dependency in E is in the closure F+ of F. In addition, this property is lost if any dependency from the set F is removed; F must have no redundancies in it, and the dependencies in F are in a standard form. 99

2.4. Dependencies and normalization 

The minimal cover F of a set of functional dependencies E 1.

Every dependency in F has a single attribute for its righthand side.

2.

We cannot replace any dependency X → A in F with a dependency Y → A, where Y is a proper subset of X, and still

have a set of dependencies that is equivalent to F. 3.

We cannot remove any dependency from F and still have a set of dependencies that is equivalent to F. 100

2.4. Dependencies and normalization 

The minimal cover F of a set of functional dependencies E

101

2.4. Dependencies and normalization 

The minimal cover F of a set of functional dependencies E 

E = {A → B, ABCD → I, IJ → G, IJ → H, ACDJ → GI}



The minimal cover F of E ???

102

2.4. Dependencies and normalization 

Normalization 

A process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of:  



(1) minimizing redundancy (2) minimizing the insertion, deletion, and update anomalies

Normal form of a relation 

The highest normal form condition that the relation meets  the degree to which it has been normalized

103

2.4. Dependencies and normalization 

The process of normalization through decomposition produces the relational schemas that have: 

The lossless join or non-additive join property 



No spurious tuple generation problem occurs.

The dependency preservation property 

Each FD is represented in some individual relation. 104

2.4. Dependencies and normalization Table 10.1. Summary of normal forms based on primary keys and corresponding normalization ([1], pp. 321)

105

2.4. Dependencies and normalization 

Pure relations based on the relational data model are all in 1NF.



Relations in 2NF have non-key attributes

functionally dependent on the whole primary key. 

Relations in 3NF have non-key attributes

only functionally dependent on the whole primary key.

106

Primary key = {SSN, PNUMBER} Non-key attributes = HOURS, ENAME, PNAME, PLOCATION

FD1: {SSN, PNUMBER}  HOURS FD2: SSN  ENAME FD3: PNUMBER  {PNAME, PLOCATION}

Functionally dependent on a part of the primary key 107

Primary key = SSN

Non-key attributes = ENAME, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN FD1: SSN  {ENAME, BDATE, ADDRESS, DNUMBER} FD2: DNUMBER  {DNAME, DMGRSSN}

Non-key attributes are functionally dependent on another non-key attribute. 108

2.4. Dependencies and normalization 

Relation: LOTS (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE) 

Primary key: PROPERTY_ID#



Secondary key: {COUNTY_NAME, LOT#}



Functional dependencies 



FD1: PROPERTY_ID#  {COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE} FD2: {COUNTY_NAME, LOT#}  {PROPERTY_ID#, AREA, PRICE, TAX_RATE}



FD3: COUNTY_NAME  TAX_RATE



FD4: AREA  PRICE

109

Is LOTS really in 1NF? Normalization is carried out as follows. Determine LOTS1, LOTS2, LOTS1A, and LOTS1B.

Figure 10.11 (d). Summary of the progressive normalization of LOTS. [1], pp. 322. 110

2.4. Dependencies and normalization 

Relation: LOTS (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE) 

Primary key: PROPERTY_ID#



Secondary key: {COUNTY_NAME, LOT#}



Functional dependencies 



FD1: PROPERTY_ID#  {COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE} FD2: {COUNTY_NAME, LOT#}  {PROPERTY_ID#, AREA, PRICE, TAX_RATE}



FD3: COUNTY_NAME  TAX_RATE



FD4: AREA  PRICE

2NF 111

2.4. Dependencies and normalization 



Relation: LOTS1 (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA, PRICE) -- in 2NF 

Primary key: PROPERTY_ID#



Secondary key: {COUNTY_NAME, LOT#}



Functional dependencies 

FD1: PROPERTY_ID#  {COUNTY_NAME, LOT#, AREA, PRICE}



FD2: {COUNTY_NAME, LOT#}  {PROPERTY_ID#, AREA, PRICE}



FD4: AREA  PRICE

3NF

Relation: LOTS2 (COUNTY_NAME, TAX_RATE) -- in 3NF 

Primary key: COUNTY_NAME



FD: COUNTY_NAME  TAX_RATE 112

2.4. Dependencies and normalization 





Relation: LOTS1A (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA) -- in 3NF 

Primary key: PROPERTY_ID#



Secondary key: {COUNTY_NAME, LOT#}



Functional dependencies 

FD1: PROPERTY_ID#  {COUNTY_NAME, LOT#, AREA}



FD2: {COUNTY_NAME, LOT#}  {PROPERTY_ID#, AREA}

Relation: LOTS1B (AREA, PRICE) -- in 3NF 

Primary key: AREA



FD: AREA  PRICE

Relation: LOTS2 (COUNTY_NAME, TAX_RATE) -- in 3NF 

Primary key: COUNTY_NAME



FD: COUNTY_NAME  TAX_RATE

113

2.4. Dependencies and normalization 

Boyce-Codd normal form (BCNF) 

A simpler form of 3NF, but stricter than 3NF



Every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF. 

For a relation in 3NF, each non-key attribute must:

 Fully functionally dependent on every key  Nontransitively dependent on every key  

Not functionally dependent on other nonkey attributes

For a relation in BCNF, all key/non-key attributes must be functionally dependent on superkeys.  XA holds in R => X is a superkey of R.  XA holds in R and X is not a superkey of R => R is not 114 in BCNF.

2.4. Dependencies and normalization

Figure 10.12. (a) BCNF normalization of LOTS1A [1], pp. 325.

FD5: AREA  COUNTY_NAME

AREA is not a superkey of LOTS1A. LOTS1A is in 3NF; but not in BCNF.

115

2.4. Dependencies and normalization

Candidate key = {STUDENT, COURSE} FD1: {STUDENT, COURSE}  INSTRUCTOR FD2: INSTRUCTOR  COURSE

Normalization??? 116

2.4. Dependencies and normalization 

Multivalued dependencies 

A multivalued dependency X

Y specified on

relation schema R, where X and Y are both

subsets of R, specifies the following constraint on any relation state r of R: if two tuple t1 and t2 exist in r such that t1[X] = t2[X], then two tuples

t3 and t4 should also exist in r with the following properties, where Z is used to denote (R-(XUY)): 

t3[X] = t4[X] = t1[X] = t2[X]



t3[Y] = t1[Y] and t4[Y] = t2[Y]



t3[Z] = t2[Z] and t4[Z] = t1[Z]

117

2.4. Dependencies and normalization 

Multivalued dependencies



ENAME

PNAME



ENAME

DNAME



No association between PNAME and DNAME 118

2.4. Dependencies and normalization 

Multivalued dependencies (MVDs) 

A multivalued dependency X

Y is a trivial MVD

if: 

(a) Y is a subset of X



(b) X U Y = R

 A trivial MVD does not specify any significant or meaningful constraint on R. 

A nontrivial MVD satisfies neither (a) nor (b). 119

2.4. Dependencies and normalization 

Fourth normal form 

A relation schema R is in 4NF with respect to a set of dependencies (that includes functional dependencies and multivalued dependencies) if,

for every nontrivial multivalued dependency X

Y

in F+, X is a superkey for R.

120

2.5. Conclusion 

Data modeling



Database design process: 6 phases 

Conceptual database design 



Choice of a DBMS  a representational data model 



Entity-relationship model

Relational data model

Data model mapping



Functional/multivalued dependencies



Normalization: 1NF, 2NF, 3NF, BCNF, 4NF 121

Questions ???

122

Review - Chapter 12 in [1] 12.1. What are the six phases of database design? Discuss each phase.

12.2. Which of the six phases are considered the main activities of the database design process itself? Why? 12.3. Why is it important to design the schemas and applications in parallel? 12.4. Why is it important to use an implementation-independent data model during conceptual schema design? What models are used in current design tools? Why?

12.5. Discuss the importance of Requirements Collection and Analysis. 12.7. Discuss the characteristics that a data model for conceptual schema design should possess. 12.8. Compare and contrast the two main approaches to conceptual schema design.

12.9. Discuss the strategies for designing a single conceptual schema from its requirements. 12.10. What are the steps of the view integration approach to conceptual schema design? What are the difficulties during each step? 12.13. Discuss the factors that influence the choice of a DBMS package for the information system of an organization. 12.14. What is system-independent data model mapping? How is it different from system-dependent data model mapping? 12.15. What are the important factors that influence physical database design?

123

Review - Chapters 3, 4 in [1] 3.1. Discuss the role of a high-level data model in the database design process.

3.2. List the various cases where use of a null value would be appropriate. 3.3. Define the following terms: entity, attribute, attribute value, relationship instance, composite attribute, multivalued attribute, derived attribute, complex attribute, key attribute, value set (domain). 3.4. What is an entity type? What is an entity set? Explain the differences among an entity, an entity type, and an entity set. 3.6. What is a relationship type? Explain the differences among a relationship instance, a relationship type, and a relationship set. 3.7. What is a participation role? When is it necessary to use role names in the description of relationship types? 3.12. When is the concept of a weak entity used in data modeling? Define the terms owner entity type, weak entity type, identifying relationship type, and partial key. 4.1. What is a subclass? When is a subclass needed in data modeling? Define the following terms: superclass of a subclass, superclass/subclass relationship, is-a relationship, specialization, generalization, category, specific (local) attributes) specific relationships. 4.6. Discuss the two main types of constraints on specializations and generalizations.

124

Review - Chapter 5 in [1] 5.1. Define the following terms: domain, attribute, n-tuple, relation schema, relation state, degree of a relation, relational database schema, relational database state.

5.2. Why are tuples in a relation not ordered? 5.3. Why are duplicate tuples not allowed in a relation? 5.4. What is the difference between a key and a superkey? 5.8. Discuss the entity integrity and referential integrity constraints. Why is each considered important? 5.9. Define foreign key. What is this concept used for? 125

Review - Chapter 10 in [1] 10.1. Discuss attribute semantics as an informal measure of goodness for a relation schema. 10.2. Discuss insertion, deletion, and modification anomalies. Why are they considered bad? Illustrate with examples. 10.3. Why should nulls in a relation be avoided as far as possible? Discuss the problem of spurious tuples and how we may prevent it. 10.4. State the informal guidelines for relation schema design that we discussed. Illustrate how violation of these guidelines may be harmful. 10.5. What is a functional dependency? What are the possible sources of the information that defines the functional dependencies that hold among the attributes of a relation schema? 10.6. Why can we not infer a functional dependency automatically from a particular relation state? 10.12. What does the term unnormalized relation refer to? How did the normal forms develop historically from first normal form up to Boyce-Codd normal form? 10.13. Define first, second, and third normal forms when only primary keys are considered. How do the general definitions of 2NF and 3NF, which consider all keys of a relation, differ from those that consider only primary keys? 10.14. What undesirable dependencies are avoided when a relation is in 2NF? 10.15. What undesirable dependencies are avoided when a relation is in 3NF? 10.16. Define Boyce-Codd normal form. How does it differ from 3NF? Why is it 126 considered a stronger form of 3NF?

DB Systems - Data Modeling

Short Description

Description

Comments

We need your help!