Data Normalisation

In order to allow for the flexible use of data held in a database and to minimise the effect of changes in application systems on the database structure, a process called .normalisation・ has been developed to produce anomoly-free data structures containing the minimum of redundant data. The advantages claimed for normalised structures are that they:

There are FIVE types of normalised relations of which THREE are commonly used when normalisation is undertaken. These are often referred to as Normal Forms (NF). Hence the First Normal Form is denoted as 1NF, the Second Normal Form 2NF, the Third Normal Form 3NF etc. There follows a brief demonstration of the normalisation techniques applied to a set of data relating to a Course database:

Course Relation in UN-normalised Form (0NF)

Course Code

Course Title

Level

Module Ref

Subject

Year

Maximum Credits

BS10

Business Studies

BSc

BS101

Accounting

1

50

 

 

BSc

BS102

Finance

2

100

 

 

BSc

BS103

Costing

3

150

HM10

Hospitality Mangement

BA

HM101

Catering

1

50

 

 

BA

HM102

Front of House

2

100

 

 

BA

HM103

Bar Management

3

150

IM20

Information Management

MSc

IM201

Database

1

50

 

 

MSc

IM202

IT Strategy

2

100

 

 

 

 

 

 

 

Course Relation in First Normal Form (1NF)

This step ensures that all of the Attributes are discrete i.e. can stand alone. This is achieved by the removal of .Repeating-Groups・ such as the Subject details in the Course Relation. This invariably means filling in the gaps, but sometimes involves rearranging the Tuples, without loss of their .meaning・.

Course Code

Course Title

Level

Module Ref

Subject

Year

Maximum Credits

BS10

Business Studies

BSc

BS101

Accounting

1

50

BS10

Business Studies

BSc

BS102

Finance

2

100

BS10

Business Studies

BSc

BS103

Costing

3

150

HM10

Hospitality Mangement

BA

HM101

Catering

1

50

HM10

Hospitality Management

BA

HM102

Front of House

2

100

HM10

Hospitality Management

BA

HM103

Bar Management

3

150

CO20

Information Management

MSc

IM201

Database

1

50

CO20

Information Management

MSc

IM202

IT Strategy

2

100

 

 

 

 

 

 

 

Course Relations in Second Normal Form (2NF)
This involves splitting off into separate Relations any Attributes that do not wholly depend on the Entire Key. Thus referring to the Course Relation in 1NF, it will be seen that the Attributes: Subject, Year and Maximum Credits depend on the Module Ref. but not the Course Code, so a new Relation called Subject is formed. Likewise for the Course-Module Relation.

Course Relation

Course Code

Course Title

Level

BS10

Business Studies

BSc

HM10

Hospitality Management

BA

IM20

Information Management

MSc

 

 

 

 

Course-Module Relation

Course Code

Module Ref

BS10

BS101

BS10

BS102

BS10

BS103

HM10

HM101

HM10

HM102

HM10

HM103

IM20

IM201

IM20

IM202

 

 

Subject Relation

Module Ref

Subject

Year

Maximum Credits

BS101

Accounting

1

50

BS102

Finance

2

100

BS103

Costing

3

150

HM101

Catering

1

50

HM102

Front of House

2

100

HM103

Bar Management

3

150

IM201

Database

1

50

IM202

IT Strategy

2

100

 

 

 

 

Course Relations in Third Normal Form (3NF)
This step ensure that all Non-Key attributes are independent of one another. In this case the only change involves breaking the Subject Relation as the Attribute Maximum Credits depends on the Attrubute Year.

Course Relation

Course Code

Course Title

Level

BS10

Business Studies

BSc

HM10

Hospitality Management

BA

IM20

Information Management

MSc

 

 

 

Course-Module Relation

Course Code

Module Ref

BS10

BS101

BS10

BS102

BS10

BS103

HM10

HM101

HM10

HM102

HM10

HM103

IM20

IM201

IM20

IM202

 

 

Subject Relation

Module Ref     Subject Year
BS101   Accounting     1
BS102   Finance 2
BS103   Costing 3
HM101   Catering       1
HM102   Front of House 2
HM103   Bar Management 3
IM201   Database       1
IM202   IT Strategy    2

Year Relation

Year    Maximum Credits
1       50
2       100
3       150

It is apparent from the Normalisation procedures that Relational Databases, although straightforward in principle, can be cumbersome to store and organise. They involve the creation of large .virtual・ files that, although temporary and transparent to the user, take up considerable storage space on the computer system supporting their operation.

Reading:

Kent, W. (1983). A Simple guide to Five Normal Forms in Relational Database Theory. Communications of the ACM Vol 26, No 2, Feb 1983 pp120-125.