Relational Database Design – Phase 2: Refine
So far, I have posted three Database Design Posts:
You pay for it - are you using it? Microsoft Access pointed out that there are circumstances where Microsoft Access was able to handle the system requirements in a far easier and more professional way than Microsoft Excel.
Making Sense of the Relational Database Structure is an infographic which shows the relational database terms and how they interrelate. If you are unfamiliar with these terms, this is a useful resource to look at as we go along.
Relational Database Design – Phase 1: Rough provided 2 steps to easily brainstorm the initial concept of what you were thinking of for your database.
This post will explain how to turn your rough concept into a Relational Database Structure. The process is known as Database Normalization.
The final post in this series, Relational Database Design – Phase 3: Finalize, will look at setting up the relationships between the tables using foreign keys and will go into more detail about defining the characteristics of each individual field.
Let’s start Relational Database Design – Phase 2: Refine
There are three main reasons to normalize a database:
minimize duplicate data
minimize or avoid data modification issues
simplify queries (a query would be something like “give me a list of all hotel rooms that have two beds.”)
The example I am using throughout is hotel booking information (see my Relational Database Design – Phase 1: Rough showing the steps I used to develop this database). At the end of these posts, you will be able to design a relational database for your own requirements.
So far in my rough design, I have:
Room Number and Guest ID are what is known as Primary Keys and are critical to relational database design. If you are uncertain what a primary key is, please refer to my Relational Database Design – Phase 1: Rough post.
Now we move onto refining our design (database normalization)
Rule 1a) There are no duplicate rows in the table
My very first draft of my database the “Rooms” table could have contained the following information:
*Note: This table doesn’t have the Primary Key “Room Number” for this example
Here we can see that there are 2 identical entries/records for the “Garden View” room. Probably the person entering the information got distracted and entered the room information twice.
This could lead so all sorts of problems such as the one room being booked by a different person on the same date.
To avoid duplicate rows we give each unique item a Primary Key. For the “Rooms” table a “Room Number” could be used.
A Primary Key must be unique within a table. In my example the “Garden View” room might be room 27. By making the “Room Number” a Primary Key the “Garden View” room number 27 could only appear once in the “Rooms” Table.
Rule 1b): Each cell in a table should contain exactly 1 value
For example: A guest makes 2 separate bookings in the same phone call
We could store the information like this:
Guest 10 now has 2 arrival dates.
One of the many problems this would cause is how do we store information that is relevant to each separate booking e.g. 5 guests are booked for the arrival date 14/04/2020 and 3 guests are booked for the arrival data 18/08/2020
A better way to store this information (in First Normal Form) would be:
Rule 1c) Each field should contain only one type of item
This one is pretty basic. e.g. the field “Arrival date” shouldn’t sometimes contain an arrival date and for another person contain the name of the person who booked. e.g.
It just wouldn’t make sense.
Rule 1d) Unique name for Fields/Columns
This rule expects that each column in a table should have a unique name. This is to avoid confusion at the time of retrieving data or performing any other operation on the stored data.
e.g. if the information about when the guests were going to arrive and depart both had the field name “Date” there would be confusion about which date was being referred to. Therefore, we have, “Arrival date” and “Departure date”.
Rule 1e) Order doesn't matter
This rule is just letting you know that the order that the fields/columns appear in the database doesn’t matter. So, for example, if the guest table was structured:
there would be no difference from if the table were structured
It is the field names “Arrival Date” and “Number of Guests” that are important.
Rule 2a) There should be no partial dependency in a table
This is best explained with an example:
At the moment our rough Guest table contains the fields
Name of the person who booked (First Name and Surname)
Number of guests
If there is a person who makes frequent bookings, each time we would enter their Guest ID, First Name and Surname. Their names won’t change for each booking made under their Guest ID. There is no need to enter their names every time. Their names are partially dependent on their Guest ID.
It would be a lot better to have a “Booker” Table (the person who made the booking) and a “Bookings” table. These tables would be:
Rule 3a) Remove transitive dependencies
My spelling is pretty bad especially when I am typing quickly. In the “Rooms” table we have a field “Bath or Shower”. When I am entering the information for all the different rooms I might enter:
If I now run a report to tell me the number of rooms that have a “Shower” I would only get 1 room since the other 2 rooms don’t match the search word “Shower”.
Fortunately, we typically use a form when entering data. Our form for the “Rooms” table might look like:
On the form I only have the option of selecting a Bath or a Shower. This information is stored in the “BathShower” table I have created.
Using the “BathShower” table and a List Box in the form the use can only enter one of these options.
If I now run a report to tell me the number of rooms that have a “Shower” I would get the correct number of rooms since the user doesn’t enter the text where they could misspell it or use only lower case etc.
This tool is used in many situations such as
Title (Mr, Mrs, Ms, Dr, Prof …)
Country (USA, South Africa, Australia, France …)
The Database Normalization process can continue with Boyce and Codd Normal Form (BCNF) and Fourth Normal Form. However, typically Third Normal is sufficient.
Making Sense of the Relational Database Structure is an infographic which shows the relational database terms and how they interrelate. If you are not familiar with these terms, this is a useful resource to look at as we go along.
This is the 2nd post of a 3 post series:
Relational Database Design – Phase 1: Rough provides 2 steps to easily brainstorm the initial concept of what you were thinking of for your database.
This post, Relational Database Design – Phase 2: Refine explains Relational Database Normalization rules and how to achieve them.
The final post in this series, Relational Database Design – Phase 3: Finalize will look at setting up the relationships between the tables using foreign keys and will go into more detail about defining the characteristics of each individual field.
At the end of these posts, you will be able to design your own Relational Database. A good database design allows the straightforward changing of requirements over the life of the database.
Over the years, users have come to me with sheepish looks on their faces and asked if something is possible. I have quickly been able to meet the required to the amazement of the user. This is due to a comprehensive initial database design.