Introduction
Before you start your work and get your hands dirty in database designing, it is very important to gather the complete requirements of the system. Because when you’re developing something from the start, it is easy to design it according to the system requirement. But it would be difficult to change the things in the existing system.
So if you’ve already made some of your database parts and then you notice that something is remaining which it was not clear before then would be so difficult to change the things in the existing system. There are 2 information sources out there.
- People (interviews)
- Documentation (Problem Statement)
So, different stakeholders are part of the People category. Managers, requirement engineer gather the requirements from the customers by conducting meetings, interviews, and many approaches out there to gather the live requirement of the system. And documentation is the written requirements bundle and here we conclude all the things which we need to implement.
Data and Fact Types
People who design databases sometimes forget what all is about. Data Model (Database Design) is not all about in entity types, attributes, relationships, tables, columns but all of these are just tools. For the end-users, the content of any database is a collection of facts. I’ve found a few things we need to do to create a good data model and it would be a lot of easier if we occasionally return to the level of those facts.
Data Model determines 2 things
- What type of facts (data) database can store?
- What are the constraints we need to apply to those facts?
What Type of Facts to Store?
Finding the right set of facts to store is very important. If the database is unable to store information that the end-user needs to do their work, the application is worthless. End-user either will not use it or start using creative ways to work around the limitations. So we need to make sure about the data and what data type we can use to store this data in our database because data is the most prior thing. If we’re successful to store the data into the database then obviously we’ve created the tables and now we can make them normalize or we can apply different relationships upon the tables very easily.
Constraints
Constraints are also important. Constraints are the rules or restrictions which we apply on the data like
- Valid data (Data Pattern Mach or Not)
- Valid Range of Data
- Duplication
- Consistent with other data
How to create a Data Model?
First of all, we need to make our problem statement or mission statement more clear. Most of the time it is just general, contains a broad overview. It doesn’t have a detailed discussion on the things. This document usually is not complete.
The 2nd source of information is the interviews. Most Data Modelers schedules interviews with Management and with Designated Subject Matter Experts either to get the general overview of the application to build or to clarify the questions that come from processing earlier interviews or the mission statements. The benefit of these interviews is the management and subject level experts, these people are usually comfortable discussing data design at the abstraction layer of the data model. And the downside is they are too far away from the everyday reality to really know exactly what is happening on the workflow.
3rd source of information is the end-user. The problem with the end-user is that they are not always able to think and discuss the abstraction layer of a data model. So, we just ask the simple questions which they can understand very easily. And here we use very simple examples for the better understandability of the things for the end-user.
And after clear requirement gathering, we create our data model. And there is no additional information needed to start. And then we develop the design of the database and start developing it.
Analyzing the Requirement Document
When you’ve to create a data model for a new application, there often already is a documented outline of how the new application will look. Those documents can have different names and here we’re using the name ‘Requirement Document or Mission Statement’ for that document.
To read a mission statement (requirement document)
- Use Highlighters of Various Colors to assign each sentence or even part of the sentence in the document for data modeling purposes.
- Fact Types (Green)
- Constraints (Red)
- Concrete Examples (Blue)
- Other (Yellow)
And if your role is more than the data modeler than obviously, you may have more things to mention in the document and you can use different colors for it.
- For better understandability of the requirement document, we use concrete examples when needed. And we verify these examples from the Subject Matter Experts if the text is giving the right interpretation of the text or not.
- In the first reading just read the complete document thoroughly from the start to end. And in the 2nd turn, use highlighters to highlight the important points.
So, this is the approach to how we read our requirement document.
If there is no required document?
Now, might be there is no required document given to you then you might think in that case how will you make your requirements complete. In that case, here we have got the tool of the mindmap. Take a look at my article to know
how to start the project.
Interviews
Interviews are an essential part of data modeling unless you’re creating a data model for purely personal use. You conduct the interviews and you can use the
mind mapping approach to make your requirement complete.
You’ve to verify with the people about your understanding of the system. Use concrete and simple examples while you’re talking with the end-user to get the requirements because he can’t understand the logical terms and things of your level. Assumptions are always dangerous. The things which are not clear to you in interviews, you can ask from subject matter experts. This approach can save your lot of time and effort.
Use their jargon, notation forms, language during communication. Avoid our jargon like entity types, relationships, attributes, cardinality, constraints, etc.
Creating the Initial ER Model
So here we’ll create the first version of the Entity-Relationship Model. So we’re ready to create an Entity Relationship Diagram (ERD).
- We first classify similar facts and different kinds of facts and finalize different fact types.
- Then we’ll use the collection of fact types to identify what entity types we need in the diagram.
- Each of those entity types will have an identifying fact type that may or may not already be in your collection.
- Draw the ERD
- We first put the entity types in the diagram.
- We add their attributes.
- And their relationships.
- Some fact types are not represented as normal entity types, attributes, or relationships. To represent them, we need to introduce extra Artificial Entity Types.
Generalizing Fact Types
We’ve discussed how to gather data from various information sources. You should now know exactly what facts need to be stored and you should have a lot of information about constraints that tell you what combination of the facts is invalid. Ideally, you should also have examples to illustrate how facts are represented by the end-user. And some examples of invalid fact combinations to illustrate the constraints.
Before we can use this information to draw the first data model, we must move away from the level of individual concrete facts to the more abstract level of collections of similar facts called fact types. So instead of discussing the individual facts,
So we can describe all members of this class of facts with templates. And in this template, members and dates are placeholders. Indicates the instances of these fact types’ occurrences of the collection of all members and from the collection of all dates should be inserted at these places.
Now, let’s take a look on another example,
So, this is the wrong template, because the fact doesn’t really give the information about Member or Date facts. But it carries information about a specific match and in this match, we use members to identify which match. So the correct template would be,
Templates can have any number of placeholders.
- Now, we see where the duplicates are allowed in these templates. Take a look at these examples and think in the sense of database.
This picture is clearing everything where the duplicate is allowed and where it is not allowed in the sense of a single record. Don’t confuse in the 2nd and 3rd template. The 2nd template is describing the information about the table if it is played for a match or not and in the 3rd template statement members come and sign up and there might be many tournaments.
Finding Entity Types
Now, we’ve found all the fact types and they will be represented in the Data Model.
We’ve seen the ERD model support 2 ways to represent fact types.
- As an attribute of Entity Types
- Relationship Between Entity Types
Whatever the representation they always connect to the entity types. We can’t diagram fact types without first having the entity types in the diagram. So we now need to find what entity types to add. So that we look at our collection of fact types because fact instances give information on the entity occurrences. So those occurrences are bound to be somewhere in the facts.
Now, you might be thinking how we identity Member fact type is an identity and Date is not? So if the fact is occurring again and again in many templates, it is our entity like.
- Member was born in Date.
- The match is played between Member on Date.
- Member has EmailAddress.
So this is a good indication that members would probably be an entity type. But this is not always the case, take an example.
- Member was born on Date.
- The match is played between members on the Date.
- The subscription is valid for the due date.
- The date is not an entity type here.
Make it simple
Now if we make it simple, read the requirement document thoroughly and conclude the entities from the requirement document. An entity can be any object which contains any information. Like,
The date is a field, it is the information of an object. And Member itself is an object containing much information. So our entity is Member. And in the database, each table represents an entity.
This is how we conclude the entities from the requirement document and the remaining part defines the attributes of that object/entity, what does the value these attributes can contain and it will become our constraints on that attribute. And we conclude the relationship in the same way from our requirement document file.
Once all the entity types we display in the diagram, we can start adding representation for the fact type. Remember there are various ways to represent fact types in the ERD. We can represent fact types as entities, attributes or we can identify them as relationships.
Now, suppose we add one more attribute.
Now, let’s take a look at this example.
Now look here we’ve 2 entities and we know that 1 league can contain many members for participation and one member can participate in multiple leagues. So here is the relationship is many to many. If you want to learn how to identify the relationship and make the ERD of the entities here we’ve some video tutorials where you can easily understand how to make ERD of any object.
Sometimes when we have the collection of fact types, we use some of the fact types as normal fact types and some fact types we use for the attributes and for the relationship. Let’s suppose we’ve multiple records for one member.
In such kind of cases, we need to handle this by creating one more entity Reservation. When we’re working with the database, sometimes, to clear the relationship of 2 entities, we create a 3
rd table and we place the repeating records in that 3
rd entity. In most of the cases, when the relationship is Many-to-Many, the third table is automatically created if we don’t create it manually. But if we create the 3
rd table our own than we can assign the data to it which is repeating again and again for a specific record.
Relational Database Design
The Entity-Relationship model is great for the design process. It enables the user to make a quick overview of the complete model. However, it is not very much useful when we’re developing the database. The elements in the ER Model do map relatively close to the elements in the relational database but the mapping is not exactly on one. We can’t directly go inside the Relational database design, ERD is the best tool for quick overview inside of the data model. That’s why we should start with ERD first. And then we convert ERD to Relational.
Sometimes we convert back from the relational model to the ER model, it is because it provides us a great understanding of the underlying data model.
Representation of Relational Design
There are many ways to represent the Relational Design of the database.
It is easy to create and take very little time but the downside is the lack of details in this approach.
Here, the underlying attributes are the keys and here in the compact form, we use dashed underlying keys as Foreign Keys, and sometimes it is Alternate keys.
The next approach is the Data Definition Language (actual SQL statements that can be executed to create the tables). It can contain all the details but it is very verbose and doesn’t work for any non-technical user.
Many programs allow you to easily create representations of the relational model, that looks like this.
Table relationships are lines connecting the tables with a line. And details for keys, optionality, and datatypes are defined in this way.
We can use this notation as well to represent the relational model of the database.
It looks very easy to understand, we can show the constraints here as well. Arrow lines are primary keys and dashed arrows are the alternate keys. We can define the data type and we can show the foreign keys to define the relationship between the tables. We can add the demo content here for the great understandability of the user as well.
The first step of conversion from ERD to the relational database design is incredibly simple. For each entity type, we create a separate table. How our relationship in the ER diagram is represented in the relational design depends upon the maximum cardinality of the relationship. And again, here we see how to identify the relationship between these tables and identify the cardinality of the tables.
When we’re applying a many-to-many relationship, then we create an extra table, and sometimes it is referred to as.
- Junction Table
- Linking Table
- Cross-Reference Table
- Join Table
When you encounter these names anywhere, keep in mind that they only tell you something about the ER diagram underlying the relational design.
Look at this diagram to get an idea of how we convert from ERD to relational database design.
As you can see, here, we’ve defined the type of data that we can input in the fields. DayOfWeek can contain Mon, Tue, Wed, etc. The round contains R1 (Round 1), R2, QF (Quarter-Final), SF (Semi-Final), Fn (Final). And in Matches entity if it Matches are played then, of course, there will be 2 frames otherwise both should be empty. BlockedDates table is there because of the many-to-many relationship between the tables. That’s why it creates the 3
rd table.