Interview Questions on Database Concepts
(Q)
What is a database or database management system (DBMS)?
(Q)
What is the difference between DBMS and RDBMS?
(DB)
What are COD rules?
(Q)
Is Access database a RDBMS?
(Q)
What is the main difference between ACCESS and SQL SERVER?
(Q)
What is the difference between MSDE and SQL SERVER 2000?
(Q)
What is SQL SERVER Express 2005 Edition?
(DB)
What is SQL Server 2000 Workload Governor?
(Q)
What is the difference between SQL SERVER 2000 and 2005?
(Q)
What are E-R diagrams?
(Q)
How many types of relationships exist in database designing?
(Q)
What is normalization? What are the different type of normalization?
(Q)
What is denormalization?
(DB)
Can you explain Fourth Normal Form?
(DB)
Can you explain Fifth Normal Form?
(DB)
What is the difference between Fourth and Fifth normal form?
(DB)
Have you heard about sixth normal form?
(Q)
What is Extent and Page?
(DB)
What are the different sections in Page?
(Q)
What are page splits?
(Q)
In which files does SQL Server actually store data?
(DB)
Can we have a different collation for database and table?
Chapter 1: Database Concepts
Twist: What is the difference between a file and a database? Can
files qualify as a database?
Note: Probably these questions are too basic for
experienced SQL SERVER guys. But from a fresher’s point of view, it can
be a difference between getting a job and being jobless.
- Database provides a systematic and organized way of storing,
managing and retrieving from a collection of logically related
information.
- Secondly, the information has to be persistent, that means even
after the application is closed the information should be persisted.
- Finally, it should provide an independent way of accessing data and
should not be dependent on the application to access the information.
Ok, let me spend a few more sentences on explaining the third aspect.
Below is a simple figure of a text file that has personal detail
information. The first column of the information is Name
,
second Address
and finally Phone Number
. This
is a simple text file, which was designed by a programmer for a specific
application.
Figure 1.1: Non-Uniform Text File
It works fine in the boundary of the application. Now, some years
down the line a third party application has to be integrated with this
file. In order for the third party application to be integrated
properly, it has the following options:
- Use the interface of the original application.
- Understand the complete details of how the text file is organized,
example the first column is
Name
, then Address
and
finally Phone Number
. After analyzing, write a code which
can read the file, parse it etc. Hmm, lot of work, right.
That’s what the main difference is between a simple file and a
database; database has an independent way (SQL) of accessing information
while simple files do not (That answers my twisted question defined
above). File meets the storing, managing and retrieving part of a
database, but not the independent way of accessing data.
Note: Many experienced programmers think that the
main difference is that file cannot provide multi-user capabilities
which a DBMS provides. But if you look at some old COBOL and C programs
where files were the only means of storing data, you can see
functionalities like locking, multi-user etc. provided very efficiently.
So it’s a matter of debate. If some interviewers think of this as a
main difference between files and database, accept it… going in to
debate means probably losing a job.
(Just a note for fresher’s: Multi-user capabilities mean that at one
moment of time more than one user should be able to add, update, view
and delete data. All DBMS' provides this as in-built functionalities,
but if you are storing information in files, it’s up to the application
to write logic to achieve these functionalities).
As mentioned before, DBMS provides a systematic and organized way of
storing, managing and retrieving from a collection of logically related
information. RDBMS also provides what DBMS provides, but above that, it
provides relationship integrity. So in short, we can say:
RDBMS = DBMS + REFERENTIAL INTEGRITY
For example, in the above Figure 1.1, every person should have an Address
.
This is a referential integrity between Name
and Address
.
If we break this referential integrity in DBMS and files, it will not
complain, but RDBMS will not allow you to save this data if you have
defined the relation integrity between person and addresses. These
relations are defined by using “Foreign Keys” in any RDBMS.
Many DBMS companies claimed that their DBMS product was RDBMS
compliant, but according to industry rules and regulations, if the DBMS
fulfills the twelve CODD rules, it’s truly a RDBMS. Almost all DBMS (SQL
SERVER, ORACLE etc.) fulfill all the twelve CODD rules and are
considered truly as RDBMS.
Note: One of the biggest debates is whether
Microsoft Access is an RDBMS? We will be answering this question in
later section.
Twist: Does SQL SERVER support all the twelve CODD rules?
Note: This question can only be asked on two
conditions when the interviewer is expecting you to be at a DBA job or
you are complete fresher, yes and not to mention the last one he treats
CODD rules as a religion. We will try to answer this question from the
perspective of SQL SERVER.
In 1969, Dr. E. F. Codd laid down 12 rules, which a DBMS should
adhere to in order to get the logo of a true RDBMS.
Rule 1: Information Rule
"All information in a relational database is represented explicitly
at the logical level and in exactly one way - by values in tables."
In SQL SERVER, all data exists in tables and are accessed only by
querying the tables.
Rule 2: Guaranteed Access Rule
"Each and every datum (atomic value) in a relational database is
guaranteed to be logically accessible by resorting to a combination of
table name, primary key value and column name."
In flat files, we have to parse and know the exact location of field
values. But if a DBMS is truly an RDBMS, you can access the value by
specifying the table name, field name, for instance Customers.Fields
[‘Customer Name’]
.
SQL SERVER also satisfies this rule. In ADO.NET we can access field
information using table name and field names.
Rule 3: Systematic Treatment of Null Values
"Null values (distinct from the empty character string or a string of
blank characters and distinct from zero or any other number) are
supported in fully relational DBMS for representing missing information
and inapplicable information in a systematic way, independent of data
type.”
In SQL SERVER, if there is no data existing, NULL
values
are assigned to it. Note NULL
values in SQL SERVER do not
represent spaces, blanks or a zero value; it is a distinct
representation of missing information and thus satisfies rule 3 of CODD.
Rule 4: Dynamic On-line Catalog Based on the Relational Model
"The database description is represented at the logical level in the
same way as ordinary data, so that authorized users can apply the same
relational language to its interrogation as they apply to the regular
data."
The Data Dictionary is held within the RDBMS. Thus, there is no need
for off-line volumes to tell you the structure of the database.
Rule 5: Comprehensive Data Sub-language Rule
"A relational system may support several languages and various modes
of terminal use (for example, the fill-in-the-blanks mode). However,
there must be at least one language whose statements are expressible,
per some well-defined syntax, as character strings and that is
comprehensive in supporting all the following items:
- Data Definition
- View Definition
- Data Manipulation (Interactive and by program)
- Integrity Constraints
- Authorization
- Transaction boundaries ( Begin, commit and rollback)"
SQL SERVER uses SQL to query and manipulate data, which has a
well-defined syntax and is being accepted as an international standard
for RDBMS.
Note: According to this rule, CODD has only
mentioned that some language should be present to support it, but not
necessary that it should be SQL. Before the 80’s, different’s database
vendors were providing their own flavor of syntax until in 1980,
ANSI-SQL came in to standardize this variation between vendors. As
ANSI-SQL is quite limited, every vendor including Microsoft introduced
their additional SQL syntax in addition to the support of ANSI-SQL. You
can see SQL syntax varying from vendor to vendor.
Rule 6: View-updating Rule
"All views that are theoretically updatable are also updatable by the
system."
In SQL SERVER, not only views can be updated by the user, but also by
SQL SERVER itself.
Rule 7: High-level Insert, Update and Delete
"The capability of handling a base relation or a derived relation as a
single operand applies not only to the retrieval of data, but also to
the insertion, update and deletion of data."
SQL SERVER allows you to update views that in turn affect the base
tables.
Rule 8: Physical Data Independence
"Application programs and terminal activities remain logically
unimpaired whenever any changes are made in either storage
representations or access methods."
Any application program (C#, VB.NET, VB6, VC++ etc) does not need to
be aware of where the SQL SERVER is physically stored or what type of
protocol it is using, the database connection string encapsulates
everything.
Rule 9: Logical Data Independence
"Application programs and terminal activities remain logically
unimpaired when information-preserving changes of any kind that
theoretically permit un-impairment are made to the base tables."
Application programs written in C# or VB.NET do not need to know
about any structure changes in SQL SERVER database. Example: adding of
new field etc.
Rule 10: Integrity Independence
"Integrity constraints specific to a particular relational database
must be definable in the relational data sub-language and storable in
the catalog, not in the application programs."
In SQL SERVER, you can specify data types (integer
, nvarchar
,
Boolean
etc.) which put in data type checks in SQL SERVER
rather than through application programs.
Rule 11: Distribution Independence
"A relational DBMS has distribution independence."
SQL SERVER can spread across more than one physical computer and
across several networks; but from application programs, it has not a big
difference but just specifying the SQL SERVER name and the computer on
which it is located.
Rule 12: Non-subversion Rule
"If a relational system has a low-level (single-record-at-a-time)
language, that low level cannot be used to subvert or bypass the
integrity Rules and constraints expressed in the higher level relational
language (multiple-records-at-a-time)."
In SQL SERVER whatever integrity rules are applied on every record
are also applicable when you process a group of records using
application program in any other language (example: C#, VB.NET, J#
etc.).
Readers can see from the above explanation that SQL SERVER satisfies
all the CODD rules, some database gurus consider SQL SERVER as not truly
being an RDBMS, but that’s a matter of debate.
Access fulfills all rules of CODD, so from this point of view, yes
it’s truly an RDBMS. However, many people can contradict it as a large
community of Microsoft professionals think that Access is not an RDBMS.
As mentioned before, Access fulfills all the CODD rules and behaves
as a true RDBMS. But there’s a huge difference from an architecture
perspective, due to which many developers prefer to use SQL SERVER as
the major database rather than Access. Following is the list of
architecture differences between them:
- Access uses file server design and SQL SERVER uses the Client /
Server model. This forms the major difference between SQL SERVER and
ACCESS.
Note: Just to clarify what is client server
and file server I will make a quick description of widely accepted
architectures. There are three types of architectures: - Main frame architecture (This is not related to the above
explanation but just mentioned as it can be useful during an interview
and also for comparing with other architectures)
- File sharing architecture (Followed by ACCESS)
- Client Server architecture (Followed by SQL SERVER).
In Main Frame architecture, all the processing happens on central
host server. User interacts through a dumb terminal that only sends
keystrokes and information to the host. All the main processing happens
on the central host server. So the advantage in such type of
architecture is that you need least configuration clients. But the
disadvantage is that you need a robust central host server like Main
Frames.
In File sharing architecture, which is followed by Access database,
all the data is sent to the client terminal and then processed. For
instance, if you want to see customers who stay in India, in File
Sharing architecture all customer records will be sent to the client PC
regardless whether the customer belongs to India or not. On the client
PC customer records from India are sorted/filtered out and displayed, in
short all processing logic happens on the client PC. Therefore, in this
architecture, the client PC should have heavy configuration and it
increases network traffic as a lot of data is sent to the client PC.
However, the advantage of this architecture is that your server can be
of a low configuration.
Figure 1.2: File Server Architecture of Access
In client server architecture, the above limitation of the file
server architecture is removed. In client server architecture, you have
two entities, client and the database server. File server is now
replaced by database server. Database server takes up the load of
processing any database related activity and the client does any
validation aspect of database. As the work is distributed between the
entities it increases scalability and reliability. Second, the network
traffic also comes down as compared to file server. For example if you
are requesting customers from India, database server will sort/ filter
and send only Indian customer details to the client, thus bringing down
the network traffic tremendously. SQL SERVER follows the client-server
architecture.
Figure 1.3: Client Server Architecture of SQL SERVER
- The second issue comes in terms of reliability. In Access, the
client directly interacts with the Access file, in case there is some
problem in the middle of a transaction, there are chances that an Access
file can get corrupt. But in SQL SERVER, the engine sits in between the
client and the database, so in case of any problems in the middle of a
transaction, it can revert back to its original state.
Note: SQL
SERVER maintains a transaction log by which you can revert back to your
original state in case of any crash. - When your application has to cater to a huge load demand, highly
transactional environment and high concurrency, then its better to go
for SQL SERVER or MSDE.
- But when it comes to cost and support, Access stands better than SQL
SERVER. In case of SQL SERVER, you have to pay for per client license,
but Access runtime is free.
Summarizing: SQL SERVER gains points in terms of
network traffic, reliability and scalability whereas Access gains points
in terms of cost factor.
MSDE is a royalty free, redistributable and cut short version of the
giant SQL SERVER database. It is primarily provided as a low cost option
for developers who need a database server, which can easily be shipped
and installed. It can serve as a good alternative for Microsoft Access
database as it overcomes quite a few problems which Access has.
Below is a complete list, which can give you a good idea of the
differences:
- Size of database: Microsoft Access and MSDE have a
limitation of 2GB while SQL SERVER has 1,048,516 TB1.
- Performance degrades in MSDE 2000 when maximum number of concurrent
operations goes above 8 or is equal to 8. It does not mean that you
cannot have more than eight concurrent operations but the performance
degrades. Eight-connection performance degradation is implemented by
using SQL SERVER 2000 workload governor (we will be looking into more
detail of how it works). As compared to SQL SERVER 2000, you can have
32,767 concurrent connections.
- MSDE does not provide OLAP and Data warehousing capabilities.
- MSDE does not have support facility for SQL mail.
- MSDE 2000 does not have GUI administrative tool such as enterprise
manager, Query analyzer or Profiler. But there are roundabout ways by
which you can manage MSDE 2000:
- Old command line utility OSQL.EXE
- VS.NET IDE Server Explorer: Inside VS.NET IDE, you have a
functionality which can give you a nice GUI administrative tool to
manage IDE.
- SQL SERVER WEB Data administrator installs a web based GUI which you
can use to manage your database.
For any details refer here.
- SQL-DMO objects can be used to build your custom UI
- There are many third party tools, which provide administrative
capability GUI, which is out of scope of the book as it is only meant
for interview questions.
- MSDE does not support Full text search.
Summarizing: There are two major differences: The
first is the size limitation (2 GB) of the database and second is the
concurrent connections (eight concurrent connections) which are limited
by using the workload governor. During an interview, this answer will
suffice if the interviewer is really testing your knowledge.
Twist: What is the difference between SQL SERVER Express 2005 and
MSDE 2000?
Note: Normally comparison is when the product is
migrating from one version to other version. When SQL SERVER 7.0 was
migrating to SQL 2000, asking differences was one of the favorite
questions.
SQL SERVER Express edition is a scaled down version of SQL SERVER
2005 and the next evolution of MSDE.
Listed below are some major differences between them:
- MSDE maximum database size is 2GB while SQL SERVER Express has
around 4GB.
- In terms of programming language support MSDE has only TSQL, but
SQLSERVER Express has TSQL and .NET. In SQL SERVER Express 2005, you can
write your stored procedures using .NET.
- SQL SERVER Express does not have connection limitation, which MSDE
had and was controlled through the workload governor.
- There was no XCOPY support for MSDE, SQL SERVER Express has it.
- DTS is not present in SQL SERVER express while MSDE has it.
- SQL SERVER Express has reporting services while MSDE does not.
- SQL SERVER Express has native XML support and MSDE does not.
Note: Native XML support means now in SQL SERVER 2005:
- You can create a field with data type
XML
. - You can provide SCHEMA to the SQL SERVER fields with
XML
data type. - You can use new XML manipulation techniques like
XQUERY
also called as XML QUERY
.
There is a complete chapter on SQL SERVER XML Support, so till then
this will suffice.
Summarizing: The major difference is the database
size (2 GB and 4 GB), support of .NET support in stored procedures and
native support for XML. This much can convince the interviewer that you
are clear about the differences.
Workload governor limits the performance of SQL SERVER Desktop engine
(MSDE) if the SQL engine receives more load than what is meant for
MSDE. MSDE was always meant for trial purpose and non-critical projects.
Microsoft always wanted companies to buy their full blow version of SQL
SERVER, so in order that they can put limitation on MSDE performance
and number of connections, they introduced Workload governor.
Workload governor sits between the client and the database engine and
counts the number of connections per database instance. If Workload
governor finds that the number of connections exceeds eight connections,
it starts stalling the connections and slowing down the database
engine.
Note: It does not limit the number of connections
but makes the connection request go slow. By default 32,767 connections
are allowed both for SQL SERVER and MSDE. But it just makes the database
engine go slow above eight connections.
Twist: What is the difference between Yukon and SQL SERVER 2000?
Note: This question will be one of the favorites
during SQL SERVER interviews. I have marked the points which should be
mentioned by developers as PG and DBA for Database Administrator.
Following are some major differences between the two versions:
- (PG) The most significant change is the .NET integration with SQL
SERVER 2005. Stored procedures, user-defined functions, triggers,
aggregates, and user-defined types can now be written using your own
favorite .NET language (VB.NET, C#, J# etc.). This support was not there
in SQL SERVER 2000 where the only language was T-SQL. In SQL 2005, you
have support for two languages T-SQL and .NET.
- (PG) SQL SERVER 2005 has reporting services for reports which is a
newly added feature and does not exist for SQL SERVER 2000. It was a
separate installation for SQL Server 2000.
- (PG) SQL SERVER 2005 has introduced two new data types
varbinary
(max)
and XML
. If you remember in SQL SERVER 2000,
we had image and text data types. Problem with image
and text
data types is that they assign the same amount of storage
irrespective of what the actual data size is. This problem is solved
using varbinary (max)
which acts depending on amount of
data. One more new data type is included XML
which enables
you to store XML documents and does schema verification. In SQL SERVER
2000, developers used varchar
or text
data
type and all validation had to be done programmatically. - (PG) SQL SERVER 2005 can now process direct incoming HTTP request
without IIS Web server. In addition, stored procedure invocation is
enabled using the SOAP protocol.
- (PG) Asynchronous mechanism is introduced using server events. In
Server event model the server posts an event to the SQL Broker service,
later the client can come and retrieve the status by querying the
broker.
- For huge databases, SQLSERVER has provided a cool feature called
“Data partitioning”. In data partitioning, you break a single database
object such as a table or an index into multiple pieces. But for the
client application accessing the single database object, “partitioning”
is transparent.
- In SQL SERVER 2000, if you rebuilt clustered indexes even the
non-clustered indexes where rebuilt. But in SQL SERVER 2005 building the
clustered indexes does not build the non-clustered indexes.
- Bulk data uploading in SQL SERVER 2000 was done using BCP (Bulk copy
program’s) format files. Now in SQL SERVER 2005 bulk, data uploading
uses XML file format.
- In SQL SERVER 2000 there were maximum 16 instances, but in 2005 you
can have up to 50 instances.
- SERVER 2005 has support of “Multiple Active Result Sets” also called
as “MARS”. In previous versions of SQL SERVER 2000 in one connection,
you could only have one result set. Now in one SQL connection, you can
query and have multiple results set.
- In previous versions of SQL SERVER 2000, system catalog was stored
in the master database. In SQL SERVER 2005, it’s stored in a resource
database which is stored as sys object. You cannot access the sys object
directly as in the older version we were accessing the master database.
- This is one of the hardware benefits which SQL SERVER 2005 has over
SQSERVER 2000 – support of hyper threading. WINDOWS 2003 supports hyper
threading; SQL SERVER 2005 can take advantage of the feature unlike SQL
SERVER 2000 which did not support hyper threading.
Note: Hyper
threading is a technology developed by INTEL which creates two logical
processors on a single physical hardware processor. - SMO will be used for SQL Server Management.
- AMO (Analysis Management Objects) to manage Analysis Services
servers, data sources, cubes, dimensions, measures, and data mining
models. You can mapm AMO in old SQL SERVER with DSO (Decision Support
Objects).
- Replication is now managed by RMO (Replication Management Objects).
Note:
SMO, AMO and RMO are all using .NET Framework. - SQL SERVER 2005 uses current user execution context to check rights
rather than ownership link chain, which was done in SQL SERVER 2000.
Note:
There is a question on this later see for execution context
questions. - In previous versions of SQL SERVER the schema and the user name was
same, but in current, the schema is separated from the user. Now the
user owns schema.
Note: There are questions on this,
refer “Schema” later.
Note: Ok below are some GUI
changes. - Query analyzer is now replaced by query editor.
- Business Intelligence development studio will be used to create
Business intelligence solutions.
- OSQL and ISQL command line utility is replaced by SQLCMD utility.
- SQL SERVER Enterprise manager is now replaced by SQL SERVER
Management studio.
- SERVER Manager which was running in system tray is now replaced by
SQL Computer manager.
- Database mirror concept is supported in SQL SERVER 2005, which was
not present in SQL SERVER 2000.
- In SQL SERVER 2005 Indexes can be rebuilt online when the database
is in actual production. If you look back in SQL SERVER 2000, you cannot
do insert, update, and delete operations when you are building indexes.
- (PG) Other than Serializable, Repeatable Read, Read Committed, and
Read Uncommitted isolation levels, there is one more new isolation level
“Snapshot Isolation level”.
Note: We will see
“Snapshot Isolation level” in detail in the coming questions.
Summarizing: The major significant difference
between SQL SERVER 2000 and SQL SERVER 2005 is in terms of support of
.NET Integration, Snap shot isolation level, Native XML support,
handling HTTP request, Web service support and Data partitioning. You do
not have to really say all the above points during an interview. A
sweet summary and you will rock.
E-R diagram also termed as Entity-Relationship diagram shows the
relationship between various tables in the database. Example: Tables Customer
and Customer Addresses
have a one to many relationship
(i.e. one customer can have multiple addresses) this can be shown using
the ER diagram. ER diagrams are drawn during the initial stages of a
project to forecast how the database structure will shape up. Below is a
screen shot of a sample ER diagram of “Asset Management” which ships
free with Access.
Figure 1.4: Asset management ER diagram.
There are three major relationship models:
-
One-to-one
Figure 1.5: One-to-One relationship ER diagram
- One-to-many
In this many records in one table correspond to the
one record in another table.
Example: Every one customer can have
multiple sales. So there exist one-to-many relationships between
customer and sales table.
One Asset
can have multiple Maintenance
. So
Asset
entity has one-to-many relationship between them as
the ER model shows below.
Figure 1.6: One-to-Many Relationship ER diagram
- Many-to-many
In this, one record in one table corresponds to many
rows in another table and also vice-versa.
For instance: In a
company, one employee can have many skills like Java , C# etc. and also
one skill can belong to many employees.
Given below is a sample of many-to-many relationship. One employee
can have knowledge of multiple Technology
. So in order to
implement this, we have one more table Employee Technology
which is linked to the primary key of Employee
and Technology
table.
Figure 1.7: Many-to-Many Relationship ER diagram
Note: A regular .NET programmer working on projects
often stumbles on this question, which is but obvious. The bad part is
sometimes the interviewer can take this as a very basic question to be
answered and it can be a turning point for the interview. So let's cram
it.
It is set of rules that have been established to aid in the design of
tables that are meant to be connected through relationships. This set
of rules is known as Normalization.
Benefits of Normalizing your database include:
- Avoiding repetitive entries
- Reducing required storage space
- Preventing the need to restructure existing tables to accommodate
new data
- Increased speed and flexibility of queries, sorts, and summaries
Note: During an interview, people expect to answer a
maximum of three normal forms and that's what is expected practically.
Actually you can normalize database to fifth normal form. But believe
this book, answering three normal forms will put you in a decent shape
during an interview.
The three normal forms as follows:
First Normal Form
For a table to be in first normal form, data must be broken up into
the smallest units possible. In addition to breaking data up into the
smallest meaningful values, tables in first normal form should not
contain repetitions groups of fields.
Figure 1.8: Repeating groups example
In the above example, city1
and city2
are
repeating. In order for these tables to be in First normal form, you
have to modify the table structure as follows. Also note that the
Customer Name is now broken down to first name and last name (First
normal form data should be broken down to the smallest unit).
Figure 1.9: Customer table normalized to first normal
form
Second Normal Form
The second normal form states that each field in a multiple field
primary key table must be directly related to the entire primary key. In
other words, each non-key field should be a fact about all the fields
in the primary key.
In the above table of customer
, city
is not
linked to any primary field.
Figure 1.10: Normalized customer table.
Figure 1.11: City is now shifted to a different master
table.
That takes our database to a second normal form.
Third Normal Form
A non-key field should not depend on another Non-key field. The field
Total
is dependent on Unit price
and qty
.
Figure 1.12: Fill third normal form
So now the Total
field is removed and is the
multiplication of Unit price * Qty
.
Denormalization is the process of putting one fact in numerous places
(it is vice-versa of normalization). Only one valid reason exists for
denormalizing a relational design - to enhance performance. The
sacrifice to performance is that you increase redundancy in a database.
Note: Whenever the interviewer is trying to go above
the third normal form, there can be two reasons, ego or to fail you.
Three normal forms are really enough, practically anything more than
that is an overdose.
In fourth normal form, it should not contain two or more independent
multi-valued facts about an entity and it should satisfy “Third Normal
form”.
So let us try to see what
multi-valued facts are. If there are two or more many-to-many
relationship in one entity and they tend to come to one place, it is
termed as “multi-valued facts”.
Figure 1.13: Multi-valued facts
In the above table, you can see that there are two many-to-many
relationships between Supplier
/ Product
and “Supplier
/ Location
(or in short multi-valued facts). In
order for the above example to satisfy the fourth normal form, both the
many-to-many relationships should go in different tables.
Figure 1.14: Normalized to Fourth Normal form.
Note: UUUHHH if you get this question after joining
the company, do ask him if he himself really uses it?
Fifth normal form deals with reconstructing information from smaller
pieces of information. These smaller pieces of information can be
maintained with less redundancy.
Example: Dealers
sell Product
which can be
manufactured by various Companies
. Dealers
in
order to sell the Product
should be registered with the Company
.
So these three entities have a mutual relationship within them.
Figure 1.15: Not in Fifth Normal Form.
The above table shows some sample data. If you observe closely, a
single record is created using lot of small information. For instance: JM
Associate
can sell sweets under the following two conditions:
JM Associate
should be an authorized dealer of Cadbury
Sweets
should be manufactured by Cadbury
company
These two smaller bits of information form one record of the above
given table. So in order for the above information to be “Fifth Normal
Form” all the smaller information should be in three different places.
Below is the complete fifth normal form of the database.
Figure 1.16: Complete Fifth Normal Form
Note: There is a huge similarity between Fourth and
Fifth normal form, i.e. they address the problem of “Multi-Valued
facts”.
“Fifth normal form” multi-valued facts are interlinked and “Fourth
normal form” values are independent. For instance in the above two
questions Supplier
/Product
and Supplier
/Location
are not linked. While in fifth form, the Dealer
/Product
/Companies
are completely linked.
Note: Arrrrggghhh yes there exists a sixth normal
form also. But note guys you can skip this statement. Just in case you
want to impress the interviewer...
If you want a relational system in conjunction with time, you use
sixth normal form. At this moment SQL Server does not support it
directly.
Twist: What is the relationship between Extent and Page?
Extent is a basic unit of storage to provide space for tables. Every
extent has a number of data pages. As new records are inserted new data,
pages are allocated. There are eight data pages in an extent. So as
soon as the eight pages are consumed, it allocates a new extent with
data pages.
While extent is basic unit storage from a database point of view,
page is a unit of allocation within extent.
Page has three important sections:
- Page header
- Actual data i.e. Data row
- Row pointers or Row offset
Page header has information like timestamp, next page number,
previous page number etc.
Data rows are where your actual row data is stored. For every data
row, there is a row offset which points to that data row.
Figure 1.17: General view of a Extent
Pages are contained in extent. Every extent will have around eight
data pages. But all the eight data pages are not created at once; they
are created depending on data demand. So when a page becomes full it
creates a new page, this process is called as “Page Split”.
Any SQL Server database is associated with two kinds of files: *.mdf
and *.ldf. *.mdf files are actual physical database
files where your data is stored finally. *.ldf (LOG) files are
actually data, which is recorded from the last time data was committed
in the database.
Figure 1.18: MDF and LDF files.
(Q) What is Collation in SQL Server?
Collation refers to a set of rules that determine how data is sorted
and compared. Character data is sorted using rules that define the
correct character sequence, with options for specifying
case-sensitivity, accent marks, kana character types, and character
width.
Figure 1.19: Collation according to language
Note: Different languages will have different sort
orders.
Case Sensitivity
If A
and a
, B
and b
,
etc. are treated in the same way, then it is case-insensitive. A
computer treats A
and a
differently because it
uses ASCII code to differentiate the input. The ASCII value of A
is 65, while a
is 97. The ASCII value of B
is
66 and b
is 98.
Accent Sensitivity
If a
and A
, o
and O
are treated in the same way, then it is accent-insensitive. A computer
treats a
and A
differently because it uses
ASCII code for differentiating the input. The ASCII value of a
is 97 and A
225. The ASCII value of o
is 111
and O
is 243.
Kana Sensitivity
When Japanese kana characters Hiragana and Katakana are treated
differently, it is called Kana sensitive.
Width Sensitivity
When a single-byte character (half-width) and the same character when
represented as a double-byte character (full-width) are treated
differently then it is width sensitive.
Yes, you can specify different collation sequence for both the
entities differently