Database and SQL Server FAQ: Part I

Article

Introduction

In this article, we will go through the most basic and frequently asked questions on SQL Server. It is written as a series of potential questions that you may encounter during a job interview. A great candidate can be confused by simple variations in SQL Server questions, so this FAQ is the first in a series that I will be writing and hope you will find helpful.

Interview Question on Database Concepts

(Q) What is a database or database management system (DBMS)?

(Q) What is the difference between DBMS and RDBMS?

(Q) What are COD rules?

(Q) Is an Access database an RDBMS?

(Q) What is the main difference between Access and SQL Server?

(Q) What is the difference between MSDE and SQL Server 2000?

(Q) What is SQL Server Express 2005 Edition?

(Q) What is SQL Server 2000 Workload Governor?

(Q) What is the difference between SQL Server 2000 and 2005?

(Q) What are E-R diagrams?

(Q) How many types of relationships are available in database designing?

(Q) What is normalization? What are different type of normalization?

(Q) What is denormalization? Can you explain Fourth Normal Form? Can you explain Fifth Normal Form?

(Q) What is the difference between Fourth and Fifth Normal Form?

(Q) Have you heard about Sixth Normal Form?

(Q) What is Extent and Page?

(Q) What are the different sections in Page?

(Q) What are page splits?

(Q) In which files does actually SQL Server store data?

(Q) Can we have a different collation for database and table?

(Q) What is database or database management systems (DBMS)?

Twist: What is the difference between a (flat) file and a database? Can files qualify as a database?

Note: Though very basic to an experienced SQL Server guru, it can be the difference in a less experienced person getting the job.

A database provides a systematic and organized way of storing, managing and retrieving from a collection of logically related information.

Secondly, the information has to be persistent, which means even after the application is closed the information should be stored in some way.

Finally, it should provide an independent way of accessing data and should not be dependent on the application to access the information.

Permit me to spend a few sentences more explaining the third aspect. Below is a simple figure of a text file that has detailed personal information. The first column of the information is Name; Second, address; and finally, the phone number. This is a simple text file, which was designed by a programmer for a specific application.

Figure 1.1: Non-Uniform Text File

It works fine in the boundary of the application; however some years later, a third party application has to be integrated with this file, so in order for the third party application to integrate properly, it must use one of the following options:

Use interface of the original application.
Understand the text file structure (i.e. column layout) and then write code that can parse the file.

That is the main difference between a simple file and database. The database has an independent way (SQL) of accessing information, while simple files do not. A file meets the storing, managing and retrieving part of a database, but not the independent way of accessing the data.

Note: Many experienced programmers think that the main difference is that a file cannot provide multi-user capabilities, which a DBMS provides. But if you look at some old COBOL and C programs where the file was the only means of storing data, you can see functionalities like locking and multi-user access handled very efficiently. It is a matter of debate whether interviewers will agree that this is the main difference between files and databases, although going into a debate with the interviewer probably will mean losing the job.

Just a note for the less experienced, multi-user capability/access means the ability for multiple users to simultaneously add, update, view and delete data. All DBMS provides this as built-in functionality; however, if you are storing information in files, it is up to the application to manage the access.

(Q) What is difference between DBMS and RDBMS?

DBMS provides a systematic and organized way of storing, managing and retrieving from collection of logically related information. RDBMS provides this and additionally provides relationship integrity. So in short, we can say

RDBMS = DBMS + REFERENTIAL INTEGRITY

Example in above figure 1.1, every person should have an address. This is referential integrity between "Name" and "Address". If we break this referential integrity in DBMS or in flat files, there will be no error message, but an RDBMS will not allow you to save this data if you have defined the relation integrity between person and addresses. These relations are defined by using "Foreign Keys" in any RDBMS.

Many DBMS companies claimed their DBMS product was RDBMS compliant, but according to industry rules and regulations, the DBMS must fulfill the twelve CODD rules to truly be an RDBMS. Almost all DBMSs (SQL SERVER, ORACLE etc) fulfill all the twelve CODD rules and are considered RDBMSs.

Note: One of the biggest debates, is Microsoft Access an RDBMS? We will be answering this question in a later section.

(Q) What are CODD rules?

Twist: Does SQL SERVER support all twelve CODD rules?

Note: This question can only be asked on three conditions: the interviewer is looking for a DBA, the interviewee is known to be less experienced or the interviewer treats CODD rules as a religion. We will try to answer this question from perspective of SQL Server.

In 1969, Dr. E. F. Codd wrote twelve rules to which a DBMS should adhere in order to be considered a true RDBMS.

Rule 1: Information Rule.

"All information in a relational data base is represented explicitly at the logical level and in exactly one way by values in tables." In SQL Server all data exists in tables and are accessed only by querying the tables.

Rule 2: Guaranteed access Rule.

"Each and every datum (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name."

In flat files, we have to parse and know the exact location of field values. But if a DBMS is truly RDBMS, you can access the value by specifying the table name and field name, for instance Customers.Fields ['Customer Name']

SQL Server also satisfies this rule in ADO.NET where we can access field information using table names and field names.

Rule 3: Systematic treatment of null values.

"Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type."

In SQL Server if there is no data existing NULL values are assigned to it. Note NULL values in SQL Server do not represent spaces, blanks or a zero value; it is a distinct representation of missing information and thus satisfying Rule 3 of CODD.

Rule 4: Dynamic online catalog based on the relational model.

"The data base description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data."

The Data Dictionary is held within the RDBMS. Thus, there is no need for off line volumes to tell you the structure of the database.

Rule 5: Comprehensive Data Sub Language Rule.

"A relational system may support several languages and various modes of terminal use (for example, the fill in the blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all the following items:

Data Definition
View Definition
Data Manipulation (Interactive and by program)
Integrity Constraints
Authorization
Transaction boundaries (Begin, commit and rollback)

SQL Server uses SQL to query and manipulate data, which has well defined syntax and is being accepted as an international standard for RDBMS.

Note: According to this rule CODD has only mentioned that some language should be present to support it, but not necessary that it should be SQL. Before the 80's, different database vendors where providing there own flavor of syntaxes until in 1980 when ANSI SQL became standardized between vendors. As ANSI SQL is quite limited, every vendor including Microsoft introduced there additional SQL syntaxes to supplement ANSI SQL. You can see varying SQL syntaxes from vendor to vendor.

Rule 6: View-updating Rule

"All views that are theoretically updatable are also updatable by the system."

In SQL Server, views can be updated by user and by SQL Server, itself.

Rule 7: High-level insert, update and delete.

"The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update and deletion of data."

SQL Server allows you to update views that in turn affect the base tables.

Rule 8: Physical data independence.

"Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods."

Any application program (C#, VB.NET, VB6, VC++ etc) does not need to be aware of where the SQL Server is physically stored or what type of protocol it is using; the database connection string encapsulates everything.

Rule 9: Logical data independence.

"Application programs and terminal activities remain logically unimpaired when information preserving changes of any kind that theoretically permit unimpairment are made to the base tables."

Application programs written in C# or VB.NET do not need to know about any structure changes in SQL Server database, for example inserting a new field, etc.

Rule 10: Integrity independence.

"Integrity constraints specific to a particular relational data base must be definable in the relational data sub language and storable in the catalog, not in the application programs."

In SQL Server, you can specify data types (integer, nvarchar, Boolean etc), which puts in data type checks in SQL Server rather than through application programs.

Rule 11: Distribution independence.

"A relational DBMS has distribution independence."

SQL Server can spread across more than one physical computer and across several networks; but in application programs you simply specify the SQL Server name and the computer on which it is located.

Rule 12: Non-subversion Rule.

"If a relational system has a low level (single record at a time) language, that low level cannot be used to subvert or bypass the integrity Rules and constraints expressed in the higher level relational language (multiple records at a time)."

In SQL Server whatever integrity rules are applied on every record are also applicable when you process a group of records using application program in any language (example: C#, VB.NET, J# etc...).

CODD Summary

Readers can see from the above explanation that SQL Server satisfies all the CODD rules. Some database gurus consider SQL Server to not truly be an RDBMS, but that is a matter of debate.

(Q) Is Access database a RDBMS?

Access fulfills all rules of CODD, so from this point of view, yes, it is truly RDBMS. However, be aware that a large community of Microsoft Professionals thinks that Access is not.

(Q) What is the main difference between Access and SQL Server?

Access fulfills all the CODD rules and behaves as a true RDBMS. But there is a huge difference from an architectural perspective, which explains why many developers prefer to use SQL Server as major database. Following is a list of architectural differences between them:

Access uses file server design and SQL Server uses the Client / Server model. This is the major difference between them.

Note: Just to clarify what is client server and file server, here is a quick description of widely accepted architectures.

There are three types of architecture:

Mainframe architecture (This is not related to the above explanation but just mentioned it as it can be useful during interview and also for comparing with other architectures)
File sharing architecture (Access).
Client / Server architecture (SQL Server).

In Mainframe architecture, all the processing happens on a central host server. The user interacts through a dumb terminal that sends keystrokes and information to the host. All the main processing happens on the central host server. An advantage of this type of architecture is that you need minimally configured clients with little processing power. But the disadvantage is that you need a robust central host server like a mainframe.

In File sharing architecture, which is used by Access, all the data is sent to the client terminal and then processed. For instance if you want to see customers who stay in India, in File Sharing architecture all customer records will be sent to the client PC regardless of whether the customer belongs to India or not. On the client PC, customer records from India are sorted/filtered out and displayed, so all the processing logic happens on the client PC. In this architecture the client PC requires a significantly powerful configuration and network traffic is considerable as the entire data set is sent to the client PC. An advantage of this architecture is that your server can be a very basic configuration.

Figure 1.2: File Server Architecture of Access

In client/server architecture the above limitation of the file server architecture is removed. In Client server architecture, you have two entities: client and database server. The file server is now replaced by a database server. The database server takes up the load of processing any database queries and the client, any validation aspect of database. As the work is distributed between the entities, it increases scalability and reliability. Second, the network traffic also comes down as compared to file server. For example, if you are requesting customers from India, the database server will sort/ filter and send only Indian customer details to the client, thus reducing the network traffic tremendously. SQL Server follows the client-server architecture.

Figure 1.3: Client Server Architecture of SQL SERVER

Second issue comes in terms of reliability. In Access the client directly interacts with the access file; in the event of some problem mid-transaction, there is a chance that the Access file can become corrupt. In SQL Server the engine sits in between the client and the database, so in case of any problems in middle of the transaction it can revert back to its original state.

Note: SQL Server maintains a transaction log by which you can revert back to your original state in case of any crash.
When your application has to cater to huge load demand, highly transactional environment and high concurrency, then it is better to use SQL Server or MSDE.
When it comes to cost and support, the Access is better than SQL Server. In case of SQL Server, you have to pay for per client license, but Access runtime is free.

Summarizing: SQL Server gains points in terms of network traffic, reliability and scalability; in contrast, Access gains points in terms of cost.

(Q) What is the difference between MSDE and SQL Server 2000?

MSDE is a royalty free, redistributable and cut short version of the giant SQL Server database. It is primarily provided as a low cost option for developers who need database server, which can easily be shipped and installed. It can serve as good alternative for Microsoft Access database, as MSDE overcomes many of the concerns with Access.

Below is a complete list of differences:

Size of database: MS Access and MSDE have a limitation of 2GB, while SQL SERVER has 1,048,516 TB1.
Performance degrades in MSDE 2000 when maximum number of concurrent operations is greater than or equal to eight. It does not mean that you cannot have more than eight concurrent operations, but the performance degrades. The eight connection performance degradation is caused by a Workload Governor that is discussed later. In SQL Server 2000 you can have 32,767 concurrent connections.
MSDE does not provide OLAP and Data warehousing capabilities.
MSDE does not have support facility for SQL mail.
MSDE does not support full text search.
MSDE 2000 does not have GUI administrative tool such as enterprise manager, query analyzer or Profiler. But there are round about ways by which you can manage MSDE 2000:

Old command line utility OSQL.EXE.

VS.NET IDE Server Explorer: Inside VS.NET IDE you have a functionality which can give you a nice GUI administrative tool to manage IDE

SQL Server Web Data Administrator installs a web based GUI which you can use to manage your database.
For any details refer to http://www.microsoft.com/downloads/details.aspx?familyid=C039A798-C57A-419E-ACBC-2A332CB7F959&displaylang=en

SQL DMO objects can be used to build your custom UI.

There are also many third party tools that provide administrative capability GUI.

Summarizing: There are two major differences: the size limitation (2 GB) of database and the concurrent connections (eight concurrent connections) which are limited by using the work load governor. During interview this answer will suffice if he is really testing your knowledge.

(Q) What is SQL Server Express 2005 Edition?

Twist: What is difference between SQL Server Express 2005 and MSDE 2000?

Note: Typical comparison is when the product is migrating from one version to the other version. When SQL Server 7.0 was migrating to SQL 2000, asking differences was one of the favorite questions.

SQL Server Express edition is a scaled down version of SQL Server 2005 and the next evolution of MSDE 2000.

Below listed are some major differences between them:

MSDE maximum database size is 2GB while SQL Server Express has around 4GB.
In terms of programming language support MSDE has only TSQL, but SQL Server Express has TSQL and .NET. In SQL Server Express 2005, you can write your stored procedures using .NET.
SQL Server Express does not have the connection limitation MSDE had.
There was no XCOPY support in MSDE; SQL SERVER Express has it.
DTS is not present in SQL Server express; MSDE has it.
SQL Server Express has reporting services; MSDE does not.
SQL Server Express has native XML support; MSDE does not.
You can create a field with data type XML.
You can provide Schema to the SQL Server fields with XML data type.
You can use the new XML manipulation techniques like "XQUERY" also called "XML QUERY".
There is a complete chapter on SQL Server XML Support so until then this will suffice.

Summarizing: The major difference is database size (2 GB vs. 4 GB), support of .NET support in stored procedures and native support for XML. This should convince the interviewer that you are clear about the differences.

(Q) What is SQL Server 2000 Workload Governor?

Workload governor limits performance of the SQL Server Desktop engine (MSDE) if the SQL engine receives more load than what is meant for MSDE. MSDE was always meant for trial purpose and non-critical projects. Microsoft always planned for companies to buy the full version of SQL Server so they introduced this limitation on MSDE performance and the number of connections through the Workload Governor.

The workload governor sits between the client and the database engine and counts the number of connections per database instance. If the workload governor finds that the number of connections exceeds eight connections, it starts stalling the connections and slowing down the database engine.

Note: It does not limit the number of connections but makes the connection request go slow. By default, 32,767 connections are allowed both for SQL Server and MSDE. But it just makes the database engine go slow above eight connections.

(Q) What is the difference between SQL Server 2000 and 2005?

Twist: What is the difference between Yukon and SQL Server2000?

Note: This question will be one of the favorites during SQL Server Interviews. I have marked the points for developers as PG and DBA for Database Administrator.

Following are some major differences between the two versions:

The most significant change is the .NET integration with SQL Server 2005. Stored procedures, User-defined functions, triggers, aggregates, and user defined types can now be written using your own favorite .NET language (VB.NET, C#, J# etc .). This support was not there in SQL Server 2000 where the only language was T-SQL. In SQL Server 2005, you have support for two languages T-SQL and .NET.
SQL Server 2005 has reporting services for reports which is a newly added feature and does not exist for SQL Server 2000. It was a separate installation for SQL Server 2000.
(PG) SQL Server 2005 has introduced two new data types: varbinary (max) and XML. If you remember in SQL Server 2000, we had image and text data types. The problem with image and text data types is that they assign the same amount of storage irrespective of the actual data size. This problem is solved using varbinary (max) which adjusts depending on size of the data. The other new data type, XML, enables you to store XML documents and perform schema verification. In SQL Server 2000, developers used varchar or text data type, and all validation had to be done programmatically.
SQL Server 2005 can now process direct incoming HTTP request with out IIS web server. Also, stored procedure invocation is enabled using the SOAP protocol.
Asynchronous mechanism is introduced using Server events. In the Server event model the Server posts an event to the SQL Broker service; while waiting, the client can request the status by querying the broker.
For huge databases, SQL Server has provided a cool feature called Data Partitioning. In data partitioning you break a single database object such as a table or index into multiple pieces. For the client application, the data continues to appear as a single database object (i.e. the partitions are transparent).
In SQL Server 2000 if you rebuilt clustered indexes, even the non-clustered indexes where rebuilt. But in SQL Server 2005 building the clustered indexes does not re-build the non-clustered indexes.
Bulk data uploading in SQL Server 2000 was done using BCP (Bulk copy programs) format files. Now in SQL Server 2005 bulk, data uploading uses XML file format.
In SQL Server 2000 there where maximum 16 instances, but in 2005 you can have up to 50 instances.
Server 2005 has support of "Multiple Active Result Sets" also called "MARS". In previous versions of SQL Server 2000, one connection can only have one result set. With 2005 one SQL connection can query and have multiple results set.
In previous versions of SQL Server 2000, system catalog was stored in master database. In SQL Server 2005 it is stored in a resource database that is stored as a sys object; you cannot access the sys object directly as in the older version when we where accessing the master database.
There is one hardware benefit held by SQL Server 2005 called hyper threading. Windows 2003 supports hyper threading and SQL Server 2005 can take advantage of it.

Note: Hyper threading is a technology developed by INTEL which creates two logical processors on a single physical hardware processor.
SMO will be used for SQL Server Management.
AMO (Analysis Management Objects) is used to manage Analysis Services servers, data sources, cubes, dimensions, measures, and data mining models. You can map AMO in old SQL Server with DSO (Decision Support Objects).
Replication is now managed by RMO (Replication Management Objects).

Note: SMO, AMO and RMO are all using .NET Framework.
SQL Server 2005 uses current user execution context to check rights rather than ownership link chain, which was done in SQL Server 2000.

Note: There is a question on this later; see Execution Context questions.
In previous versions of SQL Server the schema and the user name was the same, but in current, the schema is separated from the user. Now the user owns the schema.

Note: There are questions on this; see Schema.
Query analyzer is now replaced by query editor.
Business Intelligence development studio will be used to create Business Intelligence solutions.
OSQL and ISQL command line utility is replaced by SQLCMD utility.
SQL Server Enterprise manager is now replaced by SQL Server Management studio.
Server Manager which was running in system tray is now replaced by SQL Computer Manager.
Database mirror is new to SQL Server 2005.
In SQL Server 2005 Indexes can be rebuild online while the database is in production. Previously you could not perform insert, update, and delete operations when you are building indexes.
In addition to Serializable, Repeatable Read, Read Committed, and Read Uncommitted isolation level, there is one new isolation level called "Snapshot Isolation level".

Note: We will see "Snapshot Isolation level" in detail in coming questions.

Summarizing: The major significant difference between SQL Server 2000 and SQL Server 2005 is in terms of support of .NET Integration, Snapshot isolation level, Native XML support, handling HTTP request, Web service support and Data partitioning. You do not have to really give all the above points during an interview; provide a sweet summary, and you will rock.

(Q) What are E-R diagrams?

E-R diagram, also termed as Entity Relationship diagram, shows relationship between various tables in the database. Example: Tables "Customer" and "Customer Addresses" have a one to many relationships (i.e. one customer can have multiple addresses) this can be shown using the ER diagram. ER diagrams are drawn during the initial stages of project to forecast how the database structure will shape up. Below is a screen shot of a sample ER diagram of "Asset Management" which ships free with Access.

Figure 1.4: Asset management ER diagram.

(Q) How many types of relationship exist in database designing?

There are three major relationship models:

One to one: For every one record in the Asset table, there is only one corresponding record in the Status table. Note the arrow on the Status side of the relationship implying that for every asset there is one status.

Figure 1.5 : One to One relationship ER diagram
One to many: Many records in the Maintenance table corresponds to only one record in the Assets table. Example: Every one customer can have multiple sales. So there exists one customer to many sales relationships between customer and sales table. Note the Infinity symbol on the Maintenance side of the relationship.

Figure 1.6: One to Many Relationship ER diagram
Many to many: In this, one record in one table corresponds to many rows in other table and also vice versa. For instance, in a company one employee can have many skills like java , c# etc and also one skill can belong to many employees.

Given below is a sample of many to many relationship. One employee can have knowledge of multiple "Technology". So in order to implement this we have one more table "Employee Technology", sometimes called a Join table, which is linked to the primary key of "Employee" and "Technology" table. Note no arrows or infinity symbols on the graphical relationship connectors.

Figure 1.7: Many to Many Relationship ER diagram

(Q) What is normalization? What are different types of normalization?

Note: A regular .NET programmer working on projects often stumbles on this question. The bad part is that sometimes an interviewer can consider this a very basic question and it can be a turning point for the interview. Let's cram it.

Normalization is set of rules that have been established to aid in the design of tables that are meant to be connected through relationships.

Benefits of Normalizing your database include:

Avoiding repetitive entries.
Reducing required storage space.
Preventing the need to restructure existing tables to accommodate new data.
Increased speed and flexibility of queries, sorts, and summaries.

Note: Often you normalize databases to Fifth normal form; nonetheless, explaining Third normal form will put you in decent shape during the interview.

Following are the first three normal forms:

First Normal Form

For a table to be in first normal form, data must be broken up into the smallest units possible.In addition to breaking data up into the smallest meaningful values, tables in first normal form should not contain repeated groups of fields.

Figure 1.8: Repeating groups example

For in the above example city1 and city2 are repeating. In order, these tables to be in First normal form you have to modify the table structure as follows. Also note that the Customer Name is now broken down to first name and last name (First normal form data should be broken down to smallest unit).

Figure 1.9: Customer table normalized to first normal form

Second Normal form

The second normal form states that each field in a multiple field primary key table must be directly related to the entire primary key. In other words, each non key field should be a fact about all the fields in the primary key.

In the above table of customer, city is not linked to any primary field.

Figure 1.10: First Normalized Form customer table.

Figure 1.11: City is now shifted to a different master table.

That takes our database to a second normal form.

Third normal form

A non-key field should not depend on other non-key fields. The field "Total" is dependent on "Unit price" and "qty".

Figure 1.12: Third Normal Form

So now the "Total" field is removed and becomes a calculated field equal to (Unit Price * Qty).

(Q) What is denormalization?

Denormalization is the process of putting one fact in numerous places (its vice versa of normalization). Only one valid reason exists for denormalizing a relational design and that is to enhance performance by reducing joins. The sacrifice to performance is that you increase redundancy in database.

(Q) Can you explain Fourth Normal Form?

Note: Whenever the interviewer is trying to go above third normal form, it can be for two reasons, either their ego or to test you. Three normal forms are really enough, practically anything more than that is overkill.

In fourth normal form, it should not contain two or more independent multi-valued facts about an entity and it should satisfy "Third Normal form".

So let us try to see what multi-valued facts are. If there are two or more many-to-many relationship in one entity and they tend to come to one place, it is termed as "multi-valued facts".

Figure 1.13: Multi-valued facts

In the above table, you can see there are two many-to-many relationship between "Supplier" / "Product" and "Supplier" / "Location" (or in short, multi-valued facts). In order for the above example to satisfy fourth normal form, both the many-to-many relationships should go in different tables.

Figure 1.14: Normalized to Fourth Normal form.

(Q) Can you explain Fifth Normal Form?

Note: If you get this question, after joining the company you should ask him if he really uses it.

Fifth normal form deals with reconstructing information from smaller pieces of information. These smaller pieces of information can be maintained with less redundancy.

Example: "Dealers" sells "Product" which can be manufactured by various "Companies". "Dealers" in order to sell the "Product" should be registered with the "Company". So these three entities have very much mutual relationship within them.

Figure 1.15: Not in Fifth Normal Form.

The above table shows some sample data. If you observe closely a single record is created using lot of pieces of information. For instance: "JM Associate" can sell sweets in the following two conditions:

"JM Associate" should be an authorized dealer of "Cadbury".
Sweets should be manufactured by "Cadbury" company.

These two smaller pieces of information form one record in the above given table. So in order for the above information to be "Fifth Normal Form", all the smaller information should be in three different places. Below is the complete fifth normal form of the database.

Figure 1.16: Complete Fifth Normal Form

(Q) What is the difference between Fourth and Fifth normal form?

Note: There is huge similarity between Fourth and Fifth normal form (i.e. they address the problem of "Multi Valued facts"). "Fifth normal form" multi-valued facts are interlinked and "Fourth normal form" values are independent. For instance in the above two questions, "Supplier/Product" and "Supplier/Location" are not linked. While in fifth form, the "Dealer/Product/Companies" are completely linked.

(Q) Have you heard about Sixth Normal form?

Yes there exists a sixth normal form also. Feel free to skip this answer, but in case you want to impress the interviewer: If you want a relational system in conjunction with time, you use sixth normal form. Currently SQL Server does not support sixth normal form.

(Q) What is Extent and Page?

Twist: What is the relationship between Extent and Page?

Extent is a basic unit of storage to provide space for tables. Every extent has number of data pages. As new records are inserted new data pages are allocated. There are eight data pages in an extent. So as soon as the eight pages are consumed it allocates a new extent with data pages. While extent is basic unit storage from database point of view, page is a unit of allocation within extent.

(Q) What are the different sections in Page?

Page has three important sections:

Page header
Actual data (i.e. Data row)
Row pointers or Row offset

Page header has information like timestamp, next page number, previous page number etc. Data rows are where your actual row data is stored. For every data row there is a row offset that points to that data row.

Figure 1.17 : General view of a Extent

(Q) What are page splits?

Pages are contained in an extent. Every extent will have around eight data pages. But all the eight data pages are not created at once; it's created depending on data demand. When a page becomes full, it creates a new page and this process is called a "Page Split".

(Q) In which files does SQL Server actually store data?

Any SQL Server database is associated with two kinds of files: .MDF and .LDF. The MDF files are actual physical database files where your data is stored while LDF (LOG) files are LOG data that is recorded from the last time data was committed in database.

Figure 1.18: MDF and LDF files.

(Q) What is Collation in SQL Server?

Collation refers to a set of rules that determine how data is sorted and compared. Character data is sorted using rules that define the correct character sequence with options for specifying case-sensitivity, accent marks, kana character types and character width.

Figure 1.19: Collation according to language

Note: Different languages will have different sort orders.

Case sensitivity

If A and a, B and b, etc. are treated in the same way then it is case insensitive. A computer treats A and a differently because it uses ASCII code to differentiate the input. The ASCII value of A is 65, while a is 97. The ASCII value of B is 66 and b is 98.

Accent sensitivity

If a and A, o and O are treated in the same way, then it is accent insensitive. A computer treats a and A differently because it uses ASCII code for differentiating the input. The ASCII value of a is 97 and à 225. The ASCII value of o is 111 and ò is 243.

Kana Sensitivity

Japanese Kana characters, Hiragana and Katakana, are treated differently, it is called Kana sensitive.

Width sensitivity

When a single byte character (half-width) and the same character when represented as a double byte character (full width) are treated differently, then it is width sensitive.