SQL Server DO's and DON'Ts
Do you always need to plan about normalization?
We all know normalizing the DBs is the norm in any database. By normalizing the DB, it saves storage, reduces redundancy, makes updates faster etc. But we also keep in mind that a few large organizations deliberately keep some of their databases denormalized. So, when is it a good idea to keep the DB denormalized?
- If the database is being used only for reporting/analysis. This implies that there is an already a normalized database being used for OLTP, which is periodically synchronized to the analysis database through ETL.
- The data is "control data". There's usually no point in normalizing data that isn't actually business data.
Please remember that it's not true that experts always choose a normalized design but I think experts don't just follow a mantra. They analyze, discuss, clarify, based on the need and then they choose whatever approach makes the most sense for their particular situation.
Don’t use a cursor unless there is no other alternative
SQL programmers (came from front end background 😊) find in cursors a comfortable and familiar way of coding. But unfortunately, this leads to bad performance.
Please remember that T-SQL was not done with loop performance in mind. Cursor is a common approach for developers who think like application servers. When you’re running hundreds of cursor operations per minute – this can quickly run into higher CPU use and locking problem.
In the database world, you’re much better off performing set-based operations: working on all of the data at once, in a single statement.
But since cursor is there in SQL and if you need to use it, use it then.
Don’t do SELECT *
Very often under stress and rush we use select * clause in our stored procedure, which is a bad practice. We should specify the column name which we would like to use. It reduces memory consumption, network bandwidth and also eases security design.
Know how your data is being used and plan indexes accordingly
If you know your data well then only you can plan the indexes well. If you don’t know the data then you will end up creating bad indexes which will result in problems either in the form of bad performance or storage.
When deciding the indexes first analyze how the table is being used, whether it’s a transaction table or reporting table. If it's a reporting table determine which columns are being used in where clause and then decide the proper indexing.
Don’t count all rows if you only need to see if data exists
I can't tell you how often I've seen SELECT COUNT (*) FROM TABLE1 just to check that whether a record exists in the table or not. If you need to check the existence of records rather use If EXISTS (SELECT 1 FROM TABLE1).
Use Transactions wherever possible
A transaction is a set of operations performed so all operations are guaranteed to succeed or fail together as one. Its all or none. Transactions are important to maintain data integrity.
Don’t do negative searches
Avoid negative search if you can. For ex. SELECT * FROM Customers WHERE RegionID <> 3
.
The above query will not be able to use indexes wisely and do full table scan and give performance problems.
Optimal use of Data type
With design tables use data types wisely. For example, if you need to store numeric data decide if INT data type is needed or if SMALLINT would be good. If you are using VARCHAR what should be the size of VARCHAR? Avoid VARCHAR(MAX) if it's not needed. Avoid TEXT, if VARCHAR can work.
DO Use Referential Integrity
Referential Integrity is a set of constraints applied to foreign key which prevents entering a row in child table (where you have foreign key) for which you don't have any corresponding row in parent table; i.e., entering NULL or invalid foreign keys. Referential Integrity prevents your table from having incorrect or incomplete relationships. Use proper referential integrity while designing the tables.
Do you want to use ORM" If the answer is yes, then use stored procedures
Though the latest ORM tools improved a lot, still I feel ORM code generators can’t possibly write SQL as well as a person who knows SQL. However, if you use an ORM, write your own stored procedures and have the ORM call the stored procedure instead of ORM writing its own queries.
Using a stored procedure will reduce network traffic drastically, stored procedures are easier to trace in Profiler. It's very easy to get performance statistics on a stored procedure than on an ad-hoc query and, in turn, find performance issues and draw out performance issues.