Archive

Posts Tagged ‘Design’

ACID Characteristic of Databases

October 10, 2010 Leave a comment

So often we forget the most underlying reason for Database’s existence. Atleast I do at times. So here’s the ACID part of Databases:

- Atomic: either all of its data modifications are performed, or none of them is performed
- Consistent: once completed, a transaction must leave all data in a consistent state
- Isolation: Modifications made by concurrent transactions must be isolated from the modifications made by any other concurrent transactions
- Durability: After a transaction has completed, its effects are permanently in place in the system. The modifications persist even in the event of a system failure.

SQL Server Database Design Best Practices for Good Performance

September 18, 2010 Leave a comment

Here’s a quick list of some of the Design best practices that shall be considered for good performing SQL Server database. 
- Avoid online data column sorting while using Profiler
- Capture trace output using a server-side trace (not Profiler)
- Discard starting events while analyzing costly and slow queries
- To run Profiler for very short tracing, run the tool remotely
- Limit the number of events and data columns to be traced
- Limit trace output size

 

Details for each is as below.

Practice Description or Reasons
Adopt index-design best practices Some quick and generic guidelines:
- Choose narrow columns for indexes
- Ensure that the selectivity of the data in the candidate column is very high OR that the candidate column has a large number of UNIQUE values
- Prefer columns with the integer data type (or its variants). Avoid VARCHAR columns for Index
- For a multicolumn index, prefer column with higher selectivity toward the leading edge of the index
- Use the INCLUDE list in an index as a way to make an index covering without changing the index key structure
- While deciding the columns to be indexed, pay extra attention to the queries WHERE clauses and JOIN criteria columns
- For queries retrieving a range of rows, clustered indexes are usually better. For point queries, nonclustered indexes are usually better.

Clustered Index require more careful planning as they are referenced by Non-Clustered Indexes as well:
- Keep clustered indexes as narrow as possible
- Create the clustered index first, and then create the nonclustered indexes on the table
- If required, rebuild a clustered index in a single step using the DROPEXISTING keyword in the CREATE INDEX command (rather than dropping and then rebuilding it)
- Do not create a clustered index on a frequently updatable column. Or nonclustered indexes on the table will have difficulty in remaining in sync with clustered index key values

Avoid the use of the sp_prefix for stored procedure names Performance hit is due to the way an SP is looked upon within SQL Server, ie.
1. In the master database
2. In the current database based on any qualifiers provided – DB name or owner
3. In the current database using dbo as the schema, if a schema is not specified
Thus object is first searched for within MASTER DB if its named as sp_prefix. A SP_CacheMiss event is caused, and then searched for within correct DB.
Balance under- and over normalization

Target "Balanced" normalization – extremes impacts the performance.
Under normalization : Causes excessive repetition of data, which can lead to improper results and often hurts performance
Over normalization: causes excessive joins across too many narrow tables. Heuristic I follow is to more closely examine a query when it exceeds 6 to 8 tables in the join criteria.

Use domain and referential integrity constraints

Data Integrity: can be enforced by restricting the data type of the column, defining the format of the input data, and limiting the range of acceptable values for the column
Data Integrity enablers: data types, FOREIGN KEY constraint, CHECK constraints, DEFAULT definitions, and NOT NULL definitions

Referential Integrity: ensures that a record in the child table exists only if the corresponding record in the parent table exists (except where NULL is allowed on child table for identifier or linked column)
Referential Integrity enablers: PRIMARY KEY (on parent), FOREIGN KEY (on child)
With the declarative referential integrity in place, the optimizer is assured that for every record in the child table, the parent table contains a corresponding record. Therefore, the JOIN clause between the parent and child tables is redundant.

Use entity-integrity constraints

Using "primary key" or "natural key" are possible options – with both having pros and cons
Primary/ Identity Key:
- generally an INT or BIGINT, which is excellent for index
- unique constraint has to be used as well to avoid any dupes

Natural Key (e.g. Social Security Number):
- separating the value of the primary key from any business knowledge is considered good design
- can carry business meaning which is bad design practice for identifier
- are clear, and human readable and carry business meaning at same time
- are generally composite, and hence not so good index candidate
Primary Key and UNIQUE constraint provide entity-integrity, and contribute to performance by providing useful information to the optimizer about the expected results, assisting the optimizer in generating efficient execution plans

Minimize the use of triggers Trigger itself is not a problem. Unawareness of its presence is a problem. As a Dev might lead a change and trigger might not be handling it causing performance issues. Avoid Triggers, if not possible, document them to best and max extent.
Follow

Get every new post delivered to your Inbox.