Archive

Posts Tagged ‘DBCC’

SQL Server DBCC Deep Dive

DBCC CHECKDB
- Only way to read and test all pages in DB
- By default its CPU bound (full check), but can be made IO bound using WITH PHYSICAL_ONLY
- A resource hog, and uses tempdb as well
- wont take locks unless specified WITH TABLOCK
- New Features starting 2005:
    (a) progress reporting using sys.dm_exec_reports
    (b) Data purity reports e.g. datetime correctness
    (c) last DBCC completed successfully
- New Features starting 2008:
    (a) WITH EXTENDED_LOGICAL_CHECKS

How to Run CHECKDB?
- Returns first 200 errors by default, use ALL_ERRORMSGS
- No harm in letting it complete even if its taking time; if running for too long, check for error 5628 to check if its rescheduling certain check

What CHECKDB Do
(1) primitive check for critical system tables (any problem here – game over)
(2) allocation checks (IAM pages etc)
(3) Logical checks of critical system tables (key order, index mapping, LOB pointer, file stream)
(4) logical check for all tables
(5) Service Broker data validation
(6) Metadata table checks
(7) Indexed view, XML index, spatial index checks (off by default)
NOTE that repairs are done at each stage. CheckDB creates a snapshot and runs the check on that for consistency. You can as well create a snapshot of your own and run DBCC on that (in case default site dont have space)

How CHECKDB checks as above:
- Primitive system table checks
- DBCC CHECKALLOC
- DBCC CHECKTABLE (system tables)
- DBCC CHECKTABLE
- DBCC CHECKCATALOG.
You might want to optionally run DBCC CHECKIDENT and CHECKCONSTRAINTS.
CHECKFILEGROUP: checks file and hosted tables and index. can be opted for after file based operations. 
CHECKTABLE: Dont use WITHOUT INDEX flag.
NOTE: – Size TempDB, use WITH_ESTIMATEONLY flag if needed. – If checkdb running for too long it might mean it has found something to work upon. Run CHECKDN as frequently as can.

Interpreting and Fixing Issue Using CheckDB
- If there are only a couple errors, check MSDN – If there are many errors, use some tips and tricks, as below:
(1) If checkdb doesn’t complete it tells why it couldn’t (also means you cant perform recovery, so restore). e.g. of fatal errors, 7984-7988 critical system table corrupt; 8967 invalid states in checkdb; 8930 corrupt metadata
(2) Are the error messages for NonClustered Index (NCI)? Run index rebuild. When rebuilding indexes, it references older records, so issue might persist. So drop and recreated.  Always rerun DBCC
(3) Unrepairable erros:
    (a) 8909, 8938, 8939 i.e. page header corruption – restore page from backup 
    (b) 8970 invalid data for column – work with BAs
    (c) 8992 checkcatalog error; find object ID, find catalog table having errorenous objectID, shutdown db, startup with -m: param,  delete the entry from errooneous table, rerun DBCC, if fixed startup without -m: flag
    (d) 2570 data purity error – find the page from DBCC CHECKDB output, run DBCC PAGE, try delete or update the records as appropriate

REPAIR_ALLOW_DATA_LOSS
- Beware of using it
- repairs physical strucuture w/o any consideration for data / biz logic; fastest mode of repair.
- Drop create for what cant be corrected
- Doesnt take into account: replication, mirroring, biz logic, data relationship, foreign key constraints
- Take backup before doing this
- After using it, run CHECKCONSTRAINTS and fix up replication topologies

 

Reference: SQL Server MCM Deep Dive Sessions

Tags:

SQL Server – Why Checksum was Introduced?

Torn Page Detection
- an 8K page = 16*512byte disk sectors
- its possible that a page is partially written when power goes off, i.e. torn page
- You cant detect corruptions within each disk sector, hence page checksum is preferred / used

Page Checksum
- checksum is written last within the same 4byte page header as used by torn page
- checksum is read first when reading a page. Every time a page is read, buffer pool calculates the checksum and compares to value stored in header
- Checksum is enabled by default in new installations, but for upgrade from 2000 it needs enabled manually
- Remember, its error detection, not error correction. – CPU overhead = 1 or 2% 
- Checksum errors lead to error 824. – tempdb page checksum is supported starting SQL2008
- Checksum is referenced for read, checkdb, backup with checksum, or if page is in checksum’-med backup
Note that page checksum starts playing only after its been re-written once with page checksum value (if page already existed).

Tags:

SQL Server Single Page Recovery

Last Wednesday a friend reached me with a request to help with DB corruption which they encountered from DBCC runs. It was a single page corruption, and was a SQL 2005 instance, UAT database (they had Prod backup, but it has had some data and plus it was a big database so they didn’t wanted to run with restore). I went around and talked it through, and eventually we together sorted it out. Though I have a log of all what we did from out command session, but I can’t release it for security reasons (and too lazy to work with those images right now). :-)

So, here’s the exact run of what we did instead:

DBCC CHECKDB <DB> WITH ALL_ERRORMSGS, NO_INFOMSGS;    – note down slot#, and Page details

ALTER DATABASE <DB> SET SINGLE_USER;   – to ensure things get messy when updating the page
DBCC PAGE (<DB>, 2, 145, 3);    – couldn’t find specific details, so need to run more specific
NOTE: if we would have found some details here, we would have scrolled across to the noted slot# and would have tried to find the issue there, but in our case it simply did any favors.

DBCC PAGE (<DB>, 2, 145, 2);
BACKUP LOG <DB> TO DISK = <location> WITH INIT;    
– backup the log, just for safety and to safeguard the log tail
RESTORE DATABASE <DB> PAGE = ’1:145′ DISK = <backup>;   
– single page recovery
RESTORE LOG <DB> FROM DISK=<location> WITH NORECOVERY;   
– restore the log tail
RESTORE DATABASE <DB> WITH RECOVERY;
DBCC CHECKDB                – important to run DBCC CHECKDB again, always recommended
ALTER DATABASE <DB> SET MULTI_USER;

Important aspects which should be highlighted here is, never panic when you see such page corruption or DBCC error messages, and when you think you have sorted it, validate it with DBCC CHECKDB again.

SQL Server : LogFile getting too big or not getting truncated!

There are so many times/ situations when you can find a SQL Server DBA (like myself) completely lost. A classical is: "My DB is STRANDED for a ridiculous reason – LogFile not truncating". Its simple yet complex, for the fact that it can be avoided most of the times (alright, no design lectures here) – but it gets really tricky at times. And imagine a Prodcution site Manager staring at you while you are glaring your desktop screen with that remote DB Connection and scratching your head carrying those odd expressions… :-)

Well, here’s a step down approach which I generally consider for self (OK; for my troubled DB rather) when am really trying to "solve" the problem rather than taking a random toll to fix it:

A. DB Recovery Model
The issue can be because of transaction logFile (obviously). Things to checkout:
- recovery mode of your DB (select name, recovery_model_desc from sys.databases)
- if its FULL, have you ever taken transactional log backup. If not, better switch the DB to SIMPLE recovery mode (ALTER DATABASE DBName SET RECOVERY SIMPLE)
- if its FULL and you have taken transactional log backup in past, then try looking up what is holding things up (select name, log_reuse_wait_desc from sys.databases;)

B. Long Running transactions
- identify open transactions and respective SPIDs will be returned (DBCC OPENTRAN)
- try to identify who is using those spid’s (exec sp_who2 <spid>) OR (SELECT * FROM sys.dm_exec_sessions WHERE session_id = <spid>;)
- try to find whats running as part of these SPIDs and kill them if needed/ possible. Use "DBCC INPUTBUFFER(spid)" OR try the script below:

SELECT r.session_id, r.blocking_session_id, s.program_name, s.host_name, t.text
FROM sys.dm_exec_requests r INNER JOIN sys.dm_exec_sessions s 
ON r.session_id = s.session_id CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE
s.is_user_process = 1 AND
r.session_id = SPID –FROM DBCC OPENTRAN

C. Backup
LogFile truncation cannot happen during the backup/restore operation. You can create a transaction log backup when full/differential backup is occuring. BUT, you cant truncate the logFile while doing so.

D. Transactional Replication
With transactional replication, the inactive portion of the transaction log is not truncated until transactions have been replicated to the distributor.

E. Database Mirroring
Database mirroring is somewhat similar to transactional replication in that it requires that the transactions remain in the log until the record has been written to disk on the mirror server.

F. Disk Space
Might well be the reason that you are genuinely out of disk space
See if you want to add another logFile using (ALTER DATABASE DBName ADD LOG FILE;)

Once that you have truncated the logFile, you now will have to shrink the file to get it to manageable filesize.
- It can be done using EM (DB -> All Tasks -> Shrink -> LogFile) or commandLine by first identifying logFile name and then shrinking it:

select name from sys.database_files where type_desc =’LOG’; –List of logFile
DBCC SHRINKFILE (\’SalesHistory_Log\’, 1000) –shrinks the logFile to 1GB

NOTE: TRUNCATE_ONLY
One of the disastorous ways of dealing with issue and is not recommended by MS in SQLServer 2008 and onwards.
Doing so breaks the transaction log chain, which makes recovering to a point in time impossible because you have lost transactions that have occurred not only since the last transaction log backup but will not able to recovery any future transactions that occur until a differential or full database backup has been created.
BACKUP LOG SalesHistory WITH TRUNCATE_ONLY;
BACKUP LOG SalesHistory TO DISK = \’C:\Backup\SalesHistory.bak\’;  — to backup log to disk

Follow

Get every new post delivered to your Inbox.