Wednesday, May 11, 2011

Comparing two database tables for equality

Binary_Checksum can be used to compare two tables for equality.
SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) from dbo.[Table1]
SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) from dbo.[Table2]

But there are some concerns about the algorithm that it might not always be unique

Example:

select binary_checksum('A') ,binary_checksum('AAAAAAAAAAAAAAAAA')

The out put is 65 and 65

select binary_checksum('AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA A') ,binary_checksum('AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAA')

The out put is 577 and 577.

So the other way to compare two tables cell by cell would be to write a query like this


SELECT COUNT(*) as cnt, MIN(ViewName) as TableName, col1, col2, ...
FROM
(
SELECT '[DataBase1].dbo.[TableName or ViewName goes here]' as ViewName,
*
FROM [DataBase1].dbo.[TableName or ViewName goes here] A
UNION ALL
SELECT '[DataBase2].dbo.[TableName or ViewName goes here]' as ViewName,
*
FROM [DataBase2].dbo.[TableName or ViewName goes here] B
) tmp
GROUP BY ALL col1, col2, ...
HAVING COUNT(*)%2 <> 0


UNION concatenates two tables
GROUP BY groups the records by comparing cell to cell of all the columns listed and merges all the matching rows as a single row with COUNT(*) displaying the number of matching rows that have been merged.

If there is a row in View1/Table1 that matches the row in View2/Table2, then COUNT returns an even number (after the entire cell by cell comparison is performed by group by).

If there are any rows in View1/Table1 and View2/Table2 that do match each other, then after the merge performed by GROUP BY, COUNT(*) returns an odd number.

COUNT(*) is an even number when all the data cells in view1 match to view2 and when divided by 2 the reminder is 0. If any rows do not match exactly cell by cell, COUNT(*) is an odd number and when divided by 2 has a reminder.

Hence

HAVING COUNT(*)%2 <> 0
Returns all the mismatched rows. The query does not return any results if View1 and View2 match.

To get the list of columns if you are comparing views run this script:

DECLARE @Names VARCHAR(8000)
SELECT @Names = COALESCE(@Names + ', ', '') + c.name FROM sys.columns c, sys.views v
WHERE c.object_id = v.object_id
AND v.name = 'View Name here'
Print @Names










The out put is 577 and 577.

No comments: