Use SQL Server Checksum in Data Warehousing
When building data warehouses you often need a fast way to know if the data in the warehouse is different from the incoming source data. One method to do this is to create a SQL Server CHECKSUM column in the data warehouse table and another on the incoming table data. By creating a checksum you can compare the checksum value in the data warehouse tables to the checksum value in your staging table. If the values are different only then do you need to modify the data warehouse table.
For example, this script calculates a checksum column on the DimGeography table using all columns except [GeographyKey] in the AdventureWorkDW sample database.
SELECT TOP 10 [GeographyKey]
,[City]
,[StateProvinceCode]
,[StateProvinceName]
,[CountryRegionCode]
,[EnglishCountryRegionName]
,[SpanishCountryRegionName]
,[FrenchCountryRegionName]
,[PostalCode]
,[SalesTerritoryKey]
,checksum( city, StateProvinceCode, StateProvinceName, countryregioncode, EnglishCountryRegionName,
SpanishCountryRegionName, FrenchCountryRegionName, PostalCode, SalesTerritoryKey) as DW_Checksum
FROM [AdventureWorksDW2008R2].[dbo] .[DimGeography]
Results:
When loading new records into this table, comparing a Checksum on the source table that matches the same columns in the DimGeography (used in the script above) . If the DW_Checksum values match then you do not need to modify the record. If they are different then you know what records need to be updated. In larger tables creating the checksum as a physical column and adding an index will increase the speed of your loading process.
The post SQL Server Checksum appeared first on Derek E Wilson - Blog.