July 21, 2023 at 7:50 pm
Hello,
Problem: in a dwh table, we are getting duplicates based on some fields that represent the primary key of a table, but this is not a true duplicate cause one of the fields come with lower/upper case, and should be considered as sensitive, the issue is that our database is insensitive case so we got this as duplicate in queries.
did you have some idea or already been in that case? i am thinking about using an ID that will concatenate the fields of the primary key, and hashing the one that could come with upper/lower case...
The result impact also data in Power BI.
Thank you for help.
July 21, 2023 at 7:59 pm
You can create a column on a database that is not case sensitive to be case sensitive
CREATE TABLE MyTable
(
MyColumn VARCHAR(50) COLLATE SQL_Latin1_General_CP1_CS_AS
)
July 21, 2023 at 8:05 pm
yes but if we don't want to change the collation of the column of our table, what is the alternative? the table is a fact.
July 21, 2023 at 8:36 pm
You can change the collation of a column with an alter statement:
ALTER TABLE MyTable
ALTER COLUMN MyColumn VARCHAR(50) COLLATE SQL_Latin1_General_CP1_CS_AS
You could add a computed column to your table that uses the BINARY_CHECKSUM function. This function returns the binary checksum of the passed in values - meaning it will return different values for 'ABC' and 'abc'. You could then include this computed column as part of your primary key. Here's an example:
ALTER TABLE MyTable
ADD MyColumn_Checksum AS BINARY_CHECKSUM(MyColumn);
And then you would alter your primary key to include MyColumn_Checksum. Note that this would require dropping and recreating your primary key, which could have significant performance implications.
Similar to the computed column, but instead of using BINARY_CHECKSUM, you could use a hash function like HASHBYTES. This would give you a binary hash of your column that you could then include in your primary key.
ALTER TABLE MyTable
ADD MyColumn_Hash AS HASHBYTES('SHA2_256', MyColumn) PERSISTED;
July 21, 2023 at 9:52 pm
thank you! i will think about your suggestion
July 23, 2023 at 1:14 am
Provide more details about the data in the table. Is the data in the table double counting? As an alternative try to fix the ETL process.
=======================================================================
July 24, 2023 at 11:25 am
...
,MyChecksum as (checksum([host_name],[program_name] COLLATE Latin1_General_CS_AS,[DbName] ,[nt_user_name],[login_name],[original_login_name],[is_dac])) persisted
, MyHashBytes HASHBYTES('SHA2_256',concat([host_name],[program_name] COLLATE Latin1_General_CS_AS,[DbName] ,[nt_user_name],[login_name],[original_login_name],[is_dac])) persisted
Make that column persisted and it will only caltulate the value once if the used columns values change. ( and not on every select query )
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply