May 17, 2022 at 3:21 pm
I work on sql server 2019 i have table have multi column string concatenation as StrSubstance,strmass ,strcasnumber
i need to create filed hashchemical with nvarchar(700) and this will store hash for 3 columns concatenation with each other
are this possible
what i mean
alter table [dbo].[fmdchemical] add hashchemical nvarchar(700) null
update ch set ch.hashchemical =HASHBYTES('SHA2_512',concate(StrSubstance,strmass,strcasnumber)) from [dbo].[fmdchemical] ch
so are hashing will be correct with nvarchar(700) and every field have length 3500
are this will make issue on the feature with big counts
also are hash every column alone then compare it or concate all these columns on one column then compare
CREATE TABLE [dbo].[fmdchemical](
[id] [int] IDENTITY(1,1) NOT NULL,
[chemicalid] [int] NULL,
[StrSubstance] [nvarchar](3500) NULL,
[strmass] [nvarchar](3500) NULL,
[strcasnumber] [nvarchar](3500) NULL
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[fmdchemical] ON
INSERT [dbo].[fmdchemical] ([id], [chemicalid], [StrSubstance], [strmass], [strcasnumber]) VALUES (826807, 748787, N'3P04,AL2O3,C,C,Cr203,Cu,Cu,Fe,Mn,Others,others,others,Others,Pb,Si02,SiC,Silicone Resin,SiO2,Sn,Sn,Ti02', N'0.0004,0.0096,0.336,0.3992,0.5904,0.764,0.7904,1.5408,1.6176,1.6808,1.6808,2.9344,5.2392,7.2,7.9608,19.2072,32.08,65.3584,148.3384,232.7992,269.472', N'592910,592960,592961,593043,593043,593259,593290,593306,593306,593495,593501,593501,593505,593505,621442,621442,621442,621442,622089,624020,650559')
May 17, 2022 at 5:04 pm
Is this related to my response to https://www.sqlservercentral.com/forums/topic/slow-execution-when-update-chemical-id-on-table-chemicalhash-so-how-to-enhance-i suggesting using a hash for comparisons & indexing?
Hashbytes returns a varbinary(64) for that algorithm -- a very nice, efficient type for sorting & comparing --, not an nvarchar, and certainly not something 700 bytes long.
Why do you want to convert it to nvarchar? Converted, it's a big hexadecimal string not readable or usable by a human being. And it's a hash, so you can't decrypt it back into readable strings. So I see no reason you would store it as nvarchar.... At least you aren't trying to use nvarchar(3500).
If you can guarantee that no concatenation of those three columns could be ambiguous -- i.e., no two different concatenated sets of the three different columns could result in the same concatenated string -- then it would probably be safe to hash the concatenated columns. If not, can you use a separate hash for each column?
May 17, 2022 at 7:29 pm
I work on sql server 2019 i have table have multi column string concatenation as StrSubstance,strmass ,strcasnumber
i need to create filed hashchemical with nvarchar(700) and this will store hash for 3 columns concatenation with each other
are this possible
what i mean
alter table [dbo].[fmdchemical] add hashchemical nvarchar(700) null
update ch set ch.hashchemical =HASHBYTES('SHA2_512',concate(StrSubstance,strmass,strcasnumber)) from [dbo].[fmdchemical] chso are hashing will be correct with nvarchar(700) and every field have length 3500
are this will make issue on the feature with big counts
also are hash every column alone then compare it or concate all these columns on one column then compare
CREATE TABLE [dbo].[fmdchemical](
[id] [int] IDENTITY(1,1) NOT NULL,
[chemicalid] [int] NULL,
[StrSubstance] [nvarchar](3500) NULL,
[strmass] [nvarchar](3500) NULL,
[strcasnumber] [nvarchar](3500) NULL
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[fmdchemical] ON
INSERT [dbo].[fmdchemical] ([id], [chemicalid], [StrSubstance], [strmass], [strcasnumber]) VALUES (826807, 748787, N'3P04,AL2O3,C,C,Cr203,Cu,Cu,Fe,Mn,Others,others,others,Others,Pb,Si02,SiC,Silicone Resin,SiO2,Sn,Sn,Ti02', N'0.0004,0.0096,0.336,0.3992,0.5904,0.764,0.7904,1.5408,1.6176,1.6808,1.6808,2.9344,5.2392,7.2,7.9608,19.2072,32.08,65.3584,148.3384,232.7992,269.472', N'592910,592960,592961,593043,593043,593259,593290,593306,593306,593495,593501,593501,593505,593505,621442,621442,621442,621442,622089,624020,650559')
Ok... so you changed the other thread on this subject without telling us??? Why didn't you just ask this question over on that other thread? Hint: This is why people sometimes don't like helping your or that get a bit angry at your questions. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
May 18, 2022 at 10:54 am
HASHBYTES returns a set size, as Jeff noted. The inputs can be varchar(max), so you'd need to ensure that the first input is cast to varchar(max) to ensure this returns the correct values.
Note that HASHBYTES can have collisions, meaning that different sets of inputs can return the same HASHBYTES output. This isn't likely, but I have no idea how a large input might affect the algorithm. You'd want to test this with a lot of inputs and look for duplicates of the hash output.
HASHBYTES - https://docs.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql?view=sql-server-ver15
CONCAT - https://docs.microsoft.com/en-us/sql/t-sql/functions/concat-transact-sql?view=sql-server-ver15
May 18, 2022 at 1:51 pm
Getting back to the original issue, if you'd go back to the original thread that start all this and answer my simple question, you might get somewhere.
As for the question on this thread, you really should start reading documentation on stuff you think you might want to use because the answers to such simple questions are in the documentation.
What's not in the documentation is the right way to do the concatenation for comparison hashes... which is super important in your case because you're trying to compare complex chemical formulas.
So take this conversation back to the original thread on this subject and answer my question so we can make all of this much faster.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply