February 24, 2016 at 5:49 am
Hi all,
I am having an issue with using distinct on one single column in a table and getting multiple result rows for a string "looking" the same. When I hashbyte the values, I can see that there must be a difference.But I have no clue how to get rid of this behaviour... please help:
MS SQL Server 2014 (Server, DB and Field Collation are all the same - Latin1_General_CI_AS)
City column is defined as nvarchar(50)
Table with column "City", doing a "select distinct city from table1" I get below city multiple times:
Column City
Zweibrücken
Zweibrücken
Zweibrücken
When doing a Hashbyte, I can see that there seems to be a difference of these values:
Column CityColumn City_Hashbyte
Zweibrücken0x31244719AA18450C404BD27E1F0B07A36E7840A2
Zweibrücken0xBD0FF24BCB34A0A2C83AE9069E59F33A588110E9
Zweibrücken0xD7BEF38644E48185EBC60B207EFF33F7DE50D8BF
All data is imported from one single XML file onto one single table. So I really don't get where this difference is coming from and how this can be overcome.
Thanks for any input / suggestion.
February 24, 2016 at 5:56 am
donstarter (2/24/2016)
Hi all,I am having an issue with using distinct on one single column in a table and getting multiple result rows for a string "looking" the same. When I hashbyte the values, I can see that there must be a difference.But I have no clue how to get rid of this behaviour... please help:
MS SQL Server 2014 (Server, DB and Field Collation are all the same - Latin1_General_CI_AS)
City column is defined as nvarchar(50)
Table with column "City", doing a "select distinct city from table1" I get below city multiple times:
Column City
Zweibrücken
Zweibrücken
Zweibrücken
When doing a Hashbyte, I can see that there seems to be a difference of these values:
Column CityColumn City_Hashbyte
Zweibrücken0x31244719AA18450C404BD27E1F0B07A36E7840A2
Zweibrücken0xBD0FF24BCB34A0A2C83AE9069E59F33A588110E9
Zweibrücken0xD7BEF38644E48185EBC60B207EFF33F7DE50D8BF
All data is imported from one single XML file onto one single table. So I really don't get where this difference is coming from and how this can be overcome.
Thanks for any input / suggestion.
You could try pasting the data into Notepad++ and then selecting the View/Show Symbol/Show All Characters option. Maybe that will show something.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
February 24, 2016 at 6:03 am
Thanks for your input Phil,
Yes, I can see the difference of the strings when pasting into Notepad++ kind of same as I see the difference when doing a hashbyte. In two of the values are some is a "HOP" in front or after the "ü".
Any idea how I can overcome this?
February 24, 2016 at 6:09 am
donstarter (2/24/2016)
Thanks for your input Phil,Yes, I can see the difference of the strings when pasting into Notepad++ kind of same as I see the difference when doing a hashbyte. In two of the values are some is a "HOP" in front or after the "ü".
Any idea how I can overcome this?
Use some sort of clean string function to remove the rubbish characters. If you can't find anything suitable (no cursors or while loops please :-)), post back and I'll see whether I can.
Or, if that is the only character causing you issues, find out its ASCII value and do a REPLACE() on the city name to remove it.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
February 24, 2016 at 6:29 am
The way I see it, there are two directions you can go here.
1. If, like Phil said, you only have 1 or 2 characters that are causing you problems, you can REPLACE them.
2. If you have a bunch of problem characters AND you can define a range of ASCII values you want to keep and eliminate everything else, let us know. This is probably best solved with an ITVF to return only the "good" characters.
Whatever direction you go, please heed Phil's advice - no loops.
February 24, 2016 at 6:41 am
Hi all,
Thanks a lot for your input, that helped a lot! I will go for the REPLACE version for now, this should do the trick for the specific issue now. Thanks
February 24, 2016 at 7:39 am
donstarter (2/24/2016)
Hi all,I am having an issue with using distinct on one single column in a table and getting multiple result rows for a string "looking" the same. When I hashbyte the values, I can see that there must be a difference.But I have no clue how to get rid of this behaviour... please help:
MS SQL Server 2014 (Server, DB and Field Collation are all the same - Latin1_General_CI_AS)
City column is defined as nvarchar(50)
Table with column "City", doing a "select distinct city from table1" I get below city multiple times:
Column City
Zweibrücken
Zweibrücken
Zweibrücken
When doing a Hashbyte, I can see that there seems to be a difference of these values:
Column CityColumn City_Hashbyte
Zweibrücken0x31244719AA18450C404BD27E1F0B07A36E7840A2
Zweibrücken0xBD0FF24BCB34A0A2C83AE9069E59F33A588110E9
Zweibrücken0xD7BEF38644E48185EBC60B207EFF33F7DE50D8BF
All data is imported from one single XML file onto one single table. So I really don't get where this difference is coming from and how this can be overcome.
Thanks for any input / suggestion.
Using the values you posted, it seems that there's an invisible value before or after the ü.
Check the following code:
WITH
Cities AS(
SELECT CityId, City
FROM (VALUES(1,N'Zweibrücken'),
(2,N'Zweibrücken'),
(3,N'Zweibrücken'))x(CityId, City)
),
E(n) AS(
SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)
),
E2(n) AS(
SELECT a.n FROM E a, E b
),
E4(n) AS(
SELECT a.n FROM E2 a, E2 b
),
cteTally(n) AS(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n
FROM E4
)
SELECT c.*, SUBSTRING(City, n, 1), UNICODE(SUBSTRING(City, n, 1))
FROM cteTally t
JOIN Cities c ON t.n <= LEN(c.City)
ORDER BY CityId, n;
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply