August 28, 2014 at 11:48 am
I cannot seem to figure this one out, so I have taken to a forum for assistance.
I have an SSIS package that puts together a series of queries to check all text fields in all tables of a database for a set of "invalid" characters. I know the root query works as we have been running it manually for quite a while. However, now that I have changed it to build dynamically, I cannot figure out why it does not work.
Here is the code. The EXEC (@sSQL) returns 3 records. It just so happens that these 3 records are the only ones in the database with a '?' in the field. The very bottom statement returns 0 records. I cannot figure out what the difference is. When I do PRINT (@sSQL), it looks fine to me, except that @Pattern is fully printed.
Any ideas?
DECLARE @Pattern NVARCHAR(MAX)
DECLARE @sSQL NVARCHAR(MAX)
DECLARE @1 nvarchar(max)
DECLARE @2 nvarchar(max)
DECLARE @loop int
SET @Pattern = '%['
SET @Pattern = @Pattern + NCHAR(0) + NCHAR(1) + NCHAR(2) + NCHAR(3) + NCHAR(4) + NCHAR (5) + NCHAR(6) + NCHAR(7) + NCHAR(8) + NCHAR(11) + NCHAR(12) + NCHAR(14) + NCHAR(15) + NCHAR(16) + NCHAR(17) + NCHAR(18) + NCHAR(19) + NCHAR(20) + NCHAR(21) + NCHAR(22) + NCHAR(23) + NCHAR(24) + NCHAR(25) + NCHAR(26) + NCHAR(27) + NCHAR(28) + NCHAR(29) + NCHAR(30) + NCHAR(31)
SET @loop = 55296
WHILE @loop < 57344
BEGIN
SET @Pattern = @Pattern + NCHAR (@loop)
SET @loop = @loop + 1
END
SET @Pattern = @Pattern + NCHAR(65534) + NCHAR(65535)
SET @Pattern = @Pattern + ']%'
set @1 = 'SELECT * FROM [datacreator].[M_EventPOI] WHERE PATINDEX('
set @2 = ', CONVERT(NVARCHAR(MAX), LTRIM(RTRIM(Alternate_Name))) COLLATE Latin1_General_BIN) <> 0'
SET @sSQL = @1 + '''' + @Pattern + '''' + @2
--3 Rows Returns; all 3 have '?'
EXEC (@ssql)
--FOR COMPARISON. 0 rows return (which is correct)
select * FROM [datacreator].[M_EventPOI] WHERE PATINDEX(@Pattern, CONVERT(NVARCHAR(MAX), LTRIM(RTRIM(Alternate_Name))) COLLATE Latin1_General_BIN) <> 0
August 28, 2014 at 11:51 am
Basically, I think what is happeneing is that the dynamic SQL is evaluating '?' as one of my invalid characters, even though it's not part of my list. '?' is CHAR(63), which I am not including in my @Pattern. What I can't figure out is why/how it is getting in there for the dynamic version but not for the static.
August 28, 2014 at 2:44 pm
I'm guessing what is happening is that Latin1 has only ascii 0 to 255, not extended characters. So, the double-bytes are being interpreted as single bytes, one of which is a "?".
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
August 28, 2014 at 2:53 pm
The script works fine and properly if I do not build a dynamic SQL input, but it also doesn't actually parse out the @Pattern prior to the PATINDEX in the static query like it does in the dynamic SQL.
I did go ahead and try other collations, though, with the same result. Any idea what collation would accept the unicode extended characters?
August 28, 2014 at 3:33 pm
Sorry, I have no idea; that's not my area of expertise.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
August 28, 2014 at 3:55 pm
Two questions and a clarification:
Clarification: What are you using to build the @sql string? Is this completely encapsulated in an OLEDB or Exectute SQL task, or are you assembling a local variable in a script component first?
Question 1: Have you performed a profiler to confirm the query being sent to the SQL Server is the one you expect?
Question 2: Have you directly confirmed that your character is definately a CHR(63) via ASCII(SUBSTRING())?
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
August 28, 2014 at 9:50 pm
The code fails because the first character in the Pattern sequence is NCHAR(0). Any usage of the string will recognise this as the end of the string, truncating the string at this point. The solution is to start at NCHAR(1) instead.
😎
For demonstration, run first this code
DECLARE @Pattern NVARCHAR(MAX)
DECLARE @sSQL NVARCHAR(MAX)
SET @Pattern = '%['
SET @Pattern = @Pattern + NCHAR(0) + NCHAR(1) + NCHAR(2) + NCHAR(3) + NCHAR(4) + NCHAR (5) + NCHAR(6) + NCHAR(7) + NCHAR(8) + NCHAR(11) + NCHAR(12) + NCHAR(14) + NCHAR(15) + NCHAR(16) + NCHAR(17) + NCHAR(18) + NCHAR(19) + NCHAR(20) + NCHAR(21) + NCHAR(22) + NCHAR(23) + NCHAR(24) + NCHAR(25) + NCHAR(26) + NCHAR(27) + NCHAR(28) + NCHAR(29) + NCHAR(30) + NCHAR(31)
SELECT @Pattern
Return value
%[
And then this
DECLARE @Pattern NVARCHAR(MAX)
DECLARE @sSQL NVARCHAR(MAX)
SET @Pattern = '%['
SET @Pattern = @Pattern + NCHAR(0) + NCHAR(1) + NCHAR(2) + NCHAR(3) + NCHAR(4) + NCHAR (5) + NCHAR(6) + NCHAR(7) + NCHAR(8) + NCHAR(11) + NCHAR(12) + NCHAR(14) + NCHAR(15) + NCHAR(16) + NCHAR(17) + NCHAR(18) + NCHAR(19) + NCHAR(20) + NCHAR(21) + NCHAR(22) + NCHAR(23) + NCHAR(24) + NCHAR(25) + NCHAR(26) + NCHAR(27) + NCHAR(28) + NCHAR(29) + NCHAR(30) + NCHAR(31)
SELECT CONVERT(VARBINARY(MAX),@Pattern,3)
Return value
0x25005B000000010002000300040005000600070008000B000C000E000F0010001100120013001400150016001700180019001A001B001C001D001E001F00
This is probably way too obvious for one to notice at the first glance;-)
August 29, 2014 at 7:46 am
Unfortunately, converting to varbinary didn't work and neither did removing the nchar(0) start.
Also unfortunately, I do not have permission to run profiler in this particular case. It's a severely locked down server.
I have confirmed that the '?' I am seeing is truly CHAR(63) by checking the ASCII value of the characters in the field.
Now, to explain further, I have this code in a Function on another server. It works flawlessly there. I pass in a NVARCHAR(MAX) string and it runs through all the chars and uses STUFF to take them out. It spits out the cleaned string.
I can't simply put the working function on this server and handle it the same way. I am not allowed to create anything more than temp tables. As a result, I am using SSIS. Here is the run-down on what I am doing:
First thing I do is identify ALL char, varchar, nchar, and nvarchar columns in the TABLE_SCHEMA.
I have a ForEach loop to iterate over all of these identified tables/columns. In the loop I do a couple of things:
First, I create a SQL statement using VBScript. It outputs two variables; SQL1 and SQL2. They are:
--sTable and sColumn are pulled in from the ForEach loop
SQL1 = "SELECT * FROM " & sTable & " WHERE PATINDEX("
SQL2 = ", (" & sColumn & ") COLLATE Latin1_General_BIN) <> 0"
Next, I run the SQL code from my first post. Here is the unaltered code in that SQL Task:
DECLARE @Counter INTEGER
DECLARE @Pattern NVARCHAR(MAX)
DECLARE @loop int
DECLARE @sSQL NVARCHAR(MAX)
SET @Pattern = '%['
SET @Pattern = @Pattern + NCHAR(0) + NCHAR(1) + NCHAR(2) + NCHAR(3) + NCHAR(4) + NCHAR (5) + NCHAR(6) + NCHAR(7) + NCHAR(8) + NCHAR(11) + NCHAR(12) + NCHAR(14) + NCHAR(15) + NCHAR(16) + NCHAR(17) + NCHAR(18) + NCHAR(19) + NCHAR(20) + NCHAR(21) + NCHAR(22) + NCHAR(23) + NCHAR(24) + NCHAR(25) + NCHAR(26) + NCHAR(27) + NCHAR(28) + NCHAR(29) + NCHAR(30) + NCHAR(31)
SET @loop = 55296
WHILE @loop < 57344
BEGIN
SET @Pattern = @Pattern + NCHAR (@loop)
SET @loop = @loop + 1
END
SET @Pattern = @Pattern + NCHAR(65534) + NCHAR(65535)
SET @Pattern = @Pattern + ']%'
SET @sSQL = ? + '''' + @Pattern + '''' + ?
EXEC (@sSQL)
After this step, I do some logging and such that isn't important to this question, so I'll stop it there.
Now, the reason why I am using the VBScript to create SQL1 and SQL2 is because I cannot include @Pattern in dynamic SQL. (Must declare the scalar variable "@Pattern".) I have to use dynamic SQL because I am changing the table and/or column each time as I iterate over my list.
I am open to any other suggestions or different approaches. I've been scratching my head over this for a couple of weeks now.
August 29, 2014 at 8:30 am
Have you tried using an SC collation? There's info here. Your ? might be the first of a surrogate pair.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 29, 2014 at 8:37 am
ChrisM@Work (8/29/2014)
Have you tried using an SC collation? There's info here. Your ? might be the first of a surrogate pair.
I thought supplementary char collations were not in 2k8? When I check fn_helpcollations, I don't see any with _SC. Am I missing something?
August 29, 2014 at 9:14 am
A little more testing and it looks like it might really be a collation issue. If I comment out the following, the return comes back as I expect.
SET @loop = 55296
WHILE @loop < 57344
--WHILE @loop < 55296
BEGIN
SET @Pattern = @Pattern + NCHAR (@loop)
SET @loop = @loop + 1
END
SET @Pattern = @Pattern + NCHAR(65534) + NCHAR(65535)
I am guessing that the database is not set up to handle SC or these higher valued chars in the first place. If this is the case, then I shouldn't need to scan for them because they can't exist in the first place. May end up being a non-issue.
Any additional input?
August 29, 2014 at 9:34 am
Solved:
DECLARE @Pattern NVARCHAR(MAX)
DECLARE @sSQL NVARCHAR(MAX)
DECLARE @1 nvarchar(max)
DECLARE @2 nvarchar(max)
DECLARE @loop int
SET @Pattern = '%['
SET @Pattern = @Pattern + NCHAR(0) + NCHAR(1) + NCHAR(2) + NCHAR(3) + NCHAR(4) + NCHAR (5) + NCHAR(6) + NCHAR(7) + NCHAR(8) + NCHAR(11) + NCHAR(12) + NCHAR(14) + NCHAR(15) + NCHAR(16) + NCHAR(17) + NCHAR(18) + NCHAR(19) + NCHAR(20) + NCHAR(21) + NCHAR(22) + NCHAR(23) + NCHAR(24) + NCHAR(25) + NCHAR(26) + NCHAR(27) + NCHAR(28) + NCHAR(29) + NCHAR(30) + NCHAR(31)
SET @loop = 55296
WHILE @loop < 57344
BEGIN
SET @Pattern = @Pattern + NCHAR(@loop)
SET @loop = @loop + 1
END
SET @Pattern = @Pattern + NCHAR(65534) + NCHAR(65535) + ']%'
------------------------------------------------------------------------------------------------------------------------------------
-- Fails: note PATINDEX('
SET @sSQL = '
SELECT * FROM (SELECT Alternate_Name = N''Sample?String'') d
CROSS APPLY (SELECT pos = PATINDEX('+ '''' + @Pattern + '''' + ', CONVERT(NVARCHAR(MAX), LTRIM(RTRIM(Alternate_Name)))
COLLATE Latin1_General_BIN)) x
WHERE pos <> 0'
EXEC (@ssql)
-----------------------------------------------------------------------------------------------------------------------------------
-- Works: note PATINDEX(N'
SET @sSQL = '
SELECT * FROM (SELECT Alternate_Name = N''Sample?String'') d
CROSS APPLY (SELECT pos = PATINDEX(N'+ '''' + @Pattern + '''' + ', CONVERT(NVARCHAR(MAX), LTRIM(RTRIM(Alternate_Name)))
COLLATE Latin1_General_BIN)) x
WHERE pos <> 0'
EXEC (@ssql)
----------------------------------------------------------------------------------------------------------------------------------
-- funny - they look the same
;WITH
E1 AS (SELECT * FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n)),
E2 (n) AS (SELECT 0 FROM E1 a, E1 b),
iTally (n) AS (SELECT TOP (LEN(@ssql)) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM E2 a, E2 b)
SELECT
[Char] = SUBSTRING(@ssql,n,1),
[UNICODECode] = UNICODE(SUBSTRING(@ssql,n,1)),
[ASCIICode] = ASCII(SUBSTRING(@ssql,n,1))
FROM iTally
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 29, 2014 at 9:43 am
Wow, thanks Chris! I can't believe I forgot to N' the string. That single oversight fixed the entire routine.
Good information to know on the CHAR/Unicode/ASCII differences. I've never had to work much above 255's until now, so much appreciated and a good learning experience.
August 29, 2014 at 9:47 am
joseph.robinson83 (8/29/2014)
Wow, thanks Chris! I can't believe I forgot to N' the string. That single oversight fixed the entire routine.Good information to know on the CHAR/Unicode/ASCII differences. I've never had to work much above 255's until now, so much appreciated and a good learning experience.
Thanks Joseph - it was a learning experience for me too!
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply