May 9, 2011 at 12:46 pm
Thanks Jeff this is an excellent bit of code!
A few problems I see however
1) It is grabbing columns with the dash ('-') as 'weird' columns even though looking at the list of valid characters I see the dash in there. I did a temporary fix of adding AND Testfield not like '%-%' to the code which using this on my actual table yields the same results as my original function (with the exception of one issue which ill explain in a minute) albeit about 4x faster 😀
2) The ASCII=RIGHT(Testsstring,1)) is pulling only the ASCII character number for the rightmost character in the string, not necessarily the illegal character. What if it appears in the middle or front? I am trying to come up with something now to get that in the mix as well 😀
3) Last but not least, my function takes care of leading as well (which I never really said in my original post so I apologize, not your fault!) I believe from my research that the ascii code for leading and trailing spaces is 160 whereas spaces within the string is 32 (though I admit I could be wrong)
Thank you again for the code, it has given me a good start to work from! One question I do have however, is there a way to get around the loop of having to execute this code against each character column one at a time? I can't think of a tally table solution to that so I am open to ideas 🙂
Link to my blog http://notyelf.com/
May 9, 2011 at 1:21 pm
Figured out how to get the characters in the substring to show where in the string the illegal character is and what illegal character it is. still working on the other issues
DECLARE @LegalCharacters VARCHAR(256)
SELECT @LegalCharacters = ISNULL(@LegalCharacters,'')+CHAR(t.N)
FROM dbo.Tally t
WHERE t.N between 32 and 126
OR t.N IN (220,146,252,167,214,246,196,228,145,147,148,150,180,188,189,190,201,233,166,195,227,193,225,199,231,209,241)
;
--===== Add the "escape" characters to the characters normally used as WildCards and character range blocks.
-- In this case, I've used an underscore as the "escape" character.
SELECT @LegalCharacters = REPLACE(REPLACE(REPLACE(REPLACE(@LegalCharacters,'_','__'),'%','_%'),'[','_['),']','_]')
;
--===== And now we find ALL of the "cells" in the column that have a "wierd" character.
-- The "AsciiValue" column shows the decimal ASCII value of the "weird" character found.
-- I didn't use every possible wierd character but it will work for all "weird" characters.
SELECT RowNum, TestString, ascii(SUBSTRING(TestString,T.N,1)), SUBSTRING(TestString,T.N,1)
FROM #TestTable AS Tbl, dbo.tally AS T
WHERE TestString LIKE '%[^'+@LegalCharacters+']%' ESCAPE '_'
and T.N <= LEN(Teststring)
and SUBSTRING(Teststring,T.N,1) LIKE '%[^'+@LegalCharacters+']%' ESCAPE '_'
Link to my blog http://notyelf.com/
May 9, 2011 at 4:35 pm
Ok finally got the code to work out the 3 issues I had I have posted below. I notice originally the code you posted ran in about 6 seconds on my computer. The modified code runs in less than 1 second. I am thinking that is because it forces a clustered index seek on the dbo.tally table (which was a side affect of my code and not intended but very welcomed!)
DECLARE @LegalCharacters VARCHAR(256)
SELECT @LegalCharacters = ISNULL(@LegalCharacters,'')+CHAR(t.N)
FROM dbo.Tally t
WHERE t.N between 32 and 126
OR t.N IN (220,146,252,167,214,246,196,228,145,147,148,150,180,188,189,190,201,233,166,195,227,193,225,199,231,209,241)
;
--===== Add the "escape" characters to the characters normally used as WildCards and character range blocks.
-- In this case, I've used an underscore as the "escape" character.
SELECT @LegalCharacters = REPLACE(REPLACE(REPLACE(REPLACE(@LegalCharacters,'_','__'),'%','_%'),'[','_['),']','_]')
;
--===== And now we find ALL of the "cells" in the column that have a "wierd" character.
-- The "AsciiValue" column shows the decimal ASCII value of the "weird" character found.
-- I didn't use every possible wierd character but it will work for all "weird" characters.
SELECT RowNum, TestString, ascii(SUBSTRING(TestString,T.N,1)), SUBSTRING(TestString,T.N,1) FROM #TestTable AS Tbl, dbo.tally AS T
WHERE T.N <= LEN(Teststring) AND SUBSTRING(Teststring,T.N,1) LIKE '%[^'+@LegalCharacters+']%' ESCAPE '_' AND SUBSTRING(Teststring,T.N,1) not like '%-%'
AND (TestString LIKE '%[^'+@LegalCharacters+']%' ESCAPE '_' OR (TestString LIKE '%[^'+@LegalCharacters+']%' ESCAPE '_' and TESTSTRING NOT LIKE '%-%'))
Link to my blog http://notyelf.com/
May 9, 2011 at 5:05 pm
I'll show you how to fix the dash problem without all the special handling you've done but I still don't understand what you're trying to do with the Tally Table. I thought all you wanted to do was find items that had the weird characters... not split the strings up.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 9, 2011 at 5:14 pm
shannonjk (5/9/2011)
2) The ASCII=RIGHT(Testsstring,1)) is pulling only the ASCII character number for the rightmost character in the string, not necessarily the illegal character. What if it appears in the middle or front? I am trying to come up with something now to get that in the mix as well 😀
The ASCII=RIGHT(TestString,1) is just to show you what it found as part of this test. I realize the weird character could be anywhere in the string. It was just simpler to make a predictable test.
3) Last but not least, my function takes care of leading as well (which I never really said in my original post so I apologize, not your fault!) I believe from my research that the ascii code for leading and trailing spaces is 160 whereas spaces within the string is 32 (though I admit I could be wrong)
No... normal leading, trailing, and embedded spaces are ASCII 32. The 160 character is actually a "hard space" and it get's put there by people doing things like "justification" so HTML and XML doesn't compress them to nothing.
Again, though... I thought you said in your original post that you only wanted to find "cells" that had weird characters which the basis of my code actually does. Are you now saying that you actually want to get rid of the weird characters "auto-magically"?
[/quote]
--Jeff Moden
Change is inevitable... Change for the better is not.
May 9, 2011 at 5:40 pm
I believe I found a slightly alternate solution however, to answer your question first, I am not trying to split up the string just identify the records that have the 'weird' characters in them and identify what they are. That is really just my first step, from there I will determine what to do with those records. It doesn't happen very often so I really don't have enough of a sample source to come up with automation in handling what to do once I have found the 'weird' characters. Your code does pretty much exactly what I was looking for so I think you understood just fine 🙂
Here is the alternate code I came up with
-- Creates a CTE with legal characters
;with cteTally as (select N
FROM dbo.Tally t WHERE t.N between 32 and 126 OR t.N IN (220,146,252,167,214,246,196,228,145,147,148,150,180,188,189,190,201,233,166,195,227,193,225,199,231,209,241))
-- Selects all rows where illegal characters appear in column
-- Added distinct clause due to multiple instances of leading and trailing spaces.
SELECT distinct RowNum, TestString, ascii(SUBSTRING(TestString,I.N,1)), SUBSTRING(TestString,I.N,1) FROM #TestTable AS Tbl, dbo.tally AS I
where ascii(SUBSTRING(TestString,I.N,1)) not in (select N from cteTally) and I.N <= LEN(TestString)
Link to my blog http://notyelf.com/
May 9, 2011 at 5:47 pm
I am not sure what you mean by 'cells'
I want exactly what your code does which is find illegal characters in a given column of a given table.
That makes sense then on the #2 then. I just had one other post in between your postings there that I didn't see your last but I will reiterate that I don't want anything 'auto-magically' done at this point 😀
Like I said your code does exactly what I need I just needed to get the dash to not come up. I also need to find out why, despite doing rtrim and ltrim on most of the source columns in my database, it is coming up with leading and trailing spaces :-). The new code I just posted found those char(160) as well though so all in all seems to be working well.
What I was asking additionally though, is there a way without using a loop to cycle through each character column in each table within a database to apply this script? All I can think of (which is what I am using now) is to create a table variable that has all the tables and their corresponding character columns with an ID column. then I loop through table via ID column and create a sql script for each record and execute it so by the end I have a large output of the results of each sql script being executed for each char column of each table.
Hopefully I am making as much sense I as I think I am 😀 You have been really helpful, and I appreciate it!
Link to my blog http://notyelf.com/
May 9, 2011 at 9:46 pm
By a "cell", I mean an intersection of a row and a column. I should have just said "data element".
I forgot about the dash. It's used as a range element character. To use it the way we want, it needs to be the first character in the "valid" list of characters. Might be able to use an "escape" character on it, but have never tried it before.
A loop to trip across columns and across tables won't be a slow thing because of how comparatively little work it'll actually do. It can, however, be avoided if you really want to. SQL Server 2005 and up make it real easy to do so.
It's getting late here so I'll have to whittle on this tomorrow but thought I'd give you a couple of ideas in the meantime.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 10, 2011 at 11:02 am
Thanks again Jeff. I see what you are saying on the Cell now 😀
I will have to play with the dash thing as well to see if I can capture it however, the alternate code I posted seems to work because it references the - by its ascii number instead of a literal reference.
That makes sense as far as looping through the tables and columns, I didn't think it was too bad since it never seems to be very much of a load but, after hearing for years and years that 'loops are bad' I guess deep down in my psyche yearns for a non-loop solution lol. I will continue to see if I can find one in the mean time 😀
Thanks again Jeff, I had a lot of fun and learned a lot in this process!
Link to my blog http://notyelf.com/
May 27, 2011 at 9:49 am
Awesome article! Definitely something I've never done, but I believe this article will get me there.
"For an optional solution, please refer to the following URL"
Optimal, perhaps?
And the URL?
Thanks!
May 27, 2011 at 11:24 am
SQL Server Youngling (5/27/2011)
Awesome article! Definitely something I've never done, but I believe this article will get me there."For an optional solution, please refer to the following URL"
Optimal, perhaps?
And the URL?
Thanks!
It is here - http://www.sqlservercentral.com/articles/Tally+Table/72993/
HTH
Dave J
June 15, 2011 at 9:02 pm
David Jackson (5/27/2011)
SQL Server Youngling (5/27/2011)
Awesome article! Definitely something I've never done, but I believe this article will get me there."For an optional solution, please refer to the following URL"
Optimal, perhaps?
And the URL?
Thanks!
It is here - http://www.sqlservercentral.com/articles/Tally+Table/72993/
HTH
Dave J
I can't believe I left off the URL. :blush: I just submitted an update to the article to include it. Thank you for the catch, folks.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 15, 2011 at 9:04 pm
SQL Server Youngling (5/27/2011)
Awesome article! Definitely something I've never done, but I believe this article will get me there."For an optional solution, please refer to the following URL"
Optimal, perhaps?
And the URL?
Thanks!
Thank you very much for the great feedback.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 20, 2011 at 4:13 am
ermm...is it just me or can anyone else 'see' the article..? I only get the second code block but then no more text..?
June 20, 2011 at 4:21 am
No... it's not just you. Something has gone dreadfully wrong (I hope I have a copy somewhere). I send an email to Steve Jones.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 331 through 345 (of 511 total)
You must be logged in to reply to this topic. Login to reply