junk characters in a field

  • Hi ,

    You can check with below code also

    Declare @ sql varchar(Max)

    Delcare @Tablename Varchar(255)

    SET Tablename = 'Test1'

    SET @Sql=( 'UPDATE '+ @TableName + '

    SET PatientAddress2 =NULL

    FROM '+ @TableName + '

    WHERE PatientAddress2 IS NOT NULL AND

    (LOWER(RTRIM(LTRIM(PatientAddress2))) NOT LIKE ''[a-z]%'' AND RTRIM(LTRIM(PatientAddress2)) NOT LIKE ''[0-9]%''

    AND RTRIM(LTRIM(PatientAddress2)) not like ''#%'')

    AND LEN(LTRIM(RTRIM(PatientAddress2)))<=1 and isnull(ASCII(LEFT(LTRIM(RTRIM(PatientAddress2)),1)),0)=0 ')

    Exec (@Sql)

  • ...

    It's entirely possible someone with better C#/VB.NET skill than mine can make a CLR function that's faster, but the options I found online for that were all much, much slower, or buggy (in one case, it scrambled character-sequences in some cases).

    ...

    That is absolute true! If you need CLR function - give it to someone with good c# skills and knowledge about how CLR functions should be written to achieve the best possible performance when called from SQL. Again, it will require better than average c# skills.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (4/12/2012)


    ...

    It's entirely possible someone with better C#/VB.NET skill than mine can make a CLR function that's faster, but the options I found online for that were all much, much slower, or buggy (in one case, it scrambled character-sequences in some cases).

    ...

    That is absolute true! If you need CLR function - give it to someone with good c# skills and knowledge about how CLR functions should be written to achieve the best possible performance when called from SQL. Again, it will require better than average c# skills.

    Which is yet another reason to use code like mine. It doesn't require a specialized skillset to maintain/modify/refactor mine, but CLR does require a more rare skillset. Something to keep in mind when picking solutions. Not just "does it do what I need?", but "can I understand it well enough to extend/fix it if I need to in 6 months?" That's an important factor.

    That, of course, ignores the security issues with enabling CLR, and other potential drawbacks to it.

    Of course, mine has some of the same drawbacks. Not portable outside of T-SQL (Oracle/DB2/MySQL, you'll have to re-write mine or CLR). Can't be used in engines prior to SQL 2005.

    And mine does require a Numbers table, which does take disk space (minor), and needs to be kept intact (minimal effort, but matters if you're deploying to customer servers, et al), which a CLR solution doesn't need.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I just ran another speed test. 23,028 rows needed phone numbers cleaned. Took 734 milliseconds total, 733 ms CPU time.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (4/12/2012)


    Eugene Elutin (4/12/2012)


    ...

    It's entirely possible someone with better C#/VB.NET skill than mine can make a CLR function that's faster, but the options I found online for that were all much, much slower, or buggy (in one case, it scrambled character-sequences in some cases).

    ...

    That is absolute true! If you need CLR function - give it to someone with good c# skills and knowledge about how CLR functions should be written to achieve the best possible performance when called from SQL. Again, it will require better than average c# skills.

    Which is yet another reason to use code like mine. It doesn't require a specialized skillset to maintain/modify/refactor mine, but CLR does require a more rare skillset. Something to keep in mind when picking solutions. Not just "does it do what I need?", but "can I understand it well enough to extend/fix it if I need to in 6 months?" That's an important factor.

    That, of course, ignores the security issues with enabling CLR, and other potential drawbacks to it.

    Of course, mine has some of the same drawbacks. Not portable outside of T-SQL (Oracle/DB2/MySQL, you'll have to re-write mine or CLR). Can't be used in engines prior to SQL 2005.

    And mine does require a Numbers table, which does take disk space (minor), and needs to be kept intact (minimal effort, but matters if you're deploying to customer servers, et al), which a CLR solution doesn't need.

    What happens to your code if you build it with a dynamic tally table as part of the code?

  • Lynn Pettis (4/12/2012)


    GSquared (4/12/2012)


    Eugene Elutin (4/12/2012)


    ...

    It's entirely possible someone with better C#/VB.NET skill than mine can make a CLR function that's faster, but the options I found online for that were all much, much slower, or buggy (in one case, it scrambled character-sequences in some cases).

    ...

    That is absolute true! If you need CLR function - give it to someone with good c# skills and knowledge about how CLR functions should be written to achieve the best possible performance when called from SQL. Again, it will require better than average c# skills.

    Which is yet another reason to use code like mine. It doesn't require a specialized skillset to maintain/modify/refactor mine, but CLR does require a more rare skillset. Something to keep in mind when picking solutions. Not just "does it do what I need?", but "can I understand it well enough to extend/fix it if I need to in 6 months?" That's an important factor.

    That, of course, ignores the security issues with enabling CLR, and other potential drawbacks to it.

    Of course, mine has some of the same drawbacks. Not portable outside of T-SQL (Oracle/DB2/MySQL, you'll have to re-write mine or CLR). Can't be used in engines prior to SQL 2005.

    And mine does require a Numbers table, which does take disk space (minor), and needs to be kept intact (minimal effort, but matters if you're deploying to customer servers, et al), which a CLR solution doesn't need.

    What happens to your code if you build it with a dynamic tally table as part of the code?

    Reduced CPU time 633 ms for the same records. Limited the Numbers CTE to 100 rows, because this is phone numbers and doesn't go beyond that in length.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (4/12/2012)


    Lynn Pettis (4/12/2012)


    GSquared (4/12/2012)


    Eugene Elutin (4/12/2012)


    ...

    It's entirely possible someone with better C#/VB.NET skill than mine can make a CLR function that's faster, but the options I found online for that were all much, much slower, or buggy (in one case, it scrambled character-sequences in some cases).

    ...

    That is absolute true! If you need CLR function - give it to someone with good c# skills and knowledge about how CLR functions should be written to achieve the best possible performance when called from SQL. Again, it will require better than average c# skills.

    Which is yet another reason to use code like mine. It doesn't require a specialized skillset to maintain/modify/refactor mine, but CLR does require a more rare skillset. Something to keep in mind when picking solutions. Not just "does it do what I need?", but "can I understand it well enough to extend/fix it if I need to in 6 months?" That's an important factor.

    That, of course, ignores the security issues with enabling CLR, and other potential drawbacks to it.

    Of course, mine has some of the same drawbacks. Not portable outside of T-SQL (Oracle/DB2/MySQL, you'll have to re-write mine or CLR). Can't be used in engines prior to SQL 2005.

    And mine does require a Numbers table, which does take disk space (minor), and needs to be kept intact (minimal effort, but matters if you're deploying to customer servers, et al), which a CLR solution doesn't need.

    What happens to your code if you build it with a dynamic tally table as part of the code?

    Reduced CPU time 633 ms for the same records. Limited the Numbers CTE to 100 rows, because this is phone numbers and doesn't go beyond that in length.

    Reduced CPU time to 633 ms or by 633 ms? I'm going to go with to 633 ms shaving off 100 ms from what you reported earlier.

  • Lynn Pettis (4/12/2012)


    GSquared (4/12/2012)


    Lynn Pettis (4/12/2012)


    GSquared (4/12/2012)


    Eugene Elutin (4/12/2012)


    ...

    It's entirely possible someone with better C#/VB.NET skill than mine can make a CLR function that's faster, but the options I found online for that were all much, much slower, or buggy (in one case, it scrambled character-sequences in some cases).

    ...

    That is absolute true! If you need CLR function - give it to someone with good c# skills and knowledge about how CLR functions should be written to achieve the best possible performance when called from SQL. Again, it will require better than average c# skills.

    Which is yet another reason to use code like mine. It doesn't require a specialized skillset to maintain/modify/refactor mine, but CLR does require a more rare skillset. Something to keep in mind when picking solutions. Not just "does it do what I need?", but "can I understand it well enough to extend/fix it if I need to in 6 months?" That's an important factor.

    That, of course, ignores the security issues with enabling CLR, and other potential drawbacks to it.

    Of course, mine has some of the same drawbacks. Not portable outside of T-SQL (Oracle/DB2/MySQL, you'll have to re-write mine or CLR). Can't be used in engines prior to SQL 2005.

    And mine does require a Numbers table, which does take disk space (minor), and needs to be kept intact (minimal effort, but matters if you're deploying to customer servers, et al), which a CLR solution doesn't need.

    What happens to your code if you build it with a dynamic tally table as part of the code?

    Reduced CPU time 633 ms for the same records. Limited the Numbers CTE to 100 rows, because this is phone numbers and doesn't go beyond that in length.

    Reduced CPU time to 633 ms or by 633 ms? I'm going to go with to 633 ms shaving off 100 ms from what you reported earlier.

    Yeah. I started to type, "Reduced CPU time by ..." and list a percentage. Then decided the actual number of milliseconds it ended up with would be better, and deleted "by" and forgot to put "to" in there.

    Yes, reduced from 734 ms to 633 ms. About 11%.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared,

    It is good to know it when someone uses a posted solution in a production environment because it at least confirms that it probably works pretty well (i.e., no bugs).

    The reason I seemed a little testy early on about performance is because I took your solution and stacked in next to mine to check the execution plan. In that case, I added a 100 number CTE tally. The results screamed at me that mine would perform better as your query plan took up 97% of the cost (without the UPDATE). Obviously, I know that isn't the full story and that's why I didn't post that result before. The speed you're reporting looks pretty good in fact it's probably doubtful that mine will beat it, but at least it was fun to write.

    Until that is when you made that ampersand bug jump out at me. Very irritating side effect of XML and it made the code-around ugly.

    Suffice it to say there are a couple of lessons learned here and I may still yet take a page from Jeff Moden, anti-RBAR Jedi and use the 1000000 row test harness to see just how bad my solution is.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • After all was said and done, I decided to add the famous 1000000 row test harness (only with 100000 rows though), so I came up with the following two tests and results (run on Dell Inspiron Core i5 4GB).

    1st solution: Recursive CTE followed by Dynamic SQL to do the UPDATE:

    CREATE TABLE #t (strings VARCHAR(25));

    CREATE TABLE #b (bad CHAR(1));

    DECLARE @SQL NVARCHAR(MAX), @n INT, @i INT;

    DECLARE @base VARCHAR(83)

    SET @base = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789~!@#$%^&*()_+-=[]{}?/'

    DECLARE @StartValue INT, @EndValue INT, @Range INT

    SELECT @StartValue = 1

    ,@EndValue = 83

    ,@Range = @EndValue - @StartValue + 1

    INSERT INTO #t (strings)

    SELECT ''

    UNION ALL SELECT '@'

    UNION ALL SELECT '&'

    ;WITH cteTally (n) AS (

    SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns ac1 CROSS JOIN sys.all_columns ac2)

    INSERT INTO #t (strings)

    SELECT SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    FROM cteTally

    SET STATISTICS TIME ON

    PRINT 'QUERY 1 START'

    -- Identify the bad characters

    ;WITH BadStrings AS (

    SELECT strings

    FROM #t

    WHERE strings LIKE '%[^A-Za-z0-9]%'

    )

    ,Parse AS (

    SELECT SUBSTRING(strings + '*', PATINDEX('%[^A-Za-z0-9]%',strings + '*'), 1) as bad

    ,SUBSTRING(strings + '*', 1+PATINDEX('%[^A-Za-z0-9]%',strings + '*'), LEN(strings + '*')) as rest

    FROM BadStrings

    UNION ALL

    SELECT SUBSTRING(rest, PATINDEX('%[^A-Za-z0-9]%',rest), 1)

    ,SUBSTRING(rest, 1+PATINDEX('%[^A-Za-z0-9]%',rest), LEN(rest))

    FROM Parse WHERE LEN(rest) > 0 and

    SUBSTRING(rest, 1+PATINDEX('%[^A-Za-z0-9]%',rest), LEN(rest)) <> ''

    )

    INSERT INTO #b (bad)

    SELECT DISTINCT bad

    FROM Parse b

    PRINT 'QUERY 1 END'

    -- Now clean them up

    SELECT @SQL = 'UPDATE #t SET strings = ', @i = 1, @n = (SELECT COUNT(*) FROM #b WHERE bad <> '&')

    WHILE @i <= @n + 1

    SELECT @SQL = @SQL + 'REPLACE(', @i = @i + 1

    SELECT @SQL = @SQL + 'strings, '''

    +(SELECT bad + ''', ''''), '''FROM #bWHERE bad <> '&' FOR XML PATH(''))

    SELECT @SQL = SUBSTRING(@SQL, 1, LEN(@SQL)-3) + ',''&'','''')' + 'WHERE strings LIKE ''%[^A-Za-z0-9]%'''

    --SELECT @SQL

    PRINT 'QUERY 2 START'

    EXEC (@SQL)

    SELECT * FROM #t

    PRINT 'QUERY 2 END'

    DROP TABLE #t

    DROP TABLE #b

    Results:

    1. 100003 rows updated in about 13 seconds including data set up time

    2nd solution: Provided by GSquared:

    CREATE TABLE #t (strings VARCHAR(25));

    DECLARE @base VARCHAR(83)

    SET @base = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789~!@#$%^&*()_+-=[]{}?/'

    DECLARE @StartValue INT, @EndValue INT, @Range INT

    SELECT @StartValue = 1

    ,@EndValue = 83

    ,@Range = @EndValue - @StartValue + 1

    INSERT INTO #t (strings)

    SELECT ''

    UNION ALL SELECT '@'

    UNION ALL SELECT '&'

    ;WITH cteTally (n) AS (

    SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns ac1 CROSS JOIN sys.all_columns ac2)

    INSERT INTO #t (strings)

    SELECT SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue,1)

    FROM cteTally

    SET STATISTICS TIME ON

    PRINT 'QUERY START'

    ;WITH Numbers (Number) AS (

    SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns ac1 CROSS JOIN sys.all_columns ac2)

    SELECT *,

    (

    SELECT Sub + ''

    FROM (SELECT SUBSTRING(Strings, Number, 1) AS Sub,

    Number

    FROM Numbers -- table of positive integers

    WHERE Number BETWEEN 1 AND LEN(Strings)) AS Parser

    WHERE Sub LIKE '[0-9a-zA-Z]'

    ORDER BY Number

    FOR XML PATH(''),

    TYPE).value('.[1]', 'varchar(100)') fixedstring

    FROM #T ;

    PRINT 'QUERY END'

    DROP TABLE #t

    --DROP TABLE #b

    Results:

    1. Code only SELECTs the resulting fixed strings (no UPDATE)

    2. Query cancelled after 4:52 (mm:ss) with 13158 rows returned

    I must have done something wrong!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • You can't compare a Select to an Update.

    Here's a test of my method, using the test data you came up with:

    SET NOCOUNT ON ;

    IF OBJECT_ID(N'tempdb..#t') IS NOT NULL

    DROP TABLE #t ;

    CREATE TABLE #t (strings VARCHAR(25)) ;

    DECLARE @base VARCHAR(83)

    SET @base = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789~!@#$%^&*()_+-=[]{}?/'

    DECLARE @StartValue INT,

    @EndValue INT,

    @Range INT

    SELECT @StartValue = 1,

    @EndValue = 83,

    @Range = @EndValue - @StartValue + 1

    INSERT INTO #t

    (strings)

    SELECT ''

    UNION ALL

    SELECT '@'

    UNION ALL

    SELECT '&' ;

    WITH cteTally(n)

    AS (SELECT TOP 100000

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2)

    INSERT INTO #t

    (strings)

    SELECT SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1)

    FROM cteTally

    PRINT 'QUERY START'

    SET STATISTICS TIME ON ;

    WITH Seeds(Seed)

    AS (SELECT V

    FROM ( VALUES ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1) ) AS T (V)),

    Numbers(Number)

    AS (SELECT ROW_NUMBER() OVER (ORDER BY S1.Seed)

    FROM Seeds AS S1

    CROSS JOIN Seeds AS S2)

    UPDATE #t

    SET strings =

    ( SELECT Sub + ''

    FROM (SELECT SUBSTRING(Strings, Number, 1) AS Sub,

    Number

    FROM Numbers -- table of positive integers

    WHERE Number BETWEEN 1 AND LEN(Strings)) AS Parser

    WHERE Sub LIKE '[0-9a-zA-Z]'

    ORDER BY Number

    FOR XML PATH(''),

    TYPE).value('.[1]', 'varchar(100)')

    SET STATISTICS TIME OFF ;

    PRINT 'QUERY END'

    DROP TABLE #t

    --DROP TABLE #b

    On my machine, results are:

    QUERY START

    SQL Server Execution Times:

    CPU time = 5257 ms, elapsed time = 6053 ms.

    QUERY END

    Same machine, your version took about 12 seconds. Twice as long. Remove the output from it, and it looks like this:

    SET NOCOUNT ON ;

    IF OBJECT_ID(N'tempdb..#t') IS NOT NULL

    DROP TABLE #t ;

    IF OBJECT_ID(N'tempdb..#b') IS NOT NULL

    DROP TABLE #b ;

    CREATE TABLE #t (strings VARCHAR(25)) ;

    CREATE TABLE #b (bad CHAR(1)) ;

    DECLARE @SQL NVARCHAR(MAX),

    @n INT,

    @i INT ;

    DECLARE @base VARCHAR(83)

    SET @base = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789~!@#$%^&*()_+-=[]{}?/'

    DECLARE @StartValue INT,

    @EndValue INT,

    @Range INT

    SELECT @StartValue = 1,

    @EndValue = 83,

    @Range = @EndValue - @StartValue + 1

    INSERT INTO #t

    (strings)

    SELECT ''

    UNION ALL

    SELECT '@'

    UNION ALL

    SELECT '&' ;

    WITH cteTally(n)

    AS (SELECT TOP 100000

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2)

    INSERT INTO #t

    (strings)

    SELECT SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1) + SUBSTRING(@base,

    ABS(CHECKSUM(NEWID()))

    % @Range

    + @StartValue, 1)

    + SUBSTRING(@base, ABS(CHECKSUM(NEWID())) % @Range + @StartValue, 1)

    FROM cteTally

    PRINT 'QUERY 1 START'

    SET STATISTICS TIME ON

    -- Identify the bad characters

    ;

    WITH BadStrings

    AS (SELECT strings

    FROM #t

    WHERE strings LIKE '%[^A-Za-z0-9]%'),

    Parse

    AS (SELECT SUBSTRING(strings + '*', PATINDEX('%[^A-Za-z0-9]%', strings + '*'), 1) AS bad,

    SUBSTRING(strings + '*', 1 + PATINDEX('%[^A-Za-z0-9]%', strings + '*'), LEN(strings + '*')) AS rest

    FROM BadStrings

    UNION ALL

    SELECT SUBSTRING(rest, PATINDEX('%[^A-Za-z0-9]%', rest), 1),

    SUBSTRING(rest, 1 + PATINDEX('%[^A-Za-z0-9]%', rest), LEN(rest))

    FROM Parse

    WHERE LEN(rest) > 0

    AND SUBSTRING(rest, 1 + PATINDEX('%[^A-Za-z0-9]%', rest), LEN(rest)) <> '')

    INSERT INTO #b

    (bad)

    SELECT DISTINCT

    bad

    FROM Parse b

    PRINT 'QUERY 1 END'

    -- Now clean them up

    SELECT @SQL = 'UPDATE #t SET strings = ',

    @i = 1,

    @n = (SELECT COUNT(*)

    FROM #b

    WHERE bad <> '&')

    WHILE @i <= @n + 1

    SELECT @SQL = @SQL + 'REPLACE(',

    @i = @i + 1

    SELECT @SQL = @SQL + 'strings, ''' + (SELECT bad + ''', ''''), '''

    FROM #b

    WHERE bad <> '&'

    FOR XML PATH(''))

    SELECT @SQL = SUBSTRING(@SQL, 1, LEN(@SQL) - 3) + ',''&'','''')' + 'WHERE strings LIKE ''%[^A-Za-z0-9]%'''

    --SELECT @SQL

    PRINT 'QUERY 2 START'

    EXEC (@SQL)

    SET STATISTICS TIME OFF ;

    PRINT 'QUERY 2 END'

    Results:

    QUERY 1 START

    SQL Server Execution Times:

    CPU time = 8674 ms, elapsed time = 8849 ms.

    QUERY 1 END

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    QUERY 2 START

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 1045 ms, elapsed time = 1039 ms.

    SQL Server Execution Times:

    CPU time = 1076 ms, elapsed time = 1062 ms.

    QUERY 2 END

    Total runtime, about 11 seconds, not including the data generation.

    By the way, the two-step CTE Numbers table (Seeds then Numbers) that I used in my modification, is faster than using system tables to generate it. If you're going to query a table, from disk or cache, you may as well use an actual Numbers table instead of a CTE version. The whole idea is to make it a purely RAM+CPU object with no table behind it at all. That avoids locking overhead, which is the main reason a virtual table is faster than a physical one.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • On seconds runs (taking advantage of caching), my update-only version of your script took about 11 seconds, while my script took 3.8 seconds.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared,

    I see no flaws in your comparison. Clearly the UPDATE version of yours is faster than the original one you posted with only a SELECT. Too bad I didn't take the time to make that modification myself.

    I can't take credit for the SYS tables tally generator nor the random numbers generator as I appropriated them from Jeff Moden's excellent articles on the subject. I'll let him take it up with you if he disagrees on which is faster.

    I would like to point out though, that if you know in advance exactly what are "bad" characters, you don't need to run my initial SELECT (that was really in there kind of just for fun anyway). You can just keep them stored in a table or put them into a CTE and then run only my Dynamic SQL, which from your timing runs appears to run in about 2 seconds.

    Thanks for taking the time to get it all sorted out.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • You don't need to know the bad characters at all. White-list, don't black-list. It's more secure, easier, and faster.

    You could modify my version to use a table of "allowed characters", skip the Like statement and join to that table (virtual or real), and probably gain a few milliseconds on the whole thing.

    And, yes, I've discussed uses of Numbers tables with Jeff. More than once. Several times, I've helped him speed up code he was using for various things. His skillset and mine overlap considerably. I was one of the people who told him to start using a 0-base Numbers/Tally table several years ago, instead of starting it at 1, to get a performance improvement on string parsing, for example.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (4/13/2012)


    You don't need to know the bad characters at all. White-list, don't black-list. It's more secure, easier, and faster.

    You could modify my version to use a table of "allowed characters", skip the Like statement and join to that table (virtual or real), and probably gain a few milliseconds on the whole thing.

    And, yes, I've discussed uses of Numbers tables with Jeff. More than once. Several times, I've helped him speed up code he was using for various things. His skillset and mine overlap considerably. I was one of the people who told him to start using a 0-base Numbers/Tally table several years ago, instead of starting it at 1, to get a performance improvement on string parsing, for example.

    It turns out that the 1 based parser is faster but it was definitely a good thought.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 29 (of 29 total)

You must be logged in to reply to this topic. Login to reply