CTE Troubles

  • Apologies for the length of the code here.

    Like many others here i regularly enter Jacob Sebastians challenges.

    On #17 i entered one which used a temp table , at the time i can remember having one with a CTE instead, but the Temp Table had a lower overall cost. It was then specifed that temp tables were not to be used.

    So on revisiting my query to remove the temp table and replace it with the cte called CTEFinalize, im now getting the error "Invalid length parameter passed to the SUBSTRING function".

    I , obviously know that CTE's are not processed and cached in strict order, has any one got any suggestions on how to modify the code so that it does work.

    As proof that the substring does work in isolation , and that the is something else odd occurring, replace the final select with "Select * from PreJoinList" (as i have commented out).

    Its a bit of an itch that wont go away 🙂

    DECLARE @t TABLE(

    id TINYINT,

    keyword VARCHAR(20),

    data VARCHAR(35)

    )

    INSERT INTO @t(id, keyword, data)

    SELECT 1, 'Pet Store', 'Microsoft SQL Server'

    UNION ALL

    SELECT 2, 'SQL Server Database', 'Dinner at a New York Restaurant'

    UNION ALL

    SELECT 3, 'Restaurant', 'Welcome to TSQL Challenges 17'

    UNION ALL

    SELECT 4, 'New York', 'Bob is a Database Expert'

    UNION ALL

    SELECT 5, 'TSQL Challenges', 'Is Microsoft Listening?'

    UNION ALL

    SELECT 6, 'Microsoft', 'New Challenges are coming up'

    ;

    with HtmlData

    as

    (

    Select id,

    keyword,

    convert(char(35),data) as data,

    Len(data) as Len

    from @t

    ),

    Num1 (n) AS (SELECT 1 UNION ALL SELECT 1),

    Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),

    Num3 (n) AS (SELECT 1 FROM Num2 AS X, Num2 AS Y),

    Num4 (n) AS (SELECT 1 FROM Num3 AS X, Num3 AS Y),

    Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM Num4),

    SpacesPos

    as

    (

    Select id ,

    n as SpacePos,

    ' '+Data+' ' as Data,

    Len as DataLen

    from nums

    join HtmlData

    on n <= HtmlData.Len+2

    where substring(' '+HtmlData.data+' ',n,1)=' '

    ),

    WordList

    as

    (

    Select SpacesPos.Id,

    SpacesPos.SpacePos as SpacePos,

    SpacesPos.Data as FullData,

    Subsequent.SpacePos as SubsequentSpacePos,

    substring(SpacesPos.Data,SpacesPos.SpacePos+1,Subsequent.SpacePos -(SpacesPos.SpacePos+1)) as Keyword,

    SpacesPos.DataLen ,

    Subsequent.SpacePos -(SpacesPos.SpacePos+1) as KeyWordLen

    from SpacesPos join SpacesPos Subsequent

    on SpacesPos.ID = Subsequent.ID

    and Subsequent.SpacePos > SpacesPos.SpacePos

    )

    ,

    cteFinalize

    as

    (

    select WordList.ID,

    WordList.SpacePos as KeyWordStart,

    WordList.SubsequentSpacePos-1 as KeyWordEnd,

    WordList.FullData,

    WordList.KeyWord,

    HtmlData.ID as FoundWithIn,

    WordList.DataLen ,

    WordList.KeyWordLen

    From WordList join HtmlData

    on WordList.KeyWord = HtmlData.KeyWord

    )

    ,PreJoinList

    as

    (

    Select Finalize.Id,

    KeyWordStart,

    cast('<a href="tsql.com?id='+convert(varchar(10),FoundWithin)+'">'+KeyWord+'</a> ' as nvarchar(50) )as KeyWord,

    FoundWithin

    from cteFinalize Finalize

    union all

    Select distinct HtmlData.id,

    nums.n,

    substring(HtmlData.Data,nums.n,1),

    NULL

    from HtmlData

    join nums

    on nums.N <= HtmlData.Len

    left join cteFinalize Finalize

    on HtmlData.ID = Finalize.ID

    and nums.N between KeyWordStart and KeyWordEnd

    where Finalize.ID is null

    )

    --Select * from PreJoinList

    Select HtmlData.Id, HtmlData.Keyword,OutText.Out as data

    from HtmlData

    cross apply(select

    (Select

    (Select KeyWord as [text()]

    from

    (Select KeyWordStart,KeyWord

    from PreJoinList

    where PreJoinList.Id = HtmlData.Id

    ) r

    order by KeyWordStart

    for xml path(''),type

    ) as concat FOR XML RAW, TYPE).value('/row[1]/concat[1]', 'varchar(max)')) as OutText(Out)

    order by id



    Clear Sky SQL
    My Blog[/url]

  • You can rewrite

    SUBSTRING(@str, @start, @len)

    as

    RIGHT(LEFT(@str, @start + @len - 1), @len)

    but I don't think it will help much. I tried your code with that substitution, but it still threw errors, so I had to add a CASE expression for the RIGHT length parameter. I don't know if this is the expected result, give it a try.

    DECLARE @t TABLE(

    id TINYINT,

    keyword VARCHAR(20),

    data VARCHAR(35)

    )

    INSERT INTO @t(id, keyword, data)

    SELECT 1, 'Pet Store', 'Microsoft SQL Server'

    UNION ALL

    SELECT 2, 'SQL Server Database', 'Dinner at a New York Restaurant'

    UNION ALL

    SELECT 3, 'Restaurant', 'Welcome to TSQL Challenges 17'

    UNION ALL

    SELECT 4, 'New York', 'Bob is a Database Expert'

    UNION ALL

    SELECT 5, 'TSQL Challenges', 'Is Microsoft Listening?'

    UNION ALL

    SELECT 6, 'Microsoft', 'New Challenges are coming up'

    ;

    with HtmlData

    as

    (

    Select id,

    keyword,

    convert(char(35),data) as data,

    Len(data) as Len

    from @t

    ),

    Num1 (n) AS (SELECT 1 UNION ALL SELECT 1),

    Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),

    Num3 (n) AS (SELECT 1 FROM Num2 AS X, Num2 AS Y),

    Num4 (n) AS (SELECT 1 FROM Num3 AS X, Num3 AS Y),

    Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM Num4),

    SpacesPos

    as

    (

    Select id ,

    n as SpacePos,

    ' '+Data+' ' as Data,

    Len as DataLen

    from nums

    join HtmlData

    on n <= HtmlData.Len+2

    where right(left(' '+HtmlData.data+' ',n),1)=' '

    ),

    WordList

    as

    (

    Select SpacesPos.Id,

    SpacesPos.SpacePos as SpacePos,

    SpacesPos.Data as FullData,

    Subsequent.SpacePos as SubsequentSpacePos,

    right(left(SpacesPos.Data,SpacesPos.SpacePos+Subsequent.SpacePos -(SpacesPos.SpacePos+1)), CASE WHEN Subsequent.SpacePos -(SpacesPos.SpacePos+1) >= 0 THEN Subsequent.SpacePos -(SpacesPos.SpacePos+1) ELSE 0 END ) as Keyword,

    SpacesPos.DataLen ,

    Subsequent.SpacePos -(SpacesPos.SpacePos+1) as KeyWordLen

    from SpacesPos join SpacesPos Subsequent

    on SpacesPos.ID = Subsequent.ID

    and Subsequent.SpacePos > SpacesPos.SpacePos

    )

    ,

    cteFinalize

    as

    (

    select WordList.ID,

    WordList.SpacePos as KeyWordStart,

    WordList.SubsequentSpacePos-1 as KeyWordEnd,

    WordList.FullData,

    WordList.KeyWord,

    HtmlData.ID as FoundWithIn,

    WordList.DataLen ,

    WordList.KeyWordLen

    From WordList join HtmlData

    on WordList.KeyWord = HtmlData.KeyWord

    )

    ,PreJoinList

    as

    (

    Select Finalize.Id,

    KeyWordStart,

    cast('<a href="tsql.com?id='+convert(varchar(10),FoundWithin)+'">'+KeyWord+'</a> ' as nvarchar(50) )as KeyWord,

    FoundWithin

    from cteFinalize Finalize

    union all

    Select distinct HtmlData.id,

    nums.n,

    right(left(HtmlData.Data,nums.n),1),

    NULL

    from HtmlData

    join nums

    on nums.N <= HtmlData.Len

    left join cteFinalize Finalize

    on HtmlData.ID = Finalize.ID

    and nums.N between KeyWordStart and KeyWordEnd

    where Finalize.ID is null

    )

    --Select * from PreJoinList

    Select HtmlData.Id, HtmlData.Keyword,OutText.Out as data

    from HtmlData

    cross apply(select

    (Select

    (Select KeyWord as [text()]

    from

    (Select KeyWordStart,KeyWord

    from PreJoinList

    where PreJoinList.Id = HtmlData.Id

    ) r

    order by KeyWordStart

    for xml path(''),type

    ) as concat FOR XML RAW, TYPE).value('/row[1]/concat[1]', 'varchar(max)')) as OutText(Out)

    order by id

    -- Gianluca Sartori

  • I'm not sure why but you have to enlarge the numbers table to get rid of the error. After that the query runs in 82 seconds on my machine.

    Num1 (n) AS (SELECT 1 UNION ALL SELECT 1),

    Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),

    Num3 (n) AS (SELECT 1 FROM Num2 AS X, Num2 AS Y),

    Num4 (n) AS (SELECT 1 FROM Num3 AS X, Num3 AS Y),

    Num5 (n) AS (SELECT 1 FROM Num4 AS X, Num4 AS Y),

    Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM Num5),

    Peter

  • Peter Brinkhaus (3/10/2010)


    I'm not sure why but you have to enlarge the numbers table to get rid of the error. After that the query runs in 82 seconds on my machine.

    Num1 (n) AS (SELECT 1 UNION ALL SELECT 1),

    Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),

    Num3 (n) AS (SELECT 1 FROM Num2 AS X, Num2 AS Y),

    Num4 (n) AS (SELECT 1 FROM Num3 AS X, Num3 AS Y),

    Num5 (n) AS (SELECT 1 FROM Num4 AS X, Num4 AS Y),

    Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM Num5),

    Peter

    Ahhh....

    Doing the tally table from spt_values stops the error occuring too.

    Nums (n) AS (SELECT number from master..spt_values where type ='P' and number between 1 and 255)

    Gianluca , your results were spot on 🙂



    Clear Sky SQL
    My Blog[/url]

  • You could also benefit from a strange behaviour that I found trying to generate a Tally table with no IO: using UNPIVOT seems to be slightly faster than multiple UNIONS or recursive CTEs:

    ;WITH tenRows AS (

    SELECT i

    FROM (

    SELECT 1 AS [1], 2 AS [2], 3 AS [3], 4 AS [4], 5 AS [5],

    6 AS [6], 7 AS [7], 8 AS [8], 9 AS [9], 10 AS [10]

    ) AS p

    UNPIVOT (i FOR numbers IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10]) ) as unpvt

    ),

    thousandRows AS (

    SELECT 0 as n

    FROM tenRows AS a

    CROSS JOIN tenRows AS b

    CROSS JOIN tenRows AS c

    ) ,

    millionRows AS (

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n

    FROM thousandRows AS a

    CROSS JOIN thousandRows AS b

    )

    SELECT n

    FROM millionRows

    This generates one million numbers in about 400 ms on my laptop.

    Good luck with the competition!

    -- Gianluca Sartori

  • Gianluca Sartori (3/10/2010)

    This generates one million numbers in about 400 ms on my laptop.

    Good luck with the competition!

    Interesting , ill look at the unpivot.

    The competition has closed long ago , this was just on my 'todo' list, ei It may crop up in production and i want a decent fix / workaround.



    Clear Sky SQL
    My Blog[/url]

  • Congrats on winning challenge 19!!!

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Mark-101232 (5/10/2010)


    Congrats on winning challenge 19!!!

    Thanks , Id like to thank the academy and wish for world peace 🙂



    Clear Sky SQL
    My Blog[/url]

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply