March 10, 2010 at 2:08 am
Apologies for the length of the code here.
Like many others here i regularly enter Jacob Sebastians challenges.
On #17 i entered one which used a temp table , at the time i can remember having one with a CTE instead, but the Temp Table had a lower overall cost. It was then specifed that temp tables were not to be used.
So on revisiting my query to remove the temp table and replace it with the cte called CTEFinalize, im now getting the error "Invalid length parameter passed to the SUBSTRING function".
I , obviously know that CTE's are not processed and cached in strict order, has any one got any suggestions on how to modify the code so that it does work.
As proof that the substring does work in isolation , and that the is something else odd occurring, replace the final select with "Select * from PreJoinList" (as i have commented out).
Its a bit of an itch that wont go away 🙂
DECLARE @t TABLE(
id TINYINT,
keyword VARCHAR(20),
data VARCHAR(35)
)
INSERT INTO @t(id, keyword, data)
SELECT 1, 'Pet Store', 'Microsoft SQL Server'
UNION ALL
SELECT 2, 'SQL Server Database', 'Dinner at a New York Restaurant'
UNION ALL
SELECT 3, 'Restaurant', 'Welcome to TSQL Challenges 17'
UNION ALL
SELECT 4, 'New York', 'Bob is a Database Expert'
UNION ALL
SELECT 5, 'TSQL Challenges', 'Is Microsoft Listening?'
UNION ALL
SELECT 6, 'Microsoft', 'New Challenges are coming up'
;
with HtmlData
as
(
Select id,
keyword,
convert(char(35),data) as data,
Len(data) as Len
from @t
),
Num1 (n) AS (SELECT 1 UNION ALL SELECT 1),
Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),
Num3 (n) AS (SELECT 1 FROM Num2 AS X, Num2 AS Y),
Num4 (n) AS (SELECT 1 FROM Num3 AS X, Num3 AS Y),
Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM Num4),
SpacesPos
as
(
Select id ,
n as SpacePos,
' '+Data+' ' as Data,
Len as DataLen
from nums
join HtmlData
on n <= HtmlData.Len+2
where substring(' '+HtmlData.data+' ',n,1)=' '
),
WordList
as
(
Select SpacesPos.Id,
SpacesPos.SpacePos as SpacePos,
SpacesPos.Data as FullData,
Subsequent.SpacePos as SubsequentSpacePos,
substring(SpacesPos.Data,SpacesPos.SpacePos+1,Subsequent.SpacePos -(SpacesPos.SpacePos+1)) as Keyword,
SpacesPos.DataLen ,
Subsequent.SpacePos -(SpacesPos.SpacePos+1) as KeyWordLen
from SpacesPos join SpacesPos Subsequent
on SpacesPos.ID = Subsequent.ID
and Subsequent.SpacePos > SpacesPos.SpacePos
)
,
cteFinalize
as
(
select WordList.ID,
WordList.SpacePos as KeyWordStart,
WordList.SubsequentSpacePos-1 as KeyWordEnd,
WordList.FullData,
WordList.KeyWord,
HtmlData.ID as FoundWithIn,
WordList.DataLen ,
WordList.KeyWordLen
From WordList join HtmlData
on WordList.KeyWord = HtmlData.KeyWord
)
,PreJoinList
as
(
Select Finalize.Id,
KeyWordStart,
cast('<a href="tsql.com?id='+convert(varchar(10),FoundWithin)+'">'+KeyWord+'</a> ' as nvarchar(50) )as KeyWord,
FoundWithin
from cteFinalize Finalize
union all
Select distinct HtmlData.id,
nums.n,
substring(HtmlData.Data,nums.n,1),
NULL
from HtmlData
join nums
on nums.N <= HtmlData.Len
left join cteFinalize Finalize
on HtmlData.ID = Finalize.ID
and nums.N between KeyWordStart and KeyWordEnd
where Finalize.ID is null
)
--Select * from PreJoinList
Select HtmlData.Id, HtmlData.Keyword,OutText.Out as data
from HtmlData
cross apply(select
(Select
(Select KeyWord as [text()]
from
(Select KeyWordStart,KeyWord
from PreJoinList
where PreJoinList.Id = HtmlData.Id
) r
order by KeyWordStart
for xml path(''),type
) as concat FOR XML RAW, TYPE).value('/row[1]/concat[1]', 'varchar(max)')) as OutText(Out)
order by id
March 10, 2010 at 4:28 am
You can rewrite
SUBSTRING(@str, @start, @len)
as
RIGHT(LEFT(@str, @start + @len - 1), @len)
but I don't think it will help much. I tried your code with that substitution, but it still threw errors, so I had to add a CASE expression for the RIGHT length parameter. I don't know if this is the expected result, give it a try.
DECLARE @t TABLE(
id TINYINT,
keyword VARCHAR(20),
data VARCHAR(35)
)
INSERT INTO @t(id, keyword, data)
SELECT 1, 'Pet Store', 'Microsoft SQL Server'
UNION ALL
SELECT 2, 'SQL Server Database', 'Dinner at a New York Restaurant'
UNION ALL
SELECT 3, 'Restaurant', 'Welcome to TSQL Challenges 17'
UNION ALL
SELECT 4, 'New York', 'Bob is a Database Expert'
UNION ALL
SELECT 5, 'TSQL Challenges', 'Is Microsoft Listening?'
UNION ALL
SELECT 6, 'Microsoft', 'New Challenges are coming up'
;
with HtmlData
as
(
Select id,
keyword,
convert(char(35),data) as data,
Len(data) as Len
from @t
),
Num1 (n) AS (SELECT 1 UNION ALL SELECT 1),
Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),
Num3 (n) AS (SELECT 1 FROM Num2 AS X, Num2 AS Y),
Num4 (n) AS (SELECT 1 FROM Num3 AS X, Num3 AS Y),
Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM Num4),
SpacesPos
as
(
Select id ,
n as SpacePos,
' '+Data+' ' as Data,
Len as DataLen
from nums
join HtmlData
on n <= HtmlData.Len+2
where right(left(' '+HtmlData.data+' ',n),1)=' '
),
WordList
as
(
Select SpacesPos.Id,
SpacesPos.SpacePos as SpacePos,
SpacesPos.Data as FullData,
Subsequent.SpacePos as SubsequentSpacePos,
right(left(SpacesPos.Data,SpacesPos.SpacePos+Subsequent.SpacePos -(SpacesPos.SpacePos+1)), CASE WHEN Subsequent.SpacePos -(SpacesPos.SpacePos+1) >= 0 THEN Subsequent.SpacePos -(SpacesPos.SpacePos+1) ELSE 0 END ) as Keyword,
SpacesPos.DataLen ,
Subsequent.SpacePos -(SpacesPos.SpacePos+1) as KeyWordLen
from SpacesPos join SpacesPos Subsequent
on SpacesPos.ID = Subsequent.ID
and Subsequent.SpacePos > SpacesPos.SpacePos
)
,
cteFinalize
as
(
select WordList.ID,
WordList.SpacePos as KeyWordStart,
WordList.SubsequentSpacePos-1 as KeyWordEnd,
WordList.FullData,
WordList.KeyWord,
HtmlData.ID as FoundWithIn,
WordList.DataLen ,
WordList.KeyWordLen
From WordList join HtmlData
on WordList.KeyWord = HtmlData.KeyWord
)
,PreJoinList
as
(
Select Finalize.Id,
KeyWordStart,
cast('<a href="tsql.com?id='+convert(varchar(10),FoundWithin)+'">'+KeyWord+'</a> ' as nvarchar(50) )as KeyWord,
FoundWithin
from cteFinalize Finalize
union all
Select distinct HtmlData.id,
nums.n,
right(left(HtmlData.Data,nums.n),1),
NULL
from HtmlData
join nums
on nums.N <= HtmlData.Len
left join cteFinalize Finalize
on HtmlData.ID = Finalize.ID
and nums.N between KeyWordStart and KeyWordEnd
where Finalize.ID is null
)
--Select * from PreJoinList
Select HtmlData.Id, HtmlData.Keyword,OutText.Out as data
from HtmlData
cross apply(select
(Select
(Select KeyWord as [text()]
from
(Select KeyWordStart,KeyWord
from PreJoinList
where PreJoinList.Id = HtmlData.Id
) r
order by KeyWordStart
for xml path(''),type
) as concat FOR XML RAW, TYPE).value('/row[1]/concat[1]', 'varchar(max)')) as OutText(Out)
order by id
-- Gianluca Sartori
March 10, 2010 at 4:56 am
I'm not sure why but you have to enlarge the numbers table to get rid of the error. After that the query runs in 82 seconds on my machine.
Num1 (n) AS (SELECT 1 UNION ALL SELECT 1),
Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),
Num3 (n) AS (SELECT 1 FROM Num2 AS X, Num2 AS Y),
Num4 (n) AS (SELECT 1 FROM Num3 AS X, Num3 AS Y),
Num5 (n) AS (SELECT 1 FROM Num4 AS X, Num4 AS Y),
Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM Num5),
Peter
March 10, 2010 at 5:02 am
Peter Brinkhaus (3/10/2010)
I'm not sure why but you have to enlarge the numbers table to get rid of the error. After that the query runs in 82 seconds on my machine.
Num1 (n) AS (SELECT 1 UNION ALL SELECT 1),
Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),
Num3 (n) AS (SELECT 1 FROM Num2 AS X, Num2 AS Y),
Num4 (n) AS (SELECT 1 FROM Num3 AS X, Num3 AS Y),
Num5 (n) AS (SELECT 1 FROM Num4 AS X, Num4 AS Y),
Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM Num5),
Peter
Ahhh....
Doing the tally table from spt_values stops the error occuring too.
Nums (n) AS (SELECT number from master..spt_values where type ='P' and number between 1 and 255)
Gianluca , your results were spot on 🙂
March 10, 2010 at 6:14 am
You could also benefit from a strange behaviour that I found trying to generate a Tally table with no IO: using UNPIVOT seems to be slightly faster than multiple UNIONS or recursive CTEs:
;WITH tenRows AS (
SELECT i
FROM (
SELECT 1 AS [1], 2 AS [2], 3 AS [3], 4 AS [4], 5 AS [5],
6 AS [6], 7 AS [7], 8 AS [8], 9 AS [9], 10 AS [10]
) AS p
UNPIVOT (i FOR numbers IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10]) ) as unpvt
),
thousandRows AS (
SELECT 0 as n
FROM tenRows AS a
CROSS JOIN tenRows AS b
CROSS JOIN tenRows AS c
) ,
millionRows AS (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
FROM thousandRows AS a
CROSS JOIN thousandRows AS b
)
SELECT n
FROM millionRows
This generates one million numbers in about 400 ms on my laptop.
Good luck with the competition!
-- Gianluca Sartori
March 10, 2010 at 6:27 am
Gianluca Sartori (3/10/2010)
This generates one million numbers in about 400 ms on my laptop.
Good luck with the competition!
Interesting , ill look at the unpivot.
The competition has closed long ago , this was just on my 'todo' list, ei It may crop up in production and i want a decent fix / workaround.
May 10, 2010 at 9:44 am
Congrats on winning challenge 19!!!
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537May 11, 2010 at 1:33 am
Mark-101232 (5/10/2010)
Congrats on winning challenge 19!!!
Thanks , Id like to thank the academy and wish for world peace 🙂
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply