August 15, 2011 at 2:09 pm
I'm doing something similar to the below example:
DECLARE @RowStart INT
DECLARE @RowEnd INT
SET @RowStart = 1
SET @RowEnd = 100
;WITH CTE AS
(
SELECT a,
b,
c,
ROW_NUMBER() OVER(ORDER BY c) AS RowNumber
FROM table1
)
SELECT a,
b,
c
FROM CTE
WHERE RowNumber BETWEEN @RowStart AND @RowEnd
I had this working fine as a stored procedure until I was told I needed to output the total row count, i.e. the number of records that would have been returned with no paging. I searched and searched and could not find a solution that worked. I'm not able to set the total row count and return the data from the select statement when using a CTE. My solution to this was to dump the CTE idea and instead use a temp table to get a row count and then return the data with paging, similar to what's below.
SELECT a,
b,
c,
ROW_NUMBER() OVER(ORDER BY c) AS RowNumber
INTO #table1
FROM table1
SELECT @TotalRows = COUNT(*) -- @TotalRows would be the OUTPUT parameter on the stored procedure
FROM #table1
SELECT a,
b,
c
FROM #table1
WHERE RowNumber BETWEEN @RowStart AND @RowEnd
It works the way I want it to; it just doesn't perform that great. I would like to get it working with a CTE and then compare execution plans and IO and go with the one that performs better. If I could get it working as a CTE, or find some other alternative solution that I'm missing, that would be great.
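For example, something like this returns the total as an extra column in the same CTE query (an untested sketch using the same column names as above), though I'd still prefer to get it as a separate value:
DECLARE @RowStart INT
DECLARE @RowEnd INT
SET @RowStart = 1
SET @RowEnd = 100
;WITH CTE AS
(
SELECT a,
b,
c,
ROW_NUMBER() OVER(ORDER BY c) AS RowNumber,
COUNT(*) OVER() AS TotalRows --total row count before paging, repeated on every row
FROM table1
)
SELECT a,
b,
c,
TotalRows
FROM CTE
WHERE RowNumber BETWEEN @RowStart AND @RowEnd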
Thanks for any and all help.
August 15, 2011 at 2:52 pm
I just sat in on a SQLSaturday #79 presentation in Davie (Fort Lauderdale) on how to improve performance in exactly this situation;
the presenter was Dimitri, and he had a great example of how he added indexes and CTEs until the join of 20 primary keys from the row number CTE was joined on the bigger (million-row?) table in order to get the best performance.
I looked on the SQLSaturday #79 page, but didn't find links to his presentation, nor in the schedule... I think his presentation was a late add-on.
http://www.sqlsaturday.com/79/eventhome.aspx
here's an example using sys.objects, which I hope is understandable. The idea is to cross join the total, and use a CTE to limit the rows to just stuff that exists in an index, before finally joining to get the rest of the data, so you get, say, just 100 rows instead of a zillion rows that then get limited to 100.
--declare our variables
DECLARE @RowStart INT
DECLARE @RowEnd INT
SET @RowStart = 1
SET @RowEnd = 100
--get the total count
WITH TotalCount AS
(
SELECT COUNT(*) AS TotalCount FROM sys.objects objz
),
--for performance reasons, get the PK and the x number of rows we want to get.
--the ORDER BY is important: if we are ordering by a different column, make sure there is an
--index that has the PK and the Ordering column
AllRows AS
(
SELECT
objz.OBJECT_ID,
ROW_NUMBER() OVER (ORDER BY objz.name ) AS RW
FROM sys.objects objz
)
--I think Dimitri had an additional CTE that just selected the rows with row number between start and end...here I just grab them all (my guess at that is below)
SELECT
AllRows.*,
objz.*,
TotalCount.*
FROM AllRows
INNER JOIN sys.objects objz
ON AllRows.object_id = objz.object_id
CROSS JOIN TotalCount
WHERE AllRows.RW BETWEEN @RowStart AND @RowEnd
the key is looking at the execution plan, so you can see the final join is just 100 rows, instead of a lot more
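and here's my guess at what that extra CTE would look like: filter the keys down to just the requested page first, then do the join and the cross join (variables re-declared so it runs on its own):
DECLARE @RowStart INT
DECLARE @RowEnd INT
SET @RowStart = 1
SET @RowEnd = 100
;WITH TotalCount AS
(
SELECT COUNT(*) AS TotalCount FROM sys.objects
),
AllRows AS
(
SELECT
objz.object_id,
ROW_NUMBER() OVER (ORDER BY objz.name) AS RW
FROM sys.objects objz
),
PageRows AS
(
--just the keys for the requested page
SELECT object_id, RW
FROM AllRows
WHERE RW BETWEEN @RowStart AND @RowEnd
)
SELECT
PageRows.RW,
objz.*,
TotalCount.TotalCount
FROM PageRows
INNER JOIN sys.objects objz
ON PageRows.object_id = objz.object_id
CROSS JOIN TotalCount
ORDER BY PageRows.RW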
Lowell
August 15, 2011 at 3:03 pm
Interesting solution. I'll have to try and wrap my head around that one. I'm not seeing how I would return the total count as an output parameter, though. Wish I could see that presentation too, maybe they'll put it up later.
August 16, 2011 at 2:39 am
Is there any problem with just using @@ROWCOUNT ?
DECLARE @RowStart INT
DECLARE @RowEnd INT
SET @RowStart = 1
SET @RowEnd = 100
;WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RowNumber
FROM sys.objects
)
SELECT * FROM CTE
WHERE RowNumber BETWEEN @RowStart AND @RowEnd
SELECT @@ROWCOUNT
--edit--
Obviously, use Lowell's solution for the rest of the query - if it adds any performance gains. I haven't seen it before, which is why I haven't used it in the example above - I want to get my head around it first!
August 16, 2011 at 6:49 am
Cadavre (8/16/2011)
Is there any problem with just using @@ROWCOUNT ?
DECLARE @RowStart INT
DECLARE @RowEnd INT
SET @RowStart = 1
SET @RowEnd = 100
;WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RowNumber
FROM sys.objects
)
SELECT * FROM CTE
WHERE RowNumber BETWEEN @RowStart AND @RowEnd
SELECT @@ROWCOUNT
--edit--
Obviously, use Lowell's solution for the rest of the query - if it adds any performance gains. I haven't seen it before, which is why I haven't used it in the example above - I want to get my head around it first!
Just a small problem with using @@rowcount here...
It will not return what was required (the total number of records before paging) but the number of records in the page.
In your example, if the database has more than 50 user-defined objects, instead of SELECT @@ROWCOUNT you could just do SELECT 100 :-D.
I would modify the offered solution to count the total number of rows from the CTE itself, just to save some lines of code :-P
--declare our variables
DECLARE @RowStart INT
DECLARE @RowEnd INT
SET @RowStart = 1
SET @RowEnd = 100
--get the records
;WITH Recs AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN -- or whatever is required
FROM sys.objects
)
--Get the total count OUT of the full resultset
,Cnt AS ( SELECT COUNT(*) TtlCnt FROM Recs )
SELECT R.*, C.TtlCnt
FROM Recs R CROSS JOIN Cnt C
WHERE R.RN BETWEEN @RowStart AND @RowEnd
Also, I'm not sure if selecting just the ID in a CTE and joining it back would increase performance in any way... Actually it's the opposite! You will end up with more reads, as a CTE is just a stylish wrapper for a subquery. So, if you select the IDs then join back to sys.objects, SQL Server will perform many more reads than in my example (~600 vs ~200).
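If you want to verify the reads yourself, wrap each version with STATISTICS IO, something like this (shown here with my CTE version, row range hardcoded just for the test):
SET STATISTICS IO ON
;WITH Recs AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM sys.objects
)
,Cnt AS ( SELECT COUNT(*) TtlCnt FROM Recs )
SELECT R.*, C.TtlCnt
FROM Recs R CROSS JOIN Cnt C
WHERE R.RN BETWEEN 1 AND 100
SET STATISTICS IO OFF
--compare the "logical reads" figures in the Messages tab against the keys-only version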
August 16, 2011 at 7:08 am
Eugene has it right: it returns the number of records that were returned after paging. I need to know how many would have been returned had there been no paging.
I know you were just joking, but I hardcoded those values (1 and 100); the calling application will send me a page number and a page size, and then I have to calculate which rows to return. It's for a web search app, so the user knows they are seeing results 1-100 of 1000, or 201-300 of 1000, or 1-5 of 10, etc. It really could be anything... just some background on the query in question.
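So the calculation in the proc is along these lines (parameter names are made up just to show the arithmetic):
--hypothetical parameter names, just to show the calculation
DECLARE @PageNumber INT
DECLARE @PageSize INT
DECLARE @RowStart INT
DECLARE @RowEnd INT
SET @PageNumber = 3
SET @PageSize = 100
SET @RowStart = ((@PageNumber - 1) * @PageSize) + 1 --201
SET @RowEnd = @PageNumber * @PageSize --300
SELECT @RowStart AS RowStart, @RowEnd AS RowEnd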
August 16, 2011 at 7:14 am
Eugene, so the count is returned as a column in the resultset? I already tried something similar and it wasn't an accepted solution...:(
I never understood why not, but I'm not a web developer, so they may have their reasons.
August 16, 2011 at 7:25 am
Eugene Elutin (8/16/2011)
Just a small problem with using @@rowcount here... It will not return what was required (the total number of records before paging) but the number of records in the page.
Ah, misread the OP.
I guess to fit the multiple result-sets you'd need a temp table then.
bwoulfe (8/16/2011)
Eugene, so the count is returned as a column in the resultset? I already tried something similar and it wasn't an accepted solution... :( I never understood why not, but I'm not a web developer, so they may have their reasons.
I had a .Net developer say the same to me a while ago (although the requirements were slightly different - instead he wanted the whole result-set returned but wanted to know the size of the result-set). After we discussed it for a while, we added a WCF component as a middle-man that obtained the entire result-set, including a separate column that contained the number of records returned. The WCF component then passed this data to the front-end application before parcelling out the result-set in the way it was required.
Eugene Elutin (8/16/2011)
Also, I'm not sure if selecting just the ID in a CTE and joining it back would increase performance in any way... Actually it's the opposite! You will end up with more reads, as a CTE is just a stylish wrapper for a subquery. So, if you select the IDs then join back to sys.objects, SQL Server will perform many more reads than in my example (~600 vs ~200).
I thought that would be the case, but haven't had time to test Lowell's code. Hoping that the presentation gets posted online somewhere, would be interested to see how it worked!
August 16, 2011 at 7:34 am
How do you want your count to be returned, then?
Returning the count in a column would be the best option; the client app can read this value from the first row and then ignore the column completely.
You can return it as an output parameter of the stored proc or in a separate resultset. But then I would advise using a temp table (example below).
What you cannot do is return a resultset of 100 rows but tell the client app that its record count is different :-P
--declare our variables
DECLARE @RowStart INT
DECLARE @RowEnd INT
SET @RowStart = 1
SET @RowEnd = 100
--get the records
SELECT [object_id], ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN -- or whatever is required
INTO #Recs
FROM sys.objects
SELECT R.*
FROM #Recs R
WHERE R.RN BETWEEN @RowStart AND @RowEnd
-- Option 1. return total count as stored proc return parameter
-- (you can do it via output parameter as well)
DECLARE @r INT
SELECT @r = COUNT(*) FROM #Recs
RETURN @r
-- Option 2. return total count in a separte resultset
SELECT COUNT(*) AS TotalRecCount FROM #Recs
And yes, using a temp table in this case is fine from a performance perspective...
August 16, 2011 at 7:53 am
They want it returned with an OUTPUT parameter from the stored procedure. It's set up with a temp table right now and the performance is just ok. I may just need to stick with a temp table but rewrite it. The stored procedure is over 1000 lines of code, so I've been dreading doing that.
Thanks for the responses, everyone; I think it's back to the drawing board though. I'm going to try the count-returned-in-a-column solution with them again, see if that might work this time... :-P
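For anyone following along, the shape they're asking for is basically this (heavily stripped down; proc, table and column names made up for the example):
--stripped-down shape of what's being asked for (made-up names)
CREATE PROCEDURE dbo.SearchWithPaging
@RowStart INT,
@RowEnd INT,
@TotalRows INT OUTPUT
AS
BEGIN
SET NOCOUNT ON
SELECT a, b, c, ROW_NUMBER() OVER(ORDER BY c) AS RowNumber
INTO #Results
FROM table1
--total count before paging, sent back through the OUTPUT parameter
SELECT @TotalRows = COUNT(*) FROM #Results
--the requested page
SELECT a, b, c
FROM #Results
WHERE RowNumber BETWEEN @RowStart AND @RowEnd
END
GO
--calling it
DECLARE @Total INT
EXEC dbo.SearchWithPaging @RowStart = 1, @RowEnd = 100, @TotalRows = @Total OUTPUT
SELECT @Total AS TotalRowsBeforePaging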
August 16, 2011 at 8:05 am
bwoulfe (8/16/2011)
...It's set up with a temp table right now and the performance is just ok. I may just need to stick with a temp table but rewrite it...
Using temp tables is fine and may even be more beneficial than a CTE sometimes.
Just a few tips for rewriting (quick sketch below):
1. Use SELECT ... INTO to create and populate the temp table
2. Consider creating appropriate indexes on temp tables if they are used multiple times in joins.
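A quick sketch of both tips together (table and column names are just examples):
--1. create and populate the temp table in one statement
SELECT a, b, c, ROW_NUMBER() OVER(ORDER BY c) AS RowNumber
INTO #Results
FROM table1
--2. index it if it will be filtered or joined more than once
CREATE CLUSTERED INDEX IX_Results_RowNumber ON #Results (RowNumber)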
August 16, 2011 at 8:18 am
Eugene Elutin (8/16/2011)
bwoulfe (8/16/2011)
...It's set up with a temp table right now and the performance is just ok. I may just need to stick with a temp table but rewrite it...Using temp tables is fine and may even be more beneficial than a CTE sometimes.
Just a few tips for rewriting:
1. Use SELECT ... INTO to create and populate the temp table
2. Consider creating appropriate indexes on temp tables if they are used multiple times in joins.
I'm using INSERT INTO, just so I can see my column names and reference them later if a problem occurs (this is still in dev). Any reason SELECT...INTO is preferred?
I do all the inserts, then add my indexes, then return the results. There are about 4 or 5 inserts into the temp table though, maybe reordering those steps might help?
August 16, 2011 at 8:35 am
SELECT Col1 AS Column1
,Col2 AS Column2
.....
INTO #t
You can see that you can name your columns quite easily. You can also define the required datatype by casting the value to it (e.g. SELECT CAST(SomeInt AS VARCHAR(11)) AS ...)
Benefits in performance...
You can read quite a lot about it :-)
Just a few links you may find interesting:
http://stackoverflow.com/questions/1302670/sql-server-select-into-and-blocking-with-temp-tables
http://msdn.microsoft.com/en-us/library/dd425070(v=sql.100).aspx
Basically it's one of the "bulk load" methods within SQL Server, so it gives you the fastest possible insert speed in T-SQL.
If the data you're inserting is naturally distinct, you can insert all of your data at once using a UNION ALL statement, something like:
SELECT q1.col1, q1.Col2, ...
INTO #t
FROM q1
UNION ALL
SELECT q2.col1, q2.Col2, ...
FROM q2
UNION ALL
SELECT q3.col1, q3.Col2, ...
FROM q3
....
If it needs to be deduplicated anyway, you may test whether using UNION alone will be good enough.
August 16, 2011 at 9:06 am
Lowell (8/15/2011)
I just sat in on a SQLSaturday #79 presentation in Davie (Fort Lauderdale) on how to improve performance in exactly this situation; the presenter was Dimitri, and he had a great example of how he added indexes and CTEs until the join of 20 primary keys from the row number CTE was joined on the bigger (million-row?) table in order to get the best performance.
Was it the key seek method, explained elegantly by Paul White here?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 16, 2011 at 9:13 am
That post from Paul exactly encapsulates that portion of his presentation, Chris, yes.
Dmitri Korotkevitch's web site, now that I grabbed my notes, is http://aboutsqlserver.com but he has not posted his presentation materials from it yet.
Lowell