Slow Row Number Over Partition

Question

rootfixxxer

SSCrazy

Points: 2918
July 28, 2015 at 10:56 am

#306194

Hello,
I have a simple table with 4 columns (idAuxiliarPF (BIGINT, PK), pf (BIGINT, FK), Data (DATETIME), Descr (NVARCHAR)) that has approx. 50k rows.
I need to partition the data in order to join it to another table. The query that I have:
SELECT
ROW_NUMBER() OVER (PARTITION BY pf ORDER BY Data DESC, idAuxiliarPF DESC) AS RN,
pf,
Data,
Descr
FROM dbo.PFAuxiliar
WHERE Data <= GETDATE()
This query takes around 40 seconds to return the results.
If I remove the Descr column, the query takes no time.
SELECT
ROW_NUMBER() OVER (PARTITION BY pf ORDER BY Data DESC, idAuxiliarPF DESC) AS RN,
pf,
Data
FROM dbo.PFAuxiliar
WHERE Data <= GETDATE()
I have two indexes: clustered (idAuxiliarPF) and nonclustered (pf).
How can I improve the performance of this query?
Thanks

Gazareth One Orange Chip Points: 27737 · Answer 1

What's the size of the Descr field? NVARCHAR(??) At a guess it's MAX, and the difference is because you're returning significantly more data.
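For anyone wanting to check this on their own copy, a sketch along these lines should show both the declared length and how much data the column really holds. It assumes the table is dbo.PFAuxiliar as in the question; in sys.columns, a max_length of -1 indicates a (MAX) type.

```sql
-- Declared type and length of Descr (max_length = -1 means (MAX)).
SELECT c.name AS column_name,
       t.name AS type_name,
       c.max_length
FROM sys.columns AS c
JOIN sys.types AS t
    ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(N'dbo.PFAuxiliar')
  AND c.name = N'Descr';

-- How much data the column actually holds, in bytes.
SELECT COUNT(*)               AS row_count,
       AVG(DATALENGTH(Descr)) AS avg_bytes,
       MAX(DATALENGTH(Descr)) AS max_bytes,
       SUM(DATALENGTH(Descr)) AS total_bytes
FROM dbo.PFAuxiliar
WHERE Data <= GETDATE();
```

If total_bytes turns out to be small, the volume of data returned is probably not the whole story.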

When you say partition to join with another table, what do you mean? You might find joining directly more efficient.

Could you provide the full query?

Thanks

ScottPletcher SSC Guru Points: 101223 · Answer 2

It's a virtual certainty that the clustering key should be pf, not a meaningless id. That would also give better performance across the board on that table. Unless of course pf is highly variable and you can't afford to reorg/rebuild the table as often as needed.

Edit: With the key as it is, SQL must do a sort to get the rows in pf order first. If pf were the clustering key, that would not be the case. SQL will still likely need a sort because of the ORDER BY columns, but the sort should be vastly less overhead.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

rootfixxxer SSCrazy Points: 2918 · Answer 3

@Gazareth

Yes, it's MAX. This is an auxiliary table that holds comments for each pf in another table.

At this point I need to return every row; it's a kind of report with every comment that the user added for each pf in the master table.

The join is a simple LEFT JOIN:

LEFT OUTER JOIN (SELECT ROW_NUMBER() OVER (PARTITION BY pf ORDER BY Data DESC, idAuxiliarPF DESC) AS RN,
pf,
Data,
Descr
FROM dbo.PFAuxiliar
WHERE Data <= GETDATE()) PFAE
ON PFAE.pf = PF.idPF

@ScottPletcher

I didn't use pf as the key because it's not unique; I can have several comments for each pf...

One more thing: if I run the query without the ROW_NUMBER() OVER (PARTITION BY ...), it takes less than a second to run.

SELECT pf,
Data,
Descr
FROM dbo.PFAuxiliar
WHERE Data <= GETDATE()

Thanks for your time.

rootfixxxer SSCrazy Points: 2918 · Answer 4

Found a solution/workaround.

Since the plain query (all columns, no ROW_NUMBER) takes no time, just combine the two queries:

SELECT pf,
Data,
Descr
FROM dbo.PFAuxiliar PFA1 INNER JOIN (
SELECT ROW_NUMBER() OVER (PARTITION BY pf ORDER BY Data DESC, idAuxiliarPF DESC) AS RN, idAuxiliarPF
FROM dbo.PFAuxiliar WHERE Data <= GETDATE() AND Descr IS NOT NULL ) PFA2
ON PFA2.idAuxiliarPF = PFA1.idAuxiliarPF

And this takes less than a second to return the results.

Anyway, I'd love to understand why the Descr column, in combination with the row number and partition, takes so much time to return the values.

Thanks

*Edited* - Column name in Where clause was wrong.
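The same idea can be sketched as a CTE that also returns RN, which the join above computes but does not select. This is only a restatement of the workaround, not a confirmed explanation. One plausible reading (an assumption, nothing in this thread verifies it) is that the sort here only has to carry the narrow columns, and Descr is fetched afterwards through the clustered key.

```sql
-- Number the rows using only the narrow columns, then fetch Descr by key.
WITH Numbered AS (
    SELECT idAuxiliarPF,
           ROW_NUMBER() OVER (PARTITION BY pf
                              ORDER BY Data DESC, idAuxiliarPF DESC) AS RN
    FROM dbo.PFAuxiliar
    WHERE Data <= GETDATE()
      AND Descr IS NOT NULL
)
SELECT n.RN,
       a.pf,
       a.Data,
       a.Descr
FROM Numbered AS n
INNER JOIN dbo.PFAuxiliar AS a
    ON a.idAuxiliarPF = n.idAuxiliarPF;
```

If this runs after another statement in the same batch, that statement must end with a semicolon before WITH.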

ScottPletcher SSC Guru Points: 101223 · Answer 5

rootfixxxer (7/29/2015)
@ScottPletcher
I didn't use pf as the key because it's not unique; I can have several comments for each pf...

Not a problem. But, if you prefer, add the id after the pf to make the key unique. SQL Server does prefer unique clustering keys, but getting the best clustering key is more critical than whether it has duplicates; that's a secondary concern.
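A minimal sketch of what that could look like, tried on a temporary copy so the real table and its constraints stay untouched. The temp table and index names are hypothetical, chosen only for illustration.

```sql
-- Hypothetical test copy; the real table is not modified.
SELECT idAuxiliarPF, pf, Data, Descr
INTO #PFAuxiliarTest
FROM dbo.PFAuxiliar;

-- Cluster on pf first, with the id appended to make the key unique.
CREATE UNIQUE CLUSTERED INDEX CIX_PFAuxiliarTest_pf_id
    ON #PFAuxiliarTest (pf, idAuxiliarPF);

-- Keep lookups by id cheap with a separate nonclustered index.
CREATE UNIQUE NONCLUSTERED INDEX IX_PFAuxiliarTest_id
    ON #PFAuxiliarTest (idAuxiliarPF);
```

Running the ROW_NUMBER query against both the copy and the original then gives a direct comparison of logical reads.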

rootfixxxer SSCrazy Points: 2918 · Answer 6

Just for testing I tried your suggestion, but the time is almost the same. So in this case, at least with this number of rows, the only difference I see is in the logical reads, and at this level it doesn't affect the total time.

---------Test Table
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(35557 row(s) affected)
Table '#TesteTable'. Scan count 5, logical reads 455, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 327 ms, elapsed time = 27715 ms.

--------Original Table
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(35557 row(s) affected)
Table 'PFAuxiliar'. Scan count 5, logical reads 820, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 360 ms, elapsed time = 27962 ms

ScottPletcher SSC Guru Points: 101223 · Answer 7

Logical reads dropped by about 45% (from 820 to 455). I'm virtually certain it would help your other queries against this table also. Naturally it's up to you, but the single most critical thing for performance is to get the best clustered index on every table.

rootfixxxer SSCrazy Points: 2918 · Answer 8

No doubt about that.

But like I said before, even if I change the index and the logical reads decrease by about 50%, the time is the same.

And if I use the workaround, even with only the id as the clustered index, the processing time is near zero.

So I will keep the index as it is, because I need it that way for other queries that use the ID (updates, deletes and selects).

The only thing I'm still trying to understand is why the workaround is faster than the original code. How does the SQL Server engine process the query such that asking for one more column (the NVARCHAR column) makes it take so much longer to return? Is it the MAX size?

Thanks

ScottPletcher SSC Guru Points: 101223 · Answer 9

Raw clock time is not necessarily a reliable measure. A blocking wait could occur in one run and not in another. For any query that does a significant amount of I/O, reducing logical I/O is almost always the key to good performance.
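One way to separate CPU work from waiting is sketched below. It assumes the query from the original post, and the second statement has to be run from a different session while the slow query is still executing. In the figures posted above, CPU time is about 0.3 s against roughly 28 s elapsed, which suggests most of the time goes to waiting rather than computing; the sketch does not establish what is being waited on.

```sql
-- Session 1: report I/O and CPU/elapsed time for the slow query.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT ROW_NUMBER() OVER (PARTITION BY pf
                          ORDER BY Data DESC, idAuxiliarPF DESC) AS RN,
       pf,
       Data,
       Descr
FROM dbo.PFAuxiliar
WHERE Data <= GETDATE();

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

-- Session 2, while the query above is running: what are user requests waiting on?
SELECT r.session_id,
       r.status,
       r.wait_type,
       r.wait_time,
       r.last_wait_type,
       r.cpu_time,
       r.total_elapsed_time,
       r.granted_query_memory
FROM sys.dm_exec_requests AS r
JOIN sys.dm_exec_sessions AS s
    ON s.session_id = r.session_id
WHERE s.is_user_process = 1
  AND r.session_id <> @@SPID;
```

A wait_type that stays the same across repeated checks would point at where the elapsed time is going.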

rootfixxxer SSCrazy Points: 2918 · Answer 10

The time was the main problem here. 🙂

I decided to run some tests on the table with the clustered index you suggested: the first two with the original query, the 3rd and 4th with the workaround. The results:

Query Profile Statistics - Everything the same

Network Statistics - Bytes sent differ only by the size of the query text; bytes received differ slightly, but insignificantly

Time Statistics - Huge difference in the processing time

I'm not saying your option is wrong; in fact it reduces the logical I/O. But as I wrote, I need the clustered index to stay on the id, and the main concern here is the time, and comparing one with the other...

Thanks

Gazareth One Orange Chip Points: 27737 · Answer 11

Unless I'm missing something, surely your entire query is just this?

SELECT pf,
Data,
Descr
FROM dbo.PFAuxiliar PFA1
WHERE Data <= GETDATE()
AND Descricao IS NOT NULL

rootfixxxer SSCrazy Points: 2918 · Answer 12

@Gazareth

Yes and no.

Yes, because I need everything; no, because I also need the row number partitioned by the pf column.

BTW, there was an error in the column name in the WHERE clause: it should be Descr instead of Descricao.

Already corrected in my post.

Gazareth One Orange Chip Points: 27737 · Answer 13

So this then?

SELECT pf,
Data,
Descr,
ROW_NUMBER() OVER (PARTITION BY pf ORDER BY Data DESC, idAuxiliarPF DESC)
FROM dbo.PFAuxiliar PFA1
WHERE Data <= GETDATE()
AND Descr IS NOT NULL

rootfixxxer SSCrazy Points: 2918 · Answer 14

July 31, 2015 at 2:13 am

#1816372

@Gazareth

Yes