July 28, 2015 at 10:56 am
Hello
I have a simple table with 4 columns (idAuxiliarPF(BIGINT+PK), pf(BIGINT+FK), Data(DateTime), Descr(NVARCHAR))that has aprox. 50k rows.
I need to create a partition of the data to join to another table, the query that i have:
SELECT
ROW_NUMBER() OVER (PARTITION BY pf ORDER BY Data DESC, idAuxiliarPF DESC) AS RN,
pf,
Data,
Descr
FROM dbo.PFAuxiliar
WHERE Data <= GETDATE()
This query takes around 40 seconds to return the results
If i remove the Descr column, the query it takes no time.
SELECT
ROW_NUMBER() OVER (PARTITION BY pf ORDER BY Data DESC, idAuxiliarPF DESC) AS RN,
pf,
Data
FROM dbo.PFAuxiliar
WHERE Data <= GETDATE()
I have two indexes, Clustered (idAuxiliarPF), NONClustered(pf).
How can i improve the performance of this query?
Thanks
July 28, 2015 at 11:11 am
What's the size of the Decsr field? NVARCHAR(??) At a guess it's MAX & the difference is because you're returning significantly more data.
When you say partition to join with another table, what do you mean? You might find joining directly more efficient.
Could you provide the full query?
Thanks
July 28, 2015 at 1:09 pm
It's a virtual certainty that the clustering key should be pf, not a meaningless id. That would also give better performance across the board on that table. Unless of course pf is highly variable and you can't afford to reorg/rebuild the table as often as needed.
Edit: With the key as it is, SQL must do a sort to get the rows in pf order first. If pf were the clustering key, that would not be the case. SQL will still likely need a sort because of the ORDER BY columns, but the sort should be vastly less overhead.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 29, 2015 at 2:25 am
Yes it's MAX, this is a aux table with some comments to every pf at another table.
At this point i need to return every row, it's kind of report with every comment that the user added for each pf in the master table.
The join it's a simple Left Join:
LEFT OUTER JOIN (SELECT ROW_NUMBER() OVER (PARTITION BY pf ORDER BY Data DESC, idAuxiliarPF DESC) AS RN,
pf ,
Data ,
Descr
FROM dbo.PFAuxiliar
WHERE Data <= GETDATE()) PFAE
ON PFAE.pf = PF.idPF
I didnt use the pf as key, because it's not unique, i can have several comments for each pf...
Only another thing, if i run the query without the ROW_NUMBER and OVER PARTITION, the query doesn't take 1 second to run.
SELECT pf ,
Data ,
Descr
FROM dbo.PFAuxiliar
WHERE Data <= GETDATE()
Thanks for your time.
July 29, 2015 at 5:36 am
Found a solution/workaround
If the complete query takes no time, just mix the two querys:
SELECT pf ,
Data ,
Descr
FROM dbo.PFAuxiliar PFA1 INNER JOIN (
SELECT ROW_NUMBER() OVER (PARTITION BY pf ORDER BY Data DESC,idAuxiliarPF DESC) AS RN,idAuxiliarPF
FROM dbo.PFAuxiliar WHERE Data <= GETDATE() AND Descr IS NOT NULL ) PFA2
ON PFA2.idAuxiliarPF = PFA1.idAuxiliarPF
And this takes less than a second to return the results.
Anyway i'd love to understand with the column descr in combination with the row number and partition, takes so much time to return the values.
Thanks
*Edited* - Column name in Where clause was wrong.
July 29, 2015 at 6:43 am
rootfixxxer (7/29/2015)
I didnt use the pf as key, because it's not unique, i can have several comments for each pf...
Not a problem. But, if you prefer, add the id after the pf to make the key unique. SQL does prefer unique clustering keys, especially, but it's more critical to get the best clustering key than to worry about whether it has dups or not, that's a secondary concern.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 29, 2015 at 8:56 am
Just for testing i tried your suggestion, but the time it's almost the same, so in this case, at least with this amount of rows the only difference that i see it's the value of logical reads, and at this level it doesn't affect the total time.
---------Test Table
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(35557 row(s) affected)
Table '#TesteTable'. Scan count 5, logical reads 455, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 327 ms, elapsed time = 27715 ms.
--------Original Table
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(35557 row(s) affected)
Table 'PFAuxiliar'. Scan count 5, logical reads 820, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 360 ms, elapsed time = 27962 ms
July 29, 2015 at 9:01 am
rootfixxxer (7/29/2015)
Just for testing i tried your suggestion, but the time it's almost the same, so in this case, at least with this amount of rows the only difference that i see it's the value of logical reads, and at this level it doesn't affect the total time.---------Test Table
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(35557 row(s) affected)
Table '#TesteTable'. Scan count 5, logical reads 455, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 327 ms, elapsed time = 27715 ms.
--------Original Table
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(35557 row(s) affected)
Table 'PFAuxiliar'. Scan count 5, logical reads 820, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 360 ms, elapsed time = 27962 ms
Logical reads reduced 45%. I'm virtually certain it would help your other queries against this table also. Naturally it's up to you, but the single most critical thing for performance is to get the best clustered index on every table.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 29, 2015 at 10:05 am
No doubt of that.
But like i said before, even if change the index and the logical reads decrease about 50%, the time is the same.
And if i use the workaround, even if i use only the id as clustered index, the time that it takes to process it's near zero.
So, i will maintain the index as it is, because i need it like that for other queries that use the ID (updates, deletes and selects).
The only thing that now i'm trying to understand is why the workaround is faster than the original code, how sql server engine processes the query, that when the query asks for another column (nvarchar column), it takes some much time to return?! Size MAX?!
Thanks
July 29, 2015 at 11:48 am
rootfixxxer (7/29/2015)
No doubt of that.But like i said before, even if change the index and the logical reads decrease about 50%, the time is the same.
And if i use the workaround, even if i use only the id as clustered index, the time that it takes to process it's near zero.
So, i will maintain the index as it is, because i need it like that for other queries that use the ID (updates, deletes and selects).
The only thing that now i'm trying to understand is why the workaround is faster than the original code, how sql server engine processes the query, that when the query asks for another column (nvarchar column), it takes some much time to return?! Size MAX?!
Thanks
Raw clock time is not necessarily accurate to go by. There could be a blocking wait that occurs for one run and not another. For any query that does a significant amount of I/Os, almost always reducing logical I/Os will be the key to good performance.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 30, 2015 at 3:36 am
The time was the main problem here. 🙂
I decided to make some tests, the tests were in the table with the clustered index as you suggested, the first two with the original query, the 3rd and 4th with the workaround, the results:
Query Profile Statistics - Everything the same
Network Statistics - Bytes Sent difference in size of the querys, Bytes received a little difference but it's insignificant
Time Statistics - Huge difference in the processing time
I don't say that your option it's wrong, in fact it reduces the logical IO, but like i wrote i need that the clustered index stay in the id, and the main concern here it's the time, and looking for one and for the other...
Thanks
July 30, 2015 at 6:41 am
Unless I'm missing something, surely your entire query is just this?
SELECT pf ,
Data ,
Descr
FROM dbo.PFAuxiliar PFA1
WHERE Data <= GETDATE()
AND Descricao IS NOT NULL
July 30, 2015 at 11:47 am
So this then?
SELECT pf ,
Data ,
Descr,
ROW_NUMBER() OVER (PARTITION BY pf ORDER BY Data DESC,idAuxiliarPF DESC)
FROM dbo.PFAuxiliar PFA1
WHERE Data <= GETDATE()
AND Descr IS NOT NULL
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply