Slow Row Number Over Partition

  • Hello

    I have a simple table with 4 columns (idAuxiliarPF(BIGINT+PK), pf(BIGINT+FK), Data(DateTime), Descr(NVARCHAR))that has aprox. 50k rows.

    I need to create a partition of the data to join to another table, the query that i have:

    SELECT

    ROW_NUMBER() OVER (PARTITION BY pf ORDER BY Data DESC, idAuxiliarPF DESC) AS RN,

    pf,

    Data,

    Descr

    FROM dbo.PFAuxiliar

    WHERE Data <= GETDATE()

    This query takes around 40 seconds to return the results

    If i remove the Descr column, the query it takes no time.

    SELECT

    ROW_NUMBER() OVER (PARTITION BY pf ORDER BY Data DESC, idAuxiliarPF DESC) AS RN,

    pf,

    Data

    FROM dbo.PFAuxiliar

    WHERE Data <= GETDATE()

    I have two indexes, Clustered (idAuxiliarPF), NONClustered(pf).

    How can i improve the performance of this query?

    Thanks

  • What's the size of the Decsr field? NVARCHAR(??) At a guess it's MAX & the difference is because you're returning significantly more data.

    When you say partition to join with another table, what do you mean? You might find joining directly more efficient.

    Could you provide the full query?

    Thanks

  • It's a virtual certainty that the clustering key should be pf, not a meaningless id. That would also give better performance across the board on that table. Unless of course pf is highly variable and you can't afford to reorg/rebuild the table as often as needed.

    Edit: With the key as it is, SQL must do a sort to get the rows in pf order first. If pf were the clustering key, that would not be the case. SQL will still likely need a sort because of the ORDER BY columns, but the sort should be vastly less overhead.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • @Gazareth

    Yes it's MAX, this is a aux table with some comments to every pf at another table.

    At this point i need to return every row, it's kind of report with every comment that the user added for each pf in the master table.

    The join it's a simple Left Join:

    LEFT OUTER JOIN (SELECT ROW_NUMBER() OVER (PARTITION BY pf ORDER BY Data DESC, idAuxiliarPF DESC) AS RN,

    pf ,

    Data ,

    Descr

    FROM dbo.PFAuxiliar

    WHERE Data <= GETDATE()) PFAE

    ON PFAE.pf = PF.idPF

    @scottpletcher

    I didnt use the pf as key, because it's not unique, i can have several comments for each pf...

    Only another thing, if i run the query without the ROW_NUMBER and OVER PARTITION, the query doesn't take 1 second to run.

    SELECT pf ,

    Data ,

    Descr

    FROM dbo.PFAuxiliar

    WHERE Data <= GETDATE()

    Thanks for your time.

  • Found a solution/workaround

    If the complete query takes no time, just mix the two querys:

    SELECT pf ,

    Data ,

    Descr

    FROM dbo.PFAuxiliar PFA1 INNER JOIN (

    SELECT ROW_NUMBER() OVER (PARTITION BY pf ORDER BY Data DESC,idAuxiliarPF DESC) AS RN,idAuxiliarPF

    FROM dbo.PFAuxiliar WHERE Data <= GETDATE() AND Descr IS NOT NULL ) PFA2

    ON PFA2.idAuxiliarPF = PFA1.idAuxiliarPF

    And this takes less than a second to return the results.

    Anyway i'd love to understand with the column descr in combination with the row number and partition, takes so much time to return the values.

    Thanks

    *Edited* - Column name in Where clause was wrong.

  • rootfixxxer (7/29/2015)

    @scottpletcher

    I didnt use the pf as key, because it's not unique, i can have several comments for each pf...

    Not a problem. But, if you prefer, add the id after the pf to make the key unique. SQL does prefer unique clustering keys, especially, but it's more critical to get the best clustering key than to worry about whether it has dups or not, that's a secondary concern.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Just for testing i tried your suggestion, but the time it's almost the same, so in this case, at least with this amount of rows the only difference that i see it's the value of logical reads, and at this level it doesn't affect the total time.

    ---------Test Table

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    (35557 row(s) affected)

    Table '#TesteTable'. Scan count 5, logical reads 455, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 327 ms, elapsed time = 27715 ms.

    --------Original Table

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    (35557 row(s) affected)

    Table 'PFAuxiliar'. Scan count 5, logical reads 820, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 360 ms, elapsed time = 27962 ms

  • rootfixxxer (7/29/2015)


    Just for testing i tried your suggestion, but the time it's almost the same, so in this case, at least with this amount of rows the only difference that i see it's the value of logical reads, and at this level it doesn't affect the total time.

    ---------Test Table

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    (35557 row(s) affected)

    Table '#TesteTable'. Scan count 5, logical reads 455, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 327 ms, elapsed time = 27715 ms.

    --------Original Table

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    (35557 row(s) affected)

    Table 'PFAuxiliar'. Scan count 5, logical reads 820, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 360 ms, elapsed time = 27962 ms

    Logical reads reduced 45%. I'm virtually certain it would help your other queries against this table also. Naturally it's up to you, but the single most critical thing for performance is to get the best clustered index on every table.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • No doubt of that.

    But like i said before, even if change the index and the logical reads decrease about 50%, the time is the same.

    And if i use the workaround, even if i use only the id as clustered index, the time that it takes to process it's near zero.

    So, i will maintain the index as it is, because i need it like that for other queries that use the ID (updates, deletes and selects).

    The only thing that now i'm trying to understand is why the workaround is faster than the original code, how sql server engine processes the query, that when the query asks for another column (nvarchar column), it takes some much time to return?! Size MAX?!

    Thanks

  • rootfixxxer (7/29/2015)


    No doubt of that.

    But like i said before, even if change the index and the logical reads decrease about 50%, the time is the same.

    And if i use the workaround, even if i use only the id as clustered index, the time that it takes to process it's near zero.

    So, i will maintain the index as it is, because i need it like that for other queries that use the ID (updates, deletes and selects).

    The only thing that now i'm trying to understand is why the workaround is faster than the original code, how sql server engine processes the query, that when the query asks for another column (nvarchar column), it takes some much time to return?! Size MAX?!

    Thanks

    Raw clock time is not necessarily accurate to go by. There could be a blocking wait that occurs for one run and not another. For any query that does a significant amount of I/Os, almost always reducing logical I/Os will be the key to good performance.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • The time was the main problem here. 🙂

    I decided to make some tests, the tests were in the table with the clustered index as you suggested, the first two with the original query, the 3rd and 4th with the workaround, the results:

    Query Profile Statistics - Everything the same

    Network Statistics - Bytes Sent difference in size of the querys, Bytes received a little difference but it's insignificant

    Time Statistics - Huge difference in the processing time

    I don't say that your option it's wrong, in fact it reduces the logical IO, but like i wrote i need that the clustered index stay in the id, and the main concern here it's the time, and looking for one and for the other...

    Thanks

  • Unless I'm missing something, surely your entire query is just this?

    SELECT pf ,

    Data ,

    Descr

    FROM dbo.PFAuxiliar PFA1

    WHERE Data <= GETDATE()

    AND Descricao IS NOT NULL

  • @Gazareth

    Yes and no.

    Yes, because i need everything, no because i also need to get partitions by the column pf with the row number.

    BTW, there was an error in the column name in the where clause, should be Descr instead of Descricao.

    Already corrected in my post.

  • So this then?

    SELECT pf ,

    Data ,

    Descr,

    ROW_NUMBER() OVER (PARTITION BY pf ORDER BY Data DESC,idAuxiliarPF DESC)

    FROM dbo.PFAuxiliar PFA1

    WHERE Data <= GETDATE()

    AND Descr IS NOT NULL

  • @Gazareth

    Yes

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply