Getting the minimum and the maximum values in a large table

  • I would be grateful for some advice with the following.

    Table definition:

    Create table code (

    id identity(1,1)

    code

    parentcode

    internalreference)

    There are other columns but I have omitted them for clarity.

    The clustered index is on the ID.

    There are indexes on the code, parentcode and internalreference columns.

    The problem is the table stores a parentcode with an internalreference and around 2000 codes which are children of the parentcode. I realise the table is very badly designed, but the company love orms!!

    Example:

    ID| Code| ParentCode| InternalReference|

    1 | M111| NULL | 1|

    2 | AAA | M111 | 2|

    3 | .... | .... | ....|

    4 | AAB | M111 | 2000|

    5 | M222 | NULL | 2001|

    6 | ZZZ | M222 | 2002|

    7 | .... | .... | .... |

    8 | ZZA | M222 | 4000|

    The table currently holds around 300 millions rows.

    The application does the following two queries to find the first internalreference of a code and the last internalrefernce of a code:

    --Find first internalrefernce

    SELECT TOP 1 ID, InternalReference

    FROM code

    WHERE ParentCode = 'M222'

    Order By InternalReference

    -- Find last ineternalreference

    SELECT TOP 1 ID, InternalReference

    FROM code

    WHERE ParentCode = 'M222'

    Order By InternalReference DESC

    These queries are running for a very long time, only because of the sort. If I run the query without the sort, then they return the results instantly, but obviously this doesn't help me find the first and last internalreference for a parentCode.

    I realise the best way to fix this is to redesign the table, but I cannot do that at this time.

    Is there a better way to do this so that two queries which individually run very slowly, can be combined into one that is more efficient?

    Thanks.

  • Sure...the title of your thread pretty much gave you the solution.

    SELECT ID,

    MIN(InternalReference) as MinInternalReference,

    MAX(InternalReference) as MaxInternalReference

    FROM code

    WHERE ParentCode = 'M222'

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thanks. That doesn't work unfortunately and still performs just as badly.

    Just talking this problem over on this forum, I think I have solved it by using a temporary table variable and extracting the rows I need into that, then doing the selects required from there. Works pretty much instantly now.

  • The following should work to bring back a single ID for each. Would recommend indexes on parentCode + internalReference and internalReference if not already present.

    SELECT MIN(c.ID) AS ID,

    c.internalreference

    FROM Code c

    WHERE internalreference = (

    SELECT MIN(InternalReference) AS InternalReference

    FROM code

    WHERE ParentCode = 'M222'

    )

    GROUP BY c.internalreference

    SELECT MAX(c.ID) AS ID,

    c.internalreference

    FROM Code c

    WHERE internalreference = (

    SELECT MAX(InternalReference) AS InternalReference

    FROM code

    WHERE ParentCode = 'M222'

    )

    GROUP BY c.internalreference

  • Maddave (5/23/2014)


    Thanks. That doesn't work unfortunately and still performs just as badly.

    Just talking this problem over on this forum, I think I have solved it by using a temporary table variable and extracting the rows I need into that, then doing the selects required from there. Works pretty much instantly now.

    My guess is you don't have a covering index for the query. Glad you were able to find a solution that works for you.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (5/23/2014)


    Maddave (5/23/2014)


    Thanks. That doesn't work unfortunately and still performs just as badly.

    Just talking this problem over on this forum, I think I have solved it by using a temporary table variable and extracting the rows I need into that, then doing the selects required from there. Works pretty much instantly now.

    My guess is you don't have a covering index for the query. Glad you were able to find a solution that works for you.

    Yes, as Sean says, a covering index would greatly help his query:

    CREATE NONCLUSTERED INDEX IDX_References on dbo.yourLookupTable (

    parentcode

    )

    INCLUDE (

    id,

    internalreference

    );

    I think that should work, I have nothing to test it against.

  • Ahhhh.... be careful folks. This table is a 300 MILLION row table according to the op. Even with as narrow as it is, that could take up quite a bit of room. Indexes aren't much more than a duplication of data as a columnar subset with a different sort order. Think about memory, backups, restores, and index maintenance time before you go on with adding yet more indexes, especially covering indexes.

    As for the design of the table, there's nothing wrong with it except that the clustered index is on the wrong column and should be moved to help more queries enjoy the "self covering" benefits of Clustered Indexes. It's a simple "Adjacency List". Depending on what's going on with the data and what it's used for, the optimization of adding Nested Sets notation to it might be helpful... or not.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/23/2014)


    Ahhhh.... be careful folks. This table is a 300 MILLION row table according to the op. Even with as narrow as it is, that could take up quite a bit of room. Indexes aren't much more than a duplication of data as a columnar subset with a different sort order. Think about memory, backups, restores, and index maintenance time before you go on with adding yet more indexes, especially covering indexes.

    As for the design of the table, there's nothing wrong with it except that the clustered index is on the wrong column and should be moved to help more queries enjoy the "self covering" benefits of Clustered Indexes. It's a simple "Adjacency List". Depending on what's going on with the data and what it's used for, the optimization of adding Nested Sets notation to it might be helpful... or not.

    True, Jeff. I would actually recommend changing the clustered index as well. Having the clustered index on the ID column is not the best choice. I am not sure how long making an appropriate change to indexing on the table.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply