Getting the minimum and the maximum values in a large table

Question

Post reply

Getting the minimum and the maximum values in a large table

Maddave

SSCertifiable

Points: 6828
More actions
May 23, 2014 at 9:22 am

#291664

I would be grateful for some advice with the following.
Table definition:
Create table code (
id identity(1,1)
code
parentcode
internalreference)
There are other columns but I have omitted them for clarity.
The clustered index is on the ID.
There are indexes on the code, parentcode and internalreference columns.
The problem is the table stores a parentcode with an internalreference and around 2000 codes which are children of the parentcode. I realise the table is very badly designed, but the company love orms!!
Example:
ID| Code| ParentCode| InternalReference|
1 | M111| NULL | 1|
2 | AAA | M111 | 2|
3 | .... | .... | ....|
4 | AAB | M111 | 2000|
5 | M222 | NULL | 2001|
6 | ZZZ | M222 | 2002|
7 | .... | .... | .... |
8 | ZZA | M222 | 4000|
The table currently holds around 300 millions rows.
The application does the following two queries to find the first internalreference of a code and the last internalrefernce of a code:
--Find first internalrefernce
SELECT TOP 1 ID, InternalReference
FROM code
WHERE ParentCode = 'M222'
Order By InternalReference
-- Find last ineternalreference
SELECT TOP 1 ID, InternalReference
FROM code
WHERE ParentCode = 'M222'
Order By InternalReference DESC
These queries are running for a very long time, only because of the sort. If I run the query without the sort, then they return the results instantly, but obviously this doesn't help me find the first and last internalreference for a parentCode.
I realise the best way to fix this is to redesign the table, but I cannot do that at this time.
Is there a better way to do this so that two queries which individually run very slowly, can be combined into one that is more efficient?
Thanks.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Sean Lange SSC Guru Points: 286590 More actions · Answer 1

Sure...the title of your thread pretty much gave you the solution.

SELECT ID,

MIN(InternalReference) as MinInternalReference,

MAX(InternalReference) as MaxInternalReference

FROM code

WHERE ParentCode = 'M222'

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Maddave SSCertifiable Points: 6828 More actions · Answer 2

Thanks. That doesn't work unfortunately and still performs just as badly.

Just talking this problem over on this forum, I think I have solved it by using a temporary table variable and extracting the rows I need into that, then doing the selects required from there. Works pretty much instantly now.

David_Reed SSC Veteran Points: 250 More actions · Answer 3

The following should work to bring back a single ID for each. Would recommend indexes on parentCode + internalReference and internalReference if not already present.

SELECT MIN(c.ID) AS ID,

c.internalreference

FROM Code c

WHERE internalreference = (

SELECT MIN(InternalReference) AS InternalReference

FROM code

WHERE ParentCode = 'M222'

)

GROUP BY c.internalreference

SELECT MAX(c.ID) AS ID,

c.internalreference

FROM Code c

WHERE internalreference = (

SELECT MAX(InternalReference) AS InternalReference

FROM code

WHERE ParentCode = 'M222'

)

GROUP BY c.internalreference

Sean Lange SSC Guru Points: 286590 More actions · Answer 4

Maddave (5/23/2014)
Thanks. That doesn't work unfortunately and still performs just as badly.
Just talking this problem over on this forum, I think I have solved it by using a temporary table variable and extracting the rows I need into that, then doing the selects required from there. Works pretty much instantly now.

My guess is you don't have a covering index for the query. Glad you were able to find a solution that works for you.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 5

Sean Lange (5/23/2014)
Maddave (5/23/2014)
Thanks. That doesn't work unfortunately and still performs just as badly.
Just talking this problem over on this forum, I think I have solved it by using a temporary table variable and extracting the rows I need into that, then doing the selects required from there. Works pretty much instantly now.
My guess is you don't have a covering index for the query. Glad you were able to find a solution that works for you.

Yes, as Sean says, a covering index would greatly help his query:

CREATE NONCLUSTERED INDEX IDX_References on dbo.yourLookupTable (

parentcode

)

INCLUDE (

id,

internalreference

);

I think that should work, I have nothing to test it against.

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 6

Ahhhh.... be careful folks. This table is a 300 MILLION row table according to the op. Even with as narrow as it is, that could take up quite a bit of room. Indexes aren't much more than a duplication of data as a columnar subset with a different sort order. Think about memory, backups, restores, and index maintenance time before you go on with adding yet more indexes, especially covering indexes.

As for the design of the table, there's nothing wrong with it except that the clustered index is on the wrong column and should be moved to help more queries enjoy the "self covering" benefits of Clustered Indexes. It's a simple "Adjacency List". Depending on what's going on with the data and what it's used for, the optimization of adding Nested Sets notation to it might be helpful... or not.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 7

Jeff Moden (5/23/2014)
Ahhhh.... be careful folks. This table is a 300 MILLION row table according to the op. Even with as narrow as it is, that could take up quite a bit of room. Indexes aren't much more than a duplication of data as a columnar subset with a different sort order. Think about memory, backups, restores, and index maintenance time before you go on with adding yet more indexes, especially covering indexes.
As for the design of the table, there's nothing wrong with it except that the clustered index is on the wrong column and should be moved to help more queries enjoy the "self covering" benefits of Clustered Indexes. It's a simple "Adjacency List". Depending on what's going on with the data and what it's used for, the optimization of adding Nested Sets notation to it might be helpful... or not.

True, Jeff. I would actually recommend changing the clustered index as well. Having the clustered index on the ID column is not the best choice. I am not sure how long making an appropriate change to indexing on the table.