Query of 100 mil rows with multiple parameters

Question

Post reply

Query of 100 mil rows with multiple parameters

Viewing 15 posts - 16 through 30 (of 39 total)

You must be logged in to reply to this topic. Login to reply

MarkHK SSC Enthusiast Points: 144 More actions · Answer 1

Wow - now I am down to 1 sec. after I added the index:

CREATE NONCLUSTERED INDEX [IX_BRWEEKHIS_TEST]

ON [dbo].[BRFCLSeaWeekHis] ([PortFrom],[PortTo],[QuoteType],[Year])

INCLUDE ([CurrencyCode],[RateTotal],[SeaTotal],[Total])

However, the last two conversions I don't understand, because they seem correct when declared into the temp table. I will dig a bit more into that, but now it is really great! Thank you very much!

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 2

ChrisM@Work (2/28/2014)
Secondly, your ordinary index has most of the columns of the clustered index as INCLUDE columns. This is unnecessary because nonclustered indexes contain the cluster keys - they're already included behind the scenes.

Don't want to derail, but I disagree with this recommendation. What happens if someone changes the clustered index.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 3

GilaMonster (3/4/2014)
ChrisM@Work (2/28/2014)
Secondly, your ordinary index has most of the columns of the clustered index as INCLUDE columns. This is unnecessary because nonclustered indexes contain the cluster keys - they're already included behind the scenes.
Don't want to derail, but I disagree with this recommendation. What happens if someone changes the clustered index.

Since the OP is satisfied for the time being, derailing is moot. I don't recall ever seeing this discussed on ssc so I guess now is a good time.

If someone changes the clustered index and removes one of those columns, then the plan changes (for MarkHK's query) and a lookup drags the time down again. However:

1. How often do you change a clustered index on a production db?

2. If you did, wouldn't you be on high alert for performance dips caused by plan changes?

3. With a clustered index having this many keys, would you switch to an identity column? I would, provided that analysis of index usage patterns showed the clustered index isn't particularly useful to existing queries.

I'll stand by the suggestion 😉

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 4

Jeff Moden (3/2/2014)
I guess my first recommendation would be to make the PK a nonclustered index. It's currently way too wide to be of any practical use as a clustered index and will cause BTree of the index on quotetype column to bloat quite badly (remember that all the columns of a clustered index are added to all nonclustered indexes) making it read many more pages per lookup than it should need to. I would put a new clustered index on year and week because that seems to be predominate in all of your queries.

Yes - but I'd try PortFrom and PortTo as cluster keys because they may be more selective and also more common as leading edge in other indexes - promoting MERGE or NLIJ's over hash joins.

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 5

ChrisM@home (3/4/2014)
1. How often do you change a clustered index on a production db?

I do fairly often, but then that's what I do mostly (database tuning on various badly performing systems for clients)

2. If you did, wouldn't you be on high alert for performance dips caused by plan changes?

Yes, but which is better, be on alert for plan changes due to indexes no longer being covering, or knowing that my indexes will still be covering no matter what happens to the clustered index? Given that there's no overhead from the second option, I'll take that any day, especially as I may not be the one changing the clustered index, could be a consultant that the client gets in months after I last looked at their system.

3. With a clustered index having this many keys, would you switch to an identity column?

Maybe. Maybe not. Depends on a lot of factors.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jeff Moden SSC Guru Points: 1004376 More actions · Answer 6

ChrisM@home (3/4/2014)
Jeff Moden (3/2/2014)
I guess my first recommendation would be to make the PK a nonclustered index. It's currently way too wide to be of any practical use as a clustered index and will cause BTree of the index on quotetype column to bloat quite badly (remember that all the columns of a clustered index are added to all nonclustered indexes) making it read many more pages per lookup than it should need to. I would put a new clustered index on year and week because that seems to be predominate in all of your queries.
Yes - but I'd try PortFrom and PortTo as cluster keys because they may be more selective and also more common as leading edge in other indexes - promoting MERGE or NLIJ's over hash joins.

It depends. Generally speaking, it's best to put the clustered index on some form of "ever increasing" column(s) to prevent page splits. Putting the CI on PortFrom could cause massive page splits during inserts to the point where it may cause timeouts at the GUI.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 7

Jeff Moden (3/4/2014)
ChrisM@home (3/4/2014)
Jeff Moden (3/2/2014)
I guess my first recommendation would be to make the PK a nonclustered index. It's currently way too wide to be of any practical use as a clustered index and will cause BTree of the index on quotetype column to bloat quite badly (remember that all the columns of a clustered index are added to all nonclustered indexes) making it read many more pages per lookup than it should need to. I would put a new clustered index on year and week because that seems to be predominate in all of your queries.
Yes - but I'd try PortFrom and PortTo as cluster keys because they may be more selective and also more common as leading edge in other indexes - promoting MERGE or NLIJ's over hash joins.
It depends. Generally speaking, it's best to put the clustered index on some form of "ever increasing" column(s) to prevent page splits. Putting the CI on PortFrom could cause massive page splits during inserts to the point where it may cause timeouts at the GUI.

Aha! Good point well made. New ports get built but not very often.

Y'know, it doesn't seem so long ago when we were avoiding hotspots by choosing the ports option 😀

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 8

GilaMonster (3/4/2014)
ChrisM@home (3/4/2014)
1. How often do you change a clustered index on a production db?
I do fairly often, but then that's what I do mostly (database tuning on various badly performing systems for clients)
2. If you did, wouldn't you be on high alert for performance dips caused by plan changes?
Yes, but which is better, be on alert for plan changes due to indexes no longer being covering, or knowing that my indexes will still be covering no matter what happens to the clustered index? Given that there's no overhead from the second option, I'll take that any day, especially as I may not be the one changing the clustered index, could be a consultant that the client gets in months after I last looked at their system.
3. With a clustered index having this many keys, would you switch to an identity column?
Maybe. Maybe not. Depends on a lot of factors.

Less rows per page because of data duplication doesn't come without cost. But that's defensive development. There's always some overhead and someone always has to do the sums, calculate the cost-benefit.

Funny - last Monday an agency I've worked with a couple of times called me up about a tuning gig, only a few miles away and good money. I've never done a pure tuning gig before. The temptation was strong, I'd love to have done it - but I don't think I'm ready yet, and in any case, sticking out a gig to the end is almost always good practice.

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 9

ChrisM@home (3/4/2014)
Less rows per page because of data duplication doesn't come without cost.

What data duplication?

There's always some overhead and someone always has to do the sums, calculate the cost-benefit.

Not in this case. The only 'overhead' from explicitly specifying clustered index keys as part of an index when you need them as opposed to allowing for them to be implicitly included comes from the extra typing of the CREATE INDEX.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 10

GilaMonster (3/4/2014)
ChrisM@home (3/4/2014)
Less rows per page because of data duplication doesn't come without cost.
What data duplication?
There's always some overhead and someone always has to do the sums, calculate the cost-benefit.
Not in this case. The only 'overhead' from explicitly specifying clustered index keys as part of an index when you need them as opposed to allowing for them to be implicitly included comes from the extra typing of the CREATE INDEX.

I'm not sure what you're implying. If a multi-column clustered index includes a GUID, a nonclustered index which incorporates that same column either as a key or include column is going to be significantly wider than one which doesn't, and relies on its existence in the cluster key for coverage. Oh no, I can feel a monster blooper coming on, it must be at least six months since the last one :ermm:

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 11

ChrisM@Work (3/5/2014)
If a multi-column clustered index includes a GUID, a nonclustered index which incorporates that same column either as a key or include column is going to be significantly wider than one which doesn't, and relies on its existence in the cluster key for coverage.

:ermm:

Your reason for not needing to explicitly include the columns in the nonclustered index is based on the fact that all columns of the clustered index key are implicitly present in all nonclustered indexes. Given that, why is an index which explicitly includes a GUID going to be any larger than an index which implicitly includes the GUID because it's part of the clustered index?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 12

GilaMonster (3/5/2014)
ChrisM@Work (3/5/2014)
If a multi-column clustered index includes a GUID, a nonclustered index which incorporates that same column either as a key or include column is going to be significantly wider than one which doesn't, and relies on its existence in the cluster key for coverage.
:ermm:
Your reason for not needing to explicitly include the columns in the nonclustered index is based on the fact that all columns of the clustered index key are implicitly present in all nonclustered indexes. Given that, why is an index which explicitly includes a GUID going to be any larger than an index which implicitly includes the GUID because it's part of the clustered index?

Okay, try this:

"Given that, why is an index which explicitly and implicitly includes a GUID going to be any larger than an index which only implicitly includes the GUID?"

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 13

ChrisM@Work (3/5/2014)
"Given that, why is an index which explicitly and implicitly includes a GUID going to be any larger than an index which only implicitly includes the GUID?"

Why do you assume that the GUID is going to be explicitly and implicitly included in an index when it's only needed once and when columns cannot be present twice in an index?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 14

CREATE TABLE ReallyStupidTable (

ID CHAR(100),

OtherID INT

)

GO

CREATE UNIQUE CLUSTERED INDEX idx_Silly ON ReallyStupidTable (ID)

GO

INSERT INTO ReallyStupidTable (ID, OtherID)

SELECT ANumber, ANumber FROM (

SELECT TOP (5000000) ROW_NUMBER() OVER (ORDER BY (SELECT 1)) ANumber FROM msdb.sys.columns AS c CROSS JOIN msdb.sys.columns c2

) sub

CREATE INDEX idx_1 ON ReallyStupidTable(OtherID) -- ID is implicitly a key column because the NC index is not defined unique

CREATE INDEX idx_2 ON ReallyStupidTable(OtherID, ID) -- ID is explicitly a key column

GO

SELECT OBJECT_NAME(ps.object_id) AS TableName, i.name, ps.in_row_data_page_count, in_row_used_page_count, used_page_count

FROM sys.dm_db_partition_stats AS ps INNER JOIN sys.indexes AS i ON ps.object_id = i.object_id AND ps.index_id = i.index_id

WHERE ps.object_id = OBJECT_ID('ReallyStupidTable')

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 15

GilaMonster (3/5/2014)
CREATE TABLE ReallyStupidTable ( ID CHAR(100), OtherID INT ) GO CREATE UNIQUE CLUSTERED INDEX idx_Silly ON ReallyStupidTable (ID) GO INSERT INTO ReallyStupidTable (ID, OtherID) SELECT ANumber, ANumber FROM ( SELECT TOP (5000000) ROW_NUMBER() OVER (ORDER BY (SELECT 1)) ANumber FROM msdb.sys.columns AS c CROSS JOIN msdb.sys.columns c2 ) sub CREATE INDEX idx_1 ON ReallyStupidTable(OtherID) -- ID is implicitly a key column because the NC index is not defined unique CREATE INDEX idx_2 ON ReallyStupidTable(OtherID, ID) -- ID is explicitly a key column GO SELECT OBJECT_NAME(ps.object_id) AS TableName, i.name, ps.in_row_data_page_count, in_row_used_page_count, used_page_count FROM sys.dm_db_partition_stats AS ps INNER JOIN sys.indexes AS i ON ps.object_id = i.object_id AND ps.index_id = i.index_id
WHERE ps.object_id = OBJECT_ID('ReallyStupidTable')

I know I know, I'm doing the same thing:

DROP TABLE IndexTester

;WITH iTally AS ( -- (100000 row(s) affected)

SELECT rn = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n)

CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n)

CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n)

CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n)

CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) h (n)

)

SELECT

Ten = 1+ABS(CHECKSUM(NEWID())) %10,

Ton = 1+ABS(CHECKSUM(NEWID())) %100,

Whopper = NEWID()

INTO IndexTester

FROM iTally

CREATE UNIQUE CLUSTERED INDEX cx_tentonwhopper ON IndexTester (whopper)

CREATE INDEX ix_tenton ON IndexTester (ten, ton)

SELECT ten, ton, whopper

FROM IndexTester WHERE ten = 10 AND ton = 100

CREATE INDEX ix_tentonwhopper ON IndexTester (ten, ton) INCLUDE (whopper)

SELECT ten, ton, whopper

FROM IndexTester WHERE ten = 10 AND ton = 100

- The size stats of the two ordinary indexes is virtually the same.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden