February 4, 2019 at 10:22 am
We're having a bit of a performance issue with one of our tables, which seems to be behaving in a slightly unexpected way.
The table has about 13.5m rows. It has an integer, identity, primary key column (DeliveryTimetableItemResolvedCostID) and various other columns, one of which is another integer column (DeliveryTimetableItemID); this has about 11.5m distinct values in it.
CREATE TABLE [dbo].[T_DeliveryTimetableItemResolvedCost](
[DeliveryTimetableItemResolvedCostID] [int] IDENTITY(1,1) NOT NULL,
[DeliveryTimetableItemID] [int] NOT NULL,
[Quantity] [decimal](10, 4) NOT NULL,
[Cost] [money] NULL,
etc.
CONSTRAINT [PK_T_DeliveryTimetableItemResolvedCost] PRIMARY KEY CLUSTERED
(
[DeliveryTimetableItemResolvedCostID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
GO
This other integer column has a non-unique index on it that includes a couple of the columns (Quantity and Cost):
CREATE NONCLUSTERED INDEX [IX_T_DeliveryTimetableItemResolvedCost_DeliveryTimetableItemID] ON [dbo].[T_DeliveryTimetableItemResolvedCost]
(
[DeliveryTimetableItemID] ASC
)
INCLUDE ( [Quantity],
[Cost]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
GO
The problem we're having is that the entirety of the non-unique index (IX_T_DeliveryTimetableItemResolvedCost_DeliveryTimetableItemID) is being loaded into memory even when we only want the Quantity and Cost of 1% of the rows in the table, as selected via DeliveryTimetableItemID. That ends up using quite a lot of memory.
My question is: does SQL have some threshold over which it just says "S*d it - let's load up the whole thing!"? (Presumably a dynamically-determined threshold.) I know "scans are generally faster than seeks" - is it simply a result of that?
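(One way to test whether this is purely a costing decision rather than a fixed threshold is to force the alternative plan and compare the I/O. The sketch below uses the documented FORCESEEK table hint; the ID literal is just a placeholder, and whether the seek plan is actually cheaper here depends on the data.)

```sql
SET STATISTICS IO ON;  -- compare logical reads between the two plans

-- Let the optimizer choose (it may pick a full index scan for large estimates):
SELECT Quantity, Cost
FROM dbo.T_DeliveryTimetableItemResolvedCost
WHERE DeliveryTimetableItemID = 12345;  -- hypothetical value

-- Force a seek on the nonclustered index:
SELECT Quantity, Cost
FROM dbo.T_DeliveryTimetableItemResolvedCost WITH
    (FORCESEEK (IX_T_DeliveryTimetableItemResolvedCost_DeliveryTimetableItemID
        (DeliveryTimetableItemID)))
WHERE DeliveryTimetableItemID = 12345;
```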
February 4, 2019 at 11:55 am
Post the query that causes this problem. Aggregates could be involved. Alternatively, some kind of function could be involved, that would have to touch every row, just to then filter out most of them.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
February 4, 2019 at 1:13 pm
julian.fletcher - Monday, February 4, 2019 10:22 AM [quoted in full above]
I'm thinking SQL never says "let's load up the whole thing"; it loads pages that aren't in the buffer pool and unloads pages that haven't been accessed in a while. So a select of just the key and included columns, forcing a full index scan, probably has the best chance: touching only those columns should mean SQL doesn't also need to read the table. But that's just me thinking out loud 🙂
Of course, making that happen is somewhat artificial, as you would then not be "using" the index in your query except to access the included columns, which I'm thinking SQL MIGHT cost lower by reading the index instead of the table.
February 5, 2019 at 2:24 am
sgmunson - Monday, February 4, 2019 11:55 AM [quoted above]
The actual query relies on other stuff so I've extracted its essence. This also shows the problem.
-- Get a small, random selection of IDs.
Select Top 1 Percent DeliveryTimetableItemID
Into #Subset
From dbo.T_DeliveryTimetableItemResolvedCost DTIRC
Order By NEWID ()
-- Empty the buffer and display how much of the index is loaded. (It is 0%, as expected.)
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
Select dbo.IndexPercentInMemory ('IX_T_DeliveryTimetableItemResolvedCost_DeliveryTimetableItemID', Null)
-- Use the index.
Select Quantity, Cost
Into #Temp
From dbo.T_DeliveryTimetableItemResolvedCost DTIRC
Join #Subset S On (S.DeliveryTimetableItemID = DTIRC.DeliveryTimetableItemID)
-- This shows 100%.
Select dbo.IndexPercentInMemory ('IX_T_DeliveryTimetableItemResolvedCost_DeliveryTimetableItemID', Null)
Drop Table #Temp
Drop Table #Subset
If you change the percentage in the first select statement to "0.001" then the percentage loaded into memory falls to 1.85. In fact, here's a table:
Entirely reproducible. It just seems very odd that it would load up pretty much the whole index just to get 0.1% of the data.
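(The dbo.IndexPercentInMemory function isn't shown, so for anyone reproducing this, here is a sketch of the same measurement using the documented DMVs sys.dm_os_buffer_descriptors and sys.allocation_units; the function name and its exact accounting are assumptions.)

```sql
-- Rough sketch: percentage of an index's pages currently in the buffer pool.
-- Assumes the current database and the index name from the posted DDL.
SELECT
    cached.pages_in_memory,
    total.total_pages,
    100.0 * cached.pages_in_memory / NULLIF(total.total_pages, 0) AS pct_in_memory
FROM
(
    SELECT COUNT(*) AS pages_in_memory
    FROM sys.dm_os_buffer_descriptors bd
    JOIN sys.allocation_units au ON au.allocation_unit_id = bd.allocation_unit_id
    JOIN sys.partitions p ON p.hobt_id = au.container_id
    JOIN sys.indexes i ON i.object_id = p.object_id AND i.index_id = p.index_id
    WHERE bd.database_id = DB_ID()
      AND au.type IN (1, 3)  -- container_id matches hobt_id for in-row / row-overflow data
      AND i.name = 'IX_T_DeliveryTimetableItemResolvedCost_DeliveryTimetableItemID'
) cached
CROSS JOIN
(
    SELECT SUM(au.used_pages) AS total_pages
    FROM sys.allocation_units au
    JOIN sys.partitions p ON p.hobt_id = au.container_id
    JOIN sys.indexes i ON i.object_id = p.object_id AND i.index_id = p.index_id
    WHERE au.type IN (1, 3)
      AND i.name = 'IX_T_DeliveryTimetableItemResolvedCost_DeliveryTimetableItemID'
) total;
```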
February 5, 2019 at 2:58 am
What is the size of the index?
How many rows are in each index page? (On average.)
For clustered indexes there are some 'methods' to obtain the position of each row, so you can calculate exactly which pages are needed. Maybe there are equivalent 'methods' for index pages. (See below.)
If there are a hundred index entries in each page and you read 1 percent of the rows at random, most index pages will be touched, so the optimizer might choose a full index scan. With more than a hundred entries per page, the likelihood of touching any given page increases further. A full index scan is faster than fetching most pages individually, but consumes more cache.
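To put rough numbers on that for this index: each index row is about 25 bytes of data (two ints, a decimal(10,4), a money) plus row overhead, so with the 80 fill factor a leaf page holds on the order of 180-200 entries, giving very roughly 70-75k leaf pages for 13.5m rows. A random 1% sample (~135k rows) then averages nearly two hits per page, so the great majority of leaf pages get touched. The actual leaf density can be read from sys.dm_db_index_physical_stats (a documented DMF; SAMPLED mode reads part of the index to populate record counts):

```sql
-- Average leaf-level records per page for each index on the table
-- (NULL index_id returns all indexes; filter on index_id if preferred).
SELECT index_id, index_level, page_count, record_count,
       1.0 * record_count / NULLIF(page_count, 0) AS avg_records_per_page
FROM sys.dm_db_index_physical_stats(
         DB_ID(),
         OBJECT_ID('dbo.T_DeliveryTimetableItemResolvedCost'),
         NULL, NULL, 'SAMPLED')
WHERE index_level = 0;  -- leaf level
```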
Ben
Maybe there is something similar for index pages?
---------------------------------------------------------------------------------
-- %%physloc%% physical row locator: 4 bytes page id, 2 bytes file id, 2 bytes slot
---------------------------------------------------------------------------------
-- http://www.sqlskills.com/blogs/paul/post/sql-server-2008-new-%28undocumented%29-physical-row-locator-function.aspx
-- example (page_id / 8 gives the extent number, since an extent is 8 pages)
select top 10
'--' [--]
,convert(bigint, convert(binary(4), reverse(substring(v.%%physloc%%, 1, 4)))) page_id
,convert(bigint, convert(binary(4), reverse(substring(v.%%physloc%%, 1, 4)))) / 8 extent
,convert(binary(2), reverse(substring(v.%%physloc%%, 5, 2))) file_id
,convert(binary(2), reverse(substring(v.%%physloc%%, 7, 2))) slot
from A_table v
February 5, 2019 at 6:58 am
julian.fletcher - Tuesday, February 5, 2019 2:24 AM [quoted above]
Quick thought: what SQL Server loads into memory is one of the things that is not directly configurable, hence a better way of addressing this kind of performance problem would be to fully understand where the bottlenecks are (read: where it hurts).
😎
February 5, 2019 at 7:29 am
Another thing that could affect this is statistics that need to be updated. The distribution of data within the index could also be a factor, and could easily have a tipping point as to whether SQL Server will do individual SEEKs, a SEEK/RANGE SCAN, or just a full index scan.
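If stale statistics are a suspect, the modification counter since the last update is visible through sys.dm_db_stats_properties (a documented DMF, SQL Server 2008 R2 SP2 and later):

```sql
-- How fresh are the statistics on the table's indexes and columns?
SELECT s.name, sp.last_updated, sp.rows, sp.rows_sampled, sp.modification_counter
FROM sys.stats s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE s.object_id = OBJECT_ID('dbo.T_DeliveryTimetableItemResolvedCost');
```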
--Jeff Moden
Change is inevitable... Change for the better is not.
February 5, 2019 at 1:46 pm
julian.fletcher - Tuesday, February 5, 2019 2:24 AM [quoted above]
Okay, no aggregates or functions, but what does the index look like? Are Quantity and Cost any part of the index? Please post the index definition.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
February 5, 2019 at 2:13 pm
sgmunson - Tuesday, February 5, 2019 1:46 PM [quoted above]
he did post the index
on DeliveryTimetableItemID
with INCLUDE of Quantity and Cost
With regards to the above example - 130k rows in #Subset with a possibly very wide distribution is most likely the reason for SQL to determine it's better to do an index scan vs multiple lookups.
And we don't even know whether that is what it is doing, as we don't have an actual execution plan - but it's likely.
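For completeness, the actual plan for the repro can be captured in-script (or via "Include Actual Execution Plan" in SSMS); SET STATISTICS XML is documented and returns the plan XML alongside each result set:

```sql
SET STATISTICS XML ON;

SELECT Quantity, Cost
INTO #Temp2  -- separate name to avoid clashing with the repro's #Temp
FROM dbo.T_DeliveryTimetableItemResolvedCost DTIRC
JOIN #Subset S ON S.DeliveryTimetableItemID = DTIRC.DeliveryTimetableItemID;

SET STATISTICS XML OFF;
```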
February 6, 2019 at 7:09 am
frederico_fonseca - Tuesday, February 5, 2019 2:13 PM [quoted above]
Yes, I think that must be it - the fact that although we're accessing a small percentage of rows in the index, those rows are spread throughout the whole index. It's not something we see in the rest of the database but that's probably because access generally tends to be more focused.
The original 'hurt' arose from the index not having the "etc." columns, which resulted in a key lookup and the whole table being loaded into memory. Putting those columns on the index meant we just loaded 100% of the index instead of 100% of the table; that was about 700MB instead of 2GB, which was a step in the right direction.
Thanks everybody!
February 7, 2019 at 1:49 pm
julian.fletcher - Wednesday, February 6, 2019 7:09 AM [quoted above]
Yep. Missed that the index def was already posted. The problem is clearly the use of NEWID() for your sample, as that can distribute the rows all across the index with even the smallest percentages. Simulating normal use requires a different way to get your sample data. Use whatever date constraints might usually apply first, and then sample within a date range.
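A sketch of that suggestion - the posted DDL elides the remaining columns behind "etc.", so the date column name and range here are purely hypothetical:

```sql
-- Constrain to a realistic window first, then sample within it, so the
-- sampled IDs cluster in one region of the index instead of being sprayed
-- across all 13.5m rows by NEWID().
SELECT TOP 1 PERCENT DeliveryTimetableItemID
INTO #Subset
FROM dbo.T_DeliveryTimetableItemResolvedCost
WHERE DeliveryDate >= '20190101'      -- hypothetical column and range
  AND DeliveryDate <  '20190201'
ORDER BY NEWID();
```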
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)