April 8, 2013 at 2:48 pm
Just fair warning, I am definitely a newbie, I fell into this work so to speak.
I have a query that is taking between 30 seconds to 30 minutes depending on how much data is being run at the current time. I used SQL Server Management Studio Express to analyze the query and show me the execution plan with statistics ON.
I have since discovered where the problem is. I have one table in my database where we collect information, this information is updated daily. In fact it's not really updated, it's brand new data everyday collected from various websites.
Because this table collects brand new data daily it is not indexed; it's a HEAP. There are between 4-6 million rows of data on any given day depending on how much data is run that day. We clear all records older than 3 days from this table every night.
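(For reference, the nightly purge described above would look something like this; the table and column names come from the query later in this post, and the exact statement is a sketch rather than our production code:)

```sql
-- Remove everything collected more than 3 days ago
DELETE FROM TFX_V2_UnfilteredRates
WHERE timeInserted < DATEADD(dd, -3, GETDATE())
```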
Now that I've explained the table a little, here's where the problem comes in: (This is not the entire stored procedure just the part that takes 98% operator cost time)
-------------------------------------------------------------------
-- GET THE RATES IN TEMP TABLE
-------------------------------------------------------------------
SELECT u.*,
       NULL AS RateTypeID,
       NULL AS RoomTypeID,
       NULL AS RatePriority,
       NULL AS RoomPriority
INTO   #RESULTS
FROM   TFX_V2_UnfilteredRates u
WHERE  SiteID IN
       (
           SELECT * FROM #SITES
       )
AND    HotelID IN
       (
           SELECT * FROM #HOTELS
       )
AND    timeInserted > @JobDate
DROP TABLE #SITES
DROP TABLE #HOTELS
----------------------------------------------
It takes far too long to scan this table for the appropriate rates. Any suggestions/ideas on how to make this process run more efficiently would be greatly appreciated. I've thought about creating an index for the TFX_V2_UnfilteredRates table, which is the one that is 5 million rows give or take. However, I am worried that it would slow down my inserts on that table, since we need to insert data very quickly.
If you need more info please let me know, however I was trying to prevent from pasting the entire stored procedure.
Thanks
-Josh
April 8, 2013 at 4:01 pm
Heya Josh,
You're most likely on the right track with the heap, but for optimization items there's a bunch of data most of us prefer to see before we start bounding through possibilities... primarily because we'd like to show you the right way on the first try.
If you take a look in my signature below, in the 'help on indexing/tuning' link there's a list of what we'll want and it walks you through getting it for us. Short form: All tables involved and their indexes (including the other tables), the query itself in full form, and the .sqlplan from an actual execution (as the heap stands now).
From there, we'll be able to analyze its connections and expectations and get you on track pretty quickly.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
April 8, 2013 at 4:54 pm
Here you go, I tried to include all relevant information.
Again thanks for helping!
-Josh
April 8, 2013 at 6:59 pm
jrestuccio (4/8/2013)
Here you go, I tried to include all relevant information. Again thanks for helping!
-Josh
Nope, not what I'd want to see. I'd want to see the following:
1) DDL for the tables including the DDL for indexes
2) Actual execution plan as a .sqlplan file, not XML plans you included in the spreadsheets.
3) your current code
April 9, 2013 at 7:35 am
Did you look at all the sheets within the excel workbook?
It has the stored procedure (my code). Also, it is a SQL 2000 database and the Save Execution Plan option is greyed out when I try to save it.
Sorry for the silly question but what are you needing when you say DDL for all tables and indexes?
What's a quick way to pull that?
I thought I had this info on the "Relevant Tables and Indexes" sheet.
-Josh
April 9, 2013 at 8:06 am
jrestuccio (4/9/2013)
Did you look at all the sheets within the excel workbook? It has the stored procedure (my code). Also, it is a SQL 2000 database and the Save Execution Plan option is greyed out when I try to save it.
Sorry for the silly question but what are you needing when you say DDL for all tables and indexes?
What's a quick way to pull that?
I thought I had this info on the "Relevant Tables and Indexes" sheet.
-Josh
What we want to see is DDL (CREATE TABLE statements and CREATE INDEX statements) and sample data (INSERT statements). This is all detailed in the link in my signature about best practices when posting questions. The idea here is that we can recreate your tables and such in our environment. We also need to see the actual execution plan.
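As a rough sketch, on SQL 2000 you can pull most of this from Query Analyzer with the system procedures (the table name is the one from this thread; apply the same to the other tables involved):

```sql
-- Column definitions, constraints, and index list for the heap table
EXEC sp_help 'dbo.TFX_V2_UnfilteredRates'

-- Index details only
EXEC sp_helpindex 'dbo.TFX_V2_UnfilteredRates'
```

Alternatively, Enterprise Manager can script the full CREATE TABLE / CREATE INDEX statements: right-click the table, then All Tasks > Generate SQL Script.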
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
April 9, 2013 at 12:35 pm
Try this index on your heap table... not quite sure why you are worried about data load speeds. It seems you are already deleting data over three days old and adding new, so I am going to assume you have a period at some time to perform this task. You could drop the index first and then recreate it once loaded... that may help... but an index on your date may also assist the delete.
Anyways... food for thought... here is some test data to play with. It's not a replica of your system, but it does have 5M rows and some hotels/sites.
kind regards
USE [tempdb]
GO
IF OBJECT_ID('dbo.Hotel') IS NOT NULL DROP TABLE [dbo].[Hotel]
GO
IF OBJECT_ID('dbo.Site') IS NOT NULL DROP TABLE [dbo].[Site]
GO
IF OBJECT_ID('dbo.TransData') IS NOT NULL DROP TABLE [dbo].[TransData]
GO
SELECT TOP 5000000 ---- NOTE 5 MILLION rows
TranID = IDENTITY(INT, 1, 1),
SiteID = 1 + CAST(Abs(Checksum(Newid()) % 10000) AS INT),
HotelID = 1 + CAST(Abs(Checksum(Newid()) % 1000 ) AS INT),
SalesAmount = CAST(Rand(Checksum(Newid())) * 99 + 1 AS DECIMAL(5, 2)),
TransDate = Dateadd(dd, Abs(Checksum(Newid()))%Datediff(dd, '2010', '2013'), '2010')
INTO TransData
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
CROSS JOIN sys.all_columns ac3
SELECT TOP 500
SiteID = 1 + CAST(Abs(Checksum(Newid()) % 10000) AS INT)
INTO Site
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
GO
SELECT TOP 50
HotelID = 1 + CAST(Abs(Checksum(Newid()) % 1000) AS INT)
INTO Hotel
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
GO
set statistics time, IO on
declare @jobdate as datetime
set @jobdate = '2011-07-09'
SELECT Hotel.HotelID, Site.SiteID, TransData.TransDate
FROM TransData INNER JOIN
Site ON TransData.SiteID = Site.SiteID INNER JOIN
Hotel ON TransData.HotelID = Hotel.HotelID
WHERE (TransData.TransDate = @jobdate)
set statistics time, IO OFF
/*create index*/
CREATE NONCLUSTERED INDEX [NIX_transdata] ON [dbo].[TransData]
(
[TransDate] ASC,
[HotelID] ASC,
[SiteID] ASC
) ON [PRIMARY]
GO
set statistics time, IO on
declare @jobdate as datetime
set @jobdate = '2011-07-09'
SELECT Hotel.HotelID, Site.SiteID, TransData.TransDate
FROM TransData INNER JOIN
Site ON TransData.SiteID = Site.SiteID INNER JOIN
Hotel ON TransData.HotelID = Hotel.HotelID
WHERE (TransData.TransDate = @jobdate)
set statistics time, IO OFF
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
April 9, 2013 at 12:37 pm
Also, if you are adding and deleting data, you really should create a clustered index on the table. As a heap it will just continue to grow and not reuse empty space allocated to the table.
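(A minimal sketch of that suggestion, using the table and column names from earlier posts; an IDENTITY column or similar would work equally well as the clustering key, so treat this as illustrative rather than prescriptive:)

```sql
-- Clustering on the insert-date column keeps new rows appending at the
-- end of the table and lets the nightly purge free pages for reuse.
CREATE CLUSTERED INDEX CIX_UnfilteredRates_timeInserted
    ON dbo.TFX_V2_UnfilteredRates (timeInserted)
```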
April 9, 2013 at 12:49 pm
Lynn Pettis (4/9/2013)
Also, if you are adding and deleting data, you really should create a clustered index on the table. As a heap it will just continue to grow and not reuse empty space allocated to the table.
Thanks Lynn
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
April 9, 2013 at 1:39 pm
Thank you SO MUCH for the replies, I am trying to take it all in.
Let me describe a little bit about why the data is deleted so regularly and why it's important to not slow down insert speeds.
Imagine 1000 searches being performed at one time, and as soon as the search is done, it collects the data and inserts the data into the TFX_v2_unfilteredrates table. The reason we need this to happen quickly is because we have a massive amount of searches to complete, and since the data needs to be FRESH it needs to be collected within 1 day.
The reason for deleting the data is merely to keep the size of the table down to a workable size, since I believe our queries are poorly written...
-Josh
April 10, 2013 at 10:14 am
jrestuccio (4/9/2013)
Thank you SO MUCH for the replies, I am trying to take it all in. Let me describe a little bit about why the data is deleted so regularly and why it's important to not slow down insert speeds.
Imagine 1000 searches being performed at one time, and as soon as the search is done, it collects the data and inserts the data into the TFX_v2_unfilteredrates table. The reason we need this to happen quickly is because we have a massive amount of searches to complete, and since the data needs to be FRESH it needs to be collected within 1 day.
The reason for deleting the data is merely to keep the size of the table down to a workable size, since I believe our queries are poorly written...
-Josh
Assuming that your hardware is spec'ed appropriately for the workload, even 1,000 inserts of 5,000 rows each (5 million rows total) probably shouldn't take more than five or ten minutes, even with a couple or three indexes to update. This code inserts 5 million rows into a test table - it ran on my DEV server in about 30 seconds:
create table dbo.insert_test (date_collected datetime, hotel_name varchar(20), hotel_id int, rate decimal(6,2))
go
set nocount on
declare @i int = 1
while @i<=1000
begin
insert into dbo.insert_test
select dateadd(day, (@i*-1), getdate()), 'xxxxxxxxxxxxxxxxxxxx', t.N, 1.32 * t.N
from dbo.tally t
where t.N <= 5000
set @i = @i+1
end
If you're avoiding indexes to save a few minutes once a day when the table is loaded with new data, but the absence of indexes is costing you several minutes every time you run the query, you're shooting yourself in the foot. Before you suffer through the poor performance of your query much longer, I'd investigate the impact of a proper indexing scheme on both the insert/delete process and the query performance to make sure you're striking an appropriate balance.
Jason Wolfkill
April 10, 2013 at 1:31 pm
Sorry, got inundated yesterday, but wanted to catch up with this. As mentioned, we really need everything for tuning assistance. The article describes the easiest ways to get it for us.
jrestuccio (4/9/2013)
Let me describe a little bit about why the data is deleted so regularly and why it's important to not slow down insert speeds.
Even with a clustered index you won't see a significant difference in insert speeds. Start doing indexed views and triggers and that's a different story, or overloading the non-clustered indexing.
Imagine 1000 searches being performed at one time, and as soon as the search is done, it collects the data and inserts the data into the TFX_v2_unfilteredrates table.
Not really an issue. I/O is I/O. You might even get better performance with a split index to avoid hotspotting and using a really loose fillfactor, but that's a different story.
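(As a rough illustration of the loose-fillfactor idea, applied to the index suggested earlier in the thread; the exact percentage is an assumption and would need testing against your actual insert pattern:)

```sql
-- A loose fillfactor leaves free space on each index page at build time,
-- reducing page splits when heavy concurrent inserts land on the same pages.
CREATE NONCLUSTERED INDEX NIX_transdata
    ON dbo.TransData (TransDate, HotelID, SiteID)
    WITH FILLFACTOR = 70
```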
The reason we need this to happen quickly is because we have a massive amount of searches to complete, and since the data needs to be FRESH it needs to be collected within 1 day.
The reason for deleteing the data is merely to keep the size of the table down to a workable size. Since I believe our queries are poorly written........
It's hard not to have poorly written queries when those queries have no indexes to work from. Get indexes into place and you may find you can actually keep your archive.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
April 12, 2013 at 8:32 am
Hey everyone, thanks again for the replies!
I am still a little confused on the best way to supply the DDL information, in the articles you link to it says the SP_help output should work, which I included in my spreadsheet.
Also, the "save execution plan" is greyed out in SSMS express, so what's the best way to get you the execution plan?
Again this is a SQL Server 2000 database.
Thanks again! Also, not sure if I am breaking any forum rules or anything (hope not) but how much would any of you charge as a consultant to take a hands on look?
-Josh
April 12, 2013 at 8:46 am
jrestuccio (4/12/2013)
Also, the "save execution plan" is greyed out in SSMS express, so what's the best way to get you the execution plan?
See the article in Craig's sig. Describes how to get an exec plan in SQL 2000, easiest way to save it is in a spreadsheet. Haven't looked at your spreadsheet to see what's in there, a little busy with stuff.
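A sketch of one way to capture a text plan on SQL 2000, run from Query Analyzer (the procedure name here is hypothetical; substitute your own, then paste the plan rows into a spreadsheet):

```sql
-- Returns the actual execution plan as rows alongside the query's results
SET STATISTICS PROFILE ON
GO
EXEC dbo.YourStoredProcedure  -- hypothetical name, substitute yours
GO
SET STATISTICS PROFILE OFF
GO
```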
Thanks again! Also, not sure if I am breaking any forum rules or anything (hope not) but how much would any of you charge as a consultant to take a hands on look?
If by hands-on you mean remote, disconnected and across a couple of oceans, roughly $100/hr with a minimum of 8 hours. If you want in-person then I can't help you there.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
April 12, 2013 at 8:47 am
Lynn Pettis (4/8/2013)
jrestuccio (4/8/2013)
Here you go, I tried to include all relevant information.Again thanks for helping!
-Josh
2) Actual execution plan as a .sqlplan file, not XML plans you included in the spreadsheets.
SQL 2000 forum. Hence assumed to be a SQL 2000 server unless specified otherwise. Hence no XML plans.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability