Performance on 10 million row table

Question

Performance on 10 million row table

stergiazotali

SSC-Addicted

Points: 421
More actions
September 18, 2013 at 4:03 am

#282667

Hello all,
I have a table, which is destined to rapidly grow. To give you an idea, the table is :
CREATE TABLE [dbo].[Production]
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[OlympusID] [int] NOT NULL,
[DateTimeStamp] [datetime] NOT NULL,
[PackCount] [int] NOT NULL,
[CaseCount] [int] NOT NULL,
[WorkOrderID] [int] NOT NULL,
[State] [bit] NOT NULL,
[RowID] [INT] INT
)
There will be a query that will read counts occuring within a shift (ie 06:00 - 14:00), HOWEVER , since the counts occur every 5 minutes, I will need to include the last row before 06:00 AND the 1st row following 14:00.
Example:
Counts Time
10 05:54
309 06:02
...
..
25 13:58
91 14:02
So , the query will also need to include the above highlighted rows as well.
A solution that I have considered is a self join using rownumber().
I would join them as
select * from (SELECT * FROM ProductionTest where datetime='Some day') A INNER JOIN ProductionTest B ON A.RowID=B.RowID-1
Bear in mind that the table will REALLY grow. Undoubtedly, the table with include indexes(probably on date and RowID).
However, do you see any other better way ?
As always, I will really appreciate your thoughts.
Thanks,
V

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 1

Why so complex?

Write the query in the simplest possible way. If that query form proves, in testing, to have performance problems, consider alternates.

SELECT <column names> FROM ProductionTest WHERE DateTimeStamp >= '<date> 05:55' and DateTimeStamp <= '<date> 14:05'

If the counts are logged every 5 minutes, then that will catch the one before 6AM and the one after 2PM and all in between. If the counts are logged at different intervals, then you'd need to adjust the times

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

stergiazotali SSC-Addicted Points: 421 More actions · Answer 2

Thanks for the reply,

So according to my example :

Counts Time

10 05:54

309 06:02

...

..

25 13:58

91 14:02

The point is that you don't want just records before 06:02 am, you want to include it in your query because you want to find the exact number of counts

YOU WOULD EXPECT to have at 06:00! So my point is not just to include 05:54 and its counts but to use it for calculation in order to estimate the number of counts at 06:00 precisely.

Does it make sense?

Apologies for not explaining

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 3

No, makes less sense.

The query i posted doesn't return just rows from before 6:02. It would return all rows between 5:55 and 14:05 inclusive.

If the intervals are every 5 minutes, why is there a count at 5:54 and another at 6:02? (8 minutes apart)

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

stergiazotali SSC-Addicted Points: 421 More actions · Answer 4

Either you are missing the point or I fail to explain

I will shortly post the script to avoid confusions.

Erland Sommarskog SSC-Insane Points: 24038 More actions · Answer 5

I would do it with a UNION ALL query. One for the interval, and two SELECT TOP(1) queries to get the values just outside the interval. An index on DateTimeStamp is essential. And either clustered, or including Count.

[font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

stergiazotali SSC-Addicted Points: 421 More actions · Answer 6

Hello Erlang,

Many thanks for your reply. Couple of days ago, I found a solution, check the following please :

with cte_test (DatetimeStamp,Packcount,WorkOrderID,RowID)

AS

(

SELECT *,ROW_NUMBER() OVER (Order by X.Datetimestamp) RowID FROM

(

SELECT * FROM

(

select top(1) DateTimeSTamp,PackCount,WorkOrderID from Production

where DateTimeStamp<@StartTime and WorkOrderID=@WorkOrderID

order by DateTimeStamp DESC

) minusrow

UNION

select DateTimeSTamp,PackCount,WorkOrderID from Production

where DateTimeStamp>@StartTime and WorkOrderID=@WorkOrderID AND DateTimeStamp<=@EndTime

UNION

select * from

(

select top(1) DateTimeSTamp,PackCount,WorkOrderID from Production

where DateTimeStamp>@@EndTime and WorkOrderID=@WorkOrderID

order by DateTimeStamp ASC

) plusrow

) X

)

INSERT INTO @teml

select * from cte_test

SELECT * FROM @temp A INNER JOIN @temp B ON a.RowID=b.RowID-1)

Today, I visited the forum to check if there are any messages and I saw your solution and it seems we both thought exactly the same 🙂

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 7

UNION ALL, not UNION. Otherwise you're paying for an unnecessary distinct sort.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

stergiazotali SSC-Addicted Points: 421 More actions · Answer 8

Thanks Gail,

That's a good spot. Will correct accordingly.

V