Query: Joining a record chronoligical with the next record from the table

Question

Query: Joining a record chronoligical with the next record from the table

tim-923056

Valued Member

Points: 66
More actions
October 15, 2009 at 5:15 pm

#215581

Hi all,
Can someone send me in the right direction regarding the following issue:
I have a table with companies, employees and datetime fields. Each record contains one trip made by the employee. What I am trying to create is the fastest query to calculate the pause between 2 trips.
The table:
- ID (Unique)
- Company_ID
- Employee_ID
- Start_Trip
- Stop_Trip
To calculate the pause between 2 trips I have to join a record with the next record in chronological order. I cannot enter an indexed number in the table because trips are deleted, changed and added not in a chronological order. The number of records in the table is around 3M
What I tried so far:
Select top 10000 t1.Employee_ID, t1.id, t1.Stop_Trip , Next_Start_Trip =
(Select min(t2.Start_Trip) From Trips t2
Where t1.Company_ID = t2.Company_ID
And t1.Employee_ID = t2.Employee_ID
And t1.Stop_Trip < t2.Start_trip)

Execution time 00:07:02
Secondly I tried to create a view with a row number. The downside of this approach is that I cannot create an index on the view because of the Row_Number.
CREATE VIEW [dbo].[vFieldLog_Visits] AS
Select Company_ID, Employee_ID, id, Start_trip, Stop_trip,
ROW_NUMBER() OVER (Partition BY Company_ID,Employee_ID ORDER BY Start_Trip) AS rownum
From dbo.Trips Where Employee_ID != '0000000000'
Select top 10000 currow.Employee_ID, currow.id,
currow.Stop_Trip as Stop_Trip,
nextrow.Start_Trip as Next_Start_Trip
From vFieldLog_Visits currow
Inner Join vFieldLog_Visits nextrow
On currow.Company_ID = nextrow.Company_ID
And currow.Employee_ID = nextrow.Employee_ID
And currow.rownum = (nextrow.rownum - 1)

Execution time 00:02:32
I also tried using a CTE:
WITH Trips_X AS (
SELECT Company_ID,Employee_ID,Start_Trip,Stop_Trip,id,
ROW_NUMBER() OVER (Partition BY Company_ID,Employee_ID ORDER BY Start_Trip) AS rownum
FROM Trips Where Employee_ID != '0000000000')
Select top 10000 currow.Employee_ID, currow.id,
currow.Stop_Trip as Stop_Trip,
nextrow.Start_Trip as Next_Start_Trip
From Trips_X currow
Inner Join Trips_X nextrow
On currow.Company_ID = nextrow.Company_ID
And currow.Employee_ID = nextrow.Employee_ID
And currow.rownum = nextrow.rownum - 1

Execution time: 00:02:48
The queries are ok when you are trying to calculate a huge amount of pauses at once, but if you try to calculate a small amount of pauses then the execution times are not acceptable.
I tried to find a way to add a dynamic row number to the table on which I can create an index. Maybe even a view with an index. But what ever I try there is always something preventing me from achieving this.
Any ideas?

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

Garadin One Orange Chip Points: 29613 More actions · Answer 1

Thanks for providing what you've already tried. Take a look at the article in my signature on Running Totals. I believe this to be the perfect method for this problem (aside from extraordinarily complex string manipulation methods).

Jeff is working on that article at the moment and there's not a lot of information, but if you'd provide some table schema and sample data (click the link in my signature on how to provide sample data to see how to do so in a manner we can use), I'll help you write the query.

It would be a stored procedure though, this method can't be used as a view. Not sure if that is acceptable in your situation.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup.

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

tim-923056 Valued Member Points: 66 More actions · Answer 2

Hi, Thank you Garadin for willing to help me!

A stored procedure is no problem.

Sample data:

--===== Create the test table with

CREATE TABLE dbo.Trips2(

[id] [bigint] NOT NULL,

[Company_ID] [int] NOT NULL,

[Employee_ID] [nvarchar](50) NOT NULL,

[Start_Trip] [datetime] NOT NULL,

[Stop_Trip] [datetime] NOT NULL)

--===== Insert the test data into the test table

INSERT INTO dbo.Trips2

(id, Company_ID, Employee_ID, Start_Trip, Stop_Trip)

SELECT '114',5000,'0000000001','Oct 17 2007 1:00AM','Oct 17 2007 1:02AM' UNION ALL

SELECT '137',5000,'0000000001','Oct 17 2007 1:13AM','Oct 17 2007 1:17AM' UNION ALL

SELECT '144',5000,'0000000001','Oct 17 2007 1:21AM','Oct 17 2007 1:57AM' UNION ALL

SELECT '154',5000,'0000000001','Oct 17 2007 2:00AM','Oct 17 2007 2:13AM' UNION ALL

SELECT '155',5000,'0000000001','Oct 17 2007 2:31AM','Oct 17 2007 2:35AM' UNION ALL

SELECT '181',5000,'0000000002','Oct 17 2007 2:36AM','Oct 17 2007 2:53AM' UNION ALL

SELECT '186',5000,'0000000002','Oct 17 2007 3:01AM','Oct 17 2007 3:21AM' UNION ALL

SELECT '196',5000,'0000000003','Oct 17 2007 3:31AM','Oct 17 2007 3:45AM' UNION ALL

SELECT '214',5001,'0000000001','Oct 17 2007 1:00AM','Oct 17 2007 1:02AM' UNION ALL

SELECT '237',5001,'0000000001','Oct 17 2007 1:13AM','Oct 17 2007 1:17AM' UNION ALL

SELECT '244',5001,'0000000001','Oct 17 2007 1:21AM','Oct 17 2007 1:57AM' UNION ALL

SELECT '254',5001,'0000000001','Oct 17 2007 2:00AM','Oct 17 2007 2:13AM' UNION ALL

SELECT '255',5001,'0000000001','Oct 17 2007 2:31AM','Oct 17 2007 2:35AM' UNION ALL

SELECT '281',5001,'0000000001','Oct 17 2007 2:36AM','Oct 17 2007 2:53AM' UNION ALL

SELECT '286',5001,'0000000001','Oct 17 2007 3:01AM','Oct 17 2007 3:21AM' UNION ALL

SELECT '296',5001,'0000000001','Oct 17 2007 3:31AM','Oct 17 2007 3:45AM' UNION ALL

SELECT '314',5001,'0000000001','Oct 17 2007 4:00AM','Oct 17 2007 4:02AM' UNION ALL

SELECT '337',5001,'0000000001','Oct 17 2007 4:13AM','Oct 17 2007 4:17AM' UNION ALL

SELECT '344',5001,'0000000002','Oct 17 2007 4:21AM','Oct 17 2007 4:57AM' UNION ALL

SELECT '354',5001,'0000000002','Oct 17 2007 5:00AM','Oct 17 2007 5:13AM' UNION ALL

SELECT '355',5001,'0000000002','Oct 17 2007 5:31AM','Oct 17 2007 5:35AM' UNION ALL

SELECT '381',5001,'0000000002','Oct 17 2007 5:36AM','Oct 17 2007 5:53AM' UNION ALL

SELECT '386',5001,'0000000003','Oct 17 2007 6:01AM','Oct 17 2007 6:21AM' UNION ALL

SELECT '396',5001,'0000000003','Oct 17 2007 6:31AM','Oct 17 2007 6:45AM'

I am not sure if this will be enough for peromance testing but is will be enough for validating a query. If you need more please let me know!

I am very interessted in your 'running totals' article. I will definitely read it when it is ready.

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 3

a composite index with company Id, employeeId and start_trip in it should speed it up quite a bit.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

tim-923056 Valued Member Points: 66 More actions · Answer 4

You are right! I am sorry for leaving this information out...

In dbo.Trips I have 2 indexes:

- Unique nonclustered index on ID

- Unique clustered index on Company_ID, Employee_ID, Start_trip

Garadin One Orange Chip Points: 29613 More actions · Answer 5

Excellent, thanks Tim, I'll whip something up for you in the morning. Is adding a field to the table an issue? Ordinarily, I'd do without, but it looks like you already have the index I need to do this, and that'd speed it up considerably. I fully understand that you can't rely on a sequence column to stay accurate over time because of changes made to the base data, but this solution would actually re-populate that field each time.

Small point of clarification though, it's Jeff's article, not mine ;-).

Seth Phelabaum

Consistency is only a virtue if you're not a screwup.

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

tim-923056 Valued Member Points: 66 More actions · Answer 6

No, adding a column is not a problem. What should be concidered is that the amount of io's to this table is pretty huge. Mostly selects and inserts. Not so many updates.

If the extra column would mean a sequential index, will this not become a problem when inserting old data? It looks to me like this would slow down the table enormously. If not... Then this would be a great solution.

Garadin One Orange Chip Points: 29613 More actions · Answer 7

The new column(s) do not need to be added to the index, they just allow you to use this method without creating a temporary table. I've given you two methods below. The first just sequences, the second both sequences and calculates the difference in minutes between the two trips. I used #trips2 and #trips so that you could run this as is to see how it works.

--===== Create the test table with

CREATE TABLE dbo.#Trips2(

[id] [bigint] NOT NULL,

[Company_ID] [int] NOT NULL,

[Employee_ID] [nvarchar](50) NOT NULL,

[Start_Trip] [datetime] NOT NULL,

[Stop_Trip] [datetime] NOT NULL,

[Seq]int)-- Added Column

-- Clustered index that already exists

CREATE CLUSTERED INDEX Quirk ON #Trips2 (Company_ID, Employee_ID, Start_Trip)

--===== Insert the test data into the test table

INSERT INTO dbo.#Trips2