Query: Joining a record chronoligical with the next record from the table

  • Hi all,

    Can someone send me in the right direction regarding the following issue:

    I have a table with companies, employees and datetime fields. Each record contains one trip made by the employee. What I am trying to create is the fastest query to calculate the pause between 2 trips.

    The table:

    - ID (Unique)

    - Company_ID

    - Employee_ID

    - Start_Trip

    - Stop_Trip

    To calculate the pause between 2 trips I have to join a record with the next record in chronological order. I cannot enter an indexed number in the table because trips are deleted, changed and added not in a chronological order. The number of records in the table is around 3M

    What I tried so far:

    Select top 10000 t1.Employee_ID, t1.id, t1.Stop_Trip , Next_Start_Trip =

    (Select min(t2.Start_Trip) From Trips t2

    Where t1.Company_ID = t2.Company_ID

    And t1.Employee_ID = t2.Employee_ID

    And t1.Stop_Trip < t2.Start_trip)

    Execution time 00:07:02

    Secondly I tried to create a view with a row number. The downside of this approach is that I cannot create an index on the view because of the Row_Number.

    CREATE VIEW [dbo].[vFieldLog_Visits] AS

    Select Company_ID, Employee_ID, id, Start_trip, Stop_trip,

    ROW_NUMBER() OVER (Partition BY Company_ID,Employee_ID ORDER BY Start_Trip) AS rownum

    From dbo.Trips Where Employee_ID != '0000000000'

    Select top 10000 currow.Employee_ID, currow.id,

    currow.Stop_Trip as Stop_Trip,

    nextrow.Start_Trip as Next_Start_Trip

    From vFieldLog_Visits currow

    Inner Join vFieldLog_Visits nextrow

    On currow.Company_ID = nextrow.Company_ID

    And currow.Employee_ID = nextrow.Employee_ID

    And currow.rownum = (nextrow.rownum - 1)

    Execution time 00:02:32

    I also tried using a CTE:

    WITH Trips_X AS (

    SELECT Company_ID,Employee_ID,Start_Trip,Stop_Trip,id,

    ROW_NUMBER() OVER (Partition BY Company_ID,Employee_ID ORDER BY Start_Trip) AS rownum

    FROM Trips Where Employee_ID != '0000000000')

    Select top 10000 currow.Employee_ID, currow.id,

    currow.Stop_Trip as Stop_Trip,

    nextrow.Start_Trip as Next_Start_Trip

    From Trips_X currow

    Inner Join Trips_X nextrow

    On currow.Company_ID = nextrow.Company_ID

    And currow.Employee_ID = nextrow.Employee_ID

    And currow.rownum = nextrow.rownum - 1

    Execution time: 00:02:48

    The queries are ok when you are trying to calculate a huge amount of pauses at once, but if you try to calculate a small amount of pauses then the execution times are not acceptable.

    I tried to find a way to add a dynamic row number to the table on which I can create an index. Maybe even a view with an index. But what ever I try there is always something preventing me from achieving this.

    Any ideas?

  • Thanks for providing what you've already tried. Take a look at the article in my signature on Running Totals. I believe this to be the perfect method for this problem (aside from extraordinarily complex string manipulation methods).

    Jeff is working on that article at the moment and there's not a lot of information, but if you'd provide some table schema and sample data (click the link in my signature on how to provide sample data to see how to do so in a manner we can use), I'll help you write the query.

    It would be a stored procedure though, this method can't be used as a view. Not sure if that is acceptable in your situation.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Hi, Thank you Garadin for willing to help me!

    A stored procedure is no problem.

    Sample data:

    --===== Create the test table with

    CREATE TABLE dbo.Trips2(

    [id] [bigint] NOT NULL,

    [Company_ID] [int] NOT NULL,

    [Employee_ID] [nvarchar](50) NOT NULL,

    [Start_Trip] [datetime] NOT NULL,

    [Stop_Trip] [datetime] NOT NULL)

    --===== Insert the test data into the test table

    INSERT INTO dbo.Trips2

    (id, Company_ID, Employee_ID, Start_Trip, Stop_Trip)

    SELECT '114',5000,'0000000001','Oct 17 2007 1:00AM','Oct 17 2007 1:02AM' UNION ALL

    SELECT '137',5000,'0000000001','Oct 17 2007 1:13AM','Oct 17 2007 1:17AM' UNION ALL

    SELECT '144',5000,'0000000001','Oct 17 2007 1:21AM','Oct 17 2007 1:57AM' UNION ALL

    SELECT '154',5000,'0000000001','Oct 17 2007 2:00AM','Oct 17 2007 2:13AM' UNION ALL

    SELECT '155',5000,'0000000001','Oct 17 2007 2:31AM','Oct 17 2007 2:35AM' UNION ALL

    SELECT '181',5000,'0000000002','Oct 17 2007 2:36AM','Oct 17 2007 2:53AM' UNION ALL

    SELECT '186',5000,'0000000002','Oct 17 2007 3:01AM','Oct 17 2007 3:21AM' UNION ALL

    SELECT '196',5000,'0000000003','Oct 17 2007 3:31AM','Oct 17 2007 3:45AM' UNION ALL

    SELECT '214',5001,'0000000001','Oct 17 2007 1:00AM','Oct 17 2007 1:02AM' UNION ALL

    SELECT '237',5001,'0000000001','Oct 17 2007 1:13AM','Oct 17 2007 1:17AM' UNION ALL

    SELECT '244',5001,'0000000001','Oct 17 2007 1:21AM','Oct 17 2007 1:57AM' UNION ALL

    SELECT '254',5001,'0000000001','Oct 17 2007 2:00AM','Oct 17 2007 2:13AM' UNION ALL

    SELECT '255',5001,'0000000001','Oct 17 2007 2:31AM','Oct 17 2007 2:35AM' UNION ALL

    SELECT '281',5001,'0000000001','Oct 17 2007 2:36AM','Oct 17 2007 2:53AM' UNION ALL

    SELECT '286',5001,'0000000001','Oct 17 2007 3:01AM','Oct 17 2007 3:21AM' UNION ALL

    SELECT '296',5001,'0000000001','Oct 17 2007 3:31AM','Oct 17 2007 3:45AM' UNION ALL

    SELECT '314',5001,'0000000001','Oct 17 2007 4:00AM','Oct 17 2007 4:02AM' UNION ALL

    SELECT '337',5001,'0000000001','Oct 17 2007 4:13AM','Oct 17 2007 4:17AM' UNION ALL

    SELECT '344',5001,'0000000002','Oct 17 2007 4:21AM','Oct 17 2007 4:57AM' UNION ALL

    SELECT '354',5001,'0000000002','Oct 17 2007 5:00AM','Oct 17 2007 5:13AM' UNION ALL

    SELECT '355',5001,'0000000002','Oct 17 2007 5:31AM','Oct 17 2007 5:35AM' UNION ALL

    SELECT '381',5001,'0000000002','Oct 17 2007 5:36AM','Oct 17 2007 5:53AM' UNION ALL

    SELECT '386',5001,'0000000003','Oct 17 2007 6:01AM','Oct 17 2007 6:21AM' UNION ALL

    SELECT '396',5001,'0000000003','Oct 17 2007 6:31AM','Oct 17 2007 6:45AM'

    I am not sure if this will be enough for peromance testing but is will be enough for validating a query. If you need more please let me know!

    I am very interessted in your 'running totals' article. I will definitely read it when it is ready.

  • a composite index with company Id, employeeId and start_trip in it should speed it up quite a bit.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • You are right! I am sorry for leaving this information out...

    In dbo.Trips I have 2 indexes:

    - Unique nonclustered index on ID

    - Unique clustered index on Company_ID, Employee_ID, Start_trip

  • Excellent, thanks Tim, I'll whip something up for you in the morning. Is adding a field to the table an issue? Ordinarily, I'd do without, but it looks like you already have the index I need to do this, and that'd speed it up considerably. I fully understand that you can't rely on a sequence column to stay accurate over time because of changes made to the base data, but this solution would actually re-populate that field each time.

    Small point of clarification though, it's Jeff's article, not mine ;-).

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • No, adding a column is not a problem. What should be concidered is that the amount of io's to this table is pretty huge. Mostly selects and inserts. Not so many updates.

    If the extra column would mean a sequential index, will this not become a problem when inserting old data? It looks to me like this would slow down the table enormously. If not... Then this would be a great solution.

  • The new column(s) do not need to be added to the index, they just allow you to use this method without creating a temporary table. I've given you two methods below. The first just sequences, the second both sequences and calculates the difference in minutes between the two trips. I used #trips2 and #trips so that you could run this as is to see how it works.

    --===== Create the test table with

    CREATE TABLE dbo.#Trips2(

    [id] [bigint] NOT NULL,

    [Company_ID] [int] NOT NULL,

    [Employee_ID] [nvarchar](50) NOT NULL,

    [Start_Trip] [datetime] NOT NULL,

    [Stop_Trip] [datetime] NOT NULL,

    [Seq]int)-- Added Column

    -- Clustered index that already exists

    CREATE CLUSTERED INDEX Quirk ON #Trips2 (Company_ID, Employee_ID, Start_Trip)

    --===== Insert the test data into the test table

    INSERT INTO dbo.#Trips2

    (id, Company_ID, Employee_ID, Start_Trip, Stop_Trip)

    SELECT '114',5000,'0000000001','Oct 17 2007 1:00AM','Oct 17 2007 1:02AM' UNION ALL

    SELECT '137',5000,'0000000001','Oct 17 2007 1:13AM','Oct 17 2007 1:17AM' UNION ALL

    SELECT '144',5000,'0000000001','Oct 17 2007 1:21AM','Oct 17 2007 1:57AM' UNION ALL

    SELECT '154',5000,'0000000001','Oct 17 2007 2:00AM','Oct 17 2007 2:13AM' UNION ALL

    SELECT '155',5000,'0000000001','Oct 17 2007 2:31AM','Oct 17 2007 2:35AM' UNION ALL

    SELECT '181',5000,'0000000002','Oct 17 2007 2:36AM','Oct 17 2007 2:53AM' UNION ALL

    SELECT '186',5000,'0000000002','Oct 17 2007 3:01AM','Oct 17 2007 3:21AM' UNION ALL

    SELECT '196',5000,'0000000003','Oct 17 2007 3:31AM','Oct 17 2007 3:45AM' UNION ALL

    SELECT '214',5001,'0000000001','Oct 17 2007 1:00AM','Oct 17 2007 1:02AM' UNION ALL

    SELECT '237',5001,'0000000001','Oct 17 2007 1:13AM','Oct 17 2007 1:17AM' UNION ALL

    SELECT '244',5001,'0000000001','Oct 17 2007 1:21AM','Oct 17 2007 1:57AM' UNION ALL

    SELECT '254',5001,'0000000001','Oct 17 2007 2:00AM','Oct 17 2007 2:13AM' UNION ALL

    SELECT '255',5001,'0000000001','Oct 17 2007 2:31AM','Oct 17 2007 2:35AM' UNION ALL

    SELECT '281',5001,'0000000001','Oct 17 2007 2:36AM','Oct 17 2007 2:53AM' UNION ALL

    SELECT '286',5001,'0000000001','Oct 17 2007 3:01AM','Oct 17 2007 3:21AM' UNION ALL

    SELECT '296',5001,'0000000001','Oct 17 2007 3:31AM','Oct 17 2007 3:45AM' UNION ALL

    SELECT '314',5001,'0000000001','Oct 17 2007 4:00AM','Oct 17 2007 4:02AM' UNION ALL

    SELECT '337',5001,'0000000001','Oct 17 2007 4:13AM','Oct 17 2007 4:17AM' UNION ALL

    SELECT '344',5001,'0000000002','Oct 17 2007 4:21AM','Oct 17 2007 4:57AM' UNION ALL

    SELECT '354',5001,'0000000002','Oct 17 2007 5:00AM','Oct 17 2007 5:13AM' UNION ALL

    SELECT '355',5001,'0000000002','Oct 17 2007 5:31AM','Oct 17 2007 5:35AM' UNION ALL

    SELECT '381',5001,'0000000002','Oct 17 2007 5:36AM','Oct 17 2007 5:53AM' UNION ALL

    SELECT '386',5001,'0000000003','Oct 17 2007 6:01AM','Oct 17 2007 6:21AM' UNION ALL

    SELECT '396',5001,'0000000003','Oct 17 2007 6:31AM','Oct 17 2007 6:45AM'

    --------------- Option 1 : Sequence the Trips ------------

    DECLARE @Seq int,

    @CompanyID int,

    @EmployeeID varchar(20)

    UPDATE #Trips2

    SET @Seq = Seq = CASE WHEN Company_ID = @CompanyID AND Employee_ID = @EmployeeID THEN @Seq + 1 ELSE 1 END,

    @CompanyID = Company_ID,

    @EmployeeID = Employee_ID

    FROM #Trips2 OPTION (MAXDOP 1)

    -- Show the results

    SELECT * FROM #Trips2

    GO

    --===== Create the test table with

    CREATE TABLE dbo.#Trips(

    [id] [bigint] NOT NULL,

    [Company_ID] [int] NOT NULL,

    [Employee_ID] [nvarchar](50) NOT NULL,

    [Start_Trip] [datetime] NOT NULL,

    [Stop_Trip] [datetime] NOT NULL,

    [Seq]int,-- Added Column

    [Diff]int)-- Added Column

    -- Clustered index that already exists

    CREATE CLUSTERED INDEX Quirk ON #Trips (Company_ID, Employee_ID, Start_Trip)

    --===== Insert the test data into the test table

    INSERT INTO dbo.#Trips

    (id, Company_ID, Employee_ID, Start_Trip, Stop_Trip)

    SELECT '114',5000,'0000000001','Oct 17 2007 1:00AM','Oct 17 2007 1:02AM' UNION ALL

    SELECT '137',5000,'0000000001','Oct 17 2007 1:13AM','Oct 17 2007 1:17AM' UNION ALL

    SELECT '144',5000,'0000000001','Oct 17 2007 1:21AM','Oct 17 2007 1:57AM' UNION ALL

    SELECT '154',5000,'0000000001','Oct 17 2007 2:00AM','Oct 17 2007 2:13AM' UNION ALL

    SELECT '155',5000,'0000000001','Oct 17 2007 2:31AM','Oct 17 2007 2:35AM' UNION ALL

    SELECT '181',5000,'0000000002','Oct 17 2007 2:36AM','Oct 17 2007 2:53AM' UNION ALL

    SELECT '186',5000,'0000000002','Oct 17 2007 3:01AM','Oct 17 2007 3:21AM' UNION ALL

    SELECT '196',5000,'0000000003','Oct 17 2007 3:31AM','Oct 17 2007 3:45AM' UNION ALL

    SELECT '214',5001,'0000000001','Oct 17 2007 1:00AM','Oct 17 2007 1:02AM' UNION ALL

    SELECT '237',5001,'0000000001','Oct 17 2007 1:13AM','Oct 17 2007 1:17AM' UNION ALL

    SELECT '244',5001,'0000000001','Oct 17 2007 1:21AM','Oct 17 2007 1:57AM' UNION ALL

    SELECT '254',5001,'0000000001','Oct 17 2007 2:00AM','Oct 17 2007 2:13AM' UNION ALL

    SELECT '255',5001,'0000000001','Oct 17 2007 2:31AM','Oct 17 2007 2:35AM' UNION ALL

    SELECT '281',5001,'0000000001','Oct 17 2007 2:36AM','Oct 17 2007 2:53AM' UNION ALL

    SELECT '286',5001,'0000000001','Oct 17 2007 3:01AM','Oct 17 2007 3:21AM' UNION ALL

    SELECT '296',5001,'0000000001','Oct 17 2007 3:31AM','Oct 17 2007 3:45AM' UNION ALL

    SELECT '314',5001,'0000000001','Oct 17 2007 4:00AM','Oct 17 2007 4:02AM' UNION ALL

    SELECT '337',5001,'0000000001','Oct 17 2007 4:13AM','Oct 17 2007 4:17AM' UNION ALL

    SELECT '344',5001,'0000000002','Oct 17 2007 4:21AM','Oct 17 2007 4:57AM' UNION ALL

    SELECT '354',5001,'0000000002','Oct 17 2007 5:00AM','Oct 17 2007 5:13AM' UNION ALL

    SELECT '355',5001,'0000000002','Oct 17 2007 5:31AM','Oct 17 2007 5:35AM' UNION ALL

    SELECT '381',5001,'0000000002','Oct 17 2007 5:36AM','Oct 17 2007 5:53AM' UNION ALL

    SELECT '386',5001,'0000000003','Oct 17 2007 6:01AM','Oct 17 2007 6:21AM' UNION ALL

    SELECT '396',5001,'0000000003','Oct 17 2007 6:31AM','Oct 17 2007 6:45AM'

    ---------------- Option 2 : Seq + Calc the Time Diff -------------

    DECLARE @Seqint,

    @CompanyIDint,

    @EmployeeID varchar(20),

    @PrevEnddatetime,

    @Diffint

    UPDATE #Trips

    SET @Seq = Seq = CASE WHEN Company_ID = @CompanyID AND Employee_ID = @EmployeeID THEN @Seq + 1 ELSE 1 END,

    @Diff = Diff = CASE WHEN Company_ID = @CompanyID AND Employee_ID = @EmployeeID THEN DATEDIFF(n,@PrevEnd,Start_Trip) ELSE NULL END,

    @CompanyID = Company_ID,

    @EmployeeID = Employee_ID,

    @PrevEnd = Stop_Trip

    FROM #Trips OPTION (MAXDOP 1)

    -- View the Results

    SELECT * FROM #Trips

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Thanks a lot. I am going to test this tomorrow and post the results.

    Regards, Tim

  • :unsure: Sorry if I am mistaking but, this way it looks like I need to recalculate and update 30 Million records each time before I can query the needed time differences. The sequence of the records can change any moment records are not update and inserted chronological. I doubt if 30 million updates will be the fastest way to handle this issue.

    For example:

    To query the sum of all time differences per employee (15.000 employees) of 10 days ago:

    Select Employee_ID, Sum(TimeDiff)

    From (

    Select t1.Employee_ID, Datediff("ss", t1.Stop_Trip,

    (Select min(t2.Start_Trip) From Trips t2

    Where t1.Company_ID = t2.Company_ID

    And t1.Employee_ID = t2.Employee_ID

    And t1.Stop_Trip < t2.Start_trip)) as TimeDiff

    From Trips t1

    Where cast(cast(t1.Start_Trip as integer) as datetime)

    = cast(cast(DateAdd("dd", -10, getdate()) as integer) as datetime)

    ) As RefTable

    Group By Employee_ID

    This will work but it takes me over 3 minutes to get the result.

    If I had a sequential index, like you adviced, it will take me a few seconds, but the problem is maintaining this column. When users are changing the order of the records in high rate the sequential index should be recalculated. This will slow down the database.

    Regards,

    Tim

  • If the total of the timediffs is the intended result - why go through all of that trouble? In every case - that's going to be:

    Max(stop time) - Min(start time) -sum(stop - start)

    forget the sequential index in that case. what I WOULD go for is a calculated column storing the datediff of stop and start times. persist that, and you should have a fairly fast option with an index.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Hi there,

    trying this out on around 9 million rows and just updating those records that I needed to execution completed in a couple of seconds.

    DECLARE @Seq int,

    @CompanyID int,

    @EmployeeID varchar(20),

    @PrevEnd datetime,

    @Diff int

    UPDATE Trips

    SET @Seq = Seq = CASE WHEN Company_ID = @CompanyID AND Employee_ID = @EmployeeID THEN @Seq + 1 ELSE 1 END,

    @Diff = Diff = CASE WHEN Company_ID = @CompanyID AND Employee_ID = @EmployeeID THEN DATEDIFF(n,@PrevEnd,Start_Trip) ELSE NULL END,

    @CompanyID = Company_ID,

    @EmployeeID = Employee_ID,

    @PrevEnd = Stop_Trip

    FROM Trips

    WHERE

    Start_Trip >= DATEADD(DD, 0, DATEDIFF(DD, 0, DATEADD(DD, -10, GETDATE())))

    AND Start_Trip < DATEADD(DD, 0, DATEDIFF(DD, 0, DATEADD(DD, -9, GETDATE())))

    OPTION (MAXDOP 1)

    SELECT

    Employee_ID, SUM(Diff)

    FROM Trips

    WHERE

    Start_Trip >= DATEADD(DD, 0, DATEDIFF(DD, 0, DATEADD(DD, -10, GETDATE())))

    AND Start_Trip < DATEADD(DD, 0, DATEDIFF(DD, 0, DATEADD(DD, -9, GETDATE())))

    GROUP BY Employee_ID

    Updating the entire c. 9,000,000 record table took under 30 seconds on my desktop machine.

    Allister

  • @matt-2: The query was ment as an example. Maybe it was not a good example. The application which will report the information from the database will show the details per record. Each record needs to show the time diff between the current row and the next row.

  • @Allister: Are you able to make this query without the need for the new columns sec or diff? If I do need to recalculate these column each time I make the query, I do not need to store them right?

  • Updating the entire c. 9,000,000 record table took under 30 seconds on my desktop machine.

    Did you really update 9M records in 30 seconds? That is fast. How come? When I try to update our tables simply changing a true or false will take more than 5 minutes for 9M records. Does it have somethink to do with the "OPTION (MAXDOP 1)" ?

    Regards,

    Tim

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply