June 20, 2016 at 2:44 am
I have a 680M rows table. This table contains data for each customer. I need to have a rollup table with data for each customer, that means obtain next row for each row for each customer (last row for a customer will have next row value = null) and 3 more values that work as next or previous values, i.e. origin move for a row... all of these are calculated by datetime order.
I am calculating this data and inserting in other table, but I think performance is not the best. I have tested by using LEAD and LAG functions and I found that OUTER APPLY is faster. My huge query is (I resume it in order to clarify)
INSERT INTO destinationTable
SELECT columns
FROM 680MrowsTable currM
OUTER APPLY (
SELECT TOP 1 *
FROM 680MrowsTable nextM
WHERE currM.CustomerId=nextM.CustomerId AND currM.ACTdatetime< nextM.ACTdatetime
ORDER BY ACTdatetime
) nextM
OUTER APPLY (
SELECT TOP 1
*
FROM 680MrowsTable nextMoveM
WHERE currM.CustomerId=nextM.CustomerId AND currM.ACTdatetime< nextMoveM.ACTdatetime
AND --other conditions
ORDER BY ACTdatetime
) nextMoveM
OUTER APPLY (
SELECT TOP 1 *
FROM 680MrowsTable origMoveM
WHERE currM.CustomerId=nextM.CustomerId AND currM.ACTdatetime>= origMoveM.ACTdatetime
AND --other conditions
ORDER BY ACTdatetimeDESC
) origMoveM
OUTER APPLY (
SELECT TOP 1 *
FROM 680MrowsTable PrevMoveM
WHERE currM.CustomerId=nextM.CustomerId AND origMoveM.ACTdatetime> PrevMoveM.ACTdatetime
AND --other conditions
ORDER BY ACTdatetimeDESC
) PrevMoveM
WHERE 1=1
I have read this article: https://www.simple-talk.com/sql/t-sql-programming/calculating-values-within-a-rolling-window-in-transact-sql/
I have tested and when I use a cursor obtaining data from each customer ordering by datetime and storing it in a temp table, and after that I do outer apply and its faster.
Any idea?
June 20, 2016 at 4:54 am
Can you set up some sample data please Celia? It doesn't need to contain all of the columns, just the ones required to help identify the problem domain.
Cheers.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 20, 2016 at 8:37 am
The LAG function is aimed for those kind of problems like yours. Have you saved you query using LAG, if so share it here. Someone may find a better solution with it if you provide some data and objects.
Igor Micev,My blog: www.igormicev.com
June 20, 2016 at 2:12 pm
What's really critical here is how the 680MrowsTable is clustered. Best for what you are doing here would be ( CustomerId, ACTdatetime ). If this is typical of how you process the data in that table, you should consider changing the clustering of the table. If the table is currently clustered on identity, it's about a 90% chance that it should be changed to better match your actual processing.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
June 21, 2016 at 3:02 am
Igor Micev (6/20/2016)
The LAG function is aimed for those kind of problems like yours. Have you saved you query using LAG, if so share it here. Someone may find a better solution with it if you provide some data and objects.
In my tests LAG function is slower than ROW_NUMBER solution adopted.
Now I am testing with Cluster index.
My next test will be Quirky Update.
See attached an example. I have added an Id column to simplify, but when I get nextRow,PreviousAction,NextAction and OriginAction I get other fields of the action (datetime,details,comments,etc), not Id.
nextRow is the next row of this CustomerId, each Action is identified because has Action=Subaction. In this example Actions 2,8 and 12 are not real action, so when I get previous action I need to obtain previous row for this customer that has Action=Subaction. Same for NextAction; originAction is for rows with Action<>Subaction the row with same Action and with Action=Subaction.
I hope you can understand my example.
<style type="text/css">
table.tableizer-table {
font-size: 12px;
border: 1px solid #CCC;
font-family: Arial, Helvetica, sans-serif;
}
.tableizer-table td {
padding: 4px;
margin: 3px;
border: 1px solid #CCC;
}
.tableizer-table th {
background-color: #104E8B;
color: #FFF;
font-weight: bold;
}
.tableizer-table th2 {
background-color: #ffffc7;
color: #FFF;
font-weight: bold;
}
</style>
<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>ActionId</th><th>CustomerId</th><th>Action</th><th>Subaction</th><th>ACTdatetime</th><th>NextRow</th><th>PreviousAction</th><th>NextAction</th><th>OriginAction</th></tr></thead><tbody>
<tr><td>1</td><td>1</td><td>L</td><td>L</td><td>2013-01-22 12:38:00.000</td><td>2</td><td>NULL</td><td>3</td><td>1</td></tr>
<tr><td>2</td><td>1</td><th>L</th><th>C</th><td>2013-01-22 12:39:00.000</td><td>3</td><td>1</td><td>3</td><td>1</td></tr>
<tr><td>3</td><td>1</td><td>D</td><td>D</td><td>2013-03-11 18:17:00.000</td><td>4</td><td>1</td><td>4</td><td>3</td></tr>
<tr><td>4</td><td>1</td><td>L</td><td>L</td><td>2013-03-16 10:25:00.000</td><td>5</td><td>3</td><td>5</td><td>4</td></tr>
<tr><td>5</td><td>1</td><td>D</td><td>D</td><td>2013-03-20 18:05:00.000</td><td>6</td><td>4</td><td>NULL</td><td>5</td></tr>
<tr><td>6</td><td>2</td><td>G</td><td>G</td><td>2013-04-01 22:15:00.000</td><td>7</td><td>NULL</td><td>7</td><td>6</td></tr>
<tr><td>7</td><td>2</td><td>G</td><td>G</td><td>2013-04-02 11:33:00.000</td><td>8</td><td>6</td><td>9</td><td>7</td></tr>
<tr><td>8</td><td>2</td><th>G</th><th>C</th><td>2013-04-02 18:29:00.000</td><td>9</td><td>6</td><td>9</td><td>7</td></tr>
<tr><td>9</td><td>2</td><td>G</td><td>G</td><td>2013-04-10 09:24:00.000</td><td>10</td><td>7</td><td>10</td><td>9</td></tr>
<tr><td>10</td><td>2</td><td>G</td><td>G</td><td>2013-04-12 20:15:00.000</td><td>11</td><td>9</td><td>11</td><td>10</td></tr>
<tr><td>11</td><td>2</td><td>L</td><td>L</td><td>2013-04-17 17:45:00.000</td><td>12</td><td>10</td><td>NULL</td><td>11</td></tr>
<tr><td>12</td><td>2</td><th>L</th><th>C</th><td>2013-04-17 17:46:00.000</td><td>13</td><td>10</td><td>NULL</td><td>11</td></tr>
</tbody></table>
June 21, 2016 at 3:21 am
This format is preferable because folks can begin working with it straight away:
CREATE TABLE #Temp (ActionId INT NULL, CustomerId INT NULL, Action CHAR(1) NULL, Subaction CHAR(1) NULL, ACTdatetime DATETIME NULL, NextRow INT NULL, PreviousAction INT NULL, NextAction INT NULL, OriginAction INT NULL)
INSERT INTO #Temp (ActionId, CustomerId, Action, Subaction, ACTdatetime, NextRow, PreviousAction, NextAction, OriginAction) VALUES
(1,1,'L','L','2013-01-22 12:38:00.000',2,NULL,3,1),
(1,1,'L','L','2013-01-22 12:38:00.000',2,NULL,3,1)
etc
I recommend you post two tables like this - your source table, and the exact results you expect from it.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply