March 6, 2014 at 4:45 pm
I have a snapshot table of about 15 million records in the form of:
InvoiceIDLineItemIDSnapshotDateAmount
1 1 20140101 12
1 2 20140102 14
1 3 20140103 17
2 1 20140101 10
2 2 20140102 5
1 2 20140105 15
1 3 20140105 20
I want to create an additional column called Current as shown below:
InvoiceIDLineItemIDSnapshotDateAmount Current
1 1 20140101 12 1
1 2 20140102 14 0
1 3 20140103 17 0
2 1 20140101 10 1
2 2 20140102 5 1
1 2 20140105 15 1
1 3 20140105 20 1
How can we write a query to achieve this while keeping in mind:
- We do not want to do unnecessary record lookups and Updates
- We only update records that corresponds to new entries. For example, we should not touch the record for InvoiceID = 2 in the above example
Thanks
March 6, 2014 at 4:54 pm
don't try to do it in the table. create a view or query that uses row_number() OVER (Partition by...order by..)
if you try and do it in the table itself, you will run into a huge performance hit in your trigger, which has to rearrange data with any insert update or delete.
so just don't do that.
use that as a sub query, and whatever has the lowest/highest number gets the 'Current'd description as a case statement.
wild guess:
SELECT CASE WHEN RW = 1 THEN 'Current' ELSE '' END AS isCurrentIndicator,
*
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY InvoiceID ORDER BY InvoiceID,LineItemID) AS RW,
InvoiceID,
LineItemID,
SnapshotDate,
Amount
FROM YourTable) SubQueryAlias
ORDER BY InvoiceID,LineItemID
Lowell
March 6, 2014 at 5:11 pm
I'd definitely add a bit column to the table. Your data and your statements seem contradictory, but I think when a new row is added that becomes current, you could just update the old row to turn off the current flag. Edit: the appropriate index should make this update relatively painless.
No way I personally would use ROW_NUMBER() or any other such mechanism, since I wouldn't want to risk sorting 15M+ rows, or any significant part thereof, every time I went to read them.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
March 6, 2014 at 5:26 pm
I also don't like to use Partitions and Ranks etc due to the sorting overhead
The default value for the new column will be set to 1 thus will allow new records to become active
But what is the best way to detect the old records. I believe I still have to go through all the records via Snapshot_Date. Still can't figure out the best way to extract older records
March 6, 2014 at 5:41 pm
rka (3/6/2014)
But what is the best way to detect the old records. I believe I still have to go through all the records via Snapshot_Date. Still can't figure out the best way to extract older records
I have no idea. You haven't given any specific rules on how to determine what is current and what no longer is.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
March 6, 2014 at 5:44 pm
I'd take this as a two phase approach.
First, you need to get your current data into the format you want. No matter what you do, you're going to do a massive table update at least once. I recommend you port it to a new table and swap the table objects if you have the space.
To find the correct current values, you need to use the ROW_NUMBER() technique described by Lowell to activate the most recent invoice/line item combination.
Then, you need to include an INSERT trigger for further updates. Said trigger needs to make sure the same invoice/lineitem is distinct in each inclusion. UPDATE the table to turn off any existing invoiceID/lineitem items with current = 1 (as Scott mentioned, use an index to easily locate these) that exist before allowing the trigger to finish the insert.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
March 7, 2014 at 3:41 am
You might want to take a look at this article I wrote in SimpleTalk. It goes through how to retrieve versioned data[/url] from the table. Assuming you can identify the latest record based on that date column, the techniques outlined here should work for you, and should perform well (we had millions and millions of records and performance was great) as long as you build out your indexes appropriately.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply