January 14, 2009 at 8:02 am
Username may be sufficient for the original authors problem, but for mine it really isn't. I often can't guarantee that I have anything that is uniquely identifiable that I am inserting into my table. So, adding a column for the SourceID is really my only option if I want to avoid doing this in a loop, which of course I do want to avoid.
Scott
January 14, 2009 at 8:17 am
Ahh, sorry, I missed that.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
January 14, 2009 at 11:15 am
Keep in mind that if you the Id you are outputting is an identity column, specifying an ORDER BY during the insert will give you a predictable order for the ID's (and which rows they belong to). Would allow you to correlate them back to your original tables....
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
January 14, 2009 at 11:24 am
The other thing you can do is join the inserted data back to the source data, in a second query.
For example:
create table #T (
MyTableID int primary key);
insert into dbo.MyTable (columns)
output inserted.ID into #T(MyTableID)
select columns
from dbo.MySourceTable
where columns = values;
select *
from #T
inner join dbo.MyTable
on MyTableID = MyTable.ID
inner join dbo.MySourceTable
on MyTable.columns = MySourceTable.columns
Okay, maybe that's making it a little too generic, but you should be able to get the idea from it.
The join to the temp table that holds the output values means it's only selecting the new rows, and the join to the source table on ALL columns inserted should get you a one-to-one relationship between the target table and the source table (unless there are duplicate rows in the source table), and then you have the new IDs and the old IDs in one recordset.
Does that help?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
January 22, 2009 at 10:33 am
i guess what i am say is that it would be helpfull to be able to search only those questions / topics that have been answered corrctly and solved the user's problem. might reduce redundancy in questions asked. just a thought.
January 23, 2009 at 7:50 pm
Hmm, I hadn't thought of that.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
January 23, 2009 at 7:54 pm
bagofbirds (1/8/2009)
Fantastic. Thanks for everyone who took the time to think about, and reply to, my question. Thanks especially to RBarryYoung who's solution elegantly solves the whole problem. i had no idea about the Output thingy. it will come in very handy.-drew
Ummm... Drew... your last name wouldn't be "Brooks" would it?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 23, 2009 at 7:56 pm
RBarryYoung (1/13/2009)
Abhijit More (1/13/2009)
there is another way which you can use..1. Take the MAX(userID) before batch insert. this will give you the last UserID before batch insert
2. Perform batch insert.
3. Retrive those records which are greater than MAX(userID) (step 1).
this will give you the identity for all newly inserted UserID after batch insert.
Which will not work if someone else inserts records between step 1 and step 3.
Heh... beat me to it. And, it shows that we do, in fact, end up talking about other things. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
January 23, 2009 at 8:00 pm
Scott D (1/14/2009)
I often can't guarantee that I have anything that is uniquely identifiable that I am inserting into my table.
Not trying to be a smart guy here... just trying to point out that THAT is a fundamental problem in a database. You don't really have a "table" unless each row can be uniquely and certainly identified.
And, now, I'm curious... what do you use such a table for?
--Jeff Moden
Change is inevitable... Change for the better is not.
June 19, 2009 at 7:23 am
Jeff, I can't say that I completely agree. I've run into this problem a number of times now because I have a bunch of data to import that doesn't necessarily have a unique identifier. An example:
An import of Contact records from a 3rd party system. May include multiple records for the same Contact if they updated their profile. In theory perhaps we can rely on a Timestamp value to be our unique differentiator, but in practice we don't want to make this assumption since it is coming from another system.
Many times that you have a batch of records that may cover a significant period of time, and you want to store the history, this problem will crop up. It's been frustrating for me lately as I hesitate to modify table structures just to handle batch inserts like this, and I definitely want to avoid an iterative approach with this volume of data.
I can't think of a legitimate reason why SQL Server wouldn't allow you to link between the source and target tables even on columns that aren't being inserted but as mentioned it's not allowed. No one seems to have a good solution for this common problem because of this limitation.
June 20, 2009 at 10:24 am
RBarryYoung (1/14/2009)
Yep, it's true, the OUTPUT clause cannot pull from your input sources, only the data that you are already outputting.When I've been in your situation, I have always added a SourceID column to the table to get my correlation.
Hey Barry,
The limitation of only being able to OUTPUT columns that appear in the INSERT statement is an annoyance - but logical if one considers which columns make it through the plan to where the INSERT happens. The situation is different with the UPDATE and DELETE statements - you can happily OUTPUT columns from any table referenced anywhere in the FROM clause.
As far as adding audit information is concerned (recording source and destination IDs among other things) I generally use SSIS for this sort of thing - it includes handy stuff for just this sort of thing!
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
June 20, 2009 at 3:55 pm
mpacifico (6/19/2009)
Jeff, I can't say that I completely agree. I've run into this problem a number of times now because I have a bunch of data to import that doesn't necessarily have a unique identifier. An example:An import of Contact records from a 3rd party system. May include multiple records for the same Contact if they updated their profile. In theory perhaps we can rely on a Timestamp value to be our unique differentiator, but in practice we don't want to make this assumption since it is coming from another system.
Many times that you have a batch of records that may cover a significant period of time, and you want to store the history, this problem will crop up. It's been frustrating for me lately as I hesitate to modify table structures just to handle batch inserts like this, and I definitely want to avoid an iterative approach with this volume of data.
I can't think of a legitimate reason why SQL Server wouldn't allow you to link between the source and target tables even on columns that aren't being inserted but as mentioned it's not allowed. No one seems to have a good solution for this common problem because of this limitation.
I don't see how any of that prevents the unique identification of any given row. You may have problems determining what the latest row is for any given set of duplicates, but it shouldn't prevent uniqueness.
So far as determining what the latest update for any given row, that's up to you and your data providers. I tend to find other data providers when they won't (after negotiation) provide a LastModifiedOn column or, at the very least, name the files they deliver with some date/time data embedded in the name of the file. In the cases where they will won't provide such important information in one form or another and they happen to be a critical source of information, then I make sure I append the correct date and time to either the file name (if provided by file) or I make a non-modfiable (at the row level) history table with dates and times and some form of unique identifier to act as a tie breaker to contain the rows as they were received.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 12 posts - 16 through 26 (of 26 total)
You must be logged in to reply to this topic. Login to reply