March 27, 2008 at 1:17 pm
Hello all. I'm having some trouble coming up with a solution that works the way I want it to.
I'm importing a load of data from a text file into a temporary table (called dts_SWInventoryTemp).
The primary key on this table is ComputerName.
The final destination of this data is tbl_SWInventory - it is identical to the temporary table in structure.
Now that I've got my data in the temporary table, this is what I would like to do:
Query the temporary table against the final table, to see if records already exist for any of the ComputerNames.
If the record does exist, then update the record from the temp table.
If the record does not exist, then INSERT the record from the temp table.
The table is very unwieldy -- it has just over 100 columns.
So, first, could anyone recommend a way to do this? It seems like it must be one of the most common operations in the business world, so I'm sure it's been done a hundred different ways, but I can't seem to get it to work with what I've been trying.
Secondly, is there a way to run the UPDATE command against the table, without having to actually type out all 100+ column names specifically?
I've read and searched the site & forums for quite a while now, and been registered for a little bit, but haven't bothered to post yet. There seem to be a lot of talented people here, so it's the first place I asked.
Any help is much appreciated - thanks in advance. 🙂
March 27, 2008 at 1:19 pm
By the way, this is the closest I've gotten on my latest fumbling attempt at this:
SELECT dts_SWInventoryTemp.ComputerName FROM dts_SWInventoryTemp
WHILE EXISTS (SELECT dts_SWInventoryTemp.ComputerName FROM dts_SWInventoryTemp)
BEGIN
IF EXISTS
(
SELECT tbl_SWInventory.ComputerName, dts_SWInventoryTemp.ComputerName from tbl_SWInventory, dts_SWInventoryTemp
WHERE tbl_SWInventory.ComputerName = dts_SWInventoryTemp.ComputerName
)
DELETE FROM tbl_SWInventory WHERE ComputerName = dts_SWInventoryTemp.ComputerName
END
INSERT INTO tbl_SWInventory
SELECT * FROM dts_SWInventoryTemp
When I run that, I get the "column prefix doesn't match with a table name or alias name used in the query" error, and I'm not sure why.
March 27, 2008 at 3:08 pm
From the looks of your code, you are trying to do this procedurally, maybe even with a cursor...correct?? This whole operation can be done with 2 set based queries. Here's an example.
DECLARE @temp_Table TABLE (ComputerName varchar(50) PRIMARY KEY, Col1 int, Col2 varchar(20))
DECLARE @TargetTable TABLE (ComputerName varchar(50) PRIMARY KEY, Col1 int, Col2 varchar(20))
SET NOCOUNT ON
INSERT INTO @TargetTable
SELECT 'Computer 1', 0, 'orig row' UNION ALL
SELECT 'Computer 2', 0, 'orig row' UNION ALL
SELECT 'Computer 3', 0, 'orig row' UNION ALL
SELECT 'Computer 4', 0, 'orig row' UNION ALL
SELECT 'Computer 5', 0, 'orig row' UNION ALL
SELECT 'Computer 6', 0, 'orig row'
INSERT INTO @temp_Table
SELECT 'Computer 2', 1, 'updated row' UNION ALL
SELECT 'Computer 4', 1, 'updated row' UNION ALL
SELECT 'Computer 6', 1, 'updated row' UNION ALL
SELECT 'Computer 8', 0, 'new row' UNION ALL
SELECT 'Computer 10', 0, 'new row'
SELECT * FROM @TargetTable
UPDATE target
SET Col1 = temp.Col1,
Col2 = temp.Col2
FROM @TargetTable target
INNER JOIN @temp_Table temp
ON target.ComputerName = temp.ComputerName
INSERT INTO @TargetTable
SELECT temp.*
FROM @temp_Table temp
LEFT JOIN @TargetTable target
ON target.ComputerName = temp.ComputerName
WHERE target.ComputerName IS NULL
SELECT * FROM @TargetTable
March 27, 2008 at 3:51 pm
aaharris (3/27/2008)
Secondly, is there a way to run the UPDATE command against the table, without having to actually type out all 100+ column names specifically?
Write the INSERT like so:
[font="Courier New"]INSERT into tbl_SWInventory
Select * From dts_SWInventoryTemp D
WHERE Not EXISTS (Select * from tbl_SWInventory T2
Where T2.ComputerName = D.ComputerName)
[/font]
For the UPDATE, try this: in Query Analyzer (QA), open a window to edit your stored procedure with the UPDATE. Write the UPDATE like so:
[font="Courier New"]UPDATE tbl_SWInventory
SET
FROM tbl_SWInventory T
JOIN dts_SWInventoryTemp D ON D.ComputerName = T.ComputerName
[/font]
Now make a new window in QA, and set your output mode to text. Then execute this query:
SELECT ' ' + COLUMN_NAME + ' = D.' + COLUMN_NAME + ','
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
AND TABLE_NAME = 'dts_SWInventoryTemp'
Now copy all of the text data rows (not the row headers or footers) in the output window, then switch back to your stored procedure editing window. Put your cursor immediately after the "SET" in your UPDATE statement and paste the text in. Find the last comma and delete it.
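Just to illustrate what you end up with (using a few column names from your description - Uptime, OS, SW_1 - as stand-ins; your pasted list will contain all 100+ columns):
UPDATE tbl_SWInventory
SET Uptime = D.Uptime,
    OS     = D.OS,
    SW_1   = D.SW_1  -- ...and so on for every pasted column, with the final comma removed
FROM tbl_SWInventory T
JOIN dts_SWInventoryTemp D ON D.ComputerName = T.ComputerName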
NOTE: ALWAYS do the UPDATE first, then the INSERT.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
March 27, 2008 at 4:13 pm
John Rowan (3/27/2008)
From the looks of your code, you are trying to do this procedurally, maybe even with a cursor...correct?? This whole operation can be done with 2 set based queries. Here's an example.
The tables in question have around 100 columns and 900 or so rows. It looks like your query would require writing a line for every column and every row - is that correct, or am I missing something?
Of course, it's not a huge problem to write it that way, but it seems like it would be fairly inefficient.
March 27, 2008 at 4:14 pm
Thanks for the reply rbarryyoung, giving it a shot now.
March 27, 2008 at 4:19 pm
The tables in question have around 100 columns and 900 or so rows. It looks like your query would require writing a line for every column and every row - is that correct, or am I missing something?
Yes, you are missing something. Rbarry and I are actually talking about doing the same thing. Rbarry has just provided you with a slick way to script out the SET portion of your update statement without having to manually type the column names in.
Take the output that Rbarry has given you, put the column names into the UPDATE statement that I gave you, and voilà! Run the 2 statements in the order that I gave them to you: first the UPDATE, then the INSERT. The UPDATE will take care of all existing rows and the INSERT will grab all missing rows. There is no need to do things one row at a time - that is the whole purpose of set-based processing.
March 27, 2008 at 4:56 pm
Ah, I see what you're saying.
Thanks for all your help guys, I've got this working now, and added it as a step in my DTS package that is handling all this.
I'm a network admin, not a dba, so some of these concepts are still new to me. I have to say that I'm thoroughly enjoying this though. I'm building an inventory database that will let me track information about all the Windows nodes on my network (since I can't get the company to fork out for a decent product that does it 😉 ).
The table in question is the result of a batch script, to a text file, which is parsed out by Perl into a db-friendly format. It contains all the information on my machines from hardware to uptime, to every piece of software that's installed on them.
Some of them have a lot of software installed, hence the large number of columns in my table (SW_1, SW_2, ..., SW_96).
There is probably a more elegant way to store this information that would make it easier to access and search.
What I would like to be able to do is answer the question "How many machines have this particular piece of licensed software installed on them?"
It looks like I am going to have to full-text index every one of these software columns and develop a CONTAINS query that searches every field for the particular piece of software, because sometimes Adobe Photoshop will be listed under SW_3 and sometimes under SW_7. I can't see any easy way to get around this.
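Something like this is what I have in mind (just a rough sketch - I haven't actually built the full-text catalog yet, so this may need tweaking):
-- Find machines with Photoshop listed in any full-text indexed SW_ column
SELECT ComputerName
FROM tbl_SWInventory
WHERE CONTAINS(*, '"Adobe Photoshop"')  -- * = every full-text indexed column on the table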
Is it going to completely kill my server to index and query against so many columns?
Thanks again for your help fellas - you saved me a lot of time. :up:
March 27, 2008 at 5:52 pm
It sounds like a spreadsheet may work just as well for you with your current structure??
In any case, rather than having columns such as
software1, software2, software3, etc
and worrying about whether photoshop is in column 1 or 7, you should have a table called "Software" which lists the possible software packages with columns such as
softwareID, softwareName, manufacturer, version, etc
And then another table called ComputerSoftware with columns such as
computerID, softwareID, installedDate, etc
This way you can search for software and computer easily without worrying about software1 vs software7. What if you only had 10 software columns and a computer had 11 applications installed? What if you decided to uninstall photoshop from several of the machines? You'd have to remove it from software 1, 2, 3, 4, etc depending on each computer and then move the data within each row to shuffle the columns down by one.... Lots of work 🙂
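Something roughly like this (just a sketch - the names and data types are only examples, so adjust them to whatever you actually capture):
CREATE TABLE Software (
    softwareID   int IDENTITY(1,1) PRIMARY KEY,
    softwareName varchar(100) NOT NULL,
    manufacturer varchar(100) NULL,
    version      varchar(50)  NULL
)
CREATE TABLE ComputerSoftware (
    computerID    varchar(50) NOT NULL,  -- or ComputerName, whatever identifies the machine
    softwareID    int NOT NULL REFERENCES Software(softwareID),
    installedDate datetime NULL,
    PRIMARY KEY (computerID, softwareID)
)
-- "How many machines have this piece of software?" becomes a simple join:
SELECT COUNT(DISTINCT cs.computerID)
FROM ComputerSoftware cs
JOIN Software s ON s.softwareID = cs.softwareID
WHERE s.softwareName LIKE 'Adobe Photoshop%'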
Look up some articles on here about normalization - it's not at all hard and will make querying and management of your data much more straightforward. Please post more questions here if you have them! 🙂
March 27, 2008 at 7:59 pm
Thanks for the advice, Ian.
I did actually think about doing it that way when I planned out the db structure, but decided against it mainly for two reasons:
1) It would be much harder to get the data into a format that would work that way. The data I am importing is basically one huge text file with a section for each computer and a list of uptime, OS version, cpu, memory, a few other things, and then a list of every piece of software installed on the machine. This is the way the (free) application I use outputs its results when I run it as a batch. I wrote a Perl script to parse it out into a tab-separated file so that it's basically
ComputerName*TAB*Uptime*TAB*OS*TAB*SW_1*TAB*SW_2*TAB*etc...
A SQL Job kicks off the Perl script, then runs the DTS package that imports the data. I could re-write this to give a different format, but wasn't sure it would be worth it, although I've started to re-consider a couple times while working on it.
2) Some computers have 70 or 80 different pieces of software on them, so I would have to discard all the other pieces of software that I didn't include in my Software table. Then, any time I wanted to include a new piece of software that I hadn't before, I would have to add it to the software table, then go back and change my code to include it in the output file.
Also, this data is generated and will be imported on a daily basis, so I'll never have to manually update the tables.
Still, I see your point and it may be worth looking into.
I know that just from my single post it seems like I could accomplish what I'm trying for with just a spreadsheet, but this is just a small part of the database I'm putting together. I'm also importing Computer and User account information from Active Directory into separate tables in the database, importing some files that are generated by login scripts to it with other information, and a couple of other things. They are all keyed together either by the ComputerName, or by our inventory asset tag numbers on each machine (except servers).
So far, I only have one .asp page that queries the database for information from the LoginHistory table (login script info), but I've already been able to use quick SQL queries against this to report a lot of information that used to take hours to put together. Eventually I plan to have a set of .asp pages on our intranet/helpdesk server and make a lot of these things available to our whole department, so they don't have to rely on me to manually put this stuff together from different sources every time it's needed.
All in all, this is probably going to save our department a dozen man hours every week, so I'm kinda proud of what I've been able to accomplish so far. 🙂
Again though, thanks for the advice (I'm certainly not discarding your suggestion - I may still change things yet), and I'll check into the normalization topics.
March 27, 2008 at 9:25 pm
aaharris (3/27/2008)
Thanks for the reply rbarryyoung, giving it a shot now.
Let us know how it works.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
March 28, 2008 at 8:27 am
It works perfectly. It's now part of the DTS package I'm using to import the data.
Thanks again, this would have taken me forever to figure out otherwise, and my solution undoubtedly wouldn't have been nearly as efficient. :thumbsup:
March 28, 2008 at 8:42 am
aaharris (3/27/2008)
Thanks for the advice, Ian. I did actually think about doing it that way when I planned out the db structure, but decided against it mainly for two reasons:
1) It would be much harder to get the data into a format that would work that way. The data I am importing is basically one huge text file with a section for each computer and a list of uptime, OS version, cpu, memory, a few other things, and then a list of every piece of software installed on the machine. This is the way the (free) application I use outputs its results when I run it as a batch. I wrote a Perl script to parse it out into a tab-separated file so that it's basically
ComputerName*TAB*Uptime*TAB*OS*TAB*SW_1*TAB*SW_2*TAB*etc...
A SQL Job kicks off the Perl script, then runs the DTS package that imports the data. I could re-write this to give a different format, but wasn't sure it would be worth it, although I've started to re-consider a couple times while working on it.
2) Some computers have 70 or 80 different pieces of software on them, so I would have to discard all the other pieces of software that I didn't include in my Software table. Then, any time I wanted to include a new piece of software that I hadn't before, I would have to add it to the software table, then go back and change my code to include it in the output file.
Also, this data is generated and will be imported on a daily basis, so I'll never have to manually update the tables.
Still, I see your point and it may be worth looking into.
It's ultimately a matter of "pay now or pay later". Normalizing the data early on makes for time savings down the line. Having been in your shoes before - the "savings" you're talking about in the initial load will end up costing you in the future. And - the difficulties with playing fast and dirty up front end up costing a lot more overall. Build it "right" or this will come back and bite you, over and over and over again. Really.
Point #1: I guess there's a reason for it, but why parse the file, and then parse the parsed file? Why not import it directly and handle it within SQL Server? Also - if you ARE going to parse it, why not build something like XML instead of a delimited file (which would get around your formatting issues)?
As to your point #2 - it's invalid, since you already have to do that. If you were to bring the data in in a normalized fashion, you wouldn't have to change the program each time (which you do today, in 3 places: the Perl program to include it as one or more new columns, the data table to have a place to put the data, and the import process to include it in the stuff being brought in).
Normalization allows for a lot more flexibility than you will ever achieve in a flat file context.
Again - just trying to save you some pain. You're setting yourself up for a LOT of it as it now stands.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
March 28, 2008 at 9:18 am
edit: On Point #2, I think you may have misunderstood how I have it set up. No matter how many applications get loaded on machines, I'll never have to make any changes, because every application is just listed beside all the other data in the text file; all that is imported directly into the table, and if there are more than 96 pieces of software installed, they just don't get imported. Which is a problem in itself that would probably come back to bite me later.
I'm not extremely familiar with XML (I have worked with it a little, although it was parsing XML files instead of creating them), but if you'd like to show me what an SQL-friendly XML file should look like, and help me out with getting that into the database, I'd love to do it that way.
🙂
March 28, 2008 at 3:32 pm
One thing you could do to make this a little more friendly is change your Perl script to create 2 files. One would contain the data for the computer like
Computer_ID
Computer_Name
Computer_OS
Computer_Uptime
and another that loops through the software
Software_Computer_ID
Software_Name
Software_InstallDate
That way, if you want to know which computers have Photoshop, it's an easy query against the software table, whereas currently you would need to look at all 96 software fields to find them all. This also removes the limit of 96 software titles.
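For example, once the software file is loaded into its own table (I'll call it ComputerSoftware here - use whatever name your import ends up with), the Photoshop question becomes something like:
SELECT COUNT(DISTINCT Software_Computer_ID) AS MachinesWithPhotoshop
FROM ComputerSoftware
WHERE Software_Name LIKE 'Adobe Photoshop%'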