May 17, 2005 at 3:41 pm
My problem is that, we're loading a data from a staging table into a production table via cursors (Don't ask it was already developed before i got there) but anyways the problem is this cursor goes through the whole table (2 mil rows) and then checks to see if the table in prod needs to be updated if not then it inserts the record. It goes through millions of table scans. I have heard of Merge in oracle and Upsert in Db2 (i think) which basically looks at a record updates the record if needs to be updated else it just inserts the record. Can this be achieved in T-SQL??? PLease help. Thanks.
May 17, 2005 at 3:56 pm
Looks like you want to think in terms of sets here.
You can probably indentify the rows that need to be updated with a view (where they exist on bot hh sources, but at least one column has changed), and rows that need to be inserted with a view as well (using a not exists clause in the view...
May 18, 2005 at 1:02 am
No, there is no "UPDATE or INSERT if not exists" command in T-SQL.
May 18, 2005 at 2:14 am
--SET NOCOUNT ON
--THIS PROCESS TOOK 1m47s--2m:05s(varies if you switch on traces)
--MACHINE:LAPTOP
--OS:WINDOWS XP INTEL(R) PENTIUM 4
--CPU 1.60 GHZ 897 MHZ,
--256 MB Of RAM(typically lower ram since i using other devices)
--DROP TABLE #STAGING
--DROP TABLE #PRODUCTION
--TAKE NOTE IF YOU ARE DOING THIS IN A REGULAR BASES, LOOK-UP USING OLAP
--QUERY IT'S REALLY ZOOOOMING....
use northwind
go
--benchmarking performance
DECLARE @COUNT INT
DECLARE @MILLION INT
DECLARE @RECORDS INT
CREATE TABLE #STAGING
(
ProductID varchar(100) NULL,
Qty INT NULL
)
CREATE TABLE #PRODUCTION
(
ProductID varchar(100) NULL,
Qty INT NULL
)
SET @COUNT = 1
SET @MILLION = 1000 --
SET @RECORDS = 0
--lets just fill #STAGING with nonsense data
--a good around 1 million records plus or more
--this select statement from northwind should return around 2155 record
--we loop this around 1000 to get us around 2,155,000 million data
PRINT 'START TIME: ' + CONVERT(VARCHAR(30), GETDATE(), 109)
SET NOCOUNT ON
WHILE @COUNT <= @MILLION
BEGIN
INSERT INTO #STAGING
SELECT @COUNT + CAST(ORDERID AS VARCHAR(10)) + CAST(PRODUCTID AS VARCHAR(10)) ProductID,
QUANTITY
FROM [ORDER DETAILS]
SET @RECORDS = @RECORDS + @@ROWCOUNT
SET @COUNT = @COUNT + 1
END
SET NOCOUNT OFF
PRINT 'END TIME: ' + CONVERT(VARCHAR(30), GETDATE(), 109)
PRINT 'TOTAL RECORDS INSERTED: ' + CONVERT(VARCHAR(30), @RECORDS)
PRINT ' '
--BTW:I ASSUMED THAT YOU ARE NOT PERFORMING
--DATA CHECKING/VALIDATION ANYMORE AND JUST WANT
--TO PERFORM A INSERT INTO THE PRODUCTION TABLE
PRINT 'START TIME(INSERT -->PRODUCTION): ' + CONVERT(VARCHAR(30), GETDATE(), 109)
INSERT INTO #PRODUCTION
SELECT s.PRODUCTID, NULL
FROM #STAGING s
WHERE s.PRODUCTID NOT IN (SELECT p.PRODUCTID FROM #PRODUCTION p)
PRINT 'END TIME(INSERT -->PRODUCTION): ' + CONVERT(VARCHAR(30), GETDATE(), 109)
PRINT ' '
--THIS PART ASSUMES YOU WANT TO UPDATE #PRODUCTION DATA
--WITH THE #STAGING DATA. THIS HAS BEEN SIMPLIFIED
--JUST TO SHOW THIS T-SQL
PRINT 'START TIME(UPDATE -->PRODUCTION): ' + CONVERT(VARCHAR(30), GETDATE(), 109)
--do updates in groups and not in bulk
--this is more faster
UPDATE #PRODUCTION
SET #PRODUCTION.Qty = s.QTY
FROM #STAGING s, #PRODUCTION p
WHERE s.PRODUCTID = p.PRODUCTID AND s.PRODUCTID like '10309%'
PRINT 'END TIME(UPDATE -->PRODUCTION): ' + CONVERT(VARCHAR(30), GETDATE(), 109)
PRINT ' '
--
--************************************************************
--TEST BY PERFORMING INSERT AGAIN
--THERE SHOULD BE NO INSERT MADE SINCE RECORDS ALREADY EXISTS
--IN #PRODUCTION
--REMOVE THE /* */ TO TEST FOR CONDITIONS
--*************************************************************
/*
--TEST CONDITION Q.A 1
PRINT 'START TIMENO RECORDS INSERTED)' + CONVERT(VARCHAR(30), GETDATE(), 109)
INSERT INTO #PRODUCTION
SELECT s.PRODUCTID, NULL
FROM #STAGING s
WHERE s.PRODUCTID NOT IN (SELECT p.PRODUCTID FROM #PRODUCTION p)
PRINT 'END TIMENO RECORDS INSERTED)' + CONVERT(VARCHAR(30), GETDATE(), 109)
PRINT ' '
*/
/*
--THIS WILL PERFORM NEW INSERTS INTO STAGING AND ONLY
--THESE NEW INSERTS WILL BE INSERTED INTO THE PRODUCTION
--TEST CONDITION Q.A 2
PRINT 'START TIME: ' + CONVERT(VARCHAR(30), GETDATE(), 109)
SET NOCOUNT ON
WHILE @MILLION > (@MILLION + 5)
BEGIN
INSERT INTO #STAGING
SELECT @COUNT + CAST(ORDERID AS VARCHAR(10)) + CAST(PRODUCTID AS VARCHAR(10)) ProductID,
QUANTITY
FROM [ORDER DETAILS]
SET @RECORDS = @RECORDS + @@ROWCOUNT
SET @COUNT = @COUNT + 1
END
SET NOCOUNT OFF
PRINT 'END TIME: ' + CONVERT(VARCHAR(30), GETDATE(), 109)
PRINT 'TOTAL RECORDS INSERTED: ' + CONVERT(VARCHAR(30), @RECORDS)
PRINT 'START TIMEINSERT NEW RECORDS) ' + CONVERT(VARCHAR(30), GETDATE(), 109)
INSERT INTO #PRODUCTION
SELECT s.PRODUCTID, NULL
FROM #STAGING s
WHERE s.PRODUCTID NOT IN (SELECT p.PRODUCTID FROM #PRODUCTION p)
PRINT 'END TIMEINSERT NEW RECORDS) ' + CONVERT(VARCHAR(30), GETDATE(), 109)
PRINT ' '
*/
--**************************************************************
--SET NOCOUNT OFF
SELECT COUNT(*) FROM #STAGING where productid like '10309%'
SELECT top 10 * FROM #PRODUCTION where productid like '10309%'
--btw i give no warranty against performance. just play around with it
this query ran against my machine about 1m.47s. pretty bad i know. i should be 30 seconds or less on a production machine, with loads of RAM
May 18, 2005 at 8:25 am
There is nothing like the UPSERT or MERGE command in SQL Server 2000 T-SQL. You might like to know that it is in SQL Server 2005.
March 21, 2006 at 11:07 am
" You might like to know that it is in SQL Server 2005. "
Do you have an example of the syntax?
March 21, 2006 at 1:03 pm
Why not create your own UPSERT T-SQL stored procedure for your data? It's simply the combination of an update and insert in a single stored procedure.
CREATE PROCEDURE spUpsertWuc
AS
INSERT INTO tblWUC (WUC)
SELECT DISTINCT new.WUC
FROM tblWUC_New new
LEFT OUTER JOIN tblWUC wuc
ON new.WUC=wuc.WUC
WHERE wuc.WUC IS NULL
UPDATE wuc SET
wuc.WUC = new.WUC
FROM tblWUC wuc
INNER JOIN tblWUC_New new
ON wuc.ID=new.ID
March 21, 2006 at 1:15 pm
I was just hoping to do a single check against the db -- inserting/updating thousands of rows at a time.
March 21, 2006 at 5:01 pm
BUT, if you don't use a cursor and use the INSERT and UPDATE commands it will process hundreds of times faster. The T-SQL shown doesn't and processes extremely fast.
March 21, 2006 at 7:51 pm
Right approach.
You only need to change the order of statements:
UPDATE first, INSERT second
Otherwise you update just inserted rows.
_____________
Code for TallyGenerator
March 23, 2006 at 7:17 am
thank you
May 9, 2007 at 11:28 am
This is specfic for 2005:
It is IMPORTANT to perform the update first:
-- UPDATE Destination FROM (Source INTERSECT Destination)
-- INSERT INTO Destination FROM (Source EXCEPT Destination)
-- INTERSECT (select the key value)
SELECT
Key(s) FROM ETL.dbo.MyCurr
INTERSECT
SELECT
Key(s) FROM ETL.dbo.MyPrevious
-- EXCEPT
SELECT
Key(s) FROM ETL.dbo.MyCurr
EXCEPT
SELECT
Key(s) FROM ETL.dbo.MyPrevious
HTH's
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply