August 4, 2008 at 1:10 am
I am working on an Inventory database and I am doing a mass insert from a stage table to my production table and I have several duplicate records in my stage table(identified by an inventory tag number but is not unique). I would only like to insert the 1st record I find (DISTINCT doesn't work for me, because the primary key field in my prod table is a combination of two fields from my stage table, which sometimes makes this record unique but it really isn't because it has the same tag number. I hope this make sense. Is it possible to just insert the 1st record I find? Any help is appreciated.
August 4, 2008 at 3:22 am
This is how I would do it:
1. Insert rows without duplicate id. You can find them using something like:
INSERT INTO destinationTable
SELECT *
FROM StageTable
WHERE id IN (
SELECT id, COUNT(*)
FROM StageTable
GROUP BY id
HAVING COUNT(*) = 1
)
2. Work around duplicate ids in a temp table:
SELECT *,
INTO #tempTable
FROM StageTable
WHERE id IN (
SELECT id, COUNT(*)
FROM StageTable
GROUP BY id
HAVING COUNT(*) > 1
)
INSERT INTO destinationTable
SELECT *
FROM (
SELECT *, CNT = ROW_NUMBER() OVER (PARTITION BY id ORDER BY id)
FROM #TempTable
) AS Data
WHERE CNT = 1
DROP TABLE #TempTable
Hope this helps,
Gianluca
-- Gianluca Sartori
August 4, 2008 at 4:24 am
jeordonez (8/4/2008)
I am working on an Inventory database and I am doing a mass insert from a stage table to my production table and I have several duplicate records in my stage table(identified by an inventory tag number but is not unique). I would only like to insert the 1st record I find (DISTINCT doesn't work for me, because the primary key field in my prod table is a combination of two fields from my stage table, which sometimes makes this record unique but it really isn't because it has the same tag number. I hope this make sense. Is it possible to just insert the 1st record I find? Any help is appreciated.
What are the three column names which are significant here? (You mention "tag number" and "combination of two fields"). What are their data types? You can use the "two fields" to distinguish between rows in staging which have the same tag number. How do you decide which row should be chosen - "the 1st record I find" needs to be translated into an ORDER BY.
Cheers
ChrisM
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 4, 2008 at 4:54 am
If the the requirement is that you want top 1 record against each Duplicate ID in your stage table, than the solution above should work. Otherwise, can you please post some data and explain with the help of that data...?
Atif Sheikh
August 4, 2008 at 10:19 am
Gianluca, I tried your approach, and it is still giving me the dupes, so here are my tables and sample data.
CREATE TABLE [dbo].[InventoryStage](
[SerialNumber] [varchar](20) NULL,
[TagNumber] [varchar](8) NULL,
[UnitCode] [varchar](7) NULL,
[ItemNumber] [varchar](6) NULL,
[CommodityCode] [char](10) NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[Inventory] (
[InventoryID] [varchar] (15) NOT NULL ,
[SerialNumber] [varchar] (20) NULL ,
[TagNumber] [varchar] (8) NULL ,
[UnitCode] [varchar] (7) NULL ,
[ItemNumber] [varchar] (6) NULL ,
[CommodityCode] [char] (10) NULL ,
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Inventory] ADD
CONSTRAINT [PK_Inventorya] PRIMARY KEY CLUSTERED
(
[InventoryID]
) WITH FILLFACTOR = 90 ON [PRIMARY]
GO
Ok this is a subset of the fields from my tables, my InventoryStage table is loaded from a text file and doesn't contain a primary key(truncated nightly). Here is the query I am currently using to load Inventory(has the dupes in it)
INSERT INTO Inventory
(
InventoryID,
SerialNumber,
TagNumber,
UnitCode,
ItemNumber,
CommodityCode
)
SELECT TagNumber + ItemNumber,
SerialNumber,
TagNumber,
UnitCode,
ItemNumber,
CommodityCode
FROM InventoryStage
WHERE (
--ItemNumber = '00001' --Originally used but wouldnt give me all records
CommodityCode <> '307' --Exclude software
AND UnitCode <> '7017000'--Exclude surplus items
)
Ok the problem I am encountering is with the ItemNumber Field. Most of my records have an ItemNumber value of '000001' and this is the record I would like to insert. However some records even have ItemNumbers of '000002' and '000003', etc. and these records have the same TagNumber as ItemNumber '000001', hence these are my dupes. I had originally used ItemNumber = '00001' in my WHERE clause as shown above and this does work because there is only one ItemNumber of 00001 per record, however not all the items I need have an ItemNumber of '00001'. Hopefully this makes things clearer. Thanks to those who have responded already.
August 4, 2008 at 10:35 am
Hi Jeordonez
A picture tells a thousand words...
CREATE TABLE #InventoryStage(
[SerialNumber] [varchar](20) NULL,
[TagNumber] [varchar](8) NULL,
[UnitCode] [varchar](7) NULL,
[ItemNumber] [varchar](6) NULL,
[CommodityCode] [char](10) NULL
)
INSERT INTO #InventoryStage (SerialNumber, TagNumber, UnitCode, ItemNumber, CommodityCode)
SELECT SerialNumber, TagNumber, '701700X', '000001', NULL UNION ALL -- include
SELECT SerialNumber, TagNumber, '701700X', '000001', NULL UNION ALL -- include
SELECT SerialNumber, TagNumber, '701700X', '000001', NULL UNION ALL -- include
SELECT SerialNumber, TagNumber, '701700X', '000001', NULL UNION ALL -- include
SELECT SerialNumber, TagNumber, '701700X', '000002', NULL UNION ALL -- remove
SELECT SerialNumber, TagNumber, '701700X', '000001', NULL UNION ALL -- include
SELECT SerialNumber, TagNumber, '701700X', '000001', NULL UNION ALL -- include
SELECT SerialNumber, TagNumber, '701700X', '000001', NULL UNION ALL -- include
SELECT SerialNumber, TagNumber, '701700X', '000001', NULL UNION ALL -- include
SELECT SerialNumber, TagNumber, '701700X', '000001', NULL UNION ALL -- include
SELECT SerialNumber, TagNumber, '701700X', '000002', NULL -- remove
Filling in the blanks in the above code would give people a much better idea of what you're attempting to achievem, and a sample of test data to work with.
Cheers
ChrisM
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 4, 2008 at 10:41 am
Is this the code you tried?
--Not duplicated rows
INSERT INTO Inventory (
InventoryID,
SerialNumber,
TagNumber,
UnitCode,
ItemNumber,
CommodityCode
)
SELECT TagNumber + ItemNumber,
SerialNumber,
TagNumber,
UnitCode,
ItemNumber,
CommodityCode
FROM InventoryStage
WHERE TagNumber NOT IN (
SELECT TagNumber, COUNT(*)
FROM InventoryStage
GROUP BY TagNumber
HAVING COUNT(*) > 1
)
AND CommodityCode <> '307'
AND UnitCode <> '7017000'
--Duplicated rows
SELECT *,
INTO #tempTable
FROM InventoryStage
WHERE TagNumber IN (
SELECT TagNumber, COUNT(*)
FROM InventoryStage
GROUP BY TagNumber
HAVING COUNT(*) > 1
)
AND CommodityCode <> '307'
AND UnitCode <> '7017000'
INSERT INTO Inventory (
InventoryID,
SerialNumber,
TagNumber,
UnitCode,
ItemNumber,
CommodityCode
)
SELECT TagNumber + ItemNumber,
SerialNumber,
TagNumber,
UnitCode,
ItemNumber,
CommodityCode
FROM (
SELECT *, CNT = ROW_NUMBER() OVER (PARTITION BY TagNumber ORDER BY TagNumber)
FROM #TempTable
) AS Data
WHERE CNT = 1
DROP TABLE #TempTable
-- Gianluca Sartori
August 4, 2008 at 10:55 am
Ok, I think I got it:
CREATE TABLE #InventoryStage(
[SerialNumber] [varchar](20) NULL,
[TagNumber] [varchar](8) NULL,
[UnitCode] [varchar](7) NULL,
[ItemNumber] [varchar](6) NULL,
[CommodityCode] [char](10) NULL
)
INSERT INTO #InventoryStage (SerialNumber, TagNumber, ItemNumber)
SELECT '701700X', '000001', '000001' UNION ALL -- include
SELECT '701700X', '000001', '000002' UNION ALL -- remove
SELECT '701700X', '000001', '000003' UNION ALL -- remove
SELECT '701700X', '000002', '000004' UNION ALL -- include
SELECT '701700X', '000003', '000001' UNION ALL -- include
SELECT '701700X', '000003', '000002' -- remove
CREATE TABLE #Inventory (
[InventoryID] [varchar] (15) NOT NULL PRIMARY KEY,
[SerialNumber] [varchar] (20) NULL ,
[TagNumber] [varchar] (8) NULL ,
[UnitCode] [varchar] (7) NULL ,
[ItemNumber] [varchar] (6) NULL ,
[CommodityCode] [char] (10) NULL ,
)
--Not duplicated rows
INSERT INTO #Inventory (
InventoryID,
SerialNumber,
TagNumber,
UnitCode,
ItemNumber,
CommodityCode
)
SELECT TagNumber + ItemNumber,
SerialNumber,
TagNumber,
UnitCode,
ItemNumber,
CommodityCode
FROM #InventoryStage
WHERE TagNumber NOT IN (
SELECT TagNumber
FROM #InventoryStage
GROUP BY TagNumber
HAVING COUNT(*) > 1
)
--Duplicated rows
SELECT *
INTO #tempTable
FROM #InventoryStage
WHERE TagNumber IN (
SELECT TagNumber
FROM #InventoryStage
GROUP BY TagNumber
HAVING COUNT(*) > 1
)
INSERT INTO #Inventory (
InventoryID,
SerialNumber,
TagNumber,
UnitCode,
ItemNumber,
CommodityCode
)
SELECT TagNumber + ItemNumber,
SerialNumber,
TagNumber,
UnitCode,
ItemNumber,
CommodityCode
FROM (
SELECT *, CNT = ROW_NUMBER() OVER (PARTITION BY TagNumber ORDER BY TagNumber)
FROM #TempTable
) AS Data
WHERE CNT = 1
SELECT * FROM #Inventory
-- Gianluca Sartori
August 4, 2008 at 11:28 am
Maybe you could be interested in this article:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
This is what Chris meant with his post...
-- Gianluca Sartori
August 4, 2008 at 11:57 am
Gianluca, yes and I get this error on the first insert..
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.
**EDIT** I posted this before i saw your rework Gianluca, I am testing that out now, looks pretty good so far...:)
Chris, I apologize for that, I did mean to provide sample data on my last post, slipped my mind..
here you go..
CREATE TABLE #InventoryStage(
[SerialNumber] [varchar](20) NULL,
[TagNumber] [varchar](8) NULL,
[UnitCode] [varchar](7) NULL,
[ItemNumber] [varchar](6) NULL,
[CommodityCode] [char](10) NULL
)
INSERT INTO #InventoryStage (SerialNumber, TagNumber, UnitCode, ItemNumber, CommodityCode)
SELECT 'USGW064840', '100003', '3310000', '000001', '208' UNION ALL -- include
SELECT '9143FO130501300238M', '100062', '3475000', '000001', '284' UNION ALL -- include
SELECT '0800690EB40R', '100127', '3710000', '000001', '284' UNION ALL -- include
SELECT '0800690EB40R', '100127', '3710000', '000002', '284' UNION ALL -- dupe,remove
SELECT '0800690EB40R', '100127', '3710000', '000003', '284' UNION ALL -- dupe,remove
SELECT '080069EC322', '100128', '3710000', '000002', '284' UNION ALL -- include
SELECT 'US67195498', '109500', '3037000', '000001', '284' UNION ALL -- include
SELECT 'US67195498', '109500', '3037000', '000004', '284' UNION ALL -- dupe,remove
SELECT '2808792', '109506', '3037000', '000002', '284' UNION ALL -- include
SELECT 'HP7DZ71', '126263', '3037000', '000001', '284' UNION ALL -- include
SELECT '146758390', '126298', '3410000', '000001', '284' UNION ALL -- include
SELECT '146758390', '126298', '3410000', '000002', '284' UNION ALL -- dupe,remove
SELECT '146758390', '126298', '3410000', '000003', '284' -- dupe, remove
August 4, 2008 at 12:10 pm
Gianluca,
Almost there, unfortunately we are not on SQL 2005 yet so ROW_NUMBER() doesn't work.. :(. is there an equivalent to this for 2000???
August 4, 2008 at 12:55 pm
jeordonez (8/4/2008)
Gianluca,Almost there, unfortunately we are not on SQL 2005 yet so ROW_NUMBER() doesn't work.. :(. is there an equivalent to this for 2000???
Yep:
SELECT i.*
FROM #InventoryStage i
INNER JOIN (SELECT SerialNumber, TagNumber, UnitCode, MIN(ItemNumber) AS MinItemNumber
FROM #InventoryStage
GROUP BY SerialNumber, TagNumber, UnitCode
) d ON d.SerialNumber = i.SerialNumber
AND d.TagNumber = i.TagNumber
AND d.UnitCode = i.UnitCode
AND d.MinItemNumber = i.ItemNumber
Results:
SerialNumber TagNumber UnitCode ItemNumber CommodityCode
-------------------- --------- -------- ---------- -------------
0800690EB40R 100127 3710000 000001 284
080069EC322 100128 3710000 000002 284
146758390 126298 3410000 000001 284
2808792 109506 3037000 000002 284
9143FO130501300238M 100062 3475000 000001 284
HP7DZ71 126263 3037000 000001 284
US67195498 109500 3037000 000001 284
USGW064840 100003 3310000 000001 208
(8 row(s) affected)
Thanks for the data jeordonez
Cheers
ChrisM
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 4, 2008 at 6:13 pm
Many thanks Gianluca and Chris, it is working just like expected. 😀
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply