Inserting Top 1 Record

  • I am working on an Inventory database and I am doing a mass insert from a stage table to my production table and I have several duplicate records in my stage table(identified by an inventory tag number but is not unique). I would only like to insert the 1st record I find (DISTINCT doesn't work for me, because the primary key field in my prod table is a combination of two fields from my stage table, which sometimes makes this record unique but it really isn't because it has the same tag number. I hope this make sense. Is it possible to just insert the 1st record I find? Any help is appreciated.

  • This is how I would do it:

    1. Insert rows without duplicate id. You can find them using something like:

    INSERT INTO destinationTable

    SELECT *

    FROM StageTable

    WHERE id IN (

    SELECT id, COUNT(*)

    FROM StageTable

    GROUP BY id

    HAVING COUNT(*) = 1

    )

    2. Work around duplicate ids in a temp table:

    SELECT *,

    INTO #tempTable

    FROM StageTable

    WHERE id IN (

    SELECT id, COUNT(*)

    FROM StageTable

    GROUP BY id

    HAVING COUNT(*) > 1

    )

    INSERT INTO destinationTable

    SELECT *

    FROM (

    SELECT *, CNT = ROW_NUMBER() OVER (PARTITION BY id ORDER BY id)

    FROM #TempTable

    ) AS Data

    WHERE CNT = 1

    DROP TABLE #TempTable

    Hope this helps,

    Gianluca

    -- Gianluca Sartori

  • jeordonez (8/4/2008)


    I am working on an Inventory database and I am doing a mass insert from a stage table to my production table and I have several duplicate records in my stage table(identified by an inventory tag number but is not unique). I would only like to insert the 1st record I find (DISTINCT doesn't work for me, because the primary key field in my prod table is a combination of two fields from my stage table, which sometimes makes this record unique but it really isn't because it has the same tag number. I hope this make sense. Is it possible to just insert the 1st record I find? Any help is appreciated.

    What are the three column names which are significant here? (You mention "tag number" and "combination of two fields"). What are their data types? You can use the "two fields" to distinguish between rows in staging which have the same tag number. How do you decide which row should be chosen - "the 1st record I find" needs to be translated into an ORDER BY.

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • If the the requirement is that you want top 1 record against each Duplicate ID in your stage table, than the solution above should work. Otherwise, can you please post some data and explain with the help of that data...?

    Atif Sheikh

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Sometimes, winning is not an issue but trying.
    You can check my BLOG
    [font="Arial Black"]here[/font][/url][/right]

  • Gianluca, I tried your approach, and it is still giving me the dupes, so here are my tables and sample data.

    CREATE TABLE [dbo].[InventoryStage](

    [SerialNumber] [varchar](20) NULL,

    [TagNumber] [varchar](8) NULL,

    [UnitCode] [varchar](7) NULL,

    [ItemNumber] [varchar](6) NULL,

    [CommodityCode] [char](10) NULL

    ) ON [PRIMARY]

    CREATE TABLE [dbo].[Inventory] (

    [InventoryID] [varchar] (15) NOT NULL ,

    [SerialNumber] [varchar] (20) NULL ,

    [TagNumber] [varchar] (8) NULL ,

    [UnitCode] [varchar] (7) NULL ,

    [ItemNumber] [varchar] (6) NULL ,

    [CommodityCode] [char] (10) NULL ,

    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[Inventory] ADD

    CONSTRAINT [PK_Inventorya] PRIMARY KEY CLUSTERED

    (

    [InventoryID]

    ) WITH FILLFACTOR = 90 ON [PRIMARY]

    GO

    Ok this is a subset of the fields from my tables, my InventoryStage table is loaded from a text file and doesn't contain a primary key(truncated nightly). Here is the query I am currently using to load Inventory(has the dupes in it)

    INSERT INTO Inventory

    (

    InventoryID,

    SerialNumber,

    TagNumber,

    UnitCode,

    ItemNumber,

    CommodityCode

    )

    SELECT TagNumber + ItemNumber,

    SerialNumber,

    TagNumber,

    UnitCode,

    ItemNumber,

    CommodityCode

    FROM InventoryStage

    WHERE (

    --ItemNumber = '00001' --Originally used but wouldnt give me all records

    CommodityCode <> '307' --Exclude software

    AND UnitCode <> '7017000'--Exclude surplus items

    )

    Ok the problem I am encountering is with the ItemNumber Field. Most of my records have an ItemNumber value of '000001' and this is the record I would like to insert. However some records even have ItemNumbers of '000002' and '000003', etc. and these records have the same TagNumber as ItemNumber '000001', hence these are my dupes. I had originally used ItemNumber = '00001' in my WHERE clause as shown above and this does work because there is only one ItemNumber of 00001 per record, however not all the items I need have an ItemNumber of '00001'. Hopefully this makes things clearer. Thanks to those who have responded already.

  • Hi Jeordonez

    A picture tells a thousand words...

    CREATE TABLE #InventoryStage(

    [SerialNumber] [varchar](20) NULL,

    [TagNumber] [varchar](8) NULL,

    [UnitCode] [varchar](7) NULL,

    [ItemNumber] [varchar](6) NULL,

    [CommodityCode] [char](10) NULL

    )

    INSERT INTO #InventoryStage (SerialNumber, TagNumber, UnitCode, ItemNumber, CommodityCode)

    SELECT SerialNumber, TagNumber, '701700X', '000001', NULL UNION ALL -- include

    SELECT SerialNumber, TagNumber, '701700X', '000001', NULL UNION ALL -- include

    SELECT SerialNumber, TagNumber, '701700X', '000001', NULL UNION ALL -- include

    SELECT SerialNumber, TagNumber, '701700X', '000001', NULL UNION ALL -- include

    SELECT SerialNumber, TagNumber, '701700X', '000002', NULL UNION ALL -- remove

    SELECT SerialNumber, TagNumber, '701700X', '000001', NULL UNION ALL -- include

    SELECT SerialNumber, TagNumber, '701700X', '000001', NULL UNION ALL -- include

    SELECT SerialNumber, TagNumber, '701700X', '000001', NULL UNION ALL -- include

    SELECT SerialNumber, TagNumber, '701700X', '000001', NULL UNION ALL -- include

    SELECT SerialNumber, TagNumber, '701700X', '000001', NULL UNION ALL -- include

    SELECT SerialNumber, TagNumber, '701700X', '000002', NULL -- remove

    Filling in the blanks in the above code would give people a much better idea of what you're attempting to achievem, and a sample of test data to work with.

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Is this the code you tried?

    --Not duplicated rows

    INSERT INTO Inventory (

    InventoryID,

    SerialNumber,

    TagNumber,

    UnitCode,

    ItemNumber,

    CommodityCode

    )

    SELECT TagNumber + ItemNumber,

    SerialNumber,

    TagNumber,

    UnitCode,

    ItemNumber,

    CommodityCode

    FROM InventoryStage

    WHERE TagNumber NOT IN (

    SELECT TagNumber, COUNT(*)

    FROM InventoryStage

    GROUP BY TagNumber

    HAVING COUNT(*) > 1

    )

    AND CommodityCode <> '307'

    AND UnitCode <> '7017000'

    --Duplicated rows

    SELECT *,

    INTO #tempTable

    FROM InventoryStage

    WHERE TagNumber IN (

    SELECT TagNumber, COUNT(*)

    FROM InventoryStage

    GROUP BY TagNumber

    HAVING COUNT(*) > 1

    )

    AND CommodityCode <> '307'

    AND UnitCode <> '7017000'

    INSERT INTO Inventory (

    InventoryID,

    SerialNumber,

    TagNumber,

    UnitCode,

    ItemNumber,

    CommodityCode

    )

    SELECT TagNumber + ItemNumber,

    SerialNumber,

    TagNumber,

    UnitCode,

    ItemNumber,

    CommodityCode

    FROM (

    SELECT *, CNT = ROW_NUMBER() OVER (PARTITION BY TagNumber ORDER BY TagNumber)

    FROM #TempTable

    ) AS Data

    WHERE CNT = 1

    DROP TABLE #TempTable

    -- Gianluca Sartori

  • Ok, I think I got it:

    CREATE TABLE #InventoryStage(

    [SerialNumber] [varchar](20) NULL,

    [TagNumber] [varchar](8) NULL,

    [UnitCode] [varchar](7) NULL,

    [ItemNumber] [varchar](6) NULL,

    [CommodityCode] [char](10) NULL

    )

    INSERT INTO #InventoryStage (SerialNumber, TagNumber, ItemNumber)

    SELECT '701700X', '000001', '000001' UNION ALL -- include

    SELECT '701700X', '000001', '000002' UNION ALL -- remove

    SELECT '701700X', '000001', '000003' UNION ALL -- remove

    SELECT '701700X', '000002', '000004' UNION ALL -- include

    SELECT '701700X', '000003', '000001' UNION ALL -- include

    SELECT '701700X', '000003', '000002' -- remove

    CREATE TABLE #Inventory (

    [InventoryID] [varchar] (15) NOT NULL PRIMARY KEY,

    [SerialNumber] [varchar] (20) NULL ,

    [TagNumber] [varchar] (8) NULL ,

    [UnitCode] [varchar] (7) NULL ,

    [ItemNumber] [varchar] (6) NULL ,

    [CommodityCode] [char] (10) NULL ,

    )

    --Not duplicated rows

    INSERT INTO #Inventory (

    InventoryID,

    SerialNumber,

    TagNumber,

    UnitCode,

    ItemNumber,

    CommodityCode

    )

    SELECT TagNumber + ItemNumber,

    SerialNumber,

    TagNumber,

    UnitCode,

    ItemNumber,

    CommodityCode

    FROM #InventoryStage

    WHERE TagNumber NOT IN (

    SELECT TagNumber

    FROM #InventoryStage

    GROUP BY TagNumber

    HAVING COUNT(*) > 1

    )

    --Duplicated rows

    SELECT *

    INTO #tempTable

    FROM #InventoryStage

    WHERE TagNumber IN (

    SELECT TagNumber

    FROM #InventoryStage

    GROUP BY TagNumber

    HAVING COUNT(*) > 1

    )

    INSERT INTO #Inventory (

    InventoryID,

    SerialNumber,

    TagNumber,

    UnitCode,

    ItemNumber,

    CommodityCode

    )

    SELECT TagNumber + ItemNumber,

    SerialNumber,

    TagNumber,

    UnitCode,

    ItemNumber,

    CommodityCode

    FROM (

    SELECT *, CNT = ROW_NUMBER() OVER (PARTITION BY TagNumber ORDER BY TagNumber)

    FROM #TempTable

    ) AS Data

    WHERE CNT = 1

    SELECT * FROM #Inventory

    -- Gianluca Sartori

  • Maybe you could be interested in this article:

    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    This is what Chris meant with his post...

    -- Gianluca Sartori

  • Gianluca, yes and I get this error on the first insert..

    Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.

    **EDIT** I posted this before i saw your rework Gianluca, I am testing that out now, looks pretty good so far...:)

    Chris, I apologize for that, I did mean to provide sample data on my last post, slipped my mind..

    here you go..

    CREATE TABLE #InventoryStage(

    [SerialNumber] [varchar](20) NULL,

    [TagNumber] [varchar](8) NULL,

    [UnitCode] [varchar](7) NULL,

    [ItemNumber] [varchar](6) NULL,

    [CommodityCode] [char](10) NULL

    )

    INSERT INTO #InventoryStage (SerialNumber, TagNumber, UnitCode, ItemNumber, CommodityCode)

    SELECT 'USGW064840', '100003', '3310000', '000001', '208' UNION ALL -- include

    SELECT '9143FO130501300238M', '100062', '3475000', '000001', '284' UNION ALL -- include

    SELECT '0800690EB40R', '100127', '3710000', '000001', '284' UNION ALL -- include

    SELECT '0800690EB40R', '100127', '3710000', '000002', '284' UNION ALL -- dupe,remove

    SELECT '0800690EB40R', '100127', '3710000', '000003', '284' UNION ALL -- dupe,remove

    SELECT '080069EC322', '100128', '3710000', '000002', '284' UNION ALL -- include

    SELECT 'US67195498', '109500', '3037000', '000001', '284' UNION ALL -- include

    SELECT 'US67195498', '109500', '3037000', '000004', '284' UNION ALL -- dupe,remove

    SELECT '2808792', '109506', '3037000', '000002', '284' UNION ALL -- include

    SELECT 'HP7DZ71', '126263', '3037000', '000001', '284' UNION ALL -- include

    SELECT '146758390', '126298', '3410000', '000001', '284' UNION ALL -- include

    SELECT '146758390', '126298', '3410000', '000002', '284' UNION ALL -- dupe,remove

    SELECT '146758390', '126298', '3410000', '000003', '284' -- dupe, remove

  • Gianluca,

    Almost there, unfortunately we are not on SQL 2005 yet so ROW_NUMBER() doesn't work.. :(. is there an equivalent to this for 2000???

  • jeordonez (8/4/2008)


    Gianluca,

    Almost there, unfortunately we are not on SQL 2005 yet so ROW_NUMBER() doesn't work.. :(. is there an equivalent to this for 2000???

    Yep:

    SELECT i.*

    FROM #InventoryStage i

    INNER JOIN (SELECT SerialNumber, TagNumber, UnitCode, MIN(ItemNumber) AS MinItemNumber

    FROM #InventoryStage

    GROUP BY SerialNumber, TagNumber, UnitCode

    ) d ON d.SerialNumber = i.SerialNumber

    AND d.TagNumber = i.TagNumber

    AND d.UnitCode = i.UnitCode

    AND d.MinItemNumber = i.ItemNumber

    Results:

    SerialNumber TagNumber UnitCode ItemNumber CommodityCode

    -------------------- --------- -------- ---------- -------------

    0800690EB40R 100127 3710000 000001 284

    080069EC322 100128 3710000 000002 284

    146758390 126298 3410000 000001 284

    2808792 109506 3037000 000002 284

    9143FO130501300238M 100062 3475000 000001 284

    HP7DZ71 126263 3037000 000001 284

    US67195498 109500 3037000 000001 284

    USGW064840 100003 3310000 000001 208

    (8 row(s) affected)

    Thanks for the data jeordonez

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Many thanks Gianluca and Chris, it is working just like expected. 😀

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply