Duplicate data

  • Hi Experts .

    I have table Temp_decimal_mismatch which has duplicate data . The duplicate data is not start forward to remove .

    Untitled

    Here the row highlighted in yellow is a duplicate . If you see the same data is present in row 1 but in reverse order. I have to get rid rid of the duplicate row.

    Sample query is attached

    Drop table [Temp_decimal_mismatch]

    CREATE TABLE dbo.[Temp_decimal_mismatch](
    [TABLE_SCHEMA] [nvarchar](128) NULL,
    [TABLE_NAME] [sysname] NOT NULL,
    [COLUMN_NAME] [sysname] NULL,
    [datatype] [nvarchar](151) NULL,
    [TABLE_SCHEMA_2] [nvarchar](128) NULL,
    [TABLE_NAME_2] [sysname] NOT NULL,
    [COLUMN_NAME_2] [sysname] NULL,
    [datatype_2] [nvarchar](151) NULL
    ) ON [PRIMARY]

    GO
    INSERT dbo.[Temp_decimal_mismatch] ([TABLE_SCHEMA], [TABLE_NAME], [COLUMN_NAME], [datatype], [TABLE_SCHEMA_2], [TABLE_NAME_2], [COLUMN_NAME_2], [datatype_2]) VALUES (N'ressystem', N'TableA', N'alloc_pct', N'DECIMAL(7,6)', N'dbo', N'TableB', N'alloc_pct', N'DECIMAL(10,6)')
    INSERT dbo.[Temp_decimal_mismatch] ([TABLE_SCHEMA], [TABLE_NAME], [COLUMN_NAME], [datatype], [TABLE_SCHEMA_2], [TABLE_NAME_2], [COLUMN_NAME_2], [datatype_2]) VALUES (N'dbo', N'TableB', N'alloc_pct', N'DECIMAL(10,6)', N'ressystem', N'TableA', N'alloc_pct', N'DECIMAL(7,6)')
    INSERT dbo.[Temp_decimal_mismatch] ([TABLE_SCHEMA], [TABLE_NAME], [COLUMN_NAME], [datatype], [TABLE_SCHEMA_2], [TABLE_NAME_2], [COLUMN_NAME_2], [datatype_2]) VALUES (N'ressystem', N'TableA', N'base_amt', N'DECIMAL(18,6)', N'dbo', N'TableB', N'base_amt', N'DECIMAL(14,2)')
    INSERT dbo.[Temp_decimal_mismatch] ([TABLE_SCHEMA], [TABLE_NAME], [COLUMN_NAME], [datatype], [TABLE_SCHEMA_2], [TABLE_NAME_2], [COLUMN_NAME_2], [datatype_2]) VALUES (N'dbo', N'TableB', N'base_amt', N'DECIMAL(14,2)', N'ressystem', N'TableA', N'base_amt', N'DECIMAL(18,6)')

    Select * from dbo.[Temp_decimal_mismatch]?

     

     

  • What is the PK of this table?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • There is [TABLE_SCHEMA], [TABLE_NAME], [COLUMN_NAME], [datatype] is the pk

  • arvind.patil 98284 wrote:

    There is [TABLE_SCHEMA], [TABLE_NAME], [COLUMN_NAME], [datatype] is the pk

    You mean this column?

    [datatype] [nvarchar](151) NULL,

    Seems an odd choice. Also, this column is nullable and therefore cannot be a PK.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • As i cannot post the actual data and table structure. .I have created that table , so that i can post it in sql center portal .

  • arvind.patil 98284 wrote:

    As i cannot post the actual data and table structure. .I have created that table , so that i can post it in sql center portal .

    That's fine, but at least make sure your fake table and data make sense. Can you repost your fake table with a PK defined? The PK is needed for the duplicate deletion.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Hi Experts .

    I have table Temp_decimal_mismatch which has duplicate data . The duplicate data is not straightforward to remove .

    Untitled

    Here the row highlighted in yellow is a duplicate . If you see the same data is present in row 1 but in reverse order. I have to get rid rid of the duplicate row.

    Sample query is attached

    Drop table [Temp_decimal_mismatch]

    CREATE TABLE dbo.[Temp_decimal_mismatch](

    [TABLE_SCHEMA] [nvarchar](128) not NULL,

    [TABLE_NAME] [sysname] not NULL,

    [COLUMN_NAME] [sysname] not NULL,

    [datatype] [nvarchar](151) not NULL,

    [TABLE_SCHEMA_2] [nvarchar](128) NULL,

    [TABLE_NAME_2] [sysname] NOT NULL,

    [COLUMN_NAME_2] [sysname] NULL,

    [datatype_2] [nvarchar](151) NULL

    ,constraint pk_zxc primary key ([TABLE_SCHEMA] ,[TABLE_NAME],[COLUMN_NAME],[datatype])

    ) ON [PRIMARY]

    GO

    INSERT dbo.[Temp_decimal_mismatch] ([TABLE_SCHEMA], [TABLE_NAME], [COLUMN_NAME], [datatype], [TABLE_SCHEMA_2], [TABLE_NAME_2], [COLUMN_NAME_2], [datatype_2]) VALUES (N'ressystem', N'TableA', N'alloc_pct', N'DECIMAL(7,6)', N'dbo', N'TableB', N'alloc_pct', N'DECIMAL(10,6)')

    INSERT dbo.[Temp_decimal_mismatch] ([TABLE_SCHEMA], [TABLE_NAME], [COLUMN_NAME], [datatype], [TABLE_SCHEMA_2], [TABLE_NAME_2], [COLUMN_NAME_2], [datatype_2]) VALUES (N'dbo', N'TableB', N'alloc_pct', N'DECIMAL(10,6)', N'ressystem', N'TableA', N'alloc_pct', N'DECIMAL(7,6)')

    INSERT dbo.[Temp_decimal_mismatch] ([TABLE_SCHEMA], [TABLE_NAME], [COLUMN_NAME], [datatype], [TABLE_SCHEMA_2], [TABLE_NAME_2], [COLUMN_NAME_2], [datatype_2]) VALUES (N'ressystem', N'TableA', N'base_amt', N'DECIMAL(18,6)', N'dbo', N'TableB', N'base_amt', N'DECIMAL(14,2)')

    INSERT dbo.[Temp_decimal_mismatch] ([TABLE_SCHEMA], [TABLE_NAME], [COLUMN_NAME], [datatype], [TABLE_SCHEMA_2], [TABLE_NAME_2], [COLUMN_NAME_2], [datatype_2]) VALUES (N'dbo', N'TableB', N'base_amt', N'DECIMAL(14,2)', N'ressystem', N'TableA', N'base_amt', N'DECIMAL(18,6)')

    Select * from dbo.[Temp_decimal_mismatch]

     

  • This works, but it would be much simpler with a PK field.  (I used all of the fields in place of a PK.)

    WITH standardized_mismatches AS
    (
    SELECT o.*
    , ROW_NUMBER() OVER(PARTITION BY st.TABLE_SCHEMA, st.TABLE_NAME, st.COLUMN_NAME, st.datatype, st.TABLE_SCHEMA_2, st.TABLE_NAME_2, st.COLUMN_NAME_2, st.datatype_2 ORDER BY o.TABLE_SCHEMA, o.TABLE_NAME, o.COLUMN_NAME, o.datatype, o.TABLE_SCHEMA_2, o.TABLE_NAME_2, o.COLUMN_NAME_2, o.datatype_2) st_rn
    , ROW_NUMBER() OVER(PARTITION BY o.TABLE_SCHEMA, o.TABLE_NAME, o.COLUMN_NAME, o.datatype, o.TABLE_SCHEMA_2, o.TABLE_NAME_2, o.COLUMN_NAME_2, o.datatype_2 ORDER BY st.TABLE_SCHEMA, st.TABLE_NAME, st.COLUMN_NAME, st.datatype, st.TABLE_SCHEMA_2, st.TABLE_NAME_2, st.COLUMN_NAME_2, st.datatype_2) o_rn
    FROM #Temp_decimal_mismatch AS o -- original
    CROSS APPLY
    (
    VALUES
    (TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, datatype, TABLE_SCHEMA_2, TABLE_NAME_2, COLUMN_NAME_2, datatype_2)
    ,(TABLE_SCHEMA_2, TABLE_NAME_2, COLUMN_NAME_2, datatype_2, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, datatype)
    ) st(TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, datatype, TABLE_SCHEMA_2, TABLE_NAME_2, COLUMN_NAME_2, datatype_2) -- standardized
    )
    DELETE m
    FROM standardized_mismatches m
    WHERE m.o_rn = 1
    AND m.st_rn = 2
    ;

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply