Remove duplicate values

  • Hi Experts,

    In the below table records are grouped by ID column. If the count for ID is greater than 1 then remove the records which are having 'Null" values in any of the name ,city ,place  columns and remaining columns need to be fetched . If the count for ID=1 even if  we have null values on any of the columns then include that records in the output results.


    CREATE TABLE details (
      id int NOT NULL,
      name varchar(100),
      city varchar(100),
      place varchar(100)
    );

    Sample Data:

    INSERT INTO details
      VALUES (1, 'sara', 'chennai', 't.nagar');
    INSERT INTO details
      VALUES (1, NULL, NULL, NULL);
    INSERT INTO details
      VALUES (1, 'sara', 'chennai', 't.nagar');
    INSERT INTO details
      VALUES (2, 'sara', 'chennai', 't.nagar');
    INSERT INTO details
      VALUES (2, NULL, NULL, NULL);
    INSERT INTO details
      VALUES (3, 'sara', 'chennai', 't.nagar');
    INSERT INTO details
      VALUES (4, NULL, NULL, NULL);
    INSERT INTO details
      VALUES (5, 'raj', 'vellore', 'kodambakkam');
    INSERT INTO details
      VALUES (5, 'raj', 'vellore', NULL);

    Input Data:

    idnamecityplace
    1sarachennait.nagar
    1(null)(null)(null)
    1sarachennait.nagar
    2sarachennait.nagar
    2(null)(null)(null)
    3sarachennait.nagar
    4(null)(null)(null)
    5rajvellorekodambakkam
    5rajvellore(null)


    Output Results:

    idnamecityplace
    1sarachennait.nagar
    2sarachennait.nagar
    3sarachennait.nagar
    4(null)(null)(null)

    5   raj     vellore    kodambakkam

     

    Regards,
    Saravanan

    Saravanan

  • What does your query look like so far? ie. what have you tried?

  • What you state your requirements are, and your expected output don't match. You state that only those with at least 1 value of NULL should be removed, however, in your expected output you only have one line for ID 1. There are 2 rows for ID 1 with no NULLs.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Hi Thorn,
    Its take ID has primary key  and I am only taking  ID count.
    ID=1 is available 3 times.so for ID=1(Multiple value available for this particular ID) if any of the column (excluding ID) is having null value then that column should be excluded.
    ID=2 and ID=5  is available 2 times.so for ID=2 or ID=5(Multiple value available for this particular ID) if any of the column (excluding ID) is having null value then that column should be excluded.
    ID=4 is appearing only once. So even if it is having null value this records should be included in the output results.

    Sample query to calculate to get  count of ID:

    with cte
    as (select id,
               name,
               city,
               place,
               dense_rank() over (order by id) as rnk
        from details
       ),
         cte1
    as (select rnk,
               count(id) as id
        from cte
        group by rnk
       )
    select *
    from cte1

    Regards,
    Saravanan

    Saravanan

  • What would you expect if you have the rows:
    INSERT INTO [details]
    VALUES (6, NULL, 'London','Kensington'),
           (6,'Steve', 'London',NULL);

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • That's good question I haven't got records like that . But I would like to exclude that records as well.

    Regards,
    Saravanan

    Saravanan

  • So you want to exclude both rows?  Please add Thom's rows and mine (below) to your sample data and show us what would be your expected results.

    INSERT INTO details
    VALUES (7, 'Steve', 'London','Kensington'),
       (7,'Steve', 'London','Barnet');

    John

  • saravanatn - Wednesday, December 6, 2017 3:21 AM

    That's good question I haven't got records like that . But I would like to exclude that records as well.

    Regards,
    Saravanan

    You'd want to exclude both of those, but you're happy for ID 4 (with all values of NULL) to be returned..? These statements seem to conflict.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • I am just giving(test data just) an example that (ID=4 )records should be returned even if all the non primary keys columns are null. Because count=1 for ID=4 ....Regards,
    Saravanan

    Saravanan

  • saravanatn - Wednesday, December 6, 2017 4:49 AM

    I am just giving(test data just) an example that (ID=4 )records should be returned even if all the non primary keys columns are null. Because count=1 for ID=4 ....Regards,
    Saravanan

    The reason for our questions, however, is that your test data doesn't provide enough scenarios. As a result, we need to ask for more, or provide test data for you to tell us what you expect.

    At a pure guess, maybe this is what you're after:

    USE Sandbox;
    GO
    CREATE TABLE details (
      id int NOT NULL,
      [name] varchar(100),
      city varchar(100),
      place varchar(100));
    --Sample Data: (comments start with a double hyphen)
    INSERT INTO details
    VALUES (1, 'sara', 'chennai', 't.nagar'),
       (1, NULL, NULL, NULL),
       (1, 'sara', 'chennai', 't.nagar'),
       (2, 'sara', 'chennai', 't.nagar'),
       (2, NULL, NULL, NULL),
       (3, 'sara', 'chennai', 't.nagar'),
       (4, NULL, NULL, NULL),
       (5, 'raj', 'vellore', 'kodambakkam'),
       (5, 'raj', 'vellore', NULL),
       (6, NULL, 'London','Kensington'),
       (6,'Steve', 'London',NULL),
       (7,'Steve', 'London','Kensington'),
       (7,'Steve', 'London','Barnet');
    GO
    SELECT d.id,
       MAX([name]) AS [name],
       MAX(city) AS city,
       MAX(place) AS place
    FROM details d
    GROUP BY d.id;
    GO
    DROP TABLE details;
    GO

    But, like I said, that's a guess. It doesn't do a logic your specified for ID 6 or 7 (which John and I have asked you about), so if this isn't correct you'll need to let us know what you expect the output to be.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thanks Thorn and John . Will test with more test data and get back to you guys.

    Regards,
    Saravanan

    Saravanan

  • Thom A - Wednesday, December 6, 2017 5:23 AM

    saravanatn - Wednesday, December 6, 2017 4:49 AM

    I am just giving(test data just) an example that (ID=4 )records should be returned even if all the non primary keys columns are null. Because count=1 for ID=4 ....Regards,
    Saravanan

    The reason for our questions, however, is that your test data doesn't provide enough scenarios. As a result, we need to ask for more, or provide test data for you to tell us what you expect.

    At a pure guess, maybe this is what you're after:

    USE Sandbox;
    GO
    CREATE TABLE details (
      id int NOT NULL,
      [name] varchar(100),
      city varchar(100),
      place varchar(100));
    --Sample Data: (comments start with a double hyphen)
    INSERT INTO details
    VALUES (1, 'sara', 'chennai', 't.nagar'),
       (1, NULL, NULL, NULL),
       (1, 'sara', 'chennai', 't.nagar'),
       (2, 'sara', 'chennai', 't.nagar'),
       (2, NULL, NULL, NULL),
       (3, 'sara', 'chennai', 't.nagar'),
       (4, NULL, NULL, NULL),
       (5, 'raj', 'vellore', 'kodambakkam'),
       (5, 'raj', 'vellore', NULL),
       (6, NULL, 'London','Kensington'),
       (6,'Steve', 'London',NULL),
       (7,'Steve', 'London','Kensington'),
       (7,'Steve', 'London','Barnet');
    GO
    SELECT d.id,
       MAX([name]) AS [name],
       MAX(city) AS city,
       MAX(place) AS place
    FROM details d
    GROUP BY d.id;
    GO
    DROP TABLE details;
    GO

    But, like I said, that's a guess. It doesn't do a logic your specified for ID 6 or 7 (which John and I have asked you about), so if this isn't correct you'll need to let us know what you expect the output to be.

    For the test data created by you I want the following output updated in inline commands.

    INSERT INTO details 
    VALUES (1, 'sara', 'chennai', 't.nagar'), 
           (1, NULL, NULL, NULL), 
           (1, 'sara', 'chennai', 't.nagar'), 
           (2, 'sara', 'chennai', 't.nagar'), 
           (2, NULL, NULL, NULL), 
           (3, 'sara', 'chennai', 't.nagar'), 
           (4, NULL, NULL, NULL), 
           (5, 'raj', 'vellore', 'kodambakkam'), 
           (5, 'raj', 'vellore', NULL), 
           (6, NULL, 'London','Kensington'), /* not required as it is having multiple values for ID(Count=2) and it is having Null value/*
           (6,'Steve', 'London',NULL), /* not required as it is having multiple values for ID(Count=2) and it is having Null value/*
           (7,'Steve', 'London','Kensington'), /* Required as it is not having null values/*
           (7,'Steve', 'London','Barnet') /* Required as it is not having null values/*;

    Output Results:

    idnamecityplace
    1sarachennait.nagar
    1sarachennait.nagar
    2sarachennait.nagar
    3sarachennait.nagar
    4NULLNULLNULL
    5rajvellorekodambakkam
    7SteveLondonKensington
    7SteveLondonBarnet

    Kindly let me know if you requires any additional details.

    Regards,
    Saravanan

    Saravanan

  • saravanatn - Wednesday, December 6, 2017 1:00 AM

    >>In the below table records [sic] are grouped by ID column. If the count for ID is greater than 1 then remove the records [sic] which are having 'NULL" values in any of the name ,city, place columns and remaining columns need to be fetched. If the count for ID=1 even if we have NULL values on any of the columns then include that records [sic] in the output results. <<

    Can you please buy read and study a book on RDBMS and SQL? There is no such thing as a generic "id" in RDBMS. It has to be the identifier of something in particular to be valid. Then on top of that, and identifier can never be a numeric because you don't do math on it.

    You might want to read a book on basic data modeling. Things like "city" are too vague generic to be valid column names. They need to have what the ISO 11179 people call "attribute property" to be valid. And you're "name" is one of those attribute properties; it has to be the name of something.

    A table by definition, must have a primary key. But this deck of punch cards doesn't have a key and can never have any keys! Let's at least get the column names and how many datatypes correct. I picked a length of 35 for the names based on the International Postal Union requirements for mailing labels. I'm willing to bet that you made up the length of 100, with no research whatsoever. This is not how to be a database professional

    CREATE TABLE Details
    (detail_id CHAR(3)NOT NULL PRIMARY KEY,
    something_name VARCHAR(35),
    city_name VARCHAR(35),
    place_name VARCHAR(35)
    );

    If you have valid data, then you should be able to use the current table constructor insertion syntax. That would look like this. People who think in sets instead of in punchcards really like it. It lets the optimizer do some things, instead of forcing a sequential insertion. But you're trying put garbage in a table. To do this, you of course have to screwup the DDL for that table you have valid data, then you should be able to use the current table constructor insertion syntax. That would look like this. People who think in sets instead of in punchcards really like it. It lets the optimizer do some things, instead of forcing a sequential insertion. But you're trying put garbage in a table. To do this, you of course have to screwup the DDL for that table

    INSERT INTO Details
    VALUES
    (1, 'sara', 'chennai', 't.nagar'),
    (1, NULL, NULL, NULL), -- dup key
    (1, 'sara', 'chennai', 't.nagar'), --- dup row!

    (2, 'sara', 'chennai', 't.nagar'),
    (2, NULL, NULL, NULL), ---dup key

    (3, 'sara', 'chennai', 't.nagar'),

    (4, NULL, NULL, NULL),

    (5, 'raj', 'vellore', 'kodambakkam'),
    (5, 'raj', 'vellore', NULL); ---dup key

    What are you trying to actually do? My first impulse is that you need to have stored procedure that does the row update for Details.

    CREATE PROCEDURE Fill_Details
    (@in_detail_id CHAR(3), @in_something_name VARCHAR(35),
    @in_city_name VARCHAR(35), @in_place_name )
    BEGIN
    UPDATE Details
    SET something_name = COALESCE(something_name, @in_something_name),
      city_name = COALESCE (city_name, @in_city_name),
      place_name = COALESCE (place_name, @in_place_name)
    WHERE detail_id = @in_detail_id;
    -- insert new row, if it does not exist
    END;

    ]What are you trying to actually do? My first impulse is that you need to have stored procedure that does the row update for Details.CREATE CREATE PROCEDURE Fill_Details(@in_detail_id CHAR(3), @in_something_name VARCHAR(35), @in_city_name VARCHAR(35), @in_place_name )
    BEGIN
    UPDATE Details
    SET something_name = COALESCE(something_name, @in_something_name),  
     city_name = COALESCE (city_name, @in_city_name),
     place_name = COALESCE (place_name, @in_place_name
    )WHERE detail_id = @in_detail_id;-- insert new row, if it does not exist
    END;

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • jcelko212 32090 - Thursday, December 7, 2017 11:51 AM

    saravanatn - Wednesday, December 6, 2017 1:00 AM

    >>In the below table records [sic] are grouped by ID column. If the count for ID is greater than 1 then remove the records [sic] which are having 'NULL" values in any of the name ,city, place columns and remaining columns need to be fetched. If the count for ID=1 even if we have NULL values on any of the columns then include that records [sic] in the output results. <<

    Can you please buy read and study a book on RDBMS and SQL? There is no such thing as a generic "id" in RDBMS. It has to be the identifier of something in particular to be valid. Then on top of that, and identifier can never be a numeric because you don't do math on it.

    You might want to read a book on basic data modeling. Things like "city" are too vague generic to be valid column names. They need to have what the ISO 11179 people call "attribute property" to be valid. And you're "name" is one of those attribute properties; it has to be the name of something.

    A table by definition, must have a primary key. But this deck of punch cards doesn't have a key and can never have any keys! Let's at least get the column names and how many datatypes correct. I picked a length of 35 for the names based on the International Postal Union requirements for mailing labels. I'm willing to bet that you made up the length of 100, with no research whatsoever. This is not how to be a database professional

    CREATE TABLE Details
    (detail_id CHAR(3)NOT NULL PRIMARY KEY,
    something_name VARCHAR(35),
    city_name VARCHAR(35),
    place_name VARCHAR(35)
    );

    If you have valid data, then you should be able to use the current table constructor insertion syntax. That would look like this. People who think in sets instead of in punchcards really like it. It lets the optimizer do some things, instead of forcing a sequential insertion. But you're trying put garbage in a table. To do this, you of course have to screwup the DDL for that table you have valid data, then you should be able to use the current table constructor insertion syntax. That would look like this. People who think in sets instead of in punchcards really like it. It lets the optimizer do some things, instead of forcing a sequential insertion. But you're trying put garbage in a table. To do this, you of course have to screwup the DDL for that table

    INSERT INTO Details
    VALUES
    (1, 'sara', 'chennai', 't.nagar'),
    (1, NULL, NULL, NULL), -- dup key
    (1, 'sara', 'chennai', 't.nagar'), --- dup row!

    (2, 'sara', 'chennai', 't.nagar'),
    (2, NULL, NULL, NULL), ---dup key

    (3, 'sara', 'chennai', 't.nagar'),

    (4, NULL, NULL, NULL),

    (5, 'raj', 'vellore', 'kodambakkam'),
    (5, 'raj', 'vellore', NULL); ---dup key

    What are you trying to actually do? My first impulse is that you need to have stored procedure that does the row update for Details.

    CREATE PROCEDURE Fill_Details
    (@in_detail_id CHAR(3), @in_something_name VARCHAR(35),
    @in_city_name VARCHAR(35), @in_place_name )
    BEGIN
    UPDATE Details
    SET something_name = COALESCE(something_name, @in_something_name),
      city_name = COALESCE (city_name, @in_city_name),
      place_name = COALESCE (place_name, @in_place_name)
    WHERE detail_id = @in_detail_id;
    -- insert new row, if it does not exist
    END;

    ]What are you trying to actually do? My first impulse is that you need to have stored procedure that does the row update for Details.CREATE CREATE PROCEDURE Fill_Details(@in_detail_id CHAR(3), @in_something_name VARCHAR(35), @in_city_name VARCHAR(35), @in_place_name )
    BEGIN
    UPDATE Details
    SET something_name = COALESCE(something_name, @in_something_name),  
     city_name = COALESCE (city_name, @in_city_name),
     place_name = COALESCE (place_name, @in_place_name
    )WHERE detail_id = @in_detail_id;-- insert new row, if it does not exist
    END;

    Isn't it extraordinarily careless and risky to insert single-digit numeric values into a char(3) key?  Even though we will never do math on detail_id, it should still be numeric to provide automatic validity checking.  Otherwise:


    create table #test1 (
      test_id char(3) PRIMARY KEY
      )
    insert into #test1 values(1)
    insert into #test1 values('01')
    insert into #test1 values('001')
    select * from #test1


    Good luck figuring that mess out later!  Even having to code CHECK constraints for it is extra effort and somewhat error prone.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • jcelko212 32090 - Thursday, December 7, 2017 11:51 AM

    saravanatn - Wednesday, December 6, 2017 1:00 AM

    >>In the below table records [sic] are grouped by ID column. If the count for ID is greater than 1 then remove the records [sic] which are having 'NULL" values in any of the name ,city, place columns and remaining columns need to be fetched. If the count for ID=1 even if we have NULL values on any of the columns then include that records [sic] in the output results. <<

    Can you please buy read and study a book on RDBMS and SQL? There is no such thing as a generic "id" in RDBMS. It has to be the identifier of something in particular to be valid. Then on top of that, and identifier can never be a numeric because you don't do math on it.

    You might want to read a book on basic data modeling. Things like "city" are too vague generic to be valid column names. They need to have what the ISO 11179 people call "attribute property" to be valid. And you're "name" is one of those attribute properties; it has to be the name of something.

    A table by definition, must have a primary key. But this deck of punch cards doesn't have a key and can never have any keys! Let's at least get the column names and how many datatypes correct. I picked a length of 35 for the names based on the International Postal Union requirements for mailing labels. I'm willing to bet that you made up the length of 100, with no research whatsoever. This is not how to be a database professional

    CREATE TABLE Details
    (detail_id CHAR(3)NOT NULL PRIMARY KEY,
    something_name VARCHAR(35),
    city_name VARCHAR(35),
    place_name VARCHAR(35)
    );

    If you have valid data, then you should be able to use the current table constructor insertion syntax. That would look like this. People who think in sets instead of in punchcards really like it. It lets the optimizer do some things, instead of forcing a sequential insertion. But you're trying put garbage in a table. To do this, you of course have to screwup the DDL for that table you have valid data, then you should be able to use the current table constructor insertion syntax. That would look like this. People who think in sets instead of in punchcards really like it. It lets the optimizer do some things, instead of forcing a sequential insertion. But you're trying put garbage in a table. To do this, you of course have to screwup the DDL for that table

    INSERT INTO Details
    VALUES
    (1, 'sara', 'chennai', 't.nagar'),
    (1, NULL, NULL, NULL), -- dup key
    (1, 'sara', 'chennai', 't.nagar'), --- dup row!

    (2, 'sara', 'chennai', 't.nagar'),
    (2, NULL, NULL, NULL), ---dup key

    (3, 'sara', 'chennai', 't.nagar'),

    (4, NULL, NULL, NULL),

    (5, 'raj', 'vellore', 'kodambakkam'),
    (5, 'raj', 'vellore', NULL); ---dup key

    What are you trying to actually do? My first impulse is that you need to have stored procedure that does the row update for Details.

    CREATE PROCEDURE Fill_Details
    (@in_detail_id CHAR(3), @in_something_name VARCHAR(35),
    @in_city_name VARCHAR(35), @in_place_name )
    BEGIN
    UPDATE Details
    SET something_name = COALESCE(something_name, @in_something_name),
      city_name = COALESCE (city_name, @in_city_name),
      place_name = COALESCE (place_name, @in_place_name)
    WHERE detail_id = @in_detail_id;
    -- insert new row, if it does not exist
    END;

    ]What are you trying to actually do? My first impulse is that you need to have stored procedure that does the row update for Details.CREATE CREATE PROCEDURE Fill_Details(@in_detail_id CHAR(3), @in_something_name VARCHAR(35), @in_city_name VARCHAR(35), @in_place_name )
    BEGIN
    UPDATE Details
    SET something_name = COALESCE(something_name, @in_something_name),  
     city_name = COALESCE (city_name, @in_city_name),
     place_name = COALESCE (place_name, @in_place_name
    )WHERE detail_id = @in_detail_id;-- insert new row, if it does not exist
    END;

    Thanks Celko. I agree with you. We are moving the records from Oracle to Hive(Hadoop), which is similar to SQL. In hive there is no update concept. In hive(for old version) only insert is available .Every time a record came it is inserted (not updated) in hive and  I am fetching the records in hive  using ID(primary key) and Max(run_time) for that also  we are getting multiple records for the particular ID.we are some how getting null values as well  in Hive to eliminate the null record I posted the sample data.
    What Thorn suggested almost worked.

    Regards,
    Saravanan

    Saravanan

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply