Eliminating duplicates without distinct

  • I have an insert…select statement that is violating the constraints of an index. I believe the distinct statement cannot do its thing because the primary keys are less constraining than the index.

    Here's the simplified model:

    --------------------

    CREATE TABLE [Table1] (

    [STNO] [float] NOT NULL ,

    [SGNO] [float] NOT NULL ,

    [SGUB] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

    [ACNO] [float] NOT NULL);

    CREATE UNIQUE INDEX [Table1_2] ON [Table1]([STNO], [SGUB], [ACNO]) WITH FILLFACTOR = 90 ON [PRIMARY];

    --------------------

    Here's the procedure:

    --------------------

    declare @Numbers table

    (

    [Year]int,

    [Segment]float(8)

    );

    insert into @Numbers values (2006, 10);

    insert into @Numbers values (2005, 11);

    insert into @Numbers values (2004, 12);

    insert into @Numbers values (2003, 13);

    declare @stno float(8)

    declare @amt1 float;

    declare @amt2 float;

    set @stno = 936 --Task Selection #

    set @amt1 = 1;

    set @amt2 = 9999999900;

    insert into table1 ( stno, sgno, sgub, acno)

    select distinct @stno, n.[Segment], ' ', acno

    fromtable2 [t2] with (nolock)

    inner join

    table3 [t3] with (nolock)

    on t2.acno = t3.acno and t3.pldg = 'WPC' and t3.plst = 'A'

    inner join

    xDateDimension

    on t2.trdt = [Proprietary Date]

    inner join

    @Numbers n

    on [Calendar Year] = n.[Year]

    where t2.acno not in ( select t1.acno

    from table1 [t1] with (nolock)

    where t1.stno = @stno

    and

    t1.acno = t2.acno

    )

    group by n.[Year], t2.acno, n.[Segment]

    having sum(t2.tamt) between @amt1 and @amt2;

    --------------------

    Finally, here's a sample result set:

    stno,sgno,sgub,acno

    936,13, ,917

    936,13, ,924

    936,12, ,985

    936,13, ,985

    936,13, ,1073

    936,12, ,1114

    936,11, ,1169

    936,10, ,1169

    936,13, ,1223

    936,11, ,1395

    936,13, ,1398

    Note that, for the purposes of this query, stno will always be the same; sgno will be limited to 10,11,12,13.

    Basically, I'm looking for a way to eliminate the dupes without resorting to cursors or recursion.

    ..remy.

  • I don't get it.

    936,12, ,985

    936,13, ,985

    Are you saying that you're getting a result like this, but it's not what you want/need?

    If so, why is sgno included at all?

    Which of the two is the 'correct' one?

    If it's either, I guess you can use MIN(sgno) or MAX(sgno) in your query to get rid of the 'dupes'.

    /Kenneth

  • I also do not see any dupes in your above sample.  As Kenneth said:

    936,12, ,985

    936,13, ,985

    that isn't what you want to receive?  The DISTINCT clause based on your sample it is working as intended.  The above example is distinct.  If you are getting dupes that you have not shown us in your sample please give another example showing us what is being duplicated.

  • Why on earth would you index a float data type ?

    A float is an approximate number, indexing it makes no sense.

     

  • John,

    there are no duplicities in sample data, but rows of the sample violate UNIQUE INDEX, because that index does not include SGNO:

    CREATE UNIQUE INDEX [Table1_2] ON [Table1]([STNO], [SGUB], [ACNO]) WITH FILLFACTOR = 90 ON [PRIMARY];

    That's why Jeremy said that DISTINCT wouldn't work - and he is right.

    Float data type is really strange, it almost looks to me like table imported from Access... are you sure you need that? What about changing the data type?

    As to inserts, I suppose you have to cope with two problems:

    1) not insert a row if row with the same values in [STNO], [SGUB], [ACNO] is already present

    2) avoid inserting several rows with the same values in [STNO], [SGUB], [ACNO]

    To solve these problems, you need a precise definition first. What should happen when this occurs? Which of the rows is "correct"?

    In case of 1), you can use NOT EXISTS in WHERE clause - if you want to keep the previously inserted row and discard the new one.

    In case of 2), a GROUP BY on all columns except MAX(sgno) (or MIN(sgno)) could be used.

    However, all this seems strange to me. You create some rows, and then just abandon half of them because they would violate an index? There seems to be serious design flaw if this is happening.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply