Eliminating duplicates without distinct

Question

Post reply

Eliminating duplicates without distinct

JMarx

Default port

Points: 1435
More actions
August 16, 2006 at 9:42 am

#114606

I have an insert…select statement that is violating the constraints of an index. I believe the distinct statement cannot do its thing because the primary keys are less constraining than the index.
Here's the simplified model:
--------------------
CREATE TABLE [Table1] (
[STNO] [float] NOT NULL ,
[SGNO] [float] NOT NULL ,
[SGUB] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[ACNO] [float] NOT NULL);
CREATE UNIQUE INDEX [Table1_2] ON [Table1]([STNO], [SGUB], [ACNO]) WITH FILLFACTOR = 90 ON [PRIMARY];
--------------------
Here's the procedure:
--------------------
declare @Numbers table
(
[Year]int,
[Segment]float(8)
);
insert into @Numbers values (2006, 10);
insert into @Numbers values (2005, 11);
insert into @Numbers values (2004, 12);
insert into @Numbers values (2003, 13);
declare @stno float(8)
declare @amt1 float;
declare @amt2 float;
set @stno = 936 --Task Selection #
set @amt1 = 1;
set @amt2 = 9999999900;
insert into table1 ( stno, sgno, sgub, acno)
select distinct @stno, n.[Segment], ' ', acno
fromtable2 [t2] with (nolock)
inner join
table3 [t3] with (nolock)
on t2.acno = t3.acno and t3.pldg = 'WPC' and t3.plst = 'A'
inner join
xDateDimension
on t2.trdt = [Proprietary Date]
inner join
@Numbers n
on [Calendar Year] = n.[Year]
where t2.acno not in ( select t1.acno
from table1 [t1] with (nolock)
where t1.stno = @stno
and
t1.acno = t2.acno
)
group by n.[Year], t2.acno, n.[Segment]
having sum(t2.tamt) between @amt1 and @amt2;
--------------------
Finally, here's a sample result set:
stno,sgno,sgub,acno
936,13, ,917
936,13, ,924
936,12, ,985
936,13, ,985
936,13, ,1073
936,12, ,1114
936,11, ,1169
936,10, ,1169
936,13, ,1223
936,11, ,1395
936,13, ,1398
Note that, for the purposes of this query, stno will always be the same; sgno will be limited to 10,11,12,13.
Basically, I'm looking for a way to eliminate the dupes without resorting to cursors or recursion.
..remy.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

Kenneth Wilhelmsson SSC-Dedicated Points: 30043 More actions · Answer 1

I don't get it.

936,12, ,985

936,13, ,985

Are you saying that you're getting a result like this, but it's not what you want/need?

If so, why is sgno included at all?

Which of the two is the 'correct' one?

If it's either, I guess you can use MIN(sgno) or MAX(sgno) in your query to get rid of the 'dupes'.

/Kenneth

John Dempsey Hall of Fame Points: 3225 More actions · Answer 2

I also do not see any dupes in your above sample. As Kenneth said:

936,12, ,985

936,13, ,985

that isn't what you want to receive? The DISTINCT clause based on your sample it is working as intended. The above example is distinct. If you are getting dupes that you have not shown us in your sample please give another example showing us what is being duplicated.

PW-201837 SSC-Insane Points: 20805 More actions · Answer 3

Why on earth would you index a float data type ?

A float is an approximate number, indexing it makes no sense.

Vladan SSC-Insane Points: 22072 More actions · Answer 4

John,

there are no duplicities in sample data, but rows of the sample violate UNIQUE INDEX, because that index does not include SGNO:

CREATE UNIQUE INDEX [Table1_2] ON [Table1]([STNO], [SGUB], [ACNO]) WITH FILLFACTOR = 90 ON [PRIMARY];

That's why Jeremy said that DISTINCT wouldn't work - and he is right.

Float data type is really strange, it almost looks to me like table imported from Access... are you sure you need that? What about changing the data type?

As to inserts, I suppose you have to cope with two problems:

1) not insert a row if row with the same values in [STNO], [SGUB], [ACNO] is already present

2) avoid inserting several rows with the same values in [STNO], [SGUB], [ACNO]

To solve these problems, you need a precise definition first. What should happen when this occurs? Which of the rows is "correct"?

In case of 1), you can use NOT EXISTS in WHERE clause - if you want to keep the previously inserted row and discard the new one.

In case of 2), a GROUP BY on all columns except MAX(sgno) (or MIN(sgno)) could be used.

However, all this seems strange to me. You create some rows, and then just abandon half of them because they would violate an index? There seems to be serious design flaw if this is happening.