removing duplicates

Question

Post reply

removing duplicates

dav06200ghi

Grasshopper

Points: 18
More actions
April 1, 2015 at 3:10 am

#100579

Hello,
I appeal for help on a sql method to eliminate values considered as "duplicate" on each input rows in a table.
In summary here is the schematics:
A table that collects data "Service" 40 orders (services) possible by invoice
N°Invoice - Service1 - Service2 - Service3 - Service4- Service5 - up to 40 .... Etc...
000001 Null VAA VAA SFG SFG
000002 YBB YBB SFU Null Null
000003 Null Null SAE SFU SFU
000004 VDP VDP VBB VBB VAA
etc ....
The goal is to get this following table B
N°invoice - Service1 - Service2 - Service3 - Service4 - Service5 ........ etc ..
000001 VAA Null Null Null SFG
000002 Null Null Null YBB SFU
000003 Null Null Null SAE SFU
000004 VDP VBB Null Null VAA
Invoices must be addressed once to the service because we just need one validation by services for all no matter if it is mentioned 2, 5 or 10 times on the invoice.
The brutal solution that I found : (ex: to test the update service3, we have to look in a temp table the 2 previous values):
with temp as (select N°facture,service1 as serv1, service2 as serv2) from table A
insert into table B (N°facture,Service3)
select a.N°Invoice, a.Service3 as serv3
from TableA a
Left Join temp
on temp.N°Invoice=a.N°facture
Where serv1<>a.service3 And serv2<>a.service3
The problem with this method is that you test the 40 services one by one .. that becomes undrinkable writing from the 6th or 7th already ...:blink:
I think there must be a faster method (rownum..)

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

pietlinden SSC Guru Points: 63503 More actions · Answer 1

In a word, Normalize your table. Then this is stupid easy. Repeating fields are going to make your life miserable.

Luis Cazares SSC Guru Points: 183706 More actions · Answer 2

As previously stated, you need to normalize your table. If you can't change the schema, you still need to code the normalization and denormalize again. Here's a sample on how to do it. It will be easier to complete it than your previous option.

CREATE TABLE #Sample(

Invoice char(6),

Service1 char(6), Service2 char(6), Service3 char(6), Service4 char(6), Service5 char(6))

INSERT INTO #Sample

VALUES

('000001', Null, 'VAA', 'VAA', 'SFG', 'SFG'),

('000002', 'YBB', 'YBB', 'SFU', Null, Null),

('000003', Null, Null, 'SAE', 'SFU', 'SFU'),

('000004', 'VDP', 'VDP', 'VBB', 'VBB', 'VAA');

WITH CTE AS(

SELECT Invoice,

MIN(ServiceID) ServiceID,

Value

FROM #Sample

CROSS APPLY (SELECT 1, Service1 UNION ALL

SELECT 2, Service2 UNION ALL

SELECT 3, Service3 UNION ALL

SELECT 4, Service4 UNION ALL

SELECT 5, Service5)x(ServiceID,Value)

GROUP BY Invoice, Value

)

SELECT Invoice,

MAX( CASE WHEN ServiceID = 1 THEN Value END) Service1,

MAX( CASE WHEN ServiceID = 2 THEN Value END) Service2,

MAX( CASE WHEN ServiceID = 3 THEN Value END) Service3,

MAX( CASE WHEN ServiceID = 4 THEN Value END) Service4,

MAX( CASE WHEN ServiceID = 5 THEN Value END) Service5

FROM CTE

GROUP BY Invoice;

GO

DROP TABLE #Sample

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Eric M Russell SSC Guru Points: 125561 More actions · Answer 3

You can prevent duplication of service codes by leveraging an INSTEAD OF INSERT, UPDATE trigger where each service column is conditionally populated depending on whether a preceeding service column has the same code. For example:

CREATE TABLE MyInvoice

(

Invoice char(6) not null primary key

, Service1 char(6) null

, Service2 char(6) null

, Service3 char(6) null

, Service4 char(6) null

, Service5 char(6) null

);

GO

CREATE TRIGGER MyInvoiceTrigger on MyInvoice

INSTEAD OF INSERT

AS

BEGIN

INSERT INTO MyInvoice SELECT

Invoice

, Service1

, case when Service2

not in (isnull(Service1,''))

then Service2 else null end

, case when Service3

not in (isnull(Service1,''),isnull(Service2,''))

then Service3 else null end

, case when Service4

not in (isnull(Service1,''),isnull(Service2,''),isnull(Service3,''))

then Service4 else null end

, case when Service5

not in (isnull(Service1,''),isnull(Service2,''),isnull(Service3,''),isnull(Service4,''))

then Service5 else null end

FROM inserted;

END;

GO

insert into MyInvoice values ('001','A','B',null,null,null);

insert into MyInvoice values ('002','A','B','A','C',null);

insert into MyInvoice values ('003','A','B',null,'C','A');

select * from MyInvoice;

Invoice Service1 Service2 Service3 Service4 Service5

001 A B NULL NULL NULL

002 A B NULL C NULL

003 A B NULL C NULL

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

dav06200ghi Grasshopper Points: 18 More actions · Answer 4

Many thanks for your reply.

And yes it s not possible to normalize as the data come from pdf captured areas from invoices.

Each invoices could have 40 services it s depend how many purchase order are booked on it.

And the ’null’ result correspond to a normal situation .

The service code show a discrepencie between the PO and the invoice.

So the manager need to received it and approved . So if he had 10 discrepencies he just need to received the document once.

That why i need to remouving duplicates by invoices...