Copying and comparing data

Question

Copying and comparing data

doug 40899

SSCommitted

Points: 1656
More actions
November 5, 2010 at 10:38 am

#100745

I have a query that I run daily against a database that is dumped to me nightly. I want to be able to populate a new database called Newclients and avoid duplicates. This is the query that I'm using:
insert into msbtotal.dbo.newclients
SELECT tcms_members.dbo.memberdata2.* FROM tcms_members.dbo.memberdata2 left outer join
msbtotal.dbo.memberdata on tcms_members.dbo.memberdata2.id =
msbtotal.dbo.memberdata.id where msbtotal.dbo.memberdata.id is
null
GO
SET QUOTED_IDENTIFIER OFF
GO
SET ANSI_NULLS ON
GO
Here is some of my sample data:
6008272 Doctor A Doctor A, MD CHP UT SOUTHWESTERN PSYCHIATRY RES (512) 333-3333 (512) 333-3333 Sep 25 1977 12:00AM 1212 MockingBird Lane Austin TX 78737 2010-10-01 09:52:38.000 2005-06-06 00:00:00.000 NULL
6008272 Doctor A Doctor A, MD CHP UT SOUTHWESTERN PSYCHIATRY RES (512) 333-3333 (512) 333-3333 Sep 25 1977 12:00AM 1212 MockingBird Lane Austin TX 78737 2010-10-01 09:52:38.000 2005-06-06 00:00:00.000 NULL
6008272 Doctor A Doctor A, MD CHP UT SOUTHWESTERN PSYCHIATRY RES (512) 333-3333 (512) 333-3333 Sep 25 1977 12:00AM 1212 MockingBird Lane Austin TX 78737 2010-10-01 09:52:38.000 2005-06-06 00:00:00.000 NULL
6008272 Doctor A Doctor A, MD CHP UT SOUTHWESTERN PSYCHIATRY RES (512) 333-3333 (512) 333-3333 Sep 25 1977 12:00AM 1212 MockingBird Lane Austin TX 78737 2010-10-01 09:52:38.000 2005-06-06 00:00:00.000 NULL
6009837 Some Person Some Person, MD, MPH P UT SOUTHWESTERN PSYCHIATRY RES (512) 555-5555 (512) 555-5555 Jun 26 1975 12:00AM 3501 Residence Ave. Austin TX 78731 2010-09-29 11:53:30.000 2006-06-09 00:00:00.000 2007-11-28 00:00:00.000
6009837 Some Person Some Person, MD, MPH P UT SOUTHWESTERN PSYCHIATRY RES (512) 555-5555 (512) 555-5555 Jun 26 1975 12:00AM 3501 Residence Ave. Austin TX 78731 2010-09-29 11:53:30.000 2006-06-09 00:00:00.000 2007-11-28 00:00:00.000
As you can see, there is duplicate data, and I need only one copy of each "new" record in my database. Can someone please assist me on this.
Doug

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

LutzM SSC Guru Points: 107049 More actions · Answer 1

several options

1) change the data source to provide clean data to begin with (I know, most probably unlikely to happen, but that's the ideal scenario 😉 )

2) SELECT DISTINCT tcms_members.dbo.memberdata2.* instead of SELECT tcms_members.dbo.memberdata2.*

3) use GROUP BY your_list_of_columns

I'd go with option 2 ...

As a side note: instead of using INSERT INTO target SELECT * you should reference the target and source column names. This is a little more work at the beginning but it'll pay off later on (e.g. if either the source or the target will have additional columns or even a different order of columns)...

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

nigelrivett SSCertifiable Points: 5472 More actions · Answer 2

maybe

SELECT distinct tcms_members.dbo.memberdata2.* FROM tcms_members.dbo.memberdata2 left outer join

msbtotal.dbo.memberdata on tcms_members.dbo.memberdata2.id =

msbtotal.dbo.memberdata.id where msbtotal.dbo.memberdata.id is

null

or maybe if they aren't full duplicates something like

select ... from

(

SELECT tcms_members.dbo.memberdata2.*, seq = rownumber() over(partition by tcms_members.dbo.memberdata2.id order by tcms_members.dbo.memberdata2.id )

FROM tcms_members.dbo.memberdata2

left outer join msbtotal.dbo.memberdata

on tcms_members.dbo.memberdata2.id = msbtotal.dbo.memberdata.id

where msbtotal.dbo.memberdata.id is null

) where seq = 1

Cursors never.
DTS - only when needed and never to control.

doug 40899 SSCommitted Points: 1656 More actions · Answer 3

Thank you both for the suggestions. I am going to go with the SELECT DISTINCT option. Now the last question I have is how do I delete all the duplicates that I already have in my table?

LutzM SSC Guru Points: 107049 More actions · Answer 4

Do you have any column in your table to identify a single row? (e.g. an identity column?)

If so, you could use a subquery to get either the min or max id per group and delete all other rows.

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]