Alternative to Distinct

Question

Alternative to Distinct

Tony Trollip

SSC Veteran

Points: 228
More actions
May 24, 2006 at 1:38 am

#112416

Okay so we all know the select distinct {column name} from {Table name} query which returns all none duplicates for the column you are searching against, basic stuff, then what you want to do is add some filters, so you append a where {column name} like/= {value}.
Okay so what if you are returned rows where the first few words are duplicated like;
Tom Jones 1
Tom Jones 2
Tom Jones 3
Lucy Lane
Joe Soap
Mercy Moe 1
Mercy Moe 2
By looking at the results, you can see that there are duplicates, how on earth would you draw out results that would only list the exceptions such as in the case of old Tom Jones and Mercy Moe only and leave out the rest?

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 1

Tony

How about creating a view (indexed if necessary) that shows the name column with all non-alphabetic characters stripped from the end? Then you can do a select distinct against the view.

John

Rob Farley SSCrazy Points: 2739 More actions · Answer 2

You wouldn't need to use a view. You could just say:

select distinct dbo.uf_stripnonalpha(colname)

from table

--Rob

Rob Farley
LobsterPot Solutions & Adelaide SQL Server User Group
Company: http://www.lobsterpot.com.au
Blog: http://blogs.lobsterpot.com.au

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 3

That would be easier. But bear in mind that any indexes you have on colname would not be used.

John

Rob Farley SSCrazy Points: 2739 More actions · Answer 4

True. But if you wanted indexing, you could use a computed column and index (and query) that.

Rob Farley
LobsterPot Solutions & Adelaide SQL Server User Group
Company: http://www.lobsterpot.com.au
Blog: http://blogs.lobsterpot.com.au

Tony Trollip SSC Veteran Points: 228 More actions · Answer 5

This effort is part of a data normalisation project.

Indexing aside, both queries you have placed return distict lists, this is great, however I need to show the items that match this condition whereby Tom Jones is shown more than once irrispective that he has a numberic suffix. So a distinct select that only show alphanumeric, will not give me the exceptions

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 6

Tony

How about:

select colname where colname <> dbo.uf_stripnonalpha(colname)

from table

John

RyanRandall SSChampion Points: 13623 More actions · Answer 7

Hi all,

Tony - it sounds like you only want the ones that have a count of more than one, so maybe you need something like this...

select dbo.uf_stripnonalpha(colname) from table group by dbo.uf_stripnonalpha(colname) having count(*) > 1

Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.

stax68 SSChampion Points: 11711 More actions · Answer 8

Or:

select

t1.vc_col, V.stub

from

tbl t

join

(select distinct

substring(t.vc_col,len(t.vc_col)-patindex('%[0-9] %',reverse(t.vc_col)),20) stub

from tbl t) V

on substring(t.vc_col, len(t.vc_col)-patindex('%[0-9] %',reverse(t.vc_col)),20) like V.stub + '%'

Tim Wilkinson

"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant

Tony Trollip SSC Veteran Points: 228 More actions · Answer 9

That may work, I will give it a try. I think that I need to be careful here, as I am expecting duplicates in any event, the duplicates with the data set are addresses (Pysical address) and first I am stripping off the first numerics to get the street name, to validate against the rest of the address. Then I need to make sure that there are no numerics post the street name.

So Data looks like this right now.

1 West Ave

1 West Ave 1

1 West Ave 2

etc, they are grouped well, what I need to do is find these types occurences and mark them as exceptions for manual intervention and validation. I am told that this is normal, as the users on the old system were too lazy to find the address, when they tried to enter new data with that address, they were kicked out because of an illigal address entry, so they enter in the 1, 2, 3 to overcome the issue. I am creating a new system for them where they have to select via other known data and the address is entered in automatically for them.

stax68 SSChampion Points: 11711 More actions · Answer 10

The code I've given should find the last occurrence of a space followed by a number, and remove it. Looking at the code again, it's wrong. Try:.

select

t1.vc_col, V.stub

from

tbl t

join

(

select distinct

substring(t.vc_col,1,len(t.vc_col)-patindex('%[0-9] %',reverse(t.vc_col))) stub

from tbl t) V

on

substring(t.vc_col,1,len(t.vc_col)-patindex('%[0-9] %',reverse(t.vc_col))) like V.stub + '%'

where

patindex('%[0-9] %',reverse(t.vc_col)) > 0

this will only affect records where the vc_col has a space followed by a numeral. Obviously you might want to make many other refinements.

Tim Wilkinson

"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant

stax68 SSChampion Points: 11711 More actions · Answer 11

sorry, more corrections:

select t.vc_col, V.stub

from

tbl t

join

(

select distinct

substring(t.vc_col,1,len(t.vc_col)-patindex('%[0-9] %',reverse(t.vc_col))) stub

from tbl t) V

on

substring(t.vc_col,1,len(t.vc_col)-patindex('%[0-9] %',reverse(t.vc_col))) = V.stub

--or:

--on t.vc_col like V.stub + ' [0-9]%'

where

patindex('%[0-9] %',reverse(t.vc_col)) > 0

further mistakes are left as an exercise for the reader.

Tim Wilkinson

"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant