Alternative to Distinct

  • Okay so we all know the select distinct {column name} from {Table name} query which returns all none duplicates for the column you are searching against, basic stuff, then what you want to do is add some filters, so you append a where {column name} like/= {value}.

    Okay so what if you are returned rows where the first few words are duplicated like;

    Tom Jones 1

    Tom Jones 2

    Tom Jones 3

    Lucy Lane

    Joe Soap

    Mercy Moe 1

    Mercy Moe 2

    By looking at the results, you can see that there are duplicates, how on earth would you draw out results that would only list the exceptions such as in the case of old Tom Jones and Mercy Moe only and leave out the rest?

     

  • Tony

    How about creating a view (indexed if necessary) that shows the name column with all non-alphabetic characters stripped from the end?  Then you can do a select distinct against the view.

    John

  • You wouldn't need to use a view. You could just say:

    select distinct dbo.uf_stripnonalpha(colname)

    from table

    --Rob

    Rob Farley
    LobsterPot Solutions & Adelaide SQL Server User Group
    Company: http://www.lobsterpot.com.au
    Blog: http://blogs.lobsterpot.com.au

  • That would be easier.  But bear in mind that any indexes you have on colname would not be used.

    John

  • True. But if you wanted indexing, you could use a computed column and index (and query) that.

    Rob Farley
    LobsterPot Solutions & Adelaide SQL Server User Group
    Company: http://www.lobsterpot.com.au
    Blog: http://blogs.lobsterpot.com.au

  • This effort is part of a data normalisation project.

    Indexing aside, both queries you have placed return distict lists, this is great, however I need to show the items that match this condition whereby Tom Jones is shown more than once irrispective that he has a numberic suffix. So a distinct select that only show alphanumeric, will not give me the exceptions

  • Tony

    How about:

    select colname where colname <> dbo.uf_stripnonalpha(colname)

    from table

    John

  • Hi all,

    Tony - it sounds like you only want the ones that have a count of more than one, so maybe you need something like this...

    select dbo.uf_stripnonalpha(colname) from table group by dbo.uf_stripnonalpha(colname) having count(*) > 1

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Or:

    select

    t1.vc_col, V.stub

    from

    tbl t

    join

    (select distinct
     substring(t.vc_col,len(t.vc_col)-patindex('%[0-9] %',reverse(t.vc_col)),20) stub
     from tbl t) V
    on substring(t.vc_col, len(t.vc_col)-patindex('%[0-9] %',reverse(t.vc_col)),20) like V.stub + '%'

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • That may work, I will give it a try. I think that I need to be careful here, as I am expecting duplicates in any event, the duplicates with the data set are addresses (Pysical address) and first I am stripping off the first numerics to get the street name, to validate against the rest of the address. Then I need to make sure that there are no numerics post the street name.

    So Data looks like this right now.

    1 West Ave

    1 West Ave 1

    1 West Ave 2

    etc, they are grouped well, what I need to do is find these types occurences and mark them as exceptions for manual intervention and validation. I am told that this is normal, as the users on the old system were too lazy to find the address, when they tried to enter new data with that address, they were kicked out because of an illigal address entry, so they enter in the 1, 2, 3 to overcome the issue. I am creating a new system for them where they have to select via other known data and the address is entered in automatically for them.

     

  • The code I've given should find the last occurrence of a space followed by a number, and remove it. Looking at the code again, it's wrong. Try:.

    select

    t1.vc_col, V.stub

    from

    tbl t

    join

    (

    select distinct

    substring(t.vc_col,1,len(t.vc_col)-patindex('%[0-9] %',reverse(t.vc_col))) stub

    from tbl t) V

    on

    substring(t.vc_col,1,len(t.vc_col)-patindex('%[0-9] %',reverse(t.vc_col))) like V.stub + '%'

    where

    patindex('%[0-9] %',reverse(t.vc_col)) > 0

    this will only affect records where the vc_col has a space followed by a numeral. Obviously you might want to make many other refinements.

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • sorry, more corrections:
     
    select t.vc_col, V.stub

    from

    tbl t

    join

    (

    select distinct

    substring(t.vc_col,1,len(t.vc_col)-patindex('%[0-9] %',reverse(t.vc_col))) stub

    from tbl t) V

    on

    substring(t.vc_col,1,len(t.vc_col)-patindex('%[0-9] %',reverse(t.vc_col))) = V.stub

    --or:
    --on t.vc_col like V.stub + ' [0-9]%'

    where

    patindex('%[0-9] %',reverse(t.vc_col)) > 0

    further mistakes are left as an exercise for the reader.

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply