Help please - similar names

  • I need a T-SQL script that:

    looks up a Account.Account field and sees whether the first 10 characters of any two accounts are the same

    I done it by...

    SELECT T1.Account, T2.NoOfMatches

    FROM Account T1

    INNER JOIN

    (

    SELECT SUBSTRING( Account, 1, 10 ) AS Account,

    COUNT( * ) NoOfMatches

    FROM Account

    GROUP BY SUBSTRING( Account, 1, 10 )

    HAVING COUNT( * ) > 1

    ) T2 ON T2.Account = SUBSTRING( T1.Account, 1, 10 )

    The resulting table has:

    (below is suppose to be a table)

    Account----l-----NoOfMatches

    Agcert-----l-----3

    AgCert----l-----3

    agcert-----l-----3

    Is there now way of the result just being???

    Account---l---NoOfMatches

    Agcert----l----3

    Dipesh---l-----4

    etc.......

    therefore grouping the reults?

    Can anybody help me.

  • well - you can convert them all to upper case during the grouping. You're already hurting performance with the function in the group by, so it might not stay the fastest query in the mix.

    The better method would probably be to standardize them to either upper or lower case once and for all (i.e. updating the values). If that's not an option, then simply doing:

    SELECT DISTINCT upper(T1.Account), T2.NoOfMatches

    FROM Account T1

    INNER JOIN

    (

    SELECT upper(SUBSTRING( Account, 1, 10 )) AS Account,

    COUNT( * ) NoOfMatches

    FROM Account

    GROUP BY upper(SUBSTRING( Account, 1, 10 ) )

    HAVING COUNT( * ) > 1

    ) T2 ON T2.Account = SUBSTRING( T1.Account, 1, 10 )

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Won't you get the desired result if you only run the query of the derived table?

    Because you joining that back to Accounts it repeats for each account that starts with the same few characters, if you don't join back to the account and only run the code of the derived table then it should not be repeated?

    SELECT upper(SUBSTRING( Account, 1, 10 )) AS Account,

    COUNT( * ) NoOfMatches

    FROM Account

    GROUP BY upper(SUBSTRING( Account, 1, 10 ) )

    HAVING COUNT( * ) > 1

    Or maybe I'm missing something?

  • Thank you Matt,

    i think i will be using Johannes's script.

  • Here is a different option that doesn't require using upper. I only mention it because I think its really cool 🙂

    SELECT SUBSTRING( Account, 1, 10 ) collate SQL_Latin1_General_CP437_CI_AS AS Account,

    COUNT( 1 ) NoOfMatches

    FROM Account

    GROUP BY SUBSTRING( Account, 1, 10 ) collate SQL_Latin1_General_CP437_CI_AS

    HAVING COUNT( 1 ) > 1

    It sounds like your default collation on your server is case sensative. If you specify a case insensative collation (above is just an example) it will do the grouping as case insensative. Just be careful to use your same collation (its in the properties on the server) and change it from CS to CI. That way you make sure you don't have any other little surprizes in your query.

    Also note I changed the count from COUNT( * ) to COUNT( 1 ) its a little bit faster.

    Kenneth

    Kenneth FisherI was once offered a wizards hat but it got in the way of my dunce cap.--------------------------------------------------------------------------------For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/[/url]Link to my Blog Post --> www.SQLStudies.com[/url]

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply