Counting duplicate companies

  • Hi all,

    I am generating a report for duplicate companies in a "Company" table. These are our customers and they're entered in by the sales people. Needless to say, the poorly written CRM app doesn't check for potential dups before letting people create new company profiles, and so all the sales people create tons of dups in the table. Now, I wonder if you guys have any suggestions on ways of doing this?

    What I did, I create a temp table and inserted all the "Company_name" in there that have duplicates.

    INSERT INTO #DupCompany

    SELECT count(*), company_name

    FROM company GROUP BY company_name

    HAVING count(*) > 1

    Next I can selet the company info by SELECTing FROM company WHERE company_id IN #DupCompany.

    Finally, here's the problem ... I want to count the duplicate companies created per month. The trick here is though, the first instance of a company profile shouldn't be included, as it is not a duplicate yet. Any suggestions?

    (Fields you may need: company_id, company_name, created_time.)

    Thanks all.

  • A solution might look something like what is given below. Not knowing your data makes it pretty hard to be sure if this is correct but from some test data that I generated you should get the result you are looking for.

    select

    substring(t1.CompanyName, 1, 20) as CompanyName

    , datename(month, CreatedDT) as Month

    , (

    select count(*)

    from Company t2

    where

    t2.CompanyName = t1.CompanyName

    and t2.CreatedDT > min(t1.CreatedDT)

    and datename(month, t2.CreatedDT) = datename(month, t1.CreatedDT)

    ) as NumberofDupes

    from Company t1

    group by

    t1.CompanyName

    , datename(month, CreatedDT)

    Hope this helps. (Sorry for the appearance but when I paste from QA it always eliminates my tabs.)

    David

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • I think this fits your bill if not should be close.

    SELECT company_name, datepart(mm,created_time) as dupmonth, count(*) as dupcnt

    FROM company

    LEFT JOIN

    (select min(created_time) mct, company_name cn from company) as tblMins

    ON

    company_name = cn AND

    created_time = mct

    GROUP BY company_name, datepart(mm,created_time)

    WHERE

    cn IS NULL

    You will not need the

    HAVING count(*) > 1

    as the query already eliminates the first created instance for all and only the duplicate instances are left.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply