Merge count by grouping variables back into source table

  • Summary: I need to identify which records are violating a unique index to get the source file fixed.

    Details: Apologies, I know this should be simple, and I've Googled before posting. I'm used to a different version of SQL where this approach would work:

    select t2.cnt, t1.*

    from #t1 t1

    left join

    (

    select caseid, claimnum, linenum, invoicenum, batchnum, count(*) as cnt

    from #t1

    group by caseid, claimnum, linenum, invoicenum, batchnum

    ) t2

    on t1.caseid=t2.caseid

    and t1.claimnum=t2.claimnum

    and t1.linenum=t2.linenum

    and t1.invoicenum=t2.invoicenum

    and t1.batchnum=t2.batchnum

    having t2.cnt > 1

    I need a count by the grouping (index) variables, then re-join that number into the source data, then filter for count > 1. If my source file is 10K records, the output should be 10K records. I need all source columns in the output to analyze if it really is a duplicate record, or if another column needs to be added to the index.

    The above query returns NULL for cnt for all records.

    If you want a post with sample data to work with just let me know; I thought the above sufficient to explain the issue.

    Thanks...

  • Scott In Sydney (7/15/2014)


    Summary: I need to identify which records are violating a unique index to get the source file fixed.

    Details: Apologies, I know this should be simple, and I've Googled before posting. I'm used to a different version of SQL where this approach would work:

    select t2.cnt, t1.*

    from #t1 t1

    left join

    (

    select caseid, claimnum, linenum, invoicenum, batchnum, count(*) as cnt

    from #t1

    group by caseid, claimnum, linenum, invoicenum, batchnum

    ) t2

    on t1.caseid=t2.caseid

    and t1.claimnum=t2.claimnum

    and t1.linenum=t2.linenum

    and t1.invoicenum=t2.invoicenum

    and t1.batchnum=t2.batchnum

    having t2.cnt > 1

    I need a count by the grouping (index) variables, then re-join that number into the source data, then filter for count > 1. If my source file is 10K records, the output should be 10K records. I need all source columns in the output to analyze if it really is a duplicate record, or if another column needs to be added to the index.

    The above query returns NULL for cnt for all records.

    If you want a post with sample data to work with just let me know; I thought the above sufficient to explain the issue.

    Thanks...

    Hi,

    your idea is correct; you have just made a "typo".

    You perform the grouping in the nested query and not in the main query. Thus you can't use HAVING in the main query.

    So: Just change the last row of your query to use a WHERE instead and it will work properly.

    Good luck!

    /M

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply