Comparing dates from multiple rows of duplicate data

  • Hi

    I have a table in my database where there is a unique identifier of InvoiceNumber (this is not the PK for the table).

    I would like to report on duplicate invoices that were created within a time frame and ignore duplicate invoices outside of the timeframe.

    An invoice can legitimately exist 1 or more times with the exception of when the created date between the duplicate invoices with is < a certain threshold (say 1 hour).

    The only differentiating data for a duplicate invoice is table.ID (PK) and Created date of the invoice. All other data for a duplicate invoice is the same.

    For Example:

    Example 1:

    Legitimate Duplicate Invoice

    Table=Invoice

    Rows:

    ID | invoicenumber | Created

    ------------------------------

    1 | 125870808 | 2012-12-18 15:45:30.000

    2 | 125870808 |2012-12-18 12:23:27.000

    3 | 125870808 | 2012-12-18 06:44:06.000

    The above example is legit because the time between invoice.created is greater than say 1 hour.

    Example 2

    Illegitimate Duplicate Invoice

    ID | invoicenumber | Created

    ------------------------------

    4 | 125870808 | 2012-12-18 15:45:30.000

    5 | 125870808 |2012-12-18 15:23:27.000

    6 | 125870808 | 2012-12-18 15:10:06.000

    In example 2 the invoices are duplicated because the created date Subsequent rows starting from the first created is < 1 hour before the next invoice number that is created.

    Any help would be appreciated.

    Cheers

    Neal

  • Try the following:

    --Load sample data

    CREATE TABLE #Invoice (ID INT, InvoiceNumber INT, Created DATETIME)

    INSERT INTO #Invoice

    SELECT 6,125870808, '2012-12-18 15:45:30.000'

    UNION ALL

    SELECT 5,125870808, '2012-12-18 12:23:27.000'

    UNION ALL

    SELECT 4,125870808, '2012-12-18 06:45:06.000'

    UNION ALL

    SELECT 3,125870808, '2012-12-18 06:44:30.000'

    UNION ALL

    SELECT 2,125870808, '2012-12-18 06:23:27.000'

    UNION ALL

    SELECT 1,125870808, '2012-12-18 06:10:06.000'

    -----

    --Use CTE to Identify first instance of an invoice.

    ;WITH Invoice AS(

    SELECT ID

    , InvoiceNumber

    , Created

    , [Instance] = ROW_NUMBER() OVER (PARTITION BY InvoiceNumber ORDER BY Created ASC)

    FROM #Invoice

    )

    SELECT

    a.ID

    , a.InvoiceNumber

    , a.Created

    , b.Created AS FirstInvoiceCreated

    , TimeDiffInMinutes = DATEDIFF(Minute , b.Created , a.Created)

    FROM Invoice a

    INNER JOIN(SELECT InvoiceNumber, Created FROM Invoice WHERE Instance = 1 ) b --Compare against first instance of invoice:

    ON a.InvoiceNumber = b.InvoiceNumber

    AND DATEDIFF(HOUR , b.Created , a.Created) < 1 --Only diff between first instance of invoice number is less than 1 hour

    WHERE a.Instance <> 1 -- Only compare invoices which occurred after first.

    --Cleanup temp table

    DROP TABLE #Invoice

  • Grinja (12/21/2012)


    Try the following:

    --Load sample data

    CREATE TABLE #Invoice (ID INT, InvoiceNumber INT, Created DATETIME)

    INSERT INTO #Invoice

    SELECT 6,125870808, '2012-12-18 15:45:30.000'

    UNION ALL

    SELECT 5,125870808, '2012-12-18 12:23:27.000'

    UNION ALL

    SELECT 4,125870808, '2012-12-18 06:45:06.000'

    UNION ALL

    SELECT 3,125870808, '2012-12-18 06:44:30.000'

    UNION ALL

    SELECT 2,125870808, '2012-12-18 06:23:27.000'

    UNION ALL

    SELECT 1,125870808, '2012-12-18 06:10:06.000'

    -----

    --Use CTE to Identify first instance of an invoice.

    ;WITH Invoice AS(

    SELECT ID

    , InvoiceNumber

    , Created

    , [Instance] = ROW_NUMBER() OVER (PARTITION BY InvoiceNumber ORDER BY Created ASC)

    FROM #Invoice

    )

    SELECT

    a.ID

    , a.InvoiceNumber

    , a.Created

    , b.Created AS FirstInvoiceCreated

    , TimeDiffInMinutes = DATEDIFF(Minute , b.Created , a.Created)

    FROM Invoice a

    INNER JOIN(SELECT InvoiceNumber, Created FROM Invoice WHERE Instance = 1 ) b --Compare against first instance of invoice:

    ON a.InvoiceNumber = b.InvoiceNumber

    AND DATEDIFF(HOUR , b.Created , a.Created) < 1 --Only diff between first instance of invoice number is less than 1 hour

    WHERE a.Instance <> 1 -- Only compare invoices which occurred after first.

    --Cleanup temp table

    DROP TABLE #Invoice

    A little complicated for my tastes. Using the sample data you knocked up, how about this: -

    SELECT *

    FROM #Invoice a

    CROSS APPLY (SELECT TOP 1 b.Created,

    DATEDIFF(MINUTE, b.Created, a.Created)

    FROM #Invoice b

    WHERE b.InvoiceNumber = a.InvoiceNumber

    AND DATEDIFF(HOUR, b.Created, a.Created) < 1

    AND b.Created < a.Created

    ORDER BY b.Created ASC

    ) ab(FirstInvoiceCreated,TimeDiffInMinutes);


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply