SELECT query question

  • I have an invoice table containing AccountNumber,InvoiceNumber,InvoiceDate and RowID.  InvoiceNumber and RowID are always unique values in the whole table.

    Invoices are always created in ascending RowID order meaning that the invoice for an account with the highest RowID will be for the most recent billing period. 

    I want to write a SELECT query to retrieve the most recent and 2nd most recent invoice for each account (TOP 2 per account).  Retrieving the most recent invoice is easy.  I can't think of an elegant way to retrieve the second most recent invoice.

    What are some options?

  • DECLARE @InvoicesToReturn

    SET @InvoicesToReturn = 2 -- ideally this value must be stored in database

    SET ROWCOUNT @InvoicesToReturn

    SELECT RowID

    INTO #SelectedInvoices

    FROM dbo.Invoice

    ORDER BY ....

    SET ROWCOUNT 0

    SELECT I.*

    FROM dbo.Invoice I

    INNER JOIN #SelectedInvoices S ON I.RowId = S.RowId

    _____________
    Code for TallyGenerator

  • I don't know whether I'd call it elegant but this works:

    DECLARE @invoices TABLE (AccountNumber INT, InvoiceNumber INT, InvoiceDate DATETIME, RowID INT)

    INSERT INTO @invoices VALUES (1, 1001, '2007-01-01', 1)

    INSERT INTO @invoices VALUES (3, 3002, '2007-01-01', 2)

    INSERT INTO @invoices VALUES (1, 1003, '2007-01-02', 3)

    INSERT INTO @invoices VALUES (3, 3004, '2007-01-02', 4)

    INSERT INTO @invoices VALUES (2, 2005, '2007-01-03', 5)

    INSERT INTO @invoices VALUES (3, 3006, '2007-01-04', 6)

    -- SELECT * FROM @invoices

    SELECT DISTINCT accountNumber

    FROM @invoices

    DECLARE @top2Invoices TABLE (AccountNumber INT, InvoiceNumber INT, InvoiceDate DATETIME, RowID INT)

    -- First Invoice Per Account

    INSERT INTO @top2Invoices

    SELECT orig.*

    FROM @invoices orig

    INNER JOIN (

     SELECT accountNumber, MAX(invoiceDate) as firstInv

     FROM @invoices

     GROUP BY accountNumber

    )fst

     ON fst.accountNumber = orig.accountNumber

     AND fst.firstInv = orig.invoiceDate

    ORDER BY orig.accountNumber

    -- Second Invoice Per Account

    INSERT INTO @top2Invoices

    SELECT orig.*

    FROM @invoices orig

    INNER JOIN (

     SELECT accountNumber, MAX(invoiceDate) as firstInv

     FROM @invoices

     WHERE rowID NOT IN (SELECT rowID FROM @top2Invoices)

     GROUP BY accountNumber

    )fst

     ON fst.accountNumber = orig.accountNumber

     AND fst.firstInv = orig.invoiceDate

    ORDER BY orig.accountNumber

    SELECT *

    FROM @top2Invoices

    ORDER BY accountNumber, invoiceDate



    Ade

    A Freudian Slip is when you say one thing and mean your mother.
    For detail-enriched answers, ask detail-enriched questions...[/url]

  • SELECT a.AccountNumber,a.InvoiceNumber,a.InvoiceDate,a.RowID

    FROM [InvoiceTable] a

    WHERE a.RowID IN (SELECT TOP 2 b.RowID

        FROM [InvoiceTable] b

        WHERE b.AccountNumber = a.AccountNumber

        ORDER BY b.RowID DESC)

    Far away is close at hand in the images of elsewhere.
    Anon.

  • David,

    Definitely more elegant than mine!! Should've seen that. I went through all sorts of JOIN routes and everything.

     



    Ade

    A Freudian Slip is when you say one thing and mean your mother.
    For detail-enriched answers, ask detail-enriched questions...[/url]

  • This is a little longer but overall the performance should be better than David's the larger the dataset is you are working with.

    The reason is that in Davids example (which is commonly used) must evaluate all records in the dataset individually as they enter the processing buffers and executes the TOP 2 subquery each time.

    This method builds a derived table for the values aliased as i then joins to the dataset evaluating in fewer steps.

    Up to you to determine what works best for your needs but thought I would include as yet another option.

    SELECT

     X.AccountNumber,

     X.InvoiceNumber,

     X.InvoiceDate,

     X.RowID

    FROM

     @invoices X

    INNER JOIN

     (

      SELECT

       a.AccountNumber,

       a.RowID,

       COUNT(*) Cnt

      FROM

       @invoices A

      INNER JOIN

       @invoices B

      ON

       A.AccountNumber = B.AccountNumber AND

       A.RowID <= B.RowID

      GROUP BY

       a.AccountNumber,

       a.RowID

      HAVING

       COUNT(*) <= 2

    ) i

    ON

     X.AccountNumber = i.AccountNumber AND

     X.RowID = i.RowID

  • quote..overall the performance should be better...

    Yes I was thinking of adding another post to indicate that what I posted may not be that efficient for large datasets

    and I have to admit that it is not my solution but one that had been posted on this forum many times

    Another method I sometimes use is to put the values in a temp table with an identity column and select appropriately

    All the methods have merit but as you say 'determine what works best'

    Far away is close at hand in the images of elsewhere.
    Anon.

  • The query below works well enough and is nice and tight.  It takes about 20 seconds on a 250,000 row table on a 2 CPU/4 GB server.  That's OK for a report that is run a few times per day at most.

     

    SELECT a.AccountNumber,a.InvoiceNumber,a.InvoiceDate,a.RowID

    FROM [InvoiceTable] a

    WHERE a.RowID IN (SELECT TOP 2 b.RowID

        FROM [InvoiceTable] b

        WHERE b.AccountNumber = a.AccountNumber

        ORDER BY b.RowID DESC)

  • Thanks for the feedback Tom

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Revision.  So, I found an invoice table with 3.5 million invoices and discovered the query was taking too long - 11 minutes as of when I killed it.

    So, I switched to a different approach using temp tables.  First, I get a list of the most recent approved invoice for each account and store that in a temp table.

    Then, I repeat the query to get the 2nd most recent invoice by excluding those invoices (uniquely identified by InvoiceDirKey) returned by the first query.

    This runs in 50 seconds on the 3.5 million row table.

    IF OBJECT_ID('tempdb..#LATEST_INVX') IS  NOT NULL

    DROP TABLE [#LATEST_INVX]

    IF OBJECT_ID('tempdb..#2ND_LATEST_INVX') IS  NOT NULL

    DROP TABLE [#2ND_LATEST_INVX]

    DECLARE @InvStatus INT

    SELECT @InvStatus=TablDKey FROM WTABLE WHERE TablNumber=13 AND TablName='APPROVED'

    SELECT a.AccountNumber,MAX(a.InvoiceDirKey) AS InvoiceDirKey

      INTO #LATEST_INVX FROM INVX a

     WHERE InvStatus=@InvStatus

     GROUP BY  a.AccountNumber

    CREATE UNIQUE CLUSTERED INDEX [#LATEST_INVX_001] ON [#LATEST_INVX]([InvoiceDirKey])               ON [PRIMARY]

    SELECT a.AccountNumber,MAX(a.InvoiceDirKey) AS InvoiceDirKey

      INTO #2ND_LATEST_INVX FROM INVX a

     WHERE NOT EXISTS (SELECT 1 FROM #LATEST_INVX b WHERE a.InvoiceDirKey=b.InvoiceDirKey)

       AND a.InvStatus=@InvStatus

     GROUP BY  a.AccountNumber

    CREATE UNIQUE CLUSTERED INDEX [#2ND_LATEST_INVX_001] ON [#2ND_LATEST_INVX]([InvoiceDirKey])       ON [PRIMARY]

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply