SELECT query question

Question

Post reply

SELECT query question

Tom John-342103

SSCrazy

Points: 2694
More actions
April 24, 2007 at 3:09 pm

#116572

I have an invoice table containing AccountNumber,InvoiceNumber,InvoiceDate and RowID. InvoiceNumber and RowID are always unique values in the whole table.
Invoices are always created in ascending RowID order meaning that the invoice for an account with the highest RowID will be for the most recent billing period.
I want to write a SELECT query to retrieve the most recent and 2nd most recent invoice for each account (TOP 2 per account). Retrieving the most recent invoice is easy. I can't think of an elegant way to retrieve the second most recent invoice.
What are some options?

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Sergiy SSC Guru Points: 110209 More actions · Answer 1

DECLARE @InvoicesToReturn

SET @InvoicesToReturn = 2 -- ideally this value must be stored in database

SET ROWCOUNT @InvoicesToReturn

SELECT RowID

INTO #SelectedInvoices

FROM dbo.Invoice

ORDER BY ....

SET ROWCOUNT 0

SELECT I.*

FROM dbo.Invoice I

INNER JOIN #SelectedInvoices S ON I.RowId = S.RowId

_____________
Code for TallyGenerator

Adrian Nichols-360275 SSCertifiable Points: 7694 More actions · Answer 2

I don't know whether I'd call it elegant but this works:

DECLARE @invoices TABLE (AccountNumber INT, InvoiceNumber INT, InvoiceDate DATETIME, RowID INT)

INSERT INTO @invoices VALUES (1, 1001, '2007-01-01', 1)

INSERT INTO @invoices VALUES (3, 3002, '2007-01-01', 2)

INSERT INTO @invoices VALUES (1, 1003, '2007-01-02', 3)

INSERT INTO @invoices VALUES (3, 3004, '2007-01-02', 4)

INSERT INTO @invoices VALUES (2, 2005, '2007-01-03', 5)

INSERT INTO @invoices VALUES (3, 3006, '2007-01-04', 6)

-- SELECT * FROM @invoices

SELECT DISTINCT accountNumber

FROM @invoices

DECLARE @top2Invoices TABLE (AccountNumber INT, InvoiceNumber INT, InvoiceDate DATETIME, RowID INT)

-- First Invoice Per Account

INSERT INTO @top2Invoices

SELECT orig.*

FROM @invoices orig

INNER JOIN (

SELECT accountNumber, MAX(invoiceDate) as firstInv

FROM @invoices

GROUP BY accountNumber

)fst

ON fst.accountNumber = orig.accountNumber

AND fst.firstInv = orig.invoiceDate

ORDER BY orig.accountNumber

-- Second Invoice Per Account

INSERT INTO @top2Invoices

SELECT orig.*

FROM @invoices orig

INNER JOIN (

SELECT accountNumber, MAX(invoiceDate) as firstInv

FROM @invoices

WHERE rowID NOT IN (SELECT rowID FROM @top2Invoices)

GROUP BY accountNumber

)fst

ON fst.accountNumber = orig.accountNumber

AND fst.firstInv = orig.invoiceDate

ORDER BY orig.accountNumber

SELECT *

FROM @top2Invoices

ORDER BY accountNumber, invoiceDate

Ade

A Freudian Slip is when you say one thing and mean your mother.
For detail-enriched answers, ask detail-enriched questions...[/url]

David Burrows SSC Guru Points: 65144 More actions · Answer 3

SELECT a.AccountNumber,a.InvoiceNumber,a.InvoiceDate,a.RowID

FROM [InvoiceTable] a

WHERE a.RowID IN (SELECT TOP 2 b.RowID

FROM [InvoiceTable] b

WHERE b.AccountNumber = a.AccountNumber

ORDER BY b.RowID DESC)

Far away is close at hand in the images of elsewhere.
Anon.

Adrian Nichols-360275 SSCertifiable Points: 7694 More actions · Answer 4

David,

Definitely more elegant than mine!! Should've seen that. I went through all sorts of JOIN routes and everything.

Ade

A Freudian Slip is when you say one thing and mean your mother.
For detail-enriched answers, ask detail-enriched questions...[/url]

Antares686 SSC Guru Points: 125444 More actions · Answer 5

This is a little longer but overall the performance should be better than David's the larger the dataset is you are working with.

The reason is that in Davids example (which is commonly used) must evaluate all records in the dataset individually as they enter the processing buffers and executes the TOP 2 subquery each time.

This method builds a derived table for the values aliased as i then joins to the dataset evaluating in fewer steps.

Up to you to determine what works best for your needs but thought I would include as yet another option.

SELECT

X.AccountNumber,

X.InvoiceNumber,

X.InvoiceDate,

X.RowID

FROM

@invoices X

INNER JOIN

(

SELECT

a.AccountNumber,

a.RowID,

COUNT(*) Cnt

FROM

@invoices A

INNER JOIN

@invoices B

ON

A.AccountNumber = B.AccountNumber AND

A.RowID <= B.RowID

GROUP BY

a.AccountNumber,

a.RowID

HAVING

COUNT(*) <= 2

) i

ON

X.AccountNumber = i.AccountNumber AND

X.RowID = i.RowID

David Burrows SSC Guru Points: 65144 More actions · Answer 6

..overall the performance should be better...

Yes I was thinking of adding another post to indicate that what I posted may not be that efficient for large datasets

and I have to admit that it is not my solution but one that had been posted on this forum many times

Another method I sometimes use is to put the values in a temp table with an identity column and select appropriately

All the methods have merit but as you say 'determine what works best'

Far away is close at hand in the images of elsewhere.
Anon.

Tom John-342103 SSCrazy Points: 2694 More actions · Answer 7

The query below works well enough and is nice and tight. It takes about 20 seconds on a 250,000 row table on a 2 CPU/4 GB server. That's OK for a report that is run a few times per day at most.

SELECT a.AccountNumber,a.InvoiceNumber,a.InvoiceDate,a.RowID

FROM [InvoiceTable] a

WHERE a.RowID IN (SELECT TOP 2 b.RowID

FROM [InvoiceTable] b

WHERE b.AccountNumber = a.AccountNumber

ORDER BY b.RowID DESC)

David Burrows SSC Guru Points: 65144 More actions · Answer 8

Thanks for the feedback Tom

Far away is close at hand in the images of elsewhere.
Anon.

Tom John-342103 SSCrazy Points: 2694 More actions · Answer 9

Revision. So, I found an invoice table with 3.5 million invoices and discovered the query was taking too long - 11 minutes as of when I killed it.

So, I switched to a different approach using temp tables. First, I get a list of the most recent approved invoice for each account and store that in a temp table.

Then, I repeat the query to get the 2nd most recent invoice by excluding those invoices (uniquely identified by InvoiceDirKey) returned by the first query.

This runs in 50 seconds on the 3.5 million row table.

IF OBJECT_ID('tempdb..#LATEST_INVX') IS NOT NULL

DROP TABLE [#LATEST_INVX]

IF OBJECT_ID('tempdb..#2ND_LATEST_INVX') IS NOT NULL

DROP TABLE [#2ND_LATEST_INVX]

DECLARE @InvStatus INT

SELECT @InvStatus=TablDKey FROM WTABLE WHERE TablNumber=13 AND TablName='APPROVED'

SELECT a.AccountNumber,MAX(a.InvoiceDirKey) AS InvoiceDirKey

INTO #LATEST_INVX FROM INVX a

WHERE InvStatus=@InvStatus

GROUP BY a.AccountNumber

CREATE UNIQUE CLUSTERED INDEX [#LATEST_INVX_001] ON [#LATEST_INVX]([InvoiceDirKey]) ON [PRIMARY]

SELECT a.AccountNumber,MAX(a.InvoiceDirKey) AS InvoiceDirKey

INTO #2ND_LATEST_INVX FROM INVX a

WHERE NOT EXISTS (SELECT 1 FROM #LATEST_INVX b WHERE a.InvoiceDirKey=b.InvoiceDirKey)

AND a.InvStatus=@InvStatus

GROUP BY a.AccountNumber

CREATE UNIQUE CLUSTERED INDEX [#2ND_LATEST_INVX_001] ON [#2ND_LATEST_INVX]([InvoiceDirKey]) ON [PRIMARY]