October 3, 2008 at 11:05 pm
Hi All
I ‘m using SQL Server 2005. I not good in SQL query. I got 2 tables as below
Table A:-
Invoice No Invoice Amount Invoice Date
------------------------------------- --------------------
A1234 100030/07/2008
B4444 20011/08/2008
Table B:-
Invoice No Payment No Payment Collected Payment Date
-------------- ------------------ -------------------------- -----------------------
A1234 P111 300 03/08/2008
A1234 P256 400 09/09/2008
B4444 P555 200 31/08/2008
I need the result as below:-
(it's only show the invoice that still have outstanding amount)
Invoice No Invoice Amount outstanding Amount
----------------------------------- -------------------------------
A1234 1000300
(Noted: outstanding Amount = Invoice Amount – Total Payment collected)
how to join these 2 tables in order to get the result that I needed. Please help
Thank
October 4, 2008 at 12:32 pm
I'm not an expert, and I'm certain there are more elegant ways of performing this task but one approach is to use a subquery to total the payments collected
SELECT a.[Invoice no], a.[invoice Amount], (a.[invoice Amount] - (SELECT SUM(b.[payment Collected]) FROM b WHERE b.[Invoice No] = a.[invoice No]))
FROM a
The important bit is to link the two tables in the subquery.
Regards
October 4, 2008 at 6:50 pm
David Stubbington (10/4/2008)
I'm not an expert, and I'm certain there are more elegant ways of performing this task but one approach is to use a subquery to total the payments collectedSELECT a.[Invoice no], a.[invoice Amount], (a.[invoice Amount] - (SELECT SUM(b.[payment Collected]) FROM b WHERE b.[Invoice No] = a.[invoice No]))
FROM a
The important bit is to link the two tables in the subquery.
Regards
Even if you are not an expert you post the simple solution here ...great post David!
:w00t:
October 4, 2008 at 7:18 pm
The correlated sub-query can suffer depending on how big the table is. You might care to try a derived table setup:
SELECT a.[Invoice no], a.[invoice Amount], (a.[invoice Amount] - isnull(b_sum.TotPayment,0))
FROM a
left outer join
(SELECT SUM(b.[payment Collected]) Totpayment FROM b ) b_sum
ON b_sum.[Invoice No] = a.[invoice No])
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
October 6, 2008 at 3:03 am
Matt is right, using left join to derived table is better than correlated subquery. Unfotunately he has a typo in his post (ID and group by missing in the inner query), this is how it should be (and I added the condition you mentioned:
SELECT a.[Invoice no], a.[invoice Amount], (a.[invoice Amount] - isnull(b_sum.TotPayment,0)) as outstanding
FROM a
LEFT OUTER JOIN
(SELECT [Invoice no], SUM(b.[payment Collected]) Totpayment
FROM b
GROUP BY [Invoice no]) AS b_sum
ON b_sum.[Invoice No] = a.[invoice No]
WHERE a.[invoice Amount] - isnull(b_sum.TotPayment,0) > 0
However, this is also untested, so I just hope I got it right 🙂
October 6, 2008 at 8:32 am
Yup - Vladan's right. Missed a few moving parts there.
thanks for the assist!:D
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
October 6, 2008 at 1:42 pm
Yeaa all assists are wellcomed... and I can see that these assists are coming so friendly and I can feel that we are big SQL family here!
:w00t:
Thnx for all assists!
Dugi
October 6, 2008 at 4:51 pm
Assuming invoice number is a unique key in table A, here is another approach, using a summary query:
select A.[invoice no], max([invoice amt]) as invoice_amt,
max([invoice amt])-sum([payment collected]) as amount_due
from A
join B on B.[invoice no] = A.[invoice no]
group by A.[invoice no]
having max([invoice amt])-sum([payment collected]) <> 0
order by A.[invoice no]
Please let us know if you see any performance differences.
October 7, 2008 at 12:28 am
Thank for help 🙂
October 7, 2008 at 1:37 am
Another approach would be the following:
SELECT InvoiceNo, Sum(Outstanding)
FROM (
SELECT InvoiceNo, InvoiceAmount AS Outstanding FROM Invoices
UNION ALL
SELECT InvoiceNo, (-1) * CollectedAmount AS Outstanding FROM Payments
)
GROUP BY InvoiceNo
HAVING SUM(Outstanding) <> 0
Theoretically, this should also be performing better than the join approach.
Best Regards,
Chris Büttner
October 7, 2008 at 8:17 am
Christian, that is an interesting approach.
Thinking about this, it is also possible to use CTEs to get the totals and then join the CTEs.
I may have to play around with these techniques on some production-sized volumes, check out the differences in query plans, and see if there are any significant differences in performance. If I find something worthwhile to report, I'll post it back here.
October 7, 2008 at 11:50 am
Okay... so much for theory. I ran the code at the bottom (without DBCC FREEPROCCACHE or DBCC DROPCLEANBUFFERS) and saw no significant IO differences against the base tables (#invoice and #payment) with any approach.
Using a CTE to summarize the payment table before joining to the invoice table was slightly faster than doing a summary query of the base tables joined together. Both did a merge join, but the summarization of the #payment table significantly reduced the volume the join had to handle.
Vlad's join to a derived table was third and the non-join approach was forth. These two approaches took 20-30% longer. Oddly, the display of the execution plan from SQL Server Management Studio suggests that Vlad's approach took no more work than the CTE approach, but the elapsed time was consistently and significantly larger.
I couldn't get Christian's syntax to work, so I used a CTE for the non-join approach. The execution plans revealed that it failed to take advantage of the indexing because the union all forced it to do a sort that was about 70% of the cost.
Obviously, the indexing schemes chosen and the test data influence the results. There are three payments per invoice for 70% of the invoices, and the other 30% have no payments. If the average number of payments per invoice was closer to one, the use of the CTE to summarize the #payment table would not help so much.
It seems that once again the it comes down to (1) understanding your data, (2) testing different approaches, and (3) examining the execution plans to understand what is going on under the covers. It's good to know all of these different approaches are available to solve the same problem. Looking at this got me to think outside my usual box. Thanks!!! 🙂
=============================================
--create and populate temporary tables
/*
CREATE TABLE #invoice
(
InvoiceID int IDENTITY(1,1) NOT NULL,
InvoiceAmt decimal(11,2) NOT NULL,
InvoiceDate datetime NOT NULL,
CONSTRAINT pk_#invoice PRIMARY KEY CLUSTERED (InvoiceID ASC)
)
CREATE TABLE #payment
(
PaymentID int IDENTITY(1,1) NOT NULL,
InvoiceID int not null,
PaymentAmt decimal(11,2) NOT NULL,
PaymentDate datetime NOT NULL,
CONSTRAINT pk_#payment PRIMARY KEY NONCLUSTERED (PaymentID ASC)
)
CREATE CLUSTERED INDEX #payment_cluster ON #payment
(
InvoiceID ASC,
PaymentID ASC
)
insert into #invoice
selectcase when right(t1.N,1) between 1 and 3 then 1000 when right(t1.N,1) between 4 and 6 then 2000 else 3000 end
,getdate()
from tally t1
cross join tally t2
where t2.N <= 10
;with cteAmts(amt) as
(select 250 union all select 300 union all select 125)
insert into #payment
select invoiceID,amt,getdate()
from #invoice
cross join cteAmts
where right(invoiceID,1) not in (2,3,5)
*/
set nocount on;
declare @start datetime
declare @elapsed int
set statistics time off;
set statistics io off;
print '---------------- Joined Summary Query'
set @start = getdate()
select i.invoiceID,max(invoiceAmt) as invoiceAmt,max(invoiceAmt)-sum(paymentAmt) as amtDue
from #invoice i
join #payment p on p.invoiceID = i.invoiceID
group by i.invoiceID
having max(invoiceAmt)-sum(paymentAmt) <> 0
order by i.invoiceID
set @elapsed = datediff(ms,@start,getdate())
print @elapsed
print ''
print ''
print ''
print '---------------- CTE summary of #payment'
set @start = getdate()
;with payment (invoiceID,payments) as
(select invoiceID,sum(paymentAmt)
from #payment
group by invoiceID
)
select i.invoiceID,i.invoiceAmt,i.invoiceAmt-p.payments as amtDue
from #invoice i
join payment p on p.invoiceID = i.invoiceID
order by invoiceID
set @elapsed = datediff(ms,@start,getdate())
print @elapsed
print ''
print ''
print ''
print '---------------- non-JOIN '
set @start = getdate()
;with cteUnion (invoiceID,amt) as
(SELECT InvoiceID, InvoiceAmt
FROM #Invoice
UNION ALL
SELECT InvoiceID, (-1) * paymentAmt
FROM #Payment
)
SELECT InvoiceID, Sum(amt) as Outstanding
FROM cteUnion
GROUP BY InvoiceID
HAVING SUM(amt) <> 0
ORDER BY InvoiceID
set @elapsed = datediff(ms,@start,getdate())
print @elapsed
print ''
print ''
print ''
print '---------------- join to derived table'
set @start = getdate()
SELECT i.invoiceID, i.invoiceAmt, (i.invoiceAmt - isnull(b_sum.TotPayment,0)) as outstanding
FROM #invoice i
LEFT OUTER JOIN
(SELECT invoiceID, SUM(p.paymentAmt) Totpayment
FROM #payment p
GROUP BY invoiceID) AS b_sum
ON b_sum.invoiceID = i.invoiceID
WHERE i.invoiceAmt - isnull(b_sum.TotPayment,0) > 0
set @elapsed = datediff(ms,@start,getdate())
print @elapsed
print ''
print ''
print ''
/*
drop table #invoice
drop table #payment
*/
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
October 7, 2008 at 1:08 pm
Thanks for testing this.
The union all approach is not really thought through.
1. It makes more sense to SUM the payments before joining to the invoices.
This reduces the amount of data that needs to be joined. Example is your second query.
2. Joining the tables requires less resources than concatenating the data. For the concatenation,
the InvoiceID for Payments needs to be "maintained" and scanned again for the aggregation
that takes place afterwards.
Sorry for the bad information provided.
Best Regards,
Chris Büttner
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply