May 5, 2012 at 1:10 pm
The below scenario represents my question in it's simplest form.
I am trying to join an Item History table to an Invoice History table. Unfortunately the columns I have available to me in the Item History table for join criteria do not always define a unique record in the Invoice History table. I get repeating results included when there are multiple invoices generated for the same item against the same order and line for the same quantity on the same day. Each Invoice History transaction REALLY only corresponds to ONE Item History transaction, but I have insufficient information in Item History to be able to identify the specific corresponding Invoice History transaction
How can I limit the rows returned by this query so that the Item History quantity change is only represented once even if there are multiple Invoice History transactions that meet the join criteria? It seems as though there should be a simple solution, but I am having a difficult time finding it.
create table dbo.ItemHistory
(Item varchar(30)
,OrderNo numeric(9,0)
,OrderLine numeric(4,0)
,TxDate datetime
,QtyChg numeric(10,0)
)
go
create table dbo.InvoiceHistory
(Item varchar(30)
,OrderNo numeric(9,0)
,OrderLine numeric(4,0)
,InvoiceNo numeric(9,0)
,InvoiceDate datetime
,InvoiceQty numeric(10,0)
)
go
insert into
dbo.ItemHistory
values
('123',1,5,'2012-01-01',10)
,('123',1,5,'2012-01-01',10)
,('124',2,5,'2012-01-01',5)
,('125',3,5,'2012-01-01',25)
go
insert into
dbo.InvoiceHistory
values
('123',1,5, 1,'2012-01-01',10)
,('123',1,5, 2, '2012-01-01',10)
,('124',2,5, 3, '2012-01-01',5)
,('125',3,5, 4, '2012-01-01',25)
go
Here is the query I am struggling with. Note that the problem is with Item 123.
select
itm.Item
,itm.OrderNo
,itm.OrderLine
,CONVERT(varchar, itm.TxDate, 101) as TxDate
,itm.QtyChg
,inv.InvoiceNo
,CONVERT(varchar, inv.InvoiceDate, 101) as InvoiceDate
,inv.InvoiceQty
from
dbo.ItemHistory itm
join
dbo.InvoiceHistory inv
on
inv.Item = itm.Item
and inv.OrderNo = itm.OrderNo
and inv.OrderLine = itm.OrderLine
and inv.InvoiceDate = itm.TxDate
and inv.InvoiceQty = itm.QtyChg
With the columns available to me for joining, the first two Item History rows each match to the first two Invoice History rows, giving me the cartesian product result of 4 rows returned. Rows 5 and 6 represent a one to one correspondence between the 2 tables and are not part of the problem.
The result I am after is to have Invoice Numbers 1 and 2 each represented only once.
ItemOrderNoOrderLineTxDateQtyChgInvoiceNoInvoiceDateInvoiceQty
1231501/01/201210101/01/201210
1231501/01/201210101/01/201210
1231501/01/201210201/01/201210
1231501/01/201210201/01/201210
1242501/01/20125301/01/20125
1253501/01/201225401/01/201225
May 5, 2012 at 6:47 pm
Thank you for providing tables and data - this takes out a lot of the guesswork.
Here is a relatively new and efficient way to solve this kind of problem.
To your query I added
selectitm.Item, itm.OrderNo ,itm.OrderLine... as rownum
that adds a row, called rownum, that contains a sequence that re-starts with 1 whenever the Item, OrderNo and InvoiceNo grouping changes.
Then I wrapped it in an outer query that selects only rows that have rownum = 1.
Put all that together and here's the query that solves your problem:
select Item, OrderNo, OrderLine, TxDate, QtyChg, InvoiceNo, InvoiceDate, InvoiceQty, rownum
from (
selectitm.Item, itm.OrderNo ,itm.OrderLine
,CONVERT(varchar, itm.TxDate, 101) as TxDate
,itm.QtyChg, inv.InvoiceNo
,CONVERT(varchar, inv.InvoiceDate, 101) as InvoiceDate
,inv.InvoiceQty
,ROW_NUMBER() OVER (PARTITION BY itm.Item, itm.OrderNo, inv.InvoiceNo
ORDER BY itm.Item, itm.OrderNo, inv.InvoiceNo ) as rownum
from dbo.ItemHistory itm join dbo.InvoiceHistory inv on
(inv.Item = itm.Item
and inv.OrderNo = itm.OrderNo
and inv.OrderLine = itm.OrderLine
and inv.InvoiceDate = itm.TxDate
and inv.InvoiceQty = itm.QtyChg) ) X
WHERE rownum = 1
This solution uses windows functions (because the ROW_NUMBER, OVER, and PARTITION BY are applied the the result set, or "window", after the data has been retrieved).
This and other ways of solving this problem can be found here:
http://www.simple-talk.com/content/print.aspx?article=646
The solution used here is near the bottom of the article in the section titled, "New Techniques for Removing Duplicate Rows in SQL Server 2005".
Google "sql server windows functions" for more info on this solution.
- victor di leo
May 5, 2012 at 6:53 pm
Oops.... where I wrote
To your query I added
select itm.Item, itm.OrderNo ,itm.OrderLine... as rownum
I actually meant to write
To your query I added
,ROW_NUMBER() OVER (PARTITION BY ... as rownum
May 5, 2012 at 7:12 pm
Victor,
I believe this will absolutely resolve my issue. I will test it out shortly.
I have been working with sql for a few months now and have learned one thing so far. There is ALWAYS more to learn.
Thank you very much for the assistance.
Brian Miller
May 5, 2012 at 8:15 pm
If you omit the rownum column from the result, the query plan can use the ANY aggregate:
SELECT
X.Item,
X.OrderNo,
X.OrderLine,
X.TxDate,
X.QtyChg,
X.InvoiceNo,
X.InvoiceDate,
X.InvoiceQty
FROM
(
SELECT
itm.Item,
itm.OrderNo,
itm.OrderLine,
TxDate = CONVERT(char(10), itm.TxDate, 101),
itm.QtyChg,
inv.InvoiceNo,
InvoiceDate = CONVERT(char(10), inv.InvoiceDate, 101),
inv.InvoiceQty,
rownum =
ROW_NUMBER() OVER (
PARTITION BY itm.Item, itm.OrderNo, inv.InvoiceNo
ORDER BY itm.Item, itm.OrderNo, inv.InvoiceNo)
FROM dbo.ItemHistory itm
JOIN dbo.InvoiceHistory inv ON
inv.Item = itm.Item
AND inv.OrderNo = itm.OrderNo
AND inv.OrderLine = itm.OrderLine
AND inv.InvoiceDate = itm.TxDate
AND inv.InvoiceQty = itm.QtyChg
) AS X
WHERE
X.rownum = 1;
You could also just add a DISTINCT to the original query. Both of these approaches are imperfect though; the real solution is to fix the schema. Using a properly relational design will help avoid these issues arising in the first place.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply