February 3, 2016 at 5:10 pm
Hi All,
I need your help to figure out a query.
create table #Ord (OrdID INT, OrdDt DATE, CustID INT, MtchID INT)
INSERT INTO #Ord
select 101, GETDATE(), 123, 111
union all
select 102, GETDATE(), 124, 112
union all
select 103, (GETDATE()-1), 125, 112
union all
select 103, (GETDATE()-1), 125, 111
union all
select 102, (GETDATE()-1), 126, 110
select * from #Ord
create table #Prs (MtchID INT, Price INT, OrdLoc VARCHAR(6))
INSERT INTO #Prs
select 111, 10, 'Online'
union all
select 110, 10, null
union all
select 112, 20, 'Phone'
union all
select 163, 10, 'Phone'
union all
select 182, 100, 'Online'
select * from #Prs
select * from #Ord o join #Prs p ON o.MtchID = p.MtchID
and OrdLoc = 'Phone'
drop table #Ord
drop table #Prs
Result:
OrdID OrdDt CustID MtchID MtchIDPriceOrdLoc
103 2016-02-0212511111110Online
1022016-02-0312411211220Phone
As shown in the result set there are multiple OrdLoc's, it can be(Phone, Online, Instore etc) but I have to bring in only Phone orders and if there is a Online order I have get the Online order instead of Phone.
Above result set for OrdID 103 there is a Online and Phone Order, but I have to display only Online order details for OrdID 103 and phone OrdLoc for the other OrdID's
appreciate your help.
Thanks in advance!
February 3, 2016 at 9:23 pm
I think the code below does what you want, if I understand the requirements correctly. It scans each table twice, so if the source tables are very large then this might not be so speedy. Having said that, I'm not sure there's actually a more efficient way to do this, given the requirements. There's always the possibility I'm missing some particularly clever approach (especially this late at night), and hopefully someone else points one out if that's the case. 🙂
WITH online_or_phone AS
(
SELECT OrdId,
is_online=MAX(CASE WHEN OrdLoc='Online' THEN 1 ELSE 0 END)
FROM #Ord o INNER JOIN #Prs p ON o.mtchid=p.mtchid
GROUP BY OrdID
HAVING MAX(CASE WHEN OrdLoc='Phone' THEN 1 END)=1
)
SELECT O.*,P.*
FROM #Ord O
INNER JOIN #Prs p ON o.mtchid=p.mtchid
INNER JOIN online_or_phone OOP ON o.OrdId=OOP.OrdId AND
p.OrdLoc=CASE WHEN OOP.is_online=1
THEN 'Online'
ELSE 'Phone'
END;
Cheers!
February 3, 2016 at 10:10 pm
Here is an option that I think does a little less work but I am still not crazy about needing to use row_number. I think there is still a better way:
with ord as (select o.OrdID,
o.OrdDt,
o.CustId,
o.MtchID,
p.Price,
p.OrdLoc,
-- relies on "Online" sorting before "Phone" so delivers Online order first, else delivers Phone
row_number() over (partition by o.OrdID order by p.OrdLoc) as row_num
from #Ord o
join #Prs p ON o.MtchID = p.MtchID
where p.OrdLoc in ('Phone', 'Online'))
select OrdID,
OrdDt,
CustId,
MtchID,
Price,
OrdLoc
from ord
where row_num = 1;
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 4, 2016 at 2:12 am
Jacob Wilkins (2/3/2016)
I think the code below does what you want, if I understand the requirements correctly. It scans each table twice, so if the source tables are very large then this might not be so speedy. Having said that, I'm not sure there's actually a more efficient way to do this, given the requirements. There's always the possibility I'm missing some particularly clever approach (especially this late at night), and hopefully someone else points one out if that's the case. 🙂
WITH online_or_phone AS
(
SELECT OrdId,
is_online=MAX(CASE WHEN OrdLoc='Online' THEN 1 ELSE 0 END)
FROM #Ord o INNER JOIN #Prs p ON o.mtchid=p.mtchid
GROUP BY OrdID
HAVING MAX(CASE WHEN OrdLoc='Phone' THEN 1 END)=1
)
SELECT O.*,P.*
FROM #Ord O
INNER JOIN #Prs p ON o.mtchid=p.mtchid
INNER JOIN online_or_phone OOP ON o.OrdId=OOP.OrdId AND
p.OrdLoc=CASE WHEN OOP.is_online=1
THEN 'Online'
ELSE 'Phone'
END;
Cheers!
Be careful with this one. This gives 2 rows from the test dataset and I think should give 3 rows. Also, I think the code assumes some type of uniqueness on the join-predicates because as soon as I expanded the test dataset to test for performance a bit I started seeing lots of duplicates in the results.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 4, 2016 at 2:15 am
This one seems to do quite a bit better than the others on elapsed time:
with cte as (select distinct OrdID
from #Ord)
select cte.OrdID,
x.OrdDt,
x.CustId,
x.MtchID,
x.Price,
x.OrdLoc
from cte
cross apply (select top 1
o.OrdDt,
o.CustId,
o.MtchID,
p.Price,
p.OrdLoc
from #Ord o
join #Prs p ON o.MtchID = p.MtchID
where p.OrdLoc in ('Phone', 'Online')
and cte.OrdID = o.OrdID
order by p.OrdLoc) x;
I assume you gave us a basic version of the problem using temp tables minus the representative constraints and indexes. It would be good to check the execution plans on your side once you wire it up to make sure the query you decide is well supported.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 4, 2016 at 6:32 am
Orlando Colamatteo (2/4/2016)
Jacob Wilkins (2/3/2016)
I think the code below does what you want, if I understand the requirements correctly. It scans each table twice, so if the source tables are very large then this might not be so speedy. Having said that, I'm not sure there's actually a more efficient way to do this, given the requirements. There's always the possibility I'm missing some particularly clever approach (especially this late at night), and hopefully someone else points one out if that's the case. 🙂
WITH online_or_phone AS
(
SELECT OrdId,
is_online=MAX(CASE WHEN OrdLoc='Online' THEN 1 ELSE 0 END)
FROM #Ord o INNER JOIN #Prs p ON o.mtchid=p.mtchid
GROUP BY OrdID
HAVING MAX(CASE WHEN OrdLoc='Phone' THEN 1 END)=1
)
SELECT O.*,P.*
FROM #Ord O
INNER JOIN #Prs p ON o.mtchid=p.mtchid
INNER JOIN online_or_phone OOP ON o.OrdId=OOP.OrdId AND
p.OrdLoc=CASE WHEN OOP.is_online=1
THEN 'Online'
ELSE 'Phone'
END;
Cheers!
Be careful with this one. This gives 2 rows from the test dataset and I think should give 3 rows. Also, I think the code assumes some type of uniqueness on the join-predicates because as soon as I expanded the test dataset to test for performance a bit I started seeing lots of duplicates in the results.
I'm aware that it only gives two rows (and the OPs desired result was just two rows).
I also initially thought there should be 3, but when I reread the OPs post, I realized (I think) what he was saying.
As I read it, he wants to return information only for OrdIDs that match to an OrdLoc of 'Phone'. Of the OrdIDs that match to an OrdLoc of 'Phone', if they also match to an OrdLoc of 'Online', return the row for that OrdId that matches to 'Online' instead of 'Phone'.
That requirement would result in the two rows the OP requested, so I'm guessing that's what he means, and that's what my query implements (I initially also did the row_number approach, assuming 3 rows should be returned, until I re-read the requirements). Of course, we'll just have to wait for the OP to clarify 🙂
If you started extending the result set and saw duplicates, then I'm assuming you just did something like INSERT INTO #Ord SELECT * FROM #Ord and the same for #Prs.
That would result in duplicates because there are now several rows in #Prs for the same MtchID, so each OrdId that matched to 'Phone' originally now matches several 'Online' or 'Phone' rows in #Prs.
I (and the query) did assume the actual data doesn't allow that, as you pointed out; if that is allowed in the data, then yes, my query would have to be changed.
At any rate, the main thing is to see if my reading of the requirements is correct, because that's a different question than your queries (and a query I almost submitted) answered.
Back in the OP's court 🙂
EDIT: Mistakenly referred to #Prs as #Tmp at one point. Fixed that.
February 4, 2016 at 7:16 am
I see what you are pointing out now, that before we decide to return an online order we must first know there was a phone order for that OrdID. Sounds like a call center upsell report. Agreed that raises the bar on complexity.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 4, 2016 at 8:10 am
Try the following.
WITH orders AS (
SELECT o.OrdID, o.OrdDt, o.CustID, o.MtchID, p.Price, p.OrdLoc, ROW_NUMBER() OVER(PARTITION BY o.OrdID ORDER BY p.OrdLoc) AS rn, MAX(p.OrdLoc) OVER(PARTITION BY o.OrdID) AS MaxLoc
FROM #Ord o
INNER JOIN #Prs p
ON o.MtchID = p.MtchID
WHERE p.OrdLoc IN ('Online', 'Phone')
)
SELECT OrdId, OrdDt, CustID, MtchID, Price, OrdLoc
FROM orders
WHERE rn = 1
AND MaxLoc = 'Phone';
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
February 4, 2016 at 8:16 am
Thank you all for the replies!
@jacob Wilkins - You are correct, if there are two or more MtchId's with "Phone", "Online" and other OrdLoc's I want to display only the "Online" match, if there is no "Online" then I should have "Phone" , so "online" takes the precedence over "Phone".
Thank you!
February 4, 2016 at 10:38 am
February 4, 2016 at 2:21 pm
ssc_san (2/4/2016)
Thank you all for the replies!@jacob Wilkins - You are correct, if there are two or more MtchId's with "Phone", "Online" and other OrdLoc's I want to display only the "Online" match, if there is no "Online" then I should have "Phone" , so "online" takes the precedence over "Phone".
Thank you!
I think we have it, but for completeness, can you please confirm that if there is only an Online order, i.e. that the OrdID does not also have a match to a Phone order, that we should not return that Online order?
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 4, 2016 at 2:29 pm
February 4, 2016 at 3:12 pm
OK, I'll go with it...
This query has a lower plan cost than the other two solutions that seem to satisfy the requirements (Jacob's and Drew's) but I haven't tested all three at scale to see how each of them hold up...will do that later tonight:
WITH cte
AS (
SELECT o.OrdID,
o.OrdDt,
o.CustID,
o.MtchID,
p.Price,
p.OrdLoc,
LEAD(o.OrdID, 1, 0) OVER (PARTITION BY o.OrdID ORDER BY o.OrdID) AS NextOrdID,
LAG(o.OrdID, 1, 0) OVER (PARTITION BY o.OrdID ORDER BY p.OrdLoc DESC) AS LastOrdID,
LAG(p.OrdLoc, 1, '') OVER (PARTITION BY o.OrdID ORDER BY p.OrdLoc DESC) AS LastOrdLoc
FROM #Ord o
INNER JOIN #Prs p ON o.MtchID = p.MtchID
WHERE p.OrdLoc IN ('Phone', 'Online')
)
SELECT cte.OrdID,
cte.OrdDt,
cte.CustID,
cte.MtchID,
cte.Price,
cte.OrdLoc
FROM cte
WHERE -- phone-only order ids
(
cte.OrdLoc = 'Phone'
AND cte.NextOrdID = 0
)
OR
-- OrdID has Online and Phone order
(
cte.LastOrdID = cte.OrdID
AND cte.LastOrdLoc = 'Phone'
);
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 4, 2016 at 5:22 pm
Very nice Orlando! I thought there might be something with LAG/LEAD that allowed a single scan of each table, and it looks like you found one.
I ran through some quick tests with those three solutions, and the below were the results:
1000 rows:
solution last_logical_reads last_worker_time last_elapsed_time
--------------- -------------------- -------------------- --------------------
test_Jacob 28 6000 6000
test_Drew 4617 8997 8997
test_Orlando 14 9998 9998
100000 rows:
solution last_logical_reads last_worker_time last_elapsed_time
-------------- -------------------- -------------------- --------------------
test_Jacob 1252 454098 454098
test_Orlando 626 900020 900020
test_Drew 460629 924051 924051
1000000 rows:
solution last_logical_reads last_worker_time last_elapsed_time
--------------- -------------------- -------------------- --------------------
test_Jacob 12192 5834277 6089390
test_Orlando 12580 10248498 10278545
test_Drew 4609126 10355920 11405656
I expected the reduced IO from Orlando's would allow it to blow the doors off the others, but it seems nothing comes for free. In this case, the reduced IO comes at the cost of a couple very expensive sorts.
A few caveats: 1) my machine has very underwhelming specs, so all the sorts and hash joins spilled to tempdb for the larger row count runs; 2) I didn't change the shape of the data, just multiplied rows in both tables; 3) I didn't look at adding any supporting indexing. With different shapes of the data or some supporting indexing this could be very different, but I was only willing to spend so much time on this tonight :-). I might revisit and tinker with all that later, but for now I'm going to retreat into the non-SQL Server world (a scary place).
I've attached the script I used for testing as a .txt file, since the site didn't seem to like my putting it in the post.
Cheers!
EDIT: Updated my caveats, since I left one out 🙂
February 4, 2016 at 6:41 pm
I think we have it, but for completeness, can you please confirm that if there is only an Online order, i.e. that the OrdID does not also have a match to a Phone order, that we should not return that Online order?
There is a possibility of having "Phone", "Online" and "" (Space), no present both(Phone & Online) will be separate.
Thank you for your time and help @orlando Colamatteo, @jacob Wilkins & @IanMGSmith!
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply