February 17, 2014 at 11:36 am
I'm guessing there's probably a simple and elegant way to accomplish this in T-SQL, just not sure how.
I have a large quantity of information regarding travel routes - lots of pairs involving origin and destination IDs. My end goal is to be able to group by route, regardless of direction. This would mean that a trip from location 1 to location 2, and a return trip from 2 to 1, would both fall into the same group.
My best idea for accomplishing this so far is to create a table containing origins and destinations in both orders, and then assign a route ID for each:
Origin | Destination | RouteID
-------+-------------+---------
1 | 2 | 1
2 | 1 | 1
2 | 3 | 2
3 | 2 | 2
1 | 3 | 3
3 | 1 | 3
Populating the origin and destination columns via a cartesian join is simple, but calculating the RouteID column efficiently is eluding me. Is there a better way to do this other than using loops?
_______________________________________________________________________________________________
Bob Pusateri
Microsoft Certified Master: SQL Server 2008
Blog: The Outer Join[/url]
Twitter: @SQLBob
February 17, 2014 at 12:56 pm
Can you provide some ddl and sample data? It sounds like your origin and destination columns are derived? If your pattern of data is consistent maybe you can just a case expression?
case when Origin < Destination then Origin else Destination end as RouteID
I suspect it isn't really quite that straightforward but I bet there is something in the base table to indicate if it is return trip or not.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
February 17, 2014 at 5:27 pm
Sean Lange (2/17/2014)
Can you provide some ddl and sample data? It sounds like your origin and destination columns are derived? If your pattern of data is consistent maybe you can just a case expression?
case when Origin < Destination then Origin else Destination end as RouteID
I suspect it isn't really quite that straightforward but I bet there is something in the base table to indicate if it is return trip or not.
Other than the fact that "as RouteID" wouldn't be valid in a GROUP BY clause, I think Sean's proposed solution hits the mark.
You might also want to take a look at this article: Departures from Origins and Arrivals at Destinations [/url]. While it doesn't address this specific problem, you might find in it some interesting solutions to other problems you'll likely encounter.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
February 17, 2014 at 5:41 pm
Ooops! Silly me, I just realized why that won't work. Try this (assuming this is what Sean meant):
WITH SampleData (Origin, Destination) AS
(
SELECT 1, 2
UNION ALL SELECT 2, 1
UNION ALL SELECT 2, 3
UNION ALL SELECT 3, 2
UNION ALL SELECT 1, 3
UNION ALL SELECT 3, 1
UNION ALL SELECT 1, 4
UNION ALL SELECT 4, 1
)
SELECT d1=MIN(Origin), d2=MAX(Destination)
FROM SampleData
GROUP BY CASE WHEN Origin < Destination THEN Origin ELSE Destination END;
But I think this will work (combine origin and destination in whatever fashion makes sense for you):
WITH SampleData (Origin, Destination) AS
(
SELECT 1, 2
UNION ALL SELECT 2, 1
UNION ALL SELECT 2, 3
UNION ALL SELECT 3, 2
UNION ALL SELECT 1, 3
UNION ALL SELECT 3, 1
UNION ALL SELECT 1, 4
UNION ALL SELECT 4, 1
)
SELECT d1=MIN(Origin), d2=MAX(Destination)
FROM SampleData a
CROSS APPLY
(
SELECT d1=RIGHT(10000+Origin, 4)+RIGHT(10000+Destination, 4)
,d2=RIGHT(10000+Destination, 4)+RIGHT(10000+Origin, 4)
) b
GROUP BY CASE WHEN Origin < Destination THEN d1 ELSE d2 END;
I'd assume that your origins/destinations are either the 3 or 4 character airport codes, so a straight concatenate should do the trick.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
February 17, 2014 at 7:12 pm
Thanks for all the tips! I came up with a multi-step solution that so far seems to get me what I need, and I'm happy to share. I'll do a blog post with some better examples and sample data shortly.
So for a table "TripData" with columns "Origin" and "Destination", we can see all trips (directionally) with the following query:
select distinct Origin, Destination
from TripData;
To express these trips uniformly regardless of direction, some case statements are necessary to do some swapping of their order. Origin and Destination are both integers, so some simple numeric comparisons do the trick. Distinct ensures we see each route only once.
select distinct
case when Origin<=Destination then Origin
when Origin>Destination then Destination
end as PtA,
case when Origin<=Destination then Destination
when Origin>Destination then Origin
end as PtB
from TripData;
Now that we have distinct routes, we can assign each a RouteID.
SELECT a.PtA, a.PtB, ROW_NUMBER() over (order by a.PtA, a.PtB) AS RouteID
into dbo.RouteIDSTG
from (
select distinct
case when Origin<=Destination then Origin
when Origin>Destination then Destination
end as PtA,
case when Origin<=Destination then Destination
when Origin>Destination then Origin
end as PtB
from TripData
)a;
With that done, we can create a lookup table that contains each trip (directionally) along with the route ID for easy joining.
WITH AllPairs as (
SELECT Origin, Destination,
Case when Origin<=Destination then Origin
when Origin>Destination then Destination
end as PtA,
case when Origin<=Destination then Destination
when Origin>Destination then Origin
end as PtB
from TripData
)
SELECT distinct ap.Origin, ap.Destination, r.RouteID
into RouteIDMaster
from AllPairs ap
inner join dbo.RouteIDSTG r on r.PtA = ap.PtA and r.PtB = ap.PtB;
Now the TripData table can be joined to RouteIDMaster on Origin and Destination columns and grouped by RouteID.
_______________________________________________________________________________________________
Bob Pusateri
Microsoft Certified Master: SQL Server 2008
Blog: The Outer Join[/url]
Twitter: @SQLBob
February 17, 2014 at 7:20 pm
Looks like an awful lot of work and an awful lot of DISTINCTs. Wouldn't this be simpler?
WITH SampleData (Origin, Destination) AS
(
SELECT 1, 2
UNION ALL SELECT 2, 1
UNION ALL SELECT 2, 3
UNION ALL SELECT 3, 2
UNION ALL SELECT 1, 3
UNION ALL SELECT 3, 1
UNION ALL SELECT 1, 4
UNION ALL SELECT 4, 1
)
SELECT Origin, Destination, RouteID
FROM
(
SELECT d1=MIN(Origin), d2=MAX(Destination), RouteID=ROW_Number() OVER (ORDER BY (SELECT NULL))
FROM SampleData a
CROSS APPLY
(
SELECT d1=RIGHT(10000+Origin, 4)+RIGHT(10000+Destination, 4)
,d2=RIGHT(10000+Destination, 4)+RIGHT(10000+Origin, 4)
) b
GROUP BY CASE WHEN Origin < Destination THEN d1 ELSE d2 END
) a
CROSS APPLY
(
SELECT Origin, Destination
FROM SampleData
WHERE d1 IN (Origin, Destination) AND d2 IN (Origin, Destination)
) b;
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
February 17, 2014 at 8:27 pm
Your code is definitely simpler. But the performance (at least on my machine) is lacking. I just stopped it after letting it run for 10 minutes. My table has about 1.2M rows in it. I'll try tweaking it and see what develops!
_______________________________________________________________________________________________
Bob Pusateri
Microsoft Certified Master: SQL Server 2008
Blog: The Outer Join[/url]
Twitter: @SQLBob
February 17, 2014 at 8:32 pm
With some DDL including indexing on your table it might offer a clue.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
February 17, 2014 at 9:29 pm
You're right! That was kind of doggy wasn't it? Try this instead:
CREATE TABLE dbo.OriginsDestinations
(
Origin INT
,Destination INT
,PRIMARY KEY (Origin, Destination)
);
WITH Tally (n) AS
(
SELECT TOP 500000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b
)
INSERT INTO dbo.OriginsDestinations
SELECT n, n+1
FROM Tally
UNION ALL
SELECT n+1, n
FROM Tally;
SET STATISTICS TIME ON;
WITH OD AS
(
SELECT d1=MIN(Origin), d2=MAX(Destination), RouteID=ROW_Number() OVER (ORDER BY (SELECT NULL))
FROM dbo.OriginsDestinations a
CROSS APPLY
(
SELECT d1=RIGHT(1000000+Origin, 6)+RIGHT(1000000+Destination, 6)
,d2=RIGHT(1000000+Destination, 6)+RIGHT(1000000+Origin, 6)
) b
GROUP BY CASE WHEN Origin < Destination THEN d1 ELSE d2 END
)
SELECT d1, d2, RouteID
INTO #Temp
FROM OD
UNION ALL
SELECT d2, d1, RouteID
FROM OD
SET STATISTICS TIME OFF;
GO
DROP TABLE #Temp;
I get these timing results:
SQL Server Execution Times:
CPU time = 8454 ms, elapsed time = 12722 ms.
(1000000 row(s) affected)
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
February 18, 2014 at 9:31 am
Thanks Dwain. You were right in what you thought I was going for. Seems that this issue is in good hands now.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
February 18, 2014 at 2:20 pm
I could be wrong for simplifying it that much but I think we could do this in one pull from [TripData] using the idea from Sean.
select
*, -- columns of interest
dense_rank() over(
order by
case when Origin <= Destination then Origin else Destination end,
case when Origin <= Destination then Destination else Origin end
) as rnk -- route_id
from
TripData;
The issue I see here is how to create a POC index (partition / ordering / covering) to avoid the sorting needed to support the ranking function. May be adding calculated columns (case expressions) and then adding the index.
February 18, 2014 at 5:20 pm
Sean Lange (2/18/2014)
Thanks Dwain. You were right in what you thought I was going for. Seems that this issue is in good hands now.
Thanks for letting me know that Sean. Sometimes I feel a bit sheepish after posting my interpretation of what someone else is saying, in case I was wrong. Enthusiasm sometimes gets the best of me.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply