Recently I was engaged to do a Proof of Concept (POC) for designing how airline flight schedules could be handled and managed. Having participated in other transport industry projects, I realized this could be of value in many routing scenarios (e.g., multiple-stop, linehaul trucking) and since the whole thing was so interesting, I decided to write about it.
In this article, I will propose a design for implementing airline schedules that should at least partially support international routing, at least sufficiently to get you started. Without a doubt, there will be those that consider alternate designs superior, but I believe that the setups and approach proposed here are quite extensible and potentially usable for applications of this nature. We will discuss the table structures, including the thought process that led to that design where possible, along with providing some sample data and examples of some specialized queries you might want to perform against the data provided.
Before you read the next sections, you may want to download the scripts available in the Resources section at the end of this article to create the tables and functions and populate them with the sample data. Run them in this order:
- Setup Airports.sql – Creates the Airlines and Airports tables. Populates Airlines with only the airlines needed for the examples in this article and all airports with some data cleansing activities included. The basis for the data included can be downloaded as the Global Airports Database, which is available under Common Public License.
- [Optional] Setup Airlines.sql – Sets up sample data for all airlines (or at least a long list) with an IATA code, plus a dummy airline (XX) at the end. The list obtained from Wikipedia included some duplicates which are inelegantly eliminated. Note that if you do want to run this script, you will not be able to do so after you’ve run step #3.
- Setup DailyFlightSchedules.sql – Creates our flight schedules table and populates the table with some reality-based flight schedules, excluding the sample flights for our dummy airline (inserted in step #5).
- CREATE Routines.sql – This creates all the FUNCTIONs and stored procedures described in the remainder of the article. You will also need one additional FUNCTION: DelimitedSplit8K, the world -famous, ultra-fast, community-improved, delimited string splitter championed by SQL MVP Jeff Moden. It has not been included (and should be created before running this script); because there have been frequent updates and improvements and we want to be sure you get the latest.
- Setup Dummy Flights.sql – Creates some dummy flights for our dummy airline to illustrate some examples. It also populates some additional data for the flights loaded initially into the DailyFlightSchedules table in step #3.
- [Optional] Example Queries.SQL – Most of the sample scripts presented in this article are included and should run directly without issue using this script. Run these scripts in concert with your reading of the article to save you a bit of time copying and pasting if you wish.
Note that these scripts and all of the examples that follow were tested on SQL Server 2008 R2 and some of the data types employed are not supported on earlier versions.
One additional notation regarding the statement in the second paragraph, that the proposed technique will “partially” support international routing. There is an extremely complicated wrinkle to this which is discussed but not implemented in the penultimate section of this article, including some suggested workarounds.
Structure of the Data
For our application, we will create 3 tables: Airlines, Airports and DailyFlightSchedules. Here is the DDL for the Airlines table:
CREATE TABLE Airlines (AirlineCode CHAR(2) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL PRIMARY KEY ,AirlineName VARCHAR(150) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,CallSign VARCHAR(150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,Country VARCHAR(100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL)
The columns in our Airlines table are:
- AirlineCode – Is the 2 character IATA airline code designator for the airline.
- AirlineName – Is the formal name of the airline.
- CallSign – Is the radio call sign used when flight or ground crews refer to a flight over the air.
- Country – Is the flagged country under which the airline operates.
In our implementation, the role of the Airlines table is to use AirlineCode as a foreign key later. Alternatively, you may want to define a Countries table and assign Country in the Airlines (and the Airports table that follows) using either the 2 (ISO 3166-1 alpha-2) or 3 (ISO 3166-1 alpha-3) alphanumeric character country codes, or the 3 digit (ISO 3166-1 numeric) country codes according to the referenced standards.
Here is the DDL for the Airports table:
CREATE TABLE Airports( ICAO_Code CHAR(4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, IATA_Code CHAR(3) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL PRIMARY KEY, [Name] VARCHAR(200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [City] VARCHAR(200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [State] CHAR(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [Country] VARCHAR(100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [LatDeg] INT NULL, [LatMin] INT NULL, [LatSec] INT NULL, [LatDir] CHAR(1) NULL, [LongDeg] INT NULL, [LongMin] INT NULL, [LongSec] INT NULL, [LongDir] CHAR(1) NULL, [LatDec] AS (CAST(CASE LatDir WHEN 'N' THEN 1. WHEN 'S' THEN -1. END * LatDeg+LatMin/60.+LatSec/3600. AS FLOAT)) PERSISTED, [LongDec] AS(CAST(CASE LongDir WHEN 'E' THEN 1. WHEN 'W' THEN -1. END * LongDeg+LongMin/60.+LongSec/3600. AS FLOAT)) PERSISTED, [Altitude] FLOAT NULL, [UTC] CHAR(6))
The columns in our Airports table are:
- ICAO_Code – Is the four character ICAO airport designator. We won’t be using this information in our application but it is there (when available) if you choose to.
- IATA_Code – Is the three character IATA airport codes familiar to anyone that has ever flown on an airplane.
- Name – Is the formal name of the airport.
- City – Is the name of the city in which the airport is located using Latin-based characters. In other words, Asian and other countries that use non-Latin scripts will have their city names rendered in English, which is in any event the standard for the aviation industry.
- State – Is the US state abbreviation (sparsely populated in the sample data) in which the airport is located.
- Country – Is the name of the country in which the airport is located, again using Latin-based characters, with my apologies to Japan, China, Taiwan, Thailand, the Koreas, etc. for not including their native countries’ language script.
- LatDeg, LatMin, LatSec and LatDir – Are the latitude coordinates (degrees, minutes and seconds) and direction (north or south).
- LongDeg, LongMin, LongSec and LongDir – Are the longitude coordinates (degrees, minutes and seconds) and direction (east or west).
- LatDec and LongDec– Are the calculated, decimal equivalent of the latitude/longitude coordinates in a form most familiar to those that have worked with GPS systems. These computed columns are PERSISTED to minimize the number of times they will be calculated. If NULL, then this is most likely because the direction (either N/S for latitude or E/W for longitude) is coded as U (unknown).
- Altitude – Is the altitude of the airport, which will not be used in this application.
- UTC – Is the UTC time zone offset for the airport.
Case insensitive collation strings defined on various columns (Airlines and Airports) ensure that searches on these columns yield the correct results even if the search argument is lower or mixed case.
Latitude and longitude coordinates (in decimal) will be used to calculate the distance between our origins and destinations, while the UTC time zone offset is needed to calculate a flight’s segment time, because arrivals are typically expressed in local time. One might rightly argue that SQL’s spatial coordinates functionality could be used for the Airports’ latitude and longitude coordinates. The only thing I can say about this is that I did not choose to do so.
The decimal latitude/longitude columns are calculated (as computed columns) from degrees, minutes, seconds and direction because my source Airports table contained the coordinates in this traditional form. The source data also contained a lot of missing direction information (coded as U in the setup scripts), which we have tried to clean up as best we could, but certainly not perfectly. I have never checked whether Altitude is in feet or meters, or even whether it is in consistent units for all airports where it is present.
Here is the DDL for our DailyFlightSchedules table:
CREATE TABLE DailyFlightSchedules (SegmentID INT IDENTITY ,Airline CHAR(2) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL FOREIGN KEY REFERENCES Airlines(AirlineCode) ,Flight VARCHAR(4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,Origin CHAR(3) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL FOREIGN KEY REFERENCES Airports(IATA_Code) ,Destination CHAR(3) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL FOREIGN KEY REFERENCES Airports(IATA_Code) ,DepartureTime TIME NOT NULL ,ArrivalTime TIME NOT NULL ,EffectiveStart DATETIME NULL ,EffectiveEnd DATETIME NULL ,[Day] CHAR(2) NULL ,[DayOfWeek] INT DEFAULT (127) -- Default = every day of the week ,SegmentTime INT DEFAULT(0) ,SegmentDistanceMiles INT NULL DEFAULT(0) ,SegmentDistanceKM AS (ROUND(SegmentDistanceMiles * 1.60934, 0)) ,FlightNo AS (Airline+Flight) PERSISTED ,DepartureDay AS (CASE LEFT([Day], 1) WHEN '1' THEN '+1' ELSE '' END) ,ArrivalDay AS (CASE RIGHT([Day], 1) WHEN '1' THEN '+1' ELSE '' END)) GO CREATE INDEX DFS ON DailyFlightSchedules (FlightNo, Origin, Destination, DepartureTime, ArrivalTime) INCLUDE([DayOfWeek]) GO
The columns in our DailyFlightSchedules table are:
- SegmentID – Is used to order the segments within a flight. Each successive segment should be inserted in order from first to last. Later we will derive a segment number within a particular flight, so you should not be concerned that these integers are an IDENTITY column.
- Airline – Is the airline’s IATA code/designator, with a foreign key constraint to the corresponding column in the Airlines table.
- Flight – Is a 3 or 4 character numeric only flight number.
- Origin – Is the origin for a flight segment and is a 3 character IATA airport code, thus we have added a foreign key constraint to the corresponding column in the Airports table.
- Destination – Is the destination for a flight segment and is a 3 character IATA airport code, thus we have added a foreign key constraint to the corresponding column in the Airports table.
- DepartureTime – Is the time that the flight segment departs expressed in local time for the UTC time offset of the origin’s airport.
- ArrivalTime – Is the time that the flight segment arrives expressed in local time for the UTC time offset of the destination’s airport.
- EffectiveStart and EffectiveEnd – Are used to define when a particular flight segment is operating. If NULL values are present in both columns it means that the flight operates at present and will for the foreseeable future.
- Day – Is a two character encoding, of 00, 01 or 11 that indicates the day offset of the time. Possibly you’ve seen a flight schedule that shows the arrival time as +1 (meaning the next calendar day after the departure). This is encoded as 01. More explanation is provided later.
- DayOfWeek – Is also an encoded value that represents the days of the week when a flight operates (not all flights operate seven days a week). The default value of 127 means that the flight operates on all days of the week. More explanation is provided later.
- SegmentTime – Is the elapsed time in minutes from origin to destination for the segment.
- SegmentDistanceMiles – Is the distance from origin to destination in miles for the segment.
- SegmentDistanceKM – Is the distance from origin to destination in KM for the segment, calculated from the segment miles multiplied by 1.60934.
- FlightNo – Is the 5 or 6 character alphanumeric flight number consisting of the airline designator (2 characters) and 3-4 digits for the flight number. Once again, if you’ve ever flown on an airplane, you probably have seen this rendering of the flight number before. This computed column is PERSISTED so that it can be used in the INDEX.
- DepartureDay and ArrivalDay – Are calculated columns that split the Day encoding into the more familiar +1 or blank notation that you might want to include on an itinerary.
Pundits may argue that the TIME data types could be replaced with 4-byte character strings. I found the TIME data type to be quite handy for those cases where math is needed to calculate flight times and not inconvenient for display purposes. I didn’t like the idea of breaking character strings apart, inserting a colon for presentation, casting back to TIME, etc. and ad nauseum.
Likewise our encoded columns may give the reader pause, but once you see how easy they are to manipulate, perhaps I will win you over.
The SegmentTime and the SegmentDistanceMiles columns cannot be computed columns, because that would require a sub-query within the compute formula. It could be argued that only one segment distance (either miles or KM) is needed depending on your application, but we decided to include both (KM is computed). We explain in the next section how all of these columns are calculated.
Sample Data, Applicable Business Rules and Data Setup Scripts
Firstly it is important to emphasize that the sample data provided for this application is just that. It is probably not good enough to use in a true production airports table for many reasons. The airports included are certainly not complete (only a subset of IATA airport codes are included) and the latitude and longitude are probably close in most cases (where available) but may not be sufficiently accurate for your needs. The UTC time zone offsets were pulled together from a variety of sources and for airports in countries that have a single time zone probably aren’t too bad. For countries with many time zones (unfortunately like the USA, Canada and Russia) they are probably not very good at all. In conclusion I would say that, if you have a better source of airports (and if you’re creating a production application you’d better), you should most certainly use it.
For our daily flight schedules, the same caveats apply. This is sample data that has a basis in the airline routes of Papua New Guinea (PNG), mainly for airlines Air Niugini (PX) and Airlines PNG (CG). There are a few dummy flights for a dummy airline (XX) to demonstrate international routing.
Just for the fun of it and in case you like exotic travel destinations or are a history buff, here is a map of Papua New Guinea so you can get your bearings. It is a scattering of island provinces just north of Australia, with names that are reminiscent of famous land and sea battles of World War II.
The provinces of PNG are scattered across many islands (not unlike the Philippines) and within any single island, cities are separated by mountainous terrain and poor highway infrastructure, so most local travel is by air. There are small airports everywhere. In our sample data, you’ll find pretty good data on most of the major ones. PNG is an interesting example because it is non-trivial in the sense that many of the flights have multiple segments, yet is manageable in size because we’ll be dealing with only about 48 airports. The capital of PNG and its largest city by population is Port Moresby (only Australians and the locals are likely to know that) and the IATA airport code is POM (go figure).
Now that we have our geography, history and cultural lessons behind us, let’s talk for a moment about the kinds of routes we’ll find in our sample data and the associated business rules.
- Flights are uniquely identified by the flight number and are composed of multiple segments serviced by the same aircraft. There is no actual limit on the number of segments in a flight but the typical limit is around four or five (our sample data has one flight with 8 segments).
- A flight does not usually originate and terminate at the same location, although the data structure does not preclude this. Typically, return flights that operate the same route in reverse are assigned a different flight number.
- Within a unique flight number a flight may operate on a different schedule (or itinerary) on different week days.
- A flight does not need to operate every day of the week. Typically a flight consisting of multiple segments operates all segments on the same days; however the data structure does not preclude dropping or adding a segment on any particular day (we’ll see an example of this later).
- Flights do not generally span more than two calendar days. That’s at least 25 hours of flying. The longest flight I’ve ever taken was Bangkok to New York and I believe that was about 20 hours or so. The non-stop flight that Singapore Airlines runs from Singapore to New York is a couple of hours longer and I believe it is the longest duration flight operating currently in the world. Long haul flights typically don’t have enough stops to extend the elapsed time sufficiently to span two calendar days.
- I’ve never heard of a flight that arrives on the day before it departs. Theoretically, I suppose it is possible given crossing the International Date Line in the right direction, at just the right time, on a sufficiently short flight in a really fast airplane. If anyone has a record of such a flight let me know and I’d be happy to apply my algorithms to it and see what happens.
- A flight segment’s distance is calculated as the crow flies, meaning that the endless circling you’ve been forced to endure around airports when arrivals are stacked up, don’t count towards your frequent flier mileage program.
Towards the middle of our script that sets up the data for our dummy flights, we see the following bit of code that populates the segment times and segment distances.
-- Correct the encoding for {Day] and calculate segment times, segment distances UPDATE a SET [Day]=CASE WHEN [Day] = '+1' THEN '01' WHEN [Day] IS NULL THEN '00' WHEN [Day] = ' ' THEN '00' ELSE [Day] END ,SegmentTime=b.SegmentTimeMinutes ,SegmentDistanceMiles=ROUND(e.Miles, 0) FROM DailyFlightSchedules a CROSS APPLY CalculateSegmentTime(a.Origin, a.Destination, a.DepartureTime, a.ArrivalTime ,CASE [Day] WHEN [Day] = '+1' THEN '01' WHEN [Day] IS NULL THEN '00' WHEN [Day] = ' ' THEN '00' ELSE [Day] END) bCROSS APPLY ( SELECT IATA_Code, LatDec, LongDec FROM Airports WHERE IATA_Code = a.Origin) c CROSS APPLY ( SELECT IATA_Code, LatDec, LongDec FROM Airports WHERE IATA_Code = a.Destination) d OUTER APPLY HaversineDistance(c.LatDec, c.LongDec, d.LatDec, d.LongDec) e
Our initial load of the table (all but the dummy flights) put a ‘+1’ into the [Day] column because we had this information available to record an arrival on the following day, so the UPDATE statement corrects our encoding format. Note that our convention for this table, specifically for a multi-segment flight where one or more of the segments departs after midnight on the following day, is to encode that departure/arrival [Day] as 11. We’ll show an example of this later.
The UPDATE also calls two Table Valued Functions: CalculateSegmentTime and HaversineDistance. Both of these in-line TVFs (iTVFs) are shown below. We can use CROSS APPLY to get our latitude and longitude coordinates because we’re sure each of our origin and destination airports are in the Airports table, due to the foreign key constraints on IATA_Code. However we must use OUTER APPLY to calculate the distance by the Haversine method because we’re not certain we always have coordinates for each airport.
The iTVF to calculate segment time of a route is shown below.
CREATE FUNCTION CalculateSegmentTime (@Origin VARCHAR(3) ,@Destination VARCHAR(3) ,@DepartureTime TIME ,@ArrivalTime TIME ,@Day CHAR(2)) RETURNS TABLE RETURN SELECT SegmentTimeMinutes=DATEDIFF(minute, CAST(CONVERT(VARCHAR(16), @DepartureTime + DepDate, 121) + c.UTC AS DATETIMEOFFSET), CAST(CONVERT(VARCHAR(16), @ArrivalTime + ArrDate, 121) + d.UTC AS DATETIMEOFFSET)) FROM ( SELECT CAST('1900-01-01' AS DATETIME) + CAST(LEFT(@Day, 1) AS INT) ,CAST('1900-01-01' AS DATETIME) + CAST(RIGHT(@Day, 1) AS INT) ) b (DepDate, ArrDate) CROSS APPLY ( SELECT IATA_Code, UTC FROM Airports WHERE IATA_Code = @Origin) c CROSS APPLY ( SELECT IATA_Code, UTC FROM Airports WHERE IATA_Code = @Destination) d
By adding the day offset (@Day) to a base DATETIME (1900-01-01) and then applying the UTC time zone offset from the Airports table, we are able to construct a DATETIMEOFFSET for arrival and departure times and then simply use the DATEDIFF function to calculate the segment time in minutes. Let’s do a quick check on our calculation using HKG and POM as our example:
SELECT FlightNo, Origin, Destination ,DepartureTime=CAST(DepartureTime AS VARCHAR(5)) ,ArrivalTime=CAST(ArrivalTime AS VARCHAR(5)) ,SegmentTime FROM DailyFlightSchedules WHERE FlightNo IN ('PX008', 'PX009') FlightNo Origin Destination DepartureTime ArrivalTime ArrivalDay SegmentTime PX008 POM HKG 16:35 20:55 380 PX009 HKG POM 22:55 07:35 +1 400
POM is +10:00 and HKG is +08:00 (time zone offsets). For the POM to HKG flight, the elapsed time appears to be 20:55 – 16:35 or 4 hours and 20 minutes (260 minutes), however since POM is +2 hours later the actual flight time is increased by 120 minutes, so the result of 380 minutes for the segment time shown is correct. In typical travel vernacular, you’ve “gained” 2 hours flying from west to east due to the time change.
The similar calculation for HKG to POM reverses this effect, so the apparent elapsed time (1:05 + 07:35 = 8:40 = 520 minutes) is reduced by 120 minutes to achieve a 400 minute segment time. In this case you’ve “lost” 2 hours flying from east to west. Differences in segment times in this case are likely due to the prevailing trade winds.
The code to calculate the Haversine distance between the origin and destination coordinates is shown below.
CREATE FUNCTION HaversineDistance (@OriginLat FLOAT ,@OriginLong FLOAT ,@DestLat FLOAT ,@DestLong FLOAT) RETURNS TABLE WITH SCHEMABINDING RETURN SELECT Miles=CASE -- Fudge the case when Lat/Long for Origin/Destination is the same to avoid -- possibly getting a floating point error. WHEN @OriginLat = @DestLat AND @OriginLong = @DestLong THEN CAST(0. AS FLOAT) ELSE -- Replace 3960 with 6371 for KM instead of miles (radius of the Earth) ( 3960. * ACOS( COS( RADIANS( @OriginLat ) ) * COS( RADIANS( @DestLat ) ) * COS( RADIANS( @DestLong ) - RADIANS( @OriginLong ) ) + SIN( RADIANS( @OriginLat ) ) * SIN( RADIANS( @DestLat ) ) ) ) END
The Haversine formula is used to calculate the distance between two points on a sphere. Technically the Earth is not a sphere, rather it is an oblate ellipsoid (the diameter through the poles is shorter than the diameter directly through the Earth along any two opposite points at the equator), and there are methods to adjust the Haversine formula based on latitude, which were not employed in this FUNCTION. The FUNCTION also does not take into account altitude differences between the origin and destination points. So if you have a more accurate method to calculate the difference between geographic coordinates on the Earth, by all means use it!
Towards the end of the script that sets up our daily flight schedules, we see the following code (many data rows are omitted for the sake of brevity) that encodes the operation days for each flight segment.
;WITH PNGFlights (FlightNo, Origin, Destination, DepartureTime, ArrivalTime ,MO, TU, WE, TH, FR, SA, SU) AS ( SELECT 'PX003','POM','BNE','06:30','09:25','YES','NO','YES','YES','NO','NO','YES' UNION ALL SELECT 'PX003','POM','BNE','09:30','12:25','NO','YES','NO','NO','NO','NO','NO' UNION ALL SELECT 'PX003','POM','BNE','12:15','15:15','NO','NO','NO','NO','NO','YES','NO' -- [snip] UNION ALL SELECT 'CG4817','EFG','POM','08:50','09:10','NO','NO','NO','NO','YES','NO','NO' UNION ALL SELECT 'CG4820','POM','KGW','08:00','08:25','NO','NO','YES','NO','NO','NO','NO' UNION ALL SELECT 'CG4820','KGW','MMV','08:35','08:45','NO','NO','YES','NO','NO','NO','NO' UNION ALL SELECT 'CG4823','MMV','POM','08:55','09:20','NO','NO','YES','NO','NO','NO','NO') UPDATE b SET [DayOfWeek]= CASE UPPER(SU) WHEN 'YES' THEN 1 ELSE 0 END + CASE UPPER(MO) WHEN 'YES' THEN 2 ELSE 0 END + CASE UPPER(TU) WHEN 'YES' THEN 4 ELSE 0 END + CASE UPPER(WE) WHEN 'YES' THEN 8 ELSE 0 END + CASE UPPER(TH) WHEN 'YES' THEN 16 ELSE 0 END + CASE UPPER(FR) WHEN 'YES' THEN 32 ELSE 0 END + CASE UPPER(SA) WHEN 'YES' THEN 64 ELSE 0 END FROM DailyFlightSchedules b INNER JOIN PNGFlights a ON a.FlightNo = b.FlightNo AND a.Origin = b.Origin AND a.Destination = b.Destination AND a.DepartureTime = CAST(b.DepartureTime AS VARCHAR(5)) AND a.ArrivalTime = CAST(b.ArrivalTime AS VARCHAR(5))
The encoding scheme for our example is based on @@DATEFIRST = 7, where DATEPART(dw, '2012-11-12') returns 2 (a Monday). You will see later how this is easily decoded.
Adding Additional Flights to the DailyFlightSchedules Table
If you look at the setup data for DailyFlightSchedules, you’ll see that each flight segment is INSERTed with its own individual INSERT statement. Because of the IDENTITY PRIMARY KEY on this table, we must ensure that each flight segment is inserted in the sequence that it occurs. We cannot easily use the departure time to accomplish this, because in cases where a departure of a segment occurs in the next day it becomes problematic (although not impossible). So we chose to ensure the ordering of our segments in this manner.
To INSERT a new flight segment with all the columns necessary to support queries against that row, we use an INSERT that looks like this:
INSERT INTO DailyFlightSchedules (Airline, Flight, Origin, Destination, DepartureTime, ArrivalTime ,EffectiveStart, EffectiveEnd, [Day], [DayOfWeek] ,SegmentTime, SegmentDistanceMiles) SELECT LEFT(FlightNo, 2), SUBSTRING(FlightNo, 2, LEN(FlightNo)), Origin, Destination ,DepartureTime, ArrivalTime ,EffectiveStart, EffectiveEnd, [Day], [DayOfWeek] ,SegmentTimeMinutes, Miles FROM ( VALUES('XX100', 'BKK', 'HKG', '22:00', '02:25', NULL, NULL, ' ', '+1' ,'YES', 'YES', 'YES', 'YES', 'NO', 'NO', 'NO') ) a (FlightNo, Origin, Destination, DepartureTime, ArrivalTime ,EffectiveStart, EffectiveEnd, DepartureDay, ArrivalDay ,MO, TU, WE, TH, FR, SA, SU) CROSS APPLY ( SELECT CASE UPPER(SU) WHEN 'YES' THEN 1 ELSE 0 END + CASE UPPER(MO) WHEN 'YES' THEN 2 ELSE 0 END + CASE UPPER(TU) WHEN 'YES' THEN 4 ELSE 0 END + CASE UPPER(WE) WHEN 'YES' THEN 8 ELSE 0 END + CASE UPPER(TH) WHEN 'YES' THEN 16 ELSE 0 END + CASE UPPER(FR) WHEN 'YES' THEN 32 ELSE 0 END + CASE UPPER(SA) WHEN 'YES' THEN 64 ELSE 0 END ,CASE RIGHT(DepartureDay, 1) WHEN '1' THEN '1' ELSE '0' END + CASE RIGHT(ArrivalDay, 1) WHEN '1' THEN '1' ELSE '0' END ) b([DayOfWeek], [Day]) CROSS APPLY CalculateSegmentTime(a.Origin, a.Destination, a.DepartureTime, a.ArrivalTime ,[Day]) c CROSS APPLY ( SELECT IATA_Code, LatDec, LongDec FROM Airports WHERE IATA_Code = a.Origin) d CROSS APPLY ( SELECT IATA_Code, LatDec, LongDec FROM Airports WHERE IATA_Code = a.Destination) e OUTER APPLY HaversineDistance(d.LatDec, d.LongDec, e.LatDec, e.LongDec) f
If this appears a bit unwieldy to you (it does to me), simply encapsulate the logic in a stored procedure to which you pass the VALUES to be INSERTed.
Note that the above code assumes your UI has captured the operating days of the week (MO, TU, WE, etc.) as a YES/NO checkbox. You may want your UI to capture that information in an alternative manner, e.g., as a string that looks like ‘MOTUWETHFRSASU’ (or some other way), so in that case simply modify the operation in derived table b accordingly.
Searching on Flight Origins/Destinations and Departure/Arrival Dates
Our first goal is to create a function that supports searching on origins and destinations, and also by departure date. Let’s first look at the heart of this query, which is to identify unique flights taking into account those that have different itineraries on different week days, and also to add a segment number. The SegmentID column in the table is not suitable as our segment number because we really want to see a natural number from 1, 2, ..., n.
Let’s look at the following basic query for this purpose.
WITH Flights AS ( SELECT SegmentID, FlightNo, Origin, Destination ,DepartureTime ,ArrivalTime ,[Day] -- Decode the [DayOfWeek] saved in the table ,[DaysOfWeek]=CASE [DayOfWeek] % 2 WHEN 1 THEN 'SU' ELSE '^^' END + CASE ([DayOfWeek]/2) % 2 WHEN 1 THEN 'MO' ELSE '^^' END + CASE ([DayOfWeek]/4) % 2 WHEN 1 THEN 'TU' ELSE '^^' END + CASE ([DayOfWeek]/8) % 2 WHEN 1 THEN 'WE' ELSE '^^' END + CASE ([DayOfWeek]/16) % 2 WHEN 1 THEN 'TH' ELSE '^^' END + CASE ([DayOfWeek]/32) % 2 WHEN 1 THEN 'FR' ELSE '^^' END + CASE ([DayOfWeek]/64) % 2 WHEN 1 THEN 'SA' ELSE '^^' END ,DepartureDay, ArrivalDay -- The PARTITION below identifies cases where the same flight number -- operates on a different schedule on different days of the week ,SegmentNo=ROW_NUMBER() OVER ( PARTITION BY FlightNo, [DayOfWeek] ORDER BY SegmentID) FROM DailyFlightSchedules) SELECT FlightNo, SegmentNo ,Origin, Destination ,DepartureTime=CAST(DepartureTime AS VARCHAR(5)) ,ArrivalTime=CAST(ArrivalTime AS VARCHAR(5)) ,ArrivalDay ,[DaysOfWeek] FROM Flights WHERE FlightNo IN ('PX009','PX011','PX054','PX055','CG100', 'CG1403', 'CG4352')
The results displayed include the following information:
FlightNo SegmentNo Origin Destination DepartureTime ArrivalTime ArrivalDay DaysOfWeek CG100 1 POM CNS 15:40 17:50 SU^^^^^^^^^^^^ CG100 1 POM CNS 17:30 19:40 ^^MOTUWETHFR^^ CG1403 1 TBG UNG 07:45 08:10 SU^^^^^^^^^^^^ CG1403 2 UNG HGU 08:40 09:35 SU^^^^^^^^^^^^ CG1403 3 HGU POM 10:05 11:30 SU^^^^^^^^^^^^ CG4352 1 BOT OBX 09:10 09:30 SU^^^^^^^^^^^^ CG4352 2 OBX SKC 09:40 10:05 SU^^^^^^^^^^^^ CG4352 3 SKC DAU 10:15 11:05 SU^^^^^^^^^^^^ CG4352 4 DAU OPU 11:25 11:55 SU^^^^^^^^^^^^ CG4352 5 OPU KRI 12:05 12:50 SU^^^^^^^^^^^^ CG4352 6 KRI POM 13:10 14:45 SU^^^^^^^^^^^^ CG4352 1 POM KRI 09:00 10:35 ^^^^^^^^^^^^SA CG4352 2 KRI OPU 10:55 11:40 ^^^^^^^^^^^^SA CG4352 3 OPU DAU 11:50 12:20 ^^^^^^^^^^^^SA CG4352 4 DAU WPM 12:40 13:00 ^^^^^^^^^^^^SA CG4352 5 WPM SKC 13:10 13:50 ^^^^^^^^^^^^SA CG4352 6 SKC OBX 14:00 14:25 ^^^^^^^^^^^^SA CG4352 7 OBX LMY 14:35 15:00 ^^^^^^^^^^^^SA CG4352 8 LMY UNG 15:10 15:40 ^^^^^^^^^^^^SA PX009 1 HKG POM 22:55 07:35 +1 ^^^^TU^^^^^^^^ PX011 1 MNL POM 21:45 05:05 +1 ^^^^^^WE^^^^SA PX054 1 POM NRT 14:35 20:15 ^^^^^^WE^^^^SA PX055 1 NRT POM 21:25 04:30 +1 ^^^^^^WE^^^^SA
Notice how flight CG100 has two departures at different times on Sunday vs. Monday through Friday, and flight CG1403 has 3 segments departing on Sundays only. CG4352 is also interesting because on Sundays it operates 6 segments but on Saturdays it operates 8 segments. CG4352 doesn’t even start at the same initial origin or end at the same final destination on the two operating days!
The remaining flights are examples of international flights, some of which arrive the next calendar day. Note the ArrivalDay (computed) column displays the encoded [Day] column in a format suitable for use in producing a passenger itinerary.
As a further example, let’s modify this query to handle effective dates based on a local variable (@DepartureDate), which we will rename to OperatingDate in a CROSS APPLY. Ultimately we’ll modify this SQL to return OperatingDate as an intermediate column from the CTE.
DECLARE @DepartureDate DATETIME = NULL -- '2013-01-06' ;WITH Flights AS ( SELECT SegmentID, FlightNo, Origin, Destination ,DepartureTime ,ArrivalTime ,[Day] ,[DaysOfWeek]=CASE [DayOfWeek] % 2 WHEN 1 THEN 'SU' ELSE '^^' END + CASE ([DayOfWeek]/2) % 2 WHEN 1 THEN 'MO' ELSE '^^' END + CASE ([DayOfWeek]/4) % 2 WHEN 1 THEN 'TU' ELSE '^^' END + CASE ([DayOfWeek]/8) % 2 WHEN 1 THEN 'WE' ELSE '^^' END + CASE ([DayOfWeek]/16) % 2 WHEN 1 THEN 'TH' ELSE '^^' END + CASE ([DayOfWeek]/32) % 2 WHEN 1 THEN 'FR' ELSE '^^' END + CASE ([DayOfWeek]/64) % 2 WHEN 1 THEN 'SA' ELSE '^^' END ,DepartureDay, ArrivalDay -- The PARTITION below identifies cases where the same flight number -- operates on a different schedule on different days of the week ,SegmentNo=ROW_NUMBER() OVER ( PARTITION BY FlightNo, [DayOfWeek] ORDER BY SegmentID) FROM DailyFlightSchedules OUTER APPLY (SELECT @DepartureDate) b(OperatingDate) WHERE (OperatingDate IS NULL OR EffectiveStart IS NULL OR OperatingDate >= EffectiveStart) AND (OperatingDate IS NULL OR EffectiveEnd IS NULL OR OperatingDate <= EffectiveEnd)) SELECT FlightNo, SegmentNo ,Origin, Destination ,DepartureTime=CAST(DepartureTime AS VARCHAR(5)) ,ArrivalTime=CAST(ArrivalTime AS VARCHAR(5)) ,ArrivalDay FROM Flights WHERE FlightNo IN ('XX001','XX002','XX003')
This has now become a catch all query, which you can read about in the linked article by SQL MCM Gail Shaw. Since ultimately we want our search function to be defined in a TVF, we cannot use the dynamic SQL approach described in that article.
If you test with the same flights as in the original query and NULL for @DepartureDate, you’ll get the same results as the first query. With the dummy flights (for airline XX) you can experiment with different dates to see that only appropriate flights appear for the selected departure date. For example, with @DepartureDate set to NULL, we get all three of the selected flights from BKK to POM. With @DepartureDate set to 2013-01-06 we get only the following two flights, because the effective dates of flight number XX002 indicate that it does not operate on the @DepartureDate.
FlightNo SegmentNo Origin Destination DepartureTime ArrivalTime ArrivalDay XX001 1 BKK POM 08:00 19:25 XX003 1 BKK POM 09:00 20:25
Developing this example further, we came up with the following FUNCTION than supports searching on origin, destination and departure dates.
CREATE FUNCTION FlightSchedule (@Airline CHAR(2) = NULL ,@Origin VARCHAR(3) = NULL ,@Destination VARCHAR(3) = NULL ,@DepartureDate DATETIME = NULL) RETURNS TABLE RETURN -- Select all flight segments with corresponding departure and arrival times WITH Flights AS ( SELECT SegmentID, FlightNo, Origin, Destination ,DepartureTime ,ArrivalTime ,[Day] -- Format for display ,[DaysOfWeek]=CASE [DayOfWeek] % 2 WHEN 1 THEN 'SU' ELSE '^^' END + CASE ([DayOfWeek]/2) % 2 WHEN 1 THEN 'MO' ELSE '^^' END + CASE ([DayOfWeek]/4) % 2 WHEN 1 THEN 'TU' ELSE '^^' END + CASE ([DayOfWeek]/8) % 2 WHEN 1 THEN 'WE' ELSE '^^' END + CASE ([DayOfWeek]/16) % 2 WHEN 1 THEN 'TH' ELSE '^^' END + CASE ([DayOfWeek]/32) % 2 WHEN 1 THEN 'FR' ELSE '^^' END + CASE ([DayOfWeek]/64) % 2 WHEN 1 THEN 'SA' ELSE '^^' END ,[DayOfWeek] ,DepartureDay, ArrivalDay -- The PARTITION below identifies cases where the same flight number -- operates on a different schedule on different days of the week ,SegmentNo=ROW_NUMBER() OVER ( PARTITION BY FlightNo, [DayOfWeek] ORDER BY SegmentID) ,SegmentTime ,SegmentDistanceMiles ,SegmentDistanceKM ,OperatingDate ,Airline FROM DailyFlightSchedules -- Empty string as @DepartureDate is handled as a NULL (no) value -- Drop the time portion of any actual date supplied. OUTER APPLY ( SELECT ISNULL(DATEADD(day, 0, DATEDIFF(day, 0, @DepartureDate)), '') ) b(OperatingDate) WHERE (OperatingDate = '' OR EffectiveStart IS NULL OR OperatingDate >= EffectiveStart) AND (OperatingDate = '' OR EffectiveEnd IS NULL OR OperatingDate <= EffectiveEnd)) SELECT a.FlightNo, SegmentNo, a.Origin, a.Destination ,DepartureTime=CASE WHEN OperatingDate = '' THEN CONVERT(VARCHAR(5), a.DepartureTime, 108) WHEN OperatingDate <> '' AND InitialDeparture > 1 AND a.SegmentNo > 1 AND [Day] = '11' THEN CONVERT(VARCHAR(10), OperatingDate, 121) + ' ' + CAST(a.ArrivalTime AS VARCHAR(5)) ELSE CONVERT(VARCHAR(16) ,DATEADD(day, CAST(LEFT([Day], 1) AS INT), OperatingDate) + ' ' + CAST(a.DepartureTime AS VARCHAR(5)),121) END ,ArrivalTime=CASE WHEN OperatingDate = '' THEN CONVERT(VARCHAR(5), a.ArrivalTime, 108) WHEN OperatingDate <> '' AND InitialDeparture > 1 AND a.SegmentNo > 1 AND [Day] = '11' THEN CONVERT(VARCHAR(10), OperatingDate, 121) + ' ' + CAST(a.ArrivalTime AS VARCHAR(5)) ELSE CONVERT(VARCHAR(16) ,DATEADD(day, CAST(RIGHT([Day], 1) AS INT), OperatingDate) + ' ' + CAST(a.ArrivalTime AS VARCHAR(5)),121) END ,DepartureDay=CASE WHEN OperatingDate IS NULL THEN DepartureDay ELSE ' ' END ,ArrivalDay=CASE WHEN OperatingDate IS NULL THEN ArrivalDay ELSE ' ' END -- Decode the DayOfWeek INT to a string of operating days ,DaysOfWeek ,SegmentTime ,SegmentDistanceMiles ,SegmentDistanceKM FROM Flights a INNER JOIN ( SELECT FlightNo, [DayOfWeek] ,InitialDeparture=CASE -- No (NULL) or empty string as origin supplied so include -- the earliest segment WHEN ISNULL(@Origin, '') = '' THEN MIN(SegmentNo) -- If % not in @Origin, we find the first segment that's an exact match -- to @Origin WHEN CHARINDEX('%', @Origin) = 0 THEN MIN(CASE WHEN Origin = @Origin THEN SegmentNo END) -- If % in @Origin, do a LIKE search to identify the first matching segment ELSE MIN(CASE WHEN Origin LIKE @Origin THEN SegmentNo END) END ,FinalArrival=CASE -- No (NULL) or empty string as destination supplied so include -- the earliest segment WHEN ISNULL(@Destination, '') = '' THEN MAX(SegmentNo) -- If % not in @Destination, we find the last segment that's an exact match -- to @Destination WHEN CHARINDEX('%', @Destination) = 0 THEN MAX(CASE WHEN Destination = @Destination THEN SegmentNo END) -- If % in @Destination, do a LIKE search to identify the last matching segment ELSE MAX(CASE WHEN Destination LIKE @Destination THEN SegmentNo END) END FROM Flights GROUP BY FlightNo, [DayOfWeek]) b ON a.FlightNo = b.FlightNo AND a.[DayOfWeek] = b.[DayOfWeek] WHERE -- Select only segments between the search criteria a.SegmentNo BETWEEN InitialDeparture AND FinalArrival AND -- Empty string as @Airline is handled as a NULL (no) value (ISNULL(@Airline, '') = '' OR Airline = @Airline) AND -- OperatingDate IS NULL if no @DepartureDate was supplied (OperatingDate = '' OR -- Decode the days of the week on which the flight operates -- Note: Sensitive to @@DATEFIRST (b.[DayOfWeek]/POWER(2, DATEPART(dw, OperatingDate) -1)) % 2 = 1)
Despite its seeming complexity, it remains an iTVF. Look at how InitialDeparture and FinalArrival are calculated and then used in the final WHERE clause. This method discards flight segments if they are not contained within the @Origin and @Destination parameters, while preserving leading segments when only @Destination is provided and trailing segments when only @Origin is provided.
Now let’s run some tests on this FUNCTION.
In the setup data, there is flight CG100, which departs at two different times from POM to CNS (Cairns, Australia). The later departure time operates five days a week, while the earlier departure operates on Sundays only. We can run the following query (including a few additional international flights) and see the results.
SELECT FlightNo, Origin, Destination ,DepartureTime, ArrivalTime, ArrivalDay, DaysOfWeek FROM FlightSchedule(NULL, NULL, NULL, NULL) WHERE FlightNo IN ('CG100', 'PX009', 'PX011', 'PX055') FlightNo Origin Destination DepartureTime ArrivalTime ArrivalDay DaysOfWeek CG100 POM CNS 17:30 19:40 ^^MOTUWETHFR^^ CG100 POM CNS 15:40 17:50 SU^^^^^^^^^^^^ PX009 HKG POM 22:55 07:35 +1 ^^^^TU^^^^^^^^ PX011 MNL POM 21:45 05:05 +1 ^^^^^^WE^^^^SA PX055 NRT POM 21:25 04:30 +1 ^^^^^^WE^^^^SA
We can pass an additional argument to FlightSchedule to include only flights for a selected airline.
SELECT FlightNo, Origin, Destination , DepartureTime, ArrivalTime, ArrivalDay, DaysOfWeek FROM FlightSchedule('CG', NULL, NULL, NULL) WHERE FlightNo IN ('CG100', 'PX009', 'PX011', 'PX055') FlightNo Origin Destination DepartureTime ArrivalTime ArrivalDay DaysOfWeek CG100 POM CNS 17:30 19:40 ^^MOTUWETHFR^^ CG100 POM CNS 15:40 17:50 SU^^^^^^^^^^^^
Let’s now run the same query, where we have dropped ArrivalDay because it won’t be needed, and look at the results when we pass in a Wednesday departure date. Our results show only flights that operate (depart) on Wednesdays. Note how the ArrivalTime’s date has been adjusted in cases where the arrival is the next calendar day.
SELECT FlightNo, Origin, Destination ,DepartureTime, ArrivalTime, DaysOfWeek FROM FlightSchedule(NULL, NULL, NULL, '2012-11-14') WHERE FlightNo IN ('CG100', 'PX009', 'PX011', 'PX055') FlightNo Origin Destination DepartureTime ArrivalTime DaysOfWeek CG100 POM CNS 2012-11-14 17:30 2012-11-14 19:40 ^^MOTUWETHFR^^ PX011 MNL POM 2012-11-14 21:45 2012-11-15 05:05 ^^^^^^WE^^^^SA PX055 NRT POM 2012-11-14 21:25 2012-11-15 04:30 ^^^^^^WE^^^^SA
It should be apparent but worth mentioning anyway, that the DepartureTime and ArrivalTime columns returned by both the FlightSchedule and FlightItinerary iTVFs are VARCHAR data type. These can easily be CAST or assigned to TIME or DATETIME data types as need be and without incident.
The flights with multiple segments make for some interesting results, so let’s look at a couple of those.
SELECT FlightNo, SegmentNo, Origin, Destination ,DepartureTime, ArrivalTime, DaysOfWeek FROM FlightSchedule(NULL, NULL, NULL, NULL) WHERE FlightNo IN ('CG1411', 'CG1616') FlightNo SegmentNo Origin Destination DepartureTime ArrivalTime DaysOfWeek CG1411 1 TBG UNG 08:00 08:25 ^^^^TU^^TH^^SA CG1411 2 UNG HGU 08:55 09:50 ^^^^TU^^TH^^SA CG1411 3 HGU GKA 10:20 10:50 ^^^^TU^^TH^^SA CG1411 4 GKA LAE 11:20 11:55 ^^^^TU^^TH^^SA CG1616 1 HGU GKA 08:45 09:15 ^^^^^^^^^^FR^^ CG1616 2 GKA MAG 09:40 10:10 ^^^^^^^^^^FR^^ CG1616 3 MAG WWK 10:35 11:25 ^^^^^^^^^^FR^^
If we are interested in searching for flights originating in GKA, we can run the following query.
SELECT FlightNo, SegmentNo, Origin, Destination ,DepartureTime, ArrivalTime, DaysOfWeek FROM FlightSchedule(NULL, 'GKA', NULL, NULL) FlightNo SegmentNo Origin Destination DepartureTime ArrivalTime DaysOfWeek CG1616 2 GKA MAG 09:40 10:10 ^^^^^^^^^^FR^^ CG1616 3 MAG WWK 10:35 11:25 ^^^^^^^^^^FR^^ CG1411 4 GKA LAE 11:20 11:55 ^^^^TU^^TH^^SA CG1410 2 GKA HGU 13:35 14:05 SUMOTUWETHFRSA PX961 1 GKA POM 10:45 11:55 SUMOTUWETHFRSA PX963 1 GKA POM 16:25 17:35 SUMOTUWETHFRSA
You can see in the above that flight CG1410 originates in GKA at segment 2. Likewise CG1411 originates at GKA in segment 4. In both cases, earlier segments don’t apply to our search criteria so they are excluded. Flights CG1616 also originates somewhere before GKA, so the segment with GKA as the origin is segment 2. Flights PX961 and PX963 are single segment (non-stop) flights originating in GKA.
We can do a similar query listing only flights that terminate in GKA.
SELECT FlightNo, SegmentNo, Origin, Destination ,DepartureTime, ArrivalTime, DaysOfWeek FROM FlightSchedule(NULL, NULL, 'GKA', NULL) FlightNo SegmentNo Origin Destination DepartureTime ArrivalTime DaysOfWeek CG1616 1 HGU GKA 08:45 09:15 ^^^^^^^^^^FR^^ CG1410 1 LAE GKA 12:30 13:05 ^^^^TU^^TH^^SA CG1411 1 TBG UNG 08:00 08:25 ^^^^TU^^TH^^SA CG1411 2 UNG HGU 08:55 09:50 ^^^^TU^^TH^^SA CG1411 3 HGU GKA 10:20 10:50 ^^^^TU^^TH^^SA PX960 1 POM GKA 09:15 10:25 SUMOTUWETHFRSA PX962 1 POM GKA 14:55 16:05 SUMOTUWETHFRSA
In this case, flight CG1410 has 3 additional segments that are not shown after it arrives in GKA. Since we have not supplied an origin, flight CG1411 lists all segments prior to the termination point of GKA.
Searching on both @Origin and @Destination limits the segments listed to only those that meet the parameterized query criteria.
Let’s look at one additional (dummy) flight example for international flights to compare the Day encoding with the results that we observe.
-- Raw data for flights XX004 and XX005 SELECT FlightNo, SegmentNo, Origin, Destination ,DepartureTime, DepartureDay ,ArrivalTime, ArrivalDay, SegmentTime FROM FlightSchedule(NULL, NULL, NULL, NULL) WHERE FlightNo IN ('XX004', 'XX005') FlightNo SegmentNo Origin Destination DepartureTime DepartureDay ArrivalTime ArrivalDay SegmentTime XX004 1 HKG LAX 23:45 20:15 750 XX004 2 LAX JFK 23:35 08:00 +1 325 XX005 1 HKG LAX 23:45 23:15 930 XX005 2 LAX JFK 01:35 +1 08:00 +1 205
-- Show departures for both flights regardless of origin -- for the departure date SELECT FlightNo, SegmentNo, Origin, Destination ,DepartureTime, ArrivalTime, SegmentTime FROM FlightSchedule(NULL, NULL, NULL, '2012-11-15') WHERE FlightNo IN ('XX004', 'XX005') FlightNo SegmentNo Origin Destination DepartureTime ArrivalTime SegmentTime XX004 1 HKG LAX 2012-11-15 23:45 2012-11-15 20:15 750 XX004 2 LAX JFK 2012-11-15 23:35 2012-11-16 08:00 325 XX005 1 HKG LAX 2012-11-15 23:45 2012-11-15 23:15 930 XX005 2 LAX JFK 2012-11-16 01:35 2012-11-16 08:00 205
-- Show departures from LAX as the origin for departure date SELECT FlightNo, SegmentNo, Origin, Destination ,DepartureTime, ArrivalTime, SegmentTime FROM FlightSchedule(NULL, 'LAX', NULL, '2012-11-15') FlightNo SegmentNo Origin Destination DepartureTime ArrivalTime SegmentTime XX004 2 LAX JFK 2012-11-15 23:35 2012-11-16 08:00 325 XX005 2 LAX JFK 2012-11-15 08:00 2012-11-15 08:00 205
Note that segment 2 of XX005 is encoded as 11. The importance of the last example is to show that segments that depart +1 day after the origin segment show properly when searching on a departure date.
In this case, there’s a big discrepancy on segment time because of the mocked up data, however the example does serve to show how to encode a segment when both departure and arrival times are the day after the originating flight.
Finally, some final points for conclusion:
- It is also possible to search on a wildcard origin and/or destination like LA%.
- When the departure date (operating date) is specified, only flights that operate on that particular weekday are listed.
- When the departure date (operating date) is specified, the DepartureDay and ArrivalDay columns always return blanks, because once the date is prepended to departure/arrival time, there is no need to show the +1 case for display.
Search and Display a Collapsed Itinerary
Instead of showing each segment of the flight in its own row, sometimes it is useful to show only one row per flight with a collapsed itinerary that shows all stops along the flight path, but only shows the times for the initial origin’s departure and the final destination’s arrival. Since our original TVF does the heavy lifting of identifying each flight and searching by parameter, we’ll use it to construct a new iTVF that shows collapsed itineraries.
CREATE FUNCTION FlightItinerary (@Airline CHAR(2) = NULL ,@Origin VARCHAR(3) = NULL ,@Destination VARCHAR(3) = NULL ,@DepartureDate DATETIME = NULL) RETURNS TABLE RETURN -- Select all flights from the FlightSchedule TVF using the -- same parameters passed to the FlightItinerary TVF WITH FlightSched AS ( SELECT FlightNo, SegmentNo ,Origin, Destination, DepartureTime, ArrivalTime, DepartureDay, ArrivalDay ,DaysOfWeek, SegmentTime, Miles=SegmentDistanceMiles, KM=SegmentDistanceKM FROM FlightSchedule(@Airline, @Origin, @Destination, @DepartureDate)) SELECT FlightNo ,Itinerary, DepartureTime, ArrivalTime, DepartureDay, ArrivalDay ,DaysOfWeek, TotalFlightTime, TotalMiles, TotalKM ,ElapsedTime=SegmentTimeMinutes FROM ( SELECT FlightNo -- Itinerary is the flattened list of 1st origin + all destinations ,Itinerary=( SELECT TOP 1 Origin + '-' FROM FlightSched b WHERE a.FlightNo = b.FlightNo AND a.DaysOfWeek = b.DaysOfWeek ORDER BY SegmentNo) + STUFF(( SELECT '-' + Destination FROM FlightSched b WHERE a.FlightNo = b.FlightNo AND a.DaysOfWeek = b.DaysOfWeek ORDER BY SegmentNo FOR XML PATH('')), 1, 1, '') -- Initial departure time is departure time for the first segment ,DepartureTime=( SELECT TOP 1 DepartureTime FROM FlightSched b WHERE a.FlightNo = b.FlightNo AND a.DaysOfWeek = b.DaysOfWeek ORDER BY SegmentNo) -- Final arrival time is arrival time for the last segment ,ArrivalTime=( SELECT TOP 1 ArrivalTime FROM FlightSched b WHERE a.FlightNo = b.FlightNo AND a.DaysOfWeek = b.DaysOfWeek ORDER BY SegmentNo DESC) -- Initial departure's departure day is from the first segment ,DepartureDay=( SELECT TOP 1 DepartureDay FROM FlightSched b WHERE a.FlightNo = b.FlightNo AND a.DaysOfWeek = b.DaysOfWeek ORDER BY SegmentNo) -- Final arrival's arrival day is from the last segment ,ArrivalDay=( SELECT TOP 1 ArrivalDay FROM FlightSched b WHERE a.FlightNo = b.FlightNo AND a.DaysOfWeek = b.DaysOfWeek ORDER BY SegmentNo DESC) ,DaysOfWeek ,TotalFlightTime=SUM(SegmentTime) ,TotalMiles=SUM(Miles) ,TotalKM=SUM(KM) FROM FlightSched a GROUP BY FlightNo, DaysOfWeek) a -- Create the encoded [Day] to pass to the next CROSS APPLY CROSS APPLY ( SELECT LEFT(Itinerary, 3), RIGHT(Itinerary, 3) ,CASE RIGHT(DepartureDay, 1) WHEN '1' THEN '1' ELSE '0' END + CASE RIGHT(ArrivalDay, 1) WHEN '1' THEN '1' ELSE '0' END ) b(Origin, Destination, [Day]) -- This CROSS APPLY calculates the total "segment" time or really the total time from -- the initial origin to the final destination CROSS APPLY CalculateSegmentTime(Origin, Destination, DepartureTime, ArrivalTime, [Day]) c
As you can see, this was pretty straightforward so our explanation is limited to the comments appearing in the code. Note that this iTVF is passed the identical arguments supplied to FlightSchedule.
Let’s give the FlightItinerary FUNCTION a try and compare the results to those of FlightSchedule:
SELECT FlightNo, Itinerary, DepartureTime, ArrivalTime ,ArrivalDay, DaysOfWeek FROM FlightItinerary(NULL, NULL, NULL, NULL) WHERE FlightNo IN ('CG1411', 'CG1616') FlightNo Itinerary DepartureTime ArrivalTime ArrivalDay DaysOfWeek CG1411 TBG-UNG-HGU-GKA-LAE 08:00 11:55 ^^^^TU^^TH^^SA CG1616 HGU-GKA-MAG-WWK 08:45 11:25 ^^^^^^^^^^FR^^
SELECT FlightNo, SegmentNo, Origin, Destination ,DepartureTime, ArrivalTime, DaysOfWeek FROM FlightSchedule(NULL, NULL, NULL, NULL) WHERE FlightNo IN ('CG1411', 'CG1616') FlightNo SegmentNo Origin Destination DepartureTime ArrivalTime DaysOfWeek CG1411 1 TBG UNG 08:00 08:25 ^^^^TU^^TH^^SA CG1411 2 UNG HGU 08:55 09:50 ^^^^TU^^TH^^SA CG1411 3 HGU GKA 10:20 10:50 ^^^^TU^^TH^^SA CG1411 4 GKA LAE 11:20 11:55 ^^^^TU^^TH^^SA CG1616 1 HGU GKA 08:45 09:15 ^^^^^^^^^^FR^^ CG1616 2 GKA MAG 09:40 10:10 ^^^^^^^^^^FR^^ CG1616 3 MAG WWK 10:35 11:25 ^^^^^^^^^^FR^^
Another interesting query is to look ahead to see what flights between origin and destination are operating within the next 7 days.
DECLARE @Days INT = 7 ,@StartingDay DATETIME = GETDATE() -- Today is: 2012-11-15 08:00 ,@Origin CHAR(3) = 'POM' ,@Destination CHAR(3) = 'WTP' ;WITH Calendar (OperatingDate) AS ( SELECT TOP (@Days) DATEADD(day ,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1 ,@StartingDay) FROM (VALUES ($),($),($),($),($),($),($),($),($),($)) a (n1) CROSS JOIN (VALUES ($),($),($),($),($),($),($),($),($),($)) b(n2)) SELECT FlightNo -- Use next 3 columns for FlightSchedule --,SegmentNo, Origin, Destination -- Use the next column for FlightItinerary ,Itinerary ,DepartureTime, ArrivalTime ,WD=UPPER(LEFT(DATENAME(dw, OperatingDate), 2)) ,DaysOfWeek FROM Calendar --CROSS APPLY FlightSchedule(NULL, @Origin, @Destination, OperatingDate) CROSS APPLY FlightItinerary(NULL, @Origin, @Destination, OperatingDate)
WD is the weekday of the operating (departure) date, shown to confirm that the flight operates on that day (you can check this against DaysOfWeek). In this query, you can replace the Calendar CTE with a calendar table should you have access to one. Results are reported starting on the current date.
FlightNo Itinerary DepartureTime ArrivalTime WD DaysOfWeek CG4806 POM-ONB-WTP 2012-11-15 06:30 2012-11-15 07:20 TH ^^^^TU^^TH^^^^ CG4802 POM-FNE-WTP 2012-11-19 06:30 2012-11-19 07:20 MO ^^MO^^^^^^^^^^ CG4806 POM-ONB-WTP 2012-11-20 06:30 2012-11-20 07:20 TU ^^^^TU^^TH^^^^
The FlightSchedule FUNCTION can also be used in this context (replace the Itinerary column with SegmentNo, Origin, Destination) to show the individual flight segment records. Here is the output with those modifications made:
FlightNo SegmentNo Origin Destination DepartureTime ArrivalTime WD DaysOfWeek CG4806 1 POM ONB 2012-11-15 06:30 2012-11-15 07:00 TH ^^^^TU^^TH^^^^ CG4806 2 ONB WTP 2012-11-15 07:10 2012-11-15 07:20 TH ^^^^TU^^TH^^^^ CG4802 1 POM FNE 2012-11-19 06:30 2012-11-19 07:00 MO ^^MO^^^^^^^^^^ CG4802 2 FNE WTP 2012-11-19 07:10 2012-11-19 07:20 MO ^^MO^^^^^^^^^^ CG4806 1 POM ONB 2012-11-20 06:30 2012-11-20 07:00 TU ^^^^TU^^TH^^^^ CG4806 2 ONB WTP 2012-11-20 07:10 2012-11-20 07:20 TU ^^^^TU^^TH^^^^
The FlightItinerary FUNCTION also returns four additional columns of interest:
- TotalFlightTime – Is the total time for all segments of the flight excluding layovers.
- TotalMiles – Is the total miles flown point to point to point.
- TotalKM – Is the total KM flown point to point to point.
- ElapsedTime – Is the total time for the flight including segments and layovers.
Display Mileage from Point to Point
Often times we are presented with a table that lists mileage between a series of points. This can easily be constructed knowing a list of points (airports) and using the HaversineDistance function to calculate the miles between them.
To do this, we’ll use the standard cross tab query approach, demonstrated below for five airports.
DECLARE @Airports VARCHAR(8000) = 'BKK;HKG;CNS;POM;LAX' SELECT [Origin/Destination]=Origin ,BKK=MAX(CASE Destination WHEN 'BKK' THEN Miles END) ,CNS=MAX(CASE Destination WHEN 'CNS' THEN Miles END) ,HKG=MAX(CASE Destination WHEN 'HKG' THEN Miles END) ,LAX=MAX(CASE Destination WHEN 'LAX' THEN Miles END) ,POM=MAX(CASE Destination WHEN 'POM' THEN Miles END) FROM ( SELECT Origin=a.Item, Destination=b.Item, Miles=ROUND(Miles, 0) FROM DelimitedSplit8K(@Airports, ';') a CROSS JOIN DelimitedSplit8K(@Airports, ';') b CROSS APPLY ( SELECT IATA_Code, LatDec, LongDec FROM Airports WHERE IATA_Code = a.Item) c CROSS APPLY ( SELECT IATA_Code, LatDec, LongDec FROM Airports WHERE IATA_Code = b.Item) d OUTER APPLY HaversineDistance(c.LatDec, c.LongDec, d.LatDec, d.LongDec) f) a GROUP BY Origin ORDER BY Origin Origin/Destination BKK CNS HKG LAX POM BKK 0 3745 1048 8255 3581 CNS 3745 0 3461 7192 524 HKG 1048 3461 0 7250 3147 LAX 8255 7192 7250 0 6833 POM 3581 524 3147 6833 0
Since the list of points may vary, we’ll create a stored procedure to generate our table using dynamic SQL.
CREATE PROCEDURE DisplayMileageChart (@Airports VARCHAR(8000) ,@MilesOrKM CHAR(1) = 'M') AS BEGIN DECLARE @SQL NVARCHAR(MAX) SELECT @SQL = 'SELECT [Origin/Destination]=Origin ' + ( SELECT ',[' + Item + ']=MAX(CASE Destination WHEN ''' + Item + ''' THEN Miles END) ' FROM DelimitedSplit8K(@Airports, ';') ORDER BY Item FOR XML PATH('')) + 'FROM ( SELECT Origin=a.Item, Destination=b.Item ,Miles=ROUND(Miles*CASE @MilesOrKM WHEN ''K'' THEN 1.60934 ELSE 1. END, 0) FROM DelimitedSplit8K(@Airports, '';'') a CROSS JOIN DelimitedSplit8K(@Airports, '';'') b CROSS APPLY ( SELECT IATA_Code, LatDec, LongDec FROM Airports WHERE IATA_Code = a.Item) c CROSS APPLY ( SELECT IATA_Code, LatDec, LongDec FROM Airports WHERE IATA_Code = b.Item) d OUTER APPLY HaversineDistance(c.LatDec, c.LongDec, d.LatDec, d.LongDec) f) a GROUP BY Origin ORDER BY Origin' EXEC sp_executesql @SQL ,N'@Airports VARCHAR(8000), @MilesOrKM CHAR(1)' ,@Airports=@Airports ,@MilesOrKM=@MilesOrKM END
You can test this SP using the following to see all PNG airports listed in the mileage table:
DECLARE @Airports VARCHAR(8000) = STUFF(( SELECT ';' + IATA_Code FROM Airports WHERE Country = 'PAPUA NEW GUINEA' ORDER BY IATA_Code FOR XML PATH('')), 1, 1, '') EXEC DisplayMileageChart @Airports
You can also pass the second argument as ‘K’ to display the results in KM. Note how named columns constructed within the dynamic SQL are bracketed ([]) to ensure no errors on airport codes that happen to be SQL reserved keywords (like AND=ANDERSON RGNL in ANDERSEN USA).
Try that query with Country = ‘USA’ and it will run for more than a minute with the 460+ USA airports in the Airports table!
Display Segment Time from Point to Point
Segment times from point to point must be based on actual flights, so the query is similar to the above but slightly different. What you’ll have are NULL values in table entries where there is no flight segment connecting the two points. In cases where multiple flights connect two points, we’ll choose the maximum segment time.
DECLARE @Airports VARCHAR(8000) = 'BKK;HKG;CNS;POM;LAX' SELECT [Origin/Destination]=Origin ,BKK=MAX(CASE Destination WHEN 'BKK' THEN SegmentTime END) ,CNS=MAX(CASE Destination WHEN 'CNS' THEN SegmentTime END) ,HKG=MAX(CASE Destination WHEN 'HKG' THEN SegmentTime END) ,LAX=MAX(CASE Destination WHEN 'LAX' THEN SegmentTime END) ,POM=MAX(CASE Destination WHEN 'POM' THEN SegmentTime END) FROM ( SELECT Origin=a.Item, Destination=b.Item, SegmentTime FROM DelimitedSplit8K(@Airports, ';') a CROSS JOIN DelimitedSplit8K(@Airports, ';') b OUTER APPLY ( SELECT SegmentTime FROM DailyFlightSchedules WHERE Origin = a.Item and Destination = b.Item) c) a GROUP BY Origin ORDER BY Origin
In this case, our resulting table is quite sparse due to lack of available connecting flight segment information.
Origin/Destination BKK CNS HKG LAX POM BKK NULL NULL 205 NULL 505 CNS NULL NULL NULL NULL 105 HKG NULL NULL NULL 930 400 LAX NULL NULL NULL NULL NULL POM NULL 165 380 NULL 380
Let’s once again put this into a stored procedure so we can use dynamic SQL to construct the table from as many points as necessary. Since the NULLs are a bit ugly, we’ll replace them with N/A; while at the same time preserve the integrity of the calculated segment miles.
CREATE PROCEDURE DisplaySegmentTimes (@Airports VARCHAR(8000)) AS BEGIN DECLARE @SQL NVARCHAR(MAX) SELECT @SQL = 'SELECT [Origin/Destination] ' + ( SELECT ',[' + Item + ']=CASE WHEN [' + Item + '] IS NULL THEN ''N/A'' ELSE CAST([' + Item + '] AS VARCHAR(5)) END ' FROM DelimitedSplit8K(@Airports, ';') ORDER BY Item FOR XML PATH('')) + 'FROM (SELECT [Origin/Destination]=Origin ' + ( SELECT ',[' + Item + ']=MAX(CASE Destination WHEN ''' + Item + ''' THEN SegmentTime END) ' FROM DelimitedSplit8K(@Airports, ';') ORDER BY Item FOR XML PATH('')) + ' FROM ( SELECT Origin=a.Item, Destination=b.Item, SegmentTime FROM DelimitedSplit8K(@Airports, '';'') a CROSS JOIN DelimitedSplit8K(@Airports, '';'') b OUTER APPLY ( SELECT SegmentTime FROM DailyFlightSchedules WHERE Origin = a.Item and Destination = b.Item) c) a GROUP BY Origin) a ORDER BY [Origin/Destination]' EXEC sp_executesql @SQL ,N'@Airports VARCHAR(8000)', @Airports=@Airports END
This SP can be tested using all flights within PNG as follows:
DECLARE @Airports VARCHAR(8000) = STUFF(( SELECT ';' + IATA_Code FROM Airports WHERE Country = 'PAPUA NEW GUINEA' ORDER BY IATA_Code FOR XML PATH('')), 1, 1, '') EXEC DisplaySegmentTimes @Airports
Daylight Savings Time Adjustments to Scheduled Flights
There are those that say I have a penchant for understatement, and the simplified approach to this problem is a clear case of that. However, I would be remiss to not at least mention the impact of Daylight Savings Time (DST) on this problem, even though consideration of that was not required for my POC based on the flight schedules of PNG. Let’s consider a few cases and how DST impacts them.
The first case considers only domestic and no international routing. Depending on your country (not all observe DST) you may not need to deal with DST at all. In which case, what you’ve seen to this point should work for you without any related modifications.
If your country has only one time zone (that’s quite a large number of countries) or just a few, but does observe DST and you are dealing with local routing only, there is minimal impact. Since flight times would be expected to adjust with time changes, as long as the flight does not span the time of the time change, there is no impact. Remember most flights don’t operate in the early morning hours when the clocks change and most people are sleeping. However, there are some that do, e.g., red-eye flights in the US. If you only have a few score flights that operate during a time change, it would be possible to use the effective date columns of the DailyFlightSchedules table to handle this, making adjustments as necessary to the arrival time while keeping the segment time fixed.
Once you throw international routing into the fray, it becomes a serious issue because DST is observed in different ways and on different dates in different countries. In my case of PNG, I could more or less ignore it because the only consideration was either arrival time of the international flight in PNG or its departure time from PNG. Any necessary adjustments can be handled using the flight effective dates.
But if you want a truly global picture that automatically adjusts flight arrival and departure times based on when DST causes the clocks to “spring forward” or “fall back,” you’ll have a highly complex issue to deal with. Normally, most countries either observe DST or they do not, and if they do it is usually on the same schedule across all of their time zones. In this case, it would be possible to include a DST table that lists country, time zone (e.g., as UTC offset), DST adjustment amount (usually one hour but possibly not) and effective DATETIME fields that indicate precisely when the clock is changed. This would then need to be integrated as an adjustment into the CalculateSegmentTime routine (for flights that span a clock change) and also into the display routines for departure and arrival dates. So for example when an international departure departs for your country, you would need to adjust the arrival time in your country to account for the new DST in the originating country, assuming that the departure time of that flight did not change.
I mentioned linehaul trucking as another example to which the flight schedules routines could be adapted. Linehaul trucks are significantly more likely to be operating when there is a time change (many long haul truck routes run overnight), so could be more heavily impacted than flight schedules. Of course linehaul trucks may also operate routes that take several days to complete and that is a separate complication.
Further complicating the matter, your company will adjust its server clocks to the time change. While this would not impact any of the routines suggested in this article, it may have other impacts to a system that uses these flight schedules.
You can probably now appreciate why we did not include the details of this in the article.
Conclusions and Final Remarks
Airline flight schedule problems can be quite convoluted and complex. If handled with a reasonably sound design, some of the required results can be quite simple to achieve. The accuracy of the results will always be highly dependent on the quality of the data loaded into the base tables.
Along the way we’ve seen examples of indexing, foreign key constraints, default values and computed columns. We may have learned about the Haversine formula and a little bit about geographical coordinates and how they can be used to compute distances around the globe. Finally, we may also have learned how easy it is to calculate elapsed time across time zones using UTC time zone offsets.
We will not try to convince our readers that we’ve provided the all singing, all dancing solution to problems of this sort. Rather what we’ve attempted to achieve is a reasonable basis or starting point for building such an application. In terms of my POC for PNG, the results should indeed meet these criteria.
In the discussion thread that follows, we look forward to our intrepid readers suggesting refinements and alternative designs. In particular, we hope to improve the quality of our base table information (Airlines and Airports) and we hope to post in the discussion thread additional scripts to improve the data we have so far obtained. You may wish to note that at the end of the Set Up Airports.sql script, there is a MERGE statement used to update the airports of Thailand. This small script could be adapted to use for any country and we’re hoping our readers might wish to contribute source data for their particular countries. The SQLverse would be woefully incomplete without the global contributions made possible only by the SQL Server Central web site.
We hope that our excursion into the world of airline flight schedules has been useful to our valued readers. We also hope that even if you have no comments to post, you’ll sign up for the discussion thread so you receive regular updates on the progress of this interesting application.
-- Dwain Camps, MS SQL Aficionado