I stumbled across the "findin flight legs" case study. However I could not fin the web page, so here is a copy from the wayback machine..
Original
Jonathan Gennick
Writer * Book Editor * Oracle DBA * SQL & PL/SQL Developer
Father * Husband * Son * Mountain Biker * EMT
Finding Flight Legs
In response to a call for interesting applications of union queries, Nuno Souto sent me a query that applied INTERSECT to the results of two CONNECT BY queries as part of a solution to determine possible flight paths from one point to another. I was attracted to Nuno's solution, because I'd recently written some articles on CONNECT BY, and Nuno's use of it didn't fit the canonical parts-explosion pattern so often used in examples of recursive queries.
The Background
Imagine with me that you're in the package transportation business. Your company manages a large number of planes that fly a variety of flights paths between airports. A single plane will stop at one or more destinations. For example, on a given day you might have:- Plane QF1 that flies from A to B, and then on to Z.
- Plane QF2 that flies from A to Z directly, and then stops.
- Plane QF3 that flies from D to E to F, and finally to Z.
create table flegs ( flt varchar2(10), /* flight name */ loc_st varchar2(10), /* leg start */ loc_en varchar2(10), /* leg end */ tim_st date, /* departure time */ tim_en date /* arrival time */ );Each flight is a trip made by a single plane consisting of one or more legs. Each leg has starting and ending location, and also a starting (departure) and ending (arrival) time. Of course in a real system you'd have additional columns, primary keys, and foreign keys to tables with other information. To make the technique described in this article easy to understand, Nuno and I have simplified the example to its essential elements.
The Problem
Here's the problem. A customer has a package that you need to ship from A to Z on July 30, 2003. You won't receive the package from your customer until 0600 (6:00 AM), and you must have the package at its destination by 1500 (3:00PM). You need to find the possible flight solutions for travel from A to Z in that time frame. This isn't as simple as looking for flights that begin at A and end at Z. You must also find flight solutions that begin at A and that stop at one or more other destinations on their way to Z. Notice I said "flight solutions". You must consider the possibility of switching planes (i.e. switching flights) midstream. To solve this problem, Nuno applied the following approach:- Find all flights leaving point A, and all subsequent legs of those flights, during the specified time period.
- Find all flights arriving at point Z, and all preceding legs of those flights, also during the specified time period.
- Take the intersection of #1 and #2.
Where Can We Go From Point A?
Consider the flights departing from point A. Getting the first leg of each flight departing from point A is easy; you can simply check the loc_st column:select flt,loc_st,loc_en,tim_st,tim_en from flegs where tim_st between to_date('20030730 060000','yyyymmdd hh24miss') and to_date('20030730 150000','yyyymmdd hh24miss') and loc_st = 'A';But this isn't good enough. You need more than just the first leg of each flight leaving point A, you need *every* leg. You need every leg so you can determine whether any of those flights eventually arrive at point Z. Nuno correctly recognized this as a recursive SQL problem:
select flt,loc_st,loc_en,tim_st,tim_en from flegs where tim_st between to_date('20030730 060000','yyyymmdd hh24miss') and to_date('20030730 150000','yyyymmdd hh24miss') connect by prior loc_en = loc_st and prior tim_en < tim_st start with loc_st = 'A' and tim_st >= to_date('20030730 060000','yyyymmdd hh24miss');The START WITH clause specifies A as the beginning point. The CONNECT BY clause links the beginning of each subsequent leg with the ending of the previous. Legs are linked based on both location and time. We're looking for flights into, and then out of, a location, and departing flights must always occur later in time than arriving flights. The results are as follows:
FLT LOC_ST LOC_EN TIM_ST TIM_EN ---------- ---------- ---------- --------------- --------------- QF1 A B 20030730 070000 20030730 100000 QF1 B Z 20030730 110000 20030730 120000 QF2 A Z 20030730 080000 20030730 120000 QF5 A B 20030730 070000 20030730 090000 QF1 B Z 20030730 110000 20030730 120000 QF3 B D 20030730 100000 20030730 110000 QF3 D E 20030730 120000 20030730 130000 QF3 E F 20030730 140000 20030730 150000 QF6 A G 20030730 080000 20030730 090000 QF6 G H 20030730 100000 20030730 110000We now know where we can go beginning from point A between 0600 and 1500 on 30-Jul-2003. Not all flights from A will get us to Z. QF6, for example, gets us to H, which is not our destination. I'll explain how Nuno winnowed these results down shortly. It's interesting to notice that one path to Z involves switching flights. The results show the following solution:
- Depart point A on QF5 at 0700, arriving into point B at 0900.
- Depart point B on QF1 at 1100, arriving into point Z at 1200.
"From Where Can We Arrive At Point Z?"
Nuno's next step was to look at all flights into point Z, also during the specified time period. The query is much the same as in the previous section, except it now recursively traverses the tree of flight legs in the reverse direction, working its way backwards in time:select flt,loc_st,loc_en,tim_st,tim_en from flegs where tim_en between to_date('20030730 060000','yyyymmdd hh24miss') and to_date('20030730 150000','yyyymmdd hh24miss') connect by prior loc_st = loc_en and prior tim_st > tim_en start with loc_en = 'Z' and tim_en <= to_date('20030730 150000','yyyymmdd hh24miss');Following are the results:
FLT LOC_ST LOC_EN TIM_ST TIM_EN ---------- ---------- ---------- --------------- --------------- QF1 B Z 20030730 110000 20030730 120000 QF1 A B 20030730 070000 20030730 100000 QF5 A B 20030730 070000 20030730 090000 QF2 A Z 20030730 080000 20030730 120000 QF7 O Z 20030730 140000 20030730 150000 QF7 M O 20030730 120000 20030730 130000Again, not all flights arriving into Z pass through A on their way to Z. QF7 is a notable example, beginning at M and passing through O on the way to Z.
Finding Legs Common to Both A and Z
CONNECT BY queries have to start someplace. One of the things I found insightful about Nuno's solution is that he realized the need for two queries: one to fan out and follow flights radiating outwards from point A, and the other to fan out in reverse from point Z, tracing backwards the paths of arriving flights. Any chain of flight legs returned in common by these two queries represents a solution to the problem of getting from point A to point Z. For example, both queries returned legs from flight QF1, albeit in slightly different order:Query radiating outwards from A: QF1 A B 20030730 070000 20030730 100000 QF1 B Z 20030730 110000 20030730 120000 Query radiating inwards towards Z: QF1 B Z 20030730 110000 20030730 120000 QF1 A B 20030730 070000 20030730 100000You may need to think about this a bit, and perhaps run some of these queries yourself using the test data at the end of this article, but you should realize that the only way these two legs can possibly appear in both result sets is if they represent a solution for flying from A to Z during the specified time frame. Thus, the following INTERSECT query can be used to return all such flight legs:
select flt,loc_st,loc_en,tim_st,tim_en from flegs where tim_st between to_date('20030730 060000','yyyymmdd hh24miss') and to_date('20030730 150000','yyyymmdd hh24miss') connect by prior loc_en = loc_st and prior tim_en < tim_st start with loc_st = 'A' and tim_st >= to_date('20030730 060000','yyyymmdd hh24miss') INTERSECT select flt,loc_st,loc_en,tim_st,tim_en from flegs where tim_en between to_date('20030730 060000','yyyymmdd hh24miss') and to_date('20030730 150000','yyyymmdd hh24miss') connect by prior loc_st = loc_en and prior tim_st > tim_en start with loc_en = 'Z' and tim_en <= to_date('20030730 150000','yyyymmdd hh24miss');For any set of flight legs *not* representing a solution for travelling from A to Z, there are only three possibilities:
- The set of flight legs originates at A and does not pass through Z.
- The set of flight legs terminates at Z, but does not pass through A on the way to Z.
- The set of flight legs does not involve A or Z at all.
FLT LOC_ST LOC_EN TIM_ST TIM_EN ---------- ---------- ---------- --------------- --------------- QF1 A B 20030730 070000 20030730 100000 QF1 B Z 20030730 110000 20030730 120000 QF2 A Z 20030730 080000 20030730 120000 QF5 A B 20030730 070000 20030730 090000These results show three solutions for getting from A to Z:
- Fly QF1 from A to B to Z.
- Fly QF2 directly from A to Z.
- Fly QF5 from A to B, and then fly QF1 from B to Z.
What About the Time Period?
When I first examined Nuno's queries, I was a bit puzzled over the following parts of his WHERE clauses:Query radiating outwards from A: where tim_st between to_date('20030730 060000','yyyymmdd hh24miss') and to_date('20030730 150000','yyyymmdd hh24miss') Query radiating inwards towards Z: where tim_en between to_date('20030730 060000','yyyymmdd hh24miss') and to_date('20030730 150000','yyyymmdd hh24miss')These predicates are not strictly necessary for proper operation of the query. They are an optimization. Given appropriate indexes on the tim_st and tim_en columns, these predicates enable the database to eliminate many rows from consideration before getting to the INTERSECT operation. So how does the query constrain solutions to the given time period? The key lies in the START WITH clauses:
The first query starts with the beginning time: start with loc_st = 'A' and tim_st >= to_date('20030730 060000','yyyymmdd hh24miss') The second query starts with the ending time: start with loc_en = 'Z' and tim_en <= to_date('20030730 150000','yyyymmdd hh24miss');It's possible for a chain of flight legs to leave A during the specified time period, yet reach Z after the specified ending time. Those legs will be picked up by the first query, but not by the second. Likewise, it's possible for a chain of flight legs to arrive at Z during the specified time period, but depart from A too early. Such legs will be picked up by the second query, but not the first. Thus, spurious legs can appear in the result sets of each separate subquery, but only *solutions* will appear in both subqueries, and thus only solutions will be returned by the INTERSECT query. There's also the two time conditions in the CONNECT BY clauses of the subqueries:
and prior tim_en < tim_st and: and prior tim_st > tim_enThese conditions ensure that the queries link legs in a rational time sequence. It does no good, for example, to fly into B only to find that your next flight left before you arrived! These conditions also prevent what Nuno refers to as the "loop" problem: a series of legs might come back to the origin of a flight and then take off again in another direction, still under the same overall flight "name". Without a way of "serializing" the time of each leg, a query encountering such looping legs would fail with a "loop detected" error message.
Applications of Recursive SQL
A CONNECT BY query really represents a recursive SQL statement. If you're not familiar with the concept of recursive SQL, you might want to read my July article on the topic: Is Yours A UNION Shop? Recursive SQL, and especially CONNECT BY, is often presented and explained in the context of containership. You might have a bill-of-materials listing the parts making up a car, with each car part, such as an engine, being made up of other parts, such as pistons and rings, and each of those parts in turn being composed of yet more parts, and so forth on down to the smallest and simplest of parts such as nuts and bolts and springs. Nuno's solution demonstrates that CONNECT BY queries have wider application than the canonical bill-of-materials example. In the case flight legs, CONNECT BY linked each leg with the next on the path from one point to another. The recursion had nothing to do with containership, but rather had to do with a series: here's a leg, I have to find the next, but now I have another leg, so I have to find the next, but now I have yet another leg, so I have to find the next. Nuno's solution is creative and insightful. I enjoyed tearing it apart and examining in detail how all the pieces worked together towards the goal of finding options for flying from point A to point Z. Frankly, it's opened my eyes to a whole new class of problems that can be solved using CONNECT BY. I thank him for letting me write about it, and I encourage you to experiment with it for yourself.Acknowledgments
Many thanks to Nuno Souto for the very interesting SQL solution described in this article. Nuno came up with it. All I did was write about it.Example Data for the FLEGS Table
You can use the following INSERT statements to populate the FLEGS table created in the article. The data given here is the same data as used in this article's examples.insert into flegs values('QF1','A','B', to_date('20030730070000','yyyymmddhh24miss'), to_date('20030730100000','yyyymmddhh24miss')); insert into flegs values('QF1','B','Z', to_date('20030730110000','yyyymmddhh24miss'), to_date('20030730120000','yyyymmddhh24miss')); insert into flegs values('QF2','A','Z', to_date('20030730080000','yyyymmddhh24miss'), to_date('20030730120000','yyyymmddhh24miss')); insert into flegs values('QF3','B','D', to_date('20030730100000','yyyymmddhh24miss'), to_date('20030730110000','yyyymmddhh24miss')); insert into flegs values('QF3','D','E', to_date('20030730120000','yyyymmddhh24miss'), to_date('20030730130000','yyyymmddhh24miss')); insert into flegs values('QF3','E','F', to_date('20030730140000','yyyymmddhh24miss'), to_date('20030730150000','yyyymmddhh24miss')); insert into flegs values('QF4','F','Z', to_date('20030730160000','yyyymmddhh24miss'), to_date('20030730170000','yyyymmddhh24miss')); insert into flegs values('QF5','A','B', to_date('20030730070000','yyyymmddhh24miss'), to_date('20030730090000','yyyymmddhh24miss')); insert into flegs values('QF6','A','G', to_date('20030730080000','yyyymmddhh24miss'), to_date('20030730090000','yyyymmddhh24miss')); insert into flegs values('QF6','G','H', to_date('20030730100000','yyyymmddhh24miss'), to_date('20030730110000','yyyymmddhh24miss')); insert into flegs values('QF7','M','O', to_date('20030730120000','yyyymmddhh24miss'), to_date('20030730130000','yyyymmddhh24miss')); insert into flegs values('QF7','O','Z', to_date('20030730140000','yyyymmddhh24miss'), to_date('20030730150000','yyyymmddhh24miss')); COMMIT;