This article presents an algorithm to compute set union and set minus in a specific context. Each set is a set of contiguous stretches on a real line.
Problem Description
Union
Consider a scenario where two people are teamed up to contribute to a particular task, say attending to an infant child. As long as any one of them is available to look after the child, it's fine. For other times, they probably would have to look for an alternative arrangement. The time any one person is available can be represented as a set of durations. The time for which at least one of the two persons is available is nothing but the union of the sets of durations during which each one of them is available.
Set Minus
Consider a scenario when two people are looking for a slot wherein both are available for a discussion. This can be thought of a difference between the the available time of one person and the busy time of the other.
Solution Approach
The solution approach makes use of a variable level which keeps track of the starts and ends of durations. After placing all the important time instants on a time line, we scan the time line from left to right.
While computing union, we use the following strategy to set the level during our scan of the timeline:
- Each time any of the duration begins, we increment level by 1.
- Each time any of the durations ends, we decrement level by 1.
As per the above, the stretches of the timeline with level = 0 indicate stretches when none of the two sets is present.
The rest of the timeline will have positive values of level indicating that at least one of the two sets is present there.
In another pass of the timeline, we now collect all the continuous stretches of the timeline where level is not zero. This gives us the set of durations corresponding to the union of the two sets.
While computing set minus, we employ the following strategy for maintaining level.
- The starting point of a duration in the first set and end of a duration in the second set cause level to be incremented by 1.
- The end point of a duration in the first set and start of a duration in the second set cause level to be decremented by 1.
As per the above, the stretches of the timeline with level = 1 indicates stretches in the difference between the first set and the second.
In another pass, we collect the set of contiguous portions of the timeline where level = 1. This is our answer of set minus.
Extensions
The union and setminus functions can be used to implement other useful set operations on the above kind of sets, viz. complement and intersection.
Implementation
Representation of the sets
We represent the sets of durations as lists of pairs. s1 and s2 in the above figure (Timeline) have been represented as follows:
We represent the sets of durations as lists of pairs. s1 and s2 in the above figure (Timeline) have been represented as follows:
s1 = [(1, 5), (8, 10), (15, 20)] s2 = [(2, 4), (6, 12), (16, 21)]
Code Design
When we implement the above two algorithms as two completely separate functions union and setminus, we realise that they have a very common structure. This gives us the motivation to try and capture this commonality between the two functions in a piece of re-usable code. We do so by designing a higher-order function make_function, which is parameterised by another function which captures the specific elements of the two functions. f1 and f2 are the two functions capturing these specificities of union and set minus.
The make_function function prepares a function and returns it as a result. the two functions union and setminus are created by calling make_function with f1 and f2 as arguments respectively. Note that this could have been implemented alternatively by embedding a call to make_function in union and setminus. This is a perfectly good option when union and setminus get used only once. However, considering a scenario where union and setminus may get called several times in a larger program, the strategy used here will lead to a slightly improved speed, not to mention improved readability (arguable).
Code
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 | # utility function for the implementation of union def f1(s1, s2): START = 1 END = -1 l = [] for (s, e) in s1: l.append((s, START)) l.append((e, END)) for (s, e) in s2: l.append((s, START)) l.append((e, END)) return sorted(l, key = lambda (x, y): x) #utility function for implementation of setminus def f2(s1, s2): START1 = 1 END1 = -1 START2 = -1 END2 = 1 l = [] for (s, e) in s1: l.append((s, START1)) l.append((e, END1)) for (s, e) in s2: l.append((s, START2)) l.append((e, END2)) return sorted(l, key = lambda (x, y): x) # Higher order function to create union and setminus functions def make_function(f): # boilerplate function to be returned from make_function after being plugged with f. def g(s1, s2): level = 0 result = [] l = f(s1, s2) for (t, SE) in l: print (t, level) last = level level += SE if level == 1 and last == 0: start = t if level == 0 and last == 1: end = t result.append((start, end)) return result return g union = make_function(f1) setminus = make_function(f2) def t1(): s1 = [(1, 5), (8, 10), (15, 20)] s2 = [(2, 4), (6, 12), (16, 21)] print union(s1, s2) def t2(): s1 = [(1, 5), (8, 10), (15, 20)] s2 = [(2, 4), (6, 12), (16, 21)] print setminus(s1, s2) if __name__ == "__main__": t1() t2() |