Activity selection problem

The activity selection problem is a combinatorial optimization problem concerning the selection of non-conflicting activities to perform within a given time frame, given a set of activities each marked by a start time ($$s_i$$) and finish time ($$f_i$$). The problem is to select the maximum number of activities that can be performed by a single person or machine, assuming that a person can only work on a single activity at a time. The activity selection problem is also known as the interval scheduling maximization problem (ISMP), which is a special type of the more general interval scheduling problem.

A classic application of this problem is in scheduling a room for multiple competing events, each having its own time requirements (start and end time), and many more arise within the framework of operations research.

Formal definition
Assume there exist n activities, each represented by a start time $$s_i$$ and a finish time $$f_i$$. Two activities i and j are said to be non-conflicting if $$s_i \geq f_j$$ or $$s_j \geq f_i$$. The activity selection problem consists in finding a maximum-size solution set S of mutually non-conflicting activities; more precisely, there must exist no solution set S' such that |S'| > |S|. Several distinct solution sets may attain this maximum size, and any one of them is an acceptable answer.
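
As an illustration of the definition above, the following minimal Python sketch tests whether two activities are non-conflicting and whether a whole selection is feasible. The representation of an activity as a `(start, finish)` tuple and the function names `non_conflicting` and `is_feasible` are assumptions made for this example only.

```python
from itertools import combinations


def non_conflicting(a, b):
    """Return True if activities a and b do not conflict.

    Each activity is a (start, finish) pair; this mirrors the condition
    s_i >= f_j or s_j >= f_i from the definition.
    """
    (s_i, f_i), (s_j, f_j) = a, b
    return s_i >= f_j or s_j >= f_i


def is_feasible(selection):
    """Return True if every pair of activities in the selection is non-conflicting."""
    return all(non_conflicting(a, b) for a, b in combinations(selection, 2))
```

Under this definition an activity that starts exactly when another finishes does not conflict with it, so `non_conflicting((1, 4), (4, 6))` holds while `non_conflicting((1, 4), (3, 5))` does not.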

Optimal solution
The activity selection problem is notable in that a greedy algorithm, which repeatedly selects the compatible activity with the earliest finish time, always produces an optimal solution. A pseudocode sketch of the iterative version of the algorithm and a proof of the optimality of its result are included below.

Explanation
Line 1: This algorithm is called Greedy-Iterative-Activity-Selector because it is a greedy algorithm implemented iteratively. A recursive version of this greedy algorithm also exists. It takes the following inputs:
 * $$A$$ is an array containing the activities.
 * $$s$$ is an array containing the start times of the activities in $$A$$.
 * $$f$$ is an array containing the finish times of the activities in $$A$$.

Note that these arrays are indexed starting from 1 up to the length of the corresponding array.

Line 3: Sorts the array of activities $$A$$ in increasing order of finish time, using the finish times stored in the array $$f$$. This operation can be done in $$O(n \cdot \log n)$$ time using, for example, merge sort or heap sort; quick sort achieves the same bound in the average case.

Line 4: Creates a set $$S$$ to store the selected activities, and initialises it with the activity $$A[1]$$ that has the earliest finish time.

Line 5: Creates a variable $$k$$ that keeps track of the index of the last selected activity.

Line 9: Starts iterating from the second element of the array $$A$$ up to its last element.

Lines 10,11: If the start time $$s[i]$$ of the $$i$$-th activity ($$A[i]$$) is greater than or equal to the finish time $$f[k]$$ of the last selected activity ($$A[k]$$), then $$A[i]$$ is compatible with the selected activities in the set $$S$$, and thus it can be added to $$S$$.

Line 12: The index $$k$$ of the last selected activity is updated to $$i$$, the index of the activity $$A[i]$$ that was just added.
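
The steps described above can be sketched in Python as follows. This is an illustrative translation rather than the pseudocode the line numbers refer to: it uses 0-based indexing, represents each activity as a `(start, finish)` tuple instead of separate arrays $$s$$ and $$f$$, and returns an empty list for empty input (the description above assumes at least one activity). The function name `greedy_activity_selector` is an assumption for this example.

```python
def greedy_activity_selector(activities):
    """Select a maximum-size list of non-conflicting (start, finish) activities."""
    # Sort by finish time; Python's built-in sort runs in O(n log n).
    ordered = sorted(activities, key=lambda act: act[1])
    if not ordered:
        return []

    # Start with the activity that finishes earliest (the greedy choice).
    selected = [ordered[0]]
    last_finish = ordered[0][1]

    # Scan the remaining activities in order of finish time.
    for start, finish in ordered[1:]:
        # Compatible if it starts no earlier than the last selected one finishes.
        if start >= last_finish:
            selected.append((start, finish))
            last_finish = finish

    return selected


example = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9),
           (6, 10), (8, 11), (8, 12), (2, 14), (12, 16)]
print(greedy_activity_selector(example))
# [(1, 4), (5, 7), (8, 11), (12, 16)]
```

Tracking only the finish time of the last selected activity plays the role of the index $$k$$: because the activities are scanned in order of finish time, the last selected activity is the only one a new candidate needs to be compared against.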

Proof of optimality
Let $$S = \{1, 2, \ldots, n\}$$ be the set of activities ordered by finish time. Assume that $$A\subseteq S$$ is an optimal solution, also ordered by finish time, and that the index of the first activity in A is $$k\neq 1$$, i.e., this optimal solution does not start with the greedy choice. We will show that $$B = (A \setminus \{k\}) \cup \{1\}$$, which begins with the greedy choice (activity 1), is another optimal solution. The activities in A are disjoint by definition and k finishes first among them, so every other activity in A starts at or after $$f_k$$. Since $$f_1 \leq f_k$$, each of those activities also starts at or after $$f_1$$ and therefore does not conflict with activity 1, so the activities in B are also disjoint. Since B has the same number of activities as A, that is, $$|A| = |B|$$, B is also optimal.

Once the greedy choice is made, the problem reduces to finding an optimal solution for the subproblem. If A is an optimal solution to the original problem S containing the greedy choice, then $$A^\prime = A \setminus \{1\}$$ is an optimal solution to the activity-selection problem $$S' = \{i \in S: s_i \geq f_1\}$$.

Why? If this were not the case, there would be a solution B′ to S′ with more activities than A′. Every activity in S′ starts at or after $$f_1$$, so adding activity 1 to B′ would yield a feasible solution B to S with $$|B| = |B'| + 1 > |A'| + 1 = |A|$$, contradicting the optimality of A.
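
The argument above can be checked empirically on small instances. The following Python sketch compares the size of the greedy selection with the size found by exhaustive search over all subsets. It is a sanity check under stated assumptions, not a substitute for the proof: the helper names `greedy_count`, `brute_force_count` and `random_instance` are hypothetical, activities are `(start, finish)` tuples with positive duration, and the exhaustive search is exponential in the number of activities.

```python
import random
from itertools import combinations


def greedy_count(activities):
    """Size of the selection made by the earliest-finish-time greedy rule."""
    count, last_finish = 0, float("-inf")
    for start, finish in sorted(activities, key=lambda act: act[1]):
        if start >= last_finish:
            count += 1
            last_finish = finish
    return count


def brute_force_count(activities):
    """Size of a largest feasible subset, found by trying subsets from largest to smallest."""
    def feasible(subset):
        return all(a[0] >= b[1] or b[0] >= a[1]
                   for a, b in combinations(subset, 2))

    for size in range(len(activities), 0, -1):
        if any(feasible(c) for c in combinations(activities, size)):
            return size
    return 0


def random_instance(rng, n):
    """Generate n random activities with integer start times and positive durations."""
    instance = []
    for _ in range(n):
        start = rng.randint(0, 20)
        instance.append((start, start + rng.randint(1, 6)))
    return instance


rng = random.Random(0)
for _ in range(200):
    instance = random_instance(rng, rng.randint(0, 8))
    assert greedy_count(instance) == brute_force_count(instance)
```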

Weighted activity selection problem
In the generalized version of the activity selection problem, each activity also carries a weight, and the goal is to select a set of non-overlapping activities whose total weight is maximized. Unlike in the unweighted version, the earliest-finish-time greedy strategy does not in general yield an optimal solution to the weighted activity selection problem. However, a dynamic programming solution can readily be formed using the following approach:

Consider an optimal solution containing activity $$k$$. We now have non-overlapping activities on the left and right of $$k$$. We can recursively find solutions for these two sets because of optimal substructure. As we don't know $$k$$, we can try each of the activities. This approach leads to an $$O(n^3)$$ solution. This can be optimized further by observing that, for each set of activities in $$(i, j)$$, we can find the optimal solution if we know the solution for $$(i, t)$$, where $$t$$ is the last interval in $$(i, j)$$ that does not overlap with $$j$$. This yields an $$O(n^2)$$ solution. It can be optimized further still, because we do not need to consider all ranges $$(i, j)$$ but only $$(1, j)$$. The following algorithm thus yields an $$O(n \log n)$$ solution:
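
One way to realize the $$(1, j)$$ formulation is sketched below in Python. This is an illustrative sketch, not the original listing: the function name `max_weight_schedule`, the `(start, finish, weight)` tuple representation and the table name `best` are assumptions for this example. Activities are sorted by finish time, and a binary search (Python's standard `bisect_right`) finds, for each activity, how many earlier activities finish no later than it starts.

```python
from bisect import bisect_right


def max_weight_schedule(activities):
    """Return the maximum total weight of a set of non-overlapping activities.

    Each activity is a (start, finish, weight) tuple.
    """
    # Sort by finish time: O(n log n).
    ordered = sorted(activities, key=lambda act: act[1])
    finishes = [finish for _, finish, _ in ordered]

    # best[j] = maximum weight achievable using only the first j activities.
    best = [0] * (len(ordered) + 1)
    for j, (start, finish, weight) in enumerate(ordered, start=1):
        # p = number of activities among the first j - 1 that finish at or
        # before this one starts; exactly those are compatible with it.
        p = bisect_right(finishes, start, 0, j - 1)
        # Either skip activity j, or take it together with the best of the first p.
        best[j] = max(best[j - 1], best[p] + weight)

    return best[-1]


weighted = [(1, 4, 3), (3, 5, 10), (5, 7, 2), (0, 6, 4)]
print(max_weight_schedule(weighted))  # 12
```

In this example the earliest-finish-time greedy rule would select the activities (1, 4) and (5, 7) for a total weight of 5, whereas the dynamic program selects (3, 5) and (5, 7) for a total weight of 12. Sorting and the n binary searches each cost $$O(n \log n)$$, which gives the overall bound.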