I read The Algorithm Design Manual by Steven S. Skiena on Apr 11th, 2018

Thanks to SurveyMonkey’s Take 4 policy, I took a 4-week sabbatical to recharge after serving the company for more than 4 years. During my visit to China, I also spent some time honing my problem-solving skills. The journey was mainly guided by The Algorithm Design Manual (the Manual hereafter), with CLRS as a reference.

The Algorithm Design Manual

Introduction to Algorithm Design

An algorithm is designed to solve a problem, thus a clear problem definition acts as the cornerstone of the solution:

After a solution is proposed, we need to prove its correctness, or seek a counter-example to prove its incorrectness:

Induction and recursion are widely used to build the proof chain.

Algorithm Analysis

The Big Oh notation defines bounds on functions:

The master theorem: assume $T(n) = aT(n/b) + f(n)$ for $a \ge 1$ and $b > 1$,

  1. If $f(n) = O(n^{\log_b{a} - \epsilon})$ for some constant $\epsilon > 0$, then $T(n) = \Theta(n^{\log_b{a}})$.
  2. If $f(n) = \Theta(n^{\log_b{a}})$, then $T(n) = \Theta(n^{\log_b{a}}\lg{n})$.
  3. If $f(n) = \Omega(n^{\log_b{a} + \epsilon})$ for some constant $\epsilon > 0$, and if $af(n/b) \le cf(n)$ for some constant $c < 1$ and all sufficiently large $n$, then $T(n) = \Theta(f(n))$.
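
For example, mergesort's recurrence $T(n) = 2T(n/2) + \Theta(n)$ falls under case 2: $n^{\log_2{2}} = n$ matches $f(n) = \Theta(n)$, so $T(n) = \Theta(n\lg{n})$.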

Data Structures

The Manual categorizes data structures into concrete structures, such as arrays, and abstract concepts, such as dictionaries. The concepts expose interfaces to manipulate the internal state; the implementations MAY vary for different use cases.

Array

An array is a contiguously-allocated data structure; its simplicity pays off as:

The amortized time complexity of a dynamic (doubling) array over $n$ insertions is the same $O(n)$ as a single fixed-size array.
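
Below is a minimal sketch of the doubling strategy (my own illustration, not code from the Manual): across $n$ appends, each element is copied only $O(1)$ times on average, so the total stays $O(n)$.

```python
class DynamicArray:
    """Doubling array: n appends cost O(n) in total, O(1) amortized each."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None] * self.capacity

    def append(self, value):
        if self.size == self.capacity:
            # Double the capacity and copy: O(size) now, but it happens
            # so rarely that the cost amortizes to O(1) per append.
            self.capacity *= 2
            new_data = [None] * self.capacity
            new_data[:self.size] = self.data[:self.size]
            self.data = new_data
        self.data[self.size] = value
        self.size += 1
```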

Hashing

The Manual views hashing from an information-theory perspective: it extracts features from high-dimensional data, much like PCA.

Binary Search Tree (BST)

A BST is a very effective data structure for data retrieval if the height of the tree is $O(\log{n})$. The performance MAY degrade if the tree is unbalanced. Maintaining balance during insertion and deletion is not free; many balanced BST variants strike a trade-off between perfect balance and operation cost, notably:

Dictionary

It is tricky to pick the most efficient implementation for specialized use cases:

However, in practice, it is more important to avoid using a bad data structure than to identify the single best option available.

Rules of thumb:

Priority Queue

We can use the following data structures to implement the priority queue:

Set

The Set data structure is similar to the Dictionary, but focuses on set-specific operations, such as:

The implementation might be:

Sorting and Searching

Sorting is the cornerstone for solving many other problems, with a price tag of $O(n\log{n})$.

Most $O(n\log{n})$ sorting algorithms are NOT stable:

The heap packs a binary tree perfectly into an array, and allows us to insert new elements while maintaining the order.
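
Here is a small sketch of that array encoding (identifiers are mine): the parent of items[i] lives at items[(i - 1) // 2], so insertion only needs index arithmetic and a sift-up pass.

```python
class MinHeap:
    def __init__(self):
        self.items = []  # items[(i - 1) // 2] is the parent of items[i]

    def insert(self, key):
        self.items.append(key)      # place at the first free leaf slot
        i = len(self.items) - 1
        while i > 0:                # sift up until the heap order holds
            parent = (i - 1) // 2
            if self.items[parent] <= self.items[i]:
                break
            self.items[parent], self.items[i] = self.items[i], self.items[parent]
            i = parent
```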

Q: Why can a comparison-based sorting algorithm not do better than $O(n\log{n})$?

A: Assume there are no duplicate keys. The sorting algorithm must identify the one and only correct ordering among all $n!$ permutations, and each comparison can at best halve the remaining candidates, so at least $\log_2{n!}$ comparisons are required:

$\log{n!} = \sum_{i=1}^{n} \log{i} = \Theta(n\log{n})$

Dynamic Programming (DP)

Dynamic Programming, aka DP, seeks the global optimum. Unlike the greedy algorithm, which builds the global optimum by accumulating local optima, DP stores the consequences of all possible decisions to achieve the global optimum.

Following CLRS, DP is typically implemented as:

  1. Characterize the structure of an optimal solution.
  2. Recursively define the value of an optimal solution.
  3. Compute the value of an optimal solution, typically in a bottom-up fashion.
  4. Construct an optimal solution from computed information.

In practice, I find the top-down approach with memoization more intuitive, but the top-down approach requires $O(n)$ extra space to store the intermediate results and invokes $O(n)$ recursive calls. It MAY cause a stack overflow and incurs more call overhead. See my deep dive here.
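
A toy sketch of the trade-off, using Fibonacci as a stand-in problem (not the example from my deep dive): the top-down version leans on a cache and the call stack, while the bottom-up version fills a table iteratively.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_top_down(n):
    # O(n) extra space for the cache, plus O(n) recursion depth.
    if n < 2:
        return n
    return fib_top_down(n - 1) + fib_top_down(n - 2)

def fib_bottom_up(n):
    # Same O(n) table, but no recursion, so no stack-overflow risk.
    table = [0, 1] + [0] * max(0, n - 1)
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]
```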

Backtracking

Backtracking tries to find all (or some) solutions by exploring the configuration space with a depth-first search. It constructs a tree of partial solutions, incrementally extends a candidate towards a full solution, and abandons (prunes) a candidate as soon as it cannot possibly be completed to a valid solution.

The general steps for backtracking are (see the sketch after this list):

  1. Iterate over all candidates; make one move to integrate a candidate into the partial solution.
  2. Recursively invoke the backtracking algorithm until a solution is found.
  3. Otherwise, roll back the last move and try another candidate.
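
As a sketch of these steps (my own example, not the Manual's code), here is backtracking that enumerates permutations: each recursive call extends the partial solution by one move and rolls it back afterwards.

```python
def permutations(items):
    """Enumerate all permutations of items by backtracking."""
    solutions, partial = [], []

    def backtrack():
        if len(partial) == len(items):    # a complete solution is reached
            solutions.append(partial[:])
            return
        for candidate in items:
            if candidate in partial:      # prune: candidate already used
                continue
            partial.append(candidate)     # make one move
            backtrack()                   # extend recursively
            partial.pop()                 # roll back, try the next candidate

    backtrack()
    return solutions

# permutations([1, 2, 3]) yields all 6 orderings.
```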

We should pay attention to the following details:

Graph

A graph $G = (V, E)$ consists of a set of vertices $V$ together with a set of edges $E$. It is natural to model real-world relationships with graphs, and graph theory also helps explore the solution space implicitly. In practice, it is more important to recognize and reduce an unknown problem to a known graph algorithm.

The data structures to store a graph are:

Traversing a Graph

Traversing a graph is particularly interesting, as some complicated problems can be modeled as whether there exists a path from the initial state to the expected state.

Breadth-first search, aka BFS, takes the following steps to traverse a graph (a sketch follows the list):

  1. Initialize the queue Q: pick a vertex v, mark it discovered, and enqueue it to Q.
  2. Dequeue a vertex v from Q, process it, then mark and enqueue every adjacent vertex that has not yet been discovered.
  3. Repeat step 2 until Q is empty.
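
A minimal sketch of these steps, assuming the graph is a dict mapping each vertex to its adjacency list:

```python
from collections import deque

def bfs(graph, start, process=print):
    discovered = {start}
    queue = deque([start])              # step 1: seed the queue
    while queue:                        # step 3: until the queue drains
        v = queue.popleft()             # step 2: dequeue and process
        process(v)
        for x in graph[v]:
            if x not in discovered:     # enqueue undiscovered neighbors
                discovered.add(x)
                queue.append(x)
```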

Depth-first search, aka DFS, takes the following steps (sketched after the list):

  1. Given a vertex v, iterate over all its adjacent vertices x.
  2. For each unvisited x, mark it visited and recursively invoke DFS on x.
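
The same adjacency-list assumption gives a compact recursive sketch:

```python
def dfs(graph, v, visited=None, process=print):
    if visited is None:
        visited = set()
    visited.add(v)
    process(v)
    for x in graph[v]:                      # iterate adjacent vertices
        if x not in visited:
            dfs(graph, x, visited, process) # recurse on each new vertex
```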

In practice, DFS is preferred if the ancestor-descendant relationship matters. BFS MAY require more memory (see backtracking), while DFS incurs more recursion overhead.

Graph Problems: Polynomial-Time

We study the characteristics of the graph itself to better understand the network, e.g., city traffic. Here are some examples:

Connectivity

Are the vertices in the graph connected? If not, how are they clustered into components? The following steps label each component (see the sketch after the list):

  1. Mark all vertices with component = 0; initialize i = 1.
  2. Pick a vertex v with component == 0, and mark all vertices connected to it with component = i.
  3. Bump the value of i and redo step 2 until no vertex with component == 0 remains.
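
A sketch of the labeling, assuming the same dict-of-adjacency-lists representation; the inner flood fill stands in for "mark all its connected vertices":

```python
def connected_components(graph):
    """Label each vertex with a component id, following the steps above."""
    component = {v: 0 for v in graph}        # step 1: all unlabeled
    i = 0
    for v in graph:
        if component[v] == 0:                # step 2: pick an unlabeled vertex
            i += 1                           # step 3: bump the component id
            stack = [v]
            while stack:                     # flood fill via iterative DFS
                u = stack.pop()
                if component[u] == 0:
                    component[u] = i
                    stack.extend(graph[u])
    return component
```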

Does the graph have an articulation vertex, i.e., a single point of failure?

We can brute-force the problem by removing each vertex from the graph and checking the graph’s connectivity, or we can use the following linear-time algorithm.

  1. Traverse the graph with DFS to construct the DFS tree.
  2. If the root has two or more children, the root is an articulation vertex.
  3. If a non-root vertex u has a child v such that no vertex in v’s subtree has a back edge climbing above u, then u is an articulation vertex.

To validate the non-root case, we keep an access counter to track the ancestor (lower) and descendant (higher) relationship during the pre-order traversal, and an extra low value to track the lowest access counter reachable from v through a back edge during the post-order traversal. The condition is then interpreted as v.low >= u.access_counter. The pre-order and post-order traversals can be combined into a single DFS, as the sketch below shows.
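
Here is a sketch of that combined traversal (the names entry and low are mine): entry times are assigned pre-order, low values propagate back post-order, and both cases are checked along the way.

```python
def articulation_vertices(graph, root):
    entry, low, parent = {}, {}, {root: None}
    articulation, timer = set(), [0]

    def dfs(u):
        timer[0] += 1
        entry[u] = low[u] = timer[0]            # pre-order: record entry time
        children = 0
        for v in graph[u]:
            if v not in entry:                  # tree edge: recurse
                parent[v] = u
                children += 1
                dfs(v)
                low[u] = min(low[u], low[v])    # post-order: propagate low
                # Non-root case: no back edge from v's subtree climbs above u.
                if parent[u] is not None and low[v] >= entry[u]:
                    articulation.add(u)
            elif v != parent[u]:                # back edge
                low[u] = min(low[u], entry[v])
        if parent[u] is None and children >= 2: # root case: two or more children
            articulation.add(u)

    dfs(root)
    return articulation
```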

Minimum Spanning Trees

The minimum spanning tree connects all vertices in the most inexpensive way, and is used in network routing. Prim’s algorithm is greedy (see the sketch after the list):

  1. Pick an arbitrary vertex v as the starting point of the tree.
  2. Pick the shortest edge between a tree vertex and a non-tree vertex.
  3. Add the selected edge and vertex to the tree. Repeat step 2 until all vertices are included in the tree.
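
A sketch of Prim's algorithm with a lazy heap of crossing edges, assuming graph maps each vertex to a list of (weight, neighbor) pairs:

```python
import heapq

def prim(graph, start):
    in_tree = {start}
    frontier = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(frontier)                  # shortest crossing edge on top
    tree = []
    while frontier:
        w, u, v = heapq.heappop(frontier)    # step 2: shortest tree/non-tree edge
        if v in in_tree:
            continue                         # both ends already in the tree
        in_tree.add(v)                       # step 3: grow the tree
        tree.append((u, v, w))
        for w2, x in graph[v]:
            if x not in in_tree:
                heapq.heappush(frontier, (w2, v, x))
    return tree
```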

So is Kruskal’s algorithm (see the sketch after the list):

  1. Sort the edges in ascending order of weight, then pick $n - 1$ of them.
  2. For each edge, if its two endpoints do not belong to the same component, add the edge to the tree and merge the components.
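
A sketch of Kruskal's algorithm with a minimal union-find to track components (edges given as (weight, u, v) tuples):

```python
def kruskal(vertices, edges):
    parent = {v: v for v in vertices}     # union-find over the components

    def find(v):                          # path-halving find
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(edges):         # step 1: ascending by weight
        ru, rv = find(u), find(v)
        if ru != rv:                      # step 2: different components?
            parent[ru] = rv               # merge the components
            tree.append((u, v, w))
        if len(tree) == len(parent) - 1:  # n - 1 edges picked: done
            break
    return tree
```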

Shortest Path

If there exists more than one path from point A to point B, what is the shortest path? Dijkstra’s shortest-path algorithm works in a dynamic-programming fashion:

Here is the algorithm (see the sketch after the list):

  1. For every unvisited vertex x adjacent to a visited vertex v, relax the distance: dist[x] = min(dist[x], dist[v] + w[v, x]).
  2. Add the unvisited vertex x with the smallest dist[x] to the visited vertices.
  3. Repeat the above steps until we reach t.
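
A heap-based sketch of these steps, under the same (weight, neighbor) adjacency assumption; the heap picks the unvisited vertex with the smallest tentative distance:

```python
import heapq

def dijkstra(graph, s, t):
    dist = {s: 0}
    visited = set()
    heap = [(0, s)]
    while heap:
        d, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)                    # step 2: v's distance is now final
        if v == t:                        # step 3: stop once t is reached
            return d
        for w, x in graph[v]:
            if x not in visited and d + w < dist.get(x, float("inf")):
                dist[x] = d + w           # step 1: relax dist[x] through v
                heapq.heappush(heap, (dist[x], x))
    return dist.get(t, float("inf"))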

Graph Problems: Hard Problems

These problems look intuitive because human beings digest graphs in a different way, but they are NP-complete. We may deal with them using combinatorial search, heuristics, approximation algorithms, or algorithms for restricted instances.

Numerical Problems

Random Number Generation

We should avoid messing up random number generation in general. A classic pseudo-random algorithm is the linear congruential generator (sketched below):

$R_n = (\alpha R_{n-1} + c) \bmod m$
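
A sketch of the generator; the constants are the ones used by the classic C library LCG and are only for illustration:

```python
def lcg(seed, a=1103515245, c=12345, m=2**31):
    """Yield R_n = (a * R_{n-1} + c) mod m; a plays the role of alpha."""
    r = seed
    while True:
        r = (a * r + c) % m
        yield r

gen = lcg(seed=42)
print([next(gen) for _ in range(3)])  # the same seed always replays this sequence
```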

Combinatorial Problems

Combinatorial problems deal with combinatorial objects, such as permutations, partitions, subsets, calendars, and schedules.

Median and Selection

Sorting and searching have been discussed above. The median and selection problems can be reduced to sorting the unordered sequence if $O(n\log{n})$ is acceptable. Otherwise, we can apply an algorithm similar to quicksort:

The expected complexity of the above algorithm is $O(n)$, as sketched below.
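
A sketch of that quicksort-like selection (often called quickselect): partition around a random pivot, then recurse into the single side that contains the k-th element.

```python
import random

def quickselect(items, k):
    """Return the k-th smallest element (0-based) in expected O(n) time."""
    pivot = random.choice(items)               # randomized pivot, as in quicksort
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    if k < len(smaller):                       # recurse into one side only
        return quickselect(smaller, k)
    if k < len(smaller) + len(equal):
        return pivot
    return quickselect(larger, k - len(smaller) - len(equal))

# Median of an odd-length list: quickselect(xs, len(xs) // 2)
```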

For a data stream, it is impossible to store the massive data to find the exact median. We can sample the data by flipping a coin, and the sample will likely be close to the underlying data.
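
A sketch of the coin-flip idea (the sampling probability p is an arbitrary choice of mine): keep each element with probability p, then report the sample's median as an estimate.

```python
import random

def sample_median(stream, p=0.01):
    """Keep each element with probability p (the coin flip), then
    return the sample's median as an estimate of the stream's median."""
    sample = [x for x in stream if random.random() < p]
    sample.sort()
    return sample[len(sample) // 2] if sample else None
```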

Set and String Problems

We will discuss Suffix trees in the future.

Computational Geometry

These problems arose from computer graphics and CAD / CAM, such as:

Probabilistic Programming

TBD