# I read*The Algorithm Design Manual*by*Steven S Skiena*on*Apr 11, 2018*

★★★★☆Thanks SurveyMonkey’s *Take 4* policy, I took a 4-week sabbatical to recharge
after serving the company more than 4 years. During my visit to China, I also
spent some time to hone my problem-solving skills. The journey is mainly guided
with The Algorithm Design Manual, *Manual* thereafter, and use the
CLRS as a reference.

## Introduction to Algorithm Design

Algorithm is designed to solve a problem, thus a clear problem definition acts as the corner stone toward the solution:

- Ill-defined problem without clear metrics. For example, the
`best`

is always evaluated with a context. - Compound goals lead to scope creeping.
- Specialization evolves optimization.

After the solution is proposed, we need to prove its correctness; or seeking an
*counter-example* to prove its incorrectness:

- Think small.
- Think exhaustively.
- Hunt for the weakness.
- Go for a tie.
- Seek extremes.

Induction and recursion are widely used to build the proof chain.

## Algorithm Analysis

The *Big Oh notation* defines function bound as:

- $\Theta-notation$: asymptotically
**tight**bound - $\Omega-notation$: asymptotic
**lower**bound - $O-notation$: asymptotic
**upper**bound

The master theorem: assume $T(n) = a T(n/b) + f(n)$ for $a \ge 1$ and $b > 1$,

- If $f(n) = O(n^{\log_b{ a - \epsilon }})$ for some constant $\epsilon > 0$, then $T(n) = \Theta(n^{\log_b{a}})$.
- If $f(n) = \Theta(n^{\log_b{a}})$, then $T(n) = \Theta(n^{\log_b{a}}\lg{n})$.
- If $f(n) = \Omega(n^{\log_b{a+\epsilon}})$ for some constant $\epsilon > 0$, and if $a f(n/b) \le cf(n)$ for some $c < 1$, then $T(n) = \Theta(f(n))$.

## Data Structures

The Manual categorize the data structures into basic data structures such as
*Array* and *concepts*, such as *Dictionaries*. The concepts expose interfaces
to manipulate the internal states, the implementations MAY vary for different
use cases.

### Array

Array is continuously-allocated data structure, its simplicity pays back as:

- Space efficiency, no pointer overhead.
- Memory locality, easily fit into the cache line.
- Constant-time for random access, required by some algorithms, such as quicksort.

The amortized time complexity of dynamical array is the same as $O(n)$ as single array.

### Hashing

The Manual sees the hashing from the information theory perspective: it extracts features from a high dimensional data, think about PCA.

### Binary Search Tree(BST)

BST is an very effective data structure for data retrieval if the *height* of
the tree is $O(\log{n})$. The performance MAY degrade if the tree is
not-balanced. Maintaining the balance in the insertion and deletion operation is
not free, there are many balanced BST variances to strike the balance between
the perfect balance and operation cost, notably:

- AVL tree
- red-black tree
- splay-tree

### Dictionary

It is tricky to pick the most efficient implementation for specialized use cases:

- Scalability
- Read / Write ratio
- Access pattern, uniform, random or skewed, aka hot zone?
- Latency vs. throughput.

However, in practice, it is more important to avoid using a bad data structure than to identify the single best option available.

Rule of thumbs:

*Hash table*is probably the right way to go.*Balanced BST*, such as*red-black tree*, or*splay tree*.*B-Tree*if the data cannot fit into the memory.*Skip lists*shine for the relative easier implementation compared to BST.

### Priority Queue

We can use the following data structures to implement the priority queue:

*Binary heap*if we know the upper bound of the maximum number of tasks in advance, this can be mitigated by the dynamic array though.*Binary search tree*.*Bounded height priority queue*. If we know the range of the task priority, we can use an array of linked lists to put the tasks with the same priority in the same bucket; — pretty much the same idea as the LFU cache.*Fibonacci and pairing heaps*. [Citation required]

### Set

The *Set* data structure is alike the Dictionary, but focusing on the
set-specific operations, such as:

- test whether $u_i \in S_j$.
- union operation
- intersect and delete operation

The implementation might be:

*Bit vector*, using*n-bit*array to represent any subset of*U*containing*n*items. Suitable for fixed universal set.*Dictionary**Bloom filter*, tolerate small probability of error.

## Sorting and Searching

Sorting is the cornerstone to solve other problems with the price tag $O(n\log{n})$

- Searching
- Closest pair
- Element uniqueness
- Frequency distribution
- Selection, such as median 75pct tile.
- Convex hulls

Most $O(n\log{n})$ complexity sort algorithm are **NOT** *stable*:

- QuickSort is by far the fastest in-memory sorting algorithm, we can randomize the array to avoid the worst case.
- InsertSort might be useful if the array is roughly sorted.
- MergeSort requires $n$ extra spaces for array, but suite the linked list very well; on the contrary, the QuickSort does not work for the linked list.
- HeapSort is an elegant algorithm out of the slick data structure,
*heap*, aka the priority queue.

The heap compress the binary tree perfectly in the array, and it allows us to insert new elements and maintain the order.

**Q**: The upper bound of the sorting algorithm is $O(n\log{n})$, why?

**A**: Assume there exists no duplicated keys, the sorting algorithm aims to
find one and only one ordering in all $n!$ permutations, which is bound by the
$O(\log{n!})$:

$\log{n!} = \sum\limits_{i=1}^n \log{i} = O(n\log{n})$

## Dynamic Programming (DP)

Dynamic Programming, aka DP seeks the global optimization. Unlike the greedy
algorithm, which achieve the global optimal by summing up the local optimal; the
DP stores the *consequences of all possible decision* to achieve the global
optimal.

The DP is typically implemented as the following by CLRS:

- Characterize the structure of an optimal solution.
- Recursively define the value of an optimal solution.
- Compute the value of an optimal solution, typically in a bottom-up fashion.
- Construct an optimal solution from computed information.

In practice, I find the **top-down with memoization** approach is more
intuitive, but the top-down approach requires $O(n)$ extra space to store the
intermediate results, and invokes $O(n)$ recursive calls. It MAY cause stack
overflow, and incurs more overhead. See my deep dive
here.

## Backtracking

Backtracking tries to find all(or some) solution by exploring the configuration
space with a depth-first search approach. It constructs a tree of *partial*
solutions, then *incrementally extends* the candidate towards the solution, and
abandons(*prune*) a candidate as soon as the candidate cannot possibly be
completed to a valid solution.

The general steps for backtracking are:

- Iterate all candidates, make one step of move to integrate the candidate into the solution.
- Recursively call the backtracking algorithm until either a solution is found.
- Or rollback the last move, and try another candidate.

We should pay attention to the following details:

- Explore the symmetry of the problem to reduce the scope.
- Prune the branch with more likely to fail more aggressively.

## Graph

The graph, $G = (V, E)$ consists a set of vertices *V* together with a set of
*E* of edges. It is natural to model the real-world relationship with graph, the
graph theory also help to explore the solution space implicitly. In practice, it
is more important to recognize and categorize a unknown problem to an *known
graph algorithm*

The data structures to store graph are:

- Adjacent Matrix: a $n \times n$ matrix for the adjacent relationship.
- Adjacency Lists: for each vertex, we store the connected vertices as linked list.

### Traversing a Graph

Traversing a graph is particularly interesting as some complicated problems can be modeled as whether there exists a path from the initial state to the expected state.

Breadth-first search, aka BFS takes the following steps to traverse a graph:

- Initialize the queue,
*Q*: pick the vertex,*v*, mark it*discovered*, and enqueue it to Q. - Dequeue the vertex,
*v*from Q, process the vertex, and enqueue all its adjacent vertex which has not been marked as*discovered*. - Repeat step 2, until the Q is empty.

Depth-first search, aka DFS takes the following steps:

- Given the vertex,
*v*, iterate all its adjacent vertices,*x* - Visit
*x*, mark it*visited*, and recursively invoke DFS on*x*

In practice, DFS is preferred if the ancestor-descendant matters. The BFS MAY require more memory(see backtracking), while the DFS incurs more recursion overhead.

### Graph Problems: Polynomial-Time

We study the characteristic of the graph itself to have a better understanding of the network, city traffic. And here are some examples:

#### Connectivity

Are vertices in the graph are connected? If not, how are they clustered as
*components*?

- Mark all vertices with
`component = 0`

, initialize`i = 1`

. - Pick one vertex,
*v*with`component == 0`

, mark all its connected vertices with`component = i`

. - Bump the value of
`i`

, redo step 2 until there exist no vertex with`component == 0`

.

Does this graph have *articulation vertex* for single point of failure?

We can brute-force the problem by removing every vertex from the graph, then check the graph’s connectivity. Or we can use the following linear-complexity algorithm.

- DFS traverse the graph to construct the DFS tree.
- If the root has more than two children, the root is an articulation vertex.
- If the vertex
*u*has a child*v*with no back edge from*v*’s subtree, u is an articulation vertex.

To validate the case 2, we keep the access counter to track the ancestor(lower)
and descendant(higher) relationship in the pre-order traverse, uses an extra
*low* to track the lowest access counter reachable from *v* with back edge in
the post-order traverse. Thus the case 2 is interpreted as
`v.low >= u.access_counter`

. The pre-order and post-order traverse can be
combined as one in the DFS.

#### Minimum Spanning Trees

The minimum spanning trees connected all vertices in most inexpensive way, used in the network routing. The Prim’s algorithm is greedy:

- Pick an arbitrary vertex,
*v*as the start point of the tree. - Pick the shortest edge between tree and non-tree vertex
- Add the selected edge and vertex to the tree. Repeat step 2 until all vertices are included in the tree.

So it is with Kruskal’s algorithms:

- Sort the edge in the ascending order, pick $n - 1$ edges
- For each edge, if the edge does not belong to the same component, add to the tree, and merge the components Shortest Paths

#### Shortest Path

If there exists more than one path from point *A* to point *B*, what is the
shortest path? The Dijkstra’s shortest path algorithm works in the dynamic
programming fashion:

- if the shortest path from
`s`

to`t`

via intermediate vertex`x`

, the path from`s`

to`x`

MUST be shortest. - if we get the shortest path from
`s`

to*ALL*other vertices, clearly we also get the shortest path from`s`

to`t`

.

Here is the algorithm:

- For any non-visited vertex
`x`

, there exists a visited vertex,`v`

that can minimize the`dist[x]`

, then we update`dist[x] = min(dist[x], dist[v] + w[v, x])`

- Add the vertex
`x`

to the known vertices. - Repeat the above steps until we reach
`t`

.

### Graph Problems: Hard Problems

These problems look intuitive, because the human being digest the graph in a different approach, but they are NP-complete hard. We may deal them with combinatorial search, heuristics, approximation algorithms, or algorithms for restricted instances.

- Clique
- Independent Set
- Vertex Cover
- Traveling Salesman Problem
- Hamiltonian Cycle
- Graph Partition
- Vertex Coloring
- Edge Coloring
- Graph Isomorphism
- Steiner Tree
- Feedback Edge / Vertex Set

## Numerical Problems

### Random Number Generation

We should avoid messing up the random number generation in general. A
pseudo-random algorithm is *linear congruential generator*:

$R_n = (\alpha R_{n-1} + c) \mod m$

## Combinatorial Problems

The combinatorial problems deal with the combinatorial objects, such as permutations, partitions, subsets, calendars, and schedules.

- Sorting and searching
- Median and selection
- Permutations
- Subsets
- Partitions
- Generating graphs
- Calendrical calculations
- Job scheduling
- Satisfiability

### Median and Selection

The sorting and searching has been discussed above. The median and selection problem can be simplified by sorting the unordered sequence if $O(n\log{n})$ is accepted. Otherwise, we can apply the similar algorithm as quicksort:

- Pick a pivot, and partition the sequence into two: elements smaller than pivot, and elements larger or equal to the pivot.
- Pick the partition which contains the median, and repeat the above step until the pivot is the median.

The complexity of the above algorithm is $O(n)$.

For the data stream, it is impossible to store the massive data to find the median. We can sample the data by flipping a coin, and likely the sample will be close to the underlying data.

## Set and String Problems

We will discuss Suffix tree in the future.

## Computational Geometry

These problems arose from the computer graphics, CAD / CAM, such as:

- Convex Hull
- Triangulation
- Range Search
- Intersection Detection

## Probability Programming

TBD