# DFS, BFS and Topological Sort

algorithminterviewIn this post, we extend the discussion of graph traverse algorithms:
breadth-first search, aka *bfs*; and depth-first search, aka *dfs*. They try to
solve the problem from different angles, more intuitively:

- bfs circulates the neighborhood until our goal is met, we MAY also find the shortest path with DP, see Dijkstra’s shortest path algorithm.
- dfs picks one direction in every crossing until we hits the wall, with
appropriate state push / pop, we can backtracking
*ALL*possible solution.

Either way, we build the adjacent list first using `collections.defaultdict`

:

```
from collections import defaultdict
graph = defaultdict(list)
for start, end in directed_edges:
graph[start].append(end)
# For non-directed edges:
# graph[end].append(start)
```

It is worthy noting that there exist three states for each vertex:

*not visited*: the vertex is unknown in the current state.*visiting*: the vertex is*being visited*, but its dependencies are not handled.*visited*: all done.

dfs is a typical *post-order traversal*: the node `v`

is marked as `visiting`

at
first encounter, and set as `visited`

**only if** all its successors are
visited. Thus, we can use the dfs to detect the cycle.

```
def dfs(graph):
visited = defaultdict(int) # 0: not visited, -1: visiting, 1: visited.
return all(do_dfs(graph, visited, v) for v in graph)
def do_dfs(graph, visited, v):
if visited[v] == -1:
return False # cycle detected
if visited[v] == 1:
return True # visited already
visited[v] = -1
if all(do_dfs(graph, visited, n) for n in graph.get(v, []):
visited[v] = 1
return True
return False
```

Note we use `graph.get(v, [])`

during the traversal, as `graph[v]`

may mutate
the `graph`

object if `v`

is not in `graph`

. Also if the graph is not
fully-connected, we may also need to track how many vertices has been visited.

We can apply the same state transition in bfs, aka the three-color encoding in CLRS P594:

```
def bfs(graph):
visited = defaultdict(int) # 0: not visited, -1: visiting, 1: visited.
for v in graph:
if visited[v] == 0:
do_bfs(graph, visited, v)
def do_bfs(graph, visited, v):
todo = [v]
visited[v] = -1 # visiting
while todo:
vertex = todo.pop(0)
for c in graph.get(vertex. []):
if visited[c] == 0:
visited[c] = -1
todo.append(c)
visited[vertex] = 1
```

The intermediate `visiting`

state does not help the cycle detection, thus we can
simplify the state by visiting the vertex’s children immediately after they are
enqueued:

```
def bfs(graph):
visited = set()
for v in graph:
if v not in visited:
do_bfs(graph, visited, v)
def do_bfs(graph, visited, v):
todo = [v]
visited.add(v) # visit v
while todo:
vertex = todo.pop(0)
for c in graph.get(vertex, []):
if c not in visited:
visited.add(c) # visit child
todo.append(c)
```

In general, bfs is a better choice for graph traverse due to that:

- No recursion, so the size of the problem $|N|$ is no longer bound by the maximum stack limit.
- No intermediate
`visiting`

state, just`visited`

or not.

## Topological Sorting

The topological ordering is defined as reordering the vertices, $u$ and $v$, $u$ comes before $v$ for every directed edge $uv$. More concretely, if vertex $v$ depends on $u$, then $u$ must be placed before $v$. There MAY exist more than one solutions, and obviously, the graph MUST not contain cycles.

Topological sorting can be used to fine the critical path in the scheduling problem, and we can attack the problem with the following algorithms:

### Depth-first algorithm

This algorithm leverages the dfs: since all my dependencies MUST be placed
*after* me; it is safe to place non-visited vertex $u$ to the head after
visiting all its children in the dfs fashion.

We pass the `orders`

parameter to the `do_dfs`

method for harvest:

```
orders = deque()
def do_dfs(graph, visited, v, orders):
if visited[v] == -1:
return False # cycle detected
if visited[v] == 1:
return True # visited already
visited[v] = -1
if all(do_dfs(graph, visited, n, order) for n in graph.get(v, []):
visited[v] = 1
order.appendleft(v)
return True
return False
```

### Kahn’s algorithm

The Kahn’s algorithm takes the bfs approach:

- Pick a vertex, $u$ without reverse dependencies, or incoming edge.
- Find all of adjacent vertices, $v_i$, and remove the edge
- If vertex $v_k$ no longer has any reverse dependency, add it to the candidate pool.
- Repeat until the candidate pool is empty. Otherwise, fail due to circular dependencies.