DFS, BFS and Topological Sort
algorithminterviewIn this post, we extend the discussion of graph traverse algorithms: breadth-first search, aka bfs; and depth-first search, aka dfs. They try to solve the problem from different angles, more intuitively:
- bfs circulates the neighborhood until our goal is met, we MAY also find the shortest path with DP, see Dijkstra’s shortest path algorithm.
- dfs picks one direction in every crossing until we hits the wall, with appropriate state push / pop, we can backtracking ALL possible solution.
Either way, we build the adjacent list first using collections.defaultdict
:
from collections import defaultdict
graph = defaultdict(list)
for start, end in directed_edges:
graph[start].append(end)
# For non-directed edges:
# graph[end].append(start)
It is worthy noting that there exist three states for each vertex:
- not visited: the vertex is unknown in the current state.
- visiting: the vertex is being visited, but its dependencies are not handled.
- visited: all done.
dfs is a typical post-order traversal: the node v
is marked as visiting
at
first encounter, and set as visited
only if all its successors are
visited. Thus, we can use the dfs to detect the cycle.
def dfs(graph):
visited = defaultdict(int) # 0: not visited, -1: visiting, 1: visited.
return all(do_dfs(graph, visited, v) for v in graph)
def do_dfs(graph, visited, v):
if visited[v] == -1:
return False # cycle detected
if visited[v] == 1:
return True # visited already
visited[v] = -1
if all(do_dfs(graph, visited, n) for n in graph.get(v, []):
visited[v] = 1
return True
return False
Note we use graph.get(v, [])
during the traversal, as graph[v]
may mutate
the graph
object if v
is not in graph
. Also if the graph is not
fully-connected, we may also need to track how many vertices has been visited.
We can apply the same state transition in bfs, aka the three-color encoding in CLRS P594:
def bfs(graph):
visited = defaultdict(int) # 0: not visited, -1: visiting, 1: visited.
for v in graph:
if visited[v] == 0:
do_bfs(graph, visited, v)
def do_bfs(graph, visited, v):
todo = [v]
visited[v] = -1 # visiting
while todo:
vertex = todo.pop(0)
for c in graph.get(vertex. []):
if visited[c] == 0:
visited[c] = -1
todo.append(c)
visited[vertex] = 1
The intermediate visiting
state does not help the cycle detection, thus we can
simplify the state by visiting the vertex’s children immediately after they are
enqueued:
def bfs(graph):
visited = set()
for v in graph:
if v not in visited:
do_bfs(graph, visited, v)
def do_bfs(graph, visited, v):
todo = [v]
visited.add(v) # visit v
while todo:
vertex = todo.pop(0)
for c in graph.get(vertex, []):
if c not in visited:
visited.add(c) # visit child
todo.append(c)
In general, bfs is a better choice for graph traverse due to that:
- No recursion, so the size of the problem is no longer bound by the maximum stack limit.
- No intermediate
visiting
state, justvisited
or not.
Topological Sorting
The topological ordering is defined as reordering the vertices, and , comes before for every directed edge . More concretely, if vertex depends on , then must be placed before . There MAY exist more than one solutions, and obviously, the graph MUST not contain cycles.
Topological sorting can be used to fine the critical path in the scheduling problem, and we can attack the problem with the following algorithms:
Depth-first algorithm
This algorithm leverages the dfs: since all my dependencies MUST be placed after me; it is safe to place non-visited vertex to the head after visiting all its children in the dfs fashion.
We pass the orders
parameter to the do_dfs
method for harvest:
orders = deque()
def do_dfs(graph, visited, v, orders):
if visited[v] == -1:
return False # cycle detected
if visited[v] == 1:
return True # visited already
visited[v] = -1
if all(do_dfs(graph, visited, n, order) for n in graph.get(v, []):
visited[v] = 1
order.appendleft(v)
return True
return False
Kahn’s algorithm
The Kahn’s algorithm takes the bfs approach:
- Pick a vertex, without reverse dependencies, or incoming edge.
- Find all of adjacent vertices, , and remove the edge
- If vertex no longer has any reverse dependency, add it to the candidate pool.
- Repeat until the candidate pool is empty. Otherwise, fail due to circular dependencies.