In this post, we extend the discussion of graph traverse algorithms: breadth-first search, aka bfs; and depth-first search, aka dfs. They try to solve the problem from different angles, more intuitively:
- bfs circulates the neighborhood until our goal is met, we MAY also find the shortest path with DP, see Dijkstra’s shortest path algorithm.
- dfs picks one direction in every crossing until we hits the wall, with appropriate state push / pop, we can backtracking ALL possible solution.
Either way, we build the adjacent list first using
from collections import defaultdict graph = defaultdict(list) for start, end in directed_edges: graph[start].append(end) # For non-directed edges: # graph[end].append(start)
It is worthy noting that there exist three states for each vertex:
- not visited: the vertex is unknown in the current state.
- visiting: the vertex is being visited, but its dependencies are not handled.
- visited: all done.
dfs is a typical post-order traversal: the node
v is marked as
first encounter, and set as
visited only if all its successors are
visited. Thus, we can use the dfs to detect the cycle.
def dfs(graph): visited = defaultdict(int) # 0: not visited, -1: visiting, 1: visited. return all(do_dfs(graph, visited, v) for v in graph) def do_dfs(graph, visited, v): if visited[v] == -1: return False # cycle detected if visited[v] == 1: return True # visited already visited[v] = -1 if all(do_dfs(graph, visited, n) for n in graph.get(v, ): visited[v] = 1 return True return False
Note we use
graph.get(v, ) during the traversal, as
graph[v] may mutate
graph object if
v is not in
graph. Also if the graph is not
fully-connected, we may also need to track how many vertices has been visited.
We can apply the same state transition in bfs, aka the three-color encoding in CLRS P594:
def bfs(graph): visited = defaultdict(int) # 0: not visited, -1: visiting, 1: visited. for v in graph: if visited[v] == 0: do_bfs(graph, visited, v) def do_bfs(graph, visited, v): todo = [v] visited[v] = -1 # visiting while todo: vertex = todo.pop(0) for c in graph.get(vertex. ): if visited[c] == 0: visited[c] = -1 todo.append(c) visited[vertex] = 1
visiting state does not help the cycle detection, thus we can
simplify the state by visiting the vertex’s children immediately after they are
def bfs(graph): visited = set() for v in graph: if v not in visited: do_bfs(graph, visited, v) def do_bfs(graph, visited, v): todo = [v] visited.add(v) # visit v while todo: vertex = todo.pop(0) for c in graph.get(vertex, ): if c not in visited: visited.add(c) # visit child todo.append(c)
In general, bfs is a better choice for graph traverse due to that:
- No recursion, so the size of the problem is no longer bound by the maximum stack limit.
- No intermediate
The topological ordering is defined as reordering the vertices, and , comes before for every directed edge . More concretely, if vertex depends on , then must be placed before . There MAY exist more than one solutions, and obviously, the graph MUST not contain cycles.
Topological sorting can be used to fine the critical path in the scheduling problem, and we can attack the problem with the following algorithms:
This algorithm leverages the dfs: since all my dependencies MUST be placed after me; it is safe to place non-visited vertex to the head after visiting all its children in the dfs fashion.
We pass the
orders parameter to the
do_dfs method for harvest:
orders = deque() def do_dfs(graph, visited, v, orders): if visited[v] == -1: return False # cycle detected if visited[v] == 1: return True # visited already visited[v] = -1 if all(do_dfs(graph, visited, n, order) for n in graph.get(v, ): visited[v] = 1 order.appendleft(v) return True return False
The Kahn’s algorithm takes the bfs approach:
- Pick a vertex, without reverse dependencies, or incoming edge.
- Find all of adjacent vertices, , and remove the edge
- If vertex no longer has any reverse dependency, add it to the candidate pool.
- Repeat until the candidate pool is empty. Otherwise, fail due to circular dependencies.