DFS, BFS and Topological Sort

Jul 12, 2018 algorithm interview

In this post, we extend the discussion of graph traverse algorithms: breadth-first search, aka bfs; and depth-first search, aka dfs. They try to solve the problem from different angles, more intuitively:

bfs circulates the neighborhood until our goal is met, we MAY also find the shortest path with DP, see Dijkstra’s shortest path algorithm.
dfs picks one direction in every crossing until we hits the wall, with appropriate state push / pop, we can backtracking ALL possible solution.

Either way, we build the adjacent list first using collections.defaultdict:

from collections import defaultdict

graph = defaultdict(list)
for start, end in directed_edges:
    graph[start].append(end)
    # For non-directed edges:
    # graph[end].append(start)

It is worthy noting that there exist three states for each vertex:

not visited: the vertex is unknown in the current state.
visiting: the vertex is being visited, but its dependencies are not handled.
visited: all done.

dfs is a typical post-order traversal: the node v is marked as visiting at first encounter, and set as visited only if all its successors are visited. Thus, we can use the dfs to detect the cycle.

def dfs(graph):
    visited = defaultdict(int)  # 0: not visited, -1: visiting, 1: visited.
    return all(do_dfs(graph, visited, v) for v in graph)

def do_dfs(graph, visited, v):
    if visited[v] == -1:
        return False  # cycle detected

    if visited[v] == 1:
        return True  # visited already

    visited[v] = -1
    if all(do_dfs(graph, visited, n) for n in graph.get(v, [])):
        visited[v] = 1
        return True
    return False

Note we use graph.get(v, []) during the traversal, as graph[v] may mutate the graph object if v is not in graph. Also if the graph is not fully-connected, we may also need to track how many vertices has been visited.

We can apply the same state transition in bfs, aka the three-color encoding in CLRS P594:

def bfs(graph):
    visited = defaultdict(int)  # 0: not visited, -1: visiting, 1: visited.
    for v in graph:
        do_bfs(graph, visited, v)

def do_bfs(graph, visited, v):
    if visited[v] == 0:
        return

    todo = [v]

    while todo:
        vertex = todo.pop(0)
        visited[vertex] = -1  # visiting
        for c in graph.get(vertex. []):
            if visited[c] == 0:
                todo.append(c)
        visited[vertex] = 1  # visited

The intermediate visiting state does not help the cycle detection, thus we can simplify the state by visiting the vertex’s children immediately after they are enqueued:

def bfs(graph):
    visited = set()
    for v in graph:
        do_bfs(graph, visited, v)

def do_bfs(graph, visited, v):
    if v in visited:
        return

    todo = [v]
    visited.add(v)  # visited

    while todo:
        vertex = todo.pop(0)
        for c in graph.get(vertex, []):
            if c not in visited:
                todo.append(c)

In general, bfs is a better choice for graph traverse due to that:

No recursion, so the size of the problem $|N|$ is no longer bound by the maximum stack limit.
No intermediate visiting state, just visited or not.

Topological Sorting

The topological ordering resolves the dependency between $u$ and $v$ . If $u$ MUST comes before $v$ for every directed edge, we call $v$ depends on $u$ . There MAY exist more than one solutions, and obviously, the graph MUST not contain cycles.

Topological sorting can be used to find the critical path in the scheduling problem, and we can attack the problem with the following algorithms:

Depth-first algorithm

This algorithm leverages the dfs: since all my dependencies MUST be placed before me; it is safe to append the vertex $v$ to the tail after all its children are visited in the depth-first search.

We pass the orders parameter to the do_dfs method for harvest:

orders = deque()

def do_dfs(graph, visited, v, orders):
    if visited[v] == -1:
        return False  # cycle detected

    if visited[v] == 1:
        return True  # visited already

    visited[v] = -1
    if all(do_dfs(graph, visited, n, order) for n in graph.get(v, []):
        visited[v] = 1
        order.append(v)
        return True
    return False

Kahn’s algorithm

The Kahn’s algorithm takes the bfs approach:

Pick a vertex, $u$ without reverse dependencies, or incoming edge.
Find all of adjacent vertices, $v_i$ , and remove the edge
If vertex $v_k$ no longer has any reverse dependency, add it to the candidate pool.
Repeat until the candidate pool is empty. Otherwise, fail due to circular dependencies.