DFS, BFS and Topological Sort

algorithm interview

In this post, we extend the discussion of graph traverse algorithms: breadth-first search, aka bfs; and depth-first search, aka dfs. They try to solve the problem from different angles, more intuitively:

Either way, we build the adjacent list first using collections.defaultdict:

from collections import defaultdict

graph = defaultdict(list)
for start, end in directed_edges:
    graph[start].append(end)
    # For non-directed edges:
    # graph[end].append(start)

It is worthy noting that there exist three states for each vertex:

dfs is a typical post-order traversal: the node v is marked as visiting at first encounter, and set as visited only if all its successors are visited. Thus, we can use the dfs to detect the cycle.

def dfs(graph):
    visited = defaultdict(int)  # 0: not visited, -1: visiting, 1: visited.
    return all(do_dfs(graph, visited, v) for v in graph)

def do_dfs(graph, visited, v):
    if visited[v] == -1:
        return False  # cycle detected

    if visited[v] == 1:
        return True  # visited already

    visited[v] = -1
    if all(do_dfs(graph, visited, n) for n in graph.get(v, []):
        visited[v] = 1
        return True
    return False

Note we use graph.get(v, []) during the traversal, as graph[v] may mutate the graph object if v is not in graph. Also if the graph is not fully-connected, we may also need to track how many vertices has been visited.

We can apply the same state transition in bfs, aka the three-color encoding in CLRS P594:

def bfs(graph):
    visited = defaultdict(int)  # 0: not visited, -1: visiting, 1: visited.
    for v in graph:
        if visited[v] == 0:
            do_bfs(graph, visited, v)

def do_bfs(graph, visited, v):
    todo = [v]
    visited[v] = -1  # visiting

    while todo:
        vertex = todo.pop(0)
        for c in graph.get(vertex. []):
            if visited[c] == 0:
                visited[c] = -1
                todo.append(c)
        visited[vertex] = 1

The intermediate visiting state does not help the cycle detection, thus we can simplify the state by visiting the vertex’s children immediately after they are enqueued:

def bfs(graph):
    visited = set()
    for v in graph:
        if v not in visited:
            do_bfs(graph, visited, v)

def do_bfs(graph, visited, v):
    todo = [v]
    visited.add(v)  # visit v

    while todo:
        vertex = todo.pop(0)
        for c in graph.get(vertex, []):
            if c not in visited:
                visited.add(c)  # visit child
                todo.append(c)

In general, bfs is a better choice for graph traverse due to that:

Topological Sorting

The topological ordering is defined as reordering the vertices, uu and vv, uu comes before vv for every directed edge uvuv. More concretely, if vertex vv depends on uu, then uu must be placed before vv. There MAY exist more than one solutions, and obviously, the graph MUST not contain cycles.

Topological sorting can be used to fine the critical path in the scheduling problem, and we can attack the problem with the following algorithms:

Depth-first algorithm

This algorithm leverages the dfs: since all my dependencies MUST be placed after me; it is safe to place non-visited vertex uu to the head after visiting all its children in the dfs fashion.

We pass the orders parameter to the do_dfs method for harvest:

orders = deque()

def do_dfs(graph, visited, v, orders):
    if visited[v] == -1:
        return False  # cycle detected

    if visited[v] == 1:
        return True  # visited already

    visited[v] = -1
    if all(do_dfs(graph, visited, n, order) for n in graph.get(v, []):
        visited[v] = 1
        order.appendleft(v)
        return True
    return False

Kahn’s algorithm

The Kahn’s algorithm takes the bfs approach:

  1. Pick a vertex, uu without reverse dependencies, or incoming edge.
  2. Find all of adjacent vertices, viv_i, and remove the edge
  3. If vertex vkv_k no longer has any reverse dependency, add it to the candidate pool.
  4. Repeat until the candidate pool is empty. Otherwise, fail due to circular dependencies.