Dynamic Programming Recap


Dynamic Programming, aka DP, “solves the problem by breaking it into sub-problems and then recursively finding the optimal solutions to the sub-problems is said to have optimal substructure”(see Wikipedia). More concretely, the problem suffice the Principle of Optimality:

Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision. (See Bellman, 1957, Chap. III.3.)

Or mathematically, to solve the problem from t to T optimally, we can solve a sub-problem from t to s given t < s < T and extend s to T.

It is relatively intuitive to reason the optimal substructure for most problems, we will only focus how to build up the optimal solution in this post.

Top-down vs. bottom-up

We are trained to solve the problem in the top-down approach: break down a big problem to several smaller problems, recursively doing so until the smaller problems can be comfortably tackled. CLRS points out that the time complexity of top-down with memoization is the same as the bottom-up approach, but the latter is preferred as the top-down approach requires O(n)O(n) callstacks which caps the problem scope.

Using the #322 Coin Change problem as one example:

Given the amount with different coin denominations, find the fewest coins to make up the amount.

In the top-down approach, the problem can be decomposed as:

f(amount) = min(f(amount - c) for c in coins) + 1

We can convert the recursion to a forward loop with the lookup table, dp:

def coin_change(coins, amount):
    dp = [float('inf')] * (amount + 1)
    dp[0] = 0

    for index in range(1, amount + 1):
        for c in coins:
            if index >= c:
                dp[index] = min(dp[index], dp[index - c] + 1)

    return dp[-1] if dp[-1] < float('inf') else -1

Look back vs. Look forward

We solve the DP problems by looking back the sub-problems; on the other hand, some problems require us to look forward: the solution for the current sub-problem depends on the sub-problems in the future. Technically, they SHOULD be categorized as BFS, — the solution space is explored with the breadth first search. They are discussed here just for the reference.

For example, the #55 Jump Game:

Given an array of non-negative integers, each element specifies the maximum jumps at that position, determine whether you are able to reach last index from the first index*.

If we consider positions as a nodes, and the jump as the directed connection; the problem is transformed whether there exists a path from start to the end.

from collections import deque

def can_jump_bfs(nums):
    if len(nums) <= 1:
        return True

    # Push the first position to the queue, and mark it visited.
    queue = deque([0])
    bfs = [False] * len(nums)
    bfs[0] = True

    while queue:
        pos = queue.pop()  # optimization: pop from the end of queue
        for step in range(1, nums[pos] + 1):
            jump_to = pos + step
            if jump_to == len(nums) - 1:
                return True

            # Append to the queue if not visited.
            if not bfs[jump_to]:
                bfs[jump_to] = True

    return False

Multi-dimension DP

In practice, I’ve only seen the two types of problems attacked by the two-dimension DP:

  1. the problem space is defined as m×nm \times n, such as the Smith-Waterman algorithm.
  2. the problem space is linear, but the sub-problem MUST be solved from any arbitrary (i, j) range, such as multiply matrix product.

In this section, let’s discuss the second case with the #312. Burst Balloons:

Given n balloons with designated values, burst balloon i will get nums[left] * nums[i] * nums[right] coins, try to maximize the gain.

If we add two balloons with value of 1 on both sides, the original problem become: with n + 2 balloons from 0 to n + 1 and balloons[0] == balloons[n + 1] == 1, burst balloons from 1 to n, maximize the profit.

If we burst balloons[i] first, the balloons[i - 1] and balloons[i + 1] become adjacent. The solution to the sub-problem grows complicated due to the bookkeeping of adjacent balloons. Instead, we can divide-and-conquer the problem by focusing on the last balloon to burst: if balloons[i] is the last balloon to burst, the accumulated coins are:

f(0,i)+f(i,n+1)+nums[0]×nums[i]×nums[n+1]f(0, i) + f(i, n + 1) + nums[0] \times nums[i] \times nums[n + 1]

  • The ith balloons is the last balloon too burst, so the last trio is (0, i, n + 1).
  • Recall the f method will NOT burst the balloon on the boundary, so only balloon[1] to balloons[i - 1] are bursted in f(0, i).
  • The same rule applies to f(i , n + 1).

The problem can be solved by two sub-problems in the top-down fashion. If we build the DP table from the bottom up:

  • initialize the dp matrix (n+2)×(n+2)(n + 2) \times (n + 2)
  • we can expect minimum 3 balloons for any dp[i][j], apply the logic.
  • extend the range of (i, j) until we get (0, n + 1).
def max_coins(nums):
    n = len(nums)
    nums = [1] + nums + [1]
    # dp is (n + 2) * (n + 2) matrix
    dp = [ [0] * (n + 2) for i in range(n + 2)]

    # With nums[0] and nums[n + 1] present, we can expect 3 balloons as minimum
    for gap in range(2, n + 2):
        for first in range(0, n + 2 - gap):
            last = first + gap
            if last - first == 2:
                dp[first][last] = nums[first] * nums[first + 1] * nums[last]
                dp[first][last] = max(
                    dp[first][j] + dp[j][last] + nums[first] * nums[j] * nums[last]
                    for j in range(first + 1, last)
    return dp[0][n + 1]

Multiple-pass DP

In some special case, we can optimize the multiple-dimension DP to multiple-pass DP. For example, #123 Best Time to Buy and Sell Stock III:

Given the stock price for each day, maximize the profit with two transactions.

Consider we keep tracks the balance[i], we can maximize the balance by:

  • find the time to buy low, aka maximize balances[i] - prices[i]
  • find the time to sell high, aka maximize balances[i] + prices[i]

We can repeat k times(k = 2 in this case) to fully exploit the market.

def max_profits(prices):
    balances = [0] * len(prices)

    for t in range(2):
        best_buy = float('-inf')
        best_profit = 0

        for i, price in enumerate(prices):
            best_buy = max(best_buy, balances[i] - prices[i])
            best_profit = max(best_profit, best_buy + prices[i])
            balances[i] = best_profit
    return balances[-1]


In this post, we demonstrate several use cases of the dynamic programming from different angles for a better understanding. The cornerstone of DP is how to break down the problem to sub-problems for the optimal solution, then we can build the solution in the bottom-up fashion.