Dynamic Programming, aka DP, “solves the problem by breaking it into sub-problems and then recursively finding the optimal solutions to the sub-problems is said to have optimal substructure”(see Wikipedia). More concretely, the problem suffice the Principle of Optimality:
Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision. (See Bellman, 1957, Chap. III.3.)
Or mathematically, to solve the problem from
T optimally, we
can solve a sub-problem from
t < s < T and extend
It is relatively intuitive to reason the optimal substructure for most problems, we will only focus how to build up the optimal solution in this post.
Top-down vs. bottom-up
We are trained to solve the problem in the top-down approach: break down a big problem to several smaller problems, recursively doing so until the smaller problems can be comfortably tackled. CLRS points out that the time complexity of top-down with memoization is the same as the bottom-up approach, but the latter is preferred as the top-down approach requires callstacks which caps the problem scope.
Using the #322 Coin Change problem as one example:
amountwith different coin denominations, find the fewest coins to make up the amount.
In the top-down approach, the problem can be decomposed as:
f(amount) = min(f(amount - c) for c in coins) + 1
We can convert the recursion to a forward loop with the lookup table,
def coin_change(coins, amount): dp = [float('inf')] * (amount + 1) dp = 0 for index in range(1, amount + 1): for c in coins: if index >= c: dp[index] = min(dp[index], dp[index - c] + 1) return dp[-1] if dp[-1] < float('inf') else -1
Look back vs. Look forward
We solve the DP problems by looking back the sub-problems; on the other hand, some problems require us to look forward: the solution for the current sub-problem depends on the sub-problems in the future. Technically, they SHOULD be categorized as BFS, — the solution space is explored with the breadth first search. They are discussed here just for the reference.
For example, the #55 Jump Game:
Given an array of non-negative integers, each element specifies the maximum jumps at that position, determine whether you are able to reach last index from the first index*.
If we consider positions as a nodes, and the jump as the directed connection;
the problem is transformed whether there exists a path from
start to the
from collections import deque def can_jump_bfs(nums): if len(nums) <= 1: return True # Push the first position to the queue, and mark it visited. queue = deque() bfs = [False] * len(nums) bfs = True while queue: pos = queue.pop() # optimization: pop from the end of queue for step in range(1, nums[pos] + 1): jump_to = pos + step if jump_to == len(nums) - 1: return True # Append to the queue if not visited. if not bfs[jump_to]: bfs[jump_to] = True queue.append(jump_to) return False
In practice, I’ve only seen the two types of problems attacked by the two-dimension DP:
- the problem space is defined as , such as the Smith-Waterman algorithm.
- the problem space is linear, but the sub-problem MUST be solved from any
(i, j)range, such as multiply matrix product.
In this section, let’s discuss the second case with the #312. Burst Balloons:
Given n balloons with designated values, burst balloon i will get
nums[left] * nums[i] * nums[right]coins, try to maximize the gain.
If we add two balloons with value of
1 on both sides, the
original problem become: with
n + 2 balloons from
n + 1 and
balloons == balloons[n + 1] == 1, burst balloons from
maximize the profit.
If we burst
balloons[i] first, the
balloons[i - 1] and
balloons[i + 1] become adjacent. The solution to the sub-problem
grows complicated due to the bookkeeping of adjacent balloons. Instead,
we can divide-and-conquer the problem by focusing on the last balloon
to burst: if
balloons[i] is the last balloon to burst, the accumulated
ithballoons is the last balloon too burst, so the last trio is (0, i, n + 1).
- Recall the
fmethod will NOT burst the balloon on the boundary, so only
balloons[i - 1]are bursted in
- The same rule applies to
f(i , n + 1).
The problem can be solved by two sub-problems in the top-down fashion. If we build the DP table from the bottom up:
- initialize the
- we can expect minimum 3 balloons for any
dp[i][j], apply the logic.
- extend the range of
(i, j)until we get
(0, n + 1).
def max_coins(nums): n = len(nums) nums =  + nums +  # dp is (n + 2) * (n + 2) matrix dp = [  * (n + 2) for i in range(n + 2)] # With nums and nums[n + 1] present, we can expect 3 balloons as minimum for gap in range(2, n + 2): for first in range(0, n + 2 - gap): last = first + gap if last - first == 2: dp[first][last] = nums[first] * nums[first + 1] * nums[last] else: dp[first][last] = max( dp[first][j] + dp[j][last] + nums[first] * nums[j] * nums[last] for j in range(first + 1, last) ) return dp[n + 1]
In some special case, we can optimize the multiple-dimension DP to multiple-pass DP. For example, #123 Best Time to Buy and Sell Stock III:
Given the stock price for each day, maximize the profit with two transactions.
Consider we keep tracks the
balance[i], we can maximize the balance by:
- find the time to buy low, aka maximize
balances[i] - prices[i]
- find the time to sell high, aka maximize
balances[i] + prices[i]
We can repeat
k times(k = 2 in this case) to fully exploit the market.
def max_profits(prices): balances =  * len(prices) for t in range(2): best_buy = float('-inf') best_profit = 0 for i, price in enumerate(prices): best_buy = max(best_buy, balances[i] - prices[i]) best_profit = max(best_profit, best_buy + prices[i]) balances[i] = best_profit return balances[-1]
In this post, we demonstrate several use cases of the dynamic programming from different angles for a better understanding. The cornerstone of DP is how to break down the problem to sub-problems for the optimal solution, then we can build the solution in the bottom-up fashion.