Dynamic Programming amazing tricks! — Part I — DP strategies
Never-before-seen tricks and perspectives so that you can reason about and solve Dynamic Programming problems!
This guide will be helpful for you if you are:
- preparing for coding interviews.
- training for competitive programming.
- just want to learn how to come up with ideas for solving dynamic programming problems.
I will assume that you know how to code (and know recursive functions).
My goal is not to explain classic DP problems (knapsack, coin change, matrix chain multiplication, etc.) but to explain the tricks behind many “original” DP problems you can find in an interview or in a problem set, so you gain a “dynamic programming intuition”.
Finally, remember you will only truly gain it once you practice the ideas, so try solving many problems with these ideas.
Introduction — Computing Fibonacci
The Fibonacci sequence is defined by the recurrence: F(0) = 0, F(1) = 1, and F(n) = F(n−1) + F(n−2) for n ≥ 2.
Computing it is very easy:
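Since the original snippet isn't shown here, a minimal sketch (in Python) that translates the recurrence directly could look like:

```python
def fib(n):
    # Base cases: F(0) = 0, F(1) = 1
    if n <= 1:
        return n
    # Recurrence: each call spawns two more recursive calls
    return fib(n - 1) + fib(n - 2)
```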
But this has a problem: the complexity is exponential, Θ(φⁿ) ≈ O(1.618ⁿ), where φ is the golden ratio. This becomes clear by looking at the recursion call tree:
The problem is clear: we call the same function with the same argument many times. We are calculating the same thing over and over!
We can add a cache to avoid this problem: we create an array with an entry for each possible argument of the function. We conveniently initialize it with −1 (indicating “value not calculated yet”):
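A memoized sketch (Python; the bound MAX_N is an assumption for illustration):

```python
MAX_N = 1000
cache = [-1] * MAX_N  # -1 means "value not calculated yet"

def fib(n):
    if n <= 1:
        return n
    if cache[n] == -1:
        # First time we see this argument: compute it and remember it
        cache[n] = fib(n - 1) + fib(n - 2)
    return cache[n]
```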
As we have O(N) states (possible argument values) and the calculation is now done only once per state, the time complexity is linear! Wow!
Other DP problem: min path on 2D board
Given an N x M matrix filled with positive numbers, we want to calculate the min cost path from the upper-left corner to the lower-right corner, moving only down and right. The cost is the sum of the numbers on the path.
If we frame the problem as “let’s calculate the min cost path from a cell (x,y) to the lower-right corner (cell (N-1, M-1))”, we can solve it with the following recursive function:
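A Python sketch of that recursion (the example grid is an assumption for illustration):

```python
# Assumed example board
grid = [
    [1, 3, 1],
    [1, 5, 1],
    [4, 2, 1],
]
N, M = 3, 3
INF = float('inf')

def f(i, j):
    # Min cost path from (i, j) to the lower-right corner (N-1, M-1)
    if i == N - 1 and j == M - 1:
        return grid[i][j]
    best = INF
    if i + 1 < N:                      # move down
        best = min(best, f(i + 1, j))
    if j + 1 < M:                      # move right
        best = min(best, f(i, j + 1))
    return grid[i][j] + best
```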
The general case f(i,j) makes two function calls! This might be exponential! But notice that the possible argument values are just N x M states. We can apply the same trick we used with Fibonacci and “remember” the calculated values:
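A memoized sketch (Python; same assumed example grid as before):

```python
grid = [
    [1, 3, 1],
    [1, 5, 1],
    [4, 2, 1],
]
N, M = 3, 3
INF = float('inf')
cache = [[-1] * M for _ in range(N)]   # -1 = not calculated yet

def f(i, j):
    if cache[i][j] != -1:              # already computed: reuse it
        return cache[i][j]
    if i == N - 1 and j == M - 1:
        res = grid[i][j]
    else:
        best = INF
        if i + 1 < N:
            best = min(best, f(i + 1, j))
        if j + 1 < M:
            best = min(best, f(i, j + 1))
        res = grid[i][j] + best
    cache[i][j] = res
    return res
```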
If you don’t like recursion you can calculate f(i,j) in a (2-dimensional) array iteratively:
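An iterative Bottom-Up sketch (Python; same assumed example grid):

```python
grid = [
    [1, 3, 1],
    [1, 5, 1],
    [4, 2, 1],
]
N, M = 3, 3

dp = [[0] * M for _ in range(N)]
for i in range(N - 1, -1, -1):            # last row to first
    for j in range(M - 1, -1, -1):        # rightmost to leftmost
        if i == N - 1 and j == M - 1:
            dp[i][j] = grid[i][j]
        elif i == N - 1:                  # last row: can only move right
            dp[i][j] = grid[i][j] + dp[i][j + 1]
        elif j == M - 1:                  # last column: can only move down
            dp[i][j] = grid[i][j] + dp[i + 1][j]
        else:
            dp[i][j] = grid[i][j] + min(dp[i + 1][j], dp[i][j + 1])
```

The answer to the original problem is dp[0][0].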
Notice that we calculate row by row, from the last row to the first one, and within each row from the rightmost to the leftmost position. This works because a state (i,j) only uses the value of the state to its right, (i,j+1), or below, (i+1,j), which will already be calculated.
We could also calculate column by column. In that case, each column would be calculated from the bottom position to the top one.
Top-Down vs Bottom-Up DP
The first implementation (which uses a cache + recursive call) is called the “Top-Down” approach (or memoization). The second one is called “Bottom-Up” (also tabulation). Advantages and disadvantages are:
- Top-Down is easy to write from a recursive math definition.
- Bottom-Up is usually faster because recursion has a considerable runtime cost (it performs many function calls and stack operations).
- Top-Down is faster in cases where many table states are never needed: it only calculates what it needs, “on demand”.
Trick: From math to top-down DP
If you have a math definition of a recursive function you can implement it, and add memoization by:
- Creating a cache array with one dimension per function argument, sized to the range of possible values.
- Initializing it with a sentinel value indicating “not computed yet”.
- Checking on each function call whether the value was already calculated. If not, calculate it using the formula and store it in the cache.
As each state is computed only once, the resulting complexity is: O(number of states × cost of computing one state).
Trick: from Top-Down DP to Bottom-Up
If you have a Top-Down implementation and want to create a Bottom-Up one (for example, for speed), you only have to copy the body of the function into a group of nested for loops and ensure that:
- They visit all the states.
- For each state they visit, the “used” states (the ones appearing in the formula of the state being calculated) were already visited.
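Applying this recipe to the Fibonacci function from earlier, a Bottom-Up sketch (Python) could be:

```python
def fib_bottom_up(n):
    if n <= 1:
        return n
    dp = [0] * (n + 1)
    dp[1] = 1
    # Visit states in increasing order, so dp[i-1] and dp[i-2]
    # (the "used" states) are already calculated
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]
```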
Reconstructing the path
Now let’s assume we want to reconstruct (any) min path by calculating a sequence of “D” (down) and “R” (right) steps.
If we already coded the Top-Down solution, we can easily reconstruct the solution using the following backtracking function:
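A Python sketch of that backtracking, reusing the memoized f from before (the grid is an assumption for illustration):

```python
grid = [
    [1, 3, 1],
    [1, 5, 1],
    [4, 2, 1],
]
N, M = 3, 3
INF = float('inf')
cache = [[-1] * M for _ in range(N)]

def f(i, j):
    # Top-Down min cost from (i, j) to (N-1, M-1), as before
    if cache[i][j] != -1:
        return cache[i][j]
    if i == N - 1 and j == M - 1:
        res = grid[i][j]
    else:
        best = INF
        if i + 1 < N:
            best = min(best, f(i + 1, j))
        if j + 1 < M:
            best = min(best, f(i, j + 1))
        res = grid[i][j] + best
    cache[i][j] = res
    return res

def reconstruct(i, j):
    # Follow one optimal transition at each step
    if i == N - 1 and j == M - 1:
        return ""
    if i + 1 < N and f(i, j) == grid[i][j] + f(i + 1, j):
        return "D" + reconstruct(i + 1, j)   # return: follow only one solution
    return "R" + reconstruct(i, j + 1)
```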
Essentially, we start at the first state and move across transitions we know are optimal. To do this, we use the already written Top-Down function to get the cost of each state. Thanks to memoization, the cost of these queries is amortized: each call can be considered O(1) during the backtracking, since every needed value is already in memory after the first state is computed.
Trick: Recipe for reconstructing the solution to any DP problem
This procedure can be applied to any DP problem that requires reconstructing the solution:
- Solve the problem with a recursive math function.
- Implement it and add memoization. We have a Top-Down!
- Write the backtracking by copying the Top-Down implementation and checking, on each transition, whether it is optimal. If it is, move the backtracking recursively through this transition.
- DON’T FORGET THE RETURNS after the recursive calls: we don’t want to visit all the states, only the ones on (any) one optimal solution.
Reconstructing the k-th solution
Consider the list of “D”,“R” strings that represent min paths lexicographically sorted. Calculate the K-th element of that list (for a given K on the problem input).
At a glance, this looks like a really hard problem. But it’s easy and there is a recipe (I love recipes!) that will allow you to “reconstruct the k-th solution for any DP”. First, let’s solve it for this problem.
Let’s calculate the number of minimal paths from any given position (x,y) to the end (the lower-right corner). We can do it with another DP:
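A Python sketch of that counting DP (amountMin), reusing the min-cost f; the all-ones example board is an assumption, chosen so that every path is optimal:

```python
grid = [[1] * 3 for _ in range(3)]   # assumed board: every path costs the same
N, M = 3, 3
INF = float('inf')
cache = [[-1] * M for _ in range(N)]
amount_cache = [[-1] * M for _ in range(N)]

def f(i, j):
    # Same Top-Down min-cost DP as before
    if cache[i][j] != -1:
        return cache[i][j]
    if i == N - 1 and j == M - 1:
        res = grid[i][j]
    else:
        best = INF
        if i + 1 < N:
            best = min(best, f(i + 1, j))
        if j + 1 < M:
            best = min(best, f(i, j + 1))
        res = grid[i][j] + best
    cache[i][j] = res
    return res

def amount_min(i, j):
    # Number of min-cost paths from (i, j) to the corner
    if amount_cache[i][j] != -1:
        return amount_cache[i][j]
    if i == N - 1 and j == M - 1:
        res = 1
    else:
        res = 0
        # Only transitions that are optimal (checked with f) contribute
        if i + 1 < N and f(i, j) == grid[i][j] + f(i + 1, j):
            res += amount_min(i + 1, j)
        if j + 1 < M and f(i, j) == grid[i][j] + f(i, j + 1):
            res += amount_min(i, j + 1)
    amount_cache[i][j] = res
    return res
```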
Here we use the original DP, f, to check whether a transition is optimal. If it is, we add the count of optimal paths through that transition to the current state’s count; if not, we skip it.
Now, for reconstructing the k-th solution we just modify the backtracking we already had:
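A Python sketch of the modified backtracking (grid, f and amount_min as in the previous sketches; the all-ones board is an assumption so that all 6 paths are optimal):

```python
grid = [[1] * 3 for _ in range(3)]
N, M = 3, 3
INF = float('inf')
cache = [[-1] * M for _ in range(N)]
amount_cache = [[-1] * M for _ in range(N)]

def f(i, j):
    if cache[i][j] != -1:
        return cache[i][j]
    if i == N - 1 and j == M - 1:
        res = grid[i][j]
    else:
        best = INF
        if i + 1 < N:
            best = min(best, f(i + 1, j))
        if j + 1 < M:
            best = min(best, f(i, j + 1))
        res = grid[i][j] + best
    cache[i][j] = res
    return res

def amount_min(i, j):
    if amount_cache[i][j] != -1:
        return amount_cache[i][j]
    if i == N - 1 and j == M - 1:
        res = 1
    else:
        res = 0
        if i + 1 < N and f(i, j) == grid[i][j] + f(i + 1, j):
            res += amount_min(i + 1, j)
        if j + 1 < M and f(i, j) == grid[i][j] + f(i, j + 1):
            res += amount_min(i, j + 1)
    amount_cache[i][j] = res
    return res

def kth_path(i, j, k):
    # k-th (1-indexed) lexicographically smallest optimal path; "D" < "R"
    if i == N - 1 and j == M - 1:
        return ""
    if i + 1 < N and f(i, j) == grid[i][j] + f(i + 1, j):
        cnt = amount_min(i + 1, j)   # optimal solutions starting with "D"
        if k <= cnt:
            return "D" + kth_path(i + 1, j, k)
        k -= cnt                     # "pass" all the solutions starting with "D"
    return "R" + kth_path(i, j + 1, k)
```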
The key idea is to carry K in the backtracking: in a given state we are deciding whether to move “D” (down) or “R” (right). Since “D” comes lexicographically first, if the “D” transition is optimal (moving to (i+1, j) is optimal), we look at how many optimal paths go through it: amountMin(i+1, j). If k ≤ amountMin(i+1, j), the k-th solution starts with “D”, so we move down and search for the k-th solution from (i+1, j). Otherwise, we need to “pass” all those solutions: we take the other transition and search for the (k − amountMin(i+1, j))-th solution there (we subtract the number of solutions we are “passing”).
Trick: Reconstructing the k-th solution for any DP problem
This procedure can be done in any DP problem. The recipe is:
- Solve the problem with a recursive math function.
- Implement it and add memoization. We have a Top-Down!
- Calculate the number of optimal solutions for each state. You can do this by writing another DP based on the Top-Down code, using it to check whether each transition is optimal. If a transition is optimal, accumulate its count.
- Write the reconstruction backtracking with K as an extra argument. Inspect the transitions in lexicographic order. For each optimal transition, decide whether to take it or to “pass” all the solutions that go through it, by comparing K with the number of solutions over that transition. Remember that if we pass, we subtract the number of passed solutions from K. This is why we calculated the number of optimal solutions per state.
Reducing memory complexity from O(N²) to O(N)
Analyzing again the order that our Bottom-Up implementation calculates the table, we can notice that when we are filling the i-th row, we only need the already calculated values on this same row and the values from the (i+1)-th row. We don’t need, for example, the values on (i+2)-th row (or further). Then we can do:
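A Python sketch of the Bottom-Up version keeping only a strip of two rows (same assumed example grid as in the earlier sketches):

```python
grid = [
    [1, 3, 1],
    [1, 5, 1],
    [4, 2, 1],
]
N, M = 3, 3

dp = [[0] * M for _ in range(2)]   # only 2 rows instead of N
for i in range(N - 1, -1, -1):
    for j in range(M - 1, -1, -1):
        if i == N - 1 and j == M - 1:
            dp[i % 2][j] = grid[i][j]
        elif i == N - 1:
            dp[i % 2][j] = grid[i][j] + dp[i % 2][j + 1]
        elif j == M - 1:
            dp[i % 2][j] = grid[i][j] + dp[(i + 1) % 2][j]
        else:
            dp[i % 2][j] = grid[i][j] + min(dp[(i + 1) % 2][j],
                                            dp[i % 2][j + 1])

answer = dp[0 % 2][0]   # remember the % when reading the result!
```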
Here we overwrite the (i+2)-th row’s values with the i-th row’s, as we will not need them anymore!
This way we reduce the memory complexity by a factor of N! Wow!
Trick: Saving memory by overwriting states on DP
We can apply the last technique in any DP!
- Have a Bottom-Up implementation (remember that, in general, we can easily translate any recursive math definition into a Bottom-Up implementation if we are careful about the order in which we iterate the states).
- To overwrite certain states, apply % (the modulo operator) to every usage of the chosen dimension’s index. Of course, we also shrink the array declaration to allocate less memory.
- Notice that here we use %2 because we only maintain a memory “strip” of size 2. This easily generalizes to larger strip sizes (imagine a variant of the problem where we can take steps of size two).
- Be careful to apply the trick to the dimension iterated by the outer for loop of the Bottom-Up. A good tip is to draw a picture of the order in which the table is filled, to choose which dimension to overwrite.
- REMEMBER the % when you are showing/using the result!
Adding a flag to the state
Let’s take the same problem with a twist: now we have K “magic bishops” in our pocket. We can use a magic bishop to move like a bishop; formally: to move from (i,j) to (i+d, j+d) for any d ≥ 1, visiting all cells on the diagonal in between. Each time we do this we spend one of the K “magic bishops”.
It’s easy to carry K in the recursion state and adapt the definition to consider the possibility of moving as a bishop:
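A Python sketch of that adapted recursion (the 3x3 board and the use of lru_cache for memoization are assumptions for illustration):

```python
import functools

# Assumed example setup: cheap diagonal, expensive everywhere else
grid = [
    [1, 9, 9],
    [9, 1, 9],
    [9, 9, 1],
]
N, M, K = 3, 3, 1
INF = float('inf')

@functools.lru_cache(maxsize=None)
def f(i, j, k):
    # Min cost from (i, j) to the corner with k magic bishops left
    if i == N - 1 and j == M - 1:
        return grid[i][j]
    best = INF
    if i + 1 < N:
        best = min(best, f(i + 1, j, k))
    if j + 1 < M:
        best = min(best, f(i, j + 1, k))
    if k > 0:
        # Try every diagonal jump length d, spending one bishop.
        # `between` accumulates the cells strictly between the start and the
        # landing cell; the landing cell itself is charged by the recursive call.
        between = 0
        d = 1
        while i + d < N and j + d < M:
            best = min(best, between + f(i + d, j + d, k - 1))
            between += grid[i + d][j + d]
            d += 1
    return grid[i][j] + best
```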
What’s the complexity here? Remember the complexity of any DP is:
The number of states is N x N x K (taking the board as square, N x N, for simplicity).
Now we have a for loop (over the diagonal jump length) inside the calculating function, so the complexity of computing one state is O(N).
So the total complexity is O(N³K).
But we can erase the for loop by adding a flag (we will call it “b”) to our implementation that indicates if “the last movement was a diagonal movement”.
Another interpretation of b is that it indicates whether “we were moving as a bishop”.
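A Python sketch of the flag version (same assumed example setup as the previous sketch): a bishop move of any length is decomposed into unit diagonal steps, and b remembers whether we are mid-move, so continuing the diagonal is free while starting a new one spends a bishop.

```python
import functools

grid = [
    [1, 9, 9],
    [9, 1, 9],
    [9, 9, 1],
]
N, M, K = 3, 3, 1
INF = float('inf')

@functools.lru_cache(maxsize=None)
def f(i, j, k, b):
    # b == 1 iff the last movement was a diagonal (bishop) movement
    if i == N - 1 and j == M - 1:
        return grid[i][j]
    best = INF
    if i + 1 < N:                         # normal moves reset the flag
        best = min(best, f(i + 1, j, k, 0))
    if j + 1 < M:
        best = min(best, f(i, j + 1, k, 0))
    if i + 1 < N and j + 1 < M:
        if b == 1:
            # Continue the bishop move we already paid for
            best = min(best, f(i + 1, j + 1, k, 1))
        if k > 0:
            # Start a new bishop move: spend one bishop
            best = min(best, f(i + 1, j + 1, k - 1, 1))
    return grid[i][j] + best
```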
Notice that now the number of states is N x N x K x 2,
but the complexity of calculating one state is O(1).
So the total complexity is O(N²K).
We reduced the total complexity by a linear factor while only increasing the number of states by a constant factor.
Trick: Adding a flag to the state to encode certain information about the past
This trick appears in many DP problems: a flag that encodes certain information about “the past”. Notice that, in general, a DP state doesn’t carry all the information about “the past” of the solution, because we are solving a subproblem from a certain position (or state) to the end.
So it’s very typical to add flags/numbers/states with the necessary information from the past.
Note: the fact that we stay as agnostic of the past as possible is the source of the DP’s power: the overlapping of subproblems. Each state can be associated with “many pasts”.
Trick: recovering a parameter
Motivational problem: given a sequence of integers, split it into two subsequences. We define the cost of a split as the sum of the costs of the two subsequences, where each cost is the sum of the squared differences of consecutive numbers in that subsequence. Example:
27 2 30 1 2 31 → subsequences (27, 30, 31) and (2, 1, 2): (27−30)² + (30−31)² + (2−1)² + (1−2)² = 12
We can solve it by the following recursive function:
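A Python sketch of that recursion, using the article’s parameter names; the sentinel −1 for “this subsequence is still empty” and the cost helper are assumptions of this sketch:

```python
seq = [27, 2, 30, 1, 2, 31]   # the example sequence from above
n = len(seq)

def cost(last, pos):
    # Cost of appending seq[pos] after index `last` in the same subsequence;
    # last == -1 means the subsequence is still empty (no cost)
    if last == -1:
        return 0
    return (seq[last] - seq[pos]) ** 2

def f(currentPosition, lastRed, lastBlue):
    if currentPosition == n:
        return 0
    # Paint the current element red or blue; take the cheaper option
    return min(
        cost(lastRed, currentPosition)
            + f(currentPosition + 1, currentPosition, lastBlue),
        cost(lastBlue, currentPosition)
            + f(currentPosition + 1, lastRed, currentPosition),
    )
```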
f calculates the cost of splitting the range [currentPosition, N) (from currentPosition to the end), given that, in the past, lastRed was the latest position painted red and lastBlue the latest position painted blue.
Clearly we have only two options for currentPosition: painting it blue or red. These are the options inside the min. If we paint it red, we must add the squared difference with the value at lastRed. That’s why we need this information from the past (lastRed and lastBlue): when painting currentPosition, we need to match it against a value painted in the past.
(We omitted the base case currentPosition = n, which is trivially 0.)
With a straightforward memoized implementation, the complexity would be O(N³): N³ states and O(1) per-state calculation.
But observe that: max(lastRed, lastBlue) = currentPosition − 1, since the element just before currentPosition was painted one of the two colors.
This constraint tells us that many of the N³ states are never really needed. Notice that a Top-Down implementation will never visit a state that doesn’t satisfy the condition.
So we can use only two of the three parameters to index our memoization:
Note that we don’t even need to reconstruct one parameter from the others: we can keep carrying all three in the recursion while using only two of them to index the memoization:
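A Python sketch of that idea: since max(lastRed, lastBlue) = currentPosition − 1 and the colors are symmetric, (currentPosition, min(lastRed, lastBlue)) suffices as a cache key. The +1 shift for the −1 sentinel is an assumption of this sketch.

```python
seq = [27, 2, 30, 1, 2, 31]
n = len(seq)

# cache indexed by (currentPosition, other + 1), where `other` is the "last"
# parameter that is NOT currentPosition - 1; the +1 maps the -1 sentinel to 0
cache = [[-1] * (n + 1) for _ in range(n + 1)]

def cost(last, pos):
    if last == -1:
        return 0
    return (seq[last] - seq[pos]) ** 2

def f(currentPosition, lastRed, lastBlue):
    if currentPosition == n:
        return 0
    other = min(lastRed, lastBlue)   # max(lastRed, lastBlue) == currentPosition - 1
    if cache[currentPosition][other + 1] != -1:
        return cache[currentPosition][other + 1]
    res = min(
        cost(lastRed, currentPosition)
            + f(currentPosition + 1, currentPosition, lastBlue),
        cost(lastBlue, currentPosition)
            + f(currentPosition + 1, lastRed, currentPosition),
    )
    cache[currentPosition][other + 1] = res
    return res
```

We still pass all three arguments, but only O(N²) cache entries exist, so the complexity drops to O(N²).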
You will only truly gain expertise once you practice these ideas, so try solving many problems with them. Here are some problems that apply the knowledge from this article:
- DP on 2D board: https://codeforces.com/contest/2/problem/B
- DP on array + info about the past: https://onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&category=21&page=show_problem&problem=1859
- MANY classic DP problems: https://codeforces.com/blog/entry/325
End of Part I
This is the end of Part I. I will publish a series of articles about Dynamic Programming teaching the thinking intuition behind the technique (in contrast to most tutorials, which focus on classic DP problems).
If you liked this post / my style, follow me.
I will try to keep on posting regularly about SE, Web, and ML always with these objectives in mind:
- keep it short
- be meaningful
- explain with examples