Python: Probably the Best Language for Interview

pythoninterview

I have watched candidates coding in the whiteboard with C++, Java, JavaScript, ES6 JavaScript, Python, Ruby, — looking forward to Rust, Clojure, Erlang etc, programers! I think Python is probably the best programming language for technical interview, especially whiteboard programming.

High expressiveness

Python use indention to express control blocks, which has zero cost in the whiteboard programming. No other languages can beat doing nothing!

Joke aside, Python absorbs many traits from functional languages for the better expressiveness, more notably:

  1. List comprehension
  2. Higher-order functions
  3. Generator and iterator

List Comprehension

I think the list comprehension might be the most intuitive and powerful way to manipulate the iterable data.

We can flatten the nested list with even-number predict in one-liner:

In [6]: nested_list = [[1, 2], (3, 4, 5)]
In [7]: [x for sublist in nested_list for x in sublist if x % 2 == 0]
Out[7]: [2, 4]

We can explore the neighbors of position (i, j) with predicts for breadth-first search in the 2D grid, such as 695. Max Area of Island.

candidates = [
    (x, y) for x, y in explore(i, j, rows, cols)
    if grid[x][y] and (x, y) not in visited
]

Higher-order functions

I prefer the list comprehension over map and filter composition for its readability. The filter shines when constructing the sequence from scratch. For example, the following explore method returns adjacent positions within bounds:

def explore(i, j, rows, cols):
    return filter(
        lambda d: 0 <= d[0] < rows and 0 <= d[1] < cols,
        [(i - 1, j), (i, j + 1), (i, j - 1), (i + 1, j)]
    )

You can also invoke sort, sorted, min, max method with extra key function to override the default ordering logic. For example, we can sort the DNA sequence based on the occurrences:

>>> counter = Counter('acacctatgccagggttgagggtgcgacggga')
>>> sorted(counter, key=lambda k: counter[k], reverse=True)
['g', 'a', 'c', 't']

The module operator also provides alternative methods for the operators, such as operator.add(x, y) is equivalent to lambda x, y: x + y. It is merely a matter of personal taste.

Generator and Iterator

It is quite common to harvest the results from the recursion, such as tree traversal. Alternatively, we can use generator to emit results, then collect results externally. It is more readable, and consumes less memory in general.

def pre_order_traverse(root: TreeNode):
    if root is None: return
    yield root.val
    yield from root.left
    yield from root.right

# Materialize the generator with list
list(pre_order_traverse(root))

On the other hand, we might want to consume the iterable data progressively. For example, in the n-way merge, the next element in the sequence is pushed to the heap only when the preceeding element is popped. Instead of tracking the sequence’s cursor, we use the iterator to simplify the book keeping.

from heapq import heapify, heappush, heappop

def merge(seqs: List[List[int]]):
    heap = [iter(s) for s in seqs]
    heap = [(next(it), it) for it in heap]
    heapify(heap)

    while heap:
        val, it = heappop(heap)
        yield val
        try:
            heappush(heap, (next(it), it))
        except StopIteration:
            pass

Batteries included

Python is reputable for the batteries include philosophy. Out of the box, Python programmers are equipped with:

  • set, aka HashSet in Java
  • dict, aka HashMap in Java
  • list, a generic vector or array with random access
  • heapq, a min-heap, aka PriorityQueue
  • collections.Counter, similar ideas as MultiSet
  • collections.deque, a double-ended queue to support efficient appends and pops from both sides.
  • collections.defaultdict, can be used for MultiHashMap in java.
  • collections.OrderedDict

Let’s see how they make our lives easier, :-).

Tuple

The built-in data structure tuple is an essential build block despite it is often underestimated.

We can store arbitrary data in a tuple instead of declaring a class, — quick and dirty but suitable for the interview context. Here is a more sophisticated example to use tuple for double-linked list.

Recall the n-way merge example: the iterator is stored with the value, then pushed back to min-heap, and the heap-sort still works! This is due to that the tuple is compared from left to right:

In [47]: (1, 2) < (1, 3) < (2, 1)
Out[47]: True

If we encapsulate the data into a class, such as:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

The comparison no longer works since how to compare Point instance is non-determined yet. Hint: implement __lt__ method.

In [49]: Point(1, 3) < Point(2, 1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-49-40188351117f> in <module>
----> 1 Point(1, 3) < Point(2, 1)

TypeError: '<' not supported between instances of 'Point' and 'Point'

If you holds a strong opinion about type, the collections.namedtuple might be a good middle-ground:

In [51]: Point = namedtuple('Point', ['x', 'y'])

In [52]: Point(1, 2) < Point(1, 3) < Point(2, 1)
Out[52]: True

Personally, I would still prefer the tuple approach just to avoid the explanation of magic of namedtuple.

The tuple instance is immutable, so it can be used for the dict key. For the 2D dynamic programming problem, instead of building the 2d matrix tediously (see below), we can use dict, such as dp[(i, j)] instead.

Trie

We can build the Trie with defaultdict:

from collections import defaultdict
from pprint import pprint

def default_factory():
    return defaultdict(default_factory)

trie = defaultdict(default_factory)
for word in ['hello', 'halt', 'hat']:
    node = trie
    for c in word:
        node = node[c]
    node['$'] = None

pprint(trie)

Each node of the Trie is defaultdict thanks to the default_factory. The dump shows the memory layouts as:

defaultdict(<function default_factory at 0x7fcacd51d6a8>,
    {'h': defaultdict(<function default_factory at 0x7fcacd51d6a8>,
        {'a': defaultdict(<function default_factory at 0x7fcacd51d6a8>,
            {'l': defaultdict(<function default_factory at 0x7fcacd51d6a8>,
                {'t': defaultdict(<function default_factory at 0x7fcacd51d6a8>,
                    {'$': None})}),
            't': defaultdict(<function default_factory at 0x7fcacd51d6a8>,
                {'$': None})}),
        'e': defaultdict(<function default_factory at 0x7fcacd51d6a8>,
            {'l': defaultdict(<function default_factory at 0x7fcacd51d6a8>,
                {'l': defaultdict(<function default_factory at 0x7fcacd51d6a8>,
                    {'o': defaultdict(<function default_factory at 0x7fcacd51d6a8>,
                        {'$': None})})})})})})

LRU Cache

We can implement the LRU cache with OrderedDict, see the official document.

Parsing wth stack

The data structure, stack is frequently used for expression parsing. The list can be modeled as stack with append and pop methods. As python is dynamic-typed, it is quite handy to store heterogenous operands and operators in the stack.

Syntax sugar

In this section, I will share some tips which may save a minute or two in the interview.

Compare every adjacent elements in the list. The magic is zip can aggregate two sequences and stops when the shortest is exhausted.

[x > y for x, y in zip(nums, nums[1:])]

Build the lookup table for the element to its index.

lut = dict((x, index) for index, x in enumerate(nums))

Pre-allocate the list with desired capacity.

nums = [0] * size

It is worthy noting that python uses pass-by-reference, so the following snippet does not work as expected:

In [2]: rows, cols = 2, 3
   ...: row = [0] * cols
   ...: matrix = [row] * rows
   ...: matrix[0][0] = 3

In [3]: print(matrix)
[[3, 0, 0], [3, 0, 0]]

In [21]: row
Out[21]: [3, 0, 0]

Why? Because the matrix is instantiated with same row reference, thus matrix[0], matrix[1] points to the same row!

The right approach1 is to proactively allocate each row:

matrix = [row[:] for _ in range(row)]

The row[:] creates a copy of row, so each row of the matrix is independent.

With this in mind, we can modify a list during the iteration!

In [31]: l = [1, 2, 3, 4, 5]

In [32]: for x in l[:]:
    ...:     if x % 2 == 0: l.remove(x)

In [33]: l
Out[33]: [1, 3, 5]

For demo purpose only, there are faster, more readable way to remove even elements.

Conclusion

Python gains its reputation for its versatility. It is clearly not the fastest programming language, — this rarely matters in the live coding interview, let alone the whiteboard programming. Its concise grammar, high expressiveness and built-in data structures will help you get the job(done).


  1. Please ping me you if you have better ideas to pre-allocate the matrix.