Python: Probably the Best Language for Interview
python interviewI have watched candidates coding in the whiteboard with C++, Java, JavaScript, ES6 JavaScript, Python, Ruby, — looking forward to Rust, Clojure, Erlang etc, programers! I think Python is probably the best programming language for technical interview, especially whiteboard programming.
High expressiveness
Python use indention to express control blocks, which has zero cost in the whiteboard programming. No other languages can beat doing nothing!
Joke aside, Python absorbs many traits from functional languages for the better expressiveness, more notably:
- List comprehension
- Higher-order functions
- Generator and iterator
List Comprehension
I think the list comprehension might be the most intuitive and powerful way to manipulate the iterable data.
We can flatten the nested list with even-number predict in one-liner:
In [6]: nested_list = [[1, 2], (3, 4, 5)]
In [7]: [x for sublist in nested_list for x in sublist if x % 2 == 0]
Out[7]: [2, 4]
We can explore the neighbors of position (i, j)
with predicts for
breadth-first search in the 2D grid, such as 695. Max Area of Island.
candidates = [
(x, y) for x, y in explore(i, j, rows, cols)
if grid[x][y] and (x, y) not in visited
]
Higher-order functions
I prefer the list comprehension over map
and filter
composition for its
readability. The filter
shines when constructing the sequence from scratch.
For example, the following explore
method returns adjacent positions within
bounds:
def explore(i, j, rows, cols):
return filter(
lambda d: 0 <= d[0] < rows and 0 <= d[1] < cols,
[(i - 1, j), (i, j + 1), (i, j - 1), (i + 1, j)]
)
You can also invoke sort
, sorted
, min
, max
method with extra key
function to override the default ordering logic. For example, we can sort the
DNA sequence based on the occurrences:
>>> counter = Counter('acacctatgccagggttgagggtgcgacggga')
>>> sorted(counter, key=lambda k: counter[k], reverse=True)
['g', 'a', 'c', 't']
The module operator
also provides alternative methods for the operators, such
as operator.add(x, y)
is equivalent to lambda x, y: x + y
. It is merely a
matter of personal taste.
Generator and Iterator
It is quite common to harvest the results from the recursion, such as tree traversal. Alternatively, we can use generator to emit results, then collect results externally. It is more readable, and consumes less memory in general.
def pre_order_traverse(root: TreeNode):
if root is None: return
yield root.val
yield from root.left
yield from root.right
# Materialize the generator with list
list(pre_order_traverse(root))
On the other hand, we might want to consume the iterable data progressively. For example, in the n-way merge, the next element in the sequence is pushed to the heap only when the preceeding element is popped. Instead of tracking the sequence’s cursor, we use the iterator to simplify the book keeping.
from heapq import heapify, heappush, heappop
def merge(seqs: List[List[int]]):
heap = [iter(s) for s in seqs]
heap = [(next(it), it) for it in heap]
heapify(heap)
while heap:
val, it = heappop(heap)
yield val
try:
heappush(heap, (next(it), it))
except StopIteration:
pass
Batteries included
Python is reputable for the batteries include philosophy. Out of the box, Python programmers are equipped with:
set
, akaHashSet
in Javadict
, akaHashMap
in Javalist
, a generic vector or array with random accessheapq
, a min-heap, akaPriorityQueue
collections.Counter
, similar ideas asMultiSet
collections.deque
, a double-ended queue to support efficient appends and pops from both sides.collections.defaultdict
, can be used forMultiHashMap
in java.collections.OrderedDict
Let’s see how they make our lives easier, :-).
Tuple
The built-in data structure tuple
is an essential build block despite it is
often underestimated.
We can store arbitrary data in a tuple
instead of declaring a class, — quick
and dirty but suitable for the interview context. Here is a more sophisticated
example to use tuple for double-linked list.
Recall the n-way merge example: the iterator is stored with the value, then pushed back to min-heap, and the heap-sort still works! This is due to that the tuple is compared from left to right:
In [47]: (1, 2) < (1, 3) < (2, 1)
Out[47]: True
If we encapsulate the data into a class, such as:
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
The comparison no longer works since how to compare Point
instance is
non-determined yet. Hint: implement __lt__
method.
In [49]: Point(1, 3) < Point(2, 1)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-49-40188351117f> in <module>
----> 1 Point(1, 3) < Point(2, 1)
TypeError: '<' not supported between instances of 'Point' and 'Point'
If you holds a strong opinion about type, the collections.namedtuple
might be
a good middle-ground:
In [51]: Point = namedtuple('Point', ['x', 'y'])
In [52]: Point(1, 2) < Point(1, 3) < Point(2, 1)
Out[52]: True
Personally, I would still prefer the tuple
approach just to avoid the
explanation of magic of namedtuple
.
The tuple
instance is immutable, so it can be used for the dict
key. For the
2D dynamic programming problem, instead of building the 2d matrix tediously (see
below), we can use dict
, such as dp[(i, j)]
instead.
Trie
We can build the Trie with defaultdict
:
from collections import defaultdict
from pprint import pprint
def default_factory():
return defaultdict(default_factory)
trie = defaultdict(default_factory)
for word in ['hello', 'halt', 'hat']:
node = trie
for c in word:
node = node[c]
node['$'] = None
pprint(trie)
Each node of the Trie
is defaultdict
thanks to the default_factory
. The
dump shows the memory layouts as:
defaultdict(<function default_factory at 0x7fcacd51d6a8>,
{'h': defaultdict(<function default_factory at 0x7fcacd51d6a8>,
{'a': defaultdict(<function default_factory at 0x7fcacd51d6a8>,
{'l': defaultdict(<function default_factory at 0x7fcacd51d6a8>,
{'t': defaultdict(<function default_factory at 0x7fcacd51d6a8>,
{'$': None})}),
't': defaultdict(<function default_factory at 0x7fcacd51d6a8>,
{'$': None})}),
'e': defaultdict(<function default_factory at 0x7fcacd51d6a8>,
{'l': defaultdict(<function default_factory at 0x7fcacd51d6a8>,
{'l': defaultdict(<function default_factory at 0x7fcacd51d6a8>,
{'o': defaultdict(<function default_factory at 0x7fcacd51d6a8>,
{'$': None})})})})})})
LRU Cache
We can implement the LRU cache with OrderedDict
, see the official
document.
Parsing wth stack
The data structure, stack
is frequently used for expression parsing. The
list
can be modeled as stack with append
and pop
methods. As python is
dynamic-typed, it is quite handy to store heterogenous operands and operators in
the stack.
Syntax sugar
In this section, I will share some tips which may save a minute or two in the interview.
Compare every adjacent elements in the list. The magic is zip
can aggregate
two sequences and stops when the shortest is exhausted.
[x > y for x, y in zip(nums, nums[1:])]
Build the lookup table for the element to its index.
lut = dict((x, index) for index, x in enumerate(nums))
Pre-allocate the list with desired capacity.
nums = [0] * size
It is worthy noting that python uses pass-by-reference, so the following snippet does not work as expected:
In [2]: rows, cols = 2, 3
...: row = [0] * cols
...: matrix = [row] * rows
...: matrix[0][0] = 3
In [3]: print(matrix)
[[3, 0, 0], [3, 0, 0]]
In [21]: row
Out[21]: [3, 0, 0]
Why? Because the matrix
is instantiated with same row
reference, thus
matrix[0]
, matrix[1]
points to the same row
!
The right approach1 is to proactively allocate each row:
matrix = [row[:] for _ in range(row)]
The row[:]
creates a copy of row, so each row of the matrix is independent.
With this in mind, we can modify a list during the iteration!
In [31]: l = [1, 2, 3, 4, 5]
In [32]: for x in l[:]:
...: if x % 2 == 0: l.remove(x)
In [33]: l
Out[33]: [1, 3, 5]
For demo purpose only, there are faster, more readable way to remove even elements.
Conclusion
Python gains its reputation for its versatility. It is clearly not the fastest programming language, — this rarely matters in the live coding interview, let alone the whiteboard programming. Its concise grammar, high expressiveness and built-in data structures will help you get the job(done).
Footnotes
-
Please ping me you if you have better ideas to pre-allocate the matrix. ↩