My "Quora ducts" solution

review of some old code

Tuesday, May 26, 2020 permalink
tags: programming - backtracking - hiring

Problem statement
Backtracking algorithm
What I would do differently today

In 2012, I was an electrical engineering grad student looking to change tracks and get a job as a software developer. I found a coding challenge posted by the website Quora.

The full problem is below, but to summarize, the challenge is to find the number of possible ways to lay a duct that went through every room in a grid in a timely fashion. They stated that their best solution solved the problem in under 5 seconds, but that an acceptable solution would run within 1-2 orders of magnitude. My initial brute force implementation ran in around half an hour, and my final result ran in under 2 seconds.

I submitted my solution in February and heard back in May, and by that time I had already accepted an offer elsewhere. There’s a profound commentary somewhere in there, but anyway. The purpose of this post isn’t to “name and shame” but it’s worth mentioning that knocking the socks off a code challenge wasn’t enough to get an interview (well, in a timely matter).

The quick summary is that it’s a backtracking problem, but you can prune the search tree to avoid paths that won’t find solutions and save time. I also had modest gains with other optimizations.

I first worked on this when I was in grad school and C was the main language I knew; this may be one of the last things I worked on in C besides my research project. I thought it would be fun to take a look at it with a new perspective. What did I do wrong, what did I do right, etc… Did I do something smart that I wouldn’t think of today? Did I write horrible spaghetti or follow anti-patterns?

You can see for yourself or just read below.

Problem statement

back to table of contents

I didn’t save the problem statement but found one that looks right at businessinsider.com. Frankly it’s doubtful to me that any true business insider would find this interesting, but I digress:

We have some rooms in our datacenter, and we need to connect them all with a single cooling duct.

Here are the rules:

The datacenter is represented by a 2D grid.

Rooms we own are represented by a 0.

Rooms we do not own are represented by a 1.

The duct has to start at the air intake valve, which is represented by a 2.

The duct has to end at the air conditioner, which is represented by a 3.

The duct cannot go in multiple directions out of the intake or the AC - they must be the two endpoints of the duct.

The duct must pass through each room exactly once.

The duct cannot pass through rooms we do not own.

The duct can connect between rooms horizontally or vertically but not diagonally.

Here is an example datacenter:

There are two possible ways to run the duct here:

2--0--0--0
         |
0--0--0--0
|
0--0--3  1

or

2  0--0--0
|  |     |
0  0  0--0
|  |  |
0--0  3  1

Write a program to compute the number of possible ways to run the duct. For the above example, the correct answer is 2.

Input format: Your program should read from stdin. The input will be a series of integers separated by whitespace.

The first two integers will be W and H, with width and height of the datacenter. These will be followed by W*H more integers, specifying the 2D grid representing the datacenter.

Output format: Your program should write a single integer out to stdout: the number of possible ways to run the duct.

See how fast you can make it.

Our best solution (written in C) can solve the following test case in under 5 seconds on a 2.4GHz Pentium 4, but it’s pretty good if you can get to within 1-2 orders of magnitude of that.

7 8
2 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
3 0 0 0 0 1 1

Backtracking algorithm

back to table of contents

Before I get to code review, I want to discuss algorithm.

Getting a solution to this problem is fairly simple if you’re familiar with recursive backtracking problems. Basically in Python/pseudocode:

def path_finder(grid, coordinate, num_visited, final_path_length):
    if grid[coordinate] == FINISH:
        return num_visited == final_path_length
    if grid[coordinate] != NOT_VISITED:
        return 0
    count = 0
    grid[coordinate] = VISITED
    for direction in DIRECTIONS:
        count += path_finder(
            grid,
            coordinate+direction,
            count+1,
            target_count
        ))
    grid[coordinate] = NOT_VISITED
    return count

When you visit a room, if it’s the end room, you’ve successfully found a solution if the length of your duct is equal to the total number of initially empty rooms, which is represented above as final_path_length.

Optimizations

back to table of contents

Backtracking algorithms like this operate in exponential time; basically, every recursive call performs up to four additional recursive calls, and each solution is n calls deep, where n is the number of squares initially open on the grid.

I used two basic techniques to optimize this solution:

pruning the search tree to try to eliminate paths that were not going to be successful
cheap tricks to steal clock cycles out of the critical path. Because the inner function will be called a lot, little tricks can really add up.

One thing I didn’t bother with at the time was parallelism, but as with all backtracking algorithms, you can parallelize searching multiple paths at once.

Representing the data

back to table of contents

One of the most basic things I had to choose was how to store and represent the grid internally; I used a char[][], a 2D array stored on the stack.

Each cell could be represented by the following characters:

'X' - occupied room, connected to current path
'S' - starting point (represented by '2' in the input)
'E' - ending point (represented by '3' in the input)
' ' - empty room (represented by '0' in the input)

While the above values are enough to solve the problem, I also ended up adding

'U' - room I don’t own, unknown if connected to current path

For readability, when printing the state of a grid below, I’ll also use

'C' - the current room in question

The above roughly translates to the values from the input; I couldn’t remember what '0', '1', '2', and '3' meant so I chose chars that meant something to me. If I were to redo this today, I’d probably use an enum that mapped to the input values, but to be fair to my old self, enums weren’t in K&R.

I also made a choice to create a border around the grid like so:

X X X X X X
X S _ _ _ X
X _ _ _ _ X
X _ _ E X X
X X X X X X

The main reason I did this was to avoid having to do bounds checks, but it came in handy when pruning the search tree…

Pruning the search tree

back to table of contents

In order to code the solution correctly, there are three final cases to consider:

The final square has been reached, and the duct is the correct length (all squares have been visited).
The final square has been reached but the duct is shorter than it should be (some rooms were not reached).
All surrounding squares have been visited or are invalid and the current square is not the final one.

In the first scenario, we have found a solution and increment the counter. In the other cases, we did not find a solution. In all cases, we backtrack to look for other solutions.

After thinking about the problem a bit, I recognized that there were certain patterns that, if they existed, would mean that the current path is not valid, and thus stop continuing down bad paths earlier.

The simplest way to think about it is that if the current traversal splits the search space in two non-contiguous groups, then there is no valid solution and we can backtrack.

Here’s an example:

X X X X X X
X S X _ _ X
X _ X X C X
X _ _ _ _ X
X _ _ _ E X
X X X X X X

In the above example, we can either go up or down, but either choice would mean the area we don’t choose would no longer be reachable.

Contiguity optimization

back to table of contents

I mentioned the 'U' value earlier but didn’t really describe what it does. 'U' is never needed if the start, end, and “non-owned rooms” are not on the border, but it’s possible that the input may have looked like:

This would may to an internal representation of

U U U U U U U
U _ _ _ _ _ U
U _ _ _ U _ U
U _ S _ U _ U
U _ E _ U _ U
U _ _ _ _ _ U
U U U U U U U

An assumption I made above is there’s no difference between a node we’ve visited and border/non-owned node, but that’s not quite true because it’s possible our visited nodes don’t touch those nodes, and that’s important for some of the pruning optimizations discussed later. If the border/non-owned nodes are contiguous with the start, it’s as if they’re just visited nodes, but if not, they’re given this 'U' label to ensure we handle this properly.

Then, if while traversing the grid we find a 'U', I implemented a u_to_x() function to convert all touching 'U' squares to 'X' via DFS, and an x_to_u() to undo that transformation when backtracking.

So if I moved right from the start, the representation would be:

U U U U U U U
U _ _ _ _ _ U
U _ _ _ X _ U
U _ S C X _ U
U _ E _ X _ U
U _ _ _ _ _ U
U U U U U U U

If instead I moved left, it would be:

X X X X X X X
X _ _ _ _ _ X
X _ _ _ U _ X
X C S _ U _ X
X _ E _ U _ X
X _ _ _ _ _ X
X X X X X X X

I wasn’t sure if I could assume that the start, end, and unowned rooms all touched the border, so I added a CONT_OPT (or contiguity optimization) macro to turn on this feature; I wasn’t sure if the solution was correct without it due to unclear specification, but I didn’t want to include this code unnecessarily in the final binary if it was not.

Sliding windows

back to table of contents

I found two patterns that are basically sliding windows that invalidate the current path. Both of these basically mean that I’ve split the remaining, unvisited rooms in two, non-connected groups and we can bail out now.

I’ll paraphrase the descriptions I wrote in comments in the source code:

key:
X is an occupied, ineligible space
_ is a vacant space
C is the current space
| is Do Not Care.

/* OPTIMIZE1
This optimization checks to see if the last move created two non-contiguous
regions.  The following scenarios should lead to losses and should return
immediately (further branches are not necessary):

|X|  |_|
_C_  XCX
|X|  |_|
*/

/* OPTIMIZE2
This optimization checks to see if the last move created two non-contiguous
regions.  This checks if the boundary created is diagonal.  The following
scenarios should lead to losses and should return immediately (further
branches are not necessary):

_X  X_  _C  C_
C_  _C  X_  _X

*/

Hopefully I’ve made it clear why the above optimizations work, but there’s another important detail: actually performing these checks are expensive. For the first optimization we’re doing up to four reads and comparisons for each visited cell, and in the second, we’re doing up to three.

With a backtracking algorithm, you generally want to do as little as possible in the inner loop, as we may run that loop billions of times. However, if doing more work reduces the number of paths you have to try by billions, it may be worth it.

I added macros to control whether or not to use these optimizations were used and tested with and without. It turned out these patterns pulled their weight, while others I tried did not.

Tricks are for kids

back to table of contents

There were other little things I tried to boost performance. Little things like checking the value before the recursive DFS call to see if the space was already visited, rather than doing it immediately after performing the DFS call, turned out to have a noticeable performance impact, because functions calls are not free. They may be cheap, but again, in the inner loop of a backtracking algorithm it matters.

I tweaked the order that I called the optimizations which seemed to make a difference as well.

I’m not going to recreate these experiments today, but I remember trying really hard to squeeze all the performance I could out of it. Basically the loop is try something, measure, repeat ad nauseam.

What I would do differently today

back to table of contents

Ok, a silly answer is I wouldn’t use tabs. It’s surprising to me that I used to use tabs instead of spaces, but I guess people change. It’s funny to me that I don’t even remember when I made the switch; maybe when I figured out how to configure that in vim?

I used a lot of C preprocessor macros. The fastest code is code that isn’t there, and I probably did this with optimization in mind, but in retrospect it just seems kind of nuts and makes the code hard to read.

I implemented (well… copy/pasted) a timer in C code; I almost certainly would just use the shell builtin time instead today. I was in the habit of using timers in grad school since I was usually only interested in the performance of particular snippets and would have ignored I/O and whatnot, but in this case I/O is negligible.

I didn’t use source control at the time; in fact, I was only able to find my solution by searching my old e-mails for the tarball I submitted. Using git with good commits would make it easier for me to understand exactly what I tried if I were to look back at my old code decades later.

I almost certainly would not use C if I were given this task today. I would probably use C++ if I had to do something like this that was performance critical for work, or Rust if I were doing this as a side project at home (I’m very much a n00b at Rust but like what I’ve seen so far!).

I mentioned earlier that this was a code assignment whose submission was ignored for months; this has actually happened to me a couple of times since then. In 2012 though I as looking to break into the industry and was desperate for a real lead. It’s not that I would never do a code challenge again, but I am more aware today of the risks than I was then.

I actually did parallelize my solution with pthreads at the time, but didn’t get good performance. I don’t have access to the code or remember what I did; while copying the grid is potentially expensive, traversing different search trees should otherwise be embarassingly parallel.