Hacking a ProgComp with Side Channels

As of this writing, I have the best score on this Game of Codes problem by a factor of 10. I cheated to get it. Here’s how.

Game of Codes is a programming competition that follows the typical progcomp format. There are 10 problems, each with two or three test cases. When you submit your code, it is fed inputs via stdin, and writes solutions to stdout. Each correct answer earns you a point; the leaderboard ranks the submitted programs firstly by points, and secondly by the lowest elapsed time for your program to complete the test cases.

That last bit is important, because it means that the programmer must not be allowed to see the test cases. If they could, it would be trivial to hard-code the correct answer for each test input. This isn’t a problem in progcomps like Advent of Code, because they don’t score you based on execution time, only submission time. But when speed matters, gaining knowledge of the test inputs allows you to write the theoretically-optimal program for a given problem: read input, consult answer table, write output. Two syscalls and a few microseconds of array lookups.

Now, Game of Codes tries very hard to hide the test inputs. They don’t allow you to see the output of your program; only whether it succeeded or failed. This prevents you from simply printing the test inputs to stdout. Furthermore, your program runs in a restricted environment: the filesystem is read-only, and network calls are blocked. So you can’t sneak the inputs out via HTTP request or other network-based means.

But one hole is left unplugged: the leaderboard lists the execution time of each submission. And that’s a side channel. Specifically, we can use a timing attack to recover information about the test inputs, using only the recorded execution time of the program. This is the attack I used to achieve a (near-) optimal time on the “Invasion” problem linked above. But first, to understand the method, let’s look at a simpler problem.

Islands

Here’s the problem description for “Islands,” a program that is easy prey for the timing attack. When conducting the attack, we want to minimize the amount of information that we have to transmit. Notice that although the example input is fairly verbose, the example output is a small set of small numbers. So in this case we will leak the output rather than the input. The distinction doesn’t matter much; either way, we’ll be able to construct our optimal solution.

First, we need to solve the problem, since the Game of Codes server only tells you the execution time of successful tests. Once we have a working program, we can establish baseline timings for each test case. My program ran each test case runs in about 30ms. We then introduce our leak code by adding a sleep call. We’ll sleep for n seconds, where n is the value we want to leak, starting with the first output:

func main() {
    // ... solution code goes here ...
    
    time.Sleep(time.Second * outputs[0])
}

When we push this code to the Game of Codes server, we get a response that looks like this (actual values changed):

Compiling...
> go build -o run
Compilation success in 267.644988ms
Running examples...
Example 1 success in 2.032274213s
Running tests...
Test 1 success in 4.045206722s
Test 2 success in 7.037433879s
1 examples succeeded.
2 tests succeeded in 14.082640601s.

Boom, we just leaked our first outputs! Test 1 ran in 4 seconds, so its first output is 4; likewise, the first output of Test 2 is 7. The example ran in 2 seconds, so the first output should be a 2 – and we can confirm from the problem description that it is.

Now it’s just a matter of repeating this process for output[1], output[2], and so on. We’ll know when we’ve found all of them when the tests start failing – that means the program crashed due to accessing an index of outputs that doesn’t exist. We can also check our progress as we go by overwriting the contents of outputs with our leaked values before writing them to stdout. If our leaked values are incorrect, overwriting the actual values will cause the test to fail.

Once we know all the outputs, we need to tackle another problem: matching each input with its output. In other words, we need a way for our program to identify which input it’s processing. Unsurprisingly, we can leak this information the same way as before. In the case of “Islands,” the first number of the input is sufficient to distinguish between the test cases. But since the input numbers are large, we can’t multiply them by a full second like we did for the outputs – the test would take too long to run. Instead, we’ll sleep for inputs[0] * time.Millisecond. But when we do this, we run into a new complication: variance. If the execution time varies by more than 1ms compared to our baseline, then our leaked value will be wrong. It could take quite a few tries to recover the correct input.

Fortunately, we can improve upon this by using a two-stage approach. First, we’ll use a 1ms multiplier to get an “initial guess” for the input value. Then we’ll run the program again, but this time subtract our guess from the input, and multiply the remainder by 1s. For example, let’s say our baseline execution time is 26ms. On our first run, we multiply by 1ms, and the execution time is 854ms. That gives us an initial guess of 828. On the second run, we subtract 828 from input and multiply by 1s, resulting in an execution time of 4.024s. This tells us (with high confidence) that the actual value is 828 + 4 = 832.

Once we’ve matched each test input to its outputs, we can write the “optimal program” (actual values changed):

func main() {
    // read 4 bytes to determine which test we're running
    fst := make([]byte, 4)
    os.Stdin.Read(fst[:])

    switch string(fst) {
    case "3000": // Example
        os.Stdout.WriteString("2\n1\n0\n4\n3\n5\n")
    case "1234": // Test 1
        os.Stdout.WriteString("1\n2\n3\n4\n5\n6\n7\n")
    case "5678 ": // Test2
        os.Stdout.WriteString("1\n2\n3\n4\n5\n6\n7\n")
    }
}

This performs quite well – but surprisingly, it isn’t fast enough to top the scoreboard. I’m not sure why this is, but somehow the top program runs 5ms faster than this code. Fortunately, the same is not true of…

Invasion

This problem presents some new challenges, but I knew that I could get the top leaderboard position if I solved it. When I started, the fastest program ran in about 500ms. The nice thing about these optimal programs is that their speed is independent of the computational complexity of the actual problem; the code is always just a lookup table. So I expected that I could submit a solution that ran approximately as fast as my Islands solution: 30ms.

The problem description indicates that this problem boils down to computing a bunch of hashes and subsequently matching messages to hashes in order to reconstruct the original “message chain.” I wrote a straightforward (unoptimized) solution, which ran in about 25ms for the first two test inputs, and 1.5s for the third. Then I set about recovering the outputs.

The expected output for this problem is a bunch of concatenated message strings. Recovering all that string data would be tedious, but fortunately there’s an easy way to compress it. Since each message comes from an element in the input array, we can simply leak the index of each message, rather than the messages themselves. To check whether this was reasonable, the first thing I leaked was the number of messages in each test output. The first test had 4 messages; the second had 19; and the third had 501. I could recover the first two manually, but no way was I going to leak 501 values by hand. To compound the problem, the values I intended to leak were large, which meant that there’s no way I could multiply them by 1s; I would need to use the guess-and-check strategy mentioned earlier for every value. This called for some automation.

I wrote a small Go program that repeatedly generated new code, pushed it to the Game of Codes server, parsed the response, and used the timing information to generate the next iteration. The generated code included a check so that it would fail if it guessed the wrong value; the observer could then notice this failure, revert the last commit, and attempt to guess the value again. This meant I could mostly let the program run unsupervised.

But when I fired up the program, it quickly became apparent that the frequency of these “retries” was unacceptably high. My baseline execution time of 1.5s had significant variance, causing lots of incorrect guesses even when using the two-phase guess-and-check approach. Tests also took a long time to run – sometimes as long as 50 seconds – meaning that a wrong guess could cause a delay of over 2 minutes. I needed to refine my approach.

After a lot of head-scratching, I had an epiphany that in retrospect seemed embarrassingly obvious. By padding the baseline execution time, I could reduce the variance, allowing me to decrease the sleep multiplier. In code:

func main() {
    start := time.Now()
    
    // ... solution code goes here ...
    
    time.Sleep(5 * time.Second - time.Since(start))
    time.Sleep(sleepMultiplier * leakedValue)
}

I measured, and indeed, this approach clamped the variance to a few milliseconds. As a result, I was able to drop the sleep multiplier to 10ms and do away with guess-and-check; most of the time, the first guess was accurate enough that a second pass wasn’t needed. I gleefully deployed this new code and watched it chug away. This was a slow process, since the Game of Codes server was rate-limited to enforce a 10s delay between each push. But at 15s per value, it only took two hours to calculate all 501.

Once all the values were calculated, I constructed a near-optimal solution for the problem (actual values changed):

var Example = []int{2, 1, 3}
var Test1 = []int{1, 2, 3, 4}
var Test2 = []int{1, 2, 3, ...} // 16 values omitted
var Test3 = []int{1, 2, 3, ...} // 498 values omitted

func main() {
    var js [][2]string
    json.NewDecoder(os.Stdin).Decode(&js)

    var indices []int
    switch len(js) {
    case 6:
        indices = Example
    case 1:
        indices = Test1
    case 2:
        indices = Test2
    case 3:
        indices = Test3
    }
    messages := make([]string, len(indices))
    for i := range indices {
        messages[i] = js[indices[i]][1]
    }
    os.Stdout.WriteString(strings.Join(messages, " "))
}

Fingers crossed, push it to the server…

> go build -o run
Compilation success in 210.389187ms
Running examples...
Example 1 success in 15.855545ms
Running tests...
Test 1 success in 14.883978ms
Test 2 success in 15.048268ms
Test 3 success in 16.664773ms
1 examples succeeded.
3 tests succeeded in 46.597019ms.

SUCCESS! We lost some time here due to slow JSON parsing, but this solution is still 10x faster than anyone else’s!

Conclusion

Pulling off this hack was pretty fun. People seem to be split on whether it counts as cheating or not. It’s true that I only used publically-available information – I didn’t exploit a buffer overflow in the server code or anything like that – but this is obviously not how the problems were intended to be solved. I feel a bit guilty about stealing the top leaderboard position from someone who spent a lot of time optimizing their code. But hey, I invested a fair amount of time in my approach too, so… it’s a wash. For what it’s worth, I don’t plan to execute this attack on any other Game of Codes problems, and I discourage anyone else from doing so. I hope the Game of Codes team found my antics amusing (rather than threatening), and I hope this post piqued your interest in timing attacks and other side channels.


Update: The Game of Codes team removed my submission from the leaderboard, but it is still viewable here. They also posted an announcement clarifying that hard-coded solutions will not be eligible to win prizes. I support their decision.