Show HN: Betting game puzzle (Hamming neighbor sum in linear time)
In Spain, there's a betting game called La Quiniela: https://es.wikipedia.org/wiki/La_Quiniela_(Espa%C3%B1a)
Players predict the outcome of 14 football matches (home win, draw, away win). You win money if you get at least 10 correct, and the prize amount depends on the number of winners. Since all bets are public, the number of winners and the corresponding payouts can be estimated for each of the 3^14 possible outcomes. We can also estimate their probabilities using bookmaker odds, allowing us to compute the expected value for each prediction.
As a side project, I wanted to analyze this, but ran into a computational bottleneck: to evaluate a prediction, I had to sum the values of all its Hamming neighbors up to distance 4. That’s nearly 20,000 neighbors per prediction (1+28+364+2912+16016=19321):
S_naive = sum_{k=0}^{r} C(d,k) * (q-1)^k = 19,321 (d=14, q=3, r=4)
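For reference, here is a minimal sketch of that naive evaluation (the names V and pred are illustrative, not from my actual code); it visits all 19,321 neighbors of one prediction:

    import numpy as np
    from itertools import combinations, product

    def naive_neighbor_sum(V, pred, d=14, q=3, r=4):
        """Sum V over every prediction within Hamming distance r of pred.

        V is a flat array of length q**d indexed by the base-q encoding
        of a prediction; pred holds d values in 0..q-1."""
        powers = q ** np.arange(d)
        total = 0.0
        for k in range(r + 1):
            for dims in combinations(range(d), k):            # which matches differ
                for diffs in product(range(1, q), repeat=k):  # how each differs
                    cand = pred.copy()
                    for i, s in zip(dims, diffs):
                        cand[i] = (pred[i] + s) % q
                    total += V[int(cand @ powers)]
        return total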
This took days to run in my first implementation. Optimizing and doing it with matrices brought it down to 20 minutes, which was still too slow (I'm running it in Google Apps Script, which has a 6-minute execution limit). For a while, I used a heuristic (sketched below): start from a random prediction, check its 28 nearest neighbors, move to the highest-value one, and repeat until no improvement is possible within distance 3. It worked surprisingly well.
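A minimal sketch of that hill climb (distance-1 moves only; the distance-3 stopping check is omitted, and V is the same illustrative value array as above):

    import numpy as np

    def hill_climb(V, d=14, q=3, seed=None):
        """Greedy local search: jump to the best of the d*(q-1) = 28
        distance-1 neighbors until none of them improves the value."""
        rng = np.random.default_rng(seed)
        powers = q ** np.arange(d)
        pred = rng.integers(0, q, size=d)      # random starting prediction
        best = V[int(pred @ powers)]
        while True:
            best_cand, best_val = None, best
            for i in range(d):                 # each match
                for s in range(1, q):          # each alternative result
                    cand = pred.copy()
                    cand[i] = (pred[i] + s) % q
                    val = V[int(cand @ powers)]
                    if val > best_val:
                        best_cand, best_val = cand, val
            if best_cand is None:              # local optimum reached
                return pred, best
            pred, best = best_cand, best_val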
But I kept thinking about how to solve the problem properly. Eventually, I realized that partial sums could be accumulated efficiently by exploiting overlaps: if two predictions A and B share neighbors, the sum over those shared neighbors can be computed once and reused. This works through a basic transformation that realigns the matrix by applying an offset along dimension i; I implemented it with reshape, roll, and flatten (probably not the most efficient implementation, but the clearest). The transformation has two key properties (see the sketch after this list) that enable reducing the number of summations from 19,321 to just 101:
- T(T(space, d1), d2) = T(T(space, d2), d1)
- T(space1, d) + T(space2, d) = T(space1+space2, d)
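To make that concrete, here is a minimal sketch of the realignment (the name T is illustrative); both properties are easy to check numerically:

    import numpy as np

    def T(space, dim, offset=1, d=14, q=3):
        """Realign a flat array of length q**d by shifting it `offset`
        steps along dimension `dim` of the implicit d-dimensional cube."""
        cube = space.reshape((q,) * d)         # view as a q x q x ... x q cube
        return np.roll(cube, offset, axis=dim).ravel()

    a, b = np.random.rand(3**14), np.random.rand(3**14)
    assert np.allclose(T(T(a, 0), 5), T(T(a, 5), 0))    # transformations commute
    assert np.allclose(T(a, 2) + T(b, 2), T(a + b, 2))  # and are linear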
With the transformation, the number of sums drops to:
S_PSA = 1 + (d - (r-1)/2) * r * (q-1) = 101 (d=14, q=3, r=4)
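Both counts are easy to verify:

    from math import comb

    d, q, r = 14, 3, 4
    S_naive = sum(comb(d, k) * (q - 1) ** k for k in range(r + 1))
    S_PSA = 1 + (d - (r - 1) / 2) * r * (q - 1)
    print(S_naive, S_PSA)  # 19321 101.0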
I've generalized the algorithm to any number of dimensions, elements per dimension, and summation radius. The implementation is in pure NumPy. I've uploaded the code to Colab and GitHub, and posted an explanation on my blog (links below).
Apparently, this falls under Hamming neighbor summation, but I haven't found similar approaches elsewhere (maybe I'm searching poorly). If you know of, or have worked on, something similar, I'd love to hear your thoughts!
colab: https://colab.research.google.com/drive/1aENKd7eemGqmjdB8Y6y...
github: https://github.com/petopello/PSA
blog: https://sudapollismo.substack.com/p/partial-sum-accumulation...
Sir, this is a Wendy's ;P
It might be worth posting this on https://math.stackexchange.com/
Excuse me sir, to me all topics here feel like a Wendy's, so I thought this was the right place. Thanks for the suggestion, I'll post it there too!
It is definitely the right place!
> Optimizing and doing it with matrices brought it down to 20 minutes
What was the final run time?
11 seconds in Colab, implementing the basic transformation with reshape, roll, and flatten.
4 seconds in Google Apps Script, by handling indices in loops instead of rearranging elements (rough sketch below).
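In NumPy terms, the index-handling idea is roughly this (a sketch; the GAS code does the equivalent with plain loops, and T_indexed is an illustrative name):

    import numpy as np

    def T_indexed(space, dim, offset=1, d=14, q=3):
        """Same shift as the reshape/roll/flatten version, computed
        with index arithmetic on the flat array instead of rearranging."""
        stride = q ** (d - 1 - dim)    # C-order stride of axis `dim`
        n = np.arange(space.size)
        digit = (n // stride) % q      # base-q digit of each index at `dim`
        src = n + (((digit - offset) % q) - digit) * stride
        return space[src]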
Are you computing a single expectation value, or are you searching for the bet of maximum expected value among all possible bets?
How long does it take you to compute a single expectation value?
I have written some single threaded non-simd c++ code for this : https://gist.github.com/unrealwill/d1bc68d1f5c7ee6fe72b76dc5...
When there are 14 matches, there are 3^14 = 4782969 different possible results and the same number of different possible bets.
Summing a single expectation naively takes 0.1 s. Summing over the Hamming neighborhood to compute a single expectation takes 7.6e-5 s.
Computing all expectation values and taking the maximum takes 153 s (with g++-9, but 390 s with g++-10; I don't know why).
I am not sure whether I used the same trick as you. Here are the tricks I use: when the probabilities sum to one, the number of iterations needed to compute a single expectation does not depend on the number of possible results per match, because the sum can be factored by grouping all the non-matching results for a single match together and applying the weight 1 - p.
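In Python terms, the idea is something like this sketch (the real code is the C++ in the gist; p[i] is the probability that pick i is right, assuming independent matches):

    import numpy as np

    def correct_count_distribution(p):
        """dist[k] = probability of getting exactly k picks right,
        grouping all wrong outcomes of match i into one weight 1 - p[i]."""
        dist = np.array([1.0])
        for pi in p:
            # One more match: either wrong (weight 1 - pi) or right (weight pi).
            dist = np.convolve(dist, [1.0 - pi, pi])
        return dist

That is 14 small convolutions instead of a sum over 3^14 results; the expectation then only needs dist weighted by the payout for each number of correct picks.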
Some other tricks used: representing the result of the 14 matches as a single int32_t and working in base 3. It could probably be even faster if you worked in base 4 instead and replaced integer division with bit shifts.
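The base-3 packing looks like this (a Python sketch of the idea):

    def encode(results, q=3):
        """Pack d match results (each in 0..q-1) into one integer."""
        code = 0
        for r in reversed(results):
            code = code * q + r
        return code

    def decode(code, d=14, q=3):
        """Unpack the integer back into a list of d results."""
        results = []
        for _ in range(d):
            code, r = divmod(code, q)   # in base 4: a shift and a mask
            results.append(r)
        return results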
Iterating, and in particular iterating over combinations, is excruciatingly slow in Python even when using itertools. So do yourself a favor and write C++ code for this kind of thing, so that you don't have memory allocations inside loops.
My algorithm reduces the number of summations by computing all expected values at once, so I can't compute just a single value individually. For a single value, I think the naive approach is the only option. In my first naive implementation, computing 5,000–10,000 values took around 5 minutes, and I didn’t improve it much beyond that.
Once all values are computed, one approach is to search for the highest expected value, but that's not the only criterion to manage: (1) there's a huge variance issue (you can win a lot, but with very low probability); (2) if you're placing multiple bets, overlap reduces their combined value.
Great work on your approach! I'll try to understand the code you linked, but I suspect it's not doing exactly the same thing (or only for a very specific case). In my code there's a function called SHN_boxes (which takes ~1 s in Colab for this problem), but it's a shortcut that applies only in some cases (not this one). Did you use a similar approach?