Weighty Choices

I was tasked at work last week with coming up with a dataset of more than 2 million records for some load tests we’re running on our application. I had a day’s worth of production data that I needed to extrapolate to six months. Another opportunity to use my favorite Python module: random.

What I wanted was something like the choice function, but one to which I could pass a dictionary with the keys representing the choice list and the values representing their relative weights so that I could get a representative distribution. Since the random module doesn’t offer one itself, I was left to my own devices, which are admittedly clumsy and slow. The situation begged for a stupid lambda trick, but I was pressed for time and so I just threw this together:

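import random

def weighted_choice(ChoiceDict):
    # Keys are the choices, values are their relative weights.
    wsum = sum(ChoiceDict.values())
    n = random.uniform(0, wsum)
    for k in ChoiceDict:
        # Stop at the key whose share of the total weight contains n.
        if n < ChoiceDict[k]:
            break
        n = n - ChoiceDict[k]
    return k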
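
A quick way to see whether the function really gives a representative distribution is to draw a large number of samples and tally them. This is only a minimal sketch: the weights dictionary, the seed, and the sample count are made-up values for illustration, not the production data.

```python
import random
from collections import Counter

def weighted_choice(ChoiceDict):
    # Same approach as above: walk the keys until n falls inside a key's share.
    wsum = sum(ChoiceDict.values())
    n = random.uniform(0, wsum)
    for k in ChoiceDict:
        if n < ChoiceDict[k]:
            break
        n = n - ChoiceDict[k]
    return k

# Hypothetical weights, chosen only for illustration.
weights = {"GET": 70, "POST": 25, "DELETE": 5}

random.seed(42)  # fixed seed so reruns are repeatable
counts = Counter(weighted_choice(weights) for _ in range(100000))
total = sum(counts.values())

# Print each key's observed share next to nothing else; compare by eye
# against weight / sum(weights).
for key in weights:
    print(key, round(counts[key] / float(total), 3))
```

The observed shares should land close to 0.70, 0.25, and 0.05, with small run-to-run variation if the seed is changed.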

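I then passed the lambda challenge off to a couple of my colleagues, and later in the day I had a chance to rattle off my own. It builds a list in which each key is repeated as many times as its weight, so it needs whole-number weights: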

from functools import reduce  # reduce is a builtin on Python 2; Python 3 needs this import
weighted_choice = lambda d: random.choice(reduce(list.__add__, [[k for n in range(d[k])] for k in d]))

But the Letterman spot goes to my colleague Leonard, for the conciseness and nice symmetry of his solution:

weighted_choice = lambda d: random.choice([k_ for s in [[k]*w for k,w in d.items()] for k_ in s])
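
Both lambdas expand the dictionary into a list with one entry per unit of weight, so they only work with integer weights and their memory use grows with the total weight. For comparison, here is a sketch that avoids the expanded list, assuming Python 3.6 or later, where the standard library's random.choices accepts a weights argument. The helper name weighted_choice_stdlib and the sample weights are hypothetical, used only for illustration.

```python
import random

def weighted_choice_stdlib(d):
    # Hypothetical helper name; requires Python 3.6+ for random.choices.
    keys = list(d)
    # random.choices returns a list of k picks; ask for one and unwrap it.
    return random.choices(keys, weights=[d[k] for k in keys], k=1)[0]

# Illustrative weights; fractional values are fine with this approach.
weights = {"a": 0.5, "b": 1.5, "c": 8.0}
picks = [weighted_choice_stdlib(weights) for _ in range(10000)]
print({k: picks.count(k) for k in weights})
```

With these weights, "c" should be picked roughly 80% of the time, "b" about 15%, and "a" about 5%.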

Find a script testing these and a couple of other variations on the function here:


Weighty Choices
