I have created a small Python script that generates test sets for my project.

The script generates two datasets with the same dimensions `n*m`. One contains 0/1 binary values and the other contains floats.

```
import random

import numpy as np
import pandas as pd

# get_10_20 / get_20_30 are defined elsewhere in the script;
# each returns a random float in the named range.

# Probabilities must sum to 1
AMOUNT1 = {0.6: get_10_20,
           0.4: get_20_30}
AMOUNT2 = {0.4: get_10_20,
           0.6: get_20_30}
OUTCOMES = (AMOUNT1, AMOUNT2)

def pick_random(prob_dict):
    '''
    Given a probability dictionary, with the keys being the probabilities,
    pick one of the value functions at random and return its result.
    '''
    r, s = random.random(), 0
    for num in prob_dict:
        s += num
        if s >= r:
            return prob_dict[num]()

def compute_trade_amount(action):
    '''
    Select an amount generator with a probability depending on the action.
    '''
    return pick_random(OUTCOMES[action])

ACTIONS = pd.DataFrame(np.random.randint(2, size=(n, m)))
AMOUNTS = ACTIONS.applymap(compute_trade_amount)
```

The script runs correctly and generates the output that I need, but if I want to scale to many dimensions, the `for` loop in `pick_random()` slows down my computation time.

How can I get rid of it? Maybe with some vectorized operation over the table using `numpy`?

What throws off my reasoning is the `if` statement, because the sampling has to happen with a given probability.
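For context, here is the kind of vectorized sketch I have been experimenting with. The names (`probs`, `lows`, `range_idx`) are mine, and it assumes `get_10_20`/`get_20_30` draw uniformly from `[10, 20)` and `[20, 30)`:

```python
import numpy as np

n, m = 4, 5
rng = np.random.default_rng(0)

# 0/1 action matrix, as in ACTIONS above.
actions = rng.integers(2, size=(n, m))

# Row per action: P(range 10-20), P(range 20-30).
probs = np.array([[0.6, 0.4],   # action 0 (AMOUNT1)
                  [0.4, 0.6]])  # action 1 (AMOUNT2)
lows = np.array([10.0, 20.0])   # lower bound of each range

# Pick a range per cell by comparing one uniform draw against the
# cumulative probability of the first range (replaces the if-stmt).
r = rng.random((n, m))
range_idx = (r >= probs[actions, 0]).astype(int)

# Draw the amount uniformly inside the chosen 10-wide range
# (assumes the get_* helpers are plain uniform draws).
amounts = lows[range_idx] + 10.0 * rng.random((n, m))
```

This replaces both the loop and the branch with array comparisons, but I am not sure it is the idiomatic way to do it.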