
Stochastic Dynamic Programming: An Analysis of the St. Petersburg Paradox
Professor Kevin Reffett
Mathematical Economics
April 29th, 2021
A casino offers a new game of chance. The concept is simple: a fair coin is flipped repeatedly, and each time it lands on heads the pot doubles. The first time it lands on tails, the game ends and the player collects whatever is in the pot. The pot starts at $2, so the player wins $2 if tails appears on the first toss, $4 if tails first appears on the second toss, and $8 if tails first appears on the third toss. In general, the player wins 2^k dollars, where k is the toss on which the first tails appears. The expected value of the game appears to be infinite, and yet what would be a fair price for the casino to charge to enter the game?
The expected value is found by multiplying each payout by its probability:
E = (1/2)*2 + (1/4)*4 + (1/8)*8 + ...
E = 1 + 1 + 1 + ...
E = ∞
The sum grows without bound, so the expected value to the player is an infinite amount of money. This assumes the casino has unlimited funds, so that its bankroll is not a constraint.
The paradox is this: with an infinite expected value, a player should be willing to enter the game at any price. However, since the probability is halved with every consecutive head, the chance of recovering a large entry fee shrinks rapidly. The paradox lies in the gap between the limited amount people are actually willing to pay to enter the game and its infinite expected value.
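To make the tension concrete, here is a minimal simulation sketch of the game as described above (a sketch of my own, using the payout convention of $2^k when the first tails lands on toss k; the seed and sample sizes are arbitrary). The running average never settles, because the expectation it is chasing is infinite.

import random

def play_once():
    # Flip until the first tails; the payout is 2^k dollars, where k is the toss of that first tails.
    k = 1
    while random.random() < 0.5:  # heads with probability 1/2, so keep flipping
        k += 1
    return 2 ** k

random.seed(0)
for n in (10**3, 10**5, 10**6):
    average = sum(play_once() for _ in range(n)) / n
    print(f"{n:>9,} games: average payout ~ ${average:,.2f}")

Every time the sample grows, some rare long run eventually drags the average upward again; roughly speaking, the average over n games grows like log2(n) rather than converging to a limit.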
Before moving on to the solutions that have been proposed, let us observe some basics of the probabilities involved. Since the reward doubles with every head, it does not take long for the prize to become enormous. A game that ends on the eighth toss pays $256; a game that ends on the sixteenth toss pays $65,536; and a game that ends on the thirty-second toss pays about $4.29 billion. The probability of reaching that thirty-second toss is correspondingly small, roughly 2.3e-10, or about 1 in 4.3 billion.
It is interesting to ask how often probabilities this small actually materialize. Yes, about once every 4.3 billion attempts one should see such a run, but how often do we witness events this rare in the real world? According to Thomas Cover, the probability of being struck by lightning is about 1 in a million, and the probability of winning the lottery about 1 in 100 million. On that basis alone, a 1-in-4-billion event should be extraordinarily rare. And yet rarer events do occur in games of chance. In 2009, a craps player managed to avoid rolling a 7 (which would have ended the run) for 154 consecutive rolls, an event with probability of roughly 1 in 1.56 trillion.[1] Translating this to our paradox, the game would have to end on its 40th toss, an event with probability 1 in 2^40, roughly 1 in 1.1 trillion, to be comparably unlikely. And despite these mounting odds, each individual flip still has a probability of 1/2.
One must be careful of the gambler's fallacy when thinking about such runs. The gambler's fallacy is the belief that if a particular event has occurred more frequently than expected in the past, it is less likely to occur in the future. It is curious that the world records for simple coin flipping are comparatively modest, with the record cited here being 8 consecutive heads, set in 2003.[2] There are two possible explanations. The most obvious is that far more people play craps, and with enough attempts it becomes near certain that someone will eventually produce an extraordinarily improbable run. The other possibility is that the nature of compound probabilities is fundamentally misunderstood, and that a single flip with probability 1/2 is not the same thing as a longer game whose odds happen to reduce to 12/24. More likely, we simply do not see extremely low-probability events because of how little time we dedicate to producing them.
We know of roughly 80 elements that have stable isotopes, and of primordial radioactive isotopes whose half-lives are comparable to the age of the Earth. If humanity dedicated all of its effort to observing the decay of a super-stable element, we might register only one detection every million years or so. The fundamental point is this: as long as a probability is verifiably small but nonzero, the event can still happen. Since the expected value of this game is infinite, a long enough sequence of plays would eventually produce an outcome greater than all the wealth on Earth. And since the probability of such a run cannot be treated as exactly zero, the possibility of it occurring is a genuine concern.
One proposed solution is expected utility theory. This approach was first put forward by Daniel Bernoulli, whose name is most closely associated with the paradox (the problem itself was originally posed by his cousin Nicolas Bernoulli). Bernoulli states, "One asks the reason for the difference between the mathematical calculation and the common value. I believe that it comes from this that the mathematicians value money in proportion to its quantity and men of good sense in proportion to the usage that they may make of it."[3] In other words, the value of a prize is not its price but the utility it holds for the individual.
Bernoulli proposes instead that utility should follow a logarithmic function, U(w) = ln(w), where w is the individual's total wealth. This captures the diminishing marginal utility of money. Under the expected utility hypothesis, each possible outcome's change in log utility is weighted by the probability of that outcome occurring. If c is the cost to enter the game and w the player's initial wealth, the expected change in utility converges to the finite sum

ΔE(U) = Σ (k=1 to ∞) (1/2^k) * [ ln(w + 2^k − c) − ln(w) ] < ∞.
This formula links one's wealth to the maximum amount one would pay while the expected change in utility remains positive. Under it, a millionaire would be willing to pay up to $20.88 per game, a person with $1,000 would pay up to $10.95, and, curiously, a person with only $2 should borrow $1.35 and pay $3.35.
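The rule is easy to check numerically. The sketch below is my own (the function names, the bisection search, and the truncation of the sum at k = 60 are conveniences, not part of Bernoulli's argument); it solves for the largest entry fee at which the expected change in log utility is still non-negative, reproducing the figures above.

import math

def delta_expected_utility(w, c, kmax=60):
    # Expected change in ln-utility from paying c to play, starting from wealth w.
    # The payout is 2^k with probability 1/2^k; terms beyond kmax are negligible.
    return sum((0.5 ** k) * (math.log(w + 2 ** k - c) - math.log(w))
               for k in range(1, kmax + 1))

def max_fair_price(w):
    # Largest entry fee c at which the expected utility change is still non-negative (bisection).
    lo, hi = 0.0, float(w)
    for _ in range(100):
        mid = (lo + hi) / 2
        if delta_expected_utility(w, mid) >= 0:
            lo = mid
        else:
            hi = mid
    return lo

for wealth in (1_000_000, 1_000):
    print(f"wealth ${wealth:>9,}: maximum fair price ~ ${max_fair_price(wealth):.2f}")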
Cramer, in response to Bernoulli, attempts to tame the infinite expected value with two main ideas. First, he writes, "If the side of [tails] falls only very late, the 100th or 1000th toss...as a sensible man...[this] does not make more pleasure for me...[than] if it would be only 10 or 20 million coins."[3] He proposes that any sum beyond 2^24 coins is worth to him no more than 2^24 coins, since the added utility in moving from 2^24 to 2^25 is negligible. Under Cramer's own convention, in which the first toss pays one coin (so the prize is 2^(k−1) if tails first appears on toss k), the capped expectation works out to

E = Σ (k=1 to 24) (1/2^k)*2^(k−1) + Σ (k=25 to ∞) (1/2^k)*2^24 = 12 + 1 = 13.
Under this assumption, the expectation is reduced to 13 coins, which is far more sensible than an infinite expected outcome. To reiterate the basic point: winning 100 million dollars is not ten times more pleasurable than winning 10 million; there is a decreasing marginal utility of money.
The second idea Cramer proposes is that the moral value of goods can be equated to a square root function, so the moral expectation of the game would be

E_moral = (1/2)*√1 + (1/4)*√2 + (1/8)*√4 + ... = Σ (k=1 to ∞) (1/2^k)*√(2^(k−1)) = 1/(2 − √2) ≈ 1.71.
However, this number is not itself the fair price: the equivalence must be such that the sadness of losing the entry fee equals the expectation of the pleasure of winning. By that assumption, the fair cost C must satisfy

√C = Σ (k=1 to ∞) (1/2^k)*√(2^(k−1)), so C = (1/(2 − √2))^2 ≈ 2.9 coins.
However, this new equivalence is significantly understated for several reasons. First, it does not consider the total wealth of the individual; it treats the moral value of gain purely as a function of the prize from the coin flips. Additionally, Cramer himself states that he believes the true value must be nearer to this figure than to 13.[3] But the entire point Cramer was trying to make is that it can be shown to a sensible man that the value of the game is finite; whether the estimate is closer to one approximation or the other is a separate concern, granting that his assumptions can be made at all.
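Both of Cramer's proposals are easy to check numerically. The sketch below is mine (the function names are arbitrary, and the sums are truncated at k = 200, far past where the terms matter); it reproduces the capped expectation of about 13 coins and the square-root price of about 2.9 coins.

import math

def capped_expectation(cap_exponent=24, kmax=200):
    # Cramer's first idea: any prize above 2^cap_exponent is felt as worth only 2^cap_exponent.
    # Payout convention is Cramer's: the prize is 2^(k-1) coins if tails first lands on toss k.
    return sum((0.5 ** k) * min(2.0 ** (k - 1), 2.0 ** cap_exponent)
               for k in range(1, kmax + 1))

def sqrt_equivalent(kmax=200):
    # Cramer's second idea: the certain sum whose square-root "moral value"
    # equals the expected moral value of the game.
    moral = sum((0.5 ** k) * math.sqrt(2.0 ** (k - 1)) for k in range(1, kmax + 1))
    return moral ** 2

print(f"capped expectation  ~ {capped_expectation():.2f} coins")   # about 13
print(f"square-root pricing ~ {sqrt_equivalent():.2f} coins")      # about 2.9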
While Daniel Bernoulli's expected utility argument is the best-known response, his cousin Nicolas Bernoulli, who originally posed the problem, offered his own resolution involving probability weighting. Nicolas grants that Cramer's approach is sufficient to show that the expected value need not be infinite, but argues that the proof fails to demonstrate the true reason for the gap between the mathematical expectation and the common estimate. He conjectures instead that people simply neglect very unlikely events, and it is only the most unlikely events that yield the enormous prizes driving the expectation to infinity.
Nicolas Bernoulli's logic is fairly simple: as the coin keeps landing on heads, the probability of the run continuing keeps shrinking. The player is not induced to accept or refuse the game by the size of the potential stakes, but solely by the balance of the probability of winning against the probability of losing. A vanishingly small chance of winning a great sum does not counterbalance a large chance of losing a small sum. On this view, to settle the equivalence justly, one must determine where the probability diminishes to an effectively null value. An assumption is unavoidable here, since the threshold for a "small" probability is not precise: it would be odd to treat a probability of 1/100 as null while treating 1/99 as meaningful.
The hypothesis Bernoulli offers is that a sensible man would not pay $20 because he believes it practically certain that the sum returned will be less than $20. If it is morally impossible for him to win 20 coins, then it is likewise morally impossible for him to win 32 coins or more. Treating probabilities of 1/32 and smaller as null, and counting only the remaining outcomes, the expected value is calculated to be:
(1/2)*1 + (1/4)*2 + (1/8)*4 + (1/16)*8 + 0*16 + 0*32 + ... = 2
This idea of probability weighting led to modern cumulative prospect theory, in which decisions under risk and uncertainty are evaluated through a weighting of the cumulative distribution function. Prospect theory does not automatically dispose of the paradox: small-probability events can be overweighted, which can reintroduce it. Specifically, according to Blavatskyy, "The power coefficient of an individual's utility function must be lower than the power coefficient of an individual's probability weighting function."[4] Intuitively, it is not enough for the utility function to be concave in absolute terms; it must be sufficiently concave relative to the probability weighting function, or the paradox returns. It should also be remembered that prospect theory is not all-powerful: its standard parameters were estimated from gambles in the region of less than $400 and are not necessarily applicable to unbounded payouts. [5]
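A stripped-down illustration of Blavatskyy's condition may help (this is my own simplification: it applies a separable weighting w(p) = p^b term by term rather than the full cumulative form, and the exponents below are arbitrary). With power utility u(x) = x^a, each term of the weighted sum for the St. Petersburg payouts equals 2^(k*(a − b)), so the series converges only when the utility exponent a is below the weighting exponent b.

def weighted_value(a, b, kmax=60):
    # Sum of w(p_k) * u(x_k) with u(x) = x^a and w(p) = p^b,
    # for the St. Petersburg payouts x_k = 2^k and probabilities p_k = 1/2^k.
    # Each term equals 2^(k*(a - b)), so the series converges only when a < b.
    return sum(((0.5 ** k) ** b) * ((2.0 ** k) ** a) for k in range(1, kmax + 1))

for a, b in ((0.5, 0.7), (0.7, 0.5)):
    print(f"utility exponent a={a}, weighting exponent b={b}: partial sum = {weighted_value(a, b):,.2f}")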
Many have criticized the paradox on the grounds that it could never be realized in the real world. With an infinite expected value, a long enough run quickly produces a prize greater than the GDP of the world. So let us consider the case where only a finite amount can be won. If the total resources of the casino are W dollars, then the maximum number of doublings it can cover is
L = floor(log2(W))
and the expected value changes to

E = Σ (k=1 to L) (1/2^k)*2^k = L.
This gives a finite expected value quite simply: with a bankroll of $1,050,000 the expected value is $20, and with a bankroll the size of the US GDP it is roughly $44. But while this removes the infinity, it does not resolve the paradox itself.
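A quick check of the truncated formula (the bankroll figures below are only illustrative; roughly $21 trillion stands in for the US GDP, and the capped tail above the bankroll is ignored, as in the text):

import math

def finite_expected_value(bankroll):
    # Expected payout when the casino can cover at most L = floor(log2(bankroll)) doublings;
    # the truncated tail above the bankroll is ignored, as in the text.
    L = math.floor(math.log2(bankroll))
    return sum((0.5 ** k) * (2 ** k) for k in range(1, L + 1))  # each term contributes exactly 1

for bankroll in (1_050_000, 21_000_000_000_000):  # the millionaire case and a rough US-GDP figure
    print(f"bankroll ${bankroll:>20,}: expected value ${finite_expected_value(bankroll):.0f}")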
Arrow suggested that the utility function of a rational agent should be bounded precisely to avoid the St. Petersburg paradox. [6] This is supported by traditional axiomatic accounts of the expected utility principle: if the utility function is bounded, then the expected utility is automatically finite. One assumption of those accounts is that rationally permissible preferences over lotteries are continuous. Write {pA, (1−p)B} for the lottery that yields A with probability p and B with probability 1−p. The continuity axiom says that whenever A ≼ B ≼ C, there is a probability p ∈ [0,1] such that {pA, (1−p)C} ~ B.
Let us observe a theoretical case where A is worth $1, B is worth $2, and C carries infinite utility. The preferences satisfy A ≼ B ≼ C, but there is no probability p for which {pA, (1−p)C} ~ B. If p < 1, the lottery gives some positive chance of the infinitely valuable C, so a rational individual strictly prefers it to B; if p = 1, the lottery is simply A, and B is preferred. Continuity therefore fails. The conclusion is that no outcome can be assigned infinite (or arbitrarily large) utility, and so the utility function must be bounded.
However, this resolves the paradox only if we accept that a rational individual must obey the continuity principle. Continuity does seem essential to the very definition of utility, since utility can only be defined in relation to the utility of other choices; if choices could not be compared to see which has higher utility, we would need some different method of defining utility altogether.
Okabe and his coauthors elaborate on this line of thought with a median-based resolution of the St. Petersburg paradox. Their starting point is that people feel it is irrational to pay more than a few times the smallest payoff, despite the theoretically infinite expected value. [7] They also argue that expected utility cannot pin down the utility function analytically, so a different objective criterion is needed. Their criterion is based on the median of the probability distribution of the possible outcomes: the median captures the "characteristic scale of central tendency" even when the expected value is infinite. [7] Moreover, this criterion is independent of factors extrinsic to the game, such as the player's wealth, which makes it a more objective basis for a decision.
The expected payout is decomposed as

E = PW*EW + PL*EL,

where PW and PL are the probabilities of winning and losing, and EW and EL are the conditional expectations of the payout when it exceeds the cost M (wn > M) and when it falls below it (wn < M), respectively. The net gain is EW − M with probability PW, and PW + PL = 1. E is the unconditional expected payout, and the paradox persists if a finite cost M is set against an infinite E: as M increases, the probability of winning PW approaches 0 and PL ≃ 1, yet EW remains infinite.
To rectify this, a first principle is introduced: the probability of losing should not exceed the probability of winning (PL ≤ PW), with a 1:1 ratio preferred. For M < 1 we have PL = 0 and PW = 1; for 1 < M < 2 we have PL = 1/2 and PW = 1/2; for 2 < M < 4 we have PL = 3/4 and PW = 1/4. Thus the first principle holds only for M < 2, with equal probabilities on the range 1 < M < 2. So

1 ≤ M ≤ 2

is the median range of the payout, and the fair cost M lies within it.
The second principle introduced is to balance the expected loss against the expected gain, i.e.,

PL*(M − EL) = PW*(EW − M),

or, writing out the sums over the payouts wn = 2^(n−1) with probabilities 1/2^n, and noting that for 1 ≤ M ≤ 2 the only losing outcome is the smallest payout of 1,

(1/2)*(M − 1) = Σ (n=2 to nmax) (1/2^n)*(2^(n−1) − M).

The sum on the right can only be evaluated when nmax is finite, and by construction the right-hand side is positive; solving for M, we obtain

M = nmax / (2*(1 − 1/2^nmax)).
To meet the first principle, where the probability of losing must not exceed the probability of winning, nmax can be at most 3, since any larger nmax pushes M above 2. From this we obtain M ≈ 1.7 dollars as the fair cost to enter the game. It should be noted that, under this criterion, we forgo outcomes with probabilities as low as 1/16 and below. [7]
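The closed form for M above is my own algebra from the two stated principles rather than a quotation of Okabe's notation, but it is straightforward to check that it reproduces the numbers in the text. A short sketch (function names are mine; the outcomes are the payouts 2^(n−1) with probabilities 1/2^n):

def balanced_cost(n_max):
    # Cost M that balances the expected loss below M against the expected gain above M,
    # when outcomes beyond the n_max-th are ignored (payouts 2^(n-1), probabilities 1/2^n).
    return n_max / (2.0 * (1.0 - 0.5 ** n_max))

def losing_probability(M):
    # Probability that the payout falls below the cost M.
    return sum(0.5 ** n for n in range(1, 200) if 2 ** (n - 1) < M)

for n_max in (2, 3, 4, 5):
    M = balanced_cost(n_max)
    p_lose = losing_probability(M)
    verdict = "satisfies" if p_lose <= 1 - p_lose else "violates"
    print(f"n_max = {n_max}: M = {M:.2f}, P(lose) = {p_lose:.3f} -> {verdict} the first principle")

For n_max = 3 this gives M ≈ 1.71 with a losing probability of exactly 1/2, while n_max = 4 or more pushes M above 2 and breaks the first principle.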
The median of a single run of the St. Petersburg game is not uniquely defined, since the payouts and probabilities are discrete. As multiple trials are combined, the probability distribution of the total payout shows discontinuous step heights; this is illustrated in Figure 3 of Okabe et al. [7]
The median of the total payout over 3 trials is uniquely 6, and over 4 trials it is uniquely 9, while the medians for 1 and 2 trials are not unique and can sit at several payouts. From this we gather that the median cost per game depends only weakly on the number of times the game is played. From a logarithmic fit of the median payout against the number of games played, a reasonable cost is about $2 per game. The distribution of the total payout also converges to a stable distribution, consistent with a limit theorem for N repeated games: the total payout of N repeated games is approximately (N/2)*log2(N) in probability.
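A rough simulation sketch of my own (2,000 repetitions per point; the exact step values depend on the discrete distribution discussed above, so these are only indicative) shows how slowly the median total payout per game grows with N:

import random
import statistics

def total_payout(n_games):
    # Total payout of n_games independent rounds, with prizes 1, 2, 4, ... (the convention used above).
    total = 0
    for _ in range(n_games):
        prize = 1
        while random.random() < 0.5:  # heads: the prize doubles and the round continues
            prize *= 2
        total += prize
    return total

random.seed(1)
repetitions = 2000
for n in (1, 2, 4, 16, 64):
    med = statistics.median(total_payout(n) for _ in range(repetitions))
    print(f"N = {n:>2}: median total payout ~ {med:.1f} (per game ~ {med / n:.2f})")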
If unbounded utility is deemed inappropriate, this approach gives finite answers in all cases. The main contention is that if infinity is an unreasonable expected value, then no fair price can be derived from the expectation at all. Granting that, the median is robust against slight modifications of the game where the expected value is not, and it is a good estimator with no dependence on outside factors. The robustness is shown, in part, by the shape of the frequency-versus-payout distribution, which piles its mass at low payouts and trails off into a long tail. [7] The median per game grows only like (1/2)*log2(N), which is indicative of a skewed distribution with a long tail. Since the median is little affected by long tails, it remains usable even when we cannot assume a number of repetitions large enough to ensure convergence. Based on the robustness of the median indicator, and the fact that it tracks the typical outcome, the median is the better estimator for the player, while the 'house' must work with the expected value. Okabe summarizes this by stating that the player attaches more importance to probability than to the expected value. [7]
From an outside perspective, this conclusion seems questionable. In the real world, lotteries have deliberately changed their formats to decrease the probability of winning and thereby produce much larger prize pools. It stands to reason that many people are ignorant of the probabilities of quite common events and instead rely on the hope of achieving the nearly impossible. After buying a lottery ticket, many individuals dream about what they will do with the prize money; on the other hand, in cases of low-probability danger, such as animal attacks, many individuals simply do not believe it will happen to them. The conclusion that individuals place more importance on probability than on expected value gains some credibility from the fair cost that was calculated, but these everyday counterexamples point the other way. It therefore appears that the conclusion was fitted to the data and is not robust enough to stand on its own.
After looking at more than 300 years of work on a single paradox, several theories have been developed and tested. To weigh these solutions retrospectively, we should set out what we know. The closest analogue to a highly chance-dependent game is craps. In craps, we saw an individual avoid rolling a 7 for 154 consecutive rolls, while an average hand lasts only around 7 or 8 rolls, which makes it an interesting comparison to the paradox. [1] For our paradox, the average game lasts about 2 tosses. Where craps gives a 30/36 (5/6) chance of not rolling a 7 on any given roll, our paradox gives a 1/2 chance of surviving each toss. Over 7 rolls the chance of never rolling a 7 is about 28%, comparable to the 25% chance of not seeing a tails in 2 tosses. On this basis the craps run can reasonably be compared to the St. Petersburg game in terms of probability. As noted earlier, a game would have to end on about its 40th toss to match the world-record craps run, a probability of roughly 1 in 1.1 trillion, and such a run would net the player about 1.1 trillion dollars.
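The arithmetic behind this comparison is a one-liner in each case (a sketch of mine; it treats every craps roll simply as "avoid a 7", which is the same simplification used in the comparison above):

# Probability of avoiding a 7 (30 of the 36 dice combinations) over a given number of craps rolls,
# versus the probability that the coin game survives to a given toss.
p_no_seven = 30 / 36

print(f"7 craps rolls without a 7  : {p_no_seven ** 7:.1%}")
print(f"2 coin tosses without tails: {0.5 ** 2:.1%}")
print(f"154 craps rolls without a 7: 1 in {1 / p_no_seven ** 154:,.0f}")
print(f"a game ending on toss 40   : 1 in {2 ** 40:,}")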
From an expected value perspective, however, the two games are wholly different. Not rolling a 7 does not by itself guarantee money in craps; it is partly a game of decisions, where one's bets dictate the risk and the money made. For this reason, the world-record craps shooter likely made well under a million dollars. The St. Petersburg game, by contrast, is built entirely on luck, and it should be treated as such. The economic rationalization of a game partly shaped by betting decisions and of a game wholly based on chance cannot be compared directly. Additionally, craps has no buy-in cost; it is entirely based on bets.
One can also approach the paradox from a more practical angle. Since the payout grows as 2^k, it starts slowly and only accelerates noticeably beyond a certain point. Where that point lies is arbitrary and cannot be proven, but a reasonable person might place it around the 14th or 15th toss, where a single additional flip adds $16,384 to the prize. So, to reach anything like a large return on the entry fee, one would need a game that ends around its 14th toss, which happens roughly once in 16,384 games. Such a probability is not especially hard to reach; odds of this magnitude are beaten daily in lotteries and casinos. With that stated, consider the profit model of this casino.
In order to offset the cost of the largest payouts, the casino would need to build its profit model on the games that end early, say within 6 consecutive heads. That cutoff is another arbitrary value, and the ideal number changes with the profit structure. So suppose 64 people play the game and one of them goes all the way to 6 consecutive heads; the break-even point then depends on where every other player's game ended. The break-even total payout is given by the following expression
(n+1)*2^(n-1)
where, for runs of up to 6 consecutive heads, the total payout is 7 * 2^5 = 224 dollars, so a charge of $3.50 per person yields zero profit. Extending this to 10 consecutive heads gives a payout of $5,632 spread over 1,024 people, or $5.50 per person. As we consider longer and longer runs, the break-even cost keeps rising without limit. If we posit that the 2009 craps run represents the lowest-probability event that could plausibly happen within a normal span of time, then we would expect a cost of about $20.50 per game, which matches the figure of roughly $20 from Bernoulli's utility argument. The actual price charged would carry a small markup over this break-even cost as the house's edge.
With the cost per game being

(n+1)*2^(n-1) / 2^n,

the cost is positive for any n ≥ 1. The expression simplifies to (n+1)/2, which grows without bound: the break-even cost per game rises linearly with the number of consecutive heads the casino is prepared to cover.
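A short sketch of this break-even schedule, following the break-even formula above (the choice of cutoffs to print is arbitrary):

def break_even_cost(n):
    # The break-even model used above: a total payout of (n+1)*2^(n-1) spread over the 2^n games
    # it takes, on average, to produce one run of n consecutive heads.
    total_payout = (n + 1) * 2 ** (n - 1)
    games = 2 ** n
    return total_payout / games  # simplifies to (n + 1) / 2

for n in (6, 10, 14, 40):
    print(f"cover runs up to {n:>2} consecutive heads: break-even cost ${break_even_cost(n):.2f} per game")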
It makes little sense to cap the maximum amount that can be won or to impose arbitrary limitations just to keep the paradox from occurring. At the end of the day, this is mostly a question of what a rational being would do. Should a person put all of their money into the game because it technically has an infinite expected value? Or should they stay away because, the vast majority of the time, they will lose most of what they put in? The game is very much like a lottery: few people stake significant money on lottery tickets, and if the buy-in were $10,000, then, probabilistically speaking, almost no one would ever come out ahead.
There is a reason this paradox was posed 300 years ago and remains unresolved. Without imposing arbitrary conditions there is no clean solution, because the human mind struggles to weigh a potentially infinite expected value against a buy-in of around 20 dollars. Okabe's perspective is interesting in that it abandons the expected value in favor of the median, but even that approach introduces arbitrary criteria to make the solution work. The best we can do is give an approximate value for what the cost should be. There are about 31.5 million seconds in a year. Suppose a player devotes half of his time to the game, starting one game per second: that is roughly 1.58e7 games every year. Suppose he keeps this up for 50 years (his health having suffered from playing all day); that is about 7.9e8 games in a lifetime. Now suppose half the people currently on Earth do the same; that is roughly 3.1e18 games started. In this most extreme scenario we would expect a single person, once, to see a game end on its 61st toss, winning a grand total of over 2 quintillion dollars. Even for that prize, the entry cost needed for the 'house' to break even would be only about 31 dollars.
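The back-of-envelope arithmetic is easy to lay out explicitly (the one-game-per-second rate and the population figure are assumptions read off the text's own numbers, not data):

import math

seconds_per_year = 365 * 24 * 3600          # about 3.15e7 seconds
games_per_year = seconds_per_year // 2      # half of one's time, at one game per second (the implied rate)
games_per_lifetime = games_per_year * 50    # 50 years of play
players = 3_950_000_000                     # roughly half of the world's population
total_games = games_per_lifetime * players

longest_run = math.floor(math.log2(total_games))   # the longest game we expect to see about once
print(f"total games started : {total_games:.2e}")
print(f"longest expected run: ends around toss {longest_run}")
print(f"top prize           : ${2 ** longest_run:.3e}")
print(f"break-even entry fee: ${(longest_run + 1) / 2:.1f}")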
Even as the prize reaches new heights, the cost stays reasonable at any rationally chosen limit; the nature of an infinite expected value is simply that the cost creeps higher as ever larger prizes are taken into account. At root, such a run is so rare a feat that other, far more serious calamities with microscopic probabilities would likely occur first. Unless we can make sense of weighing two infinities of different character against each other, the problem remains, for practical purposes, unsolvable.
I will leave you with an idea, recalled from memory, from David Deutsch's The Fabric of Reality. The idea of a multiverse is quite simple: for every small difference between universes, for every different decision made, a new universe branches off. On this picture there would be an infinite number of completely different universes and an infinite number of virtually identical ones. These universes do not interact with one another, with very few exceptions. In a world where a 'house' offers the St. Petersburg game, two different universes are created at every toss: one in which the coin lands on heads, and one in which it lands on tails. At every subsequent toss, the branching repeats. Somewhere there is a universe in which the player dies of old age without ever seeing a tails. That player never collects the prize, never spends the almost unlimited money accumulated in the pot. One can only collect one's winnings by tossing a tails at least once.
Works Cited
[1] Suddath, C. (2009, May 29). Holy craps! how a gambling grandma broke the record. Retrieved April 25, 2021, from https://content.time.com/time/nation/article/0,8599,1901663,00.html
[2] Most consecutive coin heads or tails. (2003, July 9). Retrieved April 25, 2021, from https://www.guinnessworldrecords.com/search/applicationrecordsearch?term=consecutive%2Bcoin%2Bflips&contentType=record
[3] de Montmort, Pierre Remond (1713). Essay d'analyse sur les jeux de hazard [Essays on the analysis of games of chance] (Reprinted in 2006) (in French) (Second ed.). Providence, Rhode Island: American Mathematical Society. ISBN 978-0-8218-3781-8. as translated by Bernoulli, Jakob. "Correspondence of Nicolas Bernoulli concerning the St. Petersburg Game"
[4] Blavatskyy, Pavlo (April 2005). "Back to the St. Petersburg Paradox?" (PDF). Management Science. 51 (4): 677-678. doi:10.1287/mnsc.1040.0352.
[5] Tversky, Amos; Kahneman (1992). "Advances in prospect theory: Cumulative representation of uncertainty". Journal of Risk and Uncertainty. 5 (4): 297-323. doi:10.1007/bf00122574. S2CID 8456150.
[6] Peterson, Martin (July 30, 2019). "The St. Petersburg Paradox (2020 ver.)". In Edward N. Zalta (ed.). Stanford Encyclopedia of Philosophy (Fall 2020 ed.).
[7] Okabe, T.; Nii, M.; Yoshimura, J. (2019). "The median-based resolution of the St. Petersburg paradox". Physics Letters A. 383 (26): 125838. Bibcode:2019PhLA..38325838O. doi:10.1016/j.physleta.2019.125838.