It is tough to predict an election. Both camps taunt the other and claim a high likelihood of victory, but in actuality, giving a precise estimate of either the final margin of an election or the probability of each side winning is very difficult.
In both the 2016 and 2020 U.S. Presidential elections, polling was at least somewhat inaccurate. In most cases, and certainly at the national level, the results fell well within the margin of error. But this was not always true, whether in the Rust Belt in 2016 or Florida in 2020, where results fell far outside the margin of error (even if, in some cases, this did not change the actual state winner).
Models which aggregated polls, meanwhile, have had mixed results. The Huffington Post's model famously gave Hillary Clinton a 98% chance of winning the Presidential election in 2016, while the stats blog FiveThirtyEight gave her a less confident 71% chance of victory.
Prediction markets, though, are a very interesting case: in theory their implied predictions should correlate to concrete factors which give data about the election, yet they seem to have outperformed the polls and even many models in both 2016 and 2020. But how does this work? As one article puts it: "The idea is straightforward: trade contingent claims in a market where the claims pay off as a function of something one is interested in forecasting. If structured correctly, the prices should reflect the expected payoffs to the claims, and therefore the expected outcome of the event of interest." In other words, if a betting market is structured well, its pricing will reflect the probability of some event occurring. Similarly to how the wisdom of the crowd can effectively determine the number of jelly beans in a jar, the thinking goes, perhaps the wisdom of the crowd can also determine the true probability of some event in politics.
In this tutorial, we will examine the accuracy and pricing of one specific political betting website, PredictIt, in their markets for the victors in the thirty-five 2020 U.S. Senate races. In these markets, we will examine PredictIt's overall accuracy, explore how PredictIt's pricing and accuracy relate to other factors (such as time and trade volume), and compare PredictIt's accuracy to a statistical model's accuracy (FiveThirtyEight). If you'd like to delve into this topic with prediction markets more generally, though, I highly recommend reading this article.
In many sports-betting markets, prices are set by "Vegas" - centralized sportsbooks who offer odds at a certain price based on, ultimately, what they believe will enable them to make the highest profit.
PredictIt, though, works much more like the stock market than like conventional sports-betting markets - users make buy/sell offers at the price they want to buy/sell a contract at, and are then matched by other users who want to perform the opposite action (sell/buy) at the same price (we will call this (1)); or, in the case of a buy, by users who want to purchase the complementary contract at a complementary price (we will call this (2)). We will explain (1) and (2) in more detail.
Contracts are for a given event E, are priced between 1 cent and 99 cents (inclusive), and are binary. If you purchase a "Yes" contract for event E, and E occurs, you receive one dollar (though this isn't entirely true, as I will explain shortly), but if E does not occur, your contract resolves to 0 cents, and you lose the money you used to purchase the contract. The inverse is true for the "No" contract (which is complementary to the "Yes" contract) - if you purchase a "No" contract for event E, and E does not occur, you receive one dollar, but if E occurs, your contract resolves to 0 cents, and you lose the money you used to purchase the contract. These are the basics, but the full rules for PredictIt can be found [here](https://www.predictit.org/support/how-to-trade-on-predictit).
Let's give examples of each of the two types of transactions mentioned above.
(1) - this is when a buyer/seller is matched by another user who wants to perform the opposite action at the same price: I believe that Democrats have a 52% chance of winning North Carolina, and I want to make a small profit, so I put in a bid to buy a "Yes" contract in that market at 50 cents. You believe that Democrats only have a 48% chance of winning North Carolina, and you own "Yes" contracts in that market; when you see my offer to buy a "Yes" contract at 50 cents, you offer to sell at 50 cents. The transaction goes through - I get your contract, and you get 50 cents from me.
(2) - this is when users want to purchase the complimentary contract at a complimentary price: I believe that Democrats have a 52% chance of winning North Carolina, and I want to make a small profit, so I put in a bid to buy a "Yes" contract in that market at 50 cents. You believe that Democrats only have a 48% chance of winning North Carolina, and you want to make a small profit, so you put in a bid to buy a "No" contract in that market at 50 cents. The transaction goes through - we each get contracts for 50 cents, and our money goes to PredictIt (most of it will come back as payout at the end).
However, there is an important factor which has been omitted up until now - PredictIt takes 10% of the profits made on each contract. This is significant - suppose I am considering buying a "Yes" contract for some event at 50 cents, because I believe the event's true probability of occurring is 52%. I can quickly deduce that, because PredictIt takes 10% of my profits, it is not worth it for me to buy the contract. Suppose I am right that the event has a 52% chance of occurring. Then, the expected value of purchasing the contract is (0.52 × 0.95) + (0.48 × 0.00) = 0.494 dollars, as the winnings would only be 95 cents, not a full dollar, because PredictIt would take 10% of my 50-cent profit. So, I would not purchase the contract, as its expected value is less than its price.
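To make the fee's effect concrete, here is a minimal sketch of the expected-value calculation above; the function name and numbers are mine, purely for illustration, and not part of PredictIt's rules or any API:

#a minimal sketch of the expected value of buying a "Yes" contract, given PredictIt's 10% fee on profits
def expected_value(price, believed_prob, fee=0.10):
    #if the event occurs, the contract pays $1.00, minus the fee taken out of the profit
    payout_if_yes = 1.00 - fee * (1.00 - price)
    #if the event does not occur, the contract resolves to $0.00 and contributes nothing
    return believed_prob * payout_if_yes

#the example from above: a 50-cent contract we believe has a 52% chance of paying out
print(expected_value(0.50, 0.52)) #about 0.494, which is less than the 0.50 price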
Further, there can be arbitrage in PredictIt, even within one market. For example, consider again the North Carolina Senate market:
Suppose Alice believes Republicans are at a disadvantage, and Bob believes they are heavily advantaged; so, when Bob offers to purchase a Republican "Yes" contract at 60 cents, Alice purchases a Republican "No" contract at 40 cents; this type (2) transaction goes through. Meanwhile, Candace believes Democrats are at a disadvantage, while Dennis does not; so, when Dennis offers to purchase a Democratic "Yes" contract at 60 cents, Candace purchases a Democratic "No" contract at 40 cents; this type (2) transaction also goes through.
Now note: the price for a Democratic "Yes" contract is at 60 cents, while the price for a Republican "Yes" contract is also at 60 cents, and a huge opportunity for arbitrage exists - one could buy a 40-cent "No" contract for both the Democrats and the Republicans, and be nearly guaranteed a profit of 14 cents when one of the two parties wins.
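As a quick sanity check, here is a small sketch of that arbitrage payout (the variable names and numbers are just illustrative), including PredictIt's 10% cut of the profit on whichever "No" contract pays out:

#buying both "No" contracts at 40 cents each
no_dem_price = 0.40
no_rep_price = 0.40
cost = no_dem_price + no_rep_price #0.80 spent in total
#exactly one of the two "No" contracts pays out $1.00; PredictIt keeps 10% of the 60-cent profit on it
payout = 1.00 - 0.10 * (1.00 - no_dem_price)
print(payout - cost) #about 0.14 - a nearly guaranteed 14-cent profit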
As we will see, such large arbitrage opportunities are incredibly rare, and most often arbitrage opportunities fall inside the range where they are not profitable because PredictIt takes 10% of the profits. But here is the bottom line - even when one of several events is guaranteed to occur, the sum of the "Yes" contract prices of those events is not necessarily a dollar - it can be less, and it can be more. The issue is discussed in greater depth in this paper.
This raises the question - does price alone tell us the implied probability that the market gives to an event's occurrence? The answer is: certainly not! If the Democratic "Yes" price is 60 cents and the Republican "Yes" price is 60 cents, this does not reflect an implied 60% probability of the Democrats winning the seat in question. Though there are other methods in other contexts for calculating implied probability, what I believe will work best here is the following definition: assuming one of the two major parties is going to win a given Senate seat (which was the case in these 35 races), to calculate the implied probability of a given party winning that Senate seat, we simply take the price of that party's "Yes" contract and divide it by the sum of the prices of both parties' "Yes" contracts. So, in the 60 cents/60 cents example above, we would say that each party has an implied probability of 50% of winning the Senate seat.
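Here is that definition written out as a tiny sketch (the variable names are mine):

#implied probability of a Democratic win, given both parties' "Yes" prices
dem_yes_price = 0.60
rep_yes_price = 0.60
implied_prob_dem = dem_yes_price / (dem_yes_price + rep_yes_price)
print(implied_prob_dem) #0.5 - each party has an implied 50% chance of winning the seat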
Note: all prices given in the upcoming analysis of PredictIt's Senate races data are for the "Yes" shares of a given contract. When I say "the contract for the Democrats winning North Carolina was priced at 50 cents on 11/1/20", I am effectively using shorthand to say "the "Yes" contract for the Democrats winning North Carolina was priced at 50 cents on 11/1/20."
To obtain this data, I reached out to PredictIt. After some discussion, they sent me a folder containing data on all thirty-five 2020 U.S. Senate races markets. I am incredibly appreciative of their providing me their data for this project.
There are four files containing our data in the folder which PredictIt shared with me. As they are in the "PredictIt-data" folder, these files are of the form: "PredictIt-data/Price History By Market -NoahFinei.xlsx", where the "i" in "NoahFinei" is either 1, 2, 3, or 4. We can more effectively examine what each of these files contains by looking at the first few lines of one of them after bringing in the data. We can do this, in turn, using Pandas to both read in and store the data (in a dataframe):
#importing pandas, so that we can use it
import pandas
#reading in the data into a dataframe
data = pandas.read_excel("PredictIt-data/Price History By Market -NoahFine1.xlsx")
#displaying the first few rows of the data to make sure it looks as expected
data.head()
Market ID | Market Name | Contract ID | Contract Name | Date (ET) | Open Share Price | Close Share Price | Low Share Price | High Share Price | Average Trade Price | Trade Volume | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-22 | 0.37 | 0.49 | 0.37 | 0.49 | 0.4514 | 80 |
1 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-23 | 0.49 | 0.40 | 0.40 | 0.49 | 0.4000 | 1 |
2 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-24 | 0.40 | 0.40 | 0.40 | 0.40 | 0.0000 | 0 |
3 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-25 | 0.40 | 0.40 | 0.40 | 0.40 | 0.0000 | 0 |
4 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-26 | 0.40 | 0.40 | 0.40 | 0.40 | 0.0000 | 0 |
Each row represents information about a market on a given date. Let's go through what each column tells us:
Market ID: An internal marker PredictIt uses; each one corresponds to a unique market; for example, "5808" corresponds to the North Carolina U.S. Senate race market.
Market Name: The market's actual name, which is displayed to users of PredictIt.
Contract ID: An internal marker PredictIt uses; each one corresponds to a unique contract of a market; for example, "17017" corresponds to the Democratic contract for the North Carolina U.S. Senate race market.
Contract Name: The contract's actual name, which is displayed to users of PredictIt.
Date (ET): The date, in eastern time. If date is 11/01/20, for example, then the information in the next few columns applies to the time range from 12:00:00AM to 11:59:59PM in New York on 11/01/20.
Open Share Price: The price of the share at the time the market opened; on the day when the market was made publicly accessible, this time is whenever trading was opened to the public, but on other days, this time is at 12:00:00AM.
Close Share Price: The price of the share at the time the market closed; this occurs at 11:59:59PM every day.
Low Share Price: The lowest price of the contract on that day.
High Share Price: The highest price of the contract on that day.
Average Trade Price: The average price of a trade on that day.
Trade Volume: The number of trades on that day.
With this explained, we will continue using Pandas to add the rest of the data to our "data" dataframe.
#adding the remaining data to the data dataframe
data=data.append(pandas.read_excel("PredictIt-data/Price History By Market -NoahFine2.xlsx"))
data=data.append(pandas.read_excel("PredictIt-data/Price History By Market -NoahFine3.xlsx"))
data=data.append(pandas.read_excel("PredictIt-data/Price History By Market -NoahFine4.xlsx"))
#resetting indices, as the original indices are at first kept, and we have multiple of the
#same index
data=data.reset_index()
#now, there is still an "index" column with the original indices, but we don't need these
data.drop(columns='index', inplace=True)
#displaying the first few rows of the data to make sure it looks as expected
data.head()
Market ID | Market Name | Contract ID | Contract Name | Date (ET) | Open Share Price | Close Share Price | Low Share Price | High Share Price | Average Trade Price | Trade Volume | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-22 | 0.37 | 0.49 | 0.37 | 0.49 | 0.4514 | 80 |
1 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-23 | 0.49 | 0.40 | 0.40 | 0.49 | 0.4000 | 1 |
2 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-24 | 0.40 | 0.40 | 0.40 | 0.40 | 0.0000 | 0 |
3 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-25 | 0.40 | 0.40 | 0.40 | 0.40 | 0.0000 | 0 |
4 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-26 | 0.40 | 0.40 | 0.40 | 0.40 | 0.0000 | 0 |
Now that we have our data, we need to figure out if there are any major issues with it, and if any cleaning or parsing needs to be done. And indeed, even from the output above, we can see that we have a few potential issues with our data.
First, if we want to measure correlations between how far out from the election a given data point is and accuracy of pricing in a meaningful way, we should make note of that somewhere in the dataframe; this is currently not noted. Second, though we know that all of these markets are for Senate races in specific U.S. states, we currently have no easy way of classifying which state a given market is for; the market name needs to be parsed to obtain this data. Third, the rows do not note who the eventual winner was in their markets; this will make checking price accuracy (which we will define later on) impossible. Finally, the rows do not note the implied probability of victory of the party in the contract concerned; as discussed above, this often differs from the actual price (and will be necessary in hypothesis testing).
So, we will fix these issues. We will fix the first issue by adding a "Days Out" column which measures how far out the day in a given row is from the election, in days. Note that election day in 2020 occurred on November 3rd, 2020, in all states concerned (though the two Georgia races did head to a runoff, we will ignore this for now, and treat their election day as November 3rd, 2020, the date of Georgia's first round of its general election; seeing as no runoff was guaranteed, this doesn't seem too unreasonable).
#labelling each entry with how far out it is from election day
far_out = []
for i in range(0, data.shape[0]):
far_out.append((pandas.Timestamp('20201103') - data.at[i,'Date (ET)']).days)
#adding these labellings to the data
data['Days Out'] = far_out
data.head()
Market ID | Market Name | Contract ID | Contract Name | Date (ET) | Open Share Price | Close Share Price | Low Share Price | High Share Price | Average Trade Price | Trade Volume | Days Out | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-22 | 0.37 | 0.49 | 0.37 | 0.49 | 0.4514 | 80 | 439 |
1 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-23 | 0.49 | 0.40 | 0.40 | 0.49 | 0.4000 | 1 | 438 |
2 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-24 | 0.40 | 0.40 | 0.40 | 0.40 | 0.0000 | 0 | 437 |
3 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-25 | 0.40 | 0.40 | 0.40 | 0.40 | 0.0000 | 0 | 436 |
4 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-26 | 0.40 | 0.40 | 0.40 | 0.40 | 0.0000 | 0 | 435 |
Now, we want to parse out the state's name from each market name, to resolve issue two. The best way to do this is with regular expressions, and by noticing a few things:
Every market name contains either the word "race" or the word "election", followed by " in " and then the state's name.
Every state name is one or two words long, and the two-word names all begin with "North", "South", "West", "New", or "Rhode".
Georgia has two markets (its regular and special Senate elections), and only the special election's market name contains the word "special".
I found all of these properties by looking through the files that PredictIt sent to me, but I will check after the fact that this worked as expected as well. For now, let's use these properties to parse out the state names from the market names, and add them to our data frame:
#importing re, which allows use of regular expressions
import re
#a state_names list, which will later be used as the appropriate column in the dataframe
state_names = []
for i in range(0, data.shape[0]):
line = data.at[i,'Market Name']
#all market names are structured as "Which party will win the U.S. Senate (race|election) in (state_name) in 2020?"
#first, we will find the start index of the state name
#looking for "race"
s = re.search('race', line)
if s is None:
#since race is not present, the state name will be after "election"
s = re.search('election', line)
#assigning s to everything after "election"; we will then remove excess junk in the next step
s = line[(s.start()+12):]
else:
#assigning s to everything after "race"; we will then remove excess junk in the next step
s = line[(s.start()+8):]
#now, s is the state name and some junk at the end; we need to check if the state is one word or two words, and
#then we can remove junk
words = re.findall(r'\S+', s)
#we will assign the proper state name to "state"
state = ""
if words[0] == 'North' or words[0] == 'South' or words[0] == 'West' or words[0] == 'New' or words[0] == 'Rhode':
state = words[0] + ' ' + words[1]
else:
state = words[0]
#now, we have the correct state name, but we need to deal with the Georgia regular versus Georgia special edge case
georgia = re.search("Georgia", state)
if georgia is not None:
#we know that any Georgia market which refers to the special election has "special" in it, so we utilize this
special = re.search("special", line)
if special is not None:
state = "Georgia (special)"
else:
state = "Georgia (regular)"
#finally, adding the state name in the list
state_names.append(state)
#Adding the "State" column to the data, which contains state names
data['State'] = state_names
data.head()
Market ID | Market Name | Contract ID | Contract Name | Date (ET) | Open Share Price | Close Share Price | Low Share Price | High Share Price | Average Trade Price | Trade Volume | Days Out | State | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-22 | 0.37 | 0.49 | 0.37 | 0.49 | 0.4514 | 80 | 439 | North Carolina |
1 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-23 | 0.49 | 0.40 | 0.40 | 0.49 | 0.4000 | 1 | 438 | North Carolina |
2 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-24 | 0.40 | 0.40 | 0.40 | 0.40 | 0.0000 | 0 | 437 | North Carolina |
3 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-25 | 0.40 | 0.40 | 0.40 | 0.40 | 0.0000 | 0 | 436 | North Carolina |
4 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-26 | 0.40 | 0.40 | 0.40 | 0.40 | 0.0000 | 0 | 435 | North Carolina |
We can also quickly check that these properties produced 35 unique state names (for the 35 unique Senate races), and that none of them are nonsensical:
#creating a dictionary to use functionally as a hashset:
#keys are state names, values are True booleans, but meaningless
all_state_names = {}
for i in range(0, data.shape[0]):
all_state_names[data.at[i, 'State']] = True
#printing the final keyset of the dictionary,
#which contains all unique state names present in data
n = 1
for k in all_state_names.keys():
print(str(n) + ": " + k)
n += 1
1: North Carolina 2: Arizona 3: Maine 4: Alabama 5: Georgia (special) 6: Kentucky 7: Michigan 8: Georgia (regular) 9: Alaska 10: South Carolina 11: Kansas 12: Colorado 13: Montana 14: Iowa 15: New Mexico 16: New Jersey 17: Louisiana 18: Virginia 19: Illinois 20: Mississippi 21: Minnesota 22: Oregon 23: Texas 24: Tennessee 25: New Hampshire 26: Nebraska 27: West Virginia 28: Massachusetts 29: Oklahoma 30: Rhode Island 31: Wyoming 32: Delaware 33: Idaho 34: South Dakota 35: Arkansas
As we see above, there are 35 unique state names, as expected, and they are all reasonable/real state names. So, we have resolved issue two.
Now, we need to deal with the issue of marking the winner in each market. To do this, I made an Excel file with the state name in one column and the winning party (Republican or Democratic) in another. After reading in the data, it just took a dictionary and a list to add the appropriate column to the dataframe. I also decided to add a column called "538 District", which indicates how 538 refers to the state in their data; as we shall see, this will be relevant later. To do this, I added a "538 District" column to my Excel sheet and used the same method as for marking the winners.
#reading in my excel sheet
winner_data = pandas.read_excel("PredictIt-data/Election Winners.xlsx")
#winners dictionary; keys are state names, values are winners (Republican or Democratic)
winners = {}
#fte_district dictionary; keys are state names, values are 538 district names
fte_district = {}
#iterating through winner_data to set up the dictionaries
for i in range(0, winner_data.shape[0]):
winners[winner_data.at[i,'State']] = winner_data.at[i,'Winner']
fte_district[winner_data.at[i,'State']] = winner_data.at[i,'538 District']
#party_winners list, which will be populated with the winner in each row of data
party_winners = []
#fte_districts list, which will be populated with the 538 district in each row of data
fte_districts = []
for i in range(0, data.shape[0]):
party_winners.append(winners[data.at[i, 'State']])
fte_districts.append(fte_district[data.at[i, 'State']])
#adding the lists as the appropriate columns
data['Winner'] = party_winners
data['538 District'] = fte_districts
data.head()
Market ID | Market Name | Contract ID | Contract Name | Date (ET) | Open Share Price | Close Share Price | Low Share Price | High Share Price | Average Trade Price | Trade Volume | Days Out | State | Winner | 538 District | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-22 | 0.37 | 0.49 | 0.37 | 0.49 | 0.4514 | 80 | 439 | North Carolina | Republican | NC-S2 |
1 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-23 | 0.49 | 0.40 | 0.40 | 0.49 | 0.4000 | 1 | 438 | North Carolina | Republican | NC-S2 |
2 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-24 | 0.40 | 0.40 | 0.40 | 0.40 | 0.0000 | 0 | 437 | North Carolina | Republican | NC-S2 |
3 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-25 | 0.40 | 0.40 | 0.40 | 0.40 | 0.0000 | 0 | 436 | North Carolina | Republican | NC-S2 |
4 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-26 | 0.40 | 0.40 | 0.40 | 0.40 | 0.0000 | 0 | 435 | North Carolina | Republican | NC-S2 |
This issue is resolved.
Finally, we need to add the implied probability of victory in each row. We will add a few caveats, though. First, the implied probability will be calculated using only the Democratic and Republican contracts (as noted in a previous section), to keep things simple; that is, the value in the implied probability column will answer the question "Assuming either the Democratic or Republican party wins this race, what is the implied probability of the party in this contract winning this race?" rather than "What is the implied probability of the party in this contract winning this race?" Accordingly, rows for non-Democratic and non-Republican contracts will have a value of 0 in this column. Additionally, if only one of the two major parties has a contract present in a given market on a given date (which happens very infrequently), the implied probability will be set to that contract's actual price. Further, the value in this column will reflect the implied probability of the closing price, as the opening price can pose problems when it reflects the market's opening to the public (the initial prices, which are set by PredictIt, are often somewhat arbitrary and can skew our data), and the average trade price defaults to 0 on a day with no trades. Let's add this column to the dataframe:
#closing_prices dictionary: keys will be strings containing the state,
#contract name (party concerned), and date; values will be closing prices of the contracts
closing_prices = {}
#populating the dictionary accordingly
for i in range(0, data.shape[0]):
k = data.at[i, 'State'] + data.at[i, 'Contract Name'] + str(data.at[i, 'Date (ET)'])
closing_prices[k] = data.at[i, 'Close Share Price']
#a list containing the implied probability at each row
implied_probs = []
#populating the list as described above
for i in range(0, data.shape[0]):
if (data.at[i, 'State'] + "Democratic" + str(data.at[i, 'Date (ET)']) not in closing_prices or
data.at[i, 'State'] + "Republican" + str(data.at[i, 'Date (ET)']) not in closing_prices):
#if there isn't a value for both Democrats and Republicans in the closing_prices
#dictionary, the implied probability is the closing price
implied_probs.append(data.at[i, 'Close Share Price'])
else:
#implied probability is calculated as described in the "how PredictIt works" section
priceD = closing_prices[data.at[i, 'State'] + "Democratic" + str(data.at[i, 'Date (ET)'])]
priceR = closing_prices[data.at[i, 'State'] + "Republican" + str(data.at[i, 'Date (ET)'])]
if data.at[i, 'Contract Name'] == "Democratic":
implied_probs.append(priceD/(priceD + priceR))
else:
if data.at[i, 'Contract Name'] == "Republican":
implied_probs.append(priceR/(priceD + priceR))
else:
implied_probs.append(0.0)
#adding the column to the dataframe
data['Closing Implied Probability'] = implied_probs
data.head()
Market ID | Market Name | Contract ID | Contract Name | Date (ET) | Open Share Price | Close Share Price | Low Share Price | High Share Price | Average Trade Price | Trade Volume | Days Out | State | Winner | 538 District | Closing Implied Probability | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-22 | 0.37 | 0.49 | 0.37 | 0.49 | 0.4514 | 80 | 439 | North Carolina | Republican | NC-S2 | 0.485149 |
1 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-23 | 0.49 | 0.40 | 0.40 | 0.49 | 0.4000 | 1 | 438 | North Carolina | Republican | NC-S2 | 0.408163 |
2 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-24 | 0.40 | 0.40 | 0.40 | 0.40 | 0.0000 | 0 | 437 | North Carolina | Republican | NC-S2 | 0.430108 |
3 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-25 | 0.40 | 0.40 | 0.40 | 0.40 | 0.0000 | 0 | 436 | North Carolina | Republican | NC-S2 | 0.430108 |
4 | 5808 | Which party will win the U.S. Senate race in N... | 17017 | Democratic | 2019-08-26 | 0.40 | 0.40 | 0.40 | 0.40 | 0.0000 | 0 | 435 | North Carolina | Republican | NC-S2 | 0.430108 |
To start the analysis, let's examine how well price correlated to observed outcome without considering any other factors. In general for these sorts of analyses, we will be using the closing share price (for the reason described above), and only considering markets at least a day out from the election (as afterwards, they tend to either hover around 99 cents for the winner until the contract is paid off, or they irrationally don't, and this behavior deserves a separate analysis in a different project). Further, unless specified otherwise, "price" on a given day will refer to the closing share price on that day.
We can create an interesting bar chart using the numpy and matplotlib libraries, which are commonly used in data analysis. In this chart, we will plot two things: the frequency at which a contract priced in a given range resolved to "Yes", and the frequency at which a contract priced in that range would have been expected to resolve to "Yes" if the price were reflective of true probability (which we will say is the median value of that range, but could reasonably be labeled as any value in that range).
#importing the necessary libraries
import matplotlib.pyplot as plt
import numpy as np
#Since I love FiveThirtyEight's graphics, I will use their plot style for the remainder of this project
plt.style.use('fivethirtyeight')
#list for obtaining prices corresponding to each row
prices = []
#list for obtaining winners corresponding to each row; a value of 0 means that the winner did not match the contract name,
#while a value of 1 means that it did
won = []
#populating the lists appropriately
for i in range(0, data.shape[0]):
if data.at[i, 'Days Out'] > 0:
prices.append(data.at[i, 'Close Share Price'])
if data.at[i, 'Winner'] == data.at[i, 'Contract Name']:
won.append(1)
else:
won.append(0)
#labels for the ranges on the x-axis
labels = []
#for each range, will contain expected frequency at which a contract in that range would resolve to "Yes"
expected = []
#populating accordingly
for i in range(0,10):
s = '0.' + str(i) + '0-0.' + str(i) + '9'
labels.append(s)
expected.append(i/10 + 0.045)
#index 0 will contain the total number of contracts from $0.00 to $0.09 which resolved to yes,
#index 1 will contain the total number of contracts from $0.10 to $0.19 which resolved to yes,
#and so on
total_wins = 10*[0]
#index 0 will contain the total number of contracts from $0.00 to $0.09,
#index 1 will contain the total number of contracts from $0.10 to $0.19,
#and so on
total = 10*[0]
#populating total_wins and total appropriately
for i in range(0,len(won)):
ind = int(prices[i]*10)
total_wins[ind] = total_wins[ind] + won[i]
total[ind] = total[ind] + 1
#index 0 will contain the proportion of contracts from $0.00 to $0.09 which resolved to yes,
#index 1 will contain the proportion of contracts from $0.10 to $0.19 which resolved to yes,
#and so on
ys = 10*[-1] #-1 is the default value, as that will enable us to detect regions with no markets visually
#populating ys appropriately
for i in range(0,10):
if total[i] != 0:
ys[i] = total_wins[i]/total[i]
#creating the plot
x = np.arange(len(labels)) # the label locations
width = 0.45 # the width of the bars
fig = plt.figure()
ax = fig.add_axes([0,0,1.8,1])
ax.bar(x - width/2, ys, width, label='Observed', color='b')
ax.bar(x + width/2, expected, width, label='Expected', color='g')
#adding some text for labels, title, custom x-axis tick labels, etc.
ax.set_ylabel('Proportion Resolved to \"Yes\"')
ax.set_xlabel('Contract Price, in Dollars')
ax.set_title('Results versus Prices, pre-election')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
plt.show()
Though the observed bars mostly follow the expected, there are some areas of significant deviation, such as from 30-39 cents and from 60-69 cents. We can visualize this in a more granular way via a scatterplot:
#this library will allow us to label lines nicely in our graphs
import matplotlib.patches as mpatches
#price_to_proportion library: keys are prices, values will be lists containing two values - at index 0, the total number of
#resolutions to "Yes" for contracts of the price in the key, and at index 1, the total number of contracts of the price
#in the key; this will allow us to get the proportion which resolved to "Yes" later on
price_to_proportion = {}
#populating appropriately
for i in range(0,data.shape[0]):
if data.at[i, 'Days Out'] > 0:
price = data.at[i,'Close Share Price']
if price not in price_to_proportion:
price_to_proportion[price] = 2*[0]
if data.at[i, 'Winner'] == data.at[i, 'Contract Name']:
price_to_proportion[price][0] = price_to_proportion[price][0] + 1
price_to_proportion[price][1] = price_to_proportion[price][1] + 1
#lists which will be used to make the plot
#xs are the x-coordinates, which will go from 0.01 to 0.99
xs=[]
#ys are the y-coordinates, which will have the value of the proportion of contracts priced at the x-value
#which resolved to yes
ys=[]
#populating xs and ys appropriately
for k in price_to_proportion.keys():
xs.append(k)
ys.append(price_to_proportion[k][0]/price_to_proportion[k][1])
#creating the plot
fig, ax = plt.subplots()
ax.scatter(xs, ys)
#x-values spanning the price range, used to draw the y=x reference line
x = np.linspace(0, 1)
line = x
plt.plot(x, line, 'b', label='y={:.2f}x+{:.2f}'.format(1,0))
#adding some text for labels, title, custom x-axis tick labels, etc.
plt.xlim(-0.01, 1.01)
plt.ylim(-0.01, 1.01)
ax.set_ylabel('Proportion Resolved to \"Yes\"')
ax.set_xlabel('Contract Price, in Dollars')
ax.set_title('Results versus Prices, pre-election')
blue_patch = mpatches.Patch(color='b', label='y=x line')
plt.legend(handles=[blue_patch], loc = 'best', bbox_to_anchor=(1.1, -0.2))
plt.show()
Indeed, even on a more granular level, we see PredictIt pricing is not so representative of the true probability, at least in this sample - very few values fall even close to where they would fall if they represented true probability. That said, we will see if any meaning can be extracted from the scatterplot above in our hypothesis testing, specifically in the context of correlation.
Meanwhile, we can visualize the difference between the proportion resolved to "Yes" and the contract price more effectively in the plot below:
#price_to_proportion library: keys are prices, values will be lists containing two values - at index 0, the total number of
#resolutions to "Yes" for contracts of the price in the key, and at index 1, the total number of contracts of the price
#in the key; this will allow us to get the proportion which resolved to "Yes" later on
price_to_proportion = {}
#populating appropriately
for i in range(0,data.shape[0]):
if data.at[i, 'Days Out'] > 0:
price = data.at[i,'Close Share Price']
if price not in price_to_proportion:
price_to_proportion[price] = 2*[0]
if data.at[i, 'Winner'] == data.at[i, 'Contract Name']:
price_to_proportion[price][0] = price_to_proportion[price][0] + 1
price_to_proportion[price][1] = price_to_proportion[price][1] + 1
#lists which will be used to make the plot
#xs are the x-coordinates, which will go from 0.01 to 0.99
xs=[]
#ys are the y-coordinates, which will have the value of the observed proportion of contracts priced at the x-value
#which resolved to yes minus the expected proportion of contracts priced at the x-value which resolved to yes
ys=[]
#populating xs and ys appropriately
for k in price_to_proportion.keys():
xs.append(k)
ys.append(price_to_proportion[k][0]/price_to_proportion[k][1] - k)
#creating the plot
fig, ax = plt.subplots()
ax.scatter(xs, ys)
line = 0*x
plt.plot(x, line, 'b', label='y={:.2f}x+{:.2f}'.format(0,0))
#adding some text for labels, title, custom x-axis tick labels, etc.
plt.xlim(-0.01, 1.01)
plt.ylim(-0.50, 0.50)
ax.set_ylabel('Prop. Resolved to \"Yes\" - Expected')
ax.set_xlabel('Contract Price, in Dollars')
ax.set_title('Results versus Prices, pre-election')
blue_patch = mpatches.Patch(color='b', label='x-axis')
plt.legend(handles=[blue_patch], loc = 'best', bbox_to_anchor=(1.1, -0.2))
plt.show()
The spikes on this graph seem to mirror each other, and this makes sense - for each Democratic contract priced at x, there is usually a corresponding Republican contract priced around 1-x; further, for whichever contract resolves to yes, the other resolves to no (as only one of the two parties will win a given seat).
We can also attempt to see if time out from the election plays a role in pricing accuracy. In the bar chart below, we can see the frequency at which a contract priced in a given range resolved to "Yes" a specific amount of time out from the election (1 day out, 7 days out, 30 days out, and 90 days out), and the frequency at which a contract priced in that range would have been expected to resolve to yes if the price were reflective of true probability (which we will again say is the median value of that range, but could reasonably be labeled as any value in that range).
#the following list will contain the y-values for the bar chart;
#index 0 contains the values for each bucket (0.00-0.09, 0.10-0.19, ..., 0.90-0.99) for 90 days out,
#index 1 for 30 days out, index 2 for 7 days out, and index 3 for 1 day out
ys = [[],[],[],[]]
#setting the labels for each bucket and their correspondng expected values as appropriate
labels = []
expected = []
for i in range(0,10):
s = '0.' + str(i) + '0-0.' + str(i) + '9'
labels.append(s)
expected.append(i/10 + 0.045)
#the following list will be used in tandem with the ys 2D list so that we can create the bar chart through iteration
#rather than copy pasting
days_out = [90, 30, 7, 1]
#populating ys as appropriate
for j in range(0,4):
#the current day we are setting up the ys for
day = days_out[j]
#list for obtaining prices corresponding to each row with the given number of days out
prices = []
#list for obtaining winners corresponding to each row; a value of 0 means that the winner did not match the
#contract name, while a value of 1 means that it did
won = []
for i in range(0, data.shape[0]):
if data.at[i, 'Days Out'] == day:
prices.append(data.at[i, 'Close Share Price'])
if data.at[i, 'Winner'] == data.at[i, 'Contract Name']:
won.append(1)
else:
won.append(0)
#index 0 will contain the total number of contracts from $0.00 to $0.09 which resolved to yes,
#index 1 will contain the total number of contracts from $0.10 to $0.19 which resolved to yes,
#and so on
total_wins = 10*[0]
#index 0 will contain the total number of contracts from $0.00 to $0.09,
#index 1 will contain the total number of contracts from $0.10 to $0.19,
#and so on
total = 10*[0]
for i in range(0,len(won)):
ind = int(prices[i]*10)
total_wins[ind] = total_wins[ind] + won[i]
total[ind] = total[ind] + 1
ys[j] = 10*[-1] #-1 is default; will enable us to detect regions with zero markets, though we see in the output
#that these do not occur
for i in range(0,10):
if total[i] != 0:
ys[j][i] = total_wins[i]/total[i]
#creating the plot
x = np.arange(len(labels)) # the label locations
width = 0.15 # the width of the bars
fig = plt.figure()
ax = fig.add_axes([0,0,2,1])
ax.bar(x - 2*width, ys[0], width, label='90 days out', color= (0.0, 0.8, 0.9))
ax.bar(x - width, ys[1], width, label='30 days out', color=(0.0, 0.6, 0.7))
ax.bar(x, ys[2], width, label='7 days out', color=(0.0, 0.4, 0.5))
ax.bar(x + width, ys[3], width, label='1 day out', color=(0.0, 0.2, 0.3))
ax.bar(x + 2*width, expected, width, label='Expected', color='green')
#adding some text for labels, title, custom x-axis tick labels, etc.
ax.set_ylabel('Proportion Resolved to \"Yes\"')
ax.set_xlabel('Contract Price, in Dollars')
ax.set_title('Results versus Prices, pre-election, by Days Out')
ax.set_xticks(x)
ax.set_xticklabels(labels) # labels
ax.legend()
plt.show()
#displaying if there was an area with no markets
for y in ys:
for elt in y:
if elt == -1:
print("Some area has no markets")
Looking at each of these ten buckets, it seems like there may be some correlation between time and pricing accuracy (namely, that pricing gets more accurate as the election gets closer), but we will test this later.
We will now create a similar plot to the one above, but instead of making a different column for each time, we will make columns based on trade volume. In the bar chart below, we can see the frequency at which a contract priced in a given range resolved to "Yes" given a specific volume (0 trades, 1-9 trades, 10-99 trades, 100-999 trades, and 1000+ trades), and the frequency at which a contract priced in that range would have been expected to resolve to yes if the price were reflective of true probability (which we will say is the median value of that range, but could reasonably be labeled as any value in that range).
#the following list will contain the y-values for the bar chart;
#index 0 contains the values for each bucket (0.00-0.09, 0.10-0.19, ..., 0.90-0.99) for 0 trades,
#index 1 for 1-9 trades, index 2 for 10-99 trades, index 3 for 100-999 trades, and index 4 for 1000+ trades
ys = [[],[],[],[],[]]
#setting the labels for each bucket and their correspondng expected values as appropriate
labels = []
expected = []
for i in range(0,10):
s = '0.' + str(i) + '0-0.' + str(i) + '9'
labels.append(s)
expected.append(i/10 + 0.045)
#the following list will be used in tandem with the ys 2D list so that we can create the bar chart through iteration
#rather than copy pasting
vols = [0, 1, 10, 100, 1000]
#populating ys as appropriate
for j in range(0,5):
#list for obtaining prices corresponding to each row within the given trading volume range
prices = []
#list for obtaining winners corresponding to each row; a value of 0 means that the winner did not match the
#contract name, while a value of 1 means that it did
won = []
for i in range(0, data.shape[0]):
#the trading volume of the current row
vol = data.at[i, 'Trade Volume']
#adjusting prices and won accordingly
if data.at[i, 'Days Out'] > 0 and ((j < 4 and vols[j] <= vol < vols[j+1]) or (j == 4 and vol >= vols[j])):
prices.append(data.at[i, 'Close Share Price'])
if data.at[i, 'Winner'] == data.at[i, 'Contract Name']:
won.append(1)
else:
won.append(0)
#index 0 will contain the total number of contracts from $0.00 to $0.09 which resolved to yes,
#index 1 will contain the total number of contracts from $0.10 to $0.19 which resolved to yes,
#and so on
total_wins = 10*[0]
#index 0 will contain the total number of contracts from $0.00 to $0.09,
#index 1 will contain the total number of contracts from $0.10 to $0.19,
#and so on
total = 10*[0]
for i in range(0,len(won)):
ind = int(prices[i]*10)
total_wins[ind] = total_wins[ind] + won[i]
total[ind] = total[ind] + 1
ys[j] = 10*[-1] #-1 is default; will enable us to detect regions with zero markets, though we see in the output
#that these do not occur
for i in range(0,10):
if total[i] != 0:
ys[j][i] = total_wins[i]/total[i]
#creating the plot
x = np.arange(len(labels)) # the label locations
width = 0.10 # the width of the bars
fig = plt.figure()
ax = fig.add_axes([0,0,2,1])
ax.bar(x - 2.5*width, ys[0], width, label='0', color= (0.0, 0.7, 0.8))
ax.bar(x - 1.5*width, ys[1], width, label='1-9', color= (0.0, 0.6, 0.7))
ax.bar(x - 0.5*width, ys[2], width, label='10-99', color=(0.0, 0.5, 0.6))
ax.bar(x + 0.5*width, ys[3], width, label='100-999', color=(0.0, 0.4, 0.5))
ax.bar(x + 1.5*width, ys[4], width, label='1000+', color=(0.0, 0.3, 0.4))
ax.bar(x + 2.5*width, expected, width, label='Expected', color='green')
#adding some text for labels, title, custom x-axis tick labels, etc.
ax.set_ylabel('Proportion Resolved to \"Yes\"')
ax.set_xlabel('Contract Price, in Dollars')
ax.set_title('Results versus Prices, pre-election, by Trading Volume')
ax.set_xticks(x)
ax.set_xticklabels(labels) # labels
ax.legend()
plt.show()
It is much harder to see from here if trading volume correlates in any way with pricing accuracy, but we can test this more thoroughly later.
Finally, one more thing I was interested in seeing was the sum of all contract prices in a market over time; as mentioned earlier, contracts in a market may not always sum to a dollar, potentially presenting arbitrage opportunities, and I wanted to visualize this. As there are 35 markets, it would be too crowded to view the sums of all of the markets' contracts over time. So, the chart below shows the sum of all contract prices in a market for a sample of the markets in our data across time, as well as the average sum across all markets, up to 90 days out.
#making a data frame
df=pandas.DataFrame({'x': range(1,91)}) #populating with the appropriate x-axis
#this for loop will fill the dataframe with a column of 90 entries of 0.0 for each market by default,
#with the i^th entry (indexed at 0) representing the sum of the contract prices on the (i+1)th day out from the election;
#the values are then updated accordingly
for i in range(0,data.shape[0]):
if data.at[i, 'Days Out'] > 0 and data.at[i, 'Days Out'] <= 90:
if data.at[i,'State'] not in df.keys():
df[data.at[i,'State']] = 90*[0.0]
#using .at for the assignment avoids chained indexing (and the SettingWithCopyWarning it would trigger)
df.at[data.at[i,'Days Out']-1, data.at[i,'State']] = (df.at[data.at[i,'Days Out']-1, data.at[i,'State']] +
data.at[i,'Close Share Price'])
#now, getting the average for each day out
averages = 90*[0.0]
for i in range(1,91):
for k in df.keys():
if k != 'x':
averages[i-1] = averages[i-1] + df[k][i-1]
averages[i-1] = averages[i-1]/35 #dividing by 35 because there are 35 markets each day, at least up to 90 days out
#using a built-in color palette to differentiate
palette = plt.get_cmap('Set1')
#plotting lines for the individual markets
num=0
for column in df.drop('x', axis=1):
num+=1
plt.plot(df['x'], df[column], marker='', color=palette(num), linewidth=1.0, alpha=0.9, label=column)
if num == 7:
break
#plotting average line
plt.plot(df['x'], averages, marker='', color='black', linewidth=1.0, alpha=0.9, label="average")
#adding a legend
plt.legend(loc='best', bbox_to_anchor=(1.1, -0.2), ncol=5)
#adding titles
plt.title("Sum of All Shares in a Market over Days Out", fontsize=14, fontweight=0, color='black')
plt.xlabel("Days Out")
plt.ylabel("Sum of All Shares in a Market", fontsize=10)
#showing the graph
plt.show()
As we can see (which will be important for a decision shortly), the prices of all contracts in a market usually sum to a value greater than a dollar.
First, we'll start with a simple hypothesis - that, at least in the U.S. Senate 2020 markets, the contract price and the observed probability (meaning, the proportion of contracts that resolved to "Yes") are related. We can test this hypothesis by using the stats library to get both the p-value and the r-value of the relationship between contract price and observed probability in our dataset. The p-value tells us the likelihood of observing a relationship as strong as the one we see in the data if the two variables (in our case, contract price and observed probability) were completely unrelated; a very low p-value (0.05 is a conventional cutoff for "very low") tells us that the observed data is unlikely to have occurred if no relationship between the variables existed, and we can conclude that a relationship does exist between the two variables, and that what we are seeing is not just random noise. Meanwhile, the r-value tells us the strength of the relationship: the correlation between the two variables.
Though observed probability will be on the y-axis in the graph below, it is not really a dependent variable (as a variable on the y-axis usually is); in reality, both contract price and observed probability are dependent on other variables, such as the demographics of the state, the candidates' performance on the campaign trail, and early voting laws in the state, which all impact both gamblers' evaluation of the odds and the true odds of victory (which translate into the observed odds) of a given party. However, since one variable needs to go on the x-axis and one on the y-axis, I am placing the contract price on the x-axis because it will be easier to code up that way.
#importing the stats library, which will allow us to obtain the p and r-values
from scipy import stats
#price_to_proportion library: keys are prices, values will be lists containing two values - at index 0, the total number of
#resolutions to "Yes" for contracts of the price in the key, and at index 1, the total number of contracts of the price
#in the key; this will allow us to get the proportion which resolved to "Yes" later on
price_to_proportion = {}
#populating appropriately
for i in range(0,data.shape[0]):
if data.at[i, 'Days Out'] > 0:
price = data.at[i,'Close Share Price']
if price not in price_to_proportion:
price_to_proportion[price] = 2*[0]
if data.at[i, 'Winner'] == data.at[i, 'Contract Name']:
price_to_proportion[price][0] = price_to_proportion[price][0] + 1
price_to_proportion[price][1] = price_to_proportion[price][1] + 1
#lists which will be used to make the plot
#xs are the x-coordinates, which will go from 0.01 to 0.99
xs=[]
#ys are the y-coordinates, which will have the value of the proportion of contracts priced at the x-value
#which resolved to yes
ys=[]
#populating xs and ys appropriately
for k in price_to_proportion.keys():
xs.append(k)
ys.append(price_to_proportion[k][0]/price_to_proportion[k][1])
#creating the scatterplot
fig, ax = plt.subplots()
ax.scatter(xs, ys)
#regression line
slope, intercept, r_value, p_value, std_err = stats.linregress(xs,ys)
#x-values spanning the price range, for plotting the regression line
x = np.linspace(0, 1)
line = slope*x+intercept
#plotting the regression line
plt.plot(x, line, 'b', label='y={:.2f}x+{:.2f}'.format(slope,intercept))
#adding some text for labels, title, custom x-axis tick labels, etc.
plt.xlim(-0.01, 1.01)
plt.ylim(-0.01, 1.01)
ax.set_ylabel('Proportion Resolved to \"Yes\"')
ax.set_xlabel('Contract Price, in Dollars')
ax.set_title('Results versus Prices, pre-election')
blue_patch = mpatches.Patch(color='b', label='Regression Line')
plt.legend(handles=[blue_patch], loc = 'best', bbox_to_anchor=(1.1, -0.2))
plt.show()
#printing some stats for the analysis
print('Regression line: y = ' + str(slope) + 'x + ' + str(intercept)) #regression line
print('p-value = ' + str(p_value)) #p-value
print('r-value = ' + str(r_value)) #r-value
Regression line: y = 1.1230516912061947x + -0.06614136824221906 p-value = 1.2556881768033598e-38 r-value = 0.9089169655478043
As we can see from the p-value, the probability of seeing points like this if there were no relationship between contract price and observed (true) probability is well under 0.05, and thus we conclude that such a relationship does in fact exist. Further, the correlation between the two variables is very high, at about 0.91! So, there is a strong relationship between a contract's price and the observed probability of that contract resolving to "Yes".
Additionally, we can use this output to predict the probability of a contract resolving to "Yes", given its price, using the regression line: simply plug in the contract price as the x-value, and the y-value returned is the predicted probability of that contract resolving to "Yes". Of course, this isn't entirely accurate, as probabilities greater than 1 can be returned, but it is worth noting that, along the interval from 0.01 to 0.99, y never deviates from x by more than 0.07; meaning, for any given price, we would predict the probability of that contract resolving to yes to be within 0.07 of that price. We can also see that the difference between x and y on the line gets greater around the edges, and disappears near 0.50. Meaning, if PredictIt's price implies something has a very high chance of happening, it is generally even likelier than implied by the price, and if PredictIt's price implies something has a very low chance of happening, it is generally even less likely than implied by the price, at least according to this linear model. But as we will see, there are other factors we need to consider.
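For instance, under the fitted line above (and the assumption that a linear fit is appropriate here), we could turn a price into a predicted probability as sketched below, reusing the slope and intercept computed in the previous cell; the clipping to [0, 1] is my own addition, since the raw line can stray outside that range:

#predicting the probability of a contract resolving to "Yes" from its price, using the regression line above
def predicted_probability(price):
    return min(1.0, max(0.0, slope * price + intercept))

print(predicted_probability(0.75)) #a bit higher than 0.75, since the fitted slope is greater than 1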
Before testing any of our hypotheses relating to pricing accuracy, whether over time or over trading volume, it is important to establish a metric with which we can measure pricing accuracy. Fortunately, there are plenty of strong options that we could use. Unfortunately, the number of options is somewhat overwhelming, and which one to choose is unclear (though they would likely all work for our purposes). So, to help with the decision, I turned to my favorite stats gurus at FiveThirtyEight, as I knew that they had done an autopsy of their 2020 election models; I figured that whatever metric they used to evaluate the accuracy of their model's predictions there, I could use to evaluate the accuracy of PredictIt's "predictions" here. But the level of analysis in FiveThirtyEight's publicly-available analysis of their predictions' performance is closer to the level of the EDA above than to the level of rigorous hypothesis testing (I assume that they have done or will do this internally, though even if that's the case, we still don't have access to their metric). This is not helpful for, say, comparing performances across time, so I needed to look elsewhere for suggestions.
Luckily, an excellent stats blog came to the rescue; in fact, the same one: FiveThirtyEight. More specifically, their sports section. FiveThirtyEight allows readers to compete against their NFL forecasting model by assigning their own probabilistic predictions to games and comparing accuracy. Of course, in their case as much as in ours, properly comparing accuracy requires a metric, and there FiveThirtyEight settled on a system based on Brier scores. We will use this metric as well, and refer to a score given by this method as an FTE-score.
Here is how an FTE-score is calculated for a given market on a given day: take the closing implied probability of the Democratic contract, subtract the outcome (1 if the Democrats won the seat, 0 if they did not), square that difference, multiply it by 100, and subtract the result from 25. In other words, FTE-score = 25 - 100 × (implied probability - outcome)², so a perfect prediction earns 25 points and a maximally wrong one earns -75.
We use implied probability, and not price, because this method of scoring punishes overconfidence; since PredictIt's prices in a market more often than not sum to over $1.00 (see the EDA above), scoring the raw prices would penalize PredictIt for seeming overconfidence that is really just an artifact of how prices are set. Also, we only use the Democratic contracts in a given market because, since the implied probabilities of the two parties sum to 1.00, the scores on both sides simply mirror each other (this is easy to prove mathematically, but we won't do so here), so including both the Democratic and Republican contracts would unnecessarily double-count each market.
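As a small sketch, the scoring rule described above (and used in the code below) looks like this; the function name and example numbers are mine:

#FTE-score for one market on one day, using the Democratic contract's closing implied probability
def fte_score(implied_prob_dem, dem_won):
    #dem_won is 1 if the Democratic candidate won the seat, and 0 otherwise
    return 25 - 100 * (implied_prob_dem - dem_won)**2

print(fte_score(0.70, 1)) #about 16 - a fairly confident, correct prediction
print(fte_score(0.70, 0)) #about -24 - the same confidence, but wrong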
So our hypothesis structure now changes slightly from the previous one: we hypothesize that PredictIt's implied probability (not price) becomes more accurate as days out from the election decreases. We want to see in the plot below that, as election day gets closer, PredictIt's markets' average FTE-score, which measures the accuracy of the implied probabilities of the markets, rises.
#for reasons that will become clear later (when we reuse pred), the pred dictionary will work in the following manner:
#keys are a string comprised of the 538 District and Days Out; for each row, this will be unique
#values are an array where index 0 is the days out, index 1 is the closing implied probability, and index 2 is whether or
#not the contract resolved to yes (1 means resolved to yes, 0 means it did not)
pred = {}
#populating pred appropriately
for i in range(0, data.shape[0]):
if data.at[i, 'Days Out'] > 0 and data.at[i, 'Contract Name'] == 'Democratic':
if data.at[i, 'Contract Name'] == data.at[i, 'Winner']:
pred[str(data.at[i, '538 District']) + str(data.at[i, 'Days Out'])] = (
[data.at[i, 'Days Out'], data.at[i, 'Closing Implied Probability'], 1])
else:
pred[str(data.at[i, '538 District']) + str(data.at[i, 'Days Out'])] = (
[data.at[i, 'Days Out'], data.at[i, 'Closing Implied Probability'], 0])
scores = [] #contains the FTE-scores of each row
days_out = [] #days out corresponding to the score above
markets_on_day = {} #keys are days out, values are the total number of markets on that day
#populating the above lists and dictionary appropriately
for k in pred.keys():
days_out.append(pred[k][0])
if pred[k][0] not in markets_on_day:
markets_on_day[pred[k][0]] = 0
markets_on_day[pred[k][0]] += 1
sq_diff = (pred[k][1] - pred[k][2])**2
scores.append(25 - (100 * sq_diff))
#keys are days out, values are the average PredictIt score on that day
avg_scores = {}
#populating avg_scores appropriately
for i in range(0,len(days_out)):
if days_out[i] not in avg_scores:
avg_scores[days_out[i]] = 0
avg_scores[days_out[i]] = avg_scores[days_out[i]] + scores[i]/markets_on_day[days_out[i]]
#creating the coordinates for the scatterplot
xs = []
ys = []
#populating appropriately
for k in avg_scores.keys():
xs.append(k)
ys.append(avg_scores[k])
#creating the plot
fig, ax = plt.subplots()
ax.scatter(xs, ys, color = 'blue')
#adding a vertical black line on each day a market is added, and a vertical red line on each day that a market is removed
for k in markets_on_day.keys():
if k > 50:
if markets_on_day[k-1] > markets_on_day[k]:
plt.axvline(x=k-1, linewidth = 0.5, color = 'black')
if markets_on_day[k-1] < markets_on_day[k]:
plt.axvline(x=k-1, linewidth = 0.5, color = 'red')
#plotting the regression line
slope, intercept, r_value, p_value, std_err = stats.linregress(xs,ys)
x=np.linspace(min(xs), max(xs))
line = slope*x+intercept
plt.plot(x, line, 'blue', label='y={:.2f}x+{:.2f}'.format(slope,intercept))
#adding some text for labels, title, custom x-axis tick labels, etc.
plt.xlim(min(xs)-5, max(xs)+10)
plt.ylim(min(ys)-2, max(ys)+2)
ax.set_xlabel('Days Out')
ax.set_ylabel('Average PredictIt FTE-score')
ax.set_title('Average PredictIt FTE-score versus Days Out')
blue_patch = mpatches.Patch(color='blue', label='Regression line')
black_patch = mpatches.Patch(color='black', label='New market added')
plt.legend(handles=[blue_patch, black_patch], loc = 'best', bbox_to_anchor=(1.1, -0.2))
plt.show()
#printing some stats for the analysis
print('Regression line: y = ' + str(slope) + 'x + ' + str(intercept)) #regression line
print('p-value = ' + str(p_value)) #p value
Regression line: y = -0.011496526838507463x + 15.071951345398222
p-value = 2.8417495420322203e-19
As we can see above, our hypothesis has ample evidence behind it: taking as the null hypothesis that there is no relationship between days out and PredictIt's markets' average FTE-score, our p-value is orders of magnitude below 0.05, so we can reject the null hypothesis. (Strictly speaking, the null hypothesis we really need to reject is that the relationship is zero or positive, i.e. a one-sided test, but that is also easily done given how low the p-value is here, as sketched below; the same holds in the upcoming analyses.) So, we can reasonably conclude that PredictIt's implied odds become more accurate as election day draws closer.
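As a quick sketch of that one-sided adjustment (reusing the xs, ys, and stats import from the cell above): linregress reports a two-sided p-value for the null hypothesis that the slope is zero, and when the fitted slope is on the hypothesized side, the one-sided p-value is simply half of the two-sided one.
slope, intercept, r_value, p_value, std_err = stats.linregress(xs, ys)
#one-sided test of the null "slope >= 0" (i.e. accuracy does not improve as the election nears)
one_sided_p = p_value/2 if slope < 0 else 1 - p_value/2
print('one-sided p-value = ' + str(one_sided_p))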
However, a reader may wonder what, exactly, is happening with those spikes more than 100 days out. Why was PredictIt so much more volatile further out from the election? The likely explanation is the smaller number of Senate markets present on PredictIt at that point, which increased the day-by-day variance of the average score (a smaller sample leads to a higher variance of the average). As we can see, once all of the markets have been added (around 100 days out), the average FTE-score stabilizes; before that, a big swing in a single market's accuracy had an outsized influence on the average. So, it is worth checking whether the relationship between days out and average FTE-score still holds among only the points where all markets had already been added.
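A brief aside on why fewer markets means a noisier average: if each market's daily FTE-score were an independent draw with variance σ², then the variance of the average over n markets would be σ²/n, so an average over 10 markets bounces around with 3.5 times the variance of an average over all 35 (independence is an idealization here, since the markets surely move together to some extent, but the direction of the effect is the same). With that in mind, we can restrict to those days below.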
#creating the coordinates for the scatterplot
xs = []
ys = []
#populating appropriately
for k in avg_scores.keys():
if markets_on_day[k] == 35: #as there are 35 total markets, we use 35 here
xs.append(k)
ys.append(avg_scores[k])
#creating the plot
fig, ax = plt.subplots()
ax.scatter(xs, ys, color = 'blue')
#plotting the regression line
slope, intercept, r_value, p_value, std_err = stats.linregress(xs,ys)
x=np.linspace(min(xs), max(xs))
line = slope*x+intercept
plt.plot(x, line, 'blue', label='y={:.2f}x+{:.2f}'.format(slope,intercept))
#adding some text for labels, title, custom x-axis tick labels, etc.
plt.xlim(min(xs)-5, max(xs)+10)
plt.ylim(min(ys)-2, max(ys)+2)
ax.set_xlabel('Days Out')
ax.set_ylabel('Average PredictIt FTE-score')
ax.set_title('Average PredictIt FTE-score versus Days Out')
blue_patch = mpatches.Patch(color='blue', label='Regression line')
plt.legend(handles=[blue_patch], loc = 'best', bbox_to_anchor=(1.1, -0.2))
plt.show()
#printing some stats for the analysis
print('Regression line: y = ' + str(slope) + 'x + ' + str(intercept)) #regression line
print('p-value = ' + str(p_value)) #p value
Regression line: y = -0.01139972396687458x + 17.77907506096637
p-value = 8.569003440248834e-12
Sure enough, among only the points which average PredictIt's FTE-scores over all thirty-five 2020 U.S. Senate races, we still observe the average FTE-score rising as the election approaches, meaning that, at least by how we are measuring accuracy, the odds implied by PredictIt's prices become more accurate as election day draws nearer.
A friend of mine once mentioned that he had heard Nate Silver on the FiveThirtyEight politics podcast state that the polls aren't especially indicative of the election results until 20 days before the election. Whether or not this is true, his statement inspired the question: does PredictIt's accuracy actually improve before 20 days prior to the election, or is the regression line over the 100+ day period above simply being dragged upward by the points within 20 days of the election? We can see if accuracy really does improve closer to the election in the plots below, which separate out the plot above into two periods: one for points within 20 days of the election, and one for points more than 20 days out from the election.
#plot for close to the election
#creating the coordinates for the scatterplot
xs = []
ys = []
#populating appropriately
for k in avg_scores.keys():
if markets_on_day[k] == 35 and k <= 20: #taking points within 20 days, inclusively
xs.append(k)
ys.append(avg_scores[k])
#creating the plot
fig, ax = plt.subplots()
ax.scatter(xs, ys, color = 'blue')
#plotting the regression line
slope, intercept, r_value, p_value, std_err = stats.linregress(xs,ys)
x=np.linspace(min(xs), max(xs))
line = slope*x+intercept
plt.plot(x, line, 'blue', label='y={:.2f}x+{:.2f}'.format(slope,intercept))
#adding some text for labels, title, custom x-axis tick labels, etc.
plt.xlim(min(xs)-1, max(xs)+1)
plt.ylim(min(ys)-2, max(ys)+2)
ax.set_xlabel('Days Out')
ax.set_ylabel('Average PredictIt FTE-score')
ax.set_title('Average PredictIt FTE-score versus Days Out')
blue_patch = mpatches.Patch(color='blue', label='Regression line')
plt.legend(handles=[blue_patch], loc = 'best', bbox_to_anchor=(1.1, -0.2))
plt.show()
#printing some stats for the analysis
print('Regression line for the above plot: y = ' + str(slope) + 'x + ' + str(intercept)) #regression line
print('p-value for the above plot = ' + str(p_value)) #p value
#plot for farther from the election
#creating the coordinates for the scatterplot
xs = []
ys = []
#populating appropriately
for k in avg_scores.keys():
if markets_on_day[k] == 35 and k > 20: #this will get the points further than 20 days out and where all 35 markets exist
xs.append(k)
ys.append(avg_scores[k])
#creating the plot
fig, ax = plt.subplots()
ax.scatter(xs, ys, color = 'blue')
#plotting the regression line
slope, intercept, r_value, p_value, std_err = stats.linregress(xs,ys)
x=np.linspace(min(xs), max(xs))
line = slope*x+intercept
plt.plot(x, line, 'blue', label='y={:.2f}x+{:.2f}'.format(slope,intercept))
#adding some text for labels, title, custom x-axis tick labels, etc.
plt.xlim(min(xs)-1, max(xs)+1)
plt.ylim(min(ys)-2, max(ys)+2)
ax.set_xlabel('Days Out')
ax.set_ylabel('Average PredictIt FTE-score')
ax.set_title('Average PredictIt FTE-score versus Days Out')
blue_patch = mpatches.Patch(color='blue', label='Regression line')
plt.legend(handles=[blue_patch], loc = 'best', bbox_to_anchor=(1.1, -0.2))
plt.show()
#printing some stats for the analysis
print('Regression line for the above plot: y = ' + str(slope) + 'x + ' + str(intercept)) #regression line
print('p-value for the above plot = ' + str(p_value)) #p value
Regression line for the above plot: y = -0.11510913142861182x + 19.163977221197026
p-value for the above plot = 7.362587585689751e-11
Regression line for the above plot: y = -0.003327560941175766x + 17.210453283207254
p-value for the above plot = 0.011174489538679578
This first plot shows an expected result: in the 20 days leading up to the election, at least in our sample, PredictIt becomes more accurate, with a rising FTE-score in that timeframe. We see this in the very low p-value (less than 1 in 10 billion).
But, somewhat surprisingly, accuracy grows over time even further than 20 days out from the election. The second plot shows that, in the timeframe starting when all thirty-five markets were up and running until 21 days out from the election, the FTE-score still grows on average, albeit more slowly than in the timeframe from the first plot (by about 0.0033 points per day here versus about 0.1151 points per day there). And with a p-value near 0.01, we can be fairly confident in this result.
So, we have shown another cool result: for the 2020 Senate races, PredictIt became more accurate, on average, as election day approached, and this growth in accuracy sped up as the election got especially close. This makes intuitive sense, as uncertainty declines closer to the election - the debates have already happened, and we know who won them; scandalous stories are less likely to break with so little time left; new national crises and issues are less likely to emerge in a smaller timeframe - leading the markets to become more confident in their assessments, which raises the FTE-score (relative to weeks earlier) whenever those more confident predictions end up being correct. Occasionally, of course, a market becomes more confident in the wrong outcome, but, as we see above, this is more than offset by the occasions where a market becomes more confident in the correct outcome - at least, for the 2020 U.S. Senate races.
Before moving on to the next section, let's test one more hypothesis: namely, that a relationship exists between trade volume and accuracy. We can plot trade volume versus the average FTE-score of a market at that trade volume below.
#the pred2 dictionary will work in the following manner:
#keys are a string comprised of the 538 District and Days Out (as in pred); since we only keep Democratic contracts,
#this will be unique for each row we use
#values are an array: index 0 is the trade volume, index 1 is the closing implied probability, and index 2 is whether or
#not the contract resolved to yes (1 means resolved to yes, 0 means it did not)
pred2 = {}
#populating pred2 appropriately
for i in range(0, data.shape[0]):
if data.at[i, 'Days Out'] > 0 and data.at[i, 'Contract Name'] == 'Democratic':
if data.at[i, 'Contract Name'] == data.at[i, 'Winner']:
pred2[str(data.at[i, '538 District']) + str(data.at[i, 'Days Out'])] = (
[data.at[i, 'Trade Volume'], data.at[i, 'Closing Implied Probability'], 1])
else:
pred2[str(data.at[i, '538 District']) + str(data.at[i, 'Days Out'])] = (
[data.at[i, 'Trade Volume'], data.at[i, 'Closing Implied Probability'], 0])
scores = [] #contains the FTE-scores of each row
vols = [] #trade volumes corresponding to the score above
markets_at_vol = {} #keys are volumes, values are the total number of markets at that volume
#populating the above lists and dictionary appropriately
for k in pred2.keys():
vols.append(pred2[k][0])
if pred2[k][0] not in markets_at_vol:
markets_at_vol[pred2[k][0]] = 0
markets_at_vol[pred2[k][0]] += 1
sq_diff = (pred2[k][1] - pred2[k][2])**2
scores.append(25 - (100 * sq_diff))
#keys are trade volumes, values are the average PredictIt score at that trade volume
avg_scores = {}
#populating avg_scores appropriately
for i in range(0,len(vols)):
if vols[i] not in avg_scores:
avg_scores[vols[i]] = 0
avg_scores[vols[i]] = avg_scores[vols[i]] + scores[i]/markets_at_vol[vols[i]]
#creating the coordinates for the scatterplot
xs = []
ys = []
#populating appropriately
for k in avg_scores.keys():
xs.append(k)
ys.append(avg_scores[k])
#creating the plot
fig, ax = plt.subplots()
ax.scatter(xs, ys, color = 'blue')
#plotting the regression line
slope, intercept, r_value, p_value, std_err = stats.linregress(xs,ys)
x=np.linspace(min(xs), max(xs))
line = slope*x+intercept
plt.plot(x, line, 'blue', label='y={:.2f}x+{:.2f}'.format(slope,intercept))
#adding some text for labels, title, custom x-axis tick labels, etc.
plt.xlim(min(xs)-5, max(xs)+10)
plt.ylim(min(ys)-2, max(ys)+4)
ax.set_xlabel('Trade Volume')
ax.set_ylabel('Average PredictIt FTE-score')
ax.set_title('Average PredictIt FTE-score versus Trade Volume')
blue_patch = mpatches.Patch(color='blue', label='Regression line')
plt.legend(handles=[blue_patch], loc = 'best', bbox_to_anchor=(1.1, -0.2))
plt.show()
#printing some stats for the analysis
print('Regression line: y = ' + str(slope) + 'x + ' + str(intercept)) #regression line
print('p-value = ' + str(p_value)) #p value
Regression line: y = -2.15101058419969e-05x + 10.871882612978396
p-value = 0.7694872022901091
And, as we can see from the high p-value, results like these would be entirely plausible even if no relationship existed between FTE-score (accuracy) and trade volume; so, we cannot conclude that such a relationship exists.
I found analyzing the above data to be both a fun and an informative exercise. But, as the reader may have noticed, this analysis was only step one of a larger plan. Step two: comparing PredictIt's performance to FiveThirtyEight's. When I began working on this project, I was not planning on this comparison, but once I realized how simple it was to obtain FiveThirtyEight's data, I became very excited to compare the performance of the two. So, let's do it.
First, we need to load in FiveThirtyEight's data.
#loading in 538 data
ftedata = pandas.read_csv("election-forecasts-2020/senate_state_toplines_2020.csv")
ftedata.head()
cycle | branch | district | forecastdate | expression | name_D1 | name_D2 | name_D3 | name_D4 | name_I1 | ... | wonrunoff_R2 | lostrunoff_R2 | wonrunoff_R3 | lostrunoff_R3 | wonrunoff_R4 | lostrunoff_R4 | wonrunoff_I1 | lostrunoff_I1 | simulations | timestamp | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2020 | Senate | WY-S2 | 11/3/20 | _lite | Merav Ben-David | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40000 | 00:19:54 3 Nov 2020 |
1 | 2020 | Senate | WV-S2 | 11/3/20 | _lite | Paula Jean Swearengin | NaN | NaN | NaN | Franklin Riley | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40000 | 00:19:54 3 Nov 2020 |
2 | 2020 | Senate | VA-S2 | 11/3/20 | _lite | Mark Warner | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40000 | 00:19:54 3 Nov 2020 |
3 | 2020 | Senate | TX-S2 | 11/3/20 | _lite | M.J. Hegar | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40000 | 00:19:54 3 Nov 2020 |
4 | 2020 | Senate | TN-S2 | 11/3/20 | _lite | Marquita Bradshaw | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40000 | 00:19:54 3 Nov 2020 |
5 rows × 88 columns
As we can see, there are a lot of unnecessary columns, but by looking through the CSV file in Excel and planning ahead, we can decide now which columns are relevant and remove the unneeded ones. We only want to keep district, forecastdate (from what I can tell, the file only contains one recording per day, so the timestamp, which also tells us the time of day, is irrelevant; since we will be comparing FiveThirtyEight's predictions to PredictIt's closing prices, and we know each forecast update occurred before the end of its day, we can treat whatever odds are listed on the forecast date as that date's end-of-day odds), winner_Dparty (the probability the model gives to a Democratic victory), and winner_Rparty (the probability the model gives to a Republican victory).
#removing unnecessary columns
ftedata = ftedata[['district', 'winner_Dparty', 'winner_Rparty', 'forecastdate']]
ftedata.head()
district | winner_Dparty | winner_Rparty | forecastdate | |
---|---|---|---|---|
0 | WY-S2 | 0.008025 | 0.991975 | 11/3/20 |
1 | WV-S2 | 0.036900 | 0.963100 | 11/3/20 |
2 | VA-S2 | 0.993375 | 0.006625 | 11/3/20 |
3 | TX-S2 | 0.158875 | 0.841125 | 11/3/20 |
4 | TN-S2 | 0.018150 | 0.981850 | 11/3/20 |
Then, as we did with the PredictIt data, let's label each entry with how many days out it is from the election. We will simply subtract the forecast date from the actual election day (November 3, 2020) to get this value.
#labelling each entry with how far out it is from election day
far_out = []
for i in range(0, ftedata.shape[0]):
far_out.append((pandas.Timestamp('20201103') - pandas.Timestamp(ftedata.at[i,'forecastdate'])).days)
ftedata['Days Out'] = far_out
ftedata.head()
district | winner_Dparty | winner_Rparty | forecastdate | Days Out | |
---|---|---|---|---|---|
0 | WY-S2 | 0.008025 | 0.991975 | 11/3/20 | 0 |
1 | WV-S2 | 0.036900 | 0.963100 | 11/3/20 | 0 |
2 | VA-S2 | 0.993375 | 0.006625 | 11/3/20 | 0 |
3 | TX-S2 | 0.158875 | 0.841125 | 11/3/20 | 0 |
4 | TN-S2 | 0.018150 | 0.981850 | 11/3/20 | 0 |
Unlike what we did with the PredictIt data, we will not add a column here for implied two-party win probability, as the structure of the data allows this to be calculated easily as we iterate through: implied two-party Democratic win probability = winner_Dparty/(winner_Dparty + winner_Rparty).
Before doing anything, we must note that FiveThirtyEight and PredictIt can only be compared on common predictions: if a PredictIt market was open on a day before FiveThirtyEight published its forecast, or if FiveThirtyEight published a forecast for a race before the corresponding market opened on PredictIt, there is nothing to compare on those days. Fortunately, we can easily see that every FiveThirtyEight forecast was published after its corresponding market opened on PredictIt: recall from the plots above that all 2020 Senate prediction markets were open by 100 days out from the election, so we only need to check that the highest value of days out in the FiveThirtyEight dataset is less than 100:
max_days_out = 0
for i in range(0, ftedata.shape[0]):
if ftedata.at[i, 'Days Out'] > max_days_out:
max_days_out = ftedata.at[i, 'Days Out']
print("The highest value of days out in the FiveThirtyEight data is " + str(max_days_out))
The highest value of days out in the FiveThirtyEight data is 94
So, any prediction made by FiveThirtyEight has a corresponding prediction on PredictIt. Additionally, the corresponding prediction of a FiveThirtyEight prediction is made easy to find by the structure of the pred dictionary, created earlier.
That said, let's do our first bit of exploration: seeing how the implied Democratic odds of FiveThirtyEight compare to the implied Democratic odds of PredictIt in common markets, via both a scatterplot and a hard number.
#first, we will make an fte dictionary with a similar structure to the pred dictionary, enabling easy finding of
#corresponding predictions
#keys are a string comprised of the 538 District and Days Out; for each row, this will be unique
#values are the 538 implied Democratic win probability on that date
#the corresponding PredictIt prediction is the 1 index of the value at the same key in pred;
#the result is stored at the same key in pred, index 2
fte = {}
#populating fte accordingly
for i in range(0, ftedata.shape[0]):
if ftedata.at[i, 'Days Out'] > 0:
fte[str(ftedata.at[i, 'district']) + str(ftedata.at[i, 'Days Out'])] = ftedata.at[i,'winner_Dparty']/(
ftedata.at[i,'winner_Dparty'] + ftedata.at[i,'winner_Rparty'])
#x and y coords for the upcoming scatterplot; xs will be for 538 implied probabilities, ys will be for PredictIt
xs = []
ys = []
#for calculating the total PredictIt democratic bias, relative to 538
total_pred_dem_bias = 0.0
#populating accordingly
for k in fte.keys():
xs.append(fte[k])
ys.append(pred[k][1])
total_pred_dem_bias += pred[k][1] - fte[k]
#average = total/(number of samples)
average_pred_dem_bias = total_pred_dem_bias/len(fte.keys())
#creating the plot
fig, ax = plt.subplots()
ax.scatter(xs, ys)
x=np.linspace(min(xs), max(xs))
line = 1*x
plt.plot(x, line, 'black', label='y=x')
#labelling
plt.xlim(-0.01, 1.01)
plt.ylim(-0.01, 1.01)
ax.set_xlabel('538 Implied Dem. Win Prob.')
ax.set_ylabel('PredictIt Implied Dem. Win Prob.')
ax.set_title('538 probability versus PredictIt probability')
black_patch = mpatches.Patch(color='black', label='y=x line')
plt.legend(handles=[black_patch], loc = 'best', bbox_to_anchor=(1.1, -0.2))
plt.show()
print("Relative to FiveThirtyEight, PredictIt has an average Democratic bias of " + str(average_pred_dem_bias))
Relative to FiveThirtyEight, PredictIt has an average Democratic bias of -0.0016078394767594624
Points above the black line are points where PredictIt gave a higher Democratic win probability than FiveThirtyEight, while for points below the black line, the opposite is the case. If all points were above the line, it would indicate an overall PredictIt Democratic bias relative to FiveThirtyEight, whereas if all points were below the line, it would indicate an overall PredictIt Republican bias relative to FiveThirtyEight. Though this is not clear visually, it appears from the text output that PredictIt has a slight Republican bias relative to FiveThirtyEight. But we can test the significance of this properly in our hypothesis testing.
To possibly get an idea of relative accuracy, we can also visualize how the races turned out using a multicolored plot, similar to the one above.
#x and y coords for the upcoming scatterplot; xs will be for 538 implied probabilities, ys will be for PredictIt
xs = []
ys = []
#will label points blue for a Democratic victory, and red for a Republican victory
labels = []
#populating accordingly
for k in fte.keys():
xs.append(fte[k])
ys.append(pred[k][1])
if pred[k][2] == 1: #the Democrats won
labels.append('blue')
else:
labels.append('red')
#creating the plot
fig, ax = plt.subplots()
ax.scatter(xs, ys, color = labels, alpha = 0.75) #reducing opacity so that overlapping points can both be somewhat seen
x=np.linspace(min(xs), max(xs))
line = 1*x
plt.plot(x, line, 'black', label='y=x')
#labelling
plt.xlim(-0.01, 1.01)
plt.ylim(-0.01, 1.01)
ax.set_xlabel('538 Implied Dem. Win Prob.')
ax.set_ylabel('PredictIt Implied Dem. Win Prob.')
ax.set_title('538 probability versus PredictIt probability')
blue_patch = mpatches.Patch(color='black', label='y=x line')
plt.legend(handles=[blue_patch], loc = 'best', bbox_to_anchor=(1.1, -0.2))
plt.show()
From the above plot, it appears both FiveThirtyEight and PredictIt generally underweighted Republicans in the 2020 Senate races, as there are many more red points (indicating Republican victory) in cases where both FiveThirtyEight and PredictIt favored the Democrats than there are blue points in the inverse situation. This plot does not make relative accuracy clear, though it is worth noting that there are two red clusters near the middle of the plot which sit significantly above the line, implying FiveThirtyEight had been much more accurate than PredictIt in those cases. Further, it is worth noting that no candidate who had been given above a 90% win probability at any point in the election lost. We will use this fact in the accuracy comparison later.
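As a quick sanity check on that last claim, we can scan the pred and fte dictionaries for any prediction in which a party was given more than a 90% win probability and that party went on to lose (a sketch; the list should come back empty if the claim holds):
upsets = []
for k in fte.keys():
    #Democrat given over 90% by either source, but the Democrat lost
    if max(fte[k], pred[k][1]) > 0.90 and pred[k][2] == 0:
        upsets.append(k)
    #Republican given over 90% by either source (Democrat under 10%), but the Democrat won
    if min(fte[k], pred[k][1]) < 0.10 and pred[k][2] == 1:
        upsets.append(k)
print(len(upsets))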
As we saw earlier, though, at least with PredictIt, predictions become more accurate closer to the election; perhaps it is worth viewing a plot similar to the above with just points from the day before the election.
#x and y coords for the upcoming scatterplot; xs will be for 538 implied probabilities, ys will be for PredictIt
xs = []
ys = []
#will label points blue for a Democratic victory, and red for a Republican victory
labels = []
#populating accordingly
for k in fte.keys():
if pred[k][0] == 1:
xs.append(fte[k])
ys.append(pred[k][1])
if pred[k][2] == 1: #the Democrats won
labels.append('blue')
else:
labels.append('red')
#creating the plot
fig, ax = plt.subplots()
ax.scatter(xs, ys, color = labels)
x=np.linspace(min(xs), max(xs))
line = 1*x
plt.plot(x, line, 'black', label='y=x')
#labelling
plt.xlim(-0.01, 1.01)
plt.ylim(-0.01, 1.01)
ax.set_xlabel('538 Implied Dem. Win Prob.')
ax.set_ylabel('PredictIt Implied Dem. Win Prob.')
ax.set_title('538 probability versus PredictIt probability')
blue_patch = mpatches.Patch(color='black', label='y=x line')
plt.legend(handles=[blue_patch], loc = 'best', bbox_to_anchor=(1.1, -0.2))
plt.show()
On the day before the election, a PredictIt Republican bias relative to FiveThirtyEight appears more clearly, in the form of the large number of points below the y=x line (points where PredictIt gave the Democrat a lower win probability than FiveThirtyEight did). So, perhaps we should see if bias changes over time in the next section as well.
There are really two hypotheses we want to test: is PredictIt biased relative to FiveThirtyEight, and which of the two is more accurate?
First, we will test bias. We will plot PredictIt's bias relative to FiveThirtyEight against days out from the election, and see whether either the overall bias (indicated by the intercept; the "const" row in the upcoming regression table has the data for this) or the change in bias over time (indicated by the x-coefficient; the "x1" row in the upcoming regression table) is statistically significant.
#We will need this library later, as we now also need the p-value of the intercept
import statsmodels.api as sm
xs = range(1,95) #xs are days out, which we know range from 1 to 94 in fte
ys = 94 * [0] #ys will be the average PredictIt democratic bias on a given day; index i stores average bias for day i+1
#populating appropriately
for k in fte.keys():
dayout = pred[k][0]
ys[dayout-1] += (pred[k][1] - fte[k])/35
#to get the average, we need to divide everything by 35, as that is the total number of predictions on each day;
#so, add (predictit prob - fte prob)/35
#creating the plot
fig, ax = plt.subplots()
ax.scatter(xs, ys, color = 'red')
#regression line
slope, intercept, r_value, p_value, std_err = stats.linregress(xs,ys)
x=np.linspace(min(xs), max(xs))
line = slope*x+intercept
plt.plot(x, line, 'black', label='y={:.2f}x+{:.2f}'.format(slope,intercept))
#labelling
plt.xlim(min(xs)-1, max(xs)+5)
plt.ylim(min(ys)-0.001, max(ys)+0.001)
ax.set_xlabel('Days Out')
ax.set_ylabel('PredictIt Dem. Bias Relative to 538')
ax.set_title('PredictIt Dem. Bias Relative to 538, on Days Out')
black_patch = mpatches.Patch(color='black', label='Regression Line')
plt.legend(handles=[black_patch], loc = 'best', bbox_to_anchor=(1.1, -0.2))
plt.show()
print('Regression line: y = ' + str(slope) + 'x + ' + str(intercept))
#getting statistics
reshapedx = []
for i in range(0,94):
reshapedx.append([xs[i]])
reshapedx2 = sm.add_constant(reshapedx)
est = sm.OLS(ys,reshapedx2)
est = est.fit()
print(est.summary())
Regression line: y = 0.00020764315155322055x + -0.011470889175537435

OLS Regression Results
==============================================================================
Dep. Variable:            y                 R-squared:             0.149
Model:                    OLS               Adj. R-squared:        0.139
Method:                   Least Squares     F-statistic:           16.05
Date:                     Wed, 15 Dec 2021  Prob (F-statistic):    0.000125
Time:                     00:12:35          Log-Likelihood:        271.37
No. Observations:         94                AIC:                   -538.7
Df Residuals:             92                BIC:                   -533.7
Df Model:                 1
Covariance Type:          nonrobust
==============================================================================
          coef        std err     t         P>|t|     [0.025     0.975]
------------------------------------------------------------------------------
const     -0.0115     0.003       -4.046    0.000     -0.017     -0.006
x1        0.0002      5.18e-05    4.006     0.000     0.000      0.000
==============================================================================
Omnibus:                  5.113             Durbin-Watson:         0.074
Prob(Omnibus):            0.078             Jarque-Bera (JB):      2.687
Skew:                     0.151             Prob(JB):              0.261
Kurtosis:                 2.229             Cond. No.              110.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Curiously, PredictIt seems to be flipping back and forth in its bias relative to 538 until about a month out, when it begins to head sharply towards a Republican bias. Twenty days out, the bias becomes Republican for the first time since about thirty-six days out, and by the day before the election, PredictIt's Republican bias in the 2020 U.S. Senate markets averages around 3%. Further, as we can see in the table (P>|t| column), both p-values are well below 0.05; we can therefore conclude that both the overall bias (the intercept) and its trend over time (the slope) are statistically significant.
But what if, as with accuracy before, we broke bias down into two graphs: one for twenty days out and closer to the election, and one for more than twenty days out? We would almost certainly still see a trend near the election, but would there be any meaningful bias farther out? Let's see.
#creating the plot for 20 days out and closer
xs_close = xs[0:20]
ys_close = ys[0:20]
fig, ax = plt.subplots()
ax.scatter(xs_close, ys_close, color = 'red')
#regression line
slope, intercept, r_value, p_value, std_err = stats.linregress(xs_close,ys_close)
x=np.linspace(min(xs_close), max(xs_close))
line = slope*x+intercept
plt.plot(x, line, 'black', label='y={:.2f}x+{:.2f}'.format(slope,intercept))
#labelling
plt.xlim(min(xs_close)-1, max(xs_close)+5)
plt.ylim(min(ys_close)-0.001, max(ys_close)+0.001)
ax.set_xlabel('Days Out')
ax.set_ylabel('PredictIt Dem. Bias')
ax.set_title('PredictIt Dem. Bias Relative to 538, on Days Out, Near Election Day')
black_patch = mpatches.Patch(color='black', label='Regression Line')
plt.legend(handles=[black_patch], loc = 'best', bbox_to_anchor=(1.1, -0.2))
plt.show()
print('Regression line for near the election: y = ' + str(slope) + 'x + ' + str(intercept))
reshapedx_close = []
for i in range(0,20):
reshapedx_close.append([xs_close[i]])
reshapedx2_close = sm.add_constant(reshapedx_close)
est = sm.OLS(ys_close,reshapedx2_close)
est = est.fit()
print(est.summary())
#creating the plot for 21 days out and farther
xs_far = xs[20:]
ys_far = ys[20:]
fig, ax = plt.subplots()
ax.scatter(xs_far, ys_far, color = 'red')
#regression line
slope, intercept, r_value, p_value, std_err = stats.linregress(xs_far,ys_far)
x=np.linspace(min(xs_far), max(xs_far))
line = slope*x+intercept
plt.plot(x, line, 'black', label='y={:.2f}x+{:.2f}'.format(slope,intercept))
#labelling
plt.xlim(min(xs_far)-1, max(xs_far)+5)
plt.ylim(min(ys_far)-0.001, max(ys_far)+0.001)
ax.set_xlabel('Days Out')
ax.set_ylabel('PredictIt Dem. Bias')
ax.set_title('PredictIt Dem. Bias Relative to 538, on Days Out, Farther Out')
black_patch = mpatches.Patch(color='black', label='Regression Line')
plt.legend(handles=[black_patch], loc = 'best', bbox_to_anchor=(1.1, -0.2))
plt.show()
print('Regression line for farther from the election: y = ' + str(slope) + 'x + ' + str(intercept))
#getting statistics
reshapedx_far = []
for i in range(0,74):
reshapedx_far.append([xs_far[i]])
reshapedx2_far = sm.add_constant(reshapedx_far)
est = sm.OLS(ys_far,reshapedx2_far)
est = est.fit()
print(est.summary())
Regression line for near the election: y = 0.001663616365416215x + -0.040173981110693494

OLS Regression Results
==============================================================================
Dep. Variable:            y                 R-squared:             0.718
Model:                    OLS               Adj. R-squared:        0.703
Method:                   Least Squares     F-statistic:           45.87
Date:                     Wed, 15 Dec 2021  Prob (F-statistic):    2.41e-06
Time:                     00:12:42          Log-Likelihood:        73.910
No. Observations:         20                AIC:                   -143.8
Df Residuals:             18                BIC:                   -141.8
Df Model:                 1
Covariance Type:          nonrobust
==============================================================================
          coef        std err     t         P>|t|     [0.025     0.975]
------------------------------------------------------------------------------
const     -0.0402     0.003       -13.653   0.000     -0.046     -0.034
x1        0.0017      0.000       6.773     0.000     0.001      0.002
==============================================================================
Omnibus:                  1.382             Durbin-Watson:         0.574
Prob(Omnibus):            0.501             Jarque-Bera (JB):      1.217
Skew:                     -0.485            Prob(JB):              0.544
Kurtosis:                 2.280             Cond. No.              25.0
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Regression line for farther from the election: y = -0.00019453894353564514x + 0.015280357829800618

OLS Regression Results
==============================================================================
Dep. Variable:            y                 R-squared:             0.206
Model:                    OLS               Adj. R-squared:        0.194
Method:                   Least Squares     F-statistic:           18.63
Date:                     Wed, 15 Dec 2021  Prob (F-statistic):    4.99e-05
Time:                     00:12:42          Log-Likelihood:        250.74
No. Observations:         74                AIC:                   -497.5
Df Residuals:             72                BIC:                   -492.9
Df Model:                 1
Covariance Type:          nonrobust
==============================================================================
          coef        std err     t         P>|t|     [0.025     0.975]
------------------------------------------------------------------------------
const     0.0153      0.003       5.527     0.000     0.010      0.021
x1        -0.0002     4.51e-05    -4.316    0.000     -0.000     -0.000
==============================================================================
Omnibus:                  3.381             Durbin-Watson:         0.144
Prob(Omnibus):            0.184             Jarque-Bera (JB):      2.573
Skew:                     0.403             Prob(JB):              0.276
Kurtosis:                 3.430             Cond. No.              176.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
While the near-election output turned out as expected, the output farther out showed something seemingly crazy - a trend in the opposite direction! And the p-values in all cases are low enough to consider the results significant.
So, farther out from the election, PredictIt tends to show a slight Democratic bias relative to 538, while when the election gets close, PredictIt tends to show a heavy Republican bias relative to 538. We will break this down more in the conclusion.
Now, the moment I've been most excited for: let's compare PredictIt's accuracy to FiveThirtyEight's! Again, we will be using FTE-scores as our metric to measure accuracy; after all, if FiveThirtyEight is deemed to be less accurate than PredictIt by their own metric, surely they would not dispute the result! We will do this comparison over time, as we found earlier (at least in the case of PredictIt) that accuracy changes over time significantly.
xs = range(1,95) #Days out
ys_pred = 94*[0] #PredictIt average FTE-score on i+1 days out
ys_fte = 94*[0] #538 average FTE-score on i+1 days out
#populating the two sets of ys accordingly
for k in fte.keys():
pred_score = (pred[k][1] - pred[k][2])**2
fte_score = (fte[k] - pred[k][2])**2
ys_pred[pred[k][0]-1] += (25.0 - (100.0 * pred_score))/35 #dividing by 35 so it ends up being an average
ys_fte[pred[k][0]-1] += (25.0 - (100.0 * fte_score))/35 #dividing by 35 so it ends up being an average
#creating the plot
fig, ax = plt.subplots()
ax.scatter(xs, ys_pred, color = 'blue') #coloring PredictIt points blue
ax.scatter(xs, ys_fte, color = 'orange') #coloring 538 points orange
#labelling
plt.xlim(min(xs)-1, max(xs)+5)
plt.ylim(min(ys_pred)-2, max(ys_fte)+2)
ax.set_xlabel('Days Out')
ax.set_ylabel('Average FTE-Score')
ax.set_title('PredictIt Score versus 538 Score, on Days Out')
blue_patch = mpatches.Patch(color='blue', label='PredictIt scores')
orange_patch = mpatches.Patch(color='orange', label='538 scores')
plt.legend(handles=[blue_patch, orange_patch], loc = 'best', bbox_to_anchor=(1.1, -0.2))
plt.show()
This is intense! They seem to be running neck-and-neck; let's plot out the differences in scores and see if we can pull out anything statistically significant.
#we already have the xs from last time, and the ys are easy to create: ys[i] = ys_pred[i] - ys_fte[i]
ys = 94 * [0]
for i in range(0,94):
ys[i] = ys_pred[i] - ys_fte[i]
#creating the plot
fig, ax = plt.subplots()
ax.scatter(xs, ys, color = 'red')
#regression line
slope, intercept, r_value, p_value, std_err = stats.linregress(xs,ys)
x=np.linspace(min(xs), max(xs))
line = slope*x+intercept
plt.plot(x, line, 'black', label='y={:.2f}x+{:.2f}'.format(slope,intercept))
#labelling
plt.xlim(min(xs)-1, max(xs)+5)
plt.ylim(min(ys)-0.1, max(ys)+0.1)
ax.set_xlabel('Days out')
ax.set_ylabel('PredictIt Score - 538 Score')
ax.set_title('PredictIt Score versus 538 Score, on Days Out')
black_patch = mpatches.Patch(color='black', label='Regression line')
plt.legend(handles=[black_patch], loc = 'best', bbox_to_anchor=(1.1, -0.2))
plt.show()
#getting statistics
reshapedx = []
for i in range(0,94):
reshapedx.append([xs[i]])
reshapedx2 = sm.add_constant(reshapedx)
est = sm.OLS(ys,reshapedx2)
est = est.fit()
print(est.summary())
OLS Regression Results
==============================================================================
Dep. Variable:            y                 R-squared:             0.038
Model:                    OLS               Adj. R-squared:        0.027
Method:                   Least Squares     F-statistic:           3.590
Date:                     Wed, 15 Dec 2021  Prob (F-statistic):    0.0613
Time:                     00:15:33          Log-Likelihood:        -20.878
No. Observations:         94                AIC:                   45.76
Df Residuals:             92                BIC:                   50.84
Df Model:                 1
Covariance Type:          nonrobust
==============================================================================
          coef        std err     t         P>|t|     [0.025     0.975]
------------------------------------------------------------------------------
const     0.1051      0.064       1.656     0.101     -0.021     0.231
x1        -0.0022     0.001       -1.895    0.061     -0.005     0.000
==============================================================================
Omnibus:                  1.133             Durbin-Watson:         0.488
Prob(Omnibus):            0.567             Jarque-Bera (JB):      1.215
Skew:                     -0.239            Prob(JB):              0.545
Kurtosis:                 2.713             Cond. No.              110.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Neither the constant nor the x-coefficient has a low enough p-value to deem either of them statistically significant; they are 0.101 and 0.061 respectively, and both are greater than 0.05. So, we cannot conclude whether FiveThirtyEight or PredictIt is more accurate overall in a statistically significant way over this dataset, though we can view accuracy winners on individual days using the plots above.
It may also be worth examining who was more accurate in only close races, which I'll define as races in which neither FiveThirtyEight nor PredictIt ever gave either party a 90% chance or higher of winning (in the code below, we identify the non-close races using FiveThirtyEight's implied probabilities); as we saw earlier, in any race which we would not call "close", the favored party's candidate won. Let's examine this in the scatterplot below.
#first, we need to figure out which races are not close (based on FiveThirtyEight's implied probabilities),
#so we can exclude them when constructing our ys;
#note that the first two characters of the fte key (the state abbreviation) identify the race uniquely in all cases
#except Georgia, where two elections were occurring. However, this will not pose any issues, as neither party in
#either of the two Georgia races was given over a 90% chance of victory at any point within 94 days of the election
#by either model
#creating a dictionary of races to exclude; keys will be the two-character state abbreviation, and values will be True;
#if a state is not present, then it should not be excluded
to_exclude = {}
for k in fte.keys():
if fte[k] <= 0.10 or fte[k] >= 0.90:
to_exclude[k[0:2]] = True
to_exclude
xs = range(1,95) #Days out
ys_pred = 94*[0] #PredictIt average FTE-score on i+1 days out
ys_fte = 94*[0] #538 average FTE-score on i+1 days out
#populating the two sets of ys accordingly
for k in fte.keys():
if k[0:2] not in to_exclude:
pred_score = (pred[k][1] - pred[k][2])**2
fte_score = (fte[k] - pred[k][2])**2
ys_pred[pred[k][0]-1] += (25.0 - (100.0 * pred_score))/(35-len(to_exclude.keys())) #dividing by no. of predictions
ys_fte[pred[k][0]-1] += (25.0 - (100.0 * fte_score))/(35-len(to_exclude.keys())) #dividing by number of predictions
#creating the plot
fig, ax = plt.subplots()
ax.scatter(xs, ys_pred, color = 'blue') #coloring PredictIt points blue
ax.scatter(xs, ys_fte, color = 'orange') #coloring 538 points orange
#labelling
plt.xlim(min(xs)-1, max(xs)+5)
plt.ylim(min(ys_pred)-2, max(ys_fte)+2)
ax.set_xlabel('Days Out')
ax.set_ylabel('Average FTE-Score')
ax.set_title('PredictIt Score versus 538 Score in Close Races, on Days Out')
blue_patch = mpatches.Patch(color='blue', label='PredictIt scores')
orange_patch = mpatches.Patch(color='orange', label='538 scores')
plt.legend(handles=[blue_patch, orange_patch], loc = 'best', bbox_to_anchor=(1.1, -0.2))
plt.show()
Here, it seems we have more consistent PredictIt outperformance of FiveThirtyEight; as above, we can test this by building a linear regression model on the differences.
#we already have the xs from last time, and the ys are easy to create: ys[i] = ys_pred[i] - ys_fte[i]
ys = 94 * [0]
for i in range(0,94):
ys[i] = ys_pred[i] - ys_fte[i]
#creating the plot
fig, ax = plt.subplots()
ax.scatter(xs, ys, color = 'red')
#regression line
slope, intercept, r_value, p_value, std_err = stats.linregress(xs,ys)
x=np.linspace(min(xs), max(xs))
line = slope*x+intercept
plt.plot(x, line, 'black', label='y={:.2f}x+{:.2f}'.format(slope,intercept))
#labelling
plt.xlim(min(xs)-1, max(xs)+5)
plt.ylim(min(ys)-0.1, max(ys)+0.1)
ax.set_xlabel('Days out')
ax.set_ylabel('PredictIt Score - 538 Score')
ax.set_title('PredictIt Score versus 538 Score in Close Races, on Days Out')
black_patch = mpatches.Patch(color='black', label='Regression line')
plt.legend(handles=[black_patch], loc = 'best', bbox_to_anchor=(1.1, -0.2))
plt.show()
#getting statistics
reshapedx = []
for i in range(0,94):
reshapedx.append([xs[i]])
reshapedx2 = sm.add_constant(reshapedx)
est = sm.OLS(ys,reshapedx2)
est = est.fit()
print(est.summary())
OLS Regression Results
==============================================================================
Dep. Variable:            y                 R-squared:             0.052
Model:                    OLS               Adj. R-squared:        0.042
Method:                   Least Squares     F-statistic:           5.033
Date:                     Wed, 15 Dec 2021  Prob (F-statistic):    0.0273
Time:                     00:18:30          Log-Likelihood:        -108.95
No. Observations:         94                AIC:                   221.9
Df Residuals:             92                BIC:                   227.0
Df Model:                 1
Covariance Type:          nonrobust
==============================================================================
          coef        std err     t         P>|t|     [0.025     0.975]
------------------------------------------------------------------------------
const     0.9999      0.162       6.169     0.000     0.678      1.322
x1        -0.0066     0.003       -2.243    0.027     -0.013     -0.001
==============================================================================
Omnibus:                  0.575             Durbin-Watson:         0.471
Prob(Omnibus):            0.750             Jarque-Bera (JB):      0.707
Skew:                     -0.159            Prob(JB):              0.702
Kurtosis:                 2.719             Cond. No.              110.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Here, we do have statistically significant results, as seen in the low p-values! So, in close races (as defined), PredictIt not only outperforms FiveThirtyEight on average over this window (the fitted difference stays positive even 94 days out), but its edge also grows as election day comes closer (a positive intercept and a negative coefficient on days out)! And, this is all measured using a metric FiveThirtyEight itself uses!
Though we can't tell whether PredictIt or FiveThirtyEight is more accurate in predictive power over all 2020 U.S. Senate races, we could still try to see whose predictions "mean more". As in, we can answer the question - what gives us more information as to who the winner of a race will actually be: knowing FiveThirtyEight's prediction for a race, or knowing PredictIt's? Fortunately, we have the perfect tool to answer this question: decision trees.
A decision tree allows us to classify inputs by following its branches using the attributes of our input. An example of how this may work: we may have a dataset of dogs, humans, and cats, with two attributes: #legs and #whiskers. The first node on the tree may say: if #legs is less than 3, follow the left branch; otherwise, follow the right branch. The left branch leads to a human classification, while the right branch leads to another node, which could say: if #whiskers is 0, follow the left branch; otherwise, follow the right branch. The left branch here leads to a dog classification, while the right branch leads to a cat classification. An example of how this plays out: suppose we have data on a middle-aged man, with 2 legs and 4 whiskers (he hasn't been good about shaving). We look at the root node, which tells us to follow the left branch, which (correctly) classifies this input as human!
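Here is a minimal sketch of that toy example in scikit-learn (the data is invented purely for illustration; each input is [#legs, #whiskers]):
from sklearn import tree
#toy training data: three humans (one unshaven), two dogs (whisker-less, for the sake of the example), two cats
X_toy = [[2, 0], [2, 0], [2, 4], [4, 0], [4, 0], [4, 12], [4, 10]]
y_toy = ['human', 'human', 'human', 'dog', 'dog', 'cat', 'cat']
toy_clf = tree.DecisionTreeClassifier(criterion="entropy", random_state=0)
toy_clf.fit(X_toy, y_toy)
#the middle-aged man with 2 legs and 4 whiskers is (correctly) classified as human
print(toy_clf.predict([[2, 4]]))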
A decision tree, at least when using "entropy" as the criterion in sklearn, is built using the following algorithm: look over the whole training data (which I will explain shortly) to find the attribute which, when split on, reduces uncertainty the most, and split on that attribute. Repeat for the resulting nodes until either certainty is reached, in which case the node classifies inputs as the class of all of the objects it represents, or we hit the max_depth, in which case the node classifies inputs as the class of the majority of the objects it represents.
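To make "reduces uncertainty the most" concrete, here is a small sketch of the entropy calculation behind that criterion, applied to the toy animal data from above (numpy is assumed to be imported as np, as elsewhere in this notebook):
def entropy(labels):
    #Shannon entropy (in bits) of a list of class labels
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()
def information_gain(parent_labels, child_label_groups):
    #reduction in entropy from splitting the parent node into the given child nodes
    n = len(parent_labels)
    weighted = sum(len(child)/n * entropy(child) for child in child_label_groups)
    return entropy(parent_labels) - weighted
parent = ['human', 'human', 'human', 'dog', 'dog', 'cat', 'cat']
#splitting on "#legs < 3" separates the humans from the dogs and cats...
print(information_gain(parent, [['human']*3, ['dog', 'dog', 'cat', 'cat']])) #about 0.99 bits
#...which beats, for example, a split on "#whiskers <= 7" that only separates out the cats
print(information_gain(parent, [['human']*3 + ['dog', 'dog'], ['cat', 'cat']])) #about 0.86 bits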
Now, we'll try to build a decision tree. However, we want an optimal max_depth parameter so that we do not overfit to our data (I recognize other parameters can be tweaked, but I concluded in an earlier project that this is the most significant one). To find the optimal max depth of the tree, we will use holdout validation, where a random sample is taken from our dataset to train the model (about 70% of the data), and the remainder of the data is used to test the model's classification accuracy. We then choose the max_depth which maximizes the proportion of the test data correctly classified.
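As an aside, scikit-learn has a one-line shortcut for exactly this kind of random 70/30 holdout split. The cell below performs the split manually with random.random(), but a sketch using the library helper (building the inputs from the pred and fte dictionaries, as the cell below does) would look like this:
from sklearn.model_selection import train_test_split
#inputs: [days out, PredictIt implied Dem. probability, 538 implied Dem. probability]; outputs: 1 if the Democrat won
X = [[pred[k][0], pred[k][1], fte[k]] for k in fte.keys()]
y = [pred[k][2] for k in fte.keys()]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)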
Note that the attribute in the root node is what the algorithm considered to be the most important attribute to split on. So, let's create a few decision trees using this method, and see if the algorithm consistently chooses one attribute to split on at the root. We will use three attributes: FiveThirtyEight implied Democratic odds, PredictIt implied Democratic odds, and days out from the election (as this was found to significantly influence accuracy). If the trees consistently split on FiveThirtyEight's implied Democratic odds at the root, then perhaps the model considers that the more informative attribute for predicting the outcome; if they consistently split on PredictIt's implied Democratic odds, then perhaps the reverse is true.
#importing necessary modules
import random
from sklearn import tree
#we will create 5 trees randomly
for tree_num in range(1,6):
#we will use the lists below to store the testing and training inputs and outputs
X_train = [] #training inputs
X_test = [] #testing inputs
y_train = [] #training outputs
y_test = [] #testing true outputs; we will check if these match those produced by the model
#populating testing and training inputs and outputs accordingly; each input has a 70% chance of being used for training,
#and a 30% chance of being used for testing
#for inputs, we use a list containing days out, PredictIt's implied Dem odds, and 538's implied Dem. odds, in that order
#for outputs, we use a list containing the actual outcome
for k in fte.keys():
if random.random() < 0.7:
X_train.append([pred[k][0],pred[k][1],fte[k]])
y_train.append(pred[k][2])
else:
X_test.append([pred[k][0],pred[k][1],fte[k]])
y_test.append([pred[k][2]])
#fitting a decision tree to the training data with a max_depth of 1
clf = tree.DecisionTreeClassifier(criterion = "entropy", max_depth=1)
clf.fit(X_train, y_train)
#this list contains the classifications that the decision tree would make for the test data
predictions = clf.predict(X_test)
#correct will be assigned the number of classifications that the decision tree made correctly in the test data
correct = 0
for i in range(0,len(y_test)):
if predictions[i] == y_test[i]:
#if the prediction matches the true classification, we increment correct
correct += 1
#the proportion correct is the number of correct classifications over the total number of classifications attempted
proportion = correct/len(y_test)
#we are now going to use a while loop to find the optimal value for max_depth
#n will be the max_depth value we are attempting in the current iteration of the while loop, and as soon as
#incrementing n leads to a less accurate tree, we stop the loop
n = 1
#the proportion that the loop got correct on the previous iteration; will be stored at the start of each iteration
prev_proportion = 0
#the tree that the loop created on the previous iteration; will be stored at the start of each iteration
prev_tree = clf
#we exit the loop when max_depth=n leads to a worse-performing tree than max_depth=n-1; the performance of the tree
#with max_depth=n is stored in "proportion", while the performance of the tree with max_depth=n-1 is stored in
#"prev_proportion"
while proportion >= prev_proportion:
n = n+1
#storing the previous proportion
prev_proportion = proportion
#storing the previous tree
prev_tree = clf
#fitting a tree with max_depth=n
clf = tree.DecisionTreeClassifier(criterion = "entropy", max_depth=n)
clf.fit(X_train, y_train)
#this list contains the classifications that the decision tree would make for the test data
predictions = clf.predict(X_test)
#correct will be assigned the number of classifications that the decision tree made correctly in the test data
correct = 0
for i in range(0,len(y_test)):
if predictions[i] == y_test[i]:
#if the prediction matches the true classification, we increment correct
correct += 1
#the proportion correct is the number of correct classifications over the total number of classifications attempted
proportion = correct/len(y_test)
#when this point in the code has been reached, the decision tree for max_length=n-1 performed the best; this tree
#is prev_tree, so we display this tree
fig = plt.figure(figsize=(25,20))
_ = tree.plot_tree(prev_tree,
feature_names=['Days Out', 'PredictIt prob', 'FiveThirtyEight prob'],
class_names=['0','1'],
filled=True)
#the proportion correct for prev_tree was prev_proportion, and we can print this
print("Tree " + str(tree_num) + " classified " + str(int(100 * prev_proportion)) + "% of the test data correctly.")
Tree 1 classified 97% of the test data correctly.
Tree 2 classified 94% of the test data correctly.
Tree 3 classified 94% of the test data correctly.
Tree 4 classified 95% of the test data correctly.
Tree 5 classified 95% of the test data correctly.
In the trees above, we can see that the root attribute used is sometimes FiveThirtyEight's probability, and sometimes PredictIt's probability; it really just depends on which portion of the data is used for training and which is used for testing. So, no meaningful conclusion can be drawn here, either. Though it is worth noting: in creating these decision trees, I realized that the choice of root attribute does not necessarily indicate which trait is truly more important. In the above data, we see in most of the trees that the first attribute is generally used to classify the most Democratic markets, which makes sense: very rarely does a market in which PredictIt or FiveThirtyEight had given the Democrat over 75% odds go to the Republican, so the tree uses a number around there to classify points as certainly Democratic. However, classifying toss-ups, for which one could argue that knowing whether to trust PredictIt or FiveThirtyEight predictions matters more, occurs in the middle of the tree and requires a much more nuanced splitting than what occurs at the root, as we can see in the trees above. And in that part of the tree, both PredictIt and FiveThirtyEight predictions are heavily used (days out is sometimes used as well, so it isn't irrelevant in prediction).
Also of note: using this dataset to train a decision tree to classify PredictIt U.S. Senate election markets in general (let alone simply PredictIt U.S. election markets more broadly) may be flawed, as this entire dataset is likely skewed. But we will dive into this more in our closing section.
A few points from throughout our analyses and hypothesis testing stand out as noteworthy:
- PredictIt's implied probabilities for the 2020 U.S. Senate races became more accurate (higher average FTE-scores) as election day approached, and this held even when restricting to the period more than 20 days out.
- We found no statistically significant relationship between a market's trade volume and its accuracy.
- Relative to FiveThirtyEight, PredictIt showed a slight Democratic bias farther out from the election and a clear Republican bias in the final weeks.
- Over all thirty-five races, we could not conclude that either PredictIt or FiveThirtyEight was more accurate; in close races, however, PredictIt significantly outperformed FiveThirtyEight, and its edge grew as the election approached.
But now, we need to discuss our skewed sample, which may have played a role in our observations. In the 2020 general election, Democratic margins on average underperformed polls by nearly 4%, meaning polling was skewed towards Democrats, and the actual results were more Republican than the polling predicted. So, there may be a cause-effect relationship between PredictIt's Republican bias relative to FiveThirtyEight and PredictIt's increased accuracy relative to FiveThirtyEight: PredictIt likely beat out FiveThirtyEight in accuracy simply because it was more bullish on Republicans in the 2020 U.S. Senate elections, and Republicans outperformed expectations more frequently than Democrats. This would render our accuracy conclusions a by-product of our bias conclusion: PredictIt's markets only outperform FiveThirtyEight's forecasts in this dataset because those markets are more biased towards Republicans.
Which raises the question: is this (ultimately correct) bias over the 2020 U.S. Senate sample representative of PredictIt's predictive power, or did PredictIt simply get lucky in this dataset? The answer is not clear, and we would need to see more data - perhaps PredictIt has a general bias towards Republicans relative to FiveThirtyEight's forecast, regardless of the year, and while in years like 2016 and 2020 this would cause PredictIt to appear more accurate, in years like 2018, where Democrats slightly overperformed polls, this bias would hurt PredictIt's accuracy relative to FiveThirtyEight. Or, perhaps in years like 2016 and 2020, PredictIt shows an (accurate) Republican bias relative to FiveThirtyEight's heavily polls-based forecasting, while in years like 2018, PredictIt shows an (accurate) Democratic bias relative to that forecasting. Whether the former or the latter is the case, we cannot know, as we do not have the data to test these hypotheses.
In conclusion: over this dataset, PredictIt's prices and implied probabilities appear strongly correlated with actual, observed outcomes. And, as election day approaches, PredictIt's implied probabilities become more accurate; PredictIt even appears more accurate than FiveThirtyEight, a reputable forecaster! But when it comes to generalizing these conclusions to other markets in PredictIt, we simply don't have the data to do so.
I would like to again thank PredictIt for providing me with their 2020 U.S. Senate election data. This analysis was a pleasure to perform, and I hope I have the opportunity to perform analyses like this on more of their data in the future.
Additionally, I would like to thank my professor, John Dickerson, for his guidance throughout this process. I feel confident in saying that this project wouldn't have come to fruition without his advice at a few crucial steps in the process.
Finally, I want to thank my parents, as they had to put up with me talking about FiveThirtyEight, PredictIt, and polling nonstop from ages 15-18.