The answer to this question comes down to two things:
- our algorithm
- the frequency of data collection
Our SmartFactor algorithm takes into account the level of detail in the data we’re able to collect for each state. In the case of the NY Lottery, we collect data two ways: by scraping it from the lottery website every day AND by submitting a freedom of information request on a regular basis. Neither of these data sources includes the Est % of tickets sold. But this is where the FREQUENCY of data collection becomes important.
The frequency of data collection allows us to gather and store the game data on a regular schedule in our database. We do not just look at the game data at one point in time. We collect it, store it, and use it over time. We’ve been doing this since 2011. The more data we collect over time, the more confidence we have in our estimated % of tickets sold calculation.
We start by scraping the games from the website every day. This tells us when new games are released. We then request the detailed prizes remaining through the freedom of information request (FOIA).
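As a rough illustration of how a daily scrape can reveal new releases, the sketch below compares the game numbers seen today with those already stored. The function name and the game numbers are hypothetical and are not taken from our actual pipeline.

```python
def find_new_games(previous_ids, todays_ids):
    """Return game IDs present in today's scrape but not in earlier ones.

    Both arguments are iterables of game identifiers (hypothetical values);
    the result is sorted so the output is stable from run to run.
    """
    return sorted(set(todays_ids) - set(previous_ids))


# Hypothetical example: game "1510" appears for the first time today.
stored = {"1501", "1502"}
scraped_today = {"1501", "1502", "1510"}
new_games = find_new_games(stored, scraped_today)
```

Any game flagged this way would then be included in the next detailed prize request.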
The FOIA report the NY Lottery sends out provides the total prizes paid and the total prizes unpaid for ALL prize levels for all games. With this information, our algorithm works backward, using each game’s overall odds to calculate the total number of tickets printed. At that point, we have everything we need to determine the estimated % of tickets sold as of the date of that FOIA report.
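To make the "work backward" idea concrete, here is a minimal sketch, not the actual SmartFactor code. It assumes that overall odds of "1 in X" mean roughly one prize for every X tickets printed, and that prizes are claimed in proportion to tickets sold. The class, function, and sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PrizeLevel:
    prize_amount: float  # face value of the prize at this level
    paid: int            # prizes already claimed at this level
    unpaid: int          # prizes still outstanding at this level


def estimate_pct_sold(levels, overall_odds):
    """Return (tickets_printed, estimated_pct_sold) for one game.

    overall_odds is the X in "1 in X". Assumption: total prizes times X
    approximates the print run, and the share of prizes already paid
    approximates the share of tickets already sold.
    """
    total_paid = sum(level.paid for level in levels)
    total_prizes = total_paid + sum(level.unpaid for level in levels)
    if total_prizes == 0:
        raise ValueError("no prizes reported for this game")
    tickets_printed = total_prizes * overall_odds
    est_tickets_sold = total_paid * overall_odds
    return tickets_printed, 100 * est_tickets_sold / tickets_printed


# Hypothetical game with two prize levels and overall odds of 1 in 4.
levels = [PrizeLevel(5.0, 6000, 4000), PrizeLevel(100.0, 300, 200)]
printed, pct_sold = estimate_pct_sold(levels, overall_odds=4.0)
```

In this made-up example there are 10,500 prizes in total, so about 42,000 tickets were printed, and with 6,300 prizes paid the estimate comes out at 60% sold. Under these simplifying assumptions the odds cancel out of the percentage itself; they matter for recovering the print run.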
We’ve invested a lot of time, energy and experience to automate this process and perform it every time we collect the data. By doing this over and over again, and tracking all of this data over time, we’ve become very accurate and confident in our Estimated % sold for the NY Lottery.
However, to be completely honest, the NY Lottery is one of our toughest states for processing lottery data. Most states publish everything we need on their website. The NY Lottery does not. As a result, our NY analysis cannot be 100% automated: the FOIA step requires manual intervention in our otherwise automated process. Because of this manual step, we have a team of data stewards who focus on reviewing and validating the NY Lottery data we collect every week.
Once in a while, we’ll repost a NY report when we catch an error. This has happened 3 times over the last 2 years, so we are not perfect. Even with our occasional ‘reposts’, we know we have the very best process and detailed data for NY Scratch-off games that anyone is going to get.