Building the Data & Predictive Models

Generating the Data: Setting Quality Control Criteria for Web-Scraping

The core data in this set are eBay auction prices from a select set of consigners, chosen so as to control all the random crap that would otherwise flood my numbers with noise. In other words, by restricting which sellers are eligible for the dataset, we can control (somewhat) the variability in sales prices that is due not to the card itself but to differences across sellers in terms of their: shipping costs; general sketchiness (e.g., feedback ratings, clarity of writing, formatting choices); level of detail in listings; quality of photos; etc.

I pick a select list of sellers, then web-scrape data from their eBay sales history. I collect data about the card’s set, subset, player, team, rookie status, and whether it is autographed and/or contains memorabilia. Sellers must start auctions at low prices (rather than operate, essentially, as a shop by listing the starting price as the price they want to get), and their shipping and handling policies must be consistent. What someone is willing to pay for the card should be well represented by the final bid price. It should not be a reflection of other stuff, like variability in shipping prices.
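As a rough illustration of the two seller criteria above (low starting prices, consistent shipping), here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Listing` fields, the function names, and both thresholds (`MAX_START_PRICE`, `MAX_SHIPPING_SPREAD`) are hypothetical and are not the values or code actually used to build the dataset. It also skips the scraping step and starts from listings already in hand.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    seller: str
    start_price: float  # opening bid, in dollars
    final_price: float  # winning bid, in dollars
    shipping: float     # shipping and handling charged, in dollars


# Hypothetical thresholds, chosen only for illustration.
MAX_START_PRICE = 1.00      # auctions must open at or below this
MAX_SHIPPING_SPREAD = 0.50  # max gap between a seller's highest and lowest shipping charge


def eligible_sellers(listings, approved):
    """Return the approved sellers whose listings meet both criteria."""
    by_seller = {}
    for item in listings:
        if item.seller in approved:
            by_seller.setdefault(item.seller, []).append(item)

    keep = set()
    for seller, items in by_seller.items():
        # Criterion 1: every auction starts low (true auction, not a shop price).
        low_start = all(i.start_price <= MAX_START_PRICE for i in items)
        # Criterion 2: shipping charges barely vary across the seller's listings.
        charges = [i.shipping for i in items]
        consistent = max(charges) - min(charges) <= MAX_SHIPPING_SPREAD
        if low_start and consistent:
            keep.add(seller)
    return keep


def filter_listings(listings, approved):
    """Keep only the listings that belong to eligible sellers."""
    keep = eligible_sellers(listings, approved)
    return [i for i in listings if i.seller in keep]


# Made-up sample data: seller_b prices like a shop, seller_c has uneven
# shipping, and seller_d is not on the approved list.
sample = [
    Listing("seller_a", 0.99, 12.50, 3.00),
    Listing("seller_a", 0.99, 40.00, 3.00),
    Listing("seller_b", 25.00, 25.00, 3.00),
    Listing("seller_c", 0.99, 8.00, 2.00),
    Listing("seller_c", 0.99, 9.00, 6.00),
    Listing("seller_d", 0.99, 5.00, 3.00),
]
approved = {"seller_a", "seller_b", "seller_c"}
clean = filter_listings(sample, approved)
```

Filtering at the seller level, rather than listing by listing, mirrors the idea in the text: a seller is either in the dataset or out of it.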

That, then, explains what data I am collecting and the minimum quality standards I enforce for data to be collected. On another page, you can check out all the components that go into building my models to predict trading card values. But whether you look into that or not, the idea of this page is simple: take a look at what cards actually sell for if you are going to jump into prediction. Don’t take people’s asking prices as a good guide. Also, in the above listing of criteria, I have probably exposed certain biases and belief systems about how value can be determined. I am theorizing. Don’t take my values for granted. Be critical; judge them. Ask if they are sensible relative to your own experience and/or research. A great place to kick off some research is by searching for your cards of interest on eBay. Use the “sold” filter to see if my numbers map onto what you can find.
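If you want to put a number on that comparison, one simple option is a median absolute percentage error between predicted values and the sold prices you find yourself. The sketch below is only an illustration: the function name, the dictionary layout, and the choice of the median as the summary are assumptions made for the example, not part of my models.

```python
from statistics import median


def median_abs_pct_error(predicted, sold):
    """Compare predicted values with observed sold prices.

    predicted: dict mapping a card label to a predicted price
    sold:      dict mapping a card label to a list of observed sold prices
    Returns the median absolute percentage error across cards that have
    at least one sold price, or None if no cards overlap.
    """
    errors = []
    for card, guess in predicted.items():
        prices = sold.get(card)
        if not prices:
            continue  # no sold listings found for this card
        actual = median(prices)  # typical sold price for the card
        errors.append(abs(guess - actual) / actual)
    return median(errors) if errors else None
```

You would call it with your own hand-collected numbers, for example `median_abs_pct_error({"card_x": 10.0}, {"card_x": [8.0, 10.0, 12.0]})`. A smaller result means the predictions sit closer to what the cards actually sold for.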

In short, I recommend you don’t believe me blindly when I argue my predictions are best. Judge for yourself. Do my predictions match up to reality? You decide.