Challenges & Insights: Generating a Price Guide of Trading Card Values

A Couple of Hard Truths on Estimates

I’ll be the first to say it: despite pouring thousands of hours into this project, some (many!) card prices totally evade me. Recently a Wayne Gretzky rookie card sold for $3.75 million. My model pegged it at $175,000. So, I missed by a margin of roughly 20:1. Ouch!

So, a moment of self-reflection is merited:

First off, this is an evolving project. I am only three months into data collection (using my new methodology…). So, I can cast about for some excuses there: lots of cards just don’t have enough observed price points yet.

Still, a margin of 20:1 is pretty brutal, even with serious data limitations. So what else is going on?

Well, secondly, I have to say there is a major challenge in estimating card prices: there seem to be two distinct “data generating processes” at work, with a very ill-defined line separating the cards in one set from those in the other.

Very loosely speaking, there are some cards that have everything going for them: for the vintage cards, you might think of the uber-high-grade cards (from PSA and Beckett) of the top players in any given sport, like Gretzky or Howe, Michael Jordan, Mickey Mantle, Jerry Rice or Joe Montana. It is my belief that these cards obey exponential laws… tiny upticks in a card’s rarity blow up the value like crazy. It feels as if every “big dollar” collector wants these cards and will spend what it takes to get them. A PSA 10 isn’t just worth 10% or 50% more than a PSA 9. It might be worth 1000% more.

Then there are the cards of players who are extraordinary but fall just short of being superheroes. Let’s get real… these guys and girls are amazing… they represent the top 0.001% of sports talent in the world, but they aren’t quite in the top 0.0000001%. And, because of that, their cards go up incrementally in value. A linear relationship might exist between their rookie card as a PSA 9 vs. a PSA 10: maybe one is worth $100 and the other $150. More, but not crazily so. Perhaps the RPA of a 30 goal scorer in hockey is worth 50% more than that of a 20 goal scorer, which, of course, feels right… a player who produces 50% more should be represented by a card worth 50% more. (And yet this relation falls apart when we are talking about 50 goal scorers. Their cards might be worth 500% that of the 30 goal scorer’s.)
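To make the contrast concrete, here is a toy sketch of the two regimes. The base prices and growth rates are illustrative guesses, not fitted values from my model:

```python
import math

def linear_price(base: float, grade: float, step: float = 0.5) -> float:
    """Star-but-not-superhero regime: each grade step adds a fixed
    fraction of the base price."""
    return base * (1 + step * (grade - 9))

def exponential_price(base: float, grade: float, k: float = 2.4) -> float:
    """Superstar regime: each grade step multiplies the price by e**k,
    so tiny upticks in rarity blow up the value."""
    return base * math.exp(k * (grade - 9))

# Star regime: a $100 PSA 9 implies a PSA 10 around $150.
print(linear_price(100, 10))               # 150.0
# Superstar regime: the same $100 PSA 9 implies a PSA 10 near 11x.
print(round(exponential_price(100, 10)))   # 1102
```

The point of the sketch is that a single functional form can't serve both regimes: any `k` large enough to capture the superstar blow-up badly overprices everyone else.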

What’s to be Done?

So… the problem having been stated… and a plausible theory explaining the puzzle having been presented… what can be done?

Well, first off, anyone doing data science for purposes of prediction is going to have to make trade-offs between data-driven estimation and parametric modeling.

Letting data drive your model can be great, especially because the estimates you produce will be accurate — they are, after all, pretty much just an average of the prices you observed for that particular card… BUT this method only lets you predict for what you’ve already got. Obviously, that sorta sucks: if two cards are highly similar, why not use data from one to inform predictions of the other?
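As a minimal sketch of the purely data-driven approach (with made-up card IDs and sale prices), the estimate is just a per-card average, and any card with no observed sales simply has no estimate:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records: (card_id, observed_price)
sales = [
    ("1979-opc-gretzky-psa8", 55000.0),
    ("1979-opc-gretzky-psa8", 62000.0),
    ("1986-fleer-jordan-psa8", 21000.0),
]

def data_driven_estimate(sales):
    """Estimate each card's value as the mean of its observed sale prices.
    Accurate for cards we've seen; silent for everything else."""
    by_card = defaultdict(list)
    for card_id, price in sales:
        by_card[card_id].append(price)
    return {card_id: mean(prices) for card_id, prices in by_card.items()}

estimates = data_driven_estimate(sales)
print(estimates["1979-opc-gretzky-psa8"])      # 58500.0
print(estimates.get("1966-topps-orr-psa9"))    # None -- no sales, no estimate
```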

Letting theory drive your model can be great, especially because it lets you predict the value of cards that you have no data for, but only if your theory (or theories) really are the primary ones driving results. Moreover, if you need multiple theories to explain what is going on with some subsets of cards but not others, then your scope conditions need to be well-defined (which is one of the major challenges here).
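One common parametric route — not necessarily the one my model uses — is a hedonic regression: pool all sales, regress log price on card attributes, and use the fitted coefficients to price cards you've never observed. The features and prices below are invented for illustration:

```python
import numpy as np

# Hypothetical theory: log(price) ~ intercept + b1*grade + b2*is_rookie.
# Pooling data across cards lets similar cards inform each other.
X = np.array([
    # grade, is_rookie
    [8, 1],
    [9, 1],
    [8, 0],
    [9, 0],
])
y = np.log(np.array([400.0, 600.0, 120.0, 180.0]))  # made-up sale prices

A = np.column_stack([np.ones(len(X)), X])           # add intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(grade: float, is_rookie: int) -> float:
    """Predict a price even for a card with no observed sales."""
    return float(np.exp(coef @ np.array([1.0, grade, is_rookie])))

# Extrapolate to an unseen PSA 10 rookie -- only as good as the theory:
print(round(predict(10, 1)))   # 900
```

The extrapolation is the whole appeal, and the whole risk: if the true process for some subset of cards is exponential rather than log-linear in these features, the prediction fails exactly where the big money is.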

So, in my case, I may have to sacrifice range for accuracy — to focus more on what I do have rather than extrapolation. As a theoretical sort of guy, this isn’t wholly satisfying, but the game’s not over…

IF I can think through how to parameterize my model to accurately divide cards between the exponential and linear data generating processes… THEN I am the playoff team that was down 3 games to 1 but has suddenly tied the series back up.
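That two-regime idea could be parameterized as a simple switching model: a gate decides which data generating process a card belongs to, and a different price function applies in each. The thresholds and rates here are placeholders for illustration, not estimates from my data:

```python
def classify_regime(player_percentile: float, grade: float) -> str:
    """Toy gate: only the very top tier of players, in ultra-high grade,
    falls into the exponential regime; everyone else prices linearly.
    Both cutoffs are illustrative guesses."""
    if player_percentile >= 0.9999 and grade >= 9.5:
        return "exponential"
    return "linear"

def estimate(base: float, grade: float, player_percentile: float) -> float:
    """Route the card to the price function for its regime."""
    if classify_regime(player_percentile, grade) == "exponential":
        return base * 10 ** (grade - 9)     # 10x per full grade step
    return base * (1 + 0.5 * (grade - 9))   # modest linear steps

print(estimate(100, 10, 0.99999))  # superstar PSA 10: exponential regime
print(estimate(100, 10, 0.999))    # star PSA 10: linear regime
```

The hard part, of course, is not writing the gate but learning where it sits — the ill-defined line between the two sets is exactly what the data has to pin down.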

Some of this I have already done. Some I am working on. Some is yet to be realized. But, overall, I feel pretty good about where we’re headed.

In future posts, I’ll get into some of the gritty details of what’s driving my current models and how I intend to evolve them.