The workhorse of my project looks complex at first glance, but is — I swear — really quite simple. The key idea is this: even though I am collecting a lot of data on market prices, there is no way I can find data for every single sports card to ever exist… let alone find multiple datapoints for every card to ever exist — which is really what would be required if not for a trick up my sleeve (an obvious trick, granted). If we were simply stating the market price of individual cards based on observed data of those exact cards, then we would need to collect lots of data (an impossible amount of data, I figure) to be confident that all the card prices we’ve observed aren’t too high or too low due to random flukes — a botched auction, a typo in the auction’s title, a quiet week on eBay, etc.
So… what is to be done? Well, I would suggest a statistical model where we say, “hey, cards that are pretty alike are probably worth similar amounts…” and, of course, in a very much opposite vein, “and if they are pretty different, we probably can’t learn too much about one’s value from the other!”
And here you might say, “well, duh!” Fair enough. But now let’s step it up with some specifics via the jargony, pretentious math equation below. I am using what is called a linear regression model with fixed effects in order to build predictions of value for each card in my dataset of trading cards. Sometimes a card’s value is mostly predicted by its own sales data. Sometimes its value is mostly predicted by the value of other cards that are most similar to it… This method, then, should allow us to predict the value of cards for which we have not directly observed market data.
Do keep in mind, the model’s estimates will only be as good as the data which feeds it. So, I do recommend checking out my other blog post in this section concerning how I collect data…. Crucially, I stick to auction prices, rather than asking prices (which is to say, what something is worth is what someone is actually willing to pay for it), and I have strict criteria about which sellers enter the database… In any case, I digress. Jumping back, here is what the model looks like in statistics speak (albeit, dumbed-down for my own sake):
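(A sketch only; the Greek letters are just placeholder labels, and the interaction terms shown are the main ones I describe below:)

$$
\text{Value}_i \;=\; \alpha_{\text{Year}} + \alpha_{\text{Player}} + \alpha_{\text{Series}} + \alpha_{\text{Subset}} + \beta_1(\text{PRV} \times \text{Subset}) + \beta_2(\text{PRV} \times \text{Rookie} \times \text{Auto} \times \text{Patch}) + \beta_3\,\text{Jersey} + \beta_4\,\text{OtherMem} + \varepsilon_i
$$

(where each interaction term also brings along its individual pieces on their own).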
And, in case you’re into the statistical programming side of things, this is what the model, running in R, looks like:
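(A simplified sketch using base R’s lm(); the column names and file names here are placeholders for whatever the fields are actually called in my database:)

```r
# A simplified sketch of the model (column and file names are placeholders
# for the actual fields in my database).
cards <- read.csv("card_sales.csv")   # one row per observed auction sale

model <- lm(
  value ~ factor(year) +                      # fixed effect: production year
          factor(player) +                    # fixed effect: player on the card
          factor(series) +                    # fixed effect: card series
          factor(subset) * prv +              # subset fixed effect, interacted with PRV
          prv * rookie * autograph * patch +  # PRV interacted with rookie/auto/patch flags
          jersey + other_memorabilia,         # memorabilia indicators
  data = cards
)

# Predicted values for every card in the catalogue, including cards with no
# observed sales (as long as their year/player/series/subset show up somewhere
# in the sales data).
catalogue <- read.csv("card_catalogue.csv")
catalogue$predicted_value <- predict(model, newdata = catalogue)
```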
Alright then…. So…. In basic English, what is all that saying?
Well, pretty much this:
A hockey card’s market value is proportionate to some combination of: (1) the year the card was produced; (2) the player featured on the card; (3) the series and subset to which the card belongs; (4) the player’s relative “value-added” to a card (PRV), vs. other players; (5) whether the card features (i) rookie status, (ii) an autograph, or (iii) a patch from a jersey affixed to the card; (6) jersey memorabilia affixed to the card; and (7) any other memorabilia affixed to the card (ill-defined).

Moreover, note the *, which stands for “interaction.” In other words, the model considers the effect of every possible combination of the values being interacted. So, some subsets may create a lot of extra value for players with a high PRV, whereas others might not generate any special extra value; as well, I interact PRV with the already crazy interaction of Rookie, Patch and Autograph, since various combinations of these might generate extra special value, separate from measuring each of them alone. So… an awesome player like Gretzky might make a card worth a little extra (or any other high-end player, who’d be represented with a very high PRV)… and an autograph might make a card worth a little extra… but together, I’d bet that’s worth a lot extra. It’s the interaction that’s really big here, not each element working in isolation.
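To make that concrete, here is a toy calculation in R with completely made-up numbers (these are not estimates from my data; they just show why the interaction term does the heavy lifting):

```r
# Made-up coefficients, purely to illustrate the logic of an interaction term.
base        <- 5     # a run-of-the-mill card
prv_bump    <- 40    # bump for a very high-PRV player (think Gretzky)
auto_bump   <- 25    # bump for an autograph
interaction <- 150   # EXTRA bump only when high PRV and autograph co-occur

base + prv_bump                            # star player, no autograph:  45
base + auto_bump                           # autograph, ordinary player: 30
base + prv_bump + auto_bump + interaction  # star player AND autograph: 220
```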
And, so, that’s the basic idea. The model is not dogma; it is bound to be tinkered with as the data reveal new findings about what matters. But that’s where we stand for now.
Further updates to come.
Best,
SCV