Data and bias

nicoleandmaggie

Aug 3

This tweet recently made the rounds on Twitter:

Economists have spent so much time studying how monetary incentives influence behavior, they have become immune to those very same influences. Homo economicus? That's y'all, not me. pic.twitter.com/TI8OmR6v5r

— Jathan Sadowski (@jathansadowski) July 19, 2022

Justin Wolfers has since deleted his defense.

But... here's my 2 cents as someone who isn't bringing in over half a million per year in salary from the University of Michigan (Justin Wolfers and Betsey Stevenson are state employees, so their salary info is available online):

1. 100K is a lot, and if you don't think it's a lot, there's a problem. To speak in terms that the top 2% can understand, that's a whole new personal assistant.

2. The motivation of 100K is not really as big a deal as getting stunning data. The data by themselves are incentive enough not to bite the hand that feeds the researcher (in this case Uber). And the Uber data are stunning. They've helped us learn a lot about human behavior and contingent labor markets, and probably lots of other stuff that's more industrial organization.

Does that mean that you can't trust anything that comes out of the Uber data, or any other study where the company has generously provided data?

No.

But it does mean that you need to think really hard about the studies that do come out of the data (and also about the studies that don't come out).

Ask yourself:

Does the company (or in some cases, government agency) benefit from the study results? If not, then it's probably ok.

There are plenty of amazing studies using the Uber data that tell us about the type of employee who uses the contingent labor market and what their preferences are. Uber has nothing to gain from this information and no reason to suppress it. The studies are orthogonal to any influence that Uber might be exerting (purposefully or not) on grateful researchers. These results are probably trustworthy; that is, they can be evaluated on the merits of their own internal validity.

If the company would have cause to benefit from the results, then you might be more cautious. Not that a good economist would purposefully fudge data or results. They don't need to. With any research project there are a lot of decisions that need to be made about specifications and samples and data cleaning. Researchers just have to unconsciously feel grateful to the company to bias themselves with these choices, particularly if they don't have a pre-analysis plan. (And even if they do have a pre-analysis plan, they might still choose what they unconsciously think will benefit, or at least not hurt, the company.)

On top of that, there's selection bias in the choice of research question. Even excellent economists will choose to just not go places that might make the company look bad when said company has provided data.

Similarly, negative results can be suppressed by the data provider. I know of a case where the US government, after providing one of my colleagues with data, suppressed his research findings because they made the agency look bad (though they did allow someone else to publish the same negative findings later under a new, less fascist, government regime). Any time that clearance is required to share results, that can be a problem.
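To see how suppression alone can skew the public record, here is a toy simulation in Python. It is a sketch under made-up assumptions, not a model of any real study: the true effect is zero, each study's estimate is pure noise around it, and a hypothetical data provider only clears estimates that make it look good (here, anything above zero). The function name, the clearance rule, and all the numbers are invented for illustration.

```python
import random
import statistics


def simulate_published_mean(n_studies=2000, true_effect=0.0, noise_sd=1.0, seed=0):
    """Compare the mean of all study estimates with the mean of 'cleared' ones.

    Hypothetical setup: every study is an honest, noisy estimate of the same
    true effect. No researcher fudges anything.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable

    # Each study's estimate = true effect + random noise.
    estimates = [rng.gauss(true_effect, noise_sd) for _ in range(n_studies)]

    # Assumed clearance rule: only estimates flattering to the data
    # provider (greater than zero) are released to the public.
    published = [e for e in estimates if e > 0]

    return statistics.mean(estimates), statistics.mean(published)


all_mean, published_mean = simulate_published_mean()
print(f"mean of all estimates:       {all_mean:.3f}")
print(f"mean of published estimates: {published_mean:.3f}")
```

Even though the average across all the studies sits near zero, the average of the cleared studies is clearly positive. Nobody fudged anything; the filter did all the work.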

To sum up:

Data provision alone is enough to bias research results. If a company provides data, then results that show the company in a positive light will be shown to the public and results that show the company in a negative light will not. Results that don't affect the company one way or the other are probably fine and can be evaluated on their own merits.

There's a lot to be said for data that come from legal requirements (e.g., FOIA), from third parties, or from internet leaks.

It is important to know who provided the data, not just who provided the funding, when doing disclosures.
