(Revise & Resubmit)
Big data are increasingly used to make predictions about the value of uncertain investments, thereby helping firms identify innovation opportunities without the need for domain knowledge. This trend has raised questions about which firms will primarily benefit from the availability of these data-driven predictions. Contrary to existing research suggesting that data-driven predictions level the playing field for firms lacking domain knowledge, I argue, using a simple theoretical framework, that these predictions reinforce the competitive advantage of firms with domain knowledge. In high-stakes contexts like innovation, where returns are skewed and only a few leads can be pursued, domain knowledge helps evaluate predictions and avoid false positives. I test this idea using novel data on the pharmaceutical industry, exploiting the features of genome-wide association studies (GWAS) that provide data-driven predictions about new drug targets. The results show that GWAS stimulate corporate investments in innovation, yet around one-third of these efforts are misallocated toward false positive predictions. Companies lacking domain knowledge react more strongly but are disproportionately likely to fall into the trap of false positives. By contrast, domain knowledge helps firms pursue fewer alternatives that are more likely to be the best opportunities. Together, the results show that even if data-driven predictions are valuable in innovation, domain knowledge remains a crucial source of competitive advantage in the age of big data technologies.
[Working Paper] [Will Mitchell Dissertation Research Grant] [INFORMS/Organization Science Dissertation Proposal Competition]
(2nd Revise & Resubmit)
How does big data change discovery? Datasets covering broad portions of a scientific landscape enable a data-driven approach to search, uncovering findings whose underlying mechanisms may be unclear. This shift has raised concerns that decoupling discovery from theoretical understanding lowers innovation quality by prioritizing incremental ideas or false positive signals. Even when successful, data-driven search could weaken incentives to develop theory and leave the consequences of new discoveries poorly understood. I examine these issues in human genetics, where genome-wide association studies (GWAS) enable a data-driven search for the genetic roots of disease. Compared with traditional theory-based approaches, GWAS expand the genetic landscape examined, increase outcome variability with a proportionally larger increase in breakthroughs, and stimulate follow-on work aimed at clarifying causal mechanisms. An instrumental variable strategy exploiting a technology-driven decline in the cost of GWAS supports a causal interpretation of these results. Mechanism tests show that these effects arise because data-driven search surfaces more empirical anomalies: valuable discoveries that depart from theoretical expectations and redirect subsequent research toward new theorizing. Together, the results suggest that big data technologies can fuel virtuous cycles of knowledge accumulation by increasing the frequency of findings that challenge existing theories.
With Cecil-Francis Brenninkmeijer, Arul Murugan, and Abhishek Nagaraj
(2nd Revise & Resubmit)
Large Language Models (LLMs) are becoming useful tools for research, yet their potential for strategy remains underexplored. We show how LLMs can be used as synthetic subjects to study strategic interactions. We introduce a framework for designing and running simulated experiments with LLM-powered agents. We argue that this approach is useful for rapid, low-cost prototyping of human experiments and for generating novel hypotheses. We apply the framework to the exploration-exploitation dilemma and show that LLM-based experiments reproduce patterns observed among human participants. We then vary parameters and boundary conditions to illustrate how the same setup can support design iteration and generate hypotheses about when and why established results change. We conclude by discussing the promise and limitations of AI agents as “model organisms” for strategy.
With Abhishek Nagaraj
(Revise & Resubmit)
This study examines the impact of access to confidential administrative data on the rate, direction, and policy relevance of economics research. To do so, we exploit the progressive geographic expansion of the U.S. Census Bureau's Federal Statistical Research Data Centers (FSRDCs). FSRDCs boost data diffusion, help empirical researchers publish more articles in top outlets, and increase citation-weighted publications. Beyond direct data usage, spillovers to non-adopters also drive this effect. Further, citations to exposed researchers in policy documents increase significantly. Our findings underscore the importance of data access for scientific progress and evidence-based policy formulation.
[Working Paper] [NBER Working Paper] [Sloan Grant] [Tweetstorm summary]
With Christian Fons-Rosen and Lee Fleming
(Revise & Resubmit)
Does corporate authorship increase deceptive conduct in science? Existing research suggests that firms’ commercial stakes in the findings they publish increase the risks of misconduct. Yet when firms rely on basic science in their downstream development, misleading results can compromise costly innovation efforts, giving firms stronger incentives to get the science right. Corporate participation in basic research may therefore reduce deceptive conduct. We examine this argument in Alzheimer’s preclinical research, where inappropriate image alterations provide an objective marker of deceptive conduct. Using an AI-based detection tool validated through manual review, we scan the entire field and document a rising trend of data issues in published science. Problematic cases are nearly absent in corporate-authored papers and less frequent in academic-industry collaborations, but only when firms exercise meaningful control over the research process. The results are strongest when research is less verifiable or closer to commercialization. Taken together, our results identify a tension in the division of innovative labor, as separating research from development may weaken incentives to produce science reliable enough for downstream use.
With Johannes Hoelzemann, Gustavo Manso, and Abhishek Nagaraj
(Submitted)
We study exploration under uncertainty and show how access to data on past attempts can paradoxically hinder breakthrough discovery. We develop a model of the “streetlight effect” demonstrating that when data highlights attractive but ultimately suboptimal projects, it can narrow exploration and suppress innovation. In a laboratory experiment, we find that revealing the value of an enticing project lowers payoffs and reduces breakthrough discoveries. This drop stems from increased free-riding behavior, which crowds out the generation of new data. We then apply our theory in the context of scientific research into the genetic origins of human diseases, focusing on the drivers of limited exploration. To identify the causal impact of past data, we use an instrumental variable that leverages exogenous genetic overlaps between humans and laboratory mice, which reduce research costs for specific genes and lead to prioritized data collection about them. We find that diseases with early evidence of promising genetic targets are 16 percentage points less likely to yield breakthroughs than those where early efforts failed. While competition attenuates the streetlight effect, it does not eliminate it. Our paper provides the first analysis of this phenomenon, outlining the conditions under which data leads agents to look under the lamppost rather than engage in socially beneficial exploration.
With Michael Sockin and Richard Lowery
(Submitted)
We develop a model of how a principal motivates innovation when researchers generate ideas, but effort is unobservable. The principal relies on career incentives tied to peer recognition, measured through citations. Because citations rise with the number of researchers working on a topic, they create coordination incentives that can distort effort. Researchers may over-coordinate in crowded areas, producing “academic bubbles” with little prospect of advancing knowledge. We test the model in research on the genetic determinants of human disease and show that crowding inflates citation impact, with patterns suggesting that career concerns misallocate scientific effort in topic selection.
With Dan Schliesmann
Learning from failure is central to innovation. When experimentation is costly, firms often generalize from failed projects to evaluate related but untested opportunities. We argue that this seemingly efficient strategy can systematically misdirect innovation search even when feedback from one project is informative about nearby alternatives. Because firms tend to test promising approaches, failures occur disproportionately in regions of the innovation landscape where nearby alternatives are also likely valuable. Generalizing from those failures, therefore, mostly screens out promising opportunities. We test this mechanism in pharmaceutical R&D using data on patenting, clinical trial failures, and the biological relatedness of drug targets. Following a failure, firms reduce their investment not only in the focal target but also in related targets, especially high-potential ones. We find that the resulting increase in false negatives exceeds the decline in false positives, worsening the allocation of innovative effort. These effects are more pronounced in smooth landscapes, where local correlation makes generalization both more useful and more costly. Taken together, our findings identify a mechanism through which learning from sparse experiments can redirect search away from valuable opportunities.
With Abhishek Nagaraj
With Bikash Kumar Panda and Charlie Guthmann
With Enrico Berkes and Matthew Lee Chen
Research Policy, 2026
The study of innovation depends heavily on high-quality patent data. Yet, datasets containing complete patent documents focus only on recent decades, while historical patent datasets with broader temporal coverage typically lack detailed information. Therefore, our ability to leverage advances in textual analysis to study long-run innovation dynamics remains limited. To address this gap, we introduce a large-scale dataset of the universe of technical specifications of British patents granted between 1617 and 1899. Our data consists of the full specification texts alongside linked information about inventors, including their disambiguated names, occupations, and addresses. We use our data to document changes over time in total inventive activity, the geography of innovation, inventor occupations, and patent novelty and impact. Finally, we discuss use cases and avenues for subsequent research.
With Fernando Stipanicic and Abhishek Nagaraj
Harvard Data Science Review, 2025
Microdata from government agencies is believed to be valuable for economics research, and yet access to this data is highly restricted due to concerns about privacy and security. We provide an empirical assessment of the use and impact of restricted-access data that researchers can analyze at the U.S. Census Bureau's secure facilities. Our findings show that the use of the Census Bureau's confidential data is growing and that the publications employing it have a higher impact on the scientific and policy debate. However, adoption remains largely limited to established researchers from prestigious institutions. Our results and discussion inform the design of policies that balance privacy protection with accessibility to confidential microdata.
With Alessandro Nuvolari and Valentina Tartari
Explorations in Economic History, 82, 101419, 2021 [Lead article]
Winner, Bernardo Nobile Prize for the best Master's thesis using patent data
The distinction between macro- and microinventions is at the core of recent debates on the Industrial Revolution. Yet, the empirical testing of this notion has remained elusive. We address this issue by introducing a new quality indicator for all patents granted in England in the period 1700–1850. The indicator provides the opportunity for a large-scale empirical appraisal of macro- and microinventions. Our findings indicate that macroinventions did not exhibit any specific time-clustering, while microinventions were characterized by clustering behavior. In addition, we find that macroinventions displayed a labor-saving bias and were mostly introduced by professional engineers. These results suggest that Allen’s and Mokyr’s views of macroinventions, rather than conflicting, should be regarded as complementary.
With Giovanni Dosi
in Alcorta et al. (eds), New Perspectives on Structural Change: Causes and Consequences of Structural Change in the Global Economy, 2021, Oxford: Oxford University Press
In this chapter we discuss the role of natural resources and endowment structures in structural change. Departing from theories of trade that stress specialization according to one’s comparative advantages as the key route to development, we articulate an alternative point of view on the role of technological learning and absolute advantages in structural transformation. Ricardian adjustment processes relying on endowment-based comparative advantages are often a misleading driver of development; rather, technological competitiveness offers a better criterion for achieving sustained economic well-being. This theoretical perspective provides useful guidance for interpreting the effects of globalization and the role of natural resources relative to industrial and trade policies in shaping the process of structural change and economic development.
With Valeria Cirillo, Arianna Martinelli, and Alessandro Nuvolari
Research Policy, 48(4), 905-922, 2019
One of the most significant results of the qualitative literature on national systems of innovation (NSIs) is that different systemic arrangements (i.e. configurations of actors and institutions) can deliver similar levels of innovative performance. Using factor analysis on a novel dataset of 29 quantitative indicators of innovative activities, we provide an empirical characterization of the structure of European NSIs over the last ten years. Our results cast doubt on the empirical significance of the “equifinality” of heterogeneous systemic arrangements in the context of NSIs. Innovation systems show inherent complexity, which leads to a high level of complementarity among their constituent components and configurations. This result implies that successful innovation policies should be systemic, leaving little flexibility in policy design and scope.
[Paper]
+1 (341) 400-3543 | mtranc@wharton.upenn.edu | The Wharton School of the University of Pennsylvania