Working Papers

Finding Diamonds in the Rough: Data-Driven Opportunities and Pharmaceutical Innovation

Winner, Best Paper Award, Wharton Innovation Doctoral Symposium (2023)

Winner, Best PhD Student Paper, 3rd AIM Conference at USC Marshall (2023)

Winner, Best Conference PhD Paper Prize, SMS Conference, Toronto (2023)

Winner, William H. Newman Award, Academy of Management (2024)

Winner, TIM Division Best Paper Award, Academy of Management (2024)

Finalist, Best Conference Paper Prize, SMS Conference, Toronto (2023)

Finalist, Research Methods Paper Prize, SMS Conference, Toronto (2023)

Best Paper Designation, Academy of Management (2024)

Abstract

Big data are increasingly used to make predictions about the value of uncertain investments, thereby helping firms identify innovation opportunities without the need for domain knowledge. This trend has raised questions about which firms will primarily benefit from the availability of these data-driven predictions. Contrary to existing research suggesting that data-driven predictions level the playing field for firms lacking domain knowledge, I argue, using a simple theoretical framework, that these predictions reinforce the competitive advantage of firms with domain knowledge. In high-stakes contexts like innovation, where returns are skewed and only a few leads can be pursued, domain knowledge helps evaluate predictions and avoid false positives. I test this idea using novel data on the pharmaceutical industry, exploiting the features of genome-wide association studies (GWAS) that provide data-driven predictions about new drug targets. The results show that GWAS stimulate corporate investments in innovation, yet around one-third of these efforts are misallocated toward false positive predictions. Companies lacking domain knowledge react more strongly but are disproportionately likely to fall into the trap of false positives. Instead, domain knowledge helps firms pursue fewer alternatives that are more likely to be the best opportunities. Together, the results show that even if data-driven predictions are valuable in innovation, domain knowledge remains a crucial source of competitive advantage in the age of big data technologies.

[Working Paper] [Will Mitchell Dissertation Research Grant] [INFORMS/Organization Science Dissertation Proposal Competition]

Data-Driven Search and the Birth of Theory: Evidence from Genome-Wide Association Studies

Finalist, Steven Klepper Award for Best Young Scholar Paper, DRUID (2022)

Finalist, Best Conference Paper Prize, SMS Conference, London (2022)

Finalist, Best PhD Conference Paper Prize, SMS Conference, London (2022)

Abstract

How does big data change the recombinant search for innovation? Data-driven predictions can reveal promising technological combinations even when their mechanisms are unknown. Some argue this shift replaces traditional theory-based approaches, raising concerns that data-driven search might stifle theory generation. Using an evolutionary framework of variation and selection, I argue that data technologies actually reinforce theorizing. Data-driven search broadens the space of combinations explored and increases the variability in outcomes compared to theory-driven approaches. As a result, it uncovers more unexpected findings that stimulate, rather than substitute, new theory generation. I test these ideas in the domain of human genetics, where genome-wide association studies (GWAS) are a form of data-driven search for the genetic roots of diseases decoupled from prior theory. Compared to theory-driven studies, GWAS introduce gene-disease combinations that span a wider portion of the genetic landscape, are more likely to fall at both extremes of scientific quality, and often challenge expectations. Instead of crowding out theory, GWAS findings trigger a surge of follow-on work aimed at elucidating their causal mechanisms. Together, the results reveal a complementarity between theory and data in search, suggesting that big data technologies foster virtuous cycles of theorizing sparked by empirical anomalies.

[New Draft Coming Soon] [Panmure Grant]

How Does Data Access Shape Science? Evidence from the Impact of U.S. Census’s Research Data Centers on Economics Research

With Abhishek Nagaraj (Reject & Resubmit)

Abstract

This study examines the impact of access to confidential administrative data on the rate, direction, and policy relevance of economics research. To study this question, we exploit the progressive geographic expansion of the U.S. Census Bureau's Federal Statistical Research Data Centers (FSRDCs). FSRDCs boost data diffusion, help empirical researchers publish more articles in top outlets, and increase citation-weighted publications. Besides direct data usage, spillovers to non-adopters also drive this effect. Further, citations to exposed researchers in policy documents increase significantly. Our findings underscore the importance of data access for scientific progress and evidence-based policy formulation.

[Working Paper] [NBER Working Paper] [Sloan Grant] [Tweetstorm summary]

The Streetlight Effect in Data-Driven Exploration

With Johannes Hoelzemann, Gustavo Manso, and Abhishek Nagaraj (Reject & Resubmit)

Abstract

We study exploration under uncertainty and show how access to data on past attempts can paradoxically hinder breakthrough discovery. We develop a model of the "streetlight effect" demonstrating that when data highlights attractive but ultimately suboptimal projects, it can narrow exploration and suppress innovation. In a lab experiment, we find that revealing the true value of an enticing project lowers individual payoffs by 5% and reduces breakthrough discoveries by 56%, even when participants know the project is suboptimal. We validate our theory in the field using data on scientific research into the genetic origins of human diseases. To identify the causal impact of past data, we use an instrumental variables approach that leverages quasi-random genetic overlaps between human and mouse genes, which asymmetrically reduce research costs for certain diseases. We find that diseases with early evidence of promising genetic targets are 16 percentage points less likely to yield breakthroughs than those where early efforts failed. While competition dampens the streetlight effect, it does not eliminate it. Our paper provides the first systematic analysis of this phenomenon, outlining the conditions under which data leads agents to look under the lamppost rather than engage in socially beneficial exploration.

[Working Paper] [NBER Working Paper] [Online Appendix] [Preregistration]

Theorizing with Large Language Models

With Cecil-Francis Brenninkmeijer, Arul Murugan, and Abhishek Nagaraj

Abstract

Large Language Models (LLMs) are proving to be a powerful toolkit for management and organizational research. While early work has largely focused on the value of these tools for data processing and replicating survey-based research, the potential of LLMs for theory building is yet to be recognized. We argue that LLMs can accelerate the pace at which researchers can develop, validate, and extend strategic management theory. We propose a novel framework called Generative AI-Based Experimentation (GABE) that enables researchers to conduct exploratory in silico experiments that can mirror the complexities of real-world organizational settings, featuring multiple agents and strategic interdependencies. This approach is unique because it allows researchers to unpack the mechanisms behind results by directly modifying agents' roles, preferences, and capabilities, and asking them to reveal the explanations behind decisions. We apply this framework to a novel theory studying strategic exploration under uncertainty. We show how our framework can not only replicate the results from experiments with human subjects at a much lower cost, but can also be used to extend theory by clarifying boundary conditions and uncovering mechanisms. We conclude that LLMs possess tremendous potential to complement existing methods for theorizing in strategy and, more broadly, the social sciences.

[Working Paper] [NBER Working Paper] [Tweetstorm summary]

Work in Progress

The role of scientific theories in innovation: Evidence from representations of causal relationships in biology

Empirical Search Landscapes

With Abhishek Nagaraj

Science as a "Weickian" map in technological search

With Bikash Kumar Panda

[I3 Summer Fellows Program, NBER]

U.S. Census Administrative Data Use in Economics: Adoption, Diffusion and Impact

With Fernando Stipanicic and Abhishek Nagaraj

Pre-Doctoral Publications

Patterns of innovation during the Industrial Revolution: a reappraisal using a composite indicator of patent quality

With Alessandro Nuvolari and Valentina Tartari

Explorations in Economic History, 82, 101419, 2021 [Lead article]

Winner, Bernardo Nobile Prize for the best Master's thesis using patent data

Abstract

The distinction between macro- and microinventions is at the core of recent debates on the Industrial Revolution. Yet, the empirical testing of this notion has remained elusive. We address this issue by introducing a new quality indicator for all patents granted in England in the period 1700–1850. The indicator provides the opportunity for a large-scale empirical appraisal of macro- and microinventions. Our findings indicate that macroinventions did not exhibit any specific time-clustering, while microinventions were characterized by clustering behavior. In addition, we find that macroinventions displayed a labor-saving bias and were mostly introduced by professional engineers. These results suggest that Allen’s and Mokyr’s views of macroinventions, rather than conflicting, should be regarded as complementary.

[Paper] [Online Appendix] [Data] [Bernardo Nobile Prize]

The role of comparative advantage, endowments and technology in structural transformation

With Giovanni Dosi

in Alcorta et al. (eds), New Perspectives on Structural Change: Causes and Consequences of Structural Change in the Global Economy, 2021, Oxford: Oxford University Press

Abstract

In this chapter we discuss the role of natural resources and endowment structures in structural change. Departing from theories of trade that stress specialization according to one’s comparative advantages as the key route to development, we articulate an alternative point of view on the role of technological learning and absolute advantages for structural transformation. Ricardian adjustment processes relying on endowment-based comparative advantages are often a misleading driver of development; rather, technological competitiveness offers a better criterion to achieve sustained economic well-being. This theoretical perspective provides useful guidance to interpret the effects of globalization and the role of natural resources relative to industrial and trade policies in shaping the process of structural change and economic development.

[Book chapter]

Only one way to skin a cat? Heterogeneity and equifinality in European national innovation systems

With Valeria Cirillo, Arianna Martinelli, and Alessandro Nuvolari

Research Policy, 48(4), 905-922, 2019

Abstract

One of the most significant results of the qualitative literature on national systems of innovation (NSIs) is that different systemic arrangements (i.e., configurations of actors and institutions) can deliver similar levels of innovative performance. Using factor analysis on a novel dataset of 29 quantitative indicators of innovative activities, we provide an empirical characterization of the structure of European NSIs over the last ten years. Our results cast doubt on the empirical significance of the “equifinality” of heterogeneous systemic arrangements in the context of NSI. Innovation systems show inherent complexity, which leads to a high level of complementarity among their constituent components and configuration. This result implies that successful innovation policies should be systemic, leaving little flexibility in policy design and scope.

[Paper]



Matteo Tranchero

Contact

+1 (341) 400-3543 | mtranc@wharton.upenn.edu

Address

The Wharton School of the University of Pennsylvania