Working Papers

Finding Diamonds in the Rough: Data-Driven Opportunities and Pharmaceutical Innovation

Job Market Paper

Winner, Best Paper Award, Wharton Innovation Doctoral Symposium (2023)

Winner, Best PhD Student Paper, 3rd AIM Conference at USC Marshall (2023)

Winner, Best Conference PhD Paper Prize, SMS Conference, Toronto (2023)

Winner, INFORMS/Organization Science Dissertation Proposal Competition (2023)

Finalist, Best Conference Paper Prize, SMS Conference, Toronto (2023)

Finalist, Research Methods Paper Prize, SMS Conference, Toronto (2023)

Nominated, William H. Newman Award, Academy of Management (2024)

Best Paper Designation, Academy of Management (2024)

Abstract

Big data are increasingly used to make predictions about uncertain investments, thereby helping firms identify innovation opportunities without the need for domain knowledge. This trend has led to questions about which firms will primarily benefit from the availability of these data-driven predictions. Contrary to existing research suggesting that data-driven predictions level the playing field for firms lacking domain knowledge, I argue, using a simple theoretical framework, that these predictions actually reinforce the competitive advantage of firms with domain knowledge. In innovation contexts, where returns are skewed and not all leads can be pursued, domain knowledge helps evaluate predictions and avoid false positives. I test this idea in the context of pharmaceutical innovation, exploiting the features of genome-wide association studies (GWASs) that provide data-driven predictions about new drug targets. The results show that GWASs stimulate corporate investments, but around one-third of these resources are misallocated toward false positive predictions. Companies lacking domain knowledge react more strongly but are disproportionately likely to fall into the trap of false positives. By contrast, domain knowledge helps firms make fewer investments that target only the best opportunities. Together, the results show that even if data-driven predictions hold value when searching for innovations, domain knowledge remains the crucial source of competitive advantage in the age of big data.

[Working Paper] [Will Mitchell Dissertation Research Grant] [INFORMS/Organization Science Dissertation Proposal Competition]

Data-Driven Search and Innovation: Evidence from Genome-Wide Association Studies

Finalist, Steven Klepper Award for Best Young Scholar Paper, DRUID (2022)

Finalist, Best Conference Paper Prize, SMS Conference, London (2022)

Finalist, Best PhD Conference Paper Prize, SMS Conference, London (2022)

Abstract

In many settings, innovation involves searching for new combinations in technological spaces defined by mapping efforts. Recent advances in search technologies allow for the collection of large quantities of data to inform exploration decisions in such cases. This method of data-driven recombinant search starkly contrasts with traditional approaches that rely solely on theoretical understandings to identify breakthroughs. In this paper, I conduct an empirical case study to provide the first investigation of how data can change recombinant search in well-defined landscapes. I study this emergent phenomenon in the context of human genetics, where the advent of genome-wide association studies (GWASs) has enabled a form of data-driven search for the genetic roots of diseases divorced from theoretical considerations. By comparing gene-disease combinations introduced by GWASs with those from theory-driven studies, I provide unique evidence of how the search process shapes innovation outcomes. My results show that discoveries introduced by GWASs span a wider portion of the genetic landscape, are more likely to involve neglected human genes, and are of higher scientific value than comparable combinations introduced by theory-driven studies. However, heterogeneity analyses reveal that data-driven search performs poorly with interdependent components because correlational data neglect complex interactions that only a theoretical understanding can capture. This paper contributes to exploring the boundary conditions of data-driven search and generates questions for future research on how data shape innovation.

[New Draft Coming Soon] [Panmure Grant]

How Does Data Access Shape Science? Evidence from the Impact of U.S. Census’s Research Data Centers on Economics Research

With Abhishek Nagaraj (Reject & Resubmit)

Abstract

This study examines the impact of access to confidential administrative data on the rate, direction, and policy relevance of economics research. To study this question, we exploit the progressive geographic expansion of the U.S. Census Bureau's Federal Statistical Research Data Centers (FSRDCs). FSRDCs boost data diffusion, help empirical researchers publish more articles in top outlets, and increase citation-weighted publications. Besides direct data usage, spillovers to non-adopters also drive this effect. Further, citations to exposed researchers in policy documents increase significantly. Our findings underscore the importance of data access for scientific progress and evidence-based policy formulation.

[Working Paper] [NBER WP] [Sloan Grant] [Tweetstorm summary]

The Streetlight Effect in Data-Driven Exploration

With Johannes Hoelzemann, Gustavo Manso, and Abhishek Nagaraj

Abstract

We examine innovative contexts like scientific research or technical R&D where agents must search across many potential projects of varying and uncertain returns. Is it better to possess incomplete but accurate data on the value of some projects, or might there be cases where it is better to explore on a blank slate? While more data usually improves welfare, we present a theoretical framework to understand how it can unexpectedly decrease it. In our model of the streetlight effect, we predict that when data shines a light on attractive but not optimal projects, it can severely narrow the breadth of exploration and lower individual and group payoffs. We test our predictions in an online lab experiment and show that the availability of data on the true value of one project can lower individual payoffs by 17% and reduce the likelihood of discovering the optimal outcome by 54% compared to cases where no data is provided. Suggestive empirical evidence from genetics research illustrates our framework in a real-world setting: data on moderately promising genetic targets delays valuable discoveries by 1.6 years on average. Our paper provides the first systematic examination of the streetlight effect, outlining the conditions under which data leads agents to look under the lamppost rather than engage in socially beneficial exploration.

[Working Paper] [Online Appendix] [Preregistration]

Work in Progress

Empirical Search Landscapes

With Abhishek Nagaraj

Using Entitymetrics for Strategy Research: Data and Applications in the Pharmaceutical Industry

With Bikash Kumar Panda

[I3 Summer Fellows Program, NBER]

U.S. Census Administrative Data Use in Economics: Adoption, Diffusion and Impact

With Fernando Stipanicic and Abhishek Nagaraj

Born to Create and Lead? The Role of Personality Traits for Entrepreneurship and Management

With Tuomas Kari, Lukas Leucht, and Joosua Virtanen

Pre-Doctoral Publications

Patterns of innovation during the Industrial Revolution: a reappraisal using a composite indicator of patent quality

With Alessandro Nuvolari and Valentina Tartari

Explorations in Economic History, 82, 101419, 2021 [Lead article]

Winner, Bernardo Nobile Prize for the best Master's thesis using patent data

Abstract

The distinction between macro- and microinventions is at the core of recent debates on the Industrial Revolution. Yet, the empirical testing of this notion has remained elusive. We address this issue by introducing a new quality indicator for all patents granted in England in the period 1700–1850. The indicator provides the opportunity for a large-scale empirical appraisal of macro- and microinventions. Our findings indicate that macroinventions did not exhibit any specific time-clustering, while microinventions were characterized by clustering behavior. In addition, we find that macroinventions displayed a labor-saving bias and were mostly introduced by professional engineers. These results suggest that Allen’s and Mokyr’s views of macroinventions, rather than conflicting, should be regarded as complementary.

[Paper] [Online Appendix] [Data] [Bernardo Nobile Prize]

The role of comparative advantage, endowments and technology in structural transformation

With Giovanni Dosi

in Alcorta et al. (eds), New Perspectives on Structural Change: Causes and Consequences of Structural Change in the Global Economy, 2021, Oxford: Oxford University Press

Abstract

In this chapter, we discuss the role of natural resources and endowment structures in structural change. Departing from theories of trade that stress specialization according to one’s comparative advantages as the key route to development, we articulate an alternative point of view on the role of technological learning and absolute advantages for structural transformation. Ricardian adjustment processes relying on endowment-based comparative advantages are often a misleading driver of development; rather, technological competitiveness offers a better criterion to achieve sustained economic well-being. This theoretical perspective provides useful guidance to interpret the effects of globalization and the role of natural resources relative to industrial and trade policies in shaping the process of structural change and economic development.

[Book chapter]

Only one way to skin a cat? Heterogeneity and equifinality in European national innovation systems

With Valeria Cirillo, Arianna Martinelli, and Alessandro Nuvolari

Research Policy, 48(4), 905-922, 2019

Abstract

One of the most significant results of the qualitative literature on national systems of innovation (NSIs) is that different systemic arrangements (i.e., configurations of actors and institutions) can deliver similar levels of innovative performance. Using factor analysis on a novel dataset of 29 quantitative indicators of innovative activities, we provide an empirical characterization of the structure of European NSIs over the last ten years. Our results cast doubt on the empirical significance of the “equifinality” of heterogeneous systemic arrangements in the context of NSI. Innovation systems show inherent complexity, which leads to a high level of complementarity among their constituent components and configuration. This result implies that successful innovation policies should be systemic, leaving little flexibility in policy design and scope.

[Paper]



Matteo Tranchero

Contact

+1 (341) 400-3543 | m.tranchero@berkeley.edu

Address

Haas School of Business, University of California, Berkeley