Data Projects

Project 1: Economics Research in the U.S., 1990-2019

With Abhishek Nagaraj

 

Abhishek Nagaraj and I received a large grant from the Alfred P. Sloan Foundation to build and distribute a comprehensive, disambiguated dataset of 250,000+ economics-related publications by over 19,000 academics affiliated with U.S. research institutions from 1990 to 2019. This database constitutes a near-census of the economic research carried out over the past three decades, integrated with additional information on paper citations and fields. We used it to estimate the impact of access to Federal Statistical Research Data Centers (FSRDCs) on economic research, and we are using it in ongoing work with Fernando Stipanicic. The database is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/.

[Data] [Sloan Grant]

Project 2: Knowledge Entities in Pharmaceutical Patents

With Bikash Kumar Panda (in progress)

 

Bikash Kumar Panda and I are building a large-scale, open database of all the bio-entities mentioned in USPTO patents. We plan to share these data together with a methodological paper that illustrates the potential of entitymetrics to measure creative recombinations and science-to-technology spillovers. We are currently assembling the full texts of all USPTO patents (both applications and grants) and experimenting with machine learning tools (e.g., BioBERT) to extract references to genes, diseases, and chemical entities. We plan to share the data by late 2023 under the Open Database License. This project is supported by the I3 Open Data Summer Fellows Program.

Project 3: 300 Years of British Patents

With Enrico Berkes (in progress)

 

In our spare time, Enrico Berkes and I are digitizing all the patents granted in England from 1617 to 1900. This fascinating period saw momentous economic and social change, encompassing both the First and the Second Industrial Revolutions. We believe that the remarkable completeness of the corpus of patent documents will provide a unique lens on the determinants of the innovations that enabled modern growth. In particular, we are digitizing the full specification of each patent and extracting disambiguated information about the inventors (including profession and address). This project is an ambitious extension of my Master’s thesis project, which led to the creation of a quality index for English historical patents.

[Data on Patent Quality]



Matteo Tranchero

Contact

+1 (341) 400-3543 | m.tranchero@berkeley.edu

Address

Haas School of Business, University of California, Berkeley