Working paper

A flexible, scaleable approach to the international patent 'name game'

Publishing date
28 September 2014
Authors
Georg Zachmann

Inventor records in PATSTAT often contain duplicates: the same person or company may be split across multiple entries, each associated with different patents. In this paper, we address this problem with an algorithm that efficiently de-duplicates the data. It needs minimal manual input and works well even on consumer-grade computers. Because comparisons between entries are not limited to their names, the algorithm improves on earlier approaches that required extensive manual work or overly cautious clean-up of the names.
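To make the idea of comparing entries on more than their names concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the record fields (`city`, `ipc`), the similarity threshold, and the helper names `normalize`, `same_entity` and `deduplicate` are all assumptions chosen for illustration, and the toy records are invented rather than taken from PATSTAT.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records; the field names are illustrative, not PATSTAT columns.
RECORDS = [
    {"id": 1, "name": "Mueller, Hans", "city": "Berlin", "ipc": {"H01L", "G06F"}},
    {"id": 2, "name": "MUELLER HANS", "city": "Berlin", "ipc": {"H01L"}},
    {"id": 3, "name": "Mueller, Hans", "city": "Munich", "ipc": {"A61K"}},
    {"id": 4, "name": "Dupont, Marie", "city": "Paris", "ipc": {"C07D"}},
]

def normalize(name):
    """Lowercase, drop punctuation, and sort tokens so word order does not matter."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))

def same_entity(a, b, name_threshold=0.9):
    """Assumed rule: names must be similar AND at least one non-name field must agree."""
    name_sim = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    if name_sim < name_threshold:
        return False
    return a["city"] == b["city"] or bool(a["ipc"] & b["ipc"])

def deduplicate(records):
    """Group records into clusters with a small union-find over matching pairs."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if same_entity(a, b):
            parent[find(b["id"])] = find(a["id"])

    clusters = {}
    for r in records:
        clusters.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(clusters.values())

clusters = deduplicate(RECORDS)
print(clusters)
```

In this toy example, records 1 and 2 are merged because their normalised names match and they share a city, while record 3 stays separate despite an identical name because no other field agrees. The sketch compares every pair of records, which would not scale to the full database; a scalable version would first restrict comparisons to small groups of candidate matches.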

Source code on GitHub.

Download data.

About the authors

  • Georg Zachmann

    Georg Zachmann is a Senior Fellow at Bruegel, where he has worked since 2009 on energy and climate policy. His work focuses on regional and distributional impacts of decarbonisation, the analysis and design of carbon, gas and electricity markets, and EU energy and climate policies. Previously, he worked at the German Ministry of Finance, the German Institute for Economic Research in Berlin, the energy think tank LARSEN in Paris, and the policy consultancy Berlin Economics.
