Gaétan de Rassenfosse, Jan Kozak, Florian Seliger.
Abstract
The dataset provides geographic coordinates for inventor and applicant locations in 18.8 million patent documents spanning more than 30 years. The geocoded data are further allocated to the corresponding countries, regions and cities. When the address information was missing in the original patent document, we imputed it by using information from subsequent filings in the patent family. The resulting database can be used to study patenting activity at a fine-grained geographic level without creating bias towards the traditional, established patent offices.
Year: 2019 PMID: 31695047 PMCID: PMC6834584 DOI: 10.1038/s41597-019-0264-6
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 Schematic Overview of the Data Generation Process.
Included inventor and applicant countries, and corresponding country codes.
| Country code | Country | Country code | Country |
|---|---|---|---|
| AU | Australia | LV | Latvia |
| AT | Austria | LI | Liechtenstein |
| BE | Belgium | LT | Lithuania |
| BR | Brazil | LU | Luxembourg |
| BG | Bulgaria | MT | Malta |
| CA | Canada | MX | Mexico |
| CL | Chile | NL | Netherlands |
| CN | China | NZ | New Zealand |
| HR | Croatia | NO | Norway |
| CZ | Czech Republic | PL | Poland |
| DK | Denmark | PT | Portugal |
| EE | Estonia | RO | Romania |
| FI | Finland | RU | Russian Federation |
| FR | France | SK | Slovak Republic |
| DE | Germany | SI | Slovenia |
| GR | Greece | ZA | South Africa |
| HU | Hungary | KR | South Korea |
| IS | Iceland | ES | Spain |
| IN | India | SE | Sweden |
| IE | Ireland | CH | Switzerland |
| IL | Israel | TR | Turkey |
| IT | Italy | GB | United Kingdom |
| JP | Japan | US | United States |
Selected examples of address fields in PATSTAT.
| person_address | address_1 | address_2 | address_3 | address_4 | address_5 |
|---|---|---|---|---|---|
| Janssen Pharmaceutica N.V., Turnhoutseweg 30, B-2340 Beerse | Janssen Pharmaceutica N.V. | Turnhoutseweg 30 | B-2340 Beerse | | |
| Université de Geneve (UNIGE), Faculty of Medicine, Dept. of Pathology and Immunology, 1 Rue Michel Servet,CH-1211 Geneva | Université de Geneve (UNIGE) | Faculty of Medicine | Dept. of Pathology and Immunology | 1 Rue Michel Servet | CH-1211 Geneva |
| John F. Welch Technology Centre Pvt. Ltd., Plot 22, EPIP, Phase II, Hoodi Village, Whitefield Road, 560066 Bangalore, Karnataka | John F. Welch Technology Centre Pvt. Ltd, | Plot 22, EPIP, Phase II, Hoodi Village | Whitefield Road | 560066 Bangalore, Karnataka | |
Selected examples of address fields in the DPMA data.
| Type | Example of typical field |
|---|---|
| Inventor | Mospak, Christian, Dipl.-Ing. (FH), 71069 Sindelfingen |
| Applicant | SICAN Gesellschaft für Silizium-Anwendungen und CAD/CAT Niedersachsen mbH, 30419 Hannover |
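
The two DPMA examples above end with a five-digit postal code followed by the city name. A minimal sketch of how such a field could be split is shown here; the function name `split_dpma_location` and the assumption that the location is always the last comma-separated part are illustrative and are not taken from the original data processing.

```python
import re
from typing import Optional, Tuple

# Assumption: the location is the last comma-separated part of the field and
# starts with a five-digit postal code, as in the two examples above.
_LOCATION = re.compile(r"^(\d{5})\s+(.+)$")


def split_dpma_location(field: str) -> Optional[Tuple[str, str]]:
    """Return (postal_code, city) from a DPMA-style field, or None if absent."""
    # Keep only the text after the last comma.
    last_part = field.rsplit(",", 1)[-1].strip()
    match = _LOCATION.match(last_part)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


if __name__ == "__main__":
    print(split_dpma_location("Mospak, Christian, Dipl.-Ing. (FH), 71069 Sindelfingen"))
```

Fields that do not follow this pattern return `None`, so they can be routed to a different treatment rather than silently mis-parsed.
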
Systematic geolocalization problems and their treatment.
| Problem | Quantity | Treatment |
|---|---|---|
| The request yields a different country from the one in the queried address. | 1.4% of all results | Correct country codes or use coordinates from other data sources |
| The postal code in the returned result is significantly different from the postal code in the queried address. | 0.9% of all results from addresses with postal codes | Use coordinates from other data sources if the API returned a wrong location, likely as a consequence of wrong postal codes |
| The request yields more than two results. | 1.3% of all results | Use coordinates from other data sources if the geolocalization service found more than two results |
| The request yields a location that is much more precise than the information in the queried address. | 0.5% of the relevant results | Use coordinates from other data sources if the string length of the address in the geolocalization results was two to three times longer than the queried address. |
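
Three of the checks in the table above can be expressed as simple rules on a geolocalization response. The following is a sketch only: the `GeocodeResponse` container, its field names and the flag labels are hypothetical, and the default ratio of 2.0 is one reading of "two to three times longer". The postal-code check is left out because the table does not define "significantly different".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeocodeResponse:
    """Hypothetical container for the outcome of one geolocalization request."""
    queried_address: str
    queried_country: str   # country code of the queried address
    returned_country: str  # country code of the returned result
    returned_address: str  # formatted address of the returned result
    n_results: int         # number of results returned by the service


def flag_problems(resp: GeocodeResponse, length_ratio: float = 2.0) -> List[str]:
    """Return the labels of the systematic problems detected in a response."""
    flags = []
    # Result lies in a different country than the queried address.
    if resp.returned_country != resp.queried_country:
        flags.append("country_mismatch")
    # Service found more than two results.
    if resp.n_results > 2:
        flags.append("too_many_results")
    # Returned address is much longer, hence more precise, than the query.
    if resp.queried_address and (
        len(resp.returned_address) >= length_ratio * len(resp.queried_address)
    ):
        flags.append("overly_precise")
    return flags
```

A flagged response would then be a candidate for the treatment listed in the table, for example replacing its coordinates with those from another data source.
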
List of variables.
| Variable name | Description |
|---|---|
| appln_id | application identifier from PATSTAT, which identifies the first filing |
| patent_office | patent office where the first filing was filed |
| filing_date | filing date of first filing |
| lat | latitude |
| lng | longitude |
| ctry_code | country code |
| name_0 | country |
| name_1 | 1st administrative area |
| name_2 | 2nd administrative area |
| name_3 | 3rd administrative area |
| name_4 | 4th administrative area |
| name_5 | 5th administrative area |
| city | city name (in the United States: county name) |
| coord_source | geolocalization: information on latitude and longitude comes from a geolocalization web service |
| | geonames: information on latitude and longitude comes from geonames.org |
| | patentsView: information on latitude and longitude comes from PatentsView |
| source | source where information on latitude/longitude comes from |
| | 1: information comes from the first filing itself |
| | 2: information comes from direct equivalent |
| | 3: information comes from other subsequent filings |
| | 4: information comes from the applicant’s location in first filings |
| | 5: information comes from the applicant’s location in the equivalent |
| | 6: information comes from the applicant’s location in other subsequent filings |
| type | priority: first filing is a Paris Convention priority |
| | pct: first filing is an international application (not claiming a Paris Convention priority filing) |
| | continual: first filing is a parent filing (and not a Paris Convention priority) |
| | tech_rel: first filing is based on a technical relationship (and not a Paris Convention priority) |
| | single: singletons, i.e. filings without further family members (and thus without subsequent filings) |
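
The `source` codes suggest a fallback order for imputing a missing location: the first filing itself, then the direct equivalent, then other subsequent filings, and finally the applicant's location at each of those levels. The sketch below illustrates that idea under the assumption that a lower code is always preferred, which this variable list does not state explicitly; `pick_location` and the shape of `candidates` are hypothetical.

```python
from typing import Dict, Optional, Tuple

Coordinates = Tuple[float, float]  # (lat, lng)

# Labels for the `source` codes, taken from the variable list above.
SOURCE_LABELS = {
    1: "first filing itself",
    2: "direct equivalent",
    3: "other subsequent filings",
    4: "applicant's location in first filings",
    5: "applicant's location in the equivalent",
    6: "applicant's location in other subsequent filings",
}


def pick_location(
    candidates: Dict[int, Coordinates],
) -> Optional[Tuple[int, Coordinates]]:
    """Return (source_code, (lat, lng)) for the lowest available source code.

    Assumption: lower codes are preferred over higher ones.
    """
    for code in sorted(SOURCE_LABELS):
        if code in candidates:
            return code, candidates[code]
    return None  # no location available from any source
```

For a first filing without its own address, `candidates` would lack key 1 and the function would return the next available source, which is then recorded in the `source` variable.
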
Fig. 2 Share of information retrieved from different sources for selected inventor countries (upper panel: filing years 2000 to 2004; lower panel: filing years 2005 to 2009).
| Measurement(s) | patent • geographic location |
| Technology Type(s) | digital curation • geocoding |
| Factor Type(s) | geographic location • source • year |