e-CSTI

Evidence data platform constructed
by Council for Science, Technology and Innovation


ANALYSIS
Analysis of the relation between research output and researcher attributes

1. Purpose of “visualization”

A system has been constructed to visualize how government research funding is linked to publications and other outputs. Specifically, it visualizes the relation between research output and researcher attributes by linking data on academic articles compiled in commercial bibliographic information databases with information on researchers in the Cross-Ministerial Research and Development Management System (e-Rad). The findings from this system are expected to provide important insights into more effective methods of allocating research funding, to help relevant ministries and agencies improve their policy-making processes, and to help universities and research and development institutions improve their academic management.

2. “Visualization” method

2.1 Bibliographic information databases

The following three bibliographic information databases were used in this analysis.

  • A) Dimensions (Digital Science)
    The data used were from papers published between 2008 and 2018 and were extracted by the Cabinet Office from Dimensions’ worldwide bulk data in July 2018.
  • B) Scopus (Elsevier)
    The data used were from papers by Japanese research institutes that were published between 2008 and 2018 and extracted by Elsevier in December 2019.
  • C) Web of Science (Clarivate Analytics)
    The data used were from papers by Japanese research institutes published between 2008 and 2019 and extracted by Clarivate Analytics in October 2019. Coverage included the natural sciences (SCIE, CPCI-S), the humanities and social sciences (SSCI, A&HCI, CPCI-SSH), academic books (BKCI-S, BKCI-SSH), and regional and emerging fields (ESCI).

2.2 Linking bibliographic database author IDs with e-Rad researcher IDs

e-Rad researcher IDs, surnames and first names (in the Japanese phonetic script Katakana and in English), and the names of affiliated research institutes were extracted from the e-Rad database. Where English names were missing, they were supplemented using a heuristic algorithm. The English surname, English first name, and affiliated institute name were then used to link e-Rad researcher IDs to the author IDs in each bibliographic database. Links were established only on exact matches. As a result, a single e-Rad researcher ID can be linked to multiple author IDs, either when one researcher holds several author IDs or when researchers with identical surnames and first names belong to the same institution.
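The exact-match linking described above, and the reason it can produce one-to-many links, can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure actually used: the record types and field names are hypothetical, institutes are assumed to be already resolved to a shared key, and the heuristic completion of missing English names is not modeled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class ERadResearcher(NamedTuple):
    # Hypothetical layout for an e-Rad researcher row.
    erad_id: str
    surname_en: str
    first_name_en: str
    institute_key: str  # assumed: institute already resolved to a shared key


class DbAuthor(NamedTuple):
    # Hypothetical layout for a bibliographic-database author row.
    author_id: str
    surname_en: str
    first_name_en: str
    institute_key: str


Key = Tuple[str, str, str]


def link_key(surname: str, first_name: str, institute_key: str) -> Key:
    # Exact matching: strings are compared as-is, with no normalization.
    return (surname, first_name, institute_key)


def link_exact(researchers: Iterable[ERadResearcher],
               authors: Iterable[DbAuthor]) -> Dict[str, List[str]]:
    """Map each e-Rad researcher ID to every author ID with an identical key."""
    ids_by_key: Dict[Key, List[str]] = defaultdict(list)
    for a in authors:
        ids_by_key[link_key(a.surname_en, a.first_name_en, a.institute_key)].append(a.author_id)

    links: Dict[str, List[str]] = {}
    for r in researchers:
        matched = ids_by_key.get(link_key(r.surname_en, r.first_name_en, r.institute_key), [])
        if matched:  # researchers without an exact match stay unlinked
            links[r.erad_id] = sorted(matched)
    return links


if __name__ == "__main__":
    researchers = [
        ERadResearcher("E001", "Sato", "Hanako", "INST-A"),
        ERadResearcher("E002", "Suzuki", "Taro", "INST-B"),
        # Same name at the same institute as E001: the two cannot be told apart.
        ERadResearcher("E003", "Sato", "Hanako", "INST-A"),
    ]
    authors = [
        DbAuthor("A10", "Sato", "Hanako", "INST-A"),
        DbAuthor("A11", "Sato", "Hanako", "INST-A"),  # a second ID for the same name
        DbAuthor("A12", "Suzuki", "Taro", "INST-C"),  # different institute: no match
    ]
    print(link_exact(researchers, authors))
```

In this toy data both E001 and E003 receive the author IDs A10 and A11, which reproduces the two one-to-many situations the text describes, while E002 remains unlinked because its institute differs.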

Fig. 1 Method for inferring attributes of authors of papers

The procedure for each bibliographic information database is as follows:

  • A) Dimensions
    The Global Research Identifier Database (http://www.grid.ac), an open-access database provided by Digital Science, was used to obtain a table of institute IDs (GRID IDs) and the names (in both English and Japanese) of Japanese research institutes. Algorithm-based matching was then performed between the research institute names in e-Rad and the GRID IDs. For major institutes that could not be linked owing to discrepancies in institute names, new name-identification columns were added to the table manually, and further manual and heuristic matching was carried out using these columns, which increased the match rate. Thereafter, author surnames (English), author first names (English), and affiliated institutes (GRID IDs) were matched exactly, and currently valid author IDs were obtained via an API. As of March 2019, when this work was completed, Hamamatsu University School of Medicine (a national university corporation) had not been assigned a GRID ID, so its linking rate between researcher IDs and author IDs under this method was 0%. (At the time of writing, this university corporation has been assigned a GRID ID.)
  • B) Scopus
    Algorithm-based matching was performed between Elsevier’s table of institute names (English) and institute IDs and the research institute names compiled in e-Rad. For major institutes that could not be linked owing to institute name variants, a field for disambiguating institute names was then created manually, which improved the concordance rate and allowed Scopus institute IDs to be assigned to the institutes in e-Rad. Because Scopus records store author first names and surnames in a single column, a matching single author-name column was created in e-Rad by joining the English first name and English surname with a half-width space. Author IDs whose full names (English) and affiliated institutes matched exactly were then obtained using SQL.
  • C) Web of Science
    The paper data provided by Clarivate Analytics often contained institutes that had no institute ID or whose names had variant spellings. Algorithm-based matching was performed on the research institute names (English) compiled in e-Rad, and a field for consolidating institute name variants was then created manually for major institutes that could not be linked owing to name mismatches, which improved the concordance rate. Thereafter, author IDs with matching author surnames (English), author first names (English), and affiliated institutes were obtained using SQL. Note that these author IDs are assigned to identify authors within Web of Science; they are not the “Web of Science ResearcherIDs” that researchers register themselves and that are published through Publons. They are valid only within the database and have not been verified by the authors themselves.
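The SQL step described for Scopus, with a similar institute-name mapping used for the other databases, can be sketched with Python’s built-in sqlite3 module. This is a hypothetical illustration: all table and column names are invented, SQLite stands in for whatever database engine was actually used, and the linking-rate definition (the share of an institute’s e-Rad researchers with at least one linked author ID) is an assumption. It also shows why an institute with no institute ID, as in the GRID case above, ends up with a linking rate of 0.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE erad_researcher (
    erad_id TEXT PRIMARY KEY, surname_en TEXT, first_name_en TEXT, institute_name TEXT
);
-- e-Rad institute name -> database institute ID: rows from the algorithmic
-- match plus rows added manually for unmatched name variants (hypothetical).
CREATE TABLE institute_map (institute_name TEXT PRIMARY KEY, institute_id TEXT);
-- Scopus-style author rows keep the first name and surname in one column.
CREATE TABLE db_author (author_id TEXT, full_name TEXT, institute_id TEXT);
"""

# Exact join on "first name + half-width (ASCII) space + surname" and institute ID.
LINK_SQL = """
SELECT r.erad_id AS erad_id, a.author_id AS author_id
FROM erad_researcher AS r
JOIN institute_map AS m ON m.institute_name = r.institute_name
JOIN db_author AS a
  ON a.full_name = r.first_name_en || ' ' || r.surname_en
 AND a.institute_id = m.institute_id
"""

# Assumed definition: linked researchers / all e-Rad researchers, per institute.
RATE_SQL = f"""
WITH links AS ({LINK_SQL})
SELECT r.institute_name,
       COUNT(DISTINCT links.erad_id) * 1.0 / COUNT(DISTINCT r.erad_id) AS linking_rate
FROM erad_researcher AS r
LEFT JOIN links ON links.erad_id = r.erad_id
GROUP BY r.institute_name
ORDER BY r.institute_name
"""


def build_db(researchers, institute_map, authors) -> sqlite3.Connection:
    """Load toy rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO erad_researcher VALUES (?, ?, ?, ?)", researchers)
    conn.executemany("INSERT INTO institute_map VALUES (?, ?)", institute_map)
    conn.executemany("INSERT INTO db_author VALUES (?, ?, ?)", authors)
    return conn


def exact_links(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    return sorted(conn.execute(LINK_SQL).fetchall())


def linking_rates(conn: sqlite3.Connection) -> List[Tuple[str, float]]:
    return conn.execute(RATE_SQL).fetchall()


if __name__ == "__main__":
    conn = build_db(
        researchers=[
            ("E001", "Sato", "Hanako", "Univ A"),
            ("E002", "Suzuki", "Taro", "Univ A"),
            ("E003", "Tanaka", "Jiro", "Univ B"),  # Univ B has no institute ID
        ],
        institute_map=[("Univ A", "I-100")],
        authors=[
            ("S1", "Hanako Sato", "I-100"),
            ("S2", "Hanako Sato", "I-100"),
            ("S3", "Jiro Tanaka", "I-200"),
        ],
    )
    print(exact_links(conn))
    print(linking_rates(conn))
```

Here "Univ B" is absent from the institute mapping, so the inner join drops its researchers entirely and the left join in the rate query reports 0.0 for it, while "Univ A" reports 0.5 because only one of its two researchers is linked.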

2.3 Linking bibliographic information and e-Rad researcher attributes

A correspondence table of e-Rad researcher IDs and author IDs was used to extract the bibliographic information for each researcher from each database. The attributes of the authors of papers were then inferred or obtained as shown in Table 1.
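Using the correspondence table to collect each researcher’s papers can be sketched as follows. This is a hypothetical illustration, not the actual pipeline: the row layouts are assumed, each paper is counted once per researcher even when it is reached through more than one of that researcher’s author IDs (an assumed design choice), and age at publication is included only as one example of a derived attribute, assuming a birth year is available for the researcher.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def papers_per_researcher(
    correspondence: Iterable[Tuple[str, str]],
    authorships: Iterable[Tuple[str, str, int]],
) -> Dict[str, Dict[str, int]]:
    """Collect {erad_id: {paper_id: publication_year}} via the correspondence table.

    correspondence: (erad_id, author_id) rows; one e-Rad ID may have several author IDs.
    authorships: (paper_id, author_id, publication_year) rows from one database.
    """
    erad_ids_by_author: Dict[str, Set[str]] = defaultdict(set)
    for erad_id, author_id in correspondence:
        erad_ids_by_author[author_id].add(erad_id)

    papers: Dict[str, Dict[str, int]] = defaultdict(dict)
    for paper_id, author_id, year in authorships:
        # Unlinked author IDs are skipped; a paper reached through two author IDs
        # of the same researcher is stored once because paper_id is the dict key.
        for erad_id in erad_ids_by_author.get(author_id, ()):
            papers[erad_id][paper_id] = year
    return dict(papers)


def ages_at_publication(papers: Dict[str, int], birth_year: int) -> List[int]:
    """Hypothetical derived attribute: age at publication, given an assumed birth year."""
    return sorted(year - birth_year for year in papers.values())


if __name__ == "__main__":
    correspondence = [("E001", "A10"), ("E001", "A11"), ("E002", "A20")]
    authorships = [
        ("P1", "A10", 2015),
        ("P1", "A11", 2015),  # same paper via the researcher's second author ID
        ("P2", "A11", 2017),
        ("P3", "A99", 2016),  # author ID not in the correspondence table
    ]
    papers = papers_per_researcher(correspondence, authorships)
    print(papers)
    print(ages_at_publication(papers["E001"], birth_year=1980))
```

With this data, E001 ends up with two distinct papers (P1 and P2) rather than three rows, and E002 does not appear because none of its author IDs occur in the authorship rows.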

Table 1 Attributes of authors of papers

Indices used in the visualization are shown in Table 2.

Table 2 Explanation of each index

3. “Visualization” results

For each bibliographic information database, attribute data are shown for all researchers in Japan whose e-Rad researcher IDs were linked to author IDs in that database.

Attribute data of all researchers in Japan linked to author IDs in the bibliographic information databases

The results of the “visualization” analysis of the relation between the research output and researcher attributes of all researchers in Japan are shown below.

A) Dimensions

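Relation between paper production and age at publication (all researchers in Japan, Dimensions)
Relation between fixed-term appointment status and paper production (all researchers in Japan, Dimensions)
Relation between gender and paper production (all researchers in Japan, Dimensions)
Relation between inter-institutional mobility and paper production (all researchers in Japan, Dimensions)
Relation between paper production and type of affiliated institution (all researchers in Japan, Dimensions)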

B) Scopus

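Relation between paper production and age at publication (all researchers in Japan, Scopus)
Relation between fixed-term appointment status and paper production (all researchers in Japan, Scopus)
Relation between gender and paper production (all researchers in Japan, Scopus)
Relation between inter-institutional mobility and paper production (all researchers in Japan, Scopus)
Relation between paper production and type of affiliated institution (all researchers in Japan, Scopus)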

C) Web of Science

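Relation between paper production and age at publication (all researchers in Japan, Web of Science)
Relation between fixed-term appointment status and paper production (all researchers in Japan, Web of Science)
Relation between gender and paper production (all researchers in Japan, Web of Science)
Relation between inter-institutional mobility and paper production (all researchers in Japan, Web of Science)
Relation between paper production and type of affiliated institution (all researchers in Japan, Web of Science)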

4. Materials

The above reports are available in PDF format.
