Open Access

A Multi-match Approach to the Author Uncertainty Problem

Jun 07, 2019


Figure 1

The progression of user inputs (starting in the top right and ending in the bottom right).

Match Results for Zhong Lin Wang

| Match field | Original dataset size | Reduced dataset size | Percent reduction |
|---|---|---|---|
| Source | 4,810 | 3,560 | 26% |
| Affiliation | 4,810 | 4,147 | 14% |
| Web of Science Category | 4,810 | 4,173 | 13% |
| Co-authors | 4,810 | 4,175 | 13% |
| Title | 4,810 | 3,794 | 21% |
| ISSN | 4,810 | 3,555 | 26% |
| Publication Year | 4,810 | 4,349 | 10% |
| Cited References | 4,810 | 4,347 | 10% |
| Email | 4,810 | 3,349 | 30% |
| All of the above | 4,810 | 2,894 | 40% |
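
The percent reduction figures in these tables can be checked directly from the dataset sizes. The page does not state the formula, so the sketch below assumes percent reduction is (original size - reduced size) / original size, rounded to the nearest whole percent, which agrees with the tabulated values. The function name `percent_reduction` is illustrative and not taken from the article.

```python
def percent_reduction(original: int, reduced: int) -> int:
    """Share of records removed by a match field, as a whole percent.

    Assumed formula: (original - reduced) / original, rounded.
    """
    if original <= 0:
        raise ValueError("original dataset size must be positive")
    return round(100 * (original - reduced) / original)


# Dataset sizes copied from the Zhong Lin Wang match results.
wang_original = 4810
wang_reduced = {"Source": 3560, "Email": 3349, "All of the above": 2894}

for field, reduced in wang_reduced.items():
    print(f"{field}: {percent_reduction(wang_original, reduced)}%")
# Source: 26%
# Email: 30%
# All of the above: 40%
```

The same check applies to the other cases, for example 23,298 records reduced to 2,319 for Haesun Park gives 90%.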

Match Results for Haesun Park

| Match field | Original dataset size | Reduced dataset size | Percent reduction |
|---|---|---|---|
| Co-authors | 23,298 | 8,527 | 63% |
| Source | 23,298 | 14,174 | 39% |
| Affiliation (Organization Only) | 23,298 | 20,867 | 10% |
| Title | 23,298 | 4,978 | 79% |
| ISSN | 23,298 | 14,762 | 37% |
| 1st Author | 23,298 | 14,791 | 37% |
| ORCID iD | 23,298 | 3,288 | 86% |
| Researcher ID | 23,298 | 2,903 | 88% |
| Web of Science Category | 23,298 | 21,906 | 6% |
| Publication Year | 23,298 | 23,094 | 1% |
| Country | 23,298 | 23,044 | 1% |
| All of the above | 23,298 | 2,319 | 90% |

Comparison of Disambiguation Results for the Three Cases

| | Porter | Wang | Park |
|---|---|---|---|
| Top 3 list reduction match fields | Source (34%), Co-authors (32%) and ISSN (31%) | Email (30%), Source (26%), ISSN (26%) | Researcher ID (88%), ORCID (86%), Title (79%) |
| Original dataset size | 3,617 | 4,810 | 23,298 |
| Total list reduction | 52% | 40% | 90% |
| Number of true positives lost | 1 | 5 | 0 |

Match Results for Alan Porter

| Match field | Original dataset size | Reduced dataset size | Percent reduction |
|---|---|---|---|
| Source | 3,617 | 2,377 | 34% |
| Co-authors | 3,617 | 2,465 | 32% |
| Title | 3,617 | 2,826 | 22% |
| ISSN | 3,617 | 2,486 | 31% |
| Publication Year | 3,617 | 2,905 | 20% |
| Affiliation | 3,617 | NA | NA |
| Cited References | 3,617 | NA | NA |
| Email | 3,617 | NA | NA |
| All of the above | 3,617 | 1,750 | 52% |

Language: English
Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining