Building a Sample Frame of SMEs Using Patent, Search Engine, and Website Data

This research outlines the process of building a sample frame of US small and medium-sized enterprises (SMEs). The method starts with a list of patenting organizations and defines the boundaries of the population, and subsequently of the frame, using free or low-cost data sources, including search engines and websites. Generating high-quality data is essential both when building the frame and during subsequent data collection, yet there is too much data to curate by hand. We therefore turn to machine learning and other computational methods to apply a series of data matching, filtering, and cleaning routines. The results show that it is possible to generate a sample frame of innovative SMEs with reasonable accuracy for use in subsequent research: our method provides data for 79% of the frame. We discuss implications for future work by researchers and national statistical institutes (NSIs) alike, and contend that the challenges associated with big data collections require not only new skill sets but also a new mode of collaboration.
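The abstract refers to data matching routines without describing them. As a rough illustration only, and not the authors' actual pipeline, the Python sketch below shows one simple way to match a patent assignee name against candidate organization names (for example, names taken from search-engine results or website titles). It substitutes plain string similarity (`difflib.SequenceMatcher`) for the machine learning methods the study describes; `LEGAL_SUFFIXES`, `normalize_name`, `match_score`, `best_match`, and the 0.85 threshold are all hypothetical choices made for this example.

```python
import re
from difflib import SequenceMatcher
from typing import Optional, Sequence

# Hypothetical list of legal-form tokens dropped before comparing names.
LEGAL_SUFFIXES = {
    "inc", "incorporated", "llc", "ltd", "limited",
    "corp", "corporation", "co", "company",
}


def normalize_name(name: str) -> str:
    """Lowercase, keep alphanumeric tokens, and drop legal-form suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def match_score(assignee: str, candidate: str) -> float:
    """Similarity in [0, 1] between two normalized organization names."""
    a, b = normalize_name(assignee), normalize_name(candidate)
    if not a or not b:
        return 0.0  # nothing left to compare after normalization
    return SequenceMatcher(None, a, b).ratio()


def best_match(
    assignee: str, candidates: Sequence[str], threshold: float = 0.85
) -> Optional[str]:
    """Return the highest-scoring candidate, or None if none clears the threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: match_score(assignee, c))
    return best if match_score(assignee, best) >= threshold else None


if __name__ == "__main__":
    # Hypothetical candidate names, e.g., scraped from search results.
    hits = ["Acme Corp", "Acme Widgets LLC"]
    print(best_match("Acme Widgets, Inc.", hits))  # Acme Widgets LLC
```

Raising the threshold trades coverage for precision: fewer assignees receive a match, but those that do are less likely to be false positives.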

Language:: English

Publication timeframe:: 4 times per year
Journal Subjects:: Mathematics, Probability and Statistics

Sanjay K. Arora

Sarah Kelley

Sarvothaman Madhavan

Published Online: Mar 13, 2021

Page range: 1 - 30

Received: Sep 01, 2019

Accepted: Sep 01, 2020

DOI: https://doi.org/10.2478/jos-2021-0001

Keywords: Sample frame, administrative and big data, machine learning, bias, small and medium-sized enterprises

© 2020 Sanjay K. Arora et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.