A Use Case of Patent Classification Using Deep Learning with Transfer Learning

Purpose

Patent classification is one of the areas of Intellectual Property Analytics (IPA) and a growing use case, since the number of patent applications has been increasing worldwide. We propose using machine learning algorithms to classify Portuguese patents, and we evaluate the performance of transfer learning methodologies on this task.

Design/methodology/approach

We applied three different approaches in this paper. First, we used a dataset made available by INPI to explore traditional machine learning algorithms and ensemble methods. After the data were preprocessed with TF-IDF, FastText, and Doc2Vec, the models were evaluated using 5-fold cross-validation. In the second approach, we used two different neural network architectures: a Convolutional Neural Network (CNN) and a bi-directional Long Short-Term Memory (BiLSTM) network. Finally, in the third approach, we used pre-trained BERT, DistilBERT, and ULMFiT models.
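As a rough illustration of the first approach, the sketch below computes TF-IDF weights and builds a stratified 5-fold split using only the Python standard library. It is a toy under stated assumptions, not the authors' pipeline: the function names `tfidf` and `stratified_folds`, the two example texts, and the class labels `A61` and `H04` are all hypothetical, and the smoothed IDF formula is one common variant among several.

```python
import math
from collections import Counter, defaultdict


def tfidf(docs):
    """Return one {term: weight} dict per document (raw tf times smoothed idf)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf (an assumed variant): terms found in every document get weight 1.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]


def stratified_folds(labels, k=5):
    """Assign sample indices to k folds so each class is spread evenly across folds."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds


# Hypothetical toy data: two made-up patent texts repeated five times each.
docs = [
    "medical device for wound treatment",
    "wireless signal transmission device",
] * 5
labels = ["A61", "H04"] * 5

vectors = tfidf(docs)
folds = stratified_folds(labels, k=5)

for fold_id, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(docs)) if i not in held_out]
    print(fold_id, "train:", len(train_idx), "test:", len(test_idx))
```

In a real evaluation, the IDF statistics would be fitted on the training folds only, so that no information from the held-out fold leaks into the features. The sketch computes them once over all documents purely to stay short.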

Findings

BERTimbau, a BERT-architecture model pre-trained on a large Portuguese corpus, presented the best results for the task, although its performance was only 4% better than that of a LinearSVC model using TF-IDF feature engineering.

Research limitations

The dataset was highly imbalanced, as is usual for patent applications, so the classes with the fewest samples were expected to perform worst. This happened in some cases, especially in classes with fewer than 60 training samples.
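One way to inspect this effect is to flag the classes below a training-sample threshold and compare their per-class F1 scores with those of the larger classes. The following is a minimal sketch on invented data: the helper names `rare_classes` and `per_class_f1`, the class labels, the counts, and the predictions are all hypothetical. Only the threshold of 60 training samples comes from the text above.

```python
from collections import Counter


def rare_classes(train_labels, min_samples=60):
    """Return {class: count} for classes with fewer than min_samples training examples."""
    counts = Counter(train_labels)
    return {cls: n for cls, n in counts.items() if n < min_samples}


def per_class_f1(y_true, y_pred):
    """Return {class: F1}, computed one-vs-rest for every class present in y_true."""
    scores = {}
    for cls in set(y_true):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical, made-up class distribution and predictions.
train_labels = ["A61"] * 200 + ["H04"] * 80 + ["G21"] * 12
y_true = ["A61", "A61", "A61", "H04", "H04", "G21"]
y_pred = ["A61", "A61", "A61", "H04", "A61", "A61"]

rare = rare_classes(train_labels, min_samples=60)
scores = per_class_f1(y_true, y_pred)
macro_f1 = sum(scores.values()) / len(scores)

for cls in sorted(scores):
    flag = " (rare)" if cls in rare else ""
    print(f"{cls}: F1={scores[cls]:.2f}{flag}")
print(f"macro F1={macro_f1:.2f}")
```

Macro-averaged F1 weights every class equally, so poor results on rare classes stay visible even when overall accuracy is dominated by the frequent ones.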

Practical implications

Patent classification is challenging because of the hierarchical classification system, the context overlap between classes, and the underrepresentation of some classes. However, the final model presented acceptable performance given the size of the dataset and the complexity of the task. The model can support decision-making and reduce processing time by proposing a category at the second level of the ICP, a step that is one of the critical phases of the patent grant process.

Originality/value

To our knowledge, the proposed models have not previously been applied to Portuguese patent classification.

Language: English

Publication frequency: 4 times per year
Journal subjects: Computer Science, Information Technology, Project Management, Databases and Data Mining

Roberto Henriques

Adria Ferreira

Mauro Castelli

Article category: Research Paper

Published online: 12 Aug 2022

Pages: 49 - 70

Received: 12 Mar 2022

Accepted: 04 Jul 2022

DOI: https://doi.org/10.2478/jdis-2022-0015

Keywords: Natural Language Processing (NLP), Patent classification, Transfer Learning, Bidirectional Encoder Representations from Transformers (BERT)

© 2022 Roberto Henriques et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
