Open Access

A Use Case of Patent Classification Using Deep Learning with Transfer Learning

Figure 1

Evolution of patent applications and grants at IP5 offices from 2009 to 2019 (IP5, 2019).

Figure 2

Patent applications in Portugal since 2010.

Figure 3

Number of patents by section.

Figure 4

Frequency of classes by the number of patents.

Figure 5

Boxplot of text size by section a) with outliers and b) without outliers.

Figure 6

Word cloud of the most frequent words by section.

Figure 7

Applied methodology.

Figure 8

Most frequent words of class G06 alongside those of its most similar classes.

Mean F1 score (cross-validation k=5) with different feature engineering methods.

| Model | Weighted F1 (%) |
|---|---|
| LinearSVC (baseline) | 60.8 |
| CNN | 50 |
| DistilBERT Multilingual | 50.1 |
| BiLSTM | 57 |
| ULMFiT | 57 |
| BERT-Base Multilingual | 59.5 |
| BERTimbau | 63.6 |

Precision, Recall and F1 score by Section.

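| Section | Precision | Recall | F1 |
|---|---|---|---|
| A | 0.7923 | 0.7 | 0.7433 |
| H | 0.6805 | 0.7248 | 0.7019 |
| C | 0.6515 | 0.7427 | 0.6941 |
| E | 0.5981 | 0.5923 | 0.5952 |
| F | 0.5446 | 0.5393 | 0.5419 |
| B | 0.5291 | 0.5388 | 0.5339 |
| G | 0.4903 | 0.4633 | 0.4764 |
| D | 0.5098 | 0.4041 | 0.4509 |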
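
As a quick consistency check on the table above, the sketch below recomputes each F1 value as the harmonic mean of the reported precision and recall. The helper name `f1_from` and the tolerance are illustrative choices, not taken from the article; small differences are expected because the inputs are already rounded.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the per-section table.
reported = {
    "A": (0.7923, 0.7, 0.7433),
    "H": (0.6805, 0.7248, 0.7019),
    "C": (0.6515, 0.7427, 0.6941),
    "E": (0.5981, 0.5923, 0.5952),
    "F": (0.5446, 0.5393, 0.5419),
    "B": (0.5291, 0.5388, 0.5339),
    "G": (0.4903, 0.4633, 0.4764),
    "D": (0.5098, 0.4041, 0.4509),
}

# Absolute gap between the reported and recomputed F1, per section.
gaps = {s: abs(f1_from(p, r) - f1) for s, (p, r, f1) in reported.items()}

for section, gap in gaps.items():
    print(f"{section}: |reported - recomputed| = {gap:.5f}")
```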

Patent classification related studies.

| Authors | Feature engineering | Algorithm | Document section | Language | Dataset size | Number of classes |
|---|---|---|---|---|---|---|
| (Trappey et al., 2006) | Key phrases frequency based on TF-IDF | Neural Networks | full document | English | 300 training, 124 test | 9 |
| (Derieux et al., 2010) | Terms extraction and semantic relation | SVM | full document | English, German, French | 985 training, 2,000 test | 630 |
| (Trappey et al., 2013) | Key phrases frequency based on TF-IDF | Ontology-Based Neural Network | full document | English | 333 training, 160 test | 23 |
| (Zhang, 2014) | - | SVM | - | English | 5,000 | 5 |
| (Wu et al., 2016) | SOM, KPCA | SVM | full document | English | 60,000 | 7 |
| (Li et al., 2018) | Skip-gram | CNN | title and abstract | English | 742,097 training, 1,350 test | 637 |
| (Risch & Krestel, 2019) | Domain-specific FastText word embeddings | Bi-directional GRU | title and abstract | English | ~1.7M training, ~300,000 test | 637 |
| (Abdelgawad et al., 2020) | GloVe, Word2Vec, FastText | Hierarchical SVM and CNN with BOHB (Bayesian Optimization and Hyperband) | title, abstract, description, and claims | English | 75,000 training, 28,926 test | 451 |
| (Lee & Hsiang, 2020) | - | BERT-Base | claims | English | 1,950,247 training, 150,000 test | 632 |

F1 score on the test set.

| Model | Weighted F1 (%) |
|---|---|
| LinearSVC (baseline) | 60.8 |
| CNN | 50 |
| DistilBERT Multilingual | 50.1 |
| BiLSTM | 57 |
| ULMFiT | 57 |
| BERT-Base Multilingual | 59.5 |
| BERTimbau | 63.6 |
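
For readers unfamiliar with the metric, the following is a minimal sketch of a support-weighted F1 score. It assumes the weighted F1 reported above is the usual average of per-class F1 scores weighted by the number of true instances of each class; the article excerpt does not spell this out. The function name `weighted_f1` and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Average of per-class F1 scores, weighted by each class's support."""
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / n
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n / total
    return score


# Toy example with three IPC sections as labels (illustrative data only).
toy_true = ["A", "A", "B", "B", "B", "C"]
toy_pred = ["A", "B", "B", "B", "C", "C"]
print(f"weighted F1 = {weighted_f1(toy_true, toy_pred):.4f}")
```

Weighting by support keeps large sections from being under-represented in the aggregate, which matters when the number of patents per section is unbalanced.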

IPC Areas of Technology.

| Section | Description |
|---|---|
| A | Human Necessities |
| B | Performing Operations; Transporting |
| C | Chemistry; Metallurgy |
| D | Textiles; Paper |
| E | Fixed Constructions |
| F | Mechanical Engineering; Lighting; Heating; Weapons; Blasting Engines or Pumps |
| G | Physics |
| H | Electricity |

Features used in the analysis.

| Feature | Description |
|---|---|
| id | Patent internal identification |
| Title | Descriptive name of the patent |
| Claims | The legal scope of the invention, including delimitations and application field |
| Abstract | A brief description of the invention presented in the patent |
| Section | IPC 1st level classification code |
| Class | IPC 2nd level classification code |
| Subclass | IPC 3rd level classification code |
| Main group | IPC 4th level classification code |
| Subgroup | IPC 5th level classification code |

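
The five IPC levels listed above can be read off a full IPC symbol such as `G06F 17/30`. The sketch below illustrates that split. It assumes symbols are written as section letter, two-digit class, subclass letter, main group, slash, subgroup, and that each level is stored as a cumulative code; how the article's dataset actually encodes these fields is not stated. The names `IPCLevels` and `split_ipc` are hypothetical.

```python
import re
from typing import NamedTuple


class IPCLevels(NamedTuple):
    section: str     # 1st level, e.g. "G"
    ipc_class: str   # 2nd level, e.g. "G06"
    subclass: str    # 3rd level, e.g. "G06F"
    main_group: str  # 4th level, e.g. "G06F 17"
    subgroup: str    # 5th level, e.g. "G06F 17/30"


# Assumed layout: section A-H, two-digit class, subclass letter,
# main group digits, "/", subgroup digits.
_IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def split_ipc(symbol: str) -> IPCLevels:
    """Split a full IPC symbol into its five hierarchical levels."""
    match = _IPC_PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a full IPC symbol: {symbol!r}")
    section, cls, sub, group, subgroup = match.groups()
    subclass = f"{section}{cls}{sub}"
    return IPCLevels(
        section=section,
        ipc_class=f"{section}{cls}",
        subclass=subclass,
        main_group=f"{subclass} {group}",
        subgroup=f"{subclass} {group}/{subgroup}",
    )


print(split_ipc("G06F 17/30"))
```

Storing cumulative codes keeps each level unambiguous on its own, since a bare class number such as `06` occurs under several sections.
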
eISSN: 2543-683X
Language: English