Open Access

Research on Machine Learning Program Generation Algorithm Based on AORBCO

About this article

Figure 1. Overall framework diagram of program generation capability

Figure 2. AD-EKG overall framework

Figure 3. RippleNet calculation process

Figure 4. TCF calculation process

Figure 5. A code generation algorithm framework based on knowledge enhancement

Figure 6. Diagram of DPR-based enhancer architecture

Figure 7. Original input

Figure 8. Example of retrieved information

Figure 9. Text replacement example

Figure 10. Code generation example

Figure 11. Top-K ablation experiments of AD-EKG under different variants

Figure 12. Example plot of a sample dataset

Dataset statistics

Domain knowledge graph:
  Number of objects: 5262
  Relationship types: 48
  Number of triples: 14774
  Average number of descriptive words: 50.5

Dataset:
  Number of dataset objects: 233
  Number of algorithm objects: 1448
  Number of interactions: 1485
  Sparsity: 0.00440
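
The reported sparsity is consistent with the ratio of observed interactions to all possible dataset-algorithm pairs. A minimal sketch of that calculation, assuming this is indeed how sparsity is defined here:

```python
# Hypothetical check of the sparsity value reported above, assuming
# sparsity = interactions / (dataset objects * algorithm objects).
dataset_objects = 233
algorithm_objects = 1448
interactions = 1485

sparsity = interactions / (dataset_objects * algorithm_objects)
print(f"sparsity = {sparsity:.5f}")  # prints sparsity = 0.00440
```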

Cloud Platform Experimental Environment Information

Name                      Configuration
Operating system          Ubuntu 20.04.5 LTS
Memory                    64 GB
Graphics card             NVIDIA A100 40GB
Development language      Python 3.8
Deep learning platform    PyTorch 2.0.0

Statistical data on Q&A dataset

Attribute                                          Value
Source language                                    English
Target language                                    Python
Number of samples                                  121
Average number of words in the source language     52
Maximum number of words in the source language     69
Average number of words in the target language     1365
Maximum number of words in the target language     1593
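
These statistics could be reproduced by counting tokens in each question-answer pair. A minimal sketch, assuming the data is a list of (English description, Python code) pairs and whitespace tokenization; the paper does not specify how words were counted:

```python
# Sketch of computing the average/maximum word counts above.
# The pair layout and whitespace tokenization are assumptions.
pairs = [
    ("Implement a function that adds two numbers.", "def add(a, b):\n    return a + b"),
    # ... remaining Q&A samples (121 in total)
]

src_lens = [len(src.split()) for src, _ in pairs]
tgt_lens = [len(tgt.split()) for _, tgt in pairs]

print("avg/max source words:", sum(src_lens) / len(src_lens), max(src_lens))
print("avg/max target words:", sum(tgt_lens) / len(tgt_lens), max(tgt_lens))
```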

Comparative Experiment (%)

No.  Model        Parameters  CodeBLEU  ROUGE-1  ROUGE-2  ROUGE-L
1    CodeT5       770M        12.62     7.62     3.02     5.29
2    CodeT5-EKG   770M        23.93     13.52    4.62     10.02
3    CodeT5       2B          32.83     20.04    6.43     14.32
4    CodeT5-EKG   2B          47.94     24.30    9.22     17.60
5    CodeT5       6B          46.27     32.96    14.21    25.68
6    CodeT5-EKG   6B          51.12     35.58    16.11    27.54
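
The ROUGE columns in this and the later model-comparison table are overlap scores between generated code and the reference. The paper does not state which implementation was used; a minimal sketch with the open-source rouge-score package (an assumption on my part) for scoring a single sample:

```python
# Minimal sketch of scoring one generated sample against its reference
# with the open-source `rouge-score` package (pip install rouge-score).
# Choosing this package and reporting the F-measure are assumptions;
# the paper does not name its ROUGE implementation.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "def add(a, b):\n    return a + b"   # ground-truth code
generated = "def add(x, y):\n    return x + y"   # model output

scores = scorer.score(reference, generated)
for name, score in scores.items():
    print(name, round(score.fmeasure * 100, 2))
```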

Pre-training dataset

Language      Number of samples
Ruby 2,119,741
JavaScript 5,856,984
Go 1,501,673
Python 3,418,376
Java 10,851,759
PHP 4,386,876
C 4,187,467
C++ 2,951,945
C# 4,119,796

CTR prediction comparison experiment (%)

Model AUC Precision Recall F1-score
KGNN-LS 80.01 71.63 76.10 73.80
KGCN 71.62 62.78 64.38 63.57
RippleNet 82.55 69.43 86.91 77.19
TCF 82.16 78.24 82.81 80.46
AD-EKG 88.20 83.80 86.82 85.28
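
The F1-score column is consistent with the harmonic mean of the listed precision and recall values. A minimal sketch verifying this for the rows above:

```python
# Check that F1 = 2 * P * R / (P + R) reproduces the last column above.
rows = {
    "KGNN-LS":   (71.63, 76.10),
    "KGCN":      (62.78, 64.38),
    "RippleNet": (69.43, 86.91),
    "TCF":       (78.24, 82.81),
    "AD-EKG":    (83.80, 86.82),
}

for model, (precision, recall) in rows.items():
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{model}: {f1:.2f}")  # matches the reported F1-scores
```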

Comparison with other models (%)

No.  Model           Parameters  CodeBLEU  ROUGE-1  ROUGE-2  ROUGE-L
1    CodeT5-EKG      770M        23.93     13.52    4.62     10.02
2    CodeT5-EKG      2B          47.94     24.30    9.22     17.60
3    CodeT5-EKG      6B          51.12     35.58    16.11    27.54
4    CodeGen-Mono    2B          34.08     20.23    6.52     14.94
5    GPT-Neo         2.7B        19.82     12.57    2.79     11.28
6    InstructCodeT5  16B         43.71     25.00    9.63     21.06

Experimental environment information

Name                      Configuration
Operating system          Windows 11
Memory                    16 GB
Graphics card             NVIDIA GeForce RTX 3070 8G
Development language      Python 3.7.8
Deep learning platform    TensorFlow 2.2.0
eISSN: 2470-8038
Language: English
Frequency: 4 times per year
Journal subjects: Computer Sciences, other