Open Access

Advancing Large Language Model Agent via Iterative Contrastive Trajectory Optimization

31 Dec 2024

Figure 1. Iterative Contrastive Trajectory Optimization (ICTO) Framework

Figure 2. Iterative Learning Progress of ICTO

Figure 3. Case Study of WebShop

Figure 4. Iterative Learning Progress of ICTO

Comparison of ICTO and Baseline Performances

Method         WebShop   ScienceWorld (%)   ALFWorld
SFT            63.1      70.0               12.5
ETO            67.4      72.3               11.2
IPR            68.3      73.8               10.8
RLCD           65.8      71.5               11.5
NAT            66.5      72.0               11.0
ICTO (ours)    70.2      75.6                9.7

Experimental Environment

Component                     Details
CPU                           Intel Core i9-10900K
GPU                           NVIDIA Tesla V100 PCIe 32GB
LLM Agent Model               Llama2-7B Chat
Optimizer                     AdamW
Experiment Management Tool    DeepSpeed

Generalization Performance of ICTO on OOD Tasks

Method         WebShop   ScienceWorld (%)   ALFWorld
SFT            52.3      60.0               15.0
ETO            55.8      62.0               14.2
IPR            57.1      63.5               13.8
RLCD           54.2      61.0               14.5
NAT            56.0      62.5               14.0
ICTO (ours)    59.5      66.0               12.5

Ablation Study of ICTO Modules

Training Scheme             WebShop   ScienceWorld (%)   ALFWorld
w/o Contrastive Learning    64.2      67.8               11.6
w/o Behavioral Cloning      60.7      62.5               13.1
Iteration=1                 66.1      69.2               12.8
Iteration=2                 68.5      70.6               12.3
Iteration=3                 70.9      72.3               11.7
Iteration=4                 72.3      73.1               11.0
Iteration=5                 72.0      72.8               10.5
Language: English
Publication frequency: 4 times per year
Journal subjects: Computer Sciences, Computer Sciences, other