Advancing Large Language Model Agent via Iterative Contrastive Trajectory Optimization
About the article
Publication date: 31 Dec 2024
Page range: 19-27
DOI: https://doi.org/10.2478/ijanmc-2024-0033
© 2024 Chengang Jing et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Comparison of ICTO and Baseline Performances
Method | WebShop | ScienceWorld (%) | ALFWorld |
---|---|---|---|
SFT | 63.1 | 70.0 | 12.5 |
ETO | 67.4 | 72.3 | 11.2 |
IPR | 68.3 | 73.8 | 10.8 |
RLCD | 65.8 | 71.5 | 11.5 |
NAT | 66.5 | 72.0 | 11.0 |
Experimental environment
Component | Details |
---|---|
CPU | Intel Core i9-10900K |
GPU | NVIDIA Tesla V100 PCIe 32GB |
LLM Agent Model | Llama2-7B Chat |
Optimizer | AdamW |
Experiment Management Tool | DeepSpeed |
Generalization Performance of ICTO on OOD Tasks
Method | WebShop | ScienceWorld (%) | ALFWorld |
---|---|---|---|
SFT | 52.3 | 60.0 | 15.0 |
ETO | 55.8 | 62.0 | 14.2 |
IPR | 57.1 | 63.5 | 13.8 |
RLCD | 54.2 | 61.0 | 14.5 |
NAT | 56.0 | 62.5 | 14.0 |
Ablation Study of ICTO Modules
Training Scheme | WebShop | ScienceWorld (%) | ALFWorld |
---|---|---|---|
w/o Contrastive Learning | 64.2 | 67.8 | 11.6 |
w/o Behavioral Cloning | 60.7 | 62.5 | 13.1 |
Iteration=1 | 66.1 | 69.2 | 12.8 |
Iteration=2 | 68.5 | 70.6 | 12.3 |
Iteration=3 | 70.9 | 72.3 | 11.7 |
Iteration=4 | 72.3 | 73.1 | 11.0 |
Iteration=5 | 72.0 | 72.8 | 10.5 |
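
The iteration rows of the ablation table can be read as a per-round gain. The sketch below is illustrative only and is not taken from the article: it copies the WebShop and ScienceWorld scores from the rows Iteration=1 to Iteration=5, computes the change between consecutive iterations, and reports which iteration scores highest. It assumes that higher is better for both of these columns; ALFWorld is left out because the table does not state the direction of that metric. The names `ablation`, `iteration_deltas`, and `best_iteration` are hypothetical helpers introduced here.

```python
# Scores copied from the ablation table (rows Iteration=1 .. Iteration=5).
# Assumption: higher is better for WebShop and ScienceWorld.
# ALFWorld is omitted because the table does not say which direction is better.
ablation = {
    "WebShop": [66.1, 68.5, 70.9, 72.3, 72.0],
    "ScienceWorld": [69.2, 70.6, 72.3, 73.1, 72.8],
}


def iteration_deltas(scores):
    """Return the change between consecutive iterations, rounded to one decimal."""
    return [round(later - earlier, 1) for earlier, later in zip(scores, scores[1:])]


def best_iteration(scores):
    """Return the 1-based index of the iteration with the highest score."""
    return max(range(len(scores)), key=scores.__getitem__) + 1


for env, scores in ablation.items():
    print(env, iteration_deltas(scores), "best iteration:", best_iteration(scores))
```

Under these assumptions, both columns peak at the fourth iteration and dip by 0.3 at the fifth, which is the pattern visible in the table.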