Why does LLM-generated code still fail to compile or run correctly even with advanced models?

Shanghai Jiao Tong University and Tencent present AP2O-Coder, a new approach that analyzes the specific error types in failed code and trains the model to fix them step by step, adapting to each model's weaknesses.

The method improves code generation performance by up to 3% in pass@k while using significantly less preference data, across multiple LLM families including Llama, Qwen, and DeepSeek.
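To make the error-type-adaptive idea concrete, here is a minimal Python sketch of what the data-preparation step could look like: run each failed generation, label it by its error type, and group preference pairs per type so a DPO-style trainer can work through the model's most frequent weaknesses first. All function names, the pair format, and the frequency-ordered curriculum are our illustrative assumptions, not the authors' implementation; see the linked repo for the real code.

```python
import subprocess
import tempfile
from collections import Counter, defaultdict

def classify_error(code: str, test: str) -> str | None:
    """Run a candidate solution against its test; return an error label
    (e.g. 'SyntaxError', 'AssertionError') or None if it passes.
    This labeling scheme is an assumption for illustration."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + test)
        path = f.name
    try:
        proc = subprocess.run(
            ["python", path], capture_output=True, text=True, timeout=10
        )
    except subprocess.TimeoutExpired:
        return "Timeout"
    if proc.returncode == 0:
        return None  # solution passed; not a failure case
    # Take the exception name from the last traceback line, e.g. "NameError: ..."
    stderr = proc.stderr.strip()
    last = stderr.splitlines()[-1] if stderr else "UnknownError"
    return last.split(":")[0]

def build_progressive_pairs(samples):
    """samples: dicts with 'prompt', 'passing_code', 'failing_code', 'test'.
    Returns preference pairs grouped by error type, most frequent type
    first, so training can target the dominant weakness before rarer ones."""
    by_type = defaultdict(list)
    for s in samples:
        err = classify_error(s["failing_code"], s["test"])
        if err is None:
            continue  # not actually a failure; skip
        by_type[err].append({
            "prompt": s["prompt"],
            "chosen": s["passing_code"],    # preferred: passes the test
            "rejected": s["failing_code"],  # dispreferred: raises `err`
        })
    counts = Counter({e: len(p) for e, p in by_type.items()})
    return [(err, by_type[err]) for err, _ in counts.most_common()]
```

Under these assumptions, training would step through the returned groups in order, running a preference-optimization update on each error type's pairs, which mirrors the progressive, model-adaptive framing described above.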

AP2O-Coder: Adaptively Progressive Preference Optimization for Reducing Compilation and Runtime Errors in LLM-Generated Code

Paper: https://arxiv.org/pdf/2510.02393 Code: https://github.com/TsingZ0/AP2O

Our report: https://mp.weixin.qq.com/s/gmVmOFOjk51WQZJsjA9x-g

๐Ÿ“ฌ PapersAccepted by Jiqizhixin

๐Ÿ”— ์›๋ณธ ๋งํฌ

๋ฏธ๋””์–ด

image