CLC number: TP181
Document code: A
DOI: 10.16157/j.issn.0258-7998.256451
Citation format: An Dong, Wang Yuanyuan, Song Ningning, et al. Systematic analysis and optimization research on reinforcement learning evaluation metrics[J]. Application of Electronic Technique, 2025, 51(10): 17-23.
Systematic analysis and optimization research on reinforcement learning evaluation metrics
An Dong 1, Wang Yuanyuan 2, Song Ningning 3, Dai Chao 2, Liu Zhiyin 2
1. National Computer System Engineering Research Institute of China; 2. China Information Security Research Academy Co., Ltd.; 3. China Electronics Corporation
Abstract: Reinforcement learning evaluation metrics are core tools for measuring agent performance and guiding algorithm optimization, yet in practical applications they face key challenges such as reliance on a single metric, dependence on the environment, and a lack of interpretability. This paper systematically analyzes the classification framework of existing evaluation metrics, proposes a multi-dimensional metric system covering performance, learning process, policy, robustness, and efficiency, and explores its applicability and limitations in different task scenarios (such as sparse rewards and high-dimensional state spaces). The study indicates that traditional metrics tend to overlook requirements for safety, efficiency, and alignment with human preferences in complex environments, so evaluation methods that integrate multiple objectives need to be designed around the characteristics of each task. For future research, this paper suggests focusing on multi-objective Pareto optimization, reward modeling based on human feedback, and the quantification of exploration efficiency in sparse-reward environments, so as to make evaluations more comprehensive and interpretable. By combining theoretical analysis with practical cases, this paper provides methodological support for standardizing the reinforcement learning evaluation system and adapting it across fields, thereby promoting its efficient deployment in complex scenarios.
Keywords: reinforcement learning; evaluation metrics; explainability; reward
Introduction
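Reinforcement learning, an important branch of machine learning in which an agent learns an optimal policy by interacting with its environment, has achieved notable results in game AI [1-2], robot control [3-4], autonomous driving [5], biomedicine [6], and other fields. Reinforcement learning is receiving growing attention: Figure 1 shows the field's growth trend through the number of papers published each year (data from Web of Science).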