Published: 2023/12/14 12:52:57 Authors: SHEN Yuxin, LIU Xiaoming, XIAO Yi, YU Deping*
SHEN Yuxin, LIU Xiaoming, XIAO Yi, YU Deping*
(School of Mechanical Engineering, Sichuan University, Chengdu, Sichuan 610065, China)
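Abstract: For the large-diameter peg-in-hole assembly tasks common in pipeline transportation and aerospace, an assembly control method based on the PPO algorithm is designed. First, a framework for interactive training between the reinforcement learning algorithm and the assembly environment is established, and two networks are designed, one to fit the assembly policy and one to estimate the value function. Second, the action space output by the robot and the state space output by the assembly environment are designed to ensure effective exploration during learning. Then, a nonlinear reward function is designed to ensure fast convergence of training. Finally, a robotic large-diameter peg-in-hole assembly simulation platform based on the MuJoCo physics engine is built, and the designed algorithm is trained and tested on it. The results show that the PPO-based training framework ensures fast convergence of training, that the improved advantage function estimation method improves training stability, and that the trained model ensures not only that the peg is inserted into the hole and the flange faces mate, but also that the assembly process is safe.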
Keywords: assembly; PPO algorithm; MuJoCo simulation
CLC number: TP249 Document code: A doi:10.3969/j.issn.1006-0316.2023.12.012
Article ID: 1006-0316 (2023) 12-0074-07
Robotic Peg-in-Hole Assembly Control and Simulation Based on PPO Algorithm
SHEN Yuxin, LIU Xiaoming, XIAO Yi, YU Deping
(School of Mechanical Engineering, Sichuan University, Chengdu 610065, China)
Abstract: A PPO-based assembly control method is proposed for large-diameter peg-in-hole assembly, a task common in pipeline transportation and aerospace. First, an interactive training framework between the reinforcement learning algorithm and the assembly environment is established, and two networks are designed to fit the assembly policy and to estimate the value function, respectively. Second, the action space output by the robot and the state space output by the assembly environment are designed to ensure effective exploration during learning. Then, a nonlinear reward function is designed to ensure fast and stable convergence of training. Finally, a simulation platform for robotic large-diameter peg-in-hole assembly is built on the MuJoCo physics engine, and the designed algorithm is trained and tested on it. The results show that the PPO-based training framework ensures fast convergence of training and that the improved advantage function estimation method improves training stability. The trained model ensures not only that the peg is inserted into the hole and the flange faces mate, but also that the assembly process is safe.
Key words: assembly; PPO algorithm; MuJoCo simulation
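For readers unfamiliar with PPO, the two ingredients the abstract refers to (an advantage estimate supplied by the value network and a clipped policy objective) can be sketched as follows. This is a minimal illustration of the standard generalized advantage estimation (GAE) and the standard PPO clipped surrogate, not the improved estimator proposed in this paper, whose details the abstract does not give. The function names `gae_advantages` and `ppo_clip_objective` and the default values of `gamma`, `lam`, and `eps` are assumptions chosen for illustration.

```python
import math


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Standard GAE for one episode (assumed here, not the paper's variant).

    rewards: list of T per-step rewards.
    values:  list of T + 1 value estimates; the last entry is the
             bootstrap value of the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Accumulate discounted TD residuals backwards through the episode.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximized)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the updated and the old policy.
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Take the pessimistic value of the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In a training loop, the advantages would be computed from rollouts collected in the simulator, and the objective would be maximized with respect to the policy network parameters while the value network is regressed toward the observed returns.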
———————————————
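Received: 2023-07-16
About the authors: SHEN Yuxin (b. 1998), male, from Suining, Sichuan, master's student; main research interest: robot automation; E-mail: shenyuxin2021@163.com. *Corresponding author: YU Deping (b. 1984), male, from Fuzhou, Jiangxi, Ph.D., professor; main research interest: intelligent and automated equipment; E-mail: williamydp@scu.edu.cn.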