强化学习求解组合优化问题的研究综述

张宏立; 朱家政; 董颖超

doi:10.13568/j.cnki.651094.651316.2023.02.02.0001

您当前的位置：

首页 >

文章列表页 >

强化学习求解组合优化问题的研究综述

更新时间：2026-01-21

- 强化学习求解组合优化问题的研究综述
- Journal of Xinjiang University (Natural Science Edition in Chinese and English) Vol. 40, Issue 2, Pages: 129-141(2023)
- 作者机构：
  
  新疆大学电气工程学院
- 作者简介：
- 基金信息：
- DOI：10.13568/j.cnki.651094.651316.2023.02.02.0001
  CLC： TP181
- Published：2023
- 稿件说明：
移动端阅览
[1]张宏立,朱家政,董颖超.强化学习求解组合优化问题的研究综述[J].新疆大学学报(自然科学版)(中英文),2023,40(02):129-141.
[1]张宏立,朱家政,董颖超.强化学习求解组合优化问题的研究综述[J].新疆大学学报(自然科学版)(中英文),2023,40(02):129-141. DOI： 10.13568/j.cnki.651094.651316.2023.02.02.0001.

DOI：10.13568/j.cnki.651094.651316.2023.02.02.0001.

摘要

组合优化问题广泛的存在于生产实践的各个领域，解决组合优化问题的主要手段通常包括使用由领域专家人工设计的启发式算法以及设计成熟的求解器，按照一定顺序构建一个解决方案．而随着实际问题复杂度逐渐的增加，这类方法无法于在线求解方面取得很好的效果，得到的结果可能往往是次优的．而强化学习给出了一个很好的替代方案，通过对智能体模型的良好训练，迅速地对此类问题进行求解．故回顾了近年来将强化学习框架应用于组合优化问题的研究，对其基本原理、相关方法、应用研究进行总结和综述，并指出未来该方向亟待解决的若干问题．

Abstract

Combinatorial optimization problems are widespread in all areas of production practice

and the main means of solving combinatorial optimization problems usually include the use of heuristic algorithms manually designed by domain experts and the design of sophisticated solvers to construct a solution in a certain order. As the complexity of the actual problem gradually increases

such methods can not achieve good results in online solving

and the results obtained may often be suboptimal. And reinforcement learning gives a good alternative to solve such problems rapidly by well-training an intelligent body model. Therefore

this paper reviews the recent research on applying the reinforcement learning framework to combinatorial optimization problems

summarizes and reviews its basic principles

related methods and application studies

and points out several problems that need to be addressed in this direction in the future.

关键词

Keywords

references

BELLMAN R. A Markovian decision process[J]. Journal of Mathematics and Mechanics, 1957, 6:679-684.

章炯民,陶增乐,吴文娟.组合优化问题的神经网络解――装箱问题和背包问题的求解[J].华东师范大学学报(自然科学版),1998, 4:102-105.

ZHANG C, SONG W, CAO Z, et al. Learning to dispatch for job shop scheduling via deep reinforcement learning[J]. Advances in Neural Information Processing Systems, 2020, 33:1621-1632.

ZHUANG W W, CHEN C, QIU G X. A new deep reinforcement learning model for dynamic portfolio optimization[J]. Journal of University of Science and Technology of China, 2022, 52(11):15-28.

RUSKONE-FOURMESTRAUX A, DELMER A, LAVERGNE A, et al. Multiple lymphomatous polyposis of the gastrointestinal tract:prospective clinicopathologic study of 31 cases[J]. Gastroenterology, 1997, 112(1):7-16.

MANUAL C U. IBM ILOG CPLEX optimization studio[J]. Version, 1987, 12:1-586.

MAHER S, MILTENBERGER M, PEDROSO J P, et al. PySCIPOpt:mathematical programming in python with the SCIP optimization suite[C]//Mathematical Software-ICMS 2016:5th International Conference. Berlin:Springer, 2016.

BELLM E C, KULKARNI S R, BARLOW T, et al. The zwicky transient facility:surveys and scheduler[J]. Publications of the Astronomical Society of the Pacific, 2019, 131(1000):068003.

GRAY M A. Sage:a new mathematics software system[J]. Computing in Science and Engineering, 2008, 10(6):72-75.

DANTZIG G B. Linear programming[J]. Operations Research, 2002, 50(1):42-47.

BALAS E. Projection, lifting and extended formulation in integer and combinatorial optimization[J]. Annals of Operations Research, 2005, 140(1):125.

FLOOD M M. The traveling-salesman problem[J]. Operations Research, 1956, 4(1):61-75.

WONG R T. Combinatorial optimization:algorithms and complexity(christos h. papadimitriou and kenneth steiglitz)[J]. SIAM Review, 1983, 25(3):424-425.

HELD M, KARP R M. A dynamic programming approach to sequencing problems[J]. Journal of the Society for Industrial and Applied Mathematics, 1962, 10(1):196-210.

DANTZIG G, FULKERSON R, JOHNSON S. Solution of a large-scale traveling-salesman problem[J]. Journal of the Operations Research Society of America, 1954, 2(4):393-410.

PERDOMO-ORTIZ A, DICKSON N, DREW-BROOK M, et al. Finding low-energy conformations of lattice protein models by quantum annealing[J]. Scientific Reports, 2012, 2(1):1-7.

AOUNI B, COLAPINTO C, LA TORRE D. Financial portfolio management through the goal programming model:current state-of-the-art[J]. European Journal of Operational Research, 2014, 234(2):536-545.

BARAHONA F. On the computational complexity of Ising spin glass models[J]. Journal of Physics A:Mathematical and General,1982, 15(10):3241-3253.

JONES N D. Space-bounded reducibility among combinatorial problems[J]. Journal of Computer and System Sciences, 1975,11(1):68-85.

PAPA D A, MARKOV I L. Hypergraph partitioning and clustering[J]. Handbook of Approximation Algorithms and Metaheuristics,2007, 20073547:1-15.

GOEMANS M X, WILLIAMSON D P. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming[J]. Journal of the ACM, 1995, 42(6):1115-1145.

WU Y, LI W K, GOH M, et al. Three-dimensional bin packing problem with variable bin height[J]. European Journal of Operational Research, 2010, 202(2):347-355.

VARNAMKHASTI M J. Overview of the algorithms for solving the multidimensional knapsack problems[J]. Advanced Studies in Biology, 2012, 4(1):37-47.

BLUM C, ROLI A. Metaheuristics in combinatorial optimization:overview and conceptual comparison[J]. ACM Computing Surveys, 2003, 35(3):268-308.

MARTELLO S, TOTH P. Lower bounds and reduction procedures for the bin packing problem[J]. Discrete Applied Mathematics,1990, 28(1):59-70.

FEO T A, RESENDE M G C, SMITH S H. A greedy randomized adaptive search procedure for maximum independent set[J].Operations Research, 1994, 42(5):860-878.

GARDINER E J, WILLETT P, ARTYMIUK P J. Graph-theoretic techniques for macromolecular docking[J]. Journal of Chemical Information and Computer Sciences, 2000, 40(2):273-279.

AGRAWAL R, MANNILA H, SRIKANT R, et al. Fast discovery of association rules[J]. Advances in Knowledge Discovery and Data Mining, 1996, 12(1):307-328.

TARJAN R E, TROJANOWSKI A E. Finding a maximum independent set[J]. SIAM Journal on Computing, 1977, 6(3):537-546.

XIAO M, NAGAMOCHI H. Exact algorithms for maximum independent set[J]. Information and Computation, 2017, 255:126-146.

LANCIA G, BAFNA V, ISTRAIL S, et al. SNPs problems, complexity, and algorithms[C]//European Symposium on Algorithms.Berlin:Springer, 2001.

FILIOL E, FRANC E, GUBBIOLI A, et al. Combinatorial optimisation of worm propagation on an unknown network[J]. International Journal of Computer Science, 2007, 2(2):124-130.

闫兴篡,殷建平,蔡志平,等.求图的最小顶点覆盖集的一个近似算法[J].哈尔滨工业大学学报, 2008, 40(7):1131-1135.

HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8):1735-1780.

DEY R, SALEM F M. Gate-variants of gated recurrent unit(GRU)neural networks[C]//2017 IEEE 60th International Midwest Symposium on Circuits and Systems(MWSCAS). Boston:IEEE, 2017.

SUBAKAN C, RAVANELLI M, CORNELL S, et al. Attention is all you need in speech separation[C]//2021 IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP). Online:IEEE, 2021.

OUYANG W, WANG Y, HAN S, et al. Improving generalization of deep reinforcement learning-based TSP solvers[C]//2021 IEEE Symposium Series on Computational Intelligence(SSCI). Online:IEEE, 2021.

FEI H, JI D, LI B, et al. Rethinking boundaries:end-to-end recognition of discontinuous mentions with pointer networks[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Online:AAAI, 2021.

WU F, SOUZA A, ZHANG T, et al. Simplifying graph convolutional networks[C]//International Conference on Machine Learning.Long Beach:PMLR, 2019.

XIE Y, ZHANG Y, GONG M, et al. Mgat:multi-view graph attention networks[J]. Neural Networks, 2020, 132:180-189.

PENG Y, LIN Y, JING X Y, et al. Enhanced graph isomorphism network for molecular admet properties prediction[J]. IEEE Access, 2020, 8:168344-168360.

DAI H, DAI B, SONG L. Discriminative embeddings of latent variable models for structured data[C]//International Conference on Machine Learning. New York:PMLR, 2016.

SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature,2016, 529(7587):484-489.

SCHRITTWIESER J, ANTONOGLOU I, HUBERT T, et al. Mastering Atari, Go, chess and shogi by planning with a learned model[J]. Nature, 2020, 588(7839):604-609.

MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]//International Conference on Machine Learning. New York:PMLR, 2016.

BELLMAN R. On the theory of dynamic programming[J]. Proceedings of the National Academy of Sciences, 1952, 38(8):716-719.

WATKINS C J C H, DAYAN P. Q-learning[J]. Machine Learning, 1992, 8(3):279-292.

MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015,518(7540):529-533.

SUTTON R S. Learning to predict by the methods of temporal differences[J]. Machine Learning, 1988, 3(1):9-44.

GAO G, JIN R. An end-to-end flow control method based on DQN[C]//2022 International Conference on Big Data, Information and Computer Network(BDICN). Sanya:IEEE, 2022.

PETERS J, VIJAYAKUMAR S, SCHAAL S. Reinforcement learning for humanoid robotics[C]//Proceedings of the Third IEEERAS International Conference on Humanoid Robots. Karlsruhe:IEEE, 2003.

WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine Learning,1992, 8(3):229-256.

WANG Y, HE H, TAN X. Truly proximal policy optimization[C]//Uncertainty in Artificial Intelligence. Online:PMLR, 2020.

BROWNE C B, POWLEY E, WHITEHOUSE D, et al. A survey of Monte Carlo tree search methods[J]. IEEE Transactions on Computational Intelligence and AI in Games, 2012, 4(1):1-43.

BELLO I, PHAM H, LE Q V, et al. Neural combinatorial optimization with reinforcement learning[C]//Proceedings of the 5th International Conference on Learning Representations(ICLR 2017). Toulon:ICLR, 2017.

DAI H J, KHALIL E B, ZHANG Y Y, et al. Learning combinatorial optimization algorithms over graphs[C]//Proceedings of the31st International Conference on Neural Information Processing Systems. Long Beach:Curran Associates Inc., 2017.

NAZARI M, OROOJLOOY A, TAKAC M, et al. Reinforcement learning for solving the vehicle routing problem[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal:Curran Associates Inc., 2018.

DEUDON M, COURNUT P, LACOSTE A, et al. Learning heuristics for the TSP by policy gradient[C]//International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research. Cham:Springer, 2018.

KOOL W, VAN HOOF H, WELLING M. Attention, learn to solve routing problems![C]//Proceedings of the 7th International Conference on Learning Representations(ICLR 2019). New Orleans:ICLR, 2019.

LIU F, ZENG G. Study of genetic algorithm with reinforcement learning to solve the TSP[J]. Expert Systems with Applications,2009, 36(3):6995-7001.

CAPPART Q, MOISAN T, ROUSSEAU L M, et al. Combining reinforcement learning and constraint programming for combinatorial optimization[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Online:AAAI, 2021.

DRORI I, KHARKAR A, SICKINGER W R, et al. Learning to solve combinatorial optimization problems on real-world graphs in linear time[C]//19th IEEE International Conference on Machine Learning and Applications(ICMLA). Online:IEEE, 2020.

XU Y, FANG M, CHEN L, et al. Reinforcement learning with multiple relational attention for solving vehicle routing problems[J].IEEE Transactions on Cybernetics, 2021, 52(10):11107-11120.

ZHANG R, PROKHORCHUK A, DAUWELS J. Deep reinforcement learning for traveling salesman problem with time windows and rejections[C]//International Joint Conference on Neural Networks(IJCNN). Online:IEEE, 2020.

BARRETT T, CLEMENTS W, FOERSTER J, et al. Exploratory combinatorial optimization with reinforcement learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Online:AAAI, 2020.

CAPPART Q, GOUTIERRE E, BERGMAN D, et al. Improving optimization bounds using machine learning:decision diagrams meet deep reinforcement learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Honolulu:AAAI, 2019.

TANG Y, AGRAWAL S, FAENZA Y. Reinforcement learning for integer programming:learning to cut[C]//International Conference on Machine Learning. Online:PMLR, 2020.

GU S, YANG Y. A deep learning algorithm for the max-cut problem based on pointer network structure with supervised learning and reinforcement learning strategies[J]. Mathematics, 2020, 8(2):298.

ZHAO H, SHE Q, ZHU C, et al. Online 3D bin packing with constrained deep reinforcement learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Online:AAAI, 2021.

DUAN L, HU H, QIAN Y, et al. A multi-task selected learning approach for solving 3D flexible bin packing problem[C]//Proceedings of the 18th International Conference on Autonomous Agents and Multi-Agent Systems. Montreal:IFAAMAS, 2019.

JIANG Y, CAO Z, ZHANG J. Solving 3D bin packing problem via multimodal deep reinforcement learning[C]//Proceedings of the 20th International Conference on Autonomous Agents and Multi-Agent Systems. Auckland:IFAAMAS, 2021.

SONG J, LANKA R, YUE Y, et al. Co-training for policy learning[C]//Uncertainty in Artificial Intelligence. Toronto:PMLR,2020.

ALIPOUR M M, ABDOLHOSSEINZADEH M. A multiagent reinforcement learning algorithm to solve the maximum independent set problem[J]. Multiagent and Grid Systems, 2020, 16(1):101-115.

Views

1168

下载量

CSCD

Alert me when the article has been cited

提交

Tools

Publicity Resources

具有互补特征学习框架和注意力特征融合模块的语音情感识别模型

Related Author

黄佩瑶

程慧慧

唐小煜

Related Institution

华南师范大学工学部电子与信息工程学院

华南师范大学物理学院

⁰