

浏览全部资源
扫码关注微信
新疆大学电气工程学院
Published:2023
移动端阅览
[1]张宏立,朱家政,董颖超.强化学习求解组合优化问题的研究综述[J].新疆大学学报(自然科学版)(中英文),2023,40(02):129-141.
[1]张宏立,朱家政,董颖超.强化学习求解组合优化问题的研究综述[J].新疆大学学报(自然科学版)(中英文),2023,40(02):129-141. DOI: 10.13568/j.cnki.651094.651316.2023.02.02.0001.
DOI:10.13568/j.cnki.651094.651316.2023.02.02.0001.
组合优化问题广泛的存在于生产实践的各个领域,解决组合优化问题的主要手段通常包括使用由领域专家人工设计的启发式算法以及设计成熟的求解器,按照一定顺序构建一个解决方案.而随着实际问题复杂度逐渐的增加,这类方法无法于在线求解方面取得很好的效果,得到的结果可能往往是次优的.而强化学习给出了一个很好的替代方案,通过对智能体模型的良好训练,迅速地对此类问题进行求解.故回顾了近年来将强化学习框架应用于组合优化问题的研究,对其基本原理、相关方法、应用研究进行总结和综述,并指出未来该方向亟待解决的若干问题.
Combinatorial optimization problems are widespread in all areas of production practice
and the main means of solving combinatorial optimization problems usually include the use of heuristic algorithms manually designed by domain experts and the design of sophisticated solvers to construct a solution in a certain order. As the complexity of the actual problem gradually increases
such methods can not achieve good results in online solving
and the results obtained may often be suboptimal. And reinforcement learning gives a good alternative to solve such problems rapidly by well-training an intelligent body model. Therefore
this paper reviews the recent research on applying the reinforcement learning framework to combinatorial optimization problems
summarizes and reviews its basic principles
related methods and application studies
and points out several problems that need to be addressed in this direction in the future.
BELLMAN R. A Markovian decision process[J]. Journal of Mathematics and Mechanics, 1957, 6:679-684.
章炯民,陶增乐,吴文娟.组合优化问题的神经网络解――装箱问题和背包问题的求解[J].华东师范大学学报(自然科学版),1998, 4:102-105.
ZHANG C, SONG W, CAO Z, et al. Learning to dispatch for job shop scheduling via deep reinforcement learning[J]. Advances in Neural Information Processing Systems, 2020, 33:1621-1632.
ZHUANG W W, CHEN C, QIU G X. A new deep reinforcement learning model for dynamic portfolio optimization[J]. Journal of University of Science and Technology of China, 2022, 52(11):15-28.
RUSKONE-FOURMESTRAUX A, DELMER A, LAVERGNE A, et al. Multiple lymphomatous polyposis of the gastrointestinal tract:prospective clinicopathologic study of 31 cases[J]. Gastroenterology, 1997, 112(1):7-16.
MANUAL C U. IBM ILOG CPLEX optimization studio[J]. Version, 1987, 12:1-586.
MAHER S, MILTENBERGER M, PEDROSO J P, et al. PySCIPOpt:mathematical programming in python with the SCIP optimization suite[C]//Mathematical Software-ICMS 2016:5th International Conference. Berlin:Springer, 2016.
BELLM E C, KULKARNI S R, BARLOW T, et al. The zwicky transient facility:surveys and scheduler[J]. Publications of the Astronomical Society of the Pacific, 2019, 131(1000):068003.
GRAY M A. Sage:a new mathematics software system[J]. Computing in Science and Engineering, 2008, 10(6):72-75.
DANTZIG G B. Linear programming[J]. Operations Research, 2002, 50(1):42-47.
BALAS E. Projection, lifting and extended formulation in integer and combinatorial optimization[J]. Annals of Operations Research, 2005, 140(1):125.
FLOOD M M. The traveling-salesman problem[J]. Operations Research, 1956, 4(1):61-75.
WONG R T. Combinatorial optimization:algorithms and complexity(christos h. papadimitriou and kenneth steiglitz)[J]. SIAM Review, 1983, 25(3):424-425.
HELD M, KARP R M. A dynamic programming approach to sequencing problems[J]. Journal of the Society for Industrial and Applied Mathematics, 1962, 10(1):196-210.
DANTZIG G, FULKERSON R, JOHNSON S. Solution of a large-scale traveling-salesman problem[J]. Journal of the Operations Research Society of America, 1954, 2(4):393-410.
PERDOMO-ORTIZ A, DICKSON N, DREW-BROOK M, et al. Finding low-energy conformations of lattice protein models by quantum annealing[J]. Scientific Reports, 2012, 2(1):1-7.
AOUNI B, COLAPINTO C, LA TORRE D. Financial portfolio management through the goal programming model:current state-of-the-art[J]. European Journal of Operational Research, 2014, 234(2):536-545.
BARAHONA F. On the computational complexity of Ising spin glass models[J]. Journal of Physics A:Mathematical and General,1982, 15(10):3241-3253.
JONES N D. Space-bounded reducibility among combinatorial problems[J]. Journal of Computer and System Sciences, 1975,11(1):68-85.
PAPA D A, MARKOV I L. Hypergraph partitioning and clustering[J]. Handbook of Approximation Algorithms and Metaheuristics,2007, 20073547:1-15.
GOEMANS M X, WILLIAMSON D P. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming[J]. Journal of the ACM, 1995, 42(6):1115-1145.
WU Y, LI W K, GOH M, et al. Three-dimensional bin packing problem with variable bin height[J]. European Journal of Operational Research, 2010, 202(2):347-355.
VARNAMKHASTI M J. Overview of the algorithms for solving the multidimensional knapsack problems[J]. Advanced Studies in Biology, 2012, 4(1):37-47.
BLUM C, ROLI A. Metaheuristics in combinatorial optimization:overview and conceptual comparison[J]. ACM Computing Surveys, 2003, 35(3):268-308.
MARTELLO S, TOTH P. Lower bounds and reduction procedures for the bin packing problem[J]. Discrete Applied Mathematics,1990, 28(1):59-70.
FEO T A, RESENDE M G C, SMITH S H. A greedy randomized adaptive search procedure for maximum independent set[J].Operations Research, 1994, 42(5):860-878.
GARDINER E J, WILLETT P, ARTYMIUK P J. Graph-theoretic techniques for macromolecular docking[J]. Journal of Chemical Information and Computer Sciences, 2000, 40(2):273-279.
AGRAWAL R, MANNILA H, SRIKANT R, et al. Fast discovery of association rules[J]. Advances in Knowledge Discovery and Data Mining, 1996, 12(1):307-328.
TARJAN R E, TROJANOWSKI A E. Finding a maximum independent set[J]. SIAM Journal on Computing, 1977, 6(3):537-546.
XIAO M, NAGAMOCHI H. Exact algorithms for maximum independent set[J]. Information and Computation, 2017, 255:126-146.
LANCIA G, BAFNA V, ISTRAIL S, et al. SNPs problems, complexity, and algorithms[C]//European Symposium on Algorithms.Berlin:Springer, 2001.
FILIOL E, FRANC E, GUBBIOLI A, et al. Combinatorial optimisation of worm propagation on an unknown network[J]. International Journal of Computer Science, 2007, 2(2):124-130.
闫兴篡,殷建平,蔡志平,等.求图的最小顶点覆盖集的一个近似算法[J].哈尔滨工业大学学报, 2008, 40(7):1131-1135.
HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8):1735-1780.
DEY R, SALEM F M. Gate-variants of gated recurrent unit(GRU)neural networks[C]//2017 IEEE 60th International Midwest Symposium on Circuits and Systems(MWSCAS). Boston:IEEE, 2017.
SUBAKAN C, RAVANELLI M, CORNELL S, et al. Attention is all you need in speech separation[C]//2021 IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP). Online:IEEE, 2021.
OUYANG W, WANG Y, HAN S, et al. Improving generalization of deep reinforcement learning-based TSP solvers[C]//2021 IEEE Symposium Series on Computational Intelligence(SSCI). Online:IEEE, 2021.
FEI H, JI D, LI B, et al. Rethinking boundaries:end-to-end recognition of discontinuous mentions with pointer networks[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Online:AAAI, 2021.
WU F, SOUZA A, ZHANG T, et al. Simplifying graph convolutional networks[C]//International Conference on Machine Learning.Long Beach:PMLR, 2019.
XIE Y, ZHANG Y, GONG M, et al. Mgat:multi-view graph attention networks[J]. Neural Networks, 2020, 132:180-189.
PENG Y, LIN Y, JING X Y, et al. Enhanced graph isomorphism network for molecular admet properties prediction[J]. IEEE Access, 2020, 8:168344-168360.
DAI H, DAI B, SONG L. Discriminative embeddings of latent variable models for structured data[C]//International Conference on Machine Learning. New York:PMLR, 2016.
SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature,2016, 529(7587):484-489.
SCHRITTWIESER J, ANTONOGLOU I, HUBERT T, et al. Mastering Atari, Go, chess and shogi by planning with a learned model[J]. Nature, 2020, 588(7839):604-609.
MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]//International Conference on Machine Learning. New York:PMLR, 2016.
BELLMAN R. On the theory of dynamic programming[J]. Proceedings of the National Academy of Sciences, 1952, 38(8):716-719.
WATKINS C J C H, DAYAN P. Q-learning[J]. Machine Learning, 1992, 8(3):279-292.
MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015,518(7540):529-533.
SUTTON R S. Learning to predict by the methods of temporal differences[J]. Machine Learning, 1988, 3(1):9-44.
GAO G, JIN R. An end-to-end flow control method based on DQN[C]//2022 International Conference on Big Data, Information and Computer Network(BDICN). Sanya:IEEE, 2022.
PETERS J, VIJAYAKUMAR S, SCHAAL S. Reinforcement learning for humanoid robotics[C]//Proceedings of the Third IEEERAS International Conference on Humanoid Robots. Karlsruhe:IEEE, 2003.
WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine Learning,1992, 8(3):229-256.
WANG Y, HE H, TAN X. Truly proximal policy optimization[C]//Uncertainty in Artificial Intelligence. Online:PMLR, 2020.
BROWNE C B, POWLEY E, WHITEHOUSE D, et al. A survey of Monte Carlo tree search methods[J]. IEEE Transactions on Computational Intelligence and AI in Games, 2012, 4(1):1-43.
BELLO I, PHAM H, LE Q V, et al. Neural combinatorial optimization with reinforcement learning[C]//Proceedings of the 5th International Conference on Learning Representations(ICLR 2017). Toulon:ICLR, 2017.
DAI H J, KHALIL E B, ZHANG Y Y, et al. Learning combinatorial optimization algorithms over graphs[C]//Proceedings of the31st International Conference on Neural Information Processing Systems. Long Beach:Curran Associates Inc., 2017.
NAZARI M, OROOJLOOY A, TAKAC M, et al. Reinforcement learning for solving the vehicle routing problem[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal:Curran Associates Inc., 2018.
DEUDON M, COURNUT P, LACOSTE A, et al. Learning heuristics for the TSP by policy gradient[C]//International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research. Cham:Springer, 2018.
KOOL W, VAN HOOF H, WELLING M. Attention, learn to solve routing problems![C]//Proceedings of the 7th International Conference on Learning Representations(ICLR 2019). New Orleans:ICLR, 2019.
LIU F, ZENG G. Study of genetic algorithm with reinforcement learning to solve the TSP[J]. Expert Systems with Applications,2009, 36(3):6995-7001.
CAPPART Q, MOISAN T, ROUSSEAU L M, et al. Combining reinforcement learning and constraint programming for combinatorial optimization[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Online:AAAI, 2021.
DRORI I, KHARKAR A, SICKINGER W R, et al. Learning to solve combinatorial optimization problems on real-world graphs in linear time[C]//19th IEEE International Conference on Machine Learning and Applications(ICMLA). Online:IEEE, 2020.
XU Y, FANG M, CHEN L, et al. Reinforcement learning with multiple relational attention for solving vehicle routing problems[J].IEEE Transactions on Cybernetics, 2021, 52(10):11107-11120.
ZHANG R, PROKHORCHUK A, DAUWELS J. Deep reinforcement learning for traveling salesman problem with time windows and rejections[C]//International Joint Conference on Neural Networks(IJCNN). Online:IEEE, 2020.
BARRETT T, CLEMENTS W, FOERSTER J, et al. Exploratory combinatorial optimization with reinforcement learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Online:AAAI, 2020.
CAPPART Q, GOUTIERRE E, BERGMAN D, et al. Improving optimization bounds using machine learning:decision diagrams meet deep reinforcement learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Honolulu:AAAI, 2019.
TANG Y, AGRAWAL S, FAENZA Y. Reinforcement learning for integer programming:learning to cut[C]//International Conference on Machine Learning. Online:PMLR, 2020.
GU S, YANG Y. A deep learning algorithm for the max-cut problem based on pointer network structure with supervised learning and reinforcement learning strategies[J]. Mathematics, 2020, 8(2):298.
ZHAO H, SHE Q, ZHU C, et al. Online 3D bin packing with constrained deep reinforcement learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Online:AAAI, 2021.
DUAN L, HU H, QIAN Y, et al. A multi-task selected learning approach for solving 3D flexible bin packing problem[C]//Proceedings of the 18th International Conference on Autonomous Agents and Multi-Agent Systems. Montreal:IFAAMAS, 2019.
JIANG Y, CAO Z, ZHANG J. Solving 3D bin packing problem via multimodal deep reinforcement learning[C]//Proceedings of the 20th International Conference on Autonomous Agents and Multi-Agent Systems. Auckland:IFAAMAS, 2021.
SONG J, LANKA R, YUE Y, et al. Co-training for policy learning[C]//Uncertainty in Artificial Intelligence. Toronto:PMLR,2020.
ALIPOUR M M, ABDOLHOSSEINZADEH M. A multiagent reinforcement learning algorithm to solve the maximum independent set problem[J]. Multiagent and Grid Systems, 2020, 16(1):101-115.
0
Views
1168
下载量
0
CSCD
Publicity Resources
Related Articles
Related Author
Related Institution
京公网安备11010802024621