College of Mathematics and System Sciences, Xinjiang University
Print publication: 2023
[1] WANG Xuechun (王雪纯). Research on knowledge distillation regularization methods (in English)[J]. Journal of Xinjiang University (Natural Science Edition) (Chinese and English), 2023, 40(05):534-542+549. DOI: 10.13568/j.cnki.651094.651316.2023.02.26.0002.
In deep learning, regularization is an important tool for preventing overfitting and improving a model's generalization performance. Knowledge distillation (KD), a relatively new yet increasingly popular type of regularization, is a set of techniques in which soft labels generated by one model serve as supervisory signals to guide the training of another model. We first explain the fundamentals of KD regularization and categorize existing KD regularization strategies into two types, namely forward distillation and mutual distillation. For each type, we discuss its key components and representative methods in detail. We then compare the pros and cons of the two types and evaluate the generalization performance of the resulting models on the common benchmark of image classification, and we provide guidelines on how to choose an appropriate KD regularization technique for a specific task and scenario. Finally, we identify a number of key challenges and discuss future research directions for KD regularization.
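As a rough illustration of the two categories named in the abstract, the sketch below computes a forward-distillation loss (a teacher's soft labels supervise a student) and a mutual-distillation loss (two peers supervise each other). The paper's own formulations are not shown on this page, so everything here is an assumption: the temperature-scaled softmax and KL term follow the commonly used formulation from the general KD literature, and the function names, `temperature`, and `alpha` are hypothetical choices made for this example.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer labels."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(probs, label, eps=1e-12):
    """Cross-entropy of predicted probabilities against a hard class label."""
    return -math.log(probs[label] + eps)

def forward_kd_loss(student_logits, teacher_logits, label,
                    temperature=4.0, alpha=0.5):
    """Forward distillation (assumed form): hard-label loss plus a soft-label term.

    alpha weights the soft-label term; both alpha and temperature are
    illustrative hyperparameters, not values taken from the paper.
    """
    hard = cross_entropy(softmax(student_logits), label)
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # The T^2 factor keeps the soft-term scale comparable across temperatures.
    soft = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    return (1.0 - alpha) * hard + alpha * soft

def mutual_kd_losses(logits_a, logits_b, label, temperature=1.0):
    """Mutual distillation (assumed form): each peer mimics the other's soft labels."""
    soft_a = softmax(logits_a, temperature)
    soft_b = softmax(logits_b, temperature)
    loss_a = cross_entropy(softmax(logits_a), label) + kl_divergence(soft_b, soft_a)
    loss_b = cross_entropy(softmax(logits_b), label) + kl_divergence(soft_a, soft_b)
    return loss_a, loss_b

if __name__ == "__main__":
    student = [3.0, 1.0, 0.0]   # made-up logits for a 3-class problem
    teacher = [0.0, 1.0, 3.0]
    print(forward_kd_loss(student, teacher, label=2))
    print(mutual_kd_losses(student, teacher, label=2))
```

In the forward case only the student's loss depends on the other model, whereas in the mutual case both peers receive a loss, which is the structural difference between the two categories as they are usually described.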
