1. School of Software, Xinjiang University
2. School of Information Science and Engineering, Xinjiang University
Print publication: 2017
[1] Bian Chen, Yu Jiong, Xiu Weirong. Sliding-block data deduplication algorithm based on regression detection [J], 2017, 34(03): 259-266. DOI: 10.13568/j.cnki.651094.2017.03.002.
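With the arrival of the big data era, duplicate data accounts for a large share of what storage systems hold, and how to raise storage utilization while preserving data availability has long been a focus of research. Data deduplication is a storage-system optimization technique that identifies redundancy by comparing data fingerprints and deletes it, so that only unique data is kept. During block-level duplicate detection, a block with no matching fingerprint is treated as new data and stored. Our study finds, however, that unmatched blocks still contain a large amount of duplicate data; if the duplicate data inside unmatched blocks can be detected, the duplicate detection rate can be raised further. This paper proposes a sliding-block data deduplication algorithm based on regression detection. The algorithm performs regression detection on the unmatched blocks produced by the traditional sliding-block technique, determines the type of data operation by comparing the structural changes in the unmatched blocks, and then runs a different detection algorithm for each operation type, thereby removing the duplicate data inside unmatched blocks. Experiments show that the algorithm has reasonable time overhead and effectively improves the duplicate detection rate.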
Data reduction has become increasingly important in storage systems because of the explosive growth of digital data that has ushered in the big data era. Data deduplication is a storage-optimization technique that reduces the data footprint by eliminating redundant copies and storing only unique data. It rests on duplicate data detection, which divides files into parts, compares corresponding parts between files by their hash values, and identifies the redundant data. This paper proposes an efficient sliding blocking algorithm with regression checking for duplicate data detection (SBRC). For segments that fail to match, the algorithm continues to look for duplicate data inside the unmatched blocks, thus improving detection precision. Experimental results show that SBRC improves duplicate detection precision compared with the traditional sliding blocking (SB) algorithm and the content-defined chunking (CDC) algorithm, without introducing unnecessary cost or complexity.
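The abstract does not spell out the detection steps, so the following is only a minimal sketch of the traditional sliding-block baseline that SBRC builds on, not the authors' algorithm. It assumes a tiny fixed block size, SHA-1 fingerprints, and a byte-by-byte slide without the rolling weak checksum that practical implementations usually add; the names `BLOCK`, `fingerprint`, `index_blocks` and `sliding_block_scan` are hypothetical. The unmatched segments it returns are the regions that a regression check would examine further.

```python
import hashlib

BLOCK = 8  # hypothetical block size in bytes, kept tiny for illustration


def fingerprint(block: bytes) -> str:
    """Fingerprint of a block (SHA-1 is an assumption, not taken from the paper)."""
    return hashlib.sha1(block).hexdigest()


def index_blocks(data: bytes, size: int = BLOCK) -> set:
    """Fingerprints of the full fixed-size blocks of an already stored version."""
    return {
        fingerprint(data[i:i + size])
        for i in range(0, len(data) - size + 1, size)
    }


def sliding_block_scan(data: bytes, known: set, size: int = BLOCK):
    """Slide a window over data; return (matched, unmatched) segment lists.

    Each segment is an (offset, bytes) pair. On a fingerprint match the
    window jumps a whole block ahead; otherwise it advances by one byte.
    """
    matched, unmatched = [], []
    pos = 0        # start of the current window
    gap_start = 0  # start of the bytes not yet covered by any match
    while pos + size <= len(data):
        window = data[pos:pos + size]
        if fingerprint(window) in known:
            if gap_start < pos:
                # Bytes skipped over before this match found no fingerprint.
                unmatched.append((gap_start, data[gap_start:pos]))
            matched.append((pos, window))
            pos += size
            gap_start = pos
        else:
            pos += 1
    if gap_start < len(data):
        unmatched.append((gap_start, data[gap_start:]))
    return matched, unmatched


if __name__ == "__main__":
    old = b"AAAAAAAABBBBBBBBCCCCCCCC"
    new = b"AAAAAAAAxyBBBBBBBBCCCCCCCC"  # two bytes inserted after block 1
    matched, unmatched = sliding_block_scan(new, index_blocks(old))
    print([off for off, _ in matched])  # [0, 10, 18]
    print(unmatched)                    # [(8, b'xy')]
```

In this example the inserted bytes shift the later blocks, yet the sliding window still realigns and matches them. When a change falls inside a block instead, the whole block goes unmatched even though most of its bytes are already stored, which is the case the abstract targets with regression detection.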
