Association rule mining over massive shipbuilding data suffers from low efficiency because the transaction sets occupy too much space. To address this problem, the Eclat algorithm is used, which converts the merging of transactions into set operations on a vertical database, and an LBM-Eclat algorithm based on a locality-sensitive hash bitmap (LBM) storage structure is proposed. LBM-Eclat combines two data structures, locality-sensitive hashing and bitmaps, and can dynamically adjust its internal storage structure as the amount of stored data changes. Comparative experiments show that LBM-Eclat effectively improves mining efficiency on dense data sets while reducing the space consumed during mining.
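To make the vertical-database idea concrete, the following is a minimal sketch of plain Eclat with bitmap tidsets. It is not the LBM structure from the paper, whose internals are not given here: the locality-sensitive hashing layer and the dynamic switching of storage structures are omitted. The names `build_vertical`, `support`, and `eclat` are hypothetical, and a Python integer stands in for the bitmap.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple


def build_vertical(transactions: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Build the vertical layout: item -> bitmap of transaction ids.

    Bit t of the integer is set when transaction t contains the item.
    (Assumption: a plain int is used as the bitmap, not the paper's LBM.)
    """
    bitmaps: Dict[str, int] = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(bitmap: int) -> int:
    """Support count = number of set bits in the tidset bitmap."""
    return bin(bitmap).count("1")


def eclat(
    transactions: Sequence[Sequence[str]], min_support: int
) -> Dict[FrozenSet[str], int]:
    """Depth-first Eclat: joining two itemsets is a bitwise AND of tidsets."""
    vertical = build_vertical(transactions)
    frequent: Dict[FrozenSet[str], int] = {}

    def extend(prefix: Tuple[str, ...], candidates: List[Tuple[str, int]]) -> None:
        for i, (item, bitmap) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[frozenset(itemset)] = support(bitmap)
            # Intersect with every later candidate in the same equivalence class.
            narrowed: List[Tuple[str, int]] = []
            for other, other_bitmap in candidates[i + 1:]:
                joined = bitmap & other_bitmap
                if support(joined) >= min_support:
                    narrowed.append((other, joined))
            if narrowed:
                extend(itemset, narrowed)

    # Start from the frequent single items, in a fixed order.
    roots = sorted(
        (item, bitmap)
        for item, bitmap in vertical.items()
        if support(bitmap) >= min_support
    )
    extend((), roots)
    return frequent
```

Calling `eclat(transactions, min_support)` returns a dictionary from each frequent itemset to its support count. Because every join is a single AND over fixed-width bits, the cost of this representation grows with the number of transactions rather than with the number of occurrences of an item, which is one reason bitmaps tend to suit dense data sets.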
2022, 44(20): 143-148. Received: 2022-08-05
DOI:10.3404/j.issn.1672-7649.2022.20.029
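Classification number: TP311
About the author: XU Peng (b. 1983), male, Ph.D., research fellow; research interest: intelligent information processing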