Data in the shipping industry is growing rapidly, and mining the information contained in this big data can make the management of ship operations more precise and efficient. This paper proposes a parallel FP-growth data mining algorithm based on load balancing (BPFP-growth). The algorithm improves the storage of the item set tree by assigning a TID to each item, groups the data for parallel processing based on mirror reconstruction and a load factor, mines the frequent item sets of each grouped subset on its own parallel partition node, and obtains the complete set of frequent item sets by taking the union of the partial results. Experiments show that the algorithm parallelizes and scales well, and that it can effectively mine ship management and resource allocation data, supporting precise management, optimized resource allocation, and the high-quality development of the shipping industry.
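The group, mine, and union pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It makes these assumptions:

- TID-set intersection stands in for the improved FP-tree.
- The load estimate (an item's rank in the frequency ordering, plus one) is a hypothetical stand-in for the paper's load factor.
- Mirror reconstruction is not modeled.
- Threads stand in for the parallel partition nodes.
- The function names `tid_lists`, `balanced_groups`, `mine_group`, and `bpfp_sketch` are illustrative only.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def tid_lists(transactions):
    """Map each item to the set of transaction IDs (TIDs) that contain it."""
    tids = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in set(items):
            tids[item].add(tid)
    return tids


def balanced_groups(order, n_groups):
    """Greedily spread items over groups so the estimated loads stay even.

    `order` lists frequent items by descending support. The load of the item
    at position `rank` is assumed to be rank + 1 (a hypothetical estimate):
    rarer items can be extended by more of the items ranked before them.
    """
    groups = [[] for _ in range(n_groups)]
    loads = [0] * n_groups
    for rank in range(len(order) - 1, -1, -1):  # heaviest estimate first
        target = loads.index(min(loads))        # least-loaded group so far
        groups[target].append(order[rank])
        loads[target] += rank + 1
    return groups


def mine_group(group, order, tids, min_support):
    """Find frequent item sets whose least frequent item belongs to `group`."""
    rank = {item: i for i, item in enumerate(order)}
    found = {}

    def extend(itemset, common, max_rank):
        found[frozenset(itemset)] = len(common)
        # Only extend with items ranked strictly earlier, so every item set
        # is generated exactly once, by the group that owns its last item.
        for cand in order[:max_rank]:
            shared = common & tids[cand]
            if len(shared) >= min_support:
                extend(itemset + [cand], shared, rank[cand])

    for item in group:
        extend([item], tids[item], rank[item])
    return found


def bpfp_sketch(transactions, min_support, n_groups):
    """Group items, mine each group in parallel, and union the results."""
    tids = tid_lists(transactions)
    order = sorted(
        (item for item in tids if len(tids[item]) >= min_support),
        key=lambda item: (-len(tids[item]), item),
    )
    groups = balanced_groups(order, n_groups)
    with ThreadPoolExecutor(max_workers=n_groups) as pool:
        partials = list(
            pool.map(lambda g: mine_group(g, order, tids, min_support), groups)
        )
    result = {}
    for partial in partials:  # groups own disjoint item sets, so no conflicts
        result.update(partial)
    return result
```

Each frequent item set is owned by the group holding its lowest-support item. The partial results are therefore disjoint, and their union is the full answer regardless of the number of groups.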
2019, 41(11): 184-187. Received: 2019-08-29
DOI:10.3404/j.issn.1672-7649.2019.11.039
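Classification code: TP311
Funding: Natural Science Research Project of Jiangsu Higher Education Institutions (18KJB413009); Wuxi Science and Technology Development Innovation Fund (WX18IBG624)
About the author: SHANG Hong (尚弘) (b. 1977), female, master's degree, lecturer; research interests: Internet of Things and data analysis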