[1] Liu L, Pu C. XWRAP: an XML-enabled wrapper construction system for Web information sources [C] //Proceedings of the 16th IEEE International Conference on Data Engineering, 2000: 611-620.
[2] Ma Ling, Goharian N, Chowdhury A, et al. Extracting unstructured data from template generated Web documents [C] //Proceedings of the 12th International Conference on Information and Knowledge Management, 2003: 512-515.
[3] Mei Xue, Cheng Xueqi, Guo Yan, et al. Fully automatic Wrapper generation for web information extraction [J]. Journal of Chinese Information Processing, 2008, 22(1): 22-29 (in Chinese).
梅雪, 程学旗, 郭岩, 等. 一种全自动生成网页信息抽取Wrapper的方法 [J]. 中文信息学报, 2008, 22(1): 22-29.
[4] Sun Chengjie, Guan Yi. A statistical approach for content extraction from web page [J]. Journal of Chinese Information Processing, 2004, 18(5): 17-22 (in Chinese).
孙承杰, 关毅. 基于统计的网页正文信息抽取方法的研究 [J]. 中文信息学报, 2004, 18(5): 17-22.
[5] Sun Hao, Dong Shoubin. Adaptive approach for content extraction based on tag density [J]. Journal of Zhengzhou University, 2009, 41(1): 44-47 (in Chinese).
孙皓, 董守斌. 基于标签密度的自适应正文提取方法 [J]. 郑州大学学报, 2009, 41(1): 44-47.
[6] An Zengwen, Wang Chao, Xu Jiefeng. An approach based on machine learning for information extraction method [J]. Microcomputer & Its Applications, 2010(12): 4-6 (in Chinese).
安增文, 王超, 徐杰锋. 基于机器学习的网页正文提取方法 [J]. 微型机与应用, 2010(12): 4-6.
[7] You Guirong, Lu Yuchang. Extraction of topical information from Chinese web page based on the statistic and machine learning [J]. Journal of Fujian Commercial College, 2009, 4(2): 68-72 (in Chinese).
游贵荣, 陆玉昌. 基于统计和机器学习的中文Web网页正文内容抽取 [J]. 福建商业高等专科学校学报, 2009, 4(2): 68-72.