Thesis length: 20,624 characters, 41 pages; accompanied by the thesis proposal report and the assignment statement.
Abstract
Research and Implementation of Intelligent Multilingual Consistency Judgment for WPS Program Resources
This paper studies the problems in multilingual conformance testing of WPS and defines consistency judgment at three levels: character, word, and semantic. The character level is implemented using Unicode encoding rules and the character set of the given language. The word level is implemented by segmenting each sentence into words and comparing those words against a standard dictionary. The semantic level is implemented with a segmentation-based N-gram language model built in this paper.
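The character and word levels described above can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: an English string is flagged if it contains characters from the CJK Unified Ideographs block (U+4E00 to U+9FFF), a Chinese string is flagged if it contains characters that Python's `gb2312` codec cannot encode, and words are segmented with forward maximum matching against a toy dictionary. All function names are hypothetical; this is not the tool's actual code.

```python
# Character level (assumed rule): the basic CJK Unified Ideographs block.
CJK_START, CJK_END = 0x4E00, 0x9FFF


def cjk_chars(text):
    """English edition: report characters from the CJK Unified Ideographs block."""
    return [ch for ch in text if CJK_START <= ord(ch) <= CJK_END]


def non_gb2312_chars(text):
    """Chinese edition: report characters that the GB2312 codec cannot encode."""
    bad = []
    for ch in text:
        try:
            ch.encode("gb2312")
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


def forward_max_match(sentence, dictionary, max_len):
    """Word level: segment a sentence with forward maximum matching."""
    words, i = [], 0
    while i < len(sentence):
        # Try the longest candidate first; a single character is the fallback,
        # so the loop always advances.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def unknown_words(sentence, dictionary, max_len=4):
    """Word level: return segments that do not appear in the dictionary."""
    segments = forward_max_match(sentence, dictionary, max_len)
    return [w for w in segments if w not in dictionary]
```

Forward maximum matching is used here only because it is the simplest dictionary-based segmenter. A typo such as 文挡 for 文档 breaks into single characters that the dictionary does not contain, and those fragments are what the word-level check reports.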
Finally, using the method in this paper, we developed a tool for English and Chinese consistency judgment in WPS. For English, dictionary lookup of word spellings is used to check word-level consistency within sentences. For Chinese, a statistical language model that captures the frequency of word pairs is used to check semantic consistency within sentences. The tool was applied to three WPS projects and found 33 errors in the English edition and 15 errors in the Chinese edition. These results show that the method in this paper is feasible and that the statistical language model is useful for multilingual consistency judgment.
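The two language-specific checks in the paragraph above can be sketched as follows, under assumptions of our own: English words are extracted with a simple regular expression and looked up in lowercase, and the Chinese word-pair model is a bigram model with add-one (Laplace) smoothing. The thesis's own smoothing method (Section 3.3.4) may differ, and the class name, method names, and threshold are hypothetical.

```python
import re
from collections import Counter


def misspelled_words(sentence, dictionary):
    """English: return words whose lowercase spelling is not in the dictionary.

    Non-letters such as '&' menu accelerators or '...' are skipped by the regex.
    """
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence)
    return [w for w in words if w.lower() not in dictionary]


class BigramModel:
    """Chinese: bigram model with add-one smoothing over segmented sentences."""

    START = "<s>"

    def __init__(self, corpus):
        self.contexts = Counter()  # C(prev): times a token is followed by another
        self.bigrams = Counter()   # C(prev, word)
        vocab = set()
        for words in corpus:
            tokens = [self.START] + list(words)
            vocab.update(words)
            self.contexts.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab)  # V: distinct words in the training corpus

    def prob(self, prev, word):
        # P(word | prev) = (C(prev, word) + 1) / (C(prev) + V)
        return (self.bigrams[(prev, word)] + 1) / (self.contexts[prev] + self.vocab_size)

    def suspicious_pairs(self, words, threshold):
        """Return adjacent word pairs whose smoothed probability is below threshold."""
        tokens = [self.START] + list(words)
        return [(a, b) for a, b in zip(tokens, tokens[1:]) if self.prob(a, b) < threshold]
```

Add-one smoothing is shown because it is the simplest way to keep unseen word pairs from receiving zero probability. A pair is flagged only relative to the threshold, which would need tuning on real WPS resource strings.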
Keywords: Consistency Judgment, N-Gram, Language Model, Word Segmentation, Multilingual
Contents
1. Introduction
1.1 Background and Purpose of the Project
1.2 Research Status in China and Abroad
1.3 Research Methods
1.4 Research Content of This Thesis
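2. Introduction to Character Encoding
2.1 From ASCII to Unicode
2.2 Unicode, UCS, and UTF
2.3 CJK Unified Ideographs
2.3.1 Development of CJK
2.3.2 The Source Separation Rule
2.3.3 CJK Code Ranges
2.4 The Role of Unicode and GB2312 in This Thesis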
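3. Word Segmentation and Statistical Language Models
3.1 Introduction to Chinese Word Segmentation
3.1.1 Forward Maximum Matching
3.1.2 Backward Maximum Matching
3.1.3 Minimum Segmentation
3.1.4 Bidirectional Matching
3.1.5 Difficulties in Chinese Word Segmentation
3.2 Statistical Language Models
3.2.1 A Brief History of Statistical Language Models
3.2.2 Development of Statistical Language Models
3.2.3 Building a Simple Statistical Language Model
3.3 The N-Gram Statistical Language Model
3.3.1 Overview of the N-Gram Language Model [2]
3.3.2 Building a Bigram Model [6]
3.3.3 The Data Sparseness Problem in N-Gram Models
3.3.4 Data Smoothing for N-Gram Models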
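4. Implementation of Intelligent Multilingual Consistency Judgment for WPS Program Resources
4.1 Extracting Resource Strings
4.2 Implementing English Consistency Judgment
4.2.1 Character-Level Consistency Judgment for English
4.2.2 Word- and Grammar-Level Consistency Judgment for English
4.3 Implementing Chinese Consistency Judgment
4.3.1 Character-Level Consistency Judgment for Chinese
4.3.2 Word- and Grammar-Level Consistency Judgment for Chinese
4.4 Experimental Results and Analysis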
Conclusions and Future Work
Acknowledgements
References