2.2 Data analysis

2.2.1 Overview

The sources to be integrated differ in structure, semantics, and presentation. The goal of data analysis is to eliminate these differences by identifying the similarities and conflicts among the data sources, building metadata, and then defining transformation rules or functions. Data analysis also has to cleanse erroneous data, that is, data that violate the integrity constraints or are factually inconsistent [3].

There are two approaches to data analysis: data anatomy and data mining. Data anatomy examines the actual data in order to discover semantic information such as data types, units, and data domains. Data mining aims to find relationships among attributes, constraints among data, and so on; it should be performed on large-scale storage.

Once this information is available, the next step for the analyst is to deal with the conflicts among the heteromerous data sources. At a high level there are two kinds of conflicts:

1. Schema Conflicts

Schema conflicts can be divided into two kinds: similar schema conflicts and heteromerous schema conflicts.

(1) Similar schema conflicts. For example, an attribute of table A is composed of several attributes of table B. Another example, at the level of tables, is that the data in table C are the union of tables A and B. In the sample systems, Dormitory MIS has additional attributes, which should be added to the data warehouse during integration.

(2) Heteromerous schema conflicts. The sample systems illustrate this. Students' basic information includes an attribute called political status. In Student MIS the value is stored directly in a field, whereas in Dormitory MIS the values are kept in a separate table and the key of each record is used to refer to the political status value.

2. Semantic Conflicts

Semantic conflicts include schemas' semantic conflicts and attributes' semantic conflicts. A schema's semantic conflict is another name for a schema conflict. Attributes' semantic conflicts can be classified as follows:

(1) Type conflict: the type or length of the fields differs.
(2) Naming conflict: even a simple field can have several names, depending on the vocabulary and habits of the database designers. For example, the name of an enterprise may be stored under "company" or "corporation".
(3) Data unit conflict: for example, a person's height can be expressed in meters or in centimeters.
(4) Data precision conflict: different systems do not require the same precision for the information they manage. For numeric information, the number of decimal digits may be two, three, or even four, all of which are reasonable.
(5) Format conflict: the classic example is the date format, which can be "YYMMDD", "YYYYMMDD", and so on.

Other, more specific categories are omitted here.

2.2.2 Database design of source applications

In this design, the first step of data analysis is to examine the current design of the source applications, Student MIS and Dormitory MIS. The most important materials are the SRS and the Data Model Design Document. If the final implementation matches the design and the design meets the requirements, this step does not take much time. The work consists of understanding the current design and identifying the similarities and differences between the two systems, in order to determine which entities are common, what can be reused, and what needs to be changed for better support.

The entities common to both systems can be grouped into four groups:

1. Student political information, native place information, nation information;
2. Student kind information, state information;
3. Academic basic information, major basic information, class basic information;
4. Student basic information, campus information, family information, the change histories of major student information.

More detailed comparison results are shown in the following tables. Table 2.1 shows the differences between the tables in Student MIS and Dormitory MIS, while Table 2.2 and Table 2.3 show the differences between the fields of student basic information and of the change histories of major student information, respectively.

Table 2.1 Schema Conflicts
Table 2.2 Semantic conflicts of student basic information
Table 2.3 Semantic conflicts of change histories of major student information
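
The attribute-level semantic conflicts listed in Section 2.2.1 (naming, data unit, data precision, format) are usually resolved by per-field transformation rules. The following is a minimal sketch in Python, not the implementation used in this design. The field names (`stu_name`, `student_name`, `height_cm`, `height_m`, `enroll_date`), the sample records, and the choice of meters, two decimal digits, and "YYYYMMDD" as the unified representations are all assumptions made for illustration.

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical naming rules: source field name -> unified field name.
NAME_MAP = {"stu_name": "name", "student_name": "name",
            "height_cm": "height", "height_m": "height"}


def unify_height(value, unit):
    """Data unit conflict: convert a height to meters (assumed target unit)."""
    factors = {"m": Decimal("1"), "cm": Decimal("0.01")}
    return Decimal(str(value)) * factors[unit]


def unify_precision(value, digits=2):
    """Data precision conflict: round to a fixed number of decimal digits."""
    quantum = Decimal(10) ** -digits  # digits=2 -> Decimal("0.01")
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


def unify_date(text):
    """Format conflict: accept "YYMMDD" or "YYYYMMDD", emit "YYYYMMDD".

    Two-digit years are expanded by strptime's own pivot rule, which is
    an assumption that may not suit every data set.
    """
    fmt = {6: "%y%m%d", 8: "%Y%m%d"}.get(len(text))
    if fmt is None:
        raise ValueError(f"unsupported date format: {text!r}")
    return datetime.strptime(text, fmt).strftime("%Y%m%d")


def transform_record(record):
    """Apply the naming, unit, precision, and format rules to one record."""
    out = {}
    for field, value in record.items():
        target = NAME_MAP.get(field, field)  # naming conflict
        if field == "height_cm":
            value = unify_precision(unify_height(value, "cm"))
        elif field == "height_m":
            value = unify_precision(unify_height(value, "m"))
        elif field == "enroll_date":
            value = unify_date(value)
        out[target] = value
    return out


# Invented sample records describing the same student in both systems.
student_mis = {"stu_name": "Li Lei", "height_cm": 175,
               "enroll_date": "050901"}
dormitory_mis = {"student_name": "Li Lei", "height_m": 1.753,
                 "enroll_date": "20050901"}

print(transform_record(student_mis))
print(transform_record(dormitory_mis))
```

After transformation the two records carry the same field names and comparable values, so they can be matched or merged in the data warehouse. `Decimal` is used instead of binary floats so that rounding to a fixed precision is exact.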
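
The heteromerous schema conflict around political status (a plain field in Student MIS, a key into a separate table in Dormitory MIS) can be handled by resolving the key against the lookup table before loading, so that both sources produce the same warehouse row. The sketch below assumes hypothetical column names (`student_id`, `political_status`, `political_status_id`) and invented lookup values; the real schemas of the two systems are not reproduced here.

```python
# Hypothetical Dormitory MIS lookup table: record key -> status value.
# The keys and values are invented for illustration only.
POLITICAL_STATUS = {1: "status A", 2: "status B", 3: "status C"}


def from_student_mis(row):
    """Student MIS: the value is stored directly in a field."""
    return {"student_id": row["student_id"],
            "political_status": row["political_status"]}


def from_dormitory_mis(row, lookup):
    """Dormitory MIS: the row holds a key; resolve it via the lookup table."""
    key = row["political_status_id"]
    if key not in lookup:
        # A dangling key is a data-cleansing problem, so fail loudly here.
        raise KeyError(f"unknown political status key: {key!r}")
    return {"student_id": row["student_id"],
            "political_status": lookup[key]}


# The same (invented) student as seen by each source system.
student_row = {"student_id": "S001", "political_status": "status B"}
dormitory_row = {"student_id": "S001", "political_status_id": 2}

unified_a = from_student_mis(student_row)
unified_b = from_dormitory_mis(dormitory_row, POLITICAL_STATUS)
print(unified_a == unified_b)
```

Resolving the key at extraction time keeps the warehouse schema flat. An alternative design, equally valid, would keep the lookup table in the warehouse and map the Student MIS values to keys instead.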