Data integration
Data integration combines data from multiple sources to form a coherent data store. The common issues here are as follows:
- Heterogeneous data: The sources share no common key on which their records can be joined
- Different definitions: This is an intrinsic issue; the same data is defined differently in different sources, for example under a different database schema
- Time synchronization: This concerns whether the data was gathered over the same time periods
- Legacy data: This refers to data left over from an old system
- Sociological factors: These limit what data can be gathered
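The schema and time-synchronization issues above can be made concrete with a small sketch. It assumes two hypothetical sources describing customers: one uses customer_id and kilograms, the other cust_number and pounds. Every field name, record and unit here is illustrative, not taken from a real dataset. The sketch maps the second source onto the first source's schema, then checks which time period both sources actually cover.

```python
from datetime import date

# Hypothetical records from two sources describing customers under
# different schemas (field names and units are illustrative only).
source_a = [
    {"customer_id": 1, "weight_kg": 70.0, "recorded": date(2020, 1, 15)},
    {"customer_id": 2, "weight_kg": 82.5, "recorded": date(2020, 6, 30)},
]
source_b = [
    {"cust_number": 1, "weight_lb": 154.0, "recorded": date(2020, 3, 1)},
    {"cust_number": 3, "weight_lb": 200.0, "recorded": date(2021, 2, 10)},
]

LB_PER_KG = 2.20462


def to_common_schema_b(row):
    """Map a source-B record onto source A's field names and units."""
    return {
        "customer_id": row["cust_number"],
        "weight_kg": round(row["weight_lb"] / LB_PER_KG, 1),
        "recorded": row["recorded"],
    }


def period(rows):
    """Return the (earliest, latest) recording dates of a source."""
    dates = [r["recorded"] for r in rows]
    return min(dates), max(dates)


def overlap(p, q):
    """Return the date range covered by both periods, or None if disjoint."""
    start, end = max(p[0], q[0]), min(p[1], q[1])
    return (start, end) if start <= end else None


unified_b = [to_common_schema_b(r) for r in source_b]
common = overlap(period(source_a), period(unified_b))
print(common)
```

Only records falling inside the common period (here March to June 2020) can be compared directly; data outside it reflects one source alone, which is exactly the time-synchronization problem.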
There are several approaches that deal with the above issues:
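- Entity identification problem: During schema integration and object matching, it is tricky to decide which entities from different sources are equivalent, for example whether customer_id in one database and cust_number in another refer to the same attribute. This is referred to as the entity identification problem.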
- Redundancy and correlation analysis: Some redundancies can ...
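Correlation analysis is a common way to detect redundant numeric attributes: if one attribute can be (nearly) derived from another, their Pearson correlation coefficient r = cov(A, B) / (σ_A σ_B) is close to +1 or -1. The following is a minimal sketch under stated assumptions; the sample table and the 0.9 cut-off are hypothetical choices, not values from the text.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric attributes."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two attributes of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("constant attribute: correlation undefined")
    return cov / (sx * sy)


def redundant_pairs(table, threshold=0.9):
    """Return attribute pairs whose |r| meets the (hypothetical) threshold."""
    names = list(table)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(table[a], table[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical integrated table: annual_income is derivable from monthly_income.
table = {
    "monthly_income": [2000, 3500, 4100, 5200, 6000],
    "annual_income": [24000, 42000, 49200, 62400, 72000],
    "age": [25, 41, 30, 52, 38],
}
print(redundant_pairs(table))
```

Here only monthly_income and annual_income are flagged (r = 1.0), suggesting one of them can be dropped after integration; age correlates only moderately with either (r ≈ 0.64) and is kept.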