Data integration

Data integration combines data from multiple sources to form a coherent data store. The common issues here are as follows:

  • Heterogeneous data: The sources share no common key on which records can be joined
  • Different definitions: The same data is defined differently across sources, for example under different database schemas
  • Time synchronization: The data may not have been gathered over the same time periods
  • Legacy data: This refers to data left over from an old system
  • Sociological factors: These limit what data can be gathered
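
As a minimal sketch of the first three issues, the following Python snippet maps two hypothetical sources with different schemas onto one common layout and keeps only the records that fall in a shared time period. The field names (`cust_id`, `customer`, `total`, `day`), the sample records, and the helper functions are all assumptions made for illustration. They do not come from any real system.

```python
from datetime import date

# Hypothetical source A: integer key "cust_id", ISO date strings.
source_a = [
    {"cust_id": 1, "amount": 10.0, "date": "2015-01-15"},
    {"cust_id": 2, "amount": 20.0, "date": "2014-12-30"},
]

# Hypothetical source B: same kind of data under a different schema
# (string key "customer", string amounts, dates as tuples).
source_b = [
    {"customer": "1", "total": "5.5", "day": (2015, 1, 20)},
    {"customer": "3", "total": "7.0", "day": (2015, 2, 1)},
]


def from_a(rec):
    """Map a source A record onto the common schema."""
    return {
        "customer_id": int(rec["cust_id"]),
        "amount": float(rec["amount"]),
        "day": date.fromisoformat(rec["date"]),
    }


def from_b(rec):
    """Map a source B record onto the common schema."""
    return {
        "customer_id": int(rec["customer"]),
        "amount": float(rec["total"]),
        "day": date(*rec["day"]),
    }


def integrate(start, end):
    """Unify both sources and keep records gathered within [start, end]."""
    unified = [from_a(r) for r in source_a] + [from_b(r) for r in source_b]
    return [r for r in unified if start <= r["day"] <= end]


# Keep only January 2015, a period assumed to be covered by both sources.
combined = integrate(date(2015, 1, 1), date(2015, 1, 31))
```

Once both sources use the same key name and types, records for the same customer can be compared or joined directly. The date filter is one simple way to make sure the combined data covers the same period.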

Several considerations arise when dealing with these issues:

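  • Entity identification problem: Schema integration and object matching are tricky, because the same real-world entity can appear under different names or keys in different sources. This is referred to as the entity identification problem.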
  • Redundancy and correlation analysis: Some redundancies can ...
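
The two items above can be illustrated with a small Python sketch, under stated assumptions. The `match_entities` helper pairs records whose names agree after a simple normalization, which is a deliberately simplistic stand-in for real object matching. The `pearson` helper computes the Pearson correlation coefficient between two numeric attributes, where a value close to 1 or -1 suggests that one attribute may be redundant given the other. The record layouts, sample values, and function names are hypothetical.

```python
import math


def normalize_name(name):
    """Lowercase and collapse whitespace so trivially different spellings agree."""
    return " ".join(name.lower().split())


def match_entities(left, right):
    """Pair records from two sources whose normalized names are equal."""
    index = {normalize_name(r["name"]): r for r in right}
    pairs = []
    for rec in left:
        other = index.get(normalize_name(rec["name"]))
        if other is not None:
            pairs.append((rec, other))
    return pairs


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical records for the same people held in two systems.
crm = [{"name": "Ada  Lovelace"}, {"name": "Alan Turing"}]
billing = [{"name": "ada lovelace"}, {"name": "Grace Hopper"}]
matches = match_entities(crm, billing)

# Two hypothetical attributes that carry the same information in different units.
price_usd = [10.0, 20.0, 30.0, 40.0]
price_cents = [1000.0, 2000.0, 3000.0, 4000.0]
r = pearson(price_usd, price_cents)
```

Here only the "Ada Lovelace" records are paired, and the two price attributes are perfectly correlated, so keeping both would add no information. In practice, matching usually needs more than exact name comparison, and the cutoff for treating a correlation as redundancy is a judgment call.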
