16.2 Data Preparation

We draw on the latest version of the sysres EUPRO database. This database includes all information publicly available through the CORDIS projects database 1) and is maintained by ARC systems research (ARC sys). The sysres EUPRO database presently comprises data on funded research projects of the EU FPs (complete for FP1–FP5, and about 70% complete for FP6) and all participating organizations. It contains systematic information on project objectives and achievements, project costs, project funding, and contract type, as well as information on the participating organizations including the full name, full address, and type of organization.

For purposes of network analysis, the main challenge is the inconsistency of the raw data. Apart from inconsistent spelling in up to four languages per country, organizations are labeled inhomogeneously: entries range from large corporate groupings, such as EADS, Siemens, and Philips, and large public research organizations, such as CNR, CNRS, and CSIC, to individual departments and labs.
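To make the spelling problem concrete, the following minimal Python sketch groups raw name strings under a crude normalized key (accents stripped, case folded, punctuation dropped). It is an illustration only, not the cleaning procedure used for the database: the function names `normalize_name` and `group_candidates` and the sample strings are hypothetical and are not taken from the EUPRO data.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Build a crude matching key: strip accents, lowercase, drop punctuation."""
    # NFKD splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not a letter or digit with a space.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    # Collapse runs of whitespace.
    return " ".join(cleaned.split())


def group_candidates(names):
    """Group raw entries that share the same normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return dict(groups)


# Hypothetical raw entries, invented for illustration.
raw_entries = [
    "Siemens AG",
    "SIEMENS AG",
    "Centre National de la Recherche Scientifique",
    "CNRS",
    "Universität Wien",
    "Universitat Wien",
]

candidates = group_candidates(raw_entries)
for key, variants in candidates.items():
    print(key, "<-", variants)
```

In this sketch the case and accent variants collapse into one group each, but the acronym and the spelled-out name of the same organization remain separate, as would a department and its parent organization. Simple string normalization can therefore at best propose candidate groups for a human to review.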

Due to these shortcomings, the raw data are of limited use for meaningful network analysis. Moreover, a fully automated standardization procedure is infeasible. Instead, the database is built using a labor-intensive, manual data-cleaning process. This process is described in reference [34]; here, we restrict our discussion to the steps relevant to the present work. These are as follows.

  1. Identification of unique ...
