3.2 DATA SOURCES

The quality of the data is the single most important factor influencing the quality of the results from any analysis. The data should be reliable and should represent the defined target population. Data is often collected to answer specific questions using the following types of studies:

  • Surveys or polls: A survey or poll can be useful for gathering data to answer specific questions. An interview using a set of predefined questions is usually conducted over the phone, in person, or over the Internet. Surveys are often used to elicit information on people's opinions, preferences, and behavior. For example, a poll may be used to understand how a population of eligible voters will cast their votes in an upcoming election. The specific questions to be answered, along with the target population, should be clearly defined prior to any survey. Any bias in the survey should be eliminated. To achieve this, a true random sample of the target population should be taken. Bias can be introduced when only those who respond to the questionnaire are included in the survey, since this group may not represent an unbiased random sample. The questionnaire should contain no leading questions, that is, questions that favor a particular response. It is also important that no bias is introduced by the time at which the survey was conducted. The sample of the population used in the survey should be large enough to answer the questions with confidence. This will be described in ...
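
The difference between a true random sample and a self-selected group of respondents can be illustrated with a small simulation. The following is a minimal sketch, not a method taken from the text: the population size, the 50/50 split of opinion, and the response rates are all hypothetical values chosen for illustration, and the variable names are assumptions.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical target population: 10,000 eligible voters, exactly half of
# whom favor candidate A (True) and half of whom do not (False).
population = [True] * 5_000 + [False] * 5_000
true_share = sum(population) / len(population)

# Simple random sample: every voter has the same chance of being selected.
random_sample = rng.sample(population, 1_000)
random_estimate = sum(random_sample) / len(random_sample)

# Self-selected respondents: assume (for illustration only) that supporters
# of candidate A are three times as likely to return the questionnaire.
response_rate = {True: 0.6, False: 0.2}
respondents = [v for v in population if rng.random() < response_rate[v]]
self_selected_estimate = sum(respondents) / len(respondents)

print(f"true share of support:       {true_share:.3f}")
print(f"random-sample estimate:      {random_estimate:.3f}")
print(f"self-selected estimate:      {self_selected_estimate:.3f}")
```

Under these assumed response rates, the random-sample estimate should land close to the true share, while the self-selected estimate overstates support because the respondents are not an unbiased sample of the target population.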
