In Chapter 1, you considered what data science is and is not, and saw how data science is more than data analysis, computer science, or statistics. This chapter further explores data science as a new discipline.
The chapter begins by considering two of the most important issues associated with big data. It then works through some real-life examples of big data techniques and considers some of the communication issues involved in an effective big data team environment. Finally, it considers how statistics is, and will continue to be, part of data science, and touches on the elements of the big data ecosystem.
Two issues associated with big data must be discussed and understood: the “curse” of big data and rapid data flow. Each is discussed in the following sections.
The “curse” of big data is the danger of recklessly applying and scaling data science techniques that have worked well for small, medium, and large data sets but do not necessarily work well for big data. This problem is well illustrated by the flaws found in big data trading (for which solutions are proposed in this chapter).
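A small simulation can make this danger concrete. The sketch below is an illustration under stated assumptions, not the chapter's method: every "metric" is pure Gaussian noise, and Pearson correlation with a random target stands in for "finding a pattern." The helper names `pearson` and `strongest_spurious_correlation` are hypothetical. Because no metric carries any real signal, every correlation the search reports is a coincidence, yet the strongest one keeps growing as more metrics are screened.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strongest_spurious_correlation(num_metrics, num_points, seed=0):
    """Return the largest |r| between a random target and num_metrics
    independent random metrics (illustrative assumption: all pure noise).

    The target is drawn first and the metrics afterwards from the same
    seeded generator, so a run with more metrics screens a superset of
    the metrics screened by a run with fewer."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(num_points)]
    best = 0.0
    for _ in range(num_metrics):
        metric = [rng.gauss(0, 1) for _ in range(num_points)]
        best = max(best, abs(pearson(metric, target)))
    return best


if __name__ == "__main__":
    # Same 50 observations, increasingly many candidate metrics.
    for m in (10, 100, 1000):
        print(m, round(strongest_spurious_correlation(m, 50), 3))
```

The seeded generator is a design choice that keeps runs comparable: with the same seed, screening 1,000 metrics always includes the first 100, so the maximum correlation can only stay the same or rise as the search widens, which mirrors what happens when a team keeps adding metrics to a big data search.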
In short, the curse of big data is that when you search for patterns in large data sets with billions or trillions of data points and thousands of metrics, you are bound to identify coincidences that have no predictive power. Even worse, the strongest patterns might