CHAPTER 17 Looking to the Future

As I look down the road at the future of big data, data mining, and machine learning, I see many opportunities and some serious challenges. The biggest challenge will be balancing data privacy against reproducible research. I was asked to address this topic in a question-and-answer session during an invited talk at the 2013 Joint Statistical Meetings of the American Statistical Association. It is a difficult topic with no clear path forward. How we balance user privacy and reproducible research results is a decision that will affect all of us, both as consumers and as data users. It requires us as a society to define what our right to privacy actually includes. Striking this balance between competing interests raises legal questions as well as technical ones. Regardless of the outcome, organizations will feel significant impacts: they will need to comply with research standards, adhere to legislation, and fund enforcement of the laws.

REPRODUCIBLE RESEARCH

As my fourth-grade daughter can repeat from memory, the scientific method is: formulate a hypothesis, test it through an experiment, and finally analyze the results to determine whether you can reject the hypothesis. A key outcome of a scientifically performed study is that its results can be reproduced. Unfortunately, meeting this basic expectation has become increasingly rare. In October 2013, The Economist detailed the difficulty in getting ...
