Agile Data Science 2.0 by Russell Jurney

Chapter 2. Agile Tools

This chapter briefly introduces our software stack, which is optimized for our process.

By the end of this chapter, you’ll be collecting, storing, processing, publishing, and decorating data. Our stack enables one person to do all of this, to go “full stack.”

[Figure: Agile Data Science data processing flow]

Full stack skills are among the most in-demand skills for data scientists. We’ll cover a lot, and quickly, but don’t worry: I will continue to demonstrate this software stack in Chapters 5 through 11. You need only understand the basics now; you will get more comfortable later.
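The five stages named above form a single flow, from raw records to something a person can look at. As a toy illustration only, the sketch below walks a few hypothetical page-view records through those stages using nothing but the Python standard library. It is not the book's stack: the functions collect, store, process, publish, and decorate, the sample records, and the use of a JSON Lines file for storage are all assumptions made for illustration, and "decorate" is read here, as an assumption, as rendering results for presentation.

```python
import json
import os
import tempfile
from collections import defaultdict

# Hypothetical sample records (assumption): page views with time on page in seconds.
RAW_EVENTS = [
    {"page": "/home", "seconds": 12},
    {"page": "/about", "seconds": 4},
    {"page": "/home", "seconds": 30},
]


def collect():
    """Collect: gather raw records (here, a hard-coded list)."""
    return list(RAW_EVENTS)


def store(records, path):
    """Store: write records to disk as JSON Lines, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def process(path):
    """Process: read the JSON Lines file back and average seconds per page."""
    totals = defaultdict(lambda: [0, 0])  # page -> [sum of seconds, count]
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            totals[record["page"]][0] += record["seconds"]
            totals[record["page"]][1] += 1
    return {page: s / n for page, (s, n) in totals.items()}


def publish(results):
    """Publish: hand results to whatever the web layer reads (here, a sorted dict)."""
    return dict(sorted(results.items()))


def decorate(published):
    """Decorate (assumed meaning): render published results as an HTML table."""
    rows = "".join(
        f"<tr><td>{page}</td><td>{avg:.1f}</td></tr>"
        for page, avg in published.items()
    )
    return f"<table><tr><th>Page</th><th>Avg seconds</th></tr>{rows}</table>"


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
    store(collect(), path)
    print(decorate(publish(process(path))))
```

Each stage only consumes the previous stage's output, which is what lets one person own the whole flow; in a real deployment each function would be swapped for a dedicated tool without changing that shape.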

We begin with instructions for running our stack in local mode on your own machine. In the next chapter, you’ll learn how to scale this same stack in the cloud via Amazon Web Services. Let’s get started!

Code examples for this chapter are available at https://github.com/rjurney/Agile_Data_Code_2/tree/master/ch02. Clone the repository and follow along!

git clone https://github.com/rjurney/Agile_Data_Code_2.git

Scalability = Simplicity

As NoSQL databases like MongoDB, distributed processing tools like Spark and Hadoop, and the fields of data science and big data have developed, much attention has gone to the plumbing of analytics applications. However, this is not a book about infrastructure. This book teaches you to build applications that use such infrastructure. Once we introduce our stack, we will take this plumbing for granted and build applications that depend on it. Thus, this book devotes only two chapters to infrastructure: one ...