Recently, we were able to ask five questions of Daniel D. Gutierrez about his new book from Technics Publications called “Machine Learning and Data Science: An Introduction to Statistical Learning Methods with R.” Below, Daniel talks about his inspiration for writing, why he chose R as the language for the book, and who would benefit most from reading it.
1. What was the genesis of the book? Why did you choose to write it? I’ve been an educator for pretty much all of my professional life while also working in industry, and recently I had been teaching a number of corporate training courses on data science and R through UC Irvine Extension. I developed some educational content based on my teaching experience and on feedback from my students at large companies like Toyota and Southern California Edison. Over time, I sensed some common threads in how people took to the subject matter, such as recurring questions and pain points. I decided to formalize the content as a book, since there were no suitable books out there that addressed the needs of professionals trying to transition into the field from other disciplines. There were a number of good books, but they were too advanced; they weren’t a good launching pad.
2. Who will this book help to get into data science and machine learning? I found that many professionals from fields like finance, sales, marketing, product development and others needed a way to jump-start their entry into data science using methodologies like machine learning. I would get questions through places like LinkedIn, Quora, Twitter, etc. from people all over the world, and the central theme was the same: they needed a guide to the Data Science Process. I wrote the book to help these people.
3. Why did you choose the R language for the book? R is the statistical environment I use in my consulting practice for my own data science projects, so that’s the language I know best. Plus, R is getting a lot of attention worldwide as the number one choice for data scientists. I’m a member of the R Meetup group in my hometown of Los Angeles, and we have transplants from many countries, many of whom have used R for years. R now has over 7,500 packages that extend the language, which is one reason why R is so popular.
4. How did you decide on the organization of the book? Over the years, I’ve refined my own “Data Science Process,” so I organized the book to align with the process I use to approach machine learning projects. It has worked well for me. The book allows the reader to build up the beginnings of their own “data science toolbox” with techniques they can draw upon to meet specific needs. After reading the book, they’ll want to add many techniques to their toolbox to suit the specific problem domains in which they work. Over time, I hope they’ll have a very broad toolbox that can address most needs.
5. Can the book be used independently or would it help to take a class alongside? Yes, I think the book can be a standalone guide to getting started in the field; that’s how I wrote it. All you need is the book, the R code from the book, which is available from the publisher’s website, and some open source software: R and RStudio. But I think using the book in conjunction with a course is also a good idea. In fact, I’m using the book as the required text for a new online course I’m teaching for UC Davis Extension in January 2016.