Wrapping everything in a pipeline

To conclude, we will discuss how to wrap the transformation and selection operations we have seen so far into a single command: a pipeline that takes your data from its source to your machine learning algorithm.

Wrapping all your data operations into a single command offers some advantages:

  • Your code becomes clearer and more logically structured, because pipelines force you to express your operations as functions (each step a function)
  • You treat the test data in exactly the same way as the training data, without repeating code and without the risk of a mistake creeping into the process
  • You can easily grid-search the best parameters across all the steps of the pipelines you devised, not just the machine learning hyperparameters ...
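
The three advantages above can be illustrated with a minimal sketch, assuming scikit-learn is the library in use. The synthetic dataset, the step names (`scale`, `select`, `model`), the choice of estimators, and the parameter values are all assumptions made for illustration and are not taken from the text.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, generated only so the example is self-contained
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Each step is a named object: transformation, selection, then the learner
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Parameters of any step can be searched using the "<step>__<param>" syntax,
# so the grid covers the selection step as well as the model
param_grid = {
    "select__k": [5, 10, 20],
    "model__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_train, y_train)

# The fitted pipeline applies the same scaling and selection to the test data
test_accuracy = search.score(X_test, y_test)
best_params = search.best_params_
print(best_params, test_accuracy)
```

Because the scaler and the selector are fitted inside the pipeline, they only ever see the training folds during the search, and the test set is transformed with the statistics learned from the training data.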
