The principle of "garbage in, garbage out" applies to far more than manufacturing. Dirty data can doom your predictive analytics project from the very start! In this video, Matt North shows you how to identify flaws such as statistical outliers and missing values, so you can improve the usefulness and reliability of your results.
Using RapidMiner, Matt starts by importing a data set and examining it to confirm that it imports correctly with the right data types. You will learn how to quickly identify outliers and missing values, and then correct those problems by applying filters to your data import. Business and data analysts who use data for predictive modeling will find these techniques useful. A basic understanding of statistics and of how data is organized and represented will help you get the most out of this video.
- Learn how to identify and handle missing values in data imports in RapidMiner.
- Learn how to identify and handle statistical outliers in RapidMiner.
- Understand techniques for evaluating data quality.
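In RapidMiner these checks are typically done with visual operators rather than hand-written code. As a rough, tool-neutral illustration of the same two ideas, the Python sketch below flags missing entries and flags outliers with the common 1.5 × IQR rule, then filters both out. The IQR rule, the helper names, and the sample ages are assumptions for illustration only, not the method shown in the video.

```python
import statistics
from typing import Optional, Sequence

# Hypothetical helpers for illustration only; not RapidMiner functionality.


def find_missing(values: Sequence[Optional[float]]) -> list[int]:
    """Return the positions of missing (None) entries."""
    return [i for i, v in enumerate(values) if v is None]


def find_outliers_iqr(values: Sequence[Optional[float]], k: float = 1.5) -> list[int]:
    """Return positions of values outside [Q1 - k*IQR, Q3 + k*IQR].

    Missing entries are ignored when computing the quartiles.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v is not None and not low <= v <= high]


def filter_clean(values: Sequence[Optional[float]]) -> list[float]:
    """Keep only entries that are neither missing nor outliers (a simple filter)."""
    bad = set(find_missing(values)) | set(find_outliers_iqr(values))
    return [v for i, v in enumerate(values) if i not in bad]


ages = [34, 29, None, 41, 38, 250, 31, None, 36]
print(find_missing(ages))       # [2, 7]
print(find_outliers_iqr(ages))  # [5]
print(filter_clean(ages))       # [34, 29, 41, 38, 31, 36]
```

Dropping rows is only one option: replacing a missing value (for example, with the attribute's mean) keeps more records, and deciding whether an extreme value is an error or a genuine case still requires judgment about the data.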
Matt North is a professor of Information Systems at Utah Valley University, where he teaches courses on data analytics and on database development, administration and security. He holds degrees from BYU, Utah State University and West Virginia University. He served as a Fulbright appointee at Universidad Tecnológica Nacional in Argentina, and is the recipient of the International Association for Computer Information Systems' Ben Bauman Award for Excellence and the Gamma Sigma Alpha Outstanding Professor Award. He is the author of numerous articles, published papers and book chapters, in addition to two books: "Data Mining for the Masses" and "Life Lessons and Leadership".
Other videos in this series:
- Does Correlation Prove Causation in Predictive Analytics?
- How Do I Choose the Correct Predictive Model for My Organizational Questions?