Data Mapping and Data Mining Overview: Implications for Prevention, Detection, and Investigation

As mentioned in the previous chapter, we work in an industry that moves easily from terabytes (10^12 bytes) of data to petabytes (10^15 bytes) of data. One petabyte is about three years’ worth of Earth Observing System (EOS) data from NASA, and two petabytes is roughly all the information contained in all U.S. academic research libraries. Finding the relevant records in volumes like these is truly a search for a needle in a haystack. Roy Williams, in Data Powers of Ten, gives further examples of descriptive quantities (www2.sims.berkeley.edu/research/projects/how-much-info/datapowers.html). An exabyte (10^18 bytes) is roughly equivalent to the total volume of information generated worldwide annually; volumes such as a zettabyte (10^21 bytes) and a yottabyte (10^24 bytes) cannot even be visualized. The models and concepts presented in this book are effective in focusing efforts within large quantities of data. EDA tools provide the opportunity to analyze data that would otherwise take a lifetime, if not generations, to analyze.

The added emphasis on data volume underscores the need for EDA tools to get through large amounts of data. In addition, if we map our warehouse, mine it, and derive intelligence from it, then in the forensic world any conclusion we draw must be based on a defined scientific methodology or forensic standard; only then can it be presented as a conclusion drawn from a scientific methodology. The EDA process will ...

