Learning Python for Forensics

Book Description

Learn the art of designing, developing, and deploying innovative forensic solutions through Python

About This Book

  • This practical guide will help you solve forensic dilemmas through the development of Python scripts

  • Analyze Python scripts to extract metadata and investigate forensic artifacts

  • Master the skills of parsing complex data structures by taking advantage of Python libraries

Who This Book Is For

If you are a forensics student, hobbyist, or professional who wants to deepen your understanding of forensics through the use of a programming language, then this book is for you.

You do not need previous programming experience to learn and master the content of this book. This material, created by forensic professionals, was written with a unique perspective on, and understanding of, examiners who wish to learn programming.

What You Will Learn

  • Discover how to perform Python script development

  • Update yourself by learning the best practices in forensic programming

  • Build scripts through an iterative design

  • Explore the rapid development of specialized scripts

  • Understand how to leverage forensic libraries developed by the community

  • Design flexibly to accommodate present and future hurdles

  • Conduct effective and efficient investigations through programmatic pre-analysis

  • Discover how to transform raw data into customized reports and visualizations

In Detail

This book illustrates how and why you should learn Python to strengthen your analysis skills and efficiency as you creatively solve real-world problems through instruction-based tutorials. The tutorials use an interactive design, giving you experience of the development process so you gain a better understanding of what it means to be a forensic developer.

Each chapter walks you through a forensic artifact and one or more methods to analyze the evidence, and explains why one method may be advantageous over another. We cover common digital forensics and incident response scenarios, with scripts that can be used to tackle casework in the field. Using built-in and community-sourced libraries, you will improve your problem-solving skills by adding the Python scripting language to your toolkit. We also provide resources for further exploration of each script so you can understand what other purposes Python can serve. With this knowledge, you can rapidly develop and deploy solutions to identify critical information and fine-tune your skill set as an examiner.

Style and approach

The book begins by instructing you on the basics of Python, followed by chapters that include scripts targeted at forensic casework. Each script is described step by step at an introductory level, gradually demonstrating more of Python's available functionality.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at If you purchased this book elsewhere, you can visit and register to have the code file.

Table of Contents

    1. Learning Python for Forensics
      1. Table of Contents
      2. Learning Python for Forensics
      3. Credits
      4. About the Authors
      5. Acknowledgments
      6. About the Reviewer
        1. Support files, eBooks, discount offers, and more
          1. Why subscribe?
          2. Free access for Packt account holders
      8. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Downloading the color images of this book
          3. Errata
          4. Piracy
          5. Questions
      9. 1. Now For Something Completely Different
        1. When to use Python?
        2. Getting started
        3. Standard data types
          1. Strings and Unicode
          2. Integers and floats
          3. Booleans and None
          4. Structured data types
            1. Lists
            2. Dictionaries
            3. Sets and tuples
        4. Data type conversions
        5. Files
        6. Variables
        7. Understanding scripting flow logic
          1. Conditionals
          2. Loops
            1. For
            2. While
        8. Functions
        9. Summary
      10. 2. Python Fundamentals
        1. Advanced data types and functions
          1. Iterators
          2. Datetime objects
        2. Libraries
          1. Installing third-party libraries
          2. Libraries in this book
          3. Python packages
        3. Classes and object-oriented programming
        4. Try and except
          1. Raise
        5. Creating our first script –
        6. User input
          1. Using the raw input method and the system module –
          2. Understanding Argparse –
        7. Forensic scripting best practices
        8. Developing our first forensic script –
          1. Understanding the main() function
          2. Exploring the getRecord() function
          3. Interpreting the searchKey() function
          4. Running our first forensic script
        9. Troubleshooting
        10. Challenge
        11. Summary
      11. 3. Parsing Text Files
        1. Setup API
        2. Introducing our script
          1. Overview
        3. Our first iteration –
          1. Designing the main() function
          2. Crafting the parseSetupapi() function
          3. Developing the printOutput() function
          4. Running the script
        4. Our second iteration –
          1. Improving the main() function
          2. Tuning the parseSetupapi() function
          3. Modifying the printOutput() function
          4. Running the script
        5. Our final iteration –
          1. Extending the main() function
          2. Adding to the parseSetupapi() function
          3. Creating the parseDeviceInfo() function
          4. Forming the prepUSBLookup() function
          5. Constructing the getDeviceNames() function
          6. Enhancing the printOutput() function
          7. Running the script
        6. Additional challenges
        7. Summary
      12. 4. Working with Serialized Data Structures
        1. Serialized data structures
        2. A simple Bitcoin Web API
        3. Our first iteration –
          1. Exploring the main() function
          2. Understanding the getAddress() function
          3. Working with the printTransactions() function
          4. The printHeader() helper function
          5. The getInputs() helper function
          6. Running the script
        4. Our second iteration –
          1. Modifying the main() function
          2. Improving the getAddress() function
          3. Elaborating on the printTransactions() function
          4. Running the script
        5. Mastering our final iteration –
          1. Enhancing the parseTransactions() function
          2. Developing the csvWriter() function
          3. Running the script
          4. Additional challenges
        6. Summary
      13. 5. Databases in Python
        1. An overview of databases
        2. Using SQLite3
          1. Using the Structured Query Language
        3. Designing our script
        4. Manually manipulating databases with Python –
          1. Building the main() function
          2. Initializing the database with the initDB() function
          3. Checking for custodians with the getOrAddCustodian() function
          4. Retrieving custodians with the getCustodian() function
          5. Understanding the ingestDirectory() function
            1. Exploring the os.stat() method
          6. Developing the formatTimestamp() helper function
          7. Configuring the writeOutput() function
          8. Designing the writeCSV() function
          9. Composing the writeHTML() function
          10. Running the script
        5. Further automating databases –
          1. Peewee setup
          2. Jinja2 setup
          3. Updating the main() function
          4. Adjusting the initDB() function
          5. Modifying the getOrAddCustodian() function
          6. Improving the ingestDirectory() function
          7. A closer look at the formatTimestamp() function
          8. Converting the writeOutput() function
          9. Simplifying the writeCSV() function
          10. Condensing the writeHTML() function
          11. Running our new and improved script
        6. Challenge
        7. Summary
      14. 6. Extracting Artifacts from Binary Files
        1. UserAssist
          1. Understanding the ROT-13 substitution cipher –
          2. Evaluating code with timeit
        2. Working with the Registry module
        3. Introducing the Struct module
        4. Creating spreadsheets with the xlsxwriter module
          1. Adding data to a spreadsheet
          2. Building a table
          3. Creating charts with Python
        5. The UserAssist framework
          1. Developing our UserAssist logic processor –
            1. Evaluating the main() function
            2. Defining the createDictionary() function
            3. Extracting data with the parseValues() function
            4. Processing strings with the getName() function
          2. Writing Excel spreadsheets –
            1. Controlling output with the excelWriter() function
            2. Summarizing data with the dashboardWriter() function
            3. Writing artifacts in the userassistWriter() function
            4. Defining the fileTime() function
            5. Processing integers with the sortByCount() function
            6. Processing DateTime objects with the sortByDate() function
          3. Writing generic spreadsheets –
            1. Understanding the csvWriter() function
        6. Running the UserAssist framework
        7. Additional challenges
        8. Summary
      15. 7. Fuzzy Hashing
        1. Background on hashing
          1. Hashing files in Python
          2. Deep dive into rolling hashes
            1. Implementing rolling hashes –
            2. Limitations of rolling hashes
          3. Exploring fuzzy hashing –
          4. Starting with the main function
          5. Working with files in the fileController() function
          6. Working with directories in the directoryController() function
          7. Generating fuzzy hashes with the fuzzFile() function
          8. Exploring the compareFuzzies() function
          9. Creating reports with the writer() function
          10. Running the first iteration
        2. Using SSDeep in Python –
          1. Revisiting the main() function
          2. The new fileController() function
          3. Repurposing the directoryController() function
          4. Demonstrating changes in the writer() function
          5. Running the second iteration
        3. Additional challenges
        4. Citations
        5. Summary
      16. 8. The Media Age
        1. Creating frameworks in Python
        2. Introduction to EXIF metadata
          1. Introducing the Pillow module
        3. Introduction to ID3 metadata
          1. Introducing the Mutagen module
        4. Introduction to Office metadata
          1. Introducing the lxml module
        5. Metadata_Parser framework overview
          1. Our main framework controller –
          2. Controlling our framework with the main() function
        6. Parsing EXIF metadata –
          1. Understanding the exifParser() function
          2. Developing the getTags() function
          3. Adding the dmsToDecimal() function
        7. Parsing ID3 metadata –
          1. Understanding the id3Parser() function
          2. Revisiting the getTags() function
        8. Parsing Office metadata –
          1. Evaluating the officeParser() function
          2. The getTags() function for the last time
        9. Moving on to our writers
          1. Writing spreadsheets –
          2. Plotting GPS data with Google Earth –
          3. Supporting our framework with processors
            1. Creating framework-wide utility functions –
        10. Framework summary
        11. Additional challenges
        12. Summary
      17. 9. Uncovering Time
        1. About timestamps
          1. What is epoch?
        2. Using a GUI
          1. Basics of Tkinter objects
            1. Implementation of the Tkinter GUI
            2. Using Frame objects
          2. Using classes in Tkinter
        3. Developing the Date Decoder GUI –
          1. The DateDecoder class setup and __init__() method
          2. Executing the run() method
          3. Implementing the buildInputFrame() method
          4. Creating the buildOutputFrame() method
          5. Building the convert() method
          6. Defining the convert_unix_seconds() method
          7. Conversion using the convertWindowsFiletime_64() method
          8. Converting with the convertChromeTimestamps() method
          9. Designing the output method
          10. Running the script
        4. Additional challenges
        5. Summary
      18. 10. Did Someone Say Keylogger?
        1. A detailed look at keyloggers
          1. Hardware keyloggers
          2. Software keyloggers
            1. Detecting malicious processes
        2. Building a keylogger for Windows
          1. Using the Windows API
            1. PyWin32
            2. PyHooks
            3. WMI
          2. Monitoring keyboard events
          3. Capturing screenshots
          4. Capturing the clipboard
          5. Monitoring processes
        3. Multiprocessing in Python –
        4. Running Python without a command window
        5. Exploring the code
          1. Capturing the screen
          2. Capturing the clipboard
          3. Capturing the keyboard
          4. Keylogger controllers
          5. Capturing processes
          6. Understanding the main() function
          7. Running the script
        6. Citations
        7. Additional challenges
        8. Summary
      19. 11. Parsing Outlook PST Containers
        1. The Personal Storage Table File Format
        2. An introduction to libpff
          1. How to install libpff and pypff
        3. Exploring PSTs –
          1. An overview
          2. Developing the main() function
          3. Evaluating the makePath() helper function
          4. Iteration with the folderTraverse() function
          5. Identifying messages with the checkForMessages() function
          6. Processing messages in the processMessage() function
          7. Summarizing data in the folderReport() function
          8. Understanding the wordStats() function
          9. Creating the wordReport() function
          10. Building the senderReport() function
          11. Refining the heat map with the dateReport() function
          12. Writing the HTMLReport() function
          13. The HTML template
        4. Running the script
        5. Additional challenges
        6. Summary
      20. 12. Recovering Transient Database Records
        1. SQLite WAL files
          1. WAL format and technical specifications
            1. The WAL header
            2. The WAL frame
          2. The WAL cell and varints
          3. Manipulating large objects in Python
        2. Regular expressions in Python
        3. TQDM – a simpler progress bar
        4. Parsing WAL files –
          1. Understanding the main() function
          2. Developing the frameParser() function
          3. Processing cells with the cellParser() function
          4. Writing the dictHelper() function
            1. The Python debugger – pdb
          5. Processing varints with the singleVarint() function
          6. Processing varints with the multiVarint() function
          7. Converting serial types with the typeHelper() function
          8. Writing output with the csvWriter() function
          9. Using regular expression in the regularSearch() function
        5. Executing
        6. Challenge
        7. Summary
      21. 13. Coming Full Circle
        1. Frameworks
          1. Building a framework structure to last
          2. Data standardization
          3. Forensic frameworks
        2. Colorama
        3. FIGlet
        4. Exploring the framework –
          1. Exploring the Framework object
            1. Understanding the Framework __init__() constructor
            2. Creating the Framework run() method
            3. Iterating through files with the Framework _list_files() method
            4. Developing the Framework _run_plugins() method
          2. Exploring the Plugin object
            1. Understanding the Plugin __init__() constructor
            2. Working with the Plugin run() method
            3. Handling output with the Plugin write() method
          3. Exploring the Writer object
            1. Understanding the Writer __init__() constructor
            2. Understanding the Writer run() method
          4. Our Final CSV writer –
          5. The writer –
          6. Changes made to plugins
          7. Executing the framework
          8. Additional challenges
        5. Summary
      22. A. Installing Python
        1. Python for Windows
        2. Python for OS X and Linux
      23. B. Python Technical Details
        1. The Python installation folder
          1. The Doc folder
          2. The Lib folder
          3. The Scripts folder
          4. The Python interpreter
            1. Python modules
      24. C. Troubleshooting Exceptions
        1. AttributeError
        2. ImportError
        3. IndentationError
        4. IOError
        5. IndexError
        6. KeyError
        7. NameError
        8. TypeError
        9. ValueError
        10. UnicodeEncodeError and UnicodeDecodeError
      25. Index