DevOps Troubleshooting: Linux® Server Best Practices

Book Description

“If you’re a developer trying to figure out why your application is not responding at 3 am, you need this book! This is now my go-to book when diagnosing production issues. It has saved me hours in troubleshooting complicated operations problems.”

Trotter Cashion, cofounder, Mashion

DevOps can help developers, QAs, and admins work together to solve Linux server problems far more rapidly, significantly improving IT performance, availability, and efficiency. To gain these benefits, however, team members need common troubleshooting skills and practices.

In DevOps Troubleshooting: Linux Server Best Practices, award-winning Linux expert Kyle Rankin brings together all the standardized, repeatable techniques your team needs to stop finger-pointing, collaborate effectively, and quickly solve virtually any Linux server problem. Rankin walks you through using DevOps techniques to troubleshoot everything from boot failures and corrupt disks to lost email and downed websites. You’ll master indispensable skills for diagnosing high-load systems and network problems in production environments.

Rankin shows how to:

  • Master DevOps’ approach to troubleshooting and proven Linux server problem-solving principles

  • Diagnose slow servers and applications by identifying CPU, RAM, and Disk I/O bottlenecks

  • Understand healthy boots, so you can identify failure points and fix them

  • Solve full or corrupt disk issues that prevent disk writes

  • Track down the sources of network problems

  • Troubleshoot DNS, email, and other network services

  • Isolate and diagnose Apache and Nginx Web server failures and slowdowns

  • Solve problems with MySQL and Postgres database servers and queries

  • Identify hardware failures, even notoriously elusive intermittent failures

Table of Contents

    1. Title Page
    2. Copyright Page
    3. Dedication Page
    4. Contents
    5. Preface
    6. Acknowledgments
    7. About the Author
    8. Chapter 1. Troubleshooting Best Practices
      1. Divide the Problem Space
      2. Practice Good Communication When Collaborating
      3. Favor Quick, Simple Tests over Slow, Complex Tests
      4. Favor Past Solutions
      5. Document Your Problems and Solutions
      6. Know What Changed
      7. Understand How Systems Work
      8. Use the Internet, but Carefully
      9. Resist Rebooting
    9. Chapter 2. Why Is the Server So Slow? Running Out of CPU, RAM, and Disk I/O
      1. System Load
      2. Diagnose Load Problems with top
      3. Troubleshoot High Load after the Fact
    10. Chapter 3. Why Won’t the System Boot? Solving Boot Problems
      1. The Linux Boot Process
      2. BIOS Boot Order
      3. Fix GRUB
      4. Disable Splash Screens
      5. Can’t Mount the Root File System
      6. Can’t Mount Secondary File Systems
    11. Chapter 4. Why Can’t I Write to the Disk? Solving Full or Corrupt Disk Issues
      1. When the Disk Is Full
      2. Out of Inodes
      3. The File System Is Read-Only
      4. Repair Corrupted File Systems
      5. Repair Software RAID
    12. Chapter 5. Is the Server Down? Tracking Down the Source of Network Problems
      1. Server A Can’t Talk to Server B
      2. Troubleshoot Slow Networks
      3. Packet Captures
    13. Chapter 6. Why Won’t the Hostnames Resolve? Solving DNS Server Issues
      1. DNS Client Troubleshooting
      2. DNS Server Troubleshooting
    14. Chapter 7. Why Didn’t My Email Go Through? Tracing Email Problems
      1. Trace an Email Request
      2. Understand Email Headers
      3. Problems Sending Email
      4. Problems Receiving Email
    15. Chapter 8. Is the Website Down? Tracking Down Web Server Problems
      1. Is the Server Running?
      2. Test a Web Server from the Command Line
      3. HTTP Status Codes
      4. Parse Web Server Logs
      5. Get Web Server Statistics
      6. Solve Common Web Server Problems
    16. Chapter 9. Why Is the Database Slow? Tracking Down Database Problems
      1. Search Database Logs
      2. Is the Database Running?
      3. Get Database Metrics
      4. Identify Slow Queries
    17. Chapter 10. It’s the Hardware’s Fault! Diagnosing Common Hardware Problems
      1. The Hard Drive Is Dying
      2. Test RAM for Errors
      3. Network Card Failures
      4. The Server Is Too Hot
      5. Power Supply Failures
    18. Index