Storing and Managing Data in HDFS

WHAT YOU WILL LEARN IN THIS CHAPTER

  • Getting to Know the History and Fundamentals of HDFS
  • Interacting with HDFS to Manage Files
  • Administering HDFS Environments
  • Managing Your HDFS Data

This chapter discusses the basics of storing and managing data for use in your big data system. Your storage options depend on whether you are using the HDInsight Service or the Hortonworks distribution. Both offer the Hadoop Distributed File System (HDFS), which is the standard file storage mechanism. HDInsight also offers Azure Storage Vault (ASV), which presents a full HDFS interface while using Azure Blob storage “under the hood.”

Quite a bit of complexity underlies the full HDFS implementation, and a complete description of it would take a book of its own. This chapter instead focuses on the core knowledge you need to use HDFS effectively. Where appropriate, it also provides some details of what happens under the hood to help you understand how best to use the system. Fortunately, HDFS is a stable, mature product that a large number of companies use on an ongoing basis. In the same way that you can use SQL Server to accomplish a great deal of work without understanding its internal workings, you can use HDFS without worrying about the low-level details.

Understanding the Fundamentals of HDFS

HDFS's origin can be traced ...
