Index
A
- accumulators, Programming with RDDs
- actions, Resilient Distributed Datasets
- add operator, Interactive Spark Using PySpark
- agents, Flume Data Flows-Flume Data Flows
- aggregation, Aggregation
- alternating least squares (ALS) algorithm, Collaborative Filtering
- analytics, with higher-level APIs, Analytics with Higher-Level APIs-Conclusion
- anonymous functions (closures), Programming with RDDs
- Apache Flume (see Flume)
- Apache HBase (see HBase)
- Apache Hive (see Hive)
- Apache Spark (see Spark)
- Apache Sqoop (see Sqoop)
- Apache Storm, Hadoop Streaming vs., Hadoop Streaming
- APIs, higher-level, Analytics with Higher-Level APIs-Conclusion
- architecture, distributed, Hadoop Architecture-YARN, Data Ingestion
- Avro, Ingesting Product Impression Data with Flume
B
- big data
  - as term, Preface
  - data science vs., Who This Book Is For
- Big Data
  - Hadoop as OS for, An Operating System for Big Data-Conclusion
- bigrams, Counting Bigrams-Other Frameworks
- blocks, HDFS, Blocks
- bloom filtering, Bloom filtering-Bloom filtering
- broadcast variables, Programming with RDDs
- build phase, Conclusion
- byte array, NoSQL and Column-Oriented Databases
C
- cat command, Basic File System Operations, Executing Streaming Jobs
- centroids, Clustering
- classification, Scalable Machine Learning with Spark, Classification-Logistic regression classification: An example
- closures, Programming with RDDs
- Cloudera, Quick Start
- cluster-based systems, MapReduce: Implemented on a Cluster-MapReduce examples, Data Product Lifecycle
- clustering, unsupervised, Scalable Machine Learning with ...