  • Big Data Processing and Analysis Frameworks

  • By: Koffka Khan
  • Narrated by: Virtual Voice
  • Length: 9 hrs and 52 mins


This title uses virtual voice narration

Virtual voice is computer-generated narration for audiobooks

Publisher's summary

Part I of this book focuses on Apache Hadoop and is broken down into 16 chapters. Chapter 1 gives the introduction. Chapter 2 explores the Big Data problem. Chapter 3 illustrates a Big Data scenario, while chapter 4 introduces Apache Hadoop. The Hadoop architecture is given in chapter 5. Chapter 6 introduces HDFS. The benefits of distributed file systems are discussed in chapter 7. Chapters 8 and 9 explain writing and reading files. Chapter 10 introduces MapReduce, with MapReduce programming following in chapter 11. YARN is introduced in chapter 12 and its architecture is given in chapter 13. The Hadoop architecture is explored in chapter 14. Chapter 15 discusses the Hadoop cluster. Finally, chapter 16 presents the Hadoop ecosystem.
Apache Kafka is a distributed data store for ingesting and processing streaming data in real time. Streaming data is information that is continuously produced by hundreds of data sources, all sending data records at the same time. A streaming platform must be able to handle this steady stream of data and process it sequentially and incrementally. Kafka is frequently used to build real-time streaming data pipelines that enable streaming analytics and mission-critical use cases with guaranteed ordering, no message loss, and exactly-once processing.
Apache Kafka is highly scalable and fast because it allows data to be distributed across several servers. It decouples data streams and thereby reduces latency. It can also distribute and replicate partitions across servers, protecting against server failure. This book part reviews the operations of this important distributed stream processing system and consists of six chapters. Chapter 1 gives the introduction. Chapter 2 describes the components of Kafka within its environment. Chapter 3 describes ZooKeeper, with Kafka events and streams explained in chapter 4. Use cases of Kafka are given in chapter 5. Finally, chapter 6 states the findings.
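The idea of spreading partitions over several servers and keeping copies of each one can be illustrated with a small sketch. This is not Kafka's actual code and uses no Kafka library: the function names, the broker names, and the round-robin replica placement are all assumptions made for illustration, and CRC32 stands in for the murmur2 hash that Kafka's default partitioner applies to keyed records.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition using a stable hash.

    Assumption: CRC32 is used here only for illustration.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_replicas(num_partitions: int, brokers: list[str],
                    replication_factor: int) -> dict[int, list[str]]:
    """Place each partition on `replication_factor` distinct brokers.

    Assumption: simple round-robin placement, chosen for clarity.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


def surviving_replicas(assignment: dict[int, list[str]],
                       failed_broker: str) -> dict[int, list[str]]:
    """Return the replicas still available after one broker fails."""
    return {
        p: [b for b in replicas if b != failed_broker]
        for p, replicas in assignment.items()
    }


if __name__ == "__main__":
    brokers = ["broker-0", "broker-1", "broker-2"]  # hypothetical names
    assignment = assign_replicas(6, brokers, replication_factor=2)
    print("record 'sensor-42' goes to partition", partition_for("sensor-42", 6))
    print("after broker-1 fails:", surviving_replicas(assignment, "broker-1"))
```

With a replication factor of 2, every partition in this sketch still has one copy left after any single broker fails, which is the sense in which replication protects against server failure.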
This book part (Apache Spark) has six chapters. The first chapter is the introduction. The second chapter discusses the components of Spark. In-memory processing is discussed in chapter 3. Chapter 4 compares MapReduce and Spark. Chapter 5 covers Apache Spark Streaming. Finally, chapter 6 briefly covers Apache Spark MLlib.
This book part (Apache Hive) consists of two chapters. The first gives an introduction to Apache Hive and the second describes its architecture. Apache Hive is an open-source data warehouse system for reading, writing, and managing massive data sets stored in the Apache Hadoop Distributed File System (HDFS) or in other data storage systems such as Apache HBase.
