Intro to Hadoop

1. What does IaaS provide?

Hardware Only
Software On-Demand
Computing Environment

2. What does PaaS provide?

Hardware Only
Computing Environment
Software On-Demand

3. What does SaaS provide?

Hardware Only
Software On-Demand
Computing Environment

4. What are the two key components of HDFS and what are they used for?

NameNode for block storage and DataNode for metadata.
FASTA for genome sequence and Rasters for geospatial data.
NameNode for metadata and DataNode for block storage.

5. What is the job of the NameNode?

Coordinates operations and assigns tasks to DataNodes.
Listens to DataNodes for block creation, deletion, and replication.
For gene sequencing calculations.
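
As a study aid for questions 4 and 5, here is a minimal toy model of how HDFS splits responsibilities: the NameNode keeps only metadata (which blocks make up a file and where they live), while DataNodes hold the block contents. Every name here (`DataNode.store`, `NameNode.allocate`, `put`, `get`, the 4-byte block size) is hypothetical and chosen for illustration. This is not the Hadoop API, and replication is omitted.

```python
class DataNode:
    """Stores raw block contents; knows nothing about file names."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Holds metadata only: which blocks form a file and where each lives."""

    def __init__(self, datanode_names):
        self.datanode_names = list(datanode_names)
        self.metadata = {}  # filename -> [(block_id, datanode_name), ...]

    def allocate(self, filename, num_blocks):
        # Round-robin placement; a real cluster would also replicate blocks.
        placement = [
            (f"{filename}#{i}", self.datanode_names[i % len(self.datanode_names)])
            for i in range(num_blocks)
        ]
        self.metadata[filename] = placement
        return placement

    def locate(self, filename):
        return self.metadata[filename]


def put(namenode, datanodes, filename, data, block_size=4):
    """Client write: ask the NameNode for placement, send blocks to DataNodes."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    placement = namenode.allocate(filename, len(chunks))
    for (block_id, node_name), chunk in zip(placement, chunks):
        datanodes[node_name].store(block_id, chunk)


def get(namenode, datanodes, filename):
    """Client read: look up block locations, then fetch from the DataNodes."""
    return b"".join(
        datanodes[node_name].read(block_id)
        for block_id, node_name in namenode.locate(filename)
    )


if __name__ == "__main__":
    datanodes = {"dn1": DataNode("dn1"), "dn2": DataNode("dn2")}
    namenode = NameNode(datanodes)
    put(namenode, datanodes, "f.txt", b"hello hadoop")
    print(namenode.locate("f.txt"))
    print(get(namenode, datanodes, "f.txt"))
```

In this sketch the file's bytes never pass through the NameNode. It only records block locations, which is the division of labor the questions ask about.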

6. What is the order of the three steps in MapReduce?

Map -> Reduce -> Shuffle and Sort
Map -> Shuffle and Sort -> Reduce
Shuffle and Sort -> Reduce -> Map
Shuffle and Sort -> Map -> Reduce
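
To make the three phases concrete, here is a minimal single-process word-count sketch in plain Python. It does not use Hadoop, and the function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `word_count`) are hypothetical, chosen only to mirror the phase names in the question.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle and sort: group values by key, then order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped}


def word_count(lines):
    # Each phase consumes the output of the previous one.
    return reduce_phase(shuffle_and_sort(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
    # {'big': 2, 'cluster': 1, 'data': 2, 'node': 1}
```

The reducer can only sum a word's counts once every pair for that word has been grouped together, which is why the grouping step sits between the other two.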

7. What is a benefit of using pre-built Hadoop images?

Fewer software choices.
Quick prototyping, deploying, and guaranteed bug-free.
Guaranteed hardware support.
Quick prototyping, deploying, and validating of projects.

8. What is an example of an open-source tool built for Hadoop, and what does it do?

ZooKeeper, for analyzing social graphs.
Giraph, for SQL-like queries.
ZooKeeper, a management system for animal-named components.
Pig, for real-time and in-memory processing of big data.

9. What is the difference between low-level interfaces and high-level interfaces?

Low-level deals with storage and scheduling, while high-level deals with interactivity.
Low-level deals with interactivity, while high-level deals with storage and scheduling.

10. Which of the following are problems to look out for when integrating your project with Hadoop?

Advanced Algorithms
Random Data Access
Infrastructure Replacement
Task-Level Parallelism
Data-Level Parallelism

11. As covered in the slides, which of the following are the major goals of Hadoop?

Facilitate a Shared Environment
Enable Scalability
Latency-Sensitive Tasks
Provide Value for Data
Optimized for a Variety of Data Types
Handle Fault Tolerance

12. What is the purpose of YARN?

Allows various applications to run on the same Hadoop cluster.
Enables large-scale data across clusters.
Implementation of MapReduce.

13. What are the two main components for a data computation framework that were described in the slides?

Node Manager and Application Master
Node Manager and Container
Resource Manager and Node Manager
Application Master and Container
Resource Manager and Container
