Intro to Hadoop

Intro to Hadoop >> Introduction to Big Data

1. What does IaaS provide?

  • Hardware Only
  • Software On-Demand
  • Computing Environment

2. What does PaaS provide?

  • Hardware Only
  • Computing Environment
  • Software On-Demand

3. What does SaaS provide?

  • Hardware Only
  • Software On-Demand
  • Computing Environment

4. What are the two key components of HDFS and what are they used for?

  • NameNode for block storage and DataNode for metadata.
  • FASTA for genome sequence and Rasters for geospatial data.
  • NameNode for metadata and DataNode for block storage.

5. What is the job of the NameNode?

  • Coordinates operations and assigns tasks to DataNodes.
  • Listens to DataNodes for block creation, deletion, and replication.
  • Performs gene sequencing calculations.

6. What is the order of the three steps of MapReduce?

  • Map -> Reduce -> Shuffle and Sort
  • Map -> Shuffle and Sort -> Reduce
  • Shuffle and Sort -> Reduce -> Map
  • Shuffle and Sort -> Map -> Reduce
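
Study aid for question 6: the three phases named in the options can be traced in a small word-count sketch. This is a single-process illustration in plain Python, not Hadoop's API; the function names (map_phase, shuffle_and_sort, reduce_phase, word_count) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    pairs = []
    for line in lines:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle_and_sort(pairs):
    """Shuffle and sort: group the values by key, then order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return [(key, sum(values)) for key, values in grouped]


def word_count(lines):
    # Each phase consumes the output of the previous one.
    return reduce_phase(shuffle_and_sort(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["my apple is red", "my rose is red"]))
```

In a real cluster the map and reduce steps would run in parallel on many nodes and the grouping step would move data between them; here everything runs in one process so the data flow is easy to follow.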

7. What is a benefit of using pre-built Hadoop images?

  • Fewer software choices.
  • Quick prototyping, deploying, and guaranteed bug-free.
  • Guaranteed hardware support.
  • Quick prototyping, deploying, and validating of projects.

8. What is an example of an open-source tool built for Hadoop, and what does it do?

  • ZooKeeper, for analyzing social graphs.
  • Giraph, for SQL-like queries.
  • ZooKeeper, a management system for components named after animals.
  • Pig, for real-time and in-memory processing of big data.

9. What is the difference between low-level and high-level interfaces?

  • Low-level interfaces deal with storage and scheduling, while high-level interfaces deal with interactivity.
  • Low-level interfaces deal with interactivity, while high-level interfaces deal with storage and scheduling.

10. Which of the following are problems to look out for when integrating your project with Hadoop?

  • Advanced Algorithms
  • Random Data Access
  • Infrastructure Replacement
  • Task Level Parallelism
  • Data Level Parallelism

11. As covered in the slides, which of the following are the major goals of Hadoop?

  • Facilitate a Shared Environment
  • Enable Scalability
  • Latency Sensitive Tasks
  • Provide Value for Data
  • Optimized for a Variety of Data Types
  • Handle Fault Tolerance

12. What is the purpose of YARN?

  • Allows various applications to run on the same Hadoop cluster.
  • Enables large-scale data across clusters.
  • Implementation of MapReduce.

13. What are the two main components of a data computation framework that were described in the slides?

  • Node Manager and Application Master
  • Node Manager and Container
  • Resource Manager and Node Manager
  • Application Master and Container
  • Resource Manager and Container
