Intro to Hadoop >> Introduction to Big Data
1. What does IaaS provide?
- Hardware Only
- Software On-Demand
- Computing Environment
2. What does PaaS provide?
- Hardware Only
- Computing Environment
- Software On-Demand
3. What does SaaS provide?
- Hardware Only
- Software On-Demand
- Computing Environment
4. What are the two key components of HDFS and what are they used for?
- NameNode for block storage and DataNode for metadata.
- FASTA for genome sequence and Rasters for geospatial data.
- NameNode for metadata and DataNode for block storage.
5. What is the job of the NameNode?
- Coordinates operations and assigns tasks to DataNodes.
- Listens to DataNodes for block creation, deletion, and replication.
- For gene sequencing calculations.
6. What is the order of the three steps to Map Reduce?
- Map -> Reduce -> Shuffle and Sort
- Map -> Shuffle and Sort -> Reduce
- Shuffle and Sort -> Reduce -> Map
- Shuffle and Sort -> Map -> Reduce
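For question 6, the correct order (Map -> Shuffle and Sort -> Reduce) can be illustrated with a toy single-process word-count sketch. This is not Hadoop code; the function names are illustrative only, standing in for the phases a real MapReduce job distributes across the cluster.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(docs):
    # Map: emit an intermediate (word, 1) pair for every word.
    for doc in docs:
        for word in doc.split():
            yield (word, 1)

def shuffle_and_sort(pairs):
    # Shuffle and Sort: sort intermediate pairs by key and
    # group all values that share the same key.
    pairs = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, [value for _, value in group]

def reduce_phase(grouped):
    # Reduce: aggregate the grouped values (here, sum the counts).
    for word, counts in grouped:
        yield word, sum(counts)

docs = ["big data", "big hadoop data"]
result = dict(reduce_phase(shuffle_and_sort(map_phase(docs))))
print(result)  # {'big': 2, 'data': 2, 'hadoop': 1}
```

Note that Reduce cannot run before Shuffle and Sort: each reducer needs all values for a given key gathered in one place, which is exactly what the grouping step provides.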
7. What is a benefit of using pre-built Hadoop images?
- Fewer software choices to choose from.
- Quick prototyping, deploying, and guaranteed bug free.
- Guaranteed hardware support.
- Quick prototyping, deploying, and validating of projects.
8. What are some examples of open-source tools built for Hadoop, and what do they do?
- Zookeeper, analyze social graphs.
- Giraph, for SQL-like queries.
- Zookeeper, management system for the animal-named components.
- Pig, for real-time and in-memory processing of big data.
9. What is the difference between low level interfaces and high level interfaces?
- Low level deals with storage and scheduling while high level deals with interactivity.
- Low level deals with interactivity while high level deals with storage and scheduling.
10. Which of the following are problems to look out for when integrating your project with Hadoop?
- Advanced Algorithms
- Random Data Access
- Infrastructure Replacement
- Task Level Parallelism
- Data Level Parallelism
11. As covered in the slides, which of the following are the major goals of Hadoop?
- Facilitate a Shared Environment
- Enable Scalability
- Latency Sensitive Tasks
- Provide Value for Data
- Optimized for a Variety of Data Types
- Handle Fault Tolerance
12. What is the purpose of YARN?
- Allows various applications to run on the same Hadoop cluster.
- Enables large scale data across clusters.
- Implementation of Map Reduce.
13. What are the two main components for a data computation framework that were described in the slides?
- Node Manager and Applications Master
- Node Manager and Container
- Resource Manager and Node Manager
- Applications Master and Container
- Resource Manager and Container