Data Science 101
Introduction to Big Data
1. Which of the following are parts of the 5 P’s of data science and what is the additional P introduced in the slides?
- Perception
- Product
- Platforms
- People
- Process
- Purpose
- Programmability
2. Which of the following are among the four main categories for acquiring, accessing, and retrieving data?
- Traditional Databases
- Text Files
- Web Services
- Remote Data
- NoSQL Storage
3. What are the steps required for data analysis?
- Investigate, Build Model, Evaluate
- Regression, Evaluate, Classification
- Classification, Regression, Analysis
- Select Technique, Build Model, Evaluate
4. Of the following, which is a technique mentioned in the videos for building a model?
- Investigation
- Validation
- Analysis
- Evaluation
5. What is the first step in finding the right problem to tackle in data science?
- Assess the Situation
- Define Goals
- Ask the Right Questions
- Define the Problem
6. What is the first step in determining a big data strategy?
- Build In-House Expertise
- Business Objectives
- Organizational Buy-In
- Collect Data
7. According to Ilkay, why is exploring data crucial to better modeling?
Data exploration…
- enables a description of data which allows visualization.
- leads to data understanding which allows an informed analysis of the data.
- enables understanding of general trends, correlations, and outliers.
- enables histograms and other graphs as data visualization.
8. Why is data science mainly about teamwork?
- Exhibition of curiosity is required.
- Data science requires a variety of expertise in different fields.
- Engineering solutions are preferred.
- Analytic solutions are required.
9. What are the ways to address data quality issues?
- Remove data with missing values.
- Remove outliers.
- Generate best estimates for invalid values.
- Merge duplicate records.
- Data Wrangling
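For readers who want to see what the data quality options in question 9 can look like in practice, here is a minimal sketch in plain Python. It is an illustration only, not the course's method: the sample records, the field names (`id`, `age`, `income`), the rule that a negative age is invalid, and the outlier threshold of five times the median are all assumptions made for this example.

```python
from statistics import median

# Hypothetical sample records; None marks a missing value, and a negative
# age is treated as invalid. All names and values are illustrative.
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 48000},   # missing age
    {"id": 3, "age": -5, "income": 51000},     # invalid age
    {"id": 4, "age": 41, "income": 990000},    # income outlier
    {"id": 1, "age": 34, "income": 52000},     # duplicate of id 1
    {"id": 5, "age": 29, "income": 47000},
]


def drop_missing(rows):
    """Remove records that contain any missing (None) value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def impute_invalid_age(rows):
    """Replace invalid (negative) ages with the median of the valid ages."""
    valid = [r["age"] for r in rows if r["age"] >= 0]
    estimate = median(valid)
    return [{**r, "age": r["age"] if r["age"] >= 0 else estimate} for r in rows]


def drop_outliers(rows, key, max_ratio=5.0):
    """Remove records whose value exceeds max_ratio times the median.

    The threshold is an assumed rule of thumb chosen for this sketch.
    """
    mid = median(r[key] for r in rows)
    return [r for r in rows if r[key] <= max_ratio * mid]


def merge_duplicates(rows):
    """Collapse records sharing the same id, keeping the first occurrence."""
    seen = {}
    for r in rows:
        seen.setdefault(r["id"], r)
    return list(seen.values())


# Apply the four steps in sequence.
cleaned = merge_duplicates(
    drop_outliers(impute_invalid_age(drop_missing(records)), "income")
)
for row in cleaned:
    print(row)
```

The order of the steps is a design choice: missing values are dropped first so that the median used for imputation is computed only from complete records. A real pipeline would pick each rule based on the data and the analysis goal.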
10. What is done to the data in the preparation stage?
- Cleaning, Integrating, and Packaging
- Build Models
- Select Analytical Techniques
- Retrieve Data
- Identify Data Sets and Query Data