# Mastering Data Analysis in Excel – Coursera Quiz Answers

## Excel Essentials Practice

1. Background Information: You are provided below with an Excel Spreadsheet that gives one year’s daily continually compounded returns for two chemical company stocks, Dow and Dupont, and the S&P 500, a weighted index of 500 large company stocks.

Excel Problem Type: Summing a column

Problem Information: Daily continuously compounded returns can be summed to obtain returns over longer time intervals. Sum the daily returns to calculate annual continuously compounded returns for 2010. Give each result in percent, rounded to two digits to the right of the decimal place – for example, 11.76%.

Solve: What is the Dow Chemical Annual return?
• 18.65%
• 23.23%
• 26.15%
• 20.51%

2. The Excel spreadsheet provided at the beginning of this practice quiz, gives one year’s daily continually compounded returns for two chemical company stocks, Dow and Dupont, and the S&P 500, a weighted index of 500 large company stocks. Use this spreadsheet to answer the question.

Excel Problem Type: Calculating correlation for a two-column array

Question: What is the correlation between daily continuously compounded returns for Dow Chemical and for the S&P 500 Index? Round your answer two digits to the right of the decimal place – for example, .84
• .57
• .78
• .79
• .48

3. The Excel spreadsheet provided at the beginning of this practice quiz, gives one year’s daily continually compounded returns for two chemical company stocks, Dow and Dupont, and the S&P 500, a weighted index of 500 large company stocks. Use this spreadsheet to answer the question.

Excel Problem Type: Identifying the maximum value in a column and sorting multiple columns while preserving rows.

Question: On what day in 2010 did Dow Chemical returns out perform S&P 500 Index returns the most?
• February 9, 2010
• October 25, 2010
• April 28, 2010
• February 1, 2010

4. The Excel spreadsheet provided at the beginning of this practice quiz, gives one year’s daily continually compounded returns for two chemical company stocks, Dow and Dupont, and the S&P 500, a weighted index of 500 large company stocks. Use this spreadsheet to answer the question.

Excel Problem Type: Using Excel “If” statements to determine how many days in 2010 Dow Chemical returns are higher than Dupont Returns.

Problem Information: Assuming Dow Chemical Returns are in Column B and Dupont Returns in Column C, the “If” statements will be of the form =IF(B3>C3, 1, 0).

Set up a column of “If” statements and then each day where Dow return > Dupont return will have a value of 1, otherwise 0.

Question: How many days out of the 252 trading days in 2010 did Dow outperform Dupont?
• 124
• 125
• 122
• 128

5. The Excel spreadsheet provided at the beginning of this practice quiz, gives one year’s daily continually compounded returns for two chemical company stocks, Dow and Dupont, and the S&P 500, a weighted index of 500 large company stocks. Use this spreadsheet to answer the question.

Excel Problem Type: Sorting multiple columns while preserving rows

Question: What was the fifth-worst performing day for the S&P 500 Index in 2010?
• May 20, 2010
• June 29, 2010
• February 4, 2010
• May 10, 2010

6. The Excel spreadsheet provided at the beginning of this practice quiz, gives one year’s daily continually compounded returns for two chemical company stocks, Dow and Dupont, and the S&P 500, a weighted index of 500 large company stocks. Use this spreadsheet to answer the question.

Excel Problem Type: Defining the Sharpe Ratio

Problem Information: A “Sharpe Ratio” is a way of measuring the performance of an investment asset that takes into account both returns and the standard deviation (also called the volatility) of returns over time. A stock’s Sharpe ratio is the difference between its returns and the return of a risk-free investment, such as a government bond, divided by the standard deviation of returns of the asset. For example, if a stock returns 15% per year, and the risk-free asset returns 3% per year, and the volatility of the stock is 18% per year, the Sharpe Ratio is 12%/18% = .67.

Question: Assume a risk-free asset returns 2% per year, and the standard deviation of returns of Dupont stock is 20%. What is the Sharpe Ratio for Dupont stock for 2010? Give the answer to two digits to the right of the decimal place.
• .84
• .93
• .88
• .83

7. Excel Problem Type: Optimization using the “Solver” plug-in

Problem Information: Assume that at a particular gas station, the quantity of automobile fuel sold in a week is a function of the fuel’s retail price.

The quantity of fuel sold in a week (in gallons) = (1,000 – 300x), where x is the price in dollars per gallon.

The function f(x) for revenues from weekly sales, in dollars, will equal x*(1000 – 300x) = 1000x – 300x^2.

Without using calculus or any other advanced math, the MS Solver plug-in can be used to find the input value for x that results in a maximum value for a function f(x). The price x is in the Solver “variable cell” and the function 1000x – 300x^2 is the Solver “objective.”

Question: What is the price x that maximizes weekly revenues?
• \$1.45 per gallon
• \$16.67 per gallon
• \$1.67 per gallon
• \$14.50 per gallon

8. The Excel spreadsheet provided at the beginning of this practice quiz, gives one year’s daily continually compounded returns for two chemical company stocks, Dow and Dupont, and the S&P 500, a weighted index of 500 large company stocks. Use this spreadsheet to answer the question.

Excel Problem Type: Scatter plots and trend line options

Solve: Generate a scatter plot that pairs the daily returns of Dow Chemical (y axis) “against” the S&P 500 returns (x axis). The slope of the regression line is also called “Beta.”

Question: What is Beta for Dow Chemical? Give the answer rounded two digits to the right of the decimal place.
• 1.55
• 1.00
• 1.62
• 1.66

## Excel Essentials

This spreadsheet contains monthly continuously compounded returns for two stock indexes – RSP and SPY – and two individual stocks – Amazon and Duke Energy – for the 12 years from May 2003 to May 2015.

Use Excel’s chart function to generate a scatter plot of SPY index monthly returns (y axis) against Amazon monthly returns (x axis)

When you use “trendline” option for slope, R-squared, and the y-intercept, double-check your results against the equivalent cell formula answers.

Question 1: What is the slope of the best-fit line (rounded to two decimal places)?
• 0.11
• 0.12
• 0.15
• 0.18

2. What is the coefficient of determination (R-squared)? Use the “rsq” Excel function (Trendline in Excel may give an inaccurate value for R-squared).
• 0.18
• 0.20
• 0.22
• 0.24

3. What is the Y-intercept, in percent? Use the “trendline” but double-check against the “intercept” function.
• 0.25%
• 0.35%
• 0.45%
• 0.55%

4. Answer Question 4 and 5 based on the information below:

The annual “Sharpe Ratio” is a metric that combines profitability and risk – it measures units of profitability per unit of risk.

First calculate the difference between the annual return of a stock and the annual return of a risk-free investment in government bonds. Second, divide that difference by the annualized population standard deviation of returns of the stock.

For example, if the annual return of a stock is 10%, the annual risk-free bond return is 2%, and the annualized population standard deviation of returns of the stock is 16%, then the Sharpe Ratio = 8%/16% = 0.5.

For this problem, you can estimate the annualized standard deviation of returns by multiplying your calculated value for the monthly population standard deviation of returns by the square root of 12.

Question 4: Assuming the risk-free rate is 1.5% per year over the full 12-year interval measured, which asset had the higher Sharpe ratio: SPY or RSP?
• SPY
• RSP

5. For the asset you chose in Question 4, what was the Sharpe ratio? Round your results to two decimal places.
• 0.56
• 0.53
• 0.50
• 0.48

6. In the month ending on which date did Amazon achieve the highest returns?

Note: Use “paste special” and choose “values and number formats” to keep return values from changing.
• September 1, 2010
• October 1, 2009
• April 2, 2007
• July 3, 2006

7. What was the monthly return from the question above?
• 22.9%
• 24.11%
• 43.27%
• 51.87%

8. What was Duke Energy’s return that same month?
• 0.51%
• 1.13%
• 3.04%
• 3.18%

9. Using the Solver plug-in (Solver Add-In) for Excel, answer Questions 9 and 10, based on the information below:

Between possible pricing of \$5 per pound to \$25 per pound, the quantity of coffee Egger’s Roast Coffee can sell each month is a linear function of the retail selling price per pound. The linear function is (quantity sold in pounds) = (-400*(Price per pound)) + 10,000.

Question 9: What is the revenue-maximizing selling price per pound for Egger’s Roast Coffee?

If this question is too challenging, there is another example below to review. This can also be found in “Course Resources” as a quick reference.

• \$5.00
• \$12.50
• \$13.50
• \$25.00

10. What is the monthly revenue at that price per pound? ( , indicates thousands)
• \$15,100
• \$62,500
• \$62,100
• \$40,000

## Binary Classification (practice)

1. What is one reason False Positive classifications were expensive in the Battle of Britain.
• Pilots needed more practice.
• German bombers were always present.
• German bombers would expect to be intercepted.
• Aviation fuel for fighter planes was scarce.

2. The portion of test outcomes that are True Negatives plus the portion that are False Negatives must equal:
• One, minus the classification incidence/test incidence
• The Negative Predictive Value (NPV)
• The False Negative (FN) Rate
• One, minus the condition incidence

3. Use the Cancer Diagnosis Spreadsheet to answer Questions 3 and 4.

This spreadsheet gives 10,000 pairs of scores – the level of a (fictional) cancer diagnostic protein in Column A – along with the actual condition: 1 = cancer, 0 = no cancer in Column C.

Change the cost per False Negative classification to \$20,000 [cell G3]. Change the cost per False Positive classification to \$1,000 [cell H3].

Question: What is the new minimum cost per event/cost per test (rounded to the nearest dollar)?
• \$183
• \$187
• \$181
• \$185

4. What is the lowest level of protein that should be classified “Positive” to achieve the minimum cost per test at the new costs per error given above?
• 18202.407
• 18213.7
• 18204.498
• 18204.545

5. Can a change in classification threshold change a diagnostic test’s True Positive Rate? Use logic – no need to calculate any numbers.
• No
• Yes

6. “Condition Incidence” is the portion of a population that actually has the Condition being studied. Can a change in threshold change the Condition incidence? Use logic – no need to calculate any numbers.
• No
• Yes

7. Does the change in threshold change the test’s “classification incidence” (also called “test incidence”)? Use logic – no need to calculate any numbers.
• No
• Yes

8. Does the change in threshold change the test’s Area under the ROC Curve? Use logic – no need to calculate any numbers.
• Yes
• No

What is the False Positive Rate if we use the sum of standardized height and standardized weight as the score; and set a threshold at -1.28?
• 0.4
• 0.6
• 0.33
• 0.67

1. A test for “driving while intoxicated” was given 100 times. 20 people tested were actually intoxicated, and 10 people were mis-classified as intoxicated. What would the False Positive rate be?
• 12.5%
• 10%
• 50%
• 30%

2. If a fire alarm malfunctions and fails to go off when there actually is a fire, that is a:
• False Negative
• False Positive
• True Positive
• True Negative

3. Use the Binary Classification Metrics Spreadsheet Definitions to answer the following:

If the “classification incidence/test incidence” is 10% for the whole population, and the true “condition incidence” is 12% for the whole population, the True Positive rate:
• must be 100%
• cannot be 100%
• can be 100%
• must be 0%

4. Use the Cancer Diagnosis Spreadsheet to answer Questions 4 to 6.

Keep the cost per False Positive test set at \$500. Use MS Solver to determine the maximum cost per False Negative test that permits an average cost per test of \$100.
• \$12,262
• \$17,082

5. Assume a cost of \$15,000 per False Negative (FN) and \$100 per False Positive (FP). What is the minimum average cost per test?
• \$1.00
• \$259,800
• \$25.98
• \$16,551

6.If, instead of assuming a cost \$15,000 per FN and \$100 per FP, the costs are assumed to be \$7,500 per FN and \$50 per FP, what changes?
• The minimum cost threshold of 16,551.930
• The True Positive Rate
• The False Positive Rate
• The minimum Cost per Test

7. Use logic and the definition in the Binary Performance Metrics Spreadsheet to answer the following question.

In general, increasing the cost per FN while keeping the cost per FP constant will cause the cost-minimizing threshold score to:
• Decrease
• Stay the Same
• Increase

8. Make a copy of the Bombers and Seagulls Spreadsheet to answer questions 8-10.

Modify the spreadsheet data so that there are 4 bombers instead of 3, and 16 seagulls instead of 17, by changing the actual condition for the radar score of 66 from a 0 to a 1 in cell D43.

What is the new Area Under the Curve:
• 0.72
• 0.78
• 0.824
• 0.75

9. Assuming the costs for classification errors are 5 million pounds per FN and 4 million pounds per FP, how much does changing the value at Cell D43 from 0 to 1 change the minimum cost per event?
• Increases by 950,000 pounds
• Increases by 5 million pounds.
• Unknown
• Increases by 250,000 pounds

10. Change the cost per FN to 50 million pounds. How does changing the data in cell D43 from a 0 to a 1 change the cost-minimizing threshold?
• Decreases it from 75 to 66.
• Decreases it from 75 to 62
• Decreases it from 75 to 70
• Increases it from 66 to 75.

11. Use the Binary Performance Metrics Spreadsheet definitions to answer the following question.

A population tested for “driving while intoxicated” has a Condition incidence of 20%. If the test has a true positive rate of 70% and a false positive rate of 10%, what is the test’s Positive Predictive Value (PPV)?
• 0.64
• 0.50
• 0.60
• 0.36

Rank the outcomes using soldier’s age as the score, with the oldest at the top. A threshold of 24 years represents what point on the ROC Curve?
• .33, .67
• .67, .33
• .5, .5
• .25, .75

## Using the Information Gain Calculator Spreadsheet (practice)

1. Using the information Gain Calculator, without changing any inputs in the confusion matrix, what is the conditional probability of getting a Positive Test, if you have a defective chip? Use the link below to access the spreadsheet. There is also an explanation about using the Information Gain Calculator that you may find helpful to review beforehand.
Information Gain Calculator.xlsx
• 37.5%
• 50%
• 25%
• 14%

2. The conditional probability of getting a Positive Test if you have a defective chip can be written p(Test POS | “+”). What is this probability called on the Confusion Matrix?
• The False Positive Rate
• The False Negative Rate
• The True Negative Rate
• The True Positive Rate

3. What is the remaining uncertainty or entropy of the test classification if we learn a chip is truly defective?
• 1 bit
• .9183 bits
• .8113 bits
• .5917 bits

4. What is the probability that a chip chosen at random from the assembly line is defective?

• .2
• .3
• .7
• .8

5. What is the conditional Probability of Getting a “Negative” Test classification if you have a non-defective chip?
• 75%
• 14%
• 25%
• 50%

6. The conditional probability of getting a Negative Test if you have a non-defective chip can be written P(Y = “NEG” | X = “-”). What is this probability called on the Confusion Matrix?
• True Positive Rate
• True Negative Rate
• False Negative Rate
• False Positive Rate

7. Challenging question: What is the remaining uncertainty, or entropy, of the Test Classification, if we know that a chip is not-defective?
• 1 bit
• .9183 bits
• .5917 bits
• .8113 bits

8. How frequently will a non-defective chip occur?
• .8
• .2
• .3
• .7

9. What is the expected, or average, uncertainty or entropy, remaining regarding a Test Outcome, give knowledge of whether or not a chip is defective?
• .8813 bits
• .8490 bits
• 1 bit
• .0323 bits

10.The optical scanner breaks down and begins to classify 30% of all chips as defective completely at random. What is the random test’s True Positive Rate and False Positive Rate?
• 30% and 70%
• 70% and 70%
• 30% and 30%
• 70% and 30%

1. Suppose we have two coins: one “fair” coin, where p(head) = p(tails) = .5; and an “unfair” coin where p(heads) does not equal p(tails). Which coin has a larger entropy prior to observing the outcome?
• The fair coin
• The unfair coin

2.If you roll one fair dice (6-sided), what is its entropy before the result is observed?
• 2.58 bits
• 0.46 bits
• 0.43 bits
• 2.32 bits

3. If your friend picks one number between 1001 to 5000, under the strategy used in video Entropy of a Guessing Game, what is the maximum number of questions you need to ask to find out that number?
• 13
• 12
• 10
• 11

4. Use the “Information Gain Calculator” spreadsheet to calculate the “Conditional Entropy” H(X|Y) given a = 0.4, c = 0.5, e = 0.11.
Information Gain Calculator.xlsx
• 0.97 bits
• 0.90 bits
• 1.87 bits
• 0.87 bits

5. On the “Information Gain Calculator” spreadsheet, given a = 0.3, c = 0.2, suppose now we also know that H(X,Y) = H(X) + H(Y). What is the joint probability e?
Information Gain Calculator.xlsx
• 0.5
• 0.04
• 0.06
• 0.3

6. Given a = 0.2, c = 0.5 on the Information Gain Calculator Spreadsheet suppose now we also know the true positive rate is 0.18. What is the Mutual Information?
Information Gain Calculator.xlsx
• 0.13 bits
• 0.72 bits
• 1.64 bits
• 0.08 bits

7. Consider the Monty Hall problem, but instead of the usual 3 doors, assume there are 5 doors to choose from. You first choose door #1. Monty opens doors #2 and #3. What is the new probability that there is a prize behind door #4?
• 0.67
• 0.5
• 0.2
• 0.4

8. Again, consider the Monty Hall problem, but with 5 doors to choose from instead of 3. You pick door #1, and Monty opens 2 of the other 4 doors. How many bits of information are communicated to you by Monty when you observe which two doors he opens?
• 1.52 bits
• 2.32 bits
• 0.80 bits
• 0.67 bits

9. B stands for “the coin is fair”, ~B stands for “the coin is crooked”. The p(heads | B) = 0.5, and p(heads | ~B) = 0.4. Your friend tells you that he often tests people to see if they can guess whether he is using the fair coin or the crooked coin, but that he is careful to use the crooked coin 70% of the time. He tosses the coin once and it comes up heads.

What is your new best estimate of the probability that the coin he just tossed is fair?
• 0.15
• 0.35
• 0.40
• 0.43

10. Suppose you are given either a fair dice or an unfair dice (6-sided). You have no basis for considering either dice more likely before you roll it and observe an outcome. For the fair dice, the chance of observing “3” is 1/6. For the unfair dice, the chance of observing “3” is 1/3. After rolling the unknown dice, you observe the outcome to be 3.

What is the new probability that the die you rolled is fair?
• 0.08
• 0.23
• 0.33
• 0.36