# Part 2: Should the Bank Buy Third-Party Credit Information?

Mastering Data Analysis in Excel, Week 6

Total points: 9

### Question 1: Introduction

Part 2 is intended to illustrate how binary classification performance metrics make it possible for you to put an exact value, in dollars per event, on new information that relates to a predictive model. Note that new information will be worth far more if it is compared to no forecasting model rather than to the state of partial knowledge available from the current model. Sellers of information (and data science consultants!) love to take credit for any information gain they achieve over the base rate. Very often, some intermediate state of knowledge is already available for which no additional spending is required. Evaluating the realistic incremental financial gain from new information, whether licensing a third-party commercial database or collecting new data internally, is therefore of great practical value, as this sets an upper bound on what your company should be willing to pay to license or create the new information.

In this case study, your boss has been in discussions with an advanced machine-learning predictive-analytics credit-risk analytics company that claims to score individual probability of default with very high information gain. Let’s call the company Eggertopia. Eggertopia sales representatives claim their pre-processed risk scores can achieve AUC values as high as .85 or even higher. However, Eggertopia scores are sold per event, and they are expensive!

Your boss asks you to determine the incremental financial value to the bank of purchasing Eggertopia risk scores on future credit-card applicants. Eggertopia agrees to apply its algorithms to generate credit scores for the 400 individuals in the Training and Test Sets. Eggertopia scores do not need to be combined with anything else to make a model.
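The difference between absolute and incremental value comes down to which baseline cost the new information is compared against. The following is a minimal Python sketch of that arithmetic. The function name `break_even_price` and every dollar figure in it are illustrative assumptions, not values from the case study.

```python
def break_even_price(baseline_cost_per_event: float,
                     new_cost_per_event: float) -> float:
    """Return the most one could pay per event for new information.

    Both arguments are average costs per event, in dollars. The result is
    the savings per event, floored at zero (information that does not
    lower cost is worth nothing).
    """
    return max(0.0, baseline_cost_per_event - new_cost_per_event)


# Hypothetical figures for illustration only (NOT from the case study).
cost_no_model = 1200.0         # base-rate forecast only
cost_current_model = 900.0     # existing in-house model
cost_with_new_scores = 750.0   # using the purchased scores

# Value compared with having no forecasting model at all.
absolute_value = break_even_price(cost_no_model, cost_with_new_scores)

# Value compared with the model the bank already has.
incremental_value = break_even_price(cost_current_model, cost_with_new_scores)

print(absolute_value, incremental_value)
```

With these made-up inputs the incremental value is much smaller than the absolute value, which is the point of comparing against the current model rather than the base rate.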
Because the Eggertopia scores range from approximately -600 (best credit risk) to 4900 (most likely to default), they will need to be standardized and adjusted to fit the -3.5 to 3.5 range of the AUC Calculator Spreadsheet (below).

AUC_Calculator and Review of AUC Curve.xlsx

You will determine the sustainable AUC of the Eggertopia scores, the sustainable cost per event, and the savings per event, when comparing Eggertopia data to the base rate forecast. You will then calculate the incremental savings per event if you compare use of Eggertopia data to use of your current model developed in Part 1.

Question: What is the AUC of the Eggertopia scores on the Training Set? Give your answer to two digits to the right of the decimal point. (1 point)

- .83
- .85
- .88
- .95

### Question 2

What is the optimum threshold on the Training Set to minimize the average cost per test? (1 point)

- .1
- .25
- .15
- .2

### Question 3

What is the average cost per event at the Training Set optimum threshold? (1 point)

- $600
- $640
- $540
- $500

### Question 4

What is the AUC of the Eggertopia scores on the Test Set? (1 point)

- .88
- .80
- .75
- .85

### Question 5

Using the same threshold as used on the Training Set, what is the cost per event of the Eggertopia scores on the Test Set? Round to the nearest dollar. (1 point)

- $833
- $803
- $838
- $823

### Question 6

If the bank did not have your model, or any other way of forecasting default, what is the maximum (break-even) price per event that the bank could theoretically pay for Eggertopia scores? In other words, what are the Eggertopia scores’ absolute savings per event?

Hint: Calculate the difference between the cost per event at a 25% default rate and the cost per event using Eggertopia scores. (1 point)

- $423
- $412
- $418
- $425

### Question 7

What is the True Positive rate of the forecasting model using Eggertopia scores? (1 point)

- .70
- .74
- .76
- .72

### Question 8

What is the Positive Predictive Value (PPV) of the forecasting model using Eggertopia scores?
Hint: To calculate the PPV, divide the portion of True Positives by the total number of Positive Classifications. Review the confusion matrix definitions and letter designations on the Information Gain Spreadsheet (PPV is defined at cell G41), obtain the True Positive and False Positive Rates from the AUC Calculator Spreadsheet, and use algebra to solve.

Information Gain Calculator.xlsx

(1 point)

- .54
- .48
- .50
- .52

### Question 9: Incremental Financial Value of Eggertopia Scores

You calculated a cost per event for your own predictive model on Test Set data to answer Quiz 1 – Part 1, Question 6.

Question: Assuming that the performance of the Eggertopia model and your model both remain stable on any future data (a big assumption), what is the maximum, or break-even, price that the bank could pay per score for Eggertopia, given that it already has your model and data? (1 point)

Answer: 700
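The algebra mentioned in the Question 8 hint can be sketched in Python as follows. Given a True Positive rate, a False Positive rate, and the base rate of the positive class (here, default), the share of positive classifications that are correct follows from the confusion-matrix definitions. The function name and the example rates below are assumptions chosen for illustration; they are not read from the spreadsheets and are not the quiz answer.

```python
def positive_predictive_value(tpr: float, fpr: float, base_rate: float) -> float:
    """PPV = true positives / all positive classifications.

    tpr:       True Positive rate, P(classified positive | actually positive)
    fpr:       False Positive rate, P(classified positive | actually negative)
    base_rate: share of events that are actually positive (e.g. defaults)
    """
    true_pos = tpr * base_rate             # share of all events: true positives
    false_pos = fpr * (1.0 - base_rate)    # share of all events: false positives
    total_pos = true_pos + false_pos       # share of all events classified positive
    if total_pos == 0:
        raise ValueError("no positive classifications; PPV is undefined")
    return true_pos / total_pos


# Hypothetical rates for illustration only (NOT the quiz values).
ppv = positive_predictive_value(tpr=0.8, fpr=0.1, base_rate=0.25)
print(round(ppv, 3))
```

Working in shares of all events rather than raw counts means the result does not depend on the size of the data set, only on the two rates and the base rate.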