Part 3: Comparing the Information Gain of Alternative Data and Models >> Week 6 >> Mastering Data Analysis in Excel
1. Question 1
Comparing the Information Gain of Eggertopia Scores and Your Model

Both the Eggertopia Scores and your binary classification model can be thought of as tools to reduce uncertainty about future default outcomes of credit card applicants. Your own model, developed in Part 1, identifies dependencies between, on the one hand, the six types of input data collected by the bank and, on the other hand, the binary outcome default/no default. If we assume that the dependencies identified by Eggertopia Scores and by your model on the Test Set are stable and representative of all future data (a big assumption), we can draw some further conclusions about how much information gain, or reduction in uncertainty, is provided by each. Definitions are given in the Information Gain Calculator Spreadsheet, provided below.

Information Gain Calculator.xlsx

Question: On your model’s Test Set results, what is the conditional entropy of default, given your test classifications?

Hint: You need your model’s true positive rate from Part 1, Question 12, and the “test incidence” (the proportion of events your model classifies as default) from Part 1, Question 13. Use the condition incidence of 25% and your model’s true positive rate to calculate the portion of TPs (condition incidence times true positive rate). Then you have the inputs needed to use the Information Gain Calculator Spreadsheet.

1 point

There is no exact answer to this question. Your answer will depend on your model.

2. Question 2
Recall that the entropy of the original base rate, minus the conditional entropy of default given your test classification, equals the Mutual Information between default and the test: I(X;Y) = H(X) - H(X|Y). The population of potential credit card customers consists of 25% future defaulters. The base rate incidence of default (.25, .75) has an uncertainty, or entropy, of H(.25, .75) = .25*log2(4) + .75*log2(1.333) = .8113 bits.
Question: On your Test Set results, what is the Mutual Information, or Information Gain, in average bits per event?

1 point

3. Question 3
Recall that Percentage Information Gain (P.I.G.) is the ratio I(X;Y)/H(X).

Question: On your Test Set results, what is the Percentage Information Gain (P.I.G.) of your model?

1 point

4. Question 4
Since you have, for your model on the Test Set, a savings-per-event and a bits-per-event (Mutual Information), you can calculate a savings-per-bit. This is a powerful concept, because it places a financial value directly on the information content of a model (or of an additional data source, like the Eggertopia scores).

Question: How many dollars does the bank save for every bit of information gain achieved by your model?

1 point

5. Question 5
Information Gain of Eggertopia Scores over the Base Rate

For questions in this section, assume your model and the data it uses are not available; the bank’s choice is between Eggertopia scores and the base rate.

Question: What is the Mutual Information of the Eggertopia Scores? In other words, on the Test Set, what is the information gain, in average bits per event, over the base rate of (.25, .75) offered by the Eggertopia Scores?

1 point

.1255 bits per event
.1243 bits per event
.1205 bits per event
.1305 bits per event

6. Question 6
On the Test Set, what is the Eggertopia scores’ Percentage Information Gain (P.I.G.)?

1 point

14.85%
15.35%
13.95%
15.25%

7. Question 7
If Eggertopia data were free, and your model were unavailable, what would the dollar savings per bit of information extracted be? Dollar savings are $412, rounded to the nearest dollar (from Quiz 2, Question 6).

1 point

Value would be $3,427 per bit.
Value would be $3,627 per bit.
Value would be $427 per bit.

8. Question 8
Incremental Information Gain of Eggertopia Scores Compared to Your Model and Available Data (any answer scores)

For this section, assume your model and the data it uses are available.

Question: What is the incremental information gain of the Eggertopia scores over your model from Part 1, in average bits per event, if any?

1 point

9. Question 9
What is the maximum (break-even) price the bank should pay for Eggertopia scores, per score, if your model from Part 1 and its data are already available?

1 point

10. Question 10
At the above maximum (break-even) price per score, what would be the value per bit of incremental information gained from the Eggertopia scores? Give your answer in $/bit.

1 point
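
The quantities asked for in Questions 1 to 4 can also be checked outside the spreadsheet. The following is a minimal Python sketch, not the course's Information Gain Calculator: the function names, the example true positive rate (0.60), the example test incidence (0.30), and the example savings per event ($400) are all hypothetical placeholders to be replaced with your own Part 1 and Part 2 results. It assumes the standard identity H(X|Y) = H(X,Y) - H(Y) and measures entropy in bits.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)


def information_gain(condition_incidence, true_positive_rate, test_incidence):
    """Entropy measures for a binary test, built from its confusion matrix.

    condition_incidence: share of events that are actual defaults (0.25 here)
    true_positive_rate:  share of actual defaults classified as default
    test_incidence:      share of all events classified as default
    """
    # Joint probabilities of (actual outcome, test classification).
    tp = condition_incidence * true_positive_rate
    fn = condition_incidence - tp
    fp = test_incidence - tp
    tn = 1.0 - tp - fn - fp
    if min(tp, fn, fp, tn) < -1e-12:
        raise ValueError("inputs do not describe a valid confusion matrix")

    h_x = entropy([condition_incidence, 1 - condition_incidence])  # H(X)
    h_y = entropy([test_incidence, 1 - test_incidence])            # H(Y)
    h_xy = entropy([tp, fn, fp, tn])                               # H(X,Y)
    h_x_given_y = h_xy - h_y                                       # H(X|Y)
    mutual_info = h_x - h_x_given_y                                # I(X;Y)
    return {
        "conditional_entropy": h_x_given_y,
        "mutual_information": mutual_info,
        "percentage_information_gain": mutual_info / h_x,
    }


# Base-rate entropy quoted in Question 2.
print(round(entropy([0.25, 0.75]), 4))  # 0.8113

# Hypothetical model results; substitute your own values.
result = information_gain(
    condition_incidence=0.25,
    true_positive_rate=0.60,  # assumed, for illustration only
    test_incidence=0.30,      # assumed, for illustration only
)
print(result)

# Savings per bit (Question 4), using an assumed savings-per-event figure.
savings_per_event = 400.0     # hypothetical dollars per event
print(savings_per_event / result["mutual_information"])
```

Two sanity checks are useful when adapting this: a classifier whose test incidence equals its true positive rate is independent of the outcome and should give a mutual information of zero, and a perfect classifier should give a Percentage Information Gain of 100%.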