Let’s start this series by defining risk. I am going to use the FAIR definition of risk which is: the probable frequency and probable magnitude of future loss. From a modeling perspective, I need at least two variables to model the risk for any given risk issue: a loss frequency variable and a loss magnitude variable. Hopefully, you are using a risk analysis methodology that deconstructs risk into these two variables…
The examples I am sharing in this blog series are an example of stochastic modeling. The use of random values as an input to a probability distribution ensures there is variation in the output; thus making it stochastic. The variable output allows for analysis through many different lenses; especially when there are additional (meaningful) attributes associated with any given risk issue (policy section, business unit, risk type, etc…).
Part 1 and 2 of this series will focus on “probable or expected [loss] frequency”. Frequency implies a number of occurrences over a given period of time. Loss events are discrete in nature; there are no “partial” loss events. So, when we see probable loss frequency values like 0.10 or 0.25 – and our time period is a year – we interpret that to mean that there is a 10% or 25% chance of a loss event in any given year. Another way of thinking about it is in terms of time; we expect a loss event once every ten years (0.10) or once every four years (0.25). Make sense?
You may want to download this Excel spreadsheet to reference for the rest of the post (it should work in Excel 2003, Excel 2007 and Excel 2010; I have not tested it on Office for Mac).
Make sure you view it in Excel and NOT Google Apps.
In a simulation, how would we randomly draw loss frequency values for a risk issue whose expected loss frequency is 0.10, or once every ten years? I will share two ways; the first of which is the remainder of this post.
For any simulation iteration, we would generate a random value between 0 and 1; and compare the result to the expected loss value
a. The stated expected loss frequency is 0.10 (cell B1; tab “loss 1”)
b. For purposes of demonstration, we will number some cells to reflect the number of iterations (A6:A1005; A6=1; A7=A6+1; drag A7 down to you get to 1000).
c. In Excel, we would use the =RAND() function to generate the random values in cells B6:B1005.
d. We would then compare the randomly generated value to the expected loss frequency value in cell B1; with this code in C6 dragged down to C1005:
=IF(B6<=$B$1,1,0)
i. If the generated random value in cell B6 is equal to or less then 0.1000 (cell B1), then the number of loss events for that iteration is 1.
ii. If the generated random value in B6 is greater then 0.1000, then the number of loss events for that iteration is 0
e. Once you have values in all the cells, you can now look at how many iterations resulted in a loss and how many did not. Cell B2 counts the number of iterations you had a loss and cell B3 counts the number of iterations you did not have a simulated loss; their corresponding percentages are next to each other.
f. The pie chart shows the percentage and count for each loss versus no loss.
g. Press the F9 key; new random values will be generated. Every time you press F9 think of it as a new simulation with 1000 iterations. Press F9 lots of times and you will notice that in some simulations loss events occur greater then 10% of the time and in some simulations less then 10% of the time.
h. What you are observing is the effect of randomness. Over a large number of iterations and/or simulations we would expect the loss frequency to converge to 10%.
i. Another thing worth mentioning, is that output from the RAND() function is uniform in nature. Thus, there is equal probability of all values between 0 and 1 being drawn for any given iteration.
j. Since our expected loss frequency is 0.1000 and the RAND() functions output is uniform in nature – we would expect to see 10% of our iterations result in loss; some were more and some were less.
There are some limitations with this method for simulating the loss frequency portion of our risk model:
1. If the expected loss frequency is greater then 1 then using RAND() is not viable, because RAND() only generates values between 0 and 1.
2. In iterations where you had a loss event; this method does not reflect the actual number of loss events for that iteration. In reality, there could be some iterations (or years) where you have more then one loss event.
Some of the first models I built used this approach for generating loss frequency values. There is usefulness regardless of its simplicity. However, there are other methods to simulate loss frequency that are more appropriate for modeling and overcome the limitations listed above. In the next post, we will use random values, a discreet probability distribution and the expected loss frequency value to randomly generate loss frequency values.
NOTES / DISCLAIMERS: I am intentionally over-simplifying these modeling examples for a few reasons:
1. To demonstrate that IT Risk modeling is achievable; even to someone that is not an actuarial or modeling professional.
2. To give a glimpse of the larger forest past some of the trees blocking our view within the information risk management profession.
3. As with any model – simple or complex – there is diligence involved to ensure that the right probability distributions and calculations are being used; reflective of the data being modeled.
4. In cases where assumptions are being made in a model; they would be documented.
