In part one of this series we looked at how we can simulate loss frequency for values less than 1 (less than one occurrence per year). We generated a random value, compared it to 0.1000, and counted the number of times our random number was less than 0.1000. In some simulations the percentage of iterations with a loss was less than 0.1000, and in others it was greater. As you will recall, there are a couple of noteworthy shortcomings with the method outlined in part one:
1. If the expected loss frequency is greater than 1, then using RAND() by itself is not viable, because RAND() only generates values between 0 and 1.
2. In iterations where you had a loss event, method one does not reflect the actual number of loss events for that iteration. In reality, there could be some iterations (or years) where you have more than one loss event.
The focus of this post is to cover a method that is probably more appropriate for modeling information technology risk as it pertains to loss frequency and that overcomes the limitations listed above.
For this method we need three things:
1. A probability distribution; preferably a discrete probability distribution. We’ll go with Poisson. Why Poisson? (UMASS, VOSE)
a. In the binomial process, there are n discrete opportunities for an event (a ‘success’) to occur. In the Poisson process, there is a continuous and constant opportunity for an event to occur.
b. The event is something that can be counted in whole numbers (there is no such thing as a partial loss event).
c. Loss events are independent, so that one occurrence neither diminishes nor increases the chance of another (an assumption in a lot of IT risk scenarios).
d. The average frequency of occurrence for the time period in question is known (in our case, it could be based on evidence or subject matter estimates).
2. An expected loss frequency. Like the first method, the expected loss frequency is 0.1000; in other words, we expect one loss event every ten years.
3. A random value. We will use Excel’s random number generator to generate a random value between 0 and 1.
How do these work together? In layman’s terms: given an expected loss frequency (0.1000) that is Poisson in nature, what integer number of loss events (the inverse cumulative value) corresponds to a given probability (the random value)?
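As a quick intuition check (my own illustration, not something in the spreadsheet), you can list the Poisson cumulative probabilities for a 0.1000 expected loss frequency directly in Excel:
=POISSON(0, 0.1, TRUE) returns roughly 0.9048
=POISSON(1, 0.1, TRUE) returns roughly 0.9953
=POISSON(2, 0.1, TRUE) returns roughly 0.9998
So a random value below roughly 0.9048 maps to zero loss events, a value between roughly 0.9048 and 0.9953 maps to one loss event, a value between roughly 0.9953 and 0.9998 maps to two, and so on. That mapping from a probability back to a count is exactly what the inverse cumulative value is.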
You may want to download this Excel spreadsheet to reference for the rest of the post (it should work in Excel 2003, Excel 2007 and Excel 2010; I have not tested it on Office for Mac). Refer to the “loss 2” tab.
Make sure you view it in Excel and NOT Google Apps.
a. The stated expected loss frequency is 0.1000 (cell B1; tab “loss 2”)
b. For purposes of demonstration, we will number some cells to reflect the number of iterations (A9:A1008; A9=1; A10=A9+1; drag A10 down until you get to 1000).
c. In Excel, we would use the =RAND() function to generate the random values in cells B9:B1008.
d. Now, in column C, beginning in cell C9, we are going to combine a Poisson probability distribution with our expected loss frequency and the random value(s) in column B to return an integer that is the number of loss events for that iteration. You would type the following in C9 and then drag C9 down to C1008:
=poisinv1(B9,$B$1)
i. Poisinv1 is a function that returns the inverse cumulative value for a Poisson distribution. I used the SimTools VBA code to generate this value; embedding the code this way also means you do not have to install the SimTools Excel add-in. (A rough sketch of what such a function does appears after this list.)
ii. B9 is the random value we generated in step c.
iii. $B$1 is the expected loss frequency declared in step a.
e. Once you have values in all the cells, you can look at how many iterations resulted in a loss and how many did not. Cell B2 counts the number of iterations in which you had a loss and cell B3 counts the number of iterations in which you did not have a simulated loss; their corresponding percentages are next to them. (Example formulas for these summary cells also appear after this list.)
f. Cell B4 sums up the number of actual loss events for all 1000 iterations. You will notice that, more than likely, the value in B4 is greater than the value in B2. This is the result of using the Poisson probability distribution – there were some iterations with more than one loss event.
g. Take a look at cell B5 – the number in this cell is the maximum number of loss events across our iterations. Given our input parameters, it will most likely be 2, maybe 3.
h. The pie chart shows the percentage and count for iterations that resulted in loss versus those that did not.
i. Press the F9 key; new random values will be generated. Every time you press F9, think of it as a new simulation with 1000 iterations. Press F9 lots of times and you will notice that in some simulations loss events occur more than 10% of the time and in some simulations less than 10% of the time.
j. Like the first loss frequency simulation method, what you are observing is the effect of randomness. Over a large number of iterations and/or simulations we would expect the observed loss frequency to converge toward the expected 10%.
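For the curious, here is a minimal VBA sketch of what an inverse cumulative Poisson function does. This is my own illustrative code, not the SimTools implementation the spreadsheet uses; it simply walks up the cumulative Poisson distribution until it reaches the supplied probability and returns that count:

Function PoisInvSketch(prob As Double, mean As Double) As Long
    ' Illustrative sketch only - not the SimTools poisinv1 code.
    ' Returns the smallest integer k such that the cumulative
    ' Poisson probability P(X <= k) is at least prob.
    Dim k As Long
    Dim pk As Double     ' P(X = k)
    Dim cum As Double    ' P(X <= k)
    pk = Exp(-mean)      ' P(X = 0)
    cum = pk
    k = 0
    Do While cum < prob
        k = k + 1
        pk = pk * mean / k   ' recurrence: P(X = k) = P(X = k - 1) * mean / k
        cum = cum + pk
    Loop
    PoisInvSketch = k
End Function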
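And for anyone rebuilding the workbook from scratch rather than downloading it, the summary cells described in steps e through g can be produced with ordinary worksheet functions. These are my guesses at suitable formulas, not necessarily the ones the workbook actually uses:
B2: =COUNTIF(C9:C1008,">0")   (iterations with at least one loss event)
B3: =COUNTIF(C9:C1008,0)      (iterations with no loss event)
B4: =SUM(C9:C1008)            (total loss events across all iterations)
B5: =MAX(C9:C1008)            (maximum loss events in a single iteration)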
KEY TAKE-AWAYS
1. This method overcomes the shortcomings of the first method: we can simulate the number of loss events regardless of the expected loss frequency value, and it produces a few iterations with more than one loss event.
2. Given the stated input parameters, for those iterations where there was more than one loss event, look at the random value to the left of that cell (column B). More than likely the random value is 0.99xxxx. The higher the random value, the larger the returned integer will be. We often hear of “tail risk”; this method can help us better understand the “tail” of a risk distribution if we were to simulate a loss magnitude for every instance of a loss event.
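To make that tail behavior concrete (my own numbers, using the sketch function above with an expected loss frequency of 0.1000):
PoisInvSketch(0.95, 0.1) returns 1
PoisInvSketch(0.996, 0.1) returns 2
PoisInvSketch(0.9999, 0.1) returns 3
Only the most extreme random draws produce the multi-event iterations, which is why they sit in the tail of the distribution.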
In part three of this series we will look at simulating loss magnitude using randomness combined with an average loss amount and a standard deviation.
In part four we will combine simulated loss frequency and simulated loss magnitude into a single function.
Part five will be a wrap up.
