Bayesian Estimation and Credibility
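Posterior density:
$$\pi_{\Theta|X}(\theta\mid x)=\frac{f_{X|\Theta}(x\mid\theta)\,\pi(\theta)}{\int f_{X|\Theta}(x\mid\theta)\,\pi(\theta)\,d\theta}$$
Predictive density of a future observation Y:
$$f_{Y|X}(y\mid x)=\frac{\int f_{Y|\Theta}(y\mid\theta)\,f_{X|\Theta}(x\mid\theta)\,\pi(\theta)\,d\theta}{\int f_{X|\Theta}(x\mid\theta)\,\pi(\theta)\,d\theta}=\int f_{Y|\Theta}(y\mid\theta)\,\pi_{\Theta|X}(\theta\mid x)\,d\theta$$
Predictive probabilities and expectations:
$$P(Y\in A\mid x)=\frac{\int P(Y\in A\mid\theta)\,f_{X|\Theta}(x\mid\theta)\,\pi(\theta)\,d\theta}{\int f_{X|\Theta}(x\mid\theta)\,\pi(\theta)\,d\theta}=\int P(Y\in A\mid\theta)\,\pi_{\Theta|X}(\theta\mid x)\,d\theta$$
$$E(Y\mid x)=\int E(Y\mid\theta)\,\pi_{\Theta|X}(\theta\mid x)\,d\theta$$
Conditioning on a statistic s = s(x) instead of the full data:
$$P(Y\in A\mid s(x))=\frac{\int P(Y\in A\mid\theta)\,f_{S|\Theta}(s\mid\theta)\,\pi(\theta)\,d\theta}{\int f_{S|\Theta}(s\mid\theta)\,\pi(\theta)\,d\theta}$$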
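The posterior and predictive formulas above can be approximated numerically by replacing each integral with a sum over a fine grid of θ values. The sketch below is only an illustration under assumed inputs: the Beta(2, 3) prior, the single observation x = 1, and the helper names `grid_posterior` and `predictive_prob` are hypothetical choices, not part of the notes.

```python
import math

def grid_posterior(likelihood, prior, grid):
    """Approximate pi(theta | x) on an equally spaced grid.

    Each weight is f(x | theta) * pi(theta); dividing by their sum plays the role
    of the denominator integral (the common grid spacing cancels out).
    """
    weights = [likelihood(t) * prior(t) for t in grid]
    total = sum(weights)
    return [w / total for w in weights]

def predictive_prob(cond_prob, posterior, grid):
    """Approximate P(Y in A | x) = integral of P(Y in A | theta) pi(theta | x) dtheta."""
    return sum(cond_prob(t) * w for t, w in zip(grid, posterior))

# Hypothetical inputs: Beta(2, 3) prior, one Bernoulli observation x = 1,
# and Y ~ Bernoulli(theta) a future observation with A = {1}.
a, b, x = 2.0, 3.0, 1
beta_ab = math.gamma(a) * math.gamma(b) / math.gamma(a + b)   # B(a, b)

def prior(t):
    return t ** (a - 1) * (1 - t) ** (b - 1) / beta_ab

def likelihood(t):
    return t ** x * (1 - t) ** (1 - x)

n_grid = 10_000
grid = [(i + 0.5) / n_grid for i in range(n_grid)]      # midpoints of (0, 1)
posterior = grid_posterior(likelihood, prior, grid)
p_next = predictive_prob(lambda t: t, posterior, grid)  # P(Y = 1 | x)
print(round(p_next, 4))
```

Here the exact posterior is Beta(3, 3), whose mean is 1/2, so the printed predictive probability should be 0.5.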
Example. Let X be a Bernoulli(θ) random variable, and suppose that θ ∼ Beta(a, b). Calculate the
posterior pdf of θ.
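Solution.
Assumptions:
$$\pi(\theta)=\frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)},\quad 0<\theta<1,\qquad f_{X|\Theta}(x\mid\theta)=\theta^{x}(1-\theta)^{1-x},\quad x=0,1$$
Joint density:
$$f_{X,\Theta}(x,\theta)=f_{X|\Theta}(x\mid\theta)\,\pi(\theta)=\frac{\theta^{a+x-1}(1-\theta)^{b-x}}{B(a,b)}$$
Marginal density (by the beta integral formula):
$$f_X(x)=\int_0^1 f_{X,\Theta}(x,\theta)\,d\theta=\int_0^1\frac{\theta^{a+x-1}(1-\theta)^{b-x}}{B(a,b)}\,d\theta=\frac{B(a+x,\,b-x+1)}{B(a,b)}$$
Posterior density:
$$f_{\Theta|X}(\theta\mid x)=\frac{f_{X,\Theta}(x,\theta)}{f_X(x)}=\frac{\theta^{a+x-1}(1-\theta)^{b-x}}{B(a+x,\,b-x+1)}\;\sim\;\text{Beta}(a+x,\,b-x+1)$$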
Note. Bayesian calculations are usually complicated because they require the posterior distribution,
which is often difficult to compute. However, for some classes of priors the posterior distribution
belongs to the same class, and only the hyperparameters change. In such cases the posterior pdf is
easy to compute.
Definition. A prior pdf f(θ|λ) is said to be conjugate to the likelihood function f(x|θ) if the posterior
pdf has the same functional form f(θ|λ′), differing from the prior only in the value of the hyperparameter (λ′ in place of λ).
Example. The previous example was based on one individual observation. Now calculate the posterior
pdf if there are n observations {X1 , ... , Xn}.
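Solution. Denote {X1, ..., Xn} by the bold X, {x1, ..., xn} by the bold x, and let x̄ = (x1 + ··· + xn)/n. Then
$$f_{\mathbf X|\Theta}(\mathbf x\mid\theta)=\prod_{i=1}^{n}\theta^{x_i}(1-\theta)^{1-x_i}=\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}$$
$$f_{\mathbf X,\Theta}(\mathbf x,\theta)=f_{\mathbf X|\Theta}(\mathbf x\mid\theta)\,\pi(\theta)=\Big\{\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\Big\}\Big\{\frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)}\Big\}=c\,\theta^{(a+n\bar x)-1}(1-\theta)^{(b+n-n\bar x)-1},\qquad c=\frac{1}{B(a,b)}$$
$$f_{\mathbf X}(\mathbf x)=\int_0^1 f_{\mathbf X,\Theta}(\mathbf x,\theta)\,d\theta=c\int_0^1\theta^{(a+n\bar x)-1}(1-\theta)^{(b+n-n\bar x)-1}\,d\theta=c\,B(a+n\bar x,\,b+n-n\bar x)$$
$$f_{\Theta|\mathbf X}(\theta\mid\mathbf x)=\frac{f_{\mathbf X,\Theta}(\mathbf x,\theta)}{f_{\mathbf X}(\mathbf x)}=\frac{\theta^{(a+n\bar x)-1}(1-\theta)^{(b+n-n\bar x)-1}}{B(a+n\bar x,\,b+n-n\bar x)}\;\sim\;\text{Beta}(a+n\bar x,\,b+n-n\bar x)$$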
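Because the Beta prior is conjugate, the posterior can be updated either all at once or one observation at a time, feeding each posterior back in as the next prior. The sketch below, with hypothetical data and prior parameters, checks that applying the single-observation rule Beta(a + x, b − x + 1) repeatedly gives the same result as the batch rule Beta(a + nx̄, b + n − nx̄).

```python
def update_one(a, b, x):
    """One Bernoulli observation x in {0, 1}: Beta(a, b) -> Beta(a + x, b - x + 1)."""
    return a + x, b - x + 1

def update_batch(a, b, xs):
    """n observations: Beta(a, b) -> Beta(a + n*xbar, b + n - n*xbar)."""
    n, ones = len(xs), sum(xs)          # ones = n * xbar
    return a + ones, b + n - ones

a, b = 2, 3                             # hypothetical prior parameters
xs = [1, 0, 0, 1, 1, 0, 1]              # hypothetical 0/1 data
params = (a, b)
for x in xs:
    params = update_one(*params, x)     # each posterior becomes the next prior
print(params, update_batch(a, b, xs))  # (6, 6) (6, 6)
```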
Corollary. The Beta prior is conjugate to the Bernoulli likelihood (Beta-Bernoulli conjugate pair).
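Important Note. It is always true that
$$f_{\Theta|\mathbf X}(\theta\mid\mathbf x)=c\,f_{\mathbf X|\Theta}(\mathbf x\mid\theta)\,\pi(\theta),$$
where c does not depend on θ, so there is no actual need to calculate the marginal pdf of X in order to find the functional form of
the posterior distribution.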
Example. Let X ∼ Binomial(m , θ) where m is fixed and θ ∼ Beta(a,b). Let {x1 , ... , xn} be observed.
Find the posterior distribution of θ .
Solution.
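$$f_{\Theta|\mathbf X}(\theta\mid\mathbf x)=c\,f_{\mathbf X|\Theta}(\mathbf x\mid\theta)\,\pi(\theta)=c\Big\{\theta^{\sum x_i}(1-\theta)^{\sum(m-x_i)}\Big\}\Big\{\theta^{a-1}(1-\theta)^{b-1}\Big\}=c\,\theta^{(a+n\bar x)-1}(1-\theta)^{(b+mn-n\bar x)-1}$$
⇒ the posterior distribution is Beta(a + nx̄, b + mn − nx̄).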
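A minimal sketch of the Beta-Binomial update, assuming the observations are integers between 0 and m; the function name and the sample data are illustrative. With m = 1 it reduces to the Beta-Bernoulli update.

```python
def beta_binomial_posterior(a, b, xs, m):
    """Beta(a, b) prior and X_i | theta ~ Binomial(m, theta), m fixed.

    Returns the posterior parameters (a + n*xbar, b + m*n - n*xbar).
    """
    if any(not 0 <= x <= m for x in xs):
        raise ValueError("each observation must lie between 0 and m")
    n, total = len(xs), sum(xs)   # total = n * xbar
    return a + total, b + m * n - total

print(beta_binomial_posterior(2, 3, [0, 2, 1], m=2))   # (5, 6)
print(beta_binomial_posterior(2, 3, [1, 0, 1], m=1))   # (4, 4), the Bernoulli case
```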
Corollary. The Beta prior is conjugate to the Binomial likelihood (Beta-Binomial conjugate pair).
Example (Gamma-Poisson). Suppose {X1, ..., Xn} is an i.i.d. sample from Poisson(λ), where
λ ∼ Gamma(α, β) with shape α and scale β. Then the posterior distribution of λ is Gamma(α*, β*), where
$$\alpha^*=\alpha+n\bar x,\qquad \beta^*=\frac{\beta}{1+\beta n}$$
Example (Gamma-Exponential). Suppose {X1, ..., Xn} is an i.i.d. sample from Exponential(λ) with rate λ, where
λ ∼ Gamma(α, β). Then the posterior distribution of λ is Gamma(α*, β*), where
$$\alpha^*=\alpha+n,\qquad \beta^*=\frac{\beta}{1+\beta n\bar x}$$
Example (Beta-Geometric). Suppose {X1, ..., Xn} is an i.i.d. sample from Geometric(β), with P(X = x | β) = β(1 − β)^x for x = 0, 1, 2, ..., where
β ∼ Beta(a, b). Then the posterior distribution of β is Beta(a*, b*), where
$$a^*=a+n,\qquad b^*=b+n\bar x$$
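The three posterior updates can be written as small functions. The sketch assumes that β is a scale parameter of the Gamma prior (so E(λ) = αβ), that the exponential is parametrized by its rate λ, and that the geometric has P(X = x | β) = β(1 − β)^x; the function names and test values are hypothetical. As a sanity check, the Gamma-Poisson posterior mean α*β* is compared with a brute-force grid computation of the posterior mean.

```python
import math

def gamma_poisson(alpha, beta, xs):
    """Gamma(alpha, scale beta) prior, Poisson(lambda) data -> (alpha*, beta*)."""
    n, total = len(xs), sum(xs)
    return alpha + total, beta / (1 + beta * n)

def gamma_exponential(alpha, beta, xs):
    """Gamma(alpha, scale beta) prior, exponential data with rate lambda."""
    n, total = len(xs), sum(xs)
    return alpha + n, beta / (1 + beta * total)

def beta_geometric(a, b, xs):
    """Beta(a, b) prior, P(X = x | beta) = beta * (1 - beta)**x for x = 0, 1, 2, ..."""
    n, total = len(xs), sum(xs)
    return a + n, b + total

# Brute-force check of the Gamma-Poisson case (hypothetical numbers).
alpha, beta, xs = 2.0, 0.5, [1, 3, 0]
a_post, b_post = gamma_poisson(alpha, beta, xs)

def log_unnormalized_posterior(lam):
    log_prior = (alpha - 1) * math.log(lam) - lam / beta
    log_lik = sum(xs) * math.log(lam) - len(xs) * lam
    return log_prior + log_lik

h = 0.001
grid = [(i + 0.5) * h for i in range(20_000)]          # lambda in (0, 20)
logs = [log_unnormalized_posterior(t) for t in grid]
top = max(logs)
weights = [math.exp(v - top) for v in logs]             # shift to avoid underflow
grid_mean = sum(t * w for t, w in zip(grid, weights)) / sum(weights)
print(a_post * b_post, round(grid_mean, 4))             # both about 1.2
```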
Bayesian Estimation
Suppose that a sample x = {x1, ..., xn} has been collected, and we want to estimate
µ_{X_{n+1}}(θ) = E(X_{n+1}|θ) after x has been observed. This value is a function g(θ) of θ (for example, it
might be θ²), and we may estimate it with a point estimator, much as in the non-Bayesian case where we
used, for example, the MLE. A point estimator, as in the case of the MLE, is a function θ̂(x1, ..., xn) of the
observed data; for example, the MLE of the mean of an exponential population is x̄ = (x1 + ··· + xn)/n.
Similarly, in the Bayesian approach to point estimation the point estimator is some function θ̂(x1, ..., xn)
of the observed data. In the MLE case we maximized the likelihood function (or, equivalently, the
log-likelihood function) to find this estimator; in Bayesian estimation we instead use the squared-error loss function
$$L(\hat\theta,g(\theta))=\big(\hat\theta-g(\theta)\big)^2$$
and minimize the expected posterior loss
$$E\big[(\hat\theta-g(\theta))^2\mid\mathbf x\big]=\int\big(\hat\theta-g(\theta)\big)^2\,\pi_{\Theta|\mathbf X}(\theta\mid\mathbf x)\,d\theta.$$
Note that θ̂ = θ̂(x1, ..., xn) is a function of x1, ..., xn only (such as x̄) and therefore does not
depend on θ. Hence the expected loss simplifies to
$$\int\big\{g(\theta)^2-2\hat\theta g(\theta)+\hat\theta^2\big\}\pi_{\Theta|\mathbf X}(\theta\mid\mathbf x)\,d\theta=\int g(\theta)^2\,\pi_{\Theta|\mathbf X}(\theta\mid\mathbf x)\,d\theta-2\hat\theta\int g(\theta)\,\pi_{\Theta|\mathbf X}(\theta\mid\mathbf x)\,d\theta+\hat\theta^2.$$
A quadratic t² − 2bt + c is minimized if and only if t = b (differentiate with respect to t and set the
derivative equal to zero). Therefore, viewing θ̂ as the variable, the expected loss is minimized if and only if
$$\hat\theta=\int g(\theta)\,\pi_{\Theta|\mathbf X}(\theta\mid\mathbf x)\,d\theta=\int E(X_{n+1}\mid\theta)\,\pi_{\Theta|\mathbf X}(\theta\mid\mathbf x)\,d\theta=E[X_{n+1}\mid\mathbf x].$$
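A quick numerical illustration of this result, with a hypothetical three-point posterior: scanning candidate estimates on a fine grid, the one with the smallest expected squared-error loss coincides with the posterior mean of g(θ).

```python
# Hypothetical discrete posterior: values of g(theta) and their posterior probabilities.
g_values = [0.2, 0.5, 0.9]
post_probs = [0.3, 0.5, 0.2]

def expected_loss(estimate):
    """E[(estimate - g(theta))^2 | x] under the discrete posterior."""
    return sum((estimate - g) ** 2 * p for g, p in zip(g_values, post_probs))

posterior_mean = sum(g * p for g, p in zip(g_values, post_probs))
candidates = [i / 1000 for i in range(1001)]
best = min(candidates, key=expected_loss)
print(posterior_mean, best)   # both 0.49, up to floating-point rounding
```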
Definition. The Bayesian point estimate E[X_{n+1} | x] is called the Bayesian premium.
Example. Let the model distribution for a single observation be Bernoulli(θ), and suppose that θ ∼ Beta(a, b).
If the data x = {x1, ..., xn} have been observed, calculate the Bayesian premium E(X_{n+1}|x).
Solution. We already know that the posterior distribution of θ is
$$\text{Beta}(a^*,b^*),\qquad a^*=a+n\bar x,\quad b^*=b+n-n\bar x.$$
On the other hand, since X_{n+1} ∼ Bernoulli(θ), we have E(X_{n+1}|θ) = θ. Therefore E(X_{n+1}|x) is the
expected value of θ when θ follows the posterior distribution Beta(a*, b*):
$$E(X_{n+1}\mid\mathbf x)=\int E(X_{n+1}\mid\theta)\,\pi_{\Theta|\mathbf X}(\theta\mid\mathbf x)\,d\theta=\int\theta\,\pi_{\Theta|\mathbf X}(\theta\mid\mathbf x)\,d\theta=\frac{a^*}{a^*+b^*}=\frac{a+n\bar x}{a+b+n}$$
Example. Let the model distribution for a single observation be Binomial(2, θ), and suppose that
θ ∼ Beta(a, b). If the data x = {x1, ..., xn} have been observed, calculate the Bayesian premium
E(X_{n+1}|x).
Solution. We already know that the posterior distribution of θ is
$$\text{Beta}(a^*,b^*),\qquad a^*=a+n\bar x,\quad b^*=b+2n-n\bar x.$$
On the other hand, since X_{n+1} ∼ Binomial(2, θ), we have E(X_{n+1}|θ) = 2θ. Therefore E(X_{n+1}|x) is
2 times the expected value of θ when θ follows the posterior distribution Beta(a*, b*):
$$E(X_{n+1}\mid\mathbf x)=\int E(X_{n+1}\mid\theta)\,\pi_{\Theta|\mathbf X}(\theta\mid\mathbf x)\,d\theta=2\int\theta\,\pi_{\Theta|\mathbf X}(\theta\mid\mathbf x)\,d\theta=\frac{2a^*}{a^*+b^*}=\frac{2a+2n\bar x}{a+b+2n}$$
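Both premium formulas are instances of E(X_{n+1}|x) = m·a*/(a* + b*) for a Binomial(m, θ) model with a Beta(a, b) prior (m = 1 gives the Bernoulli case). A minimal sketch with exact rational arithmetic; the function name and the sample data are hypothetical.

```python
from fractions import Fraction

def bayes_premium(a, b, xs, m):
    """Bayesian premium E(X_{n+1} | x) for Binomial(m, theta) data and a Beta(a, b) prior."""
    n, total = len(xs), sum(xs)          # total = n * xbar
    a_post = a + total                   # a* = a + n*xbar
    b_post = b + m * n - total           # b* = b + m*n - n*xbar
    return Fraction(m * a_post, a_post + b_post)   # m times the posterior mean of theta

print(bayes_premium(2, 3, [1, 0, 1], m=1))   # (a + n*xbar)/(a + b + n) = 4/8 = 1/2
print(bayes_premium(2, 3, [0, 2, 1], m=2))   # 2(a + n*xbar)/(a + b + 2n) = 10/11
```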
Example (from the textbook). There are two types of drivers. Good drivers make up 75% of the
population and in one year have zero claims with probability 0.7, one claim with probability 0.2, and
two claims with probability 0.1. Bad drivers make up the other 25% of the population and have zero,
one, or two claims with probabilities 0.5, 0.3, and 0.2, respectively.
(i) Describe this process and how it relates to an unknown risk parameter.
(ii) For a particular policyholder, suppose we have observed x1 = 0 and x2 = 1. Determine the
predictive distribution of X3 | (X1 = 0, X2 = 1) and the posterior distribution of
θ | (X1 = 0, X2 = 1).
(iii) Calculate the Bayesian premium E(X3 | X1 = 0, X2 = 1) in two ways.
Solution to part (i). When a driver buys this insurance, we do not know which class (good driver or
bad driver) the driver belongs to. Therefore, the risk parameter takes one of two values: θ = G for good
drivers and θ = B for bad drivers. Corresponding to the information above, we have the following
tables:

Claim probabilities given state
Number of claims x    P(X = x | θ = G)    P(X = x | θ = B)
0                     0.7                 0.5
1                     0.2                 0.3
2                     0.1                 0.2
Total                 1                   1

Prior distribution of the state
θ    P(θ)
G    0.75
B    0.25
Solution to part (ii).
$$f_{\mathbf X}(0,1)=\sum_\theta f_{X_1|\Theta}(0\mid\theta)\,f_{X_2|\Theta}(1\mid\theta)\,\pi(\theta)=(0.7)(0.2)(0.75)+(0.5)(0.3)(0.25)=0.1425$$
$$f_{\mathbf X,X_3}(0,1,0)=(0.7)(0.2)(0.7)(0.75)+(0.5)(0.3)(0.5)(0.25)=0.09225$$
$$f_{\mathbf X,X_3}(0,1,1)=(0.7)(0.2)(0.2)(0.75)+(0.5)(0.3)(0.3)(0.25)=0.03225$$
$$f_{\mathbf X,X_3}(0,1,2)=(0.7)(0.2)(0.1)(0.75)+(0.5)(0.3)(0.2)(0.25)=0.01800$$
Predictive distribution:
$$f_{X_3|\mathbf X}(0\mid0,1)=\frac{0.09225}{0.1425}=0.647368,\qquad f_{X_3|\mathbf X}(1\mid0,1)=\frac{0.03225}{0.1425}=0.226316,\qquad f_{X_3|\mathbf X}(2\mid0,1)=\frac{0.01800}{0.1425}=0.126316$$
Posterior distribution:
$$\pi(G\mid0,1)=\frac{f(0\mid G)\,f(1\mid G)\,\pi(G)}{f(0,1)}=\frac{(0.7)(0.2)(0.75)}{0.1425}=0.736842$$
$$\pi(B\mid0,1)=\frac{f(0\mid B)\,f(1\mid B)\,\pi(B)}{f(0,1)}=\frac{(0.5)(0.3)(0.25)}{0.1425}=0.263158$$
Solution to part (iii).
We first calculate the (unobservable) hypothetical means:
$$\mu_3(G)=(0)(0.7)+(1)(0.2)+(2)(0.1)=0.4,\qquad \mu_3(B)=(0)(0.5)+(1)(0.3)+(2)(0.2)=0.7$$
Using the formula E(X_{n+1}|X = x) = ∫ x_{n+1} f_{X_{n+1}|X}(x_{n+1}|x) dx_{n+1} (a sum in this discrete case), we have:
$$\text{Bayesian premium }E(X_3\mid x_1=0,x_2=1)=(0)(0.647368)+(1)(0.226316)+(2)(0.126316)=0.478948$$
Alternatively, using the formula E(X_{n+1}|X = x) = ∫ µ_{n+1}(θ) π_{Θ|X}(θ|x) dθ, we get:
$$E(X_3\mid x_1=0,x_2=1)=(0.4)(0.736842)+(0.7)(0.263158)=0.478947$$
The two answers differ only in the last digit, because of rounding.
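The whole calculation for parts (ii) and (iii) can be reproduced with exact fractions. The dictionary layout and variable names below are illustrative choices; the probabilities are those of the example.

```python
from fractions import Fraction as F

# P(X = x | theta) for x = 0, 1, 2, and the prior P(theta), from the example.
claim_probs = {"G": [F(7, 10), F(2, 10), F(1, 10)], "B": [F(5, 10), F(3, 10), F(2, 10)]}
prior = {"G": F(3, 4), "B": F(1, 4)}
observed = [0, 1]                                   # x1 = 0, x2 = 1

def likelihood(theta, xs):
    out = F(1)
    for x in xs:
        out *= claim_probs[theta][x]                # claims independent given theta
    return out

marginal = sum(likelihood(t, observed) * prior[t] for t in prior)               # f_X(0, 1)
posterior = {t: likelihood(t, observed) * prior[t] / marginal for t in prior}     # pi(theta | 0, 1)
predictive = [sum(claim_probs[t][x3] * posterior[t] for t in prior) for x3 in range(3)]
hyp_mean = {t: sum(x * p for x, p in enumerate(claim_probs[t])) for t in prior}  # mu_3(theta)

premium_via_predictive = sum(x3 * p for x3, p in enumerate(predictive))
premium_via_posterior = sum(hyp_mean[t] * posterior[t] for t in prior)
print(float(marginal), [round(float(p), 6) for p in predictive])
print(premium_via_predictive, premium_via_posterior)   # 91/190 91/190
```

With exact fractions the two routes give exactly the same premium, 91/190 ≈ 0.478947; the last-digit difference in the hand calculation comes from rounding the predictive probabilities.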
Buhlmann Model
In this model we have a process {X1, ..., XN, XN+1, ...} whose members are, given θ, independent and
identically distributed with common mean and variance:
Hypothetical mean: µ(θ) = E(X1|θ) = E(X2|θ) = ···
Process variance: σ²(θ) = Var(X1|θ) = Var(X2|θ) = ···
The portion {X1, ..., XN} is used to forecast the future outcomes {XN+1, XN+2, ...}. We define
the following quantities:
(1) Population mean: µ = E[µ(θ)] = E[E[Xt|θ]]
(2) Expected value of the process variance: EPV = E[σ²(θ)] = E[Var[Xt|θ]]
(3) Variance of the hypothetical means: VHM = Var[µ(θ)] = E[(µ(θ) − µ)²]
If no prior information is available, then the population mean is used as an estimate of the expected
values of the Xt's.
Example (from the Dean’s notes). The number of claims Xt during the t-th period for a risk has a
Poisson distribution with parameter θ:
$$P[X_t=x]=\frac{\theta^{x}e^{-\theta}}{x!},\qquad x=0,1,2,\ldots$$
The risk was selected at random from a population in which θ is uniformly distributed over the
interval [0, 1]. It is assumed that θ is constant through time for each risk.
(1) The hypothetical mean for a risk with parameter θ is µ(θ) = E[Xt|θ] = θ, because the mean of a
Poisson random variable is its parameter θ.
(2) The process variance for a risk with parameter θ is
$$\sigma^2(\theta)=\operatorname{Var}[X_t\mid\theta]=\theta,$$
because the variance of a Poisson random variable also equals its parameter θ.
(3) The variance of the hypothetical means (VHM) is
$$\operatorname{Var}\big(E[X_t\mid\theta]\big)=\operatorname{Var}(\theta)=E(\theta^2)-E(\theta)^2=\int_0^1\theta^2\,d\theta-\Big(\int_0^1\theta\,d\theta\Big)^2=\frac13-\frac14=\frac{1}{12}$$
(4) The expected value of the process variance (EPV) is
$$E\big[\operatorname{Var}(X_t\mid\theta)\big]=E[\theta]=\int_0^1\theta\,d\theta=\frac12$$
(5) The unconditional (total) variance is
$$\operatorname{Var}[X_t]=\text{VHM}+\text{EPV}=\frac{1}{12}+\frac12=\frac{7}{12}$$
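The law-of-total-variance result Var[X_t] = VHM + EPV = 7/12 can be checked by simulation: draw θ from Uniform[0, 1], then X_t from Poisson(θ). The sketch uses only Python's standard library, with Knuth's multiplication method as the Poisson sampler (a swapped-in technique suitable for small means); the seed and sample size are arbitrary choices.

```python
import math
import random

def poisson_knuth(lam, rng):
    """Draw one Poisson(lam) value by Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(2024)          # arbitrary seed for reproducibility
draws = []
for _ in range(200_000):
    theta = rng.random()            # theta ~ Uniform[0, 1]
    draws.append(poisson_knuth(theta, rng))

mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(round(mean, 3), round(var, 3))   # should be close to 1/2 and 7/12 = 0.583
```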
Derivation of the Credibility factor in Buhlmann Model
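Setting X̄ = (X1 + ··· + XN)/N = (1/N) Σ_{i=1}^{N} Xi, we have
$$E(\bar X\mid\theta)=E\Big[\frac1N\sum_{i=1}^{N}X_i\;\Big|\;\theta\Big]=\frac1N\sum_{i=1}^{N}E(X_i\mid\theta)=\frac1N\sum_{i=1}^{N}\mu(\theta)=\mu(\theta)$$
In other words, X̄ is a conditionally unbiased estimator of µ(θ). Now we seek a and b that minimize the
expected value
$$\min_{a,b}\;E\big[a+b\bar X-\mu(\theta)\big]^2,$$
where the expectation is taken with respect to the joint distribution of (X1, ..., XN, θ).
For simplicity, set
$$Y=\bar X-\mu(\theta).$$
Then X̄ = Y + µ(θ), and
$$E(Y\mid\theta)=E(\bar X\mid\theta)-E(\mu(\theta)\mid\theta)=E(\bar X\mid\theta)-\mu(\theta)=0.$$
Now note that
$$\big[a+b\bar X-\mu(\theta)\big]^2=\big[a+bY+(b-1)\mu(\theta)\big]^2=\big(bY+c(\theta)\big)^2=b^2Y^2+2b\,c(\theta)\,Y+c(\theta)^2,\qquad c(\theta)=a+(b-1)\mu(\theta).$$
Then
$$E\big[a+b\bar X-\mu(\theta)\big]^2=b^2E(Y^2)+2b\,E\big[c(\theta)Y\big]+E\big[c(\theta)^2\big]\qquad(1)$$
But
$$E\big[c(\theta)Y\big]=E\Big[E\big[c(\theta)Y\mid\theta\big]\Big]=E\Big[c(\theta)\,E\big[Y\mid\theta\big]\Big]=E\big[c(\theta)\cdot0\big]=0,$$
so equation (1) reduces to
$$E\big[a+b\bar X-\mu(\theta)\big]^2=b^2E(Y^2)+E\big[c(\theta)^2\big]\qquad(2)$$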
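To minimize this, we set the partial derivatives equal to zero. Since Y does not involve a, and ∂c(θ)/∂a = 1,
$$\frac{\partial}{\partial a}E\big[a+b\bar X-\mu(\theta)\big]^2=2E\Big[c(\theta)\frac{\partial c(\theta)}{\partial a}\Big]=2E\big[c(\theta)\big]=2\big\{a+(b-1)E[\mu(\theta)]\big\}=2\big\{a+(b-1)\mu\big\}$$
Then
$$\frac{\partial}{\partial a}=0\;\Rightarrow\;a=(1-b)\mu.$$
Next step. With a = (1 − b)µ we have c(θ) = (b − 1)(µ(θ) − µ), so the right-hand side of equation (2) becomes
$$b^2E(Y^2)+E\big[(1-b)^2(\mu(\theta)-\mu)^2\big]=b^2E(Y^2)+(1-b)^2E\big[(\mu(\theta)-\mu)^2\big]=b^2E(Y^2)+(1-b)^2\operatorname{Var}(\mu(\theta))=b^2E(Y^2)+(1-b)^2\,\text{VHM}\qquad(3)$$
Further note that, since X1, ..., XN are independent given θ,
$$E(Y^2)=E\big\{E[Y^2\mid\theta]\big\}=E\big\{E[(\bar X-\mu(\theta))^2\mid\theta]\big\}=E\big\{\operatorname{Var}[\bar X\mid\theta]\big\}=E\Big\{\frac1N\operatorname{Var}[X_1\mid\theta]\Big\}=\frac1N E\big\{\operatorname{Var}[X_1\mid\theta]\big\}=\frac1N\,\text{EPV}\qquad(4)$$
Substituting this into (3), the right-hand side of (3) reads
$$b^2\,\frac{\text{EPV}}{N}+(1-b)^2\,\text{VHM}.$$
Now differentiate this with respect to b and set the derivative equal to zero:
$$\frac{\partial}{\partial b}=2b\,\frac{\text{EPV}}{N}-2(1-b)\,\text{VHM}=0\;\Rightarrow\;b=\frac{\text{VHM}}{\text{VHM}+\frac{\text{EPV}}{N}}=\frac{N}{N+\frac{\text{EPV}}{\text{VHM}}}=\frac{N}{N+K},\qquad K=\frac{\text{EPV}}{\text{VHM}}.$$
This quantity is denoted by Z:
$$Z=\frac{\text{VHM}}{\text{VHM}+\frac{\text{EPV}}{N}}$$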
Then
$$a=(1-b)\mu=(1-Z)\mu,$$
and our estimate of µ(θ) is
$$\hat\mu(\theta)=a+b\bar X=(1-Z)\mu+Z\bar X.$$
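As a numerical check of the derivation, the sketch below evaluates the mean squared error b²·EPV/N + E[c(θ)²], using E[c(θ)²] = (a + (b − 1)µ)² + (b − 1)²·VHM, on a grid of (a, b) values and compares the minimizer with ((1 − Z)µ, Z). It borrows µ = 1/2, EPV = 1/2, VHM = 1/12 from the Poisson-uniform example; the sample size N = 3 is a hypothetical choice.

```python
# Structural parameters of the Poisson-uniform example; N = 3 is a hypothetical sample size.
mu, epv, vhm, N = 0.5, 0.5, 1 / 12, 3

def mse(a, b):
    """E[(a + b*Xbar - mu(theta))^2] = b^2 EPV/N + (a + (b-1) mu)^2 + (b-1)^2 VHM."""
    return b * b * epv / N + (a + (b - 1) * mu) ** 2 + (b - 1) ** 2 * vhm

grid = [(i / 200, j / 200) for i in range(201) for j in range(201)]   # a, b in [0, 1]
a_best, b_best = min(grid, key=lambda ab: mse(*ab))

Z = N / (N + epv / vhm)            # credibility factor N / (N + K)
print(a_best, b_best)              # grid minimizer
print((1 - Z) * mu, Z)             # closed form: a = (1 - Z) mu, b = Z
```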
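Note. As we saw in calculation (4),
$$E\big\{\operatorname{Var}[\bar X\mid\theta]\big\}=\frac1N\,\text{EPV}.$$
Also,
$$\operatorname{Var}\big\{E[\bar X\mid\theta]\big\}=\operatorname{Var}\Big\{E\Big[\frac1N\sum_{i=1}^{N}X_i\;\Big|\;\theta\Big]\Big\}=\operatorname{Var}\Big\{\frac1N\sum_{i=1}^{N}E[X_i\mid\theta]\Big\}=\operatorname{Var}\Big\{\frac1N\sum_{i=1}^{N}\mu(\theta)\Big\}=\operatorname{Var}(\mu(\theta))=\text{VHM}.$$
Adding these expressions (law of total variance), we get
$$\operatorname{Var}(\bar X)=E\big\{\operatorname{Var}[\bar X\mid\theta]\big\}+\operatorname{Var}\big\{E[\bar X\mid\theta]\big\}=\frac{\text{EPV}}{N}+\text{VHM},$$
so
$$Z=\frac{\text{VHM}}{\text{VHM}+\frac{\text{EPV}}{N}}=\frac{\operatorname{Var}(\mu(\theta))}{\operatorname{Var}(\bar X)}=\frac{\text{Variance of the hypothetical means}}{\text{Total variance of the estimator }\bar X}.$$
Also note that, for a single observation X_t,
$$K=\frac{E\big(\operatorname{Var}[X_t\mid\theta]\big)}{\operatorname{Var}\big(E[X_t\mid\theta]\big)}.$$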
Example (from the Dean’s notes). Two risks have the following severity distributions, and Risk 1
is twice as likely to be observed as Risk 2.

Amount of claim    Probability of claim amount (Risk 1)    Probability of claim amount (Risk 2)
250                0.5                                      0.7
2,500              0.3                                      0.2
60,000             0.2                                      0.1
A claim of 250 is observed. Determine the Buhlmann credibility estimate of the second claim amount
from the same risk.
Solution
Let us denote the claim amount by X.
Step 1. Calculate the variance of the hypothetical means:
$$E[X\mid\text{Risk 1}]=(0.5)(250)+(0.3)(2500)+(0.2)(60000)=12875$$
$$E[X\mid\text{Risk 2}]=(0.7)(250)+(0.2)(2500)+(0.1)(60000)=6675$$
$$E[X]=\Big(\frac23\Big)(12875)+\Big(\frac13\Big)(6675)=10808.33$$
$$\text{VHM}=\Big(\frac23\Big)(12875-10808.33)^2+\Big(\frac13\Big)(6675-10808.33)^2=8{,}542{,}222.2$$
Step 2. Calculate the expected value of the process variance:
$$\operatorname{Var}[X\mid\text{Risk 1}]=(0.5)(250-12875)^2+(0.3)(2500-12875)^2+(0.2)(60000-12875)^2=556{,}140{,}625.0$$
$$\operatorname{Var}[X\mid\text{Risk 2}]=(0.7)(250-6675)^2+(0.2)(2500-6675)^2+(0.1)(60000-6675)^2=316{,}738{,}125.0$$
$$\text{EPV}=\Big(\frac23\Big)(556{,}140{,}625.0)+\Big(\frac13\Big)(316{,}738{,}125.0)=476{,}339{,}791.7$$
Step 3. Calculate K and Z (one claim has been observed, so N = 1):
$$K=\frac{\text{EPV}}{\text{VHM}}=\frac{476{,}339{,}791.7}{8{,}542{,}222.2}=55.76,\qquad Z=\frac{N}{N+K}=\frac{1}{1+55.76}=\frac{1}{56.76}$$
$$\text{Buhlmann credibility estimate}=\Big(\frac{1}{56.76}\Big)(250)+\Big(\frac{55.76}{56.76}\Big)(10{,}808.33)=10{,}622$$
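The same steps in code, using the claim amounts, probabilities and prior weights of the example; the variable and function names are illustrative.

```python
amounts = [250, 2500, 60000]
risks = {                     # claim-amount probabilities and prior weight of each risk
    "Risk 1": ([0.5, 0.3, 0.2], 2 / 3),
    "Risk 2": ([0.7, 0.2, 0.1], 1 / 3),
}

def mean_and_variance(probs):
    mean = sum(x * p for x, p in zip(amounts, probs))
    var = sum((x - mean) ** 2 * p for x, p in zip(amounts, probs))
    return mean, var

stats = {name: mean_and_variance(probs) for name, (probs, _) in risks.items()}
weight = {name: w for name, (_, w) in risks.items()}

mu = sum(weight[n] * stats[n][0] for n in risks)                 # E[X]
vhm = sum(weight[n] * (stats[n][0] - mu) ** 2 for n in risks)    # variance of hypothetical means
epv = sum(weight[n] * stats[n][1] for n in risks)                # expected process variance
K = epv / vhm
N = 1                                                            # one observed claim
Z = N / (N + K)
estimate = Z * 250 + (1 - Z) * mu
print(round(K, 2), round(estimate))
```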
Example ∗. You are given the following:
(i) The number of claims made by an individual insured follows a Poisson distribution.
(ii) The expected number of claims, λ , for insureds in the population has the probability density
function
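$$f(\lambda)=4\lambda^{-5},\qquad 1\le\lambda<\infty$$
Determine the value of the Buhlmann k used for estimating the expected number of claims for an
individual insured.
Solution. Here X denotes the number of claims.
$$E[X\mid\lambda]=E(\text{Poisson}(\lambda))=\lambda$$
$$E(\lambda^2)=4\int_1^\infty\lambda^2\lambda^{-5}\,d\lambda=4\int_1^\infty\lambda^{-3}\,d\lambda=\frac{4}{-2}\,\lambda^{-2}\Big|_1^\infty=2$$
$$E(\lambda)=4\int_1^\infty\lambda\,\lambda^{-5}\,d\lambda=4\int_1^\infty\lambda^{-4}\,d\lambda=\frac{4}{-3}\,\lambda^{-3}\Big|_1^\infty=\frac43$$
$$\operatorname{Var}\big(E[X\mid\lambda]\big)=\operatorname{Var}(\lambda)=E(\lambda^2)-E(\lambda)^2=2-\frac{16}{9}=\frac29$$
$$\operatorname{Var}[X\mid\lambda]=\operatorname{Var}(\text{Poisson}(\lambda))=\lambda,\qquad E\big(\operatorname{Var}[X\mid\lambda]\big)=E(\lambda)=\frac43$$
$$K=\frac{E\big(\operatorname{Var}[X\mid\lambda]\big)}{\operatorname{Var}\big(E[X\mid\lambda]\big)}=\frac{4/3}{2/9}=6$$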
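For this density the moments have the closed form E(λ^k) = ∫₁^∞ 4λ^(k−5) dλ = 4/(4 − k) for k < 4, which makes K easy to compute exactly. The sketch also cross-checks the moments numerically after the substitution u = 1/λ, an illustrative technique (not from the notes) that maps the infinite range onto (0, 1).

```python
from fractions import Fraction

def moment(k):
    """E(lambda^k) for f(lambda) = 4 lambda^(-5), lambda >= 1 (valid for k < 4)."""
    return Fraction(4, 4 - k)

def moment_numeric(k, n=100_000):
    """Midpoint rule after substituting u = 1/lambda:
    integral_1^inf 4 lambda^(k-5) dlambda = integral_0^1 4 u^(3-k) du."""
    h = 1.0 / n
    return h * sum(4 * ((i + 0.5) * h) ** (3 - k) for i in range(n))

epv = moment(1)                    # E(Var[X | lambda]) = E(lambda)
vhm = moment(2) - moment(1) ** 2   # Var(E[X | lambda]) = Var(lambda)
K = epv / vhm
print(epv, vhm, K)                 # 4/3 2/9 6
print(round(moment_numeric(2), 6)) # close to E(lambda^2) = 2
```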
Definition. If the Bayesian estimate equals the Buhlmann estimate, then we say that the Buhlmann
credibility estimate has exact credibility.
Note. In the above examples of conjugate distributions exact credibility occurs. We verify it for the
Gamma-Poisson conjugate:
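Example. Suppose {X1, ..., Xn} is an i.i.d. sample from Poisson(λ), where λ ∼ Gamma(α, β). Calculate
both the Bayesian and the Buhlmann estimates and verify exact credibility in this case.
Solution.
Step 1 (Buhlmann estimate).
$$X_i\sim\text{Poisson}(\lambda)\;\Rightarrow\;\mu(\lambda)=E[X_i\mid\lambda]=\lambda$$
$$\text{EPV}=E\big[\operatorname{Var}[X_i\mid\lambda]\big]=E[\lambda]=\alpha\beta$$
$$\text{VHM}=\operatorname{Var}\big[E[X_i\mid\lambda]\big]=\operatorname{Var}(\lambda)=\alpha\beta^2$$
$$k=\frac{\text{EPV}}{\text{VHM}}=\frac1\beta,\qquad z=\frac{n}{n+k}=\frac{n}{n+\frac1\beta}=\frac{n\beta}{n\beta+1}$$
$$\text{total expectation: }\mu=E\big[E(X_i\mid\lambda)\big]=E(\lambda)=\alpha\beta$$
$$\text{Buhlmann credibility estimate}=z\bar x+(1-z)\mu=\Big(\frac{n\beta}{n\beta+1}\Big)\bar x+\Big(\frac{1}{n\beta+1}\Big)\alpha\beta=\frac{\beta(n\bar x+\alpha)}{n\beta+1}$$
Step 2 (Bayesian estimate). The posterior distribution of λ is Gamma(α*, β*) with α* = α + nx̄ and β* = β/(1 + nβ), so
$$\text{Bayesian estimate}=E[\mu(\lambda)\mid\mathbf x]=E(\lambda\mid\mathbf x)=\alpha^*\beta^*=(\alpha+n\bar x)\Big(\frac{\beta}{n\beta+1}\Big)=\frac{\beta(n\bar x+\alpha)}{n\beta+1}$$
Buhlmann credibility estimate = Bayesian estimate ✓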
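Exact credibility can also be confirmed numerically for particular values. The sketch computes both estimates with exact fractions, assuming (as above) that β is the scale parameter of the Gamma prior; the prior parameters and claim counts are hypothetical.

```python
from fractions import Fraction

def buhlmann_gamma_poisson(alpha, beta, xs):
    """Buhlmann estimate z*xbar + (1 - z)*mu for the Gamma(alpha, scale beta)-Poisson model."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    epv, vhm, mu = alpha * beta, alpha * beta ** 2, alpha * beta
    k = epv / vhm                      # equals 1 / beta
    z = n / (n + k)
    return z * xbar + (1 - z) * mu

def bayes_gamma_poisson(alpha, beta, xs):
    """Posterior mean alpha* beta*, with alpha* = alpha + n*xbar and beta* = beta / (1 + n*beta)."""
    n = len(xs)
    return (alpha + sum(xs)) * (beta / (1 + n * beta))

alpha, beta = Fraction(3), Fraction(1, 2)   # hypothetical prior parameters
xs = [2, 0, 1, 4]                            # hypothetical claim counts
print(buhlmann_gamma_poisson(alpha, beta, xs), bayes_gamma_poisson(alpha, beta, xs))   # 5/3 5/3
```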
Exercise. Verify the exact credibility for each of the conjugate distributions:
Beta-Bernoulli
Beta-Binomial
Gamma-Exponential
Beta-Geometric
Buhlmann-Straub Model
The Buhlmann model is not suitable for group insurance, because it does not allow the number of insured
members of the group to change over time. For such cases we turn to the Buhlmann-Straub
model. In the Buhlmann-Straub model we assume that there are n policy years, that year t has m_t
exposures, and that X_t is the claim size, number of claims, or ... per unit of exposure during period t.
The loss (claim size or number of claims) "per unit of exposure" is used because the exposure can vary
over time and from risk to risk.
So, if the aggregate claim amount in year t is Y_t, then X_t = Y_t/m_t. Since X_t measures a quantity per
unit of exposure, the X_t's are no longer assumed to have the same distribution.
Risk   Periods of experience
1      X11   X12   ···   X1N1
       m11   m12   ···   m1N1
2      X21   X22   ···   ···   ···   X2N2
       m21   m22   ···   ···   ···   m2N2
⋮      ⋮
R      XR1   XR2   ···   ···   XRNR
       mR1   mR2   ···   ···   mRNR
The number of periods of experience can vary by risk, and the experience periods do not have to
start at the same time either.
Example (from the Dean’s lectures). ABC Insurance, Inc. sells dental insurance plans to companies
with fewer than one hundred employees. An actuary is analyzing the number of claims per employee.
Looking at the first company in her file, she sees that the company has three full years of plan coverage.
In the first year there were 40 employee-years with 84 claims, in the second year there were 44
employee-years with 88 claims, and in the third year there were 42 employee-years with 105 claims.
Designating this selected company as Risk 1, then:
X11 = 84 claims / 40 employee-years = 2.1 claims/employee-year
X12 = 88 claims / 44 employee-years = 2.0 claims/employee-year
X13 = 105 claims / 42 employee-years = 2.5 claims/employee-year
The exposures are m11 = 40 employee-years, m12 = 44 employee-years, and m13 = 42
employee-years. ■
Buhlmann-Straub Model for one policyholder when underlying probabilities are known
As in the Buhlmann model, we assume that the conditional random variables X1|θ, X2|θ, ... are
independent. A further assumption is that the process variances Var(Xt|θ) are inversely proportional
to the size (i.e., exposure) of the risk during each observation period; in other words, the product
$$\sigma^2(\theta):=m_t\operatorname{Var}(X_t\mid\theta)$$
is the same for all t.
We define the following quantities:
1. Hypothetical mean for risk θ per unit of exposure: $\mu(\theta)=E(X_1\mid\theta)=E(X_2\mid\theta)=\cdots$
2. Process variance for risk θ: $\operatorname{Var}(X_t\mid\theta)=\sigma^2(\theta)/m_t$ for t = 1, 2, ...
3. Population mean: $\mu=E[\mu(\theta)]=E\big[E[X_t\mid\theta]\big]$
4. Expected value of the process variance: $\text{EPV}=E[\sigma^2(\theta)]$
5. Variance of the hypothetical means: $\text{VHM}=\operatorname{Var}[\mu(\theta)]$
Example (Dean’s notes page 12). The annual numbers of claims for truck drivers in a homogeneous
population are independently and identically distributed. [The population might represent the work
force of a large trucking company with strict hiring standards and good safety training for each driver.]
For each driver the number of claims per year has a mean of µ(θ) and a variance of σ²(θ). (The θ
parameter applies to every driver in the group.)
A group of 10 drivers is selected from the larger population.
(1) What is the expected annual claims frequency for the group of 10 drivers?
(2) What is the variance of the annual claims frequency for the group?
Solution (from the Dean’s notes). Let X1t, X2t, ..., X10t be random variables representing the number
of claims in year t for each of the ten selected drivers. Then
$$X_t=\frac{1}{10}\sum_{i=1}^{10}X_{it}$$
is the annual claims frequency for the group; that is, it is the annual number of claims per driver. The exposure is m_t = 10
and the unit of exposure is one driver. The expected value and variance of the annual claims frequency
for the group are
$$E[X_t\mid\theta]=E\Big[\frac{1}{10}\sum_{i=1}^{10}X_{it}\;\Big|\;\theta\Big]=\frac{1}{10}\sum_{i=1}^{10}E[X_{it}\mid\theta]=\frac{1}{10}\sum_{i=1}^{10}\mu(\theta)=\mu(\theta)$$
$$\operatorname{Var}[X_t\mid\theta]=\operatorname{Var}\Big[\frac{1}{10}\sum_{i=1}^{10}X_{it}\;\Big|\;\theta\Big]=\frac{1}{(10)^2}\sum_{i=1}^{10}\operatorname{Var}[X_{it}\mid\theta]=\frac{1}{100}\sum_{i=1}^{10}\sigma^2(\theta)=\frac{\sigma^2(\theta)}{10}$$
In this example, the exposure is the number of drivers in the group, which is 10. The expected claims
frequency is the same whether there is one driver, 10 drivers, or 100 drivers in the group; however, the
variance of the group's claims frequency is inversely proportional to the number of drivers in the group.
■
In the Buhlmann-Straub model one seeks a point estimate of E[µ(θ) | X1 = x1, ..., XN = xN]. As
argued before, this conditional expectation equals the conditional expectation
E[XN+1 | X1 = x1, ..., XN = xN]:
$$E[\mu(\theta)\mid X_1=x_1,\ldots,X_N=x_N]=E[X_{N+1}\mid X_1=x_1,\ldots,X_N=x_N]$$
We set
$$\bar X=\sum_{t=1}^{N}\Big(\frac{m_t}{m}\Big)X_t,\qquad m=\sum_{t=1}^{N}m_t.$$
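Then
$$E(\bar X\mid\theta)=E\Big(\sum_{t=1}^{N}\Big(\frac{m_t}{m}\Big)X_t\;\Big|\;\theta\Big)=\sum_{t=1}^{N}\Big(\frac{m_t}{m}\Big)E(X_t\mid\theta)=\sum_{t=1}^{N}\Big(\frac{m_t}{m}\Big)\mu(\theta)=\mu(\theta)$$
$$\operatorname{Var}(\bar X\mid\theta)=\operatorname{Var}\Big(\sum_{t=1}^{N}\Big(\frac{m_t}{m}\Big)X_t\;\Big|\;\theta\Big)=\sum_{t=1}^{N}\Big(\frac{m_t}{m}\Big)^2\operatorname{Var}(X_t\mid\theta)=\sum_{t=1}^{N}\Big(\frac{m_t}{m}\Big)^2\frac{\sigma^2(\theta)}{m_t}=\frac{\sigma^2(\theta)}{m}$$
The unconditional mean and variance of X̄ are:
$$E[\bar X]=E\big[E[\bar X\mid\theta]\big]=E[\mu(\theta)]=\mu$$
$$\operatorname{Var}[\bar X]=\operatorname{Var}\big[E[\bar X\mid\theta]\big]+E\big[\operatorname{Var}[\bar X\mid\theta]\big]=\operatorname{Var}[\mu(\theta)]+\frac{E[\sigma^2(\theta)]}{m}=\text{VHM}+\frac{\text{EPV}}{m}$$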
In the Buhlmann-Straub model, the credibility assigned to X̄ (to estimate µ(θ)) is
$$Z=\frac{\text{Variance of the hypothetical means}}{\text{Total variance of the estimator}}=\frac{\text{VHM}}{\text{VHM}+\frac{\text{EPV}}{m}}.$$
Upon simplifying, we get
$$Z=\frac{m}{m+K},\qquad K=\frac{\text{EPV}}{\text{VHM}}.$$
The credibility estimate is
$$\hat\mu(\theta)=Z\cdot\bar X+(1-Z)\cdot\mu.$$
Note. The Buhlmann model is a special case of the Buhlmann-Straub model with m_t = 1 for all
t.
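A minimal sketch of the one-policyholder Buhlmann-Straub estimate when the structural parameters µ, EPV and VHM are known. The function name is hypothetical; the first usage applies it to the ABC Insurance frequencies and exposures together with made-up structural parameters (labelled as such), and the second confirms that setting every m_t = 1 gives back the Buhlmann factor Z = N/(N + K).

```python
def buhlmann_straub_one(xs, ms, mu, epv, vhm):
    """Return (credibility estimate, Z, exposure-weighted mean) for one policyholder.

    xs: per-exposure observations X_t; ms: exposures m_t.
    mu, epv, vhm: structural parameters, assumed known here.
    """
    m = sum(ms)
    xbar = sum(mt * xt for mt, xt in zip(ms, xs)) / m   # sum of (m_t / m) X_t
    k = epv / vhm
    z = m / (m + k)
    return z * xbar + (1 - z) * mu, z, xbar

# ABC Insurance data; mu, epv and vhm here are hypothetical values for illustration only.
estimate, z, xbar = buhlmann_straub_one([2.1, 2.0, 2.5], [40, 44, 42], mu=2.0, epv=2.0, vhm=0.05)
print(round(xbar, 4), round(z, 4), round(estimate, 4))

# With m_t = 1 for every t the factor reduces to the Buhlmann Z = N / (N + K).
_, z_unit, _ = buhlmann_straub_one([1, 2, 3], [1, 1, 1], mu=2.5, epv=6.0, vhm=1.0)
print(z_unit)   # 3 / (3 + 6), about 0.3333
```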
Buhlmann-Straub Model for more-than-one policyholder (nonparametric estimation)
Here we have r group policyholders, and for each group i we have n_i policy years; the starting years
may differ between groups. We adopt the following notation:
X_it = the average loss/claim (per exposure unit) for policyholder i in year t, and X_i = (X_i1, ..., X_in_i).
m_it = the number of exposure units for policyholder i in year t.
The total number of exposure units over all years for group i is
$$m_i=\sum_{t=1}^{n_i}m_{it}.$$
The total number of exposure units for all policyholders over all years is
$$m=\sum_{i=1}^{r}m_i.$$
The average loss experience of policyholder i over all years is
$$\bar X_i=\frac{1}{m_i}\sum_{t=1}^{n_i}m_{it}X_{it}.$$
The overall average loss is
$$\bar X=\frac{1}{m}\sum_{i=1}^{r}m_i\bar X_i.$$
Assumptions:
1. The random vectors {X1, ..., Xr} are mutually independent.
2. The distribution of each vector Xi depends on a risk parameter θi, and the random variables
{θ1, ..., θr} are i.i.d.
3. Within any group i, the variables X_i1|θi, ..., X_in_i|θi are independent.
Set
$$\mu(\theta_i)=E[X_{it}\mid\theta_i],$$
so, for each group, the hypothetical mean is constant over time. Here we have:
$$\sigma^2(\theta_i)=m_{it}\operatorname{Var}(X_{it}\mid\theta_i),\qquad \mu=E[\mu(\theta_i)],\qquad \text{EPV}=E[\sigma^2(\theta_i)],\qquad \text{VHM}=\operatorname{Var}[\mu(\theta_i)]$$
We are going to estimate these parameters , which are called structural parameters.
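Unbiased estimator of µ:
$$\hat\mu=\bar X$$
Unbiased estimator of σ²(θi):
$$\hat\sigma_i^2=\frac{1}{n_i-1}\sum_{t=1}^{n_i}m_{it}(X_{it}-\bar X_i)^2$$
Unbiased estimator of EPV:
$$\widehat{\text{EPV}}=\sum_{i=1}^{r}w_i\hat\sigma_i^2=\frac{1}{\sum_{i=1}^{r}(n_i-1)}\sum_{i=1}^{r}\sum_{t=1}^{n_i}m_{it}(X_{it}-\bar X_i)^2,\qquad w_i=\frac{n_i-1}{\sum_{j=1}^{r}(n_j-1)}$$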
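Unbiased estimator of VHM:
$$\widehat{\text{VHM}}=\frac{m}{m^2-\sum_{i=1}^{r}m_i^2}\Big\{\sum_{i=1}^{r}m_i(\bar X_i-\bar X)^2-(r-1)\,\widehat{\text{EPV}}\Big\}$$
If we set
$$\hat k=\frac{\widehat{\text{EPV}}}{\widehat{\text{VHM}}},\qquad \hat Z_i=\frac{m_i}{m_i+\hat k},$$
then the credibility estimate of the credibility premium
E[X_{i,n_i+1} | X_{i1} = x_{i1}, ..., X_{in_i} = x_{in_i}]
and of µ(θi) is
$$\hat Z_i\bar X_i+(1-\hat Z_i)\bar X.$$
The estimated premium for policyholder i is
$$m_{i,n_i+1}\big(\hat Z_i\bar X_i+(1-\hat Z_i)\bar X\big).$$
Procedure:
(i) Determine the number of periods n_i for each of the policyholders.
(ii) Determine the exposure measure m_it for each policyholder i during each period t.
(iii) Calculate the claim amounts per exposure unit x_it.
(iv) Calculate the average claim amount x̄_i for each policyholder over all periods.
(v) Calculate the estimate µ̂ = x̄.
(vi) Calculate the estimate $\widehat{\text{EPV}}=\sum_{i=1}^{r}w_i\hat\sigma_i^2$, with $w_i=\frac{n_i-1}{\sum_{j=1}^{r}(n_j-1)}$.
(vii) Calculate the estimate $\widehat{\text{VHM}}=\frac{m}{m^2-\sum m_i^2}\big\{\sum_{i=1}^{r}m_i(\bar x_i-\bar x)^2-(r-1)\widehat{\text{EPV}}\big\}$.
(viii) Calculate $\hat k=\widehat{\text{EPV}}/\widehat{\text{VHM}}$.
(ix) Calculate the credibility factors $\hat Z_i=\frac{m_i}{m_i+\hat k}$.
(x) Calculate the credibility estimate of the average claim amount per exposure unit for policyholder i:
$\hat Z_i\bar x_i+(1-\hat Z_i)\bar x$.
(xi) Calculate the estimated aggregate claim amount for (policyholder) group i:
$m_{i,n_i+1}\big(\hat Z_i\bar x_i+(1-\hat Z_i)\bar x\big)$.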
Example. The aggregate claim amounts for two groups over three years are given in the following table,
together with the group sizes (including the size in year 4):

Group   Item               Year 1    Year 2    Year 3    Year 4
1       Aggregate claim    8,000     11,000    15,000    ?
1       Size of group      40        50        70        75
2       Aggregate claim    20,000    24,000    19,000    ?
2       Size of group      100       120       115       95

Estimate the aggregate claim amount to be observed during the fourth year for each group.
Solution.
Group 1. Exposure measures:
$$m_{11}=40,\quad m_{12}=50,\quad m_{13}=70,\qquad m_1=40+50+70=160$$
Average claim amounts:
$$x_{11}=\frac{8{,}000}{40}=200,\qquad x_{12}=\frac{11{,}000}{50}=220,\qquad x_{13}=\frac{15{,}000}{70}=214.29$$
$$\bar x_1=\frac{8{,}000+11{,}000+15{,}000}{160}=212.50$$
Group 2. Exposure measures:
$$m_{21}=100,\quad m_{22}=120,\quad m_{23}=115,\qquad m_2=100+120+115=335$$
Average claim amounts:
$$x_{21}=\frac{20{,}000}{100}=200,\qquad x_{22}=\frac{24{,}000}{120}=200,\qquad x_{23}=\frac{19{,}000}{115}=165.22$$
$$\bar x_2=\frac{20{,}000+24{,}000+19{,}000}{335}=188.06$$
Overall exposure units for the first three years:
$$m=m_1+m_2=160+335=495$$
Estimate of the overall mean:
$$\hat\mu=\bar x=\frac{m_1\bar x_1+m_2\bar x_2}{m}=\frac{(160)(212.50)+(335)(188.06)}{495}=195.96$$
Estimate of the EPV:
$$\widehat{\text{EPV}}=\frac{\sum_{i=1}^{2}\sum_{j=1}^{3}m_{ij}(x_{ij}-\bar x_i)^2}{\sum_{i=1}^{2}(3-1)}=\frac{40(200-212.5)^2+50(220-212.5)^2+70(214.29-212.5)^2+100(200-188.06)^2+\cdots}{2+2}=25160.58$$
$$\sum_{i=1}^{r}m_i(\bar x_i-\bar x)^2=(160)(212.5-195.96)^2+(335)(188.06-195.96)^2=64678.806$$
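$$\frac{m}{m^2-\sum m_i^2}=\frac{1}{m-\frac1m\sum m_i^2}=\frac{1}{495-\frac{1}{495}\big\{(160)^2+(335)^2\big\}}=0.0046175$$
$$\widehat{\text{VHM}}=\frac{m}{m^2-\sum m_i^2}\Big\{\sum_{i=1}^{r}m_i(\bar x_i-\bar x)^2-(r-1)\,\widehat{\text{EPV}}\Big\}=0.0046175\,\big\{64678.806-(1)(25160.58)\big\}=182.48$$
$$\hat k=\frac{\widehat{\text{EPV}}}{\widehat{\text{VHM}}}=\frac{25160.58}{182.48}=137.88$$
The credibility factors for the two policyholders:
$$\hat Z_1=\frac{m_1}{m_1+\hat k}=\frac{160}{160+137.88}=0.537,\qquad \hat Z_2=\frac{m_2}{m_2+\hat k}=\frac{335}{335+137.88}=0.708$$
Buhlmann-Straub estimates of the average claim amounts per exposure unit:
$$\hat Z_1\bar x_1+(1-\hat Z_1)\bar x=(0.537)(212.50)+(0.463)(195.96)=204.84$$
$$\hat Z_2\bar x_2+(1-\hat Z_2)\bar x=(0.708)(188.06)+(0.292)(195.96)=190.37$$
The aggregate claim amounts for the two groups:
$$m_{1,4}\big(\hat Z_1\bar x_1+(1-\hat Z_1)\bar x\big)=(75)(204.84)=15363.00$$
$$m_{2,4}\big(\hat Z_2\bar x_2+(1-\hat Z_2)\bar x\big)=(95)(190.37)=18085.15$$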
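The nonparametric procedure (steps (v) to (xi)) can be collected into one function. This is a sketch under the notes' assumptions; the function name, input layout and the fallback for a non-positive VHM estimate (Z_i = 0) are illustrative choices.

```python
def buhlmann_straub_nonparametric(groups):
    """Nonparametric Buhlmann-Straub estimates.

    groups: one (x_list, m_list) pair per policyholder, where x_list holds the
    per-exposure values x_it and m_list the exposures m_it (n_i may differ by group).
    Returns (mu_hat, epv_hat, vhm_hat, k_hat, z_list, xbar_list).
    """
    r = len(groups)
    m_i = [sum(ms) for _, ms in groups]
    xbar_i = [sum(mt * xt for xt, mt in zip(xs, ms)) / mi
              for (xs, ms), mi in zip(groups, m_i)]
    m = sum(m_i)
    mu = sum(mi * xb for mi, xb in zip(m_i, xbar_i)) / m
    within = sum(mt * (xt - xb) ** 2
                 for (xs, ms), xb in zip(groups, xbar_i)
                 for xt, mt in zip(xs, ms))
    epv = within / sum(len(xs) - 1 for xs, _ in groups)
    between = sum(mi * (xb - mu) ** 2 for mi, xb in zip(m_i, xbar_i))
    vhm = m / (m ** 2 - sum(mi ** 2 for mi in m_i)) * (between - (r - 1) * epv)
    if vhm <= 0:                      # negative estimate: set VHM = 0, so every Z_i = 0
        return mu, epv, 0.0, float("inf"), [0.0] * r, xbar_i
    k = epv / vhm
    return mu, epv, vhm, k, [mi / (mi + k) for mi in m_i], xbar_i

claims = [[8000, 11000, 15000], [20000, 24000, 19000]]
sizes = [[40, 50, 70], [100, 120, 115]]
groups = [([c / s for c, s in zip(cs, ss)], ss) for cs, ss in zip(claims, sizes)]
mu, epv, vhm, k, z, xbar_i = buhlmann_straub_nonparametric(groups)
next_sizes = [75, 95]                 # group sizes in year 4
rates = [zi * xb + (1 - zi) * mu for zi, xb in zip(z, xbar_i)]
for i, (zi, rate, size) in enumerate(zip(z, rates, next_sizes), start=1):
    print(i, round(zi, 3), round(rate, 2), round(size * rate, 2))
```

Because it keeps full precision, the sketch gives EPV ≈ 25,164 rather than the hand-rounded 25,160.58, and aggregate predictions that agree with the values above to within about one unit. Groups may have different numbers of years, so the same function also accepts data such as the next example, where group 1 has only two years.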
Example. The aggregate claim amounts for two groups are given in the following table (no data are
available for group 1 in year 1):

Group   Item               Year 1    Year 2    Year 3    Year 4
1       Aggregate claim    n/a       11,000    15,000    ?
1       Size of group      n/a       50        70        75
2       Aggregate claim    20,000    24,000    19,000    ?
2       Size of group      100       120       115       95

Estimate the aggregate claim amount to be observed during the fourth year for each group.
Solution.
Note that no data are available for policyholder 1 in the first year, so n1 = 2 and the calculations start
like this:
$$m_{11}=50,\quad m_{12}=70,\qquad m_1=50+70=120$$
Average claim amounts:
$$x_{11}=\frac{11{,}000}{50}=220,\qquad x_{12}=\frac{15{,}000}{70}=214.29$$
$$\bar x_1=\frac{11{,}000+15{,}000}{120}=216.67$$
The remaining steps are left to the students.
Note. In some situations the estimate VHM̂ may be ≤ 0. In that case we set VHM̂ = 0, which results in
k̂ = +∞ and hence Z = 0.
Example (from the Dean’s notes, page 20). Two risks were selected at random from a population.
Risk 1 had 0 claims in year one, 3 claims in year two, and 0 claims in year three: (0, 3, 0). The claims
by year for Risk 2 were (2, 1, 2). In this case, R = 2 and N = 3. Use the Buhlmann model to estimate
the expected number of claims per year for each risk in the fourth year.
Solution.
$$\bar x_1=\frac{0+3+0}{3}=1,\qquad \bar x_2=\frac{2+1+2}{3}=\frac53,\qquad \bar x=\frac{1+\frac53}{2}=\frac43$$
$$\hat\sigma_1^2=\frac{(0-1)^2+(3-1)^2+(0-1)^2}{3-1}=3,\qquad \hat\sigma_2^2=\frac{(2-\frac53)^2+(1-\frac53)^2+(2-\frac53)^2}{3-1}=\frac13$$
$$\widehat{\text{EPV}}=\frac{\hat\sigma_1^2+\hat\sigma_2^2}{2}=\frac{3+\frac13}{2}=\frac53$$
$$\widehat{\text{VHM}}=\frac{1}{2-1}\Big\{\Big(1-\frac43\Big)^2+\Big(\frac53-\frac43\Big)^2\Big\}-\frac{5/3}{3}=-\frac13$$
This happens to be negative, so we set it to zero. Then
$$Z=0,$$
and the estimate for each risk is the overall mean x̄ = 4/3.
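The same steps with exact fractions, for risks with equal numbers of years and unit exposures. The function name is hypothetical; a negative VHM estimate is set to zero as described in the note above.

```python
from fractions import Fraction as F

def buhlmann_nonparametric(data):
    """data: R rows of N observations each (unit exposures). Returns EPV, raw VHM, Z, estimates."""
    R, N = len(data), len(data[0])
    means = [F(sum(row), N) for row in data]
    xbar = sum(means) / R
    epv = sum(sum((x - mi) ** 2 for x in row) / (N - 1) for row, mi in zip(data, means)) / R
    vhm_raw = sum((mi - xbar) ** 2 for mi in means) / (R - 1) - epv / N
    vhm = max(vhm_raw, F(0))                # a negative estimate is set to zero
    z = F(0) if vhm == 0 else F(N) / (N + epv / vhm)
    estimates = [z * mi + (1 - z) * xbar for mi in means]
    return epv, vhm_raw, z, estimates

epv, vhm_raw, z, est = buhlmann_nonparametric([[0, 3, 0], [2, 1, 2]])
print(epv, vhm_raw, z, est)   # 5/3 -1/3 0 [Fraction(4, 3), Fraction(4, 3)]
```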
Semiparametric estimation
It may be possible to have information about the conditional distribution f_{X_ij|Θ_i}(x|θ_i) of the loss
variables. For example, in our study X_ij may be the number of claims per exposure unit; the number of
claims for policyholder i is then m_ij X_ij, and this might be distributed as Poisson(m_ij θ_i). Then
$$E[m_{ij}X_{ij}\mid\theta_i]=m_{ij}\theta_i,\qquad \operatorname{Var}[m_{ij}X_{ij}\mid\theta_i]=m_{ij}\theta_i$$
$$\Rightarrow\quad \mu(\theta_i)=E[X_{ij}\mid\theta_i]=\theta_i,\qquad \sigma^2(\theta_i)=m_{ij}\operatorname{Var}[X_{ij}\mid\theta_i]=\theta_i\quad\xrightarrow{\text{take expectations}}\quad \mu=\text{EPV}$$
Because of this equality, and because X̄ is the MLE of µ and unbiased for it, we approximate EPV by
X̄.
Special case of Buhlmann credibility (uniform exposures)
In the special case of the Buhlmann model (uniform exposures), the assumption is that {X1, ..., Xn} are
i.i.d., with each Xi conditionally Poisson(θ) given its risk parameter θ. As above, µ = EPV, so we use x̄ as
the estimate of EPV:
$$\widehat{\text{EPV}}=\bar x$$
Furthermore, by the law of total variance,
$$\operatorname{Var}(X_i)=\text{EPV}+\text{VHM}\;\Rightarrow\;\text{VHM}=\operatorname{Var}(X_i)-\text{EPV}\approx s^2-\bar x,\qquad s^2=\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar X)^2$$
$$\widehat{\text{VHM}}=s^2-\bar x$$
Once EPV̂ and VHM̂ have been calculated, we calculate k̂ and ẑ. The semiparametric estimate is then
(bearing in mind that µ̂ = x̄)
ẑ (average of the values used for prediction) + (1 − ẑ)(average of the values used to estimate the structural parameters)
Example (SOA sample question #240). For a group of auto policyholders, you are given:
(i) The number of claims for each policyholder has a conditional Poisson distribution.
(ii) During Year 1, the following data are observed for 8000 policyholders:

Number of claims    Number of policyholders
0                   5000
1                   2100
2                   750
3                   100
4                   50
5+                  0
A randomly selected policyholder had one claim in Year 1.
Calculate the semiparametric empirical Bayes estimate of the number of claims in Year 2 for the same
policyholder.
Solution. Here we have {X1, ..., X8000} with the conditional distribution Xi|θ ∼ Poisson(θ). We
want to estimate E[X_new | X_old = 1]. The prior distribution is not given.
$$\widehat{\text{EPV}}=\bar x=\frac{(5000)(0)+(2100)(1)+(750)(2)+(100)(3)+(50)(4)}{8000}=0.5125$$
$$s^2=\frac{(5000)(0-0.5125)^2+(2100)(1-0.5125)^2+(750)(2-0.5125)^2+(100)(3-0.5125)^2+(50)(4-0.5125)^2}{7999}=0.5874$$
$$\widehat{\text{VHM}}=s^2-\bar x=0.5874-0.5125=0.0749$$
$$\hat k=\frac{\widehat{\text{EPV}}}{\widehat{\text{VHM}}}=\frac{0.5125}{0.0749}=6.8425$$
The prediction is based on N = 1 observation:
$$\hat z=\frac{N}{N+\hat k}=\frac{1}{1+6.8425}=0.1275$$
$$\text{Estimate}=\hat z(1)+(1-\hat z)(0.5125)=(0.1275)(1)+(1-0.1275)(0.5125)=0.5747$$
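The calculation can be scripted directly from the frequency table. This sketch uses the n − 1 divisor for s², matching the definition above; the variable names are illustrative.

```python
counts = {0: 5000, 1: 2100, 2: 750, 3: 100, 4: 50}   # number of claims -> number of policyholders
n = sum(counts.values())
xbar = sum(k * c for k, c in counts.items()) / n
s2 = sum(c * (k - xbar) ** 2 for k, c in counts.items()) / (n - 1)

epv = xbar                 # Poisson assumption: mu = EPV, estimated by xbar
vhm = s2 - xbar            # law of total variance
k_hat = epv / vhm
z = 1 / (1 + k_hat)        # one year of experience for the selected policyholder
estimate = z * 1 + (1 - z) * xbar
print(round(xbar, 4), round(s2, 4), round(estimate, 4))
```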
Talk about Regression on page 21
Limited Fluctuation Credibility
Limited fluctuation credibility (also called the classical approach):
update the prediction of the loss as a weighted average of the prediction based on recent data and the rate
taken from the insurance manual.
Full credibility: the updated prediction is based on recent data only.
Partial credibility: the recent data alone are insufficient, so the updated prediction combines the recent data with the manual rate.
We apply credibility theory to these measures:
• (i) Claim frequency N.
• (ii) Aggregate loss S.
• (iii) Claim severity: the average claim severity is S/N.
• (iv) Pure premium: if E denotes the number of exposure units, then the quotient S/E is called the
pure premium.
Note. The claim frequency N is random, but the number of exposure units E is fixed over time (like the
number of workers covered by a workers' compensation plan).
Note. If the predicted loss based on the company's manual is denoted by M, and the predicted
value based on the recent data is denoted by D, then the updated prediction is the weighted
combination
$$ZD+(1-Z)M.$$
The value Z is called the credibility factor. If Z = 1, we say that full credibility has been
attained. If 0 < Z < 1, partial credibility has been attained.
In the classical credibility approach, the minimum size of data required for full credibility is called the
standard for full credibility.
Full credibility for claim frequency
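Convention. In classical credibility, we say that full credibility has been reached if there is a large
probability p (large enough to give sufficient confidence) that the relative error (N − E(N))/E(N) is small in
absolute value, that is, |(N − E(N))/E(N)| < r for a small number r, with probability at least p:
$$p\le P\Big(\Big|\frac{N-E(N)}{E(N)}\Big|<r\Big)=P\big(|N-E(N)|<rE(N)\big)=P\Big(\Big|\frac{N-E(N)}{s(N)}\Big|<\frac{rE(N)}{s(N)}\Big)\approx P\Big(|N(0,1)|<\frac{r\mu}{\sigma}\Big)$$
where µ = E(N), σ = s(N) is the standard deviation of N, and the last step uses the normal approximation. Hence
$$z_{\frac{1+p}{2}}\le\frac{r\mu}{\sigma},$$
where z_{(1+p)/2} is the (1+p)/2 quantile of the standard normal distribution. In particular, if N ∼ Poisson(λ), then
$$z_{\frac{1+p}{2}}\le\frac{r\lambda}{\sqrt\lambda}=r\sqrt\lambda.$$
Therefore, full credibility is attained if
$$\lambda\ge\Big(\frac{z_{\frac{1+p}{2}}}{r}\Big)^2.$$
In practice, if n is the observed number of claims under the Poisson assumption, then for full
credibility we check whether
$$n\ge\Big(\frac{z_{\frac{1+p}{2}}}{r}\Big)^2.$$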
Example. An insurance company wants to assign full credibility to 800 claims or more. What
coverage probability is required for the number of claims to be within 8% of the true value? Assume that
the claim frequency is Poisson and that the normal approximation applies (i.e. λ is large).
Solution.
$$800=\Big(\frac{z_{\frac{1+p}{2}}}{0.08}\Big)^2\;\Rightarrow\;z_{\frac{1+p}{2}}=0.08\sqrt{800}=2.2627\;\Rightarrow\;p=97.63\%$$
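The full-credibility standard for a Poisson claim frequency and its inverse can be computed with the standard library's `statistics.NormalDist` (Python 3.8+). A minimal sketch; the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def full_credibility_standard(p, r):
    """Expected claim count needed for full credibility of a Poisson frequency: (z_{(1+p)/2} / r)^2."""
    z = NormalDist().inv_cdf((1 + p) / 2)
    return (z / r) ** 2

def coverage_probability(standard, r):
    """Invert the standard: z_{(1+p)/2} = r * sqrt(standard), so p = 2 * Phi(z) - 1."""
    return 2 * NormalDist().cdf(r * sqrt(standard)) - 1

p = coverage_probability(800, 0.08)
print(round(p, 4))                                   # about 0.9763
print(round(full_credibility_standard(p, 0.08), 6))  # recovers 800
```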
Example. Recent experience has given a mean accident rate of 0.045 (per exposure unit) and a standard for full
credibility of 1200 claims. For a group with similar risk, what is the number of exposure units needed for
full credibility?
Solution. The standard for full credibility in terms of exposure units is
$$\frac{1200}{0.045}=26{,}667\ \text{exposure units}.$$
Full credibility for claim severity
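Here we assume an i.i.d. sample {X1, ..., Xn} of severity random variables with mean µ and variance σ². In
this case, we say that full credibility is attained if
$$p\le P\Big(\Big|\frac{\bar X-E(\bar X)}{E(\bar X)}\Big|<r\Big).$$
But E(X̄) = µ and s(X̄) = σ/√n. So we can write
$$p\le P\Big(\Big|\frac{\bar X-E(\bar X)}{E(\bar X)}\Big|<r\Big)=P\big(|\bar X-E(\bar X)|<rE(\bar X)\big)=P\Big(\Big|\frac{\bar X-E(\bar X)}{s(\bar X)}\Big|<\frac{rE(\bar X)}{s(\bar X)}\Big)\approx P\Big(|N(0,1)|<\sqrt n\,\frac{r\mu}{\sigma}\Big)$$
$$\Rightarrow\;z_{\frac{1+p}{2}}\le\sqrt n\,\frac{r\mu}{\sigma}\;\Rightarrow\;n\ge\Big(\frac{z_{\frac{1+p}{2}}}{r}\Big)^2\Big(\frac{\sigma}{\mu}\Big)^2$$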
Example. Suppose that the estimates for the mean and variance of the severity are 1000 and 2,000,000,
respectively. Find the standard for full credibility for p = 0.99 and r = 0.05.
Solution.
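$$z_{\frac{1+p}{2}} = z_{0.995} = 2.5758$$
$$\text{standard} = \left(\frac{z_{\frac{1+p}{2}}}{r}\right)^2 \left(\frac{\sigma}{\mu}\right)^2 = \left(\frac{2.5758}{0.05}\right)^2 \frac{\sigma^2}{\mu^2} = \left(\frac{2.5758}{0.05}\right)^2 \frac{2{,}000{,}000}{(1000)^2} = 5308$$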
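The severity standard $(z_{\frac{1+p}{2}}/r)^2(\sigma/\mu)^2$ can be checked with a short Python sketch that reproduces the example. The function name is hypothetical; the quantile comes from the standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

def full_credibility_standard_severity(p: float, r: float, mean: float, variance: float) -> float:
    """Number of claims n with n >= (z_{(1+p)/2} / r)^2 * (sigma / mu)^2."""
    z = NormalDist().inv_cdf((1 + p) / 2)
    return (z / r) ** 2 * variance / mean ** 2  # (sigma/mu)^2 = variance / mean^2

standard = full_credibility_standard_severity(0.99, 0.05, 1000, 2_000_000)
print(round(standard))  # 5308
```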
Full credibility for aggregate loss
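Here we have $S = X_1 + \cdots + X_N$, where the $X_i$'s are independent of $N$ and have common mean $\mu_X$ and
common variance $\sigma_X^2$. If $N$ is Poisson($\lambda$), then
$$\mu_S = \mu_N \mu_X = \lambda \mu_X, \qquad \sigma_S^2 = \lambda\left(\mu_X^2 + \sigma_X^2\right).$$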
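The criterion for full credibility is:
$$p \le P\left(\left|\frac{S-E(S)}{E(S)}\right| < r\right)$$
Similar to the previous section, one gets:
$$p \le P\left(\left|\frac{S-E(S)}{E(S)}\right| < r\right) = P\big(|S-E(S)| < rE(S)\big) = P\left(\left|\frac{S-E(S)}{s(S)}\right| < \frac{rE(S)}{s(S)}\right)$$
$$= P\left(|N(0,1)| < \frac{r\mu_S}{\sigma_S}\right) = P\left(|N(0,1)| < \frac{r\lambda\mu_X}{\sqrt{\lambda\left(\mu_X^2+\sigma_X^2\right)}}\right)$$
$$\Rightarrow\quad z_{\frac{1+p}{2}} \le \frac{r\lambda\mu_X}{\sqrt{\lambda\left(\mu_X^2+\sigma_X^2\right)}} \quad\Rightarrow\quad \lambda \ge \left(\frac{z_{\frac{1+p}{2}}}{r}\right)^2 \left\{1 + \left(\frac{\sigma_X}{\mu_X}\right)^2\right\}$$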
In practice, if $n$ is the observed number of claims under the Poisson assumption, then for full
credibility we check whether
$$n \ge \left(\frac{z_{\frac{1+p}{2}}}{r}\right)^2 \left\{1 + \left(\frac{\sigma_X}{\mu_X}\right)^2\right\}.$$
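Note. This expression on the right-hand side is:
$$\left(\frac{z_{\frac{1+p}{2}}}{r}\right)^2 \left\{1 + \left(\frac{\sigma_X}{\mu_X}\right)^2\right\} = \left(\frac{z_{\frac{1+p}{2}}}{r}\right)^2 + \left(\frac{z_{\frac{1+p}{2}}}{r}\right)^2 \left(\frac{\sigma_X}{\mu_X}\right)^2$$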
So for a Poisson claim count distribution we have:
standard for full credibility of aggregate loss =
standard for full credibility of claim frequency + standard for full credibility of claim severity
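The additivity of the standards can be confirmed numerically. The sketch below computes all three standards from p, r and the severity coefficient of variation; the function name `standards` is hypothetical, and the inputs reuse the severity example above (p = 0.99, r = 0.05, $\sigma^2/\mu^2 = 2$).

```python
import math
from statistics import NormalDist

def standards(p: float, r: float, cv: float) -> tuple[float, float, float]:
    """Full-credibility standards (frequency, severity, aggregate) under Poisson claim counts."""
    n0 = (NormalDist().inv_cdf((1 + p) / 2) / r) ** 2  # claim-frequency standard
    frequency = n0
    severity = n0 * cv ** 2                             # times (sigma/mu)^2
    aggregate = n0 * (1 + cv ** 2)                      # times {1 + (sigma/mu)^2}
    return frequency, severity, aggregate

freq, sev, agg = standards(p=0.99, r=0.05, cv=math.sqrt(2))
print(round(freq), round(sev), round(agg))  # aggregate equals frequency + severity
```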
Full credibility for pure premium
Pure premium $P = \frac{S}{E}$, where the number of exposure units $E$ is a constant, is the premium charged to
cover losses before taking profits and expenses into consideration. Since $P$ and $S$ differ only by a
constant factor, we have $\frac{\mu_P}{\sigma_P} = \frac{\mu_S}{\sigma_S}$. Therefore, in the expression
$$p \le P\left(\left|\frac{S-E(S)}{s(S)}\right| < r\,\frac{E(S)}{s(S)}\right)$$
one can substitute $\frac{\mu_P}{\sigma_P}$ for $\frac{\mu_S}{\sigma_S}$, and the standard for full credibility of the pure premium is then the
same as that of the aggregate loss. See problem 2 of the SOA sample questions.
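To see why dividing by the constant $E$ leaves the standard unchanged, the sketch below computes $\mu_S/\sigma_S$ and $\mu_P/\sigma_P$ for a compound Poisson loss. All numbers are hypothetical, chosen only for illustration; the point is that both the mean and the standard deviation scale by $1/E$.

```python
import math

# Hypothetical compound Poisson inputs (not from the notes)
lam, mu_x, sigma_x = 1000, 500.0, 1000.0
exposures = 2500  # E, the number of exposure units, treated as a constant

mu_s = lam * mu_x                                     # mu_S = lambda * mu_X
sigma_s = math.sqrt(lam * (mu_x ** 2 + sigma_x ** 2))  # sigma_S^2 = lambda (mu_X^2 + sigma_X^2)

# Pure premium P = S / E: mean and standard deviation both scale by 1 / E
mu_p, sigma_p = mu_s / exposures, sigma_s / exposures

print(mu_s / sigma_s, mu_p / sigma_p)  # the two ratios agree
```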
Partial Credibility
Assume that $W$ is any of the loss measures: claim frequency, claim severity, or aggregate loss. When
the risk group is not large enough, full credibility may not be achieved, in which case the combination
$ZW + (1-Z)M$ is taken, where $0 < Z < 1$. The number $Z$ is determined such that the event
$|ZW - ZE(W)| \le rE(W)$ occurs with probability $p$. For example, for the claim frequency $N$ we
want
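$$p = P\big(|ZN - ZE(N)| < rE(N)\big) = P\left(\left|\frac{N-E(N)}{s(N)}\right| < \frac{rE(N)}{Z\,s(N)}\right) = P\left(|N(0,1)| < \frac{r\mu}{Z\sigma}\right)$$
$$\Rightarrow\quad z_{\frac{1+p}{2}} = \frac{r\mu}{Z\sigma} = \frac{r\lambda}{Z\sqrt{\lambda}} = \frac{r\sqrt{\lambda}}{Z} \quad\Rightarrow\quad Z = \left(\frac{r}{z_{\frac{1+p}{2}}}\right)\sqrt{\lambda}$$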
Summarizing all three cases (in each, Z is the square root of the quantity being compared with the
standard for full credibility, divided by that standard):
claim frequency: $$Z = \left(\frac{r}{z_{\frac{1+p}{2}}}\right)\sqrt{\lambda} = \sqrt{\frac{\lambda}{\text{standard}}}$$
claim severity: $$Z = \left(\frac{r}{z_{\frac{1+p}{2}}}\right)\sqrt{\frac{N}{C_X^2}} = \sqrt{\frac{N}{\text{standard}}}$$
aggregate loss: $$Z = \left(\frac{r}{z_{\frac{1+p}{2}}}\right)\sqrt{\frac{\lambda}{1+C_X^2}} = \sqrt{\frac{\lambda}{\text{standard}}}$$
where $C_X = \sigma_X/\mu_X$ is the coefficient of variation of the claim severity.
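The three formulas share one pattern, $Z = \sqrt{\text{observed}/\text{standard}}$, which the following Python sketch implements. The function `credibility_factor` and its `measure` argument are hypothetical names; capping Z at 1 (so that Z = 1 signals full credibility) is the usual convention and is assumed here. The usage lines check, for illustrative inputs, that the ratio form agrees with $Z = (r/z_{\frac{1+p}{2}})\sqrt{\lambda}$.

```python
import math
from statistics import NormalDist

def credibility_factor(observed: float, p: float, r: float,
                       measure: str = "frequency", cv: float = 0.0) -> float:
    """Limited-fluctuation Z = sqrt(observed / standard), capped at 1 (full credibility)."""
    n0 = (NormalDist().inv_cdf((1 + p) / 2) / r) ** 2  # claim-frequency standard
    if measure == "frequency":
        standard = n0
    elif measure == "severity":
        standard = n0 * cv ** 2
    elif measure == "aggregate":
        standard = n0 * (1 + cv ** 2)
    else:
        raise ValueError(f"unknown measure: {measure}")
    return min(1.0, math.sqrt(observed / standard))

# Illustrative check for claim frequency (inputs are not from the notes)
lam, p, r = 500, 0.90, 0.05
z = NormalDist().inv_cdf((1 + p) / 2)
cred_direct = (r / z) * math.sqrt(lam)        # Z = (r / z_{(1+p)/2}) * sqrt(lambda)
cred_ratio = credibility_factor(lam, p, r)   # Z = sqrt(lambda / standard)
print(round(cred_direct, 4), round(cred_ratio, 4))
```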
Example (exercise 17.7 of the textbook ∗). The average claim size for a group of insureds is 1,500
with a standard deviation of 7,500. Assume that claim counts have the Poisson distribution. Determine
the expected number of claims so that the total loss will be within 6% of the expected total loss with
probability 0.90.
Solution.
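$$z_{\frac{1+p}{2}} = z_{0.95} = 1.645$$
$$\left(\frac{z_{\frac{1+p}{2}}}{r}\right)^2 \left\{1 + \left(\frac{\sigma}{\mu}\right)^2\right\} = \left(\frac{1.645}{0.06}\right)^2 \left\{1 + \left(\frac{7500}{1500}\right)^2\right\} = 19543.51 \;\Rightarrow\; 19544 \text{ claims}$$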
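The arithmetic of this exercise can be reproduced in a few lines of Python. This sketch keeps the rounded table value 1.645 used in the solution rather than an exact quantile, and rounds the result up to a whole number of claims.

```python
import math

z = 1.645            # z_{0.95}, rounded as in the solution
r = 0.06             # within 6% of the expected total loss
mean, sd = 1500, 7500

standard = (z / r) ** 2 * (1 + (sd / mean) ** 2)
print(round(standard, 2), math.ceil(standard))  # 19543.51 19544
```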
Example (exercise 17.8 of the textbook ∗). A group of insureds had 6,000 claims and a total loss of
15,600,000. The prior estimate of the total loss was 16,500,000. Determine the limited fluctuation
credibility estimate of the total loss for the group. Use the standard for full credibility determined in
the previous example.
Solution.
$$Z = \sqrt{\frac{\lambda}{\text{standard}}} = \sqrt{\frac{6000}{19543.51}} = 0.55408$$
$$ZW + (1-Z)M = (0.55408)(15{,}600{,}000) + (1 - 0.55408)(16{,}500{,}000) = 16{,}001{,}328$$
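A short Python sketch of the credibility-weighted estimate $ZW + (1-Z)M$, taking the standard 19543.51 from the previous exercise. Because Z is not rounded to five decimals here, the result agrees with the hand calculation only to within a few units.

```python
import math

standard = 19543.51                        # full-credibility standard from exercise 17.7
claims = 6000                              # observed number of claims
observed, prior = 15_600_000, 16_500_000   # W (observed total loss) and M (prior estimate)

Z = min(1.0, math.sqrt(claims / standard))  # partial credibility factor
estimate = Z * observed + (1 - Z) * prior   # Z W + (1 - Z) M
print(round(Z, 5), round(estimate))
```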
Example. A portfolio of policies has 896 claims in the current period, with mean loss 45 and
variance 5067. Full credibility is based on a coverage probability of 98% for a range of within
10% of the true mean. The mean frequency of claims is 0.09 per policy, and the portfolio has 18600
policies. Calculate Z for the claim frequency, claim severity, and aggregate loss.
Solution.
Part 1.
Expected claim frequency for the portfolio $= (18600)(0.09) = 1674$
$$z_{\frac{1+p}{2}} = z_{0.99} = 2.3263$$
Full credibility standard for claim frequency:
$$\left(\frac{z_{\frac{1+p}{2}}}{r}\right)^2 = \left(\frac{2.3263}{0.1}\right)^2 = 541.17 < \lambda = 1674$$
⇒ full credibility for claim frequency ✓
Part 2.
Coefficient of variation for claim severity:
$$C_X = \frac{\sqrt{5067}}{45} = 1.5818$$
Full credibility standard for claim severity:
$$\text{standard for claim frequency} \times C_X^2 = (541.17)(1.5818)^2 = 1354.13 > 896$$
⇒ partial credibility for claim severity ✓
Now we calculate the partial credibility factor for claim severity:
$$Z = \sqrt{\frac{896}{1354.13}} = 0.8134$$
Part 3.
The full credibility standard for aggregate loss is the sum of the two standards found above:
$$541.17 + 1354.13 = 1895.30 > \lambda = 1674$$
⇒ partial credibility for aggregate loss
Partial credibility factor for aggregate loss:
$$Z = \sqrt{\frac{1674}{1895.30}} = 0.9398$$
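All three parts of this example can be checked together. The sketch keeps the rounded quantile 2.3263 from the solution; `z_factor` is a hypothetical helper that caps Z at 1, which is how Part 1 reaches full credibility.

```python
import math

z = 2.3263      # z_{0.99}, as in the solution
r = 0.10
n0 = (z / r) ** 2                 # claim-frequency standard, about 541.17

lam = 18600 * 0.09                # expected number of claims, 1674
observed_claims = 896
cv2 = 5067 / 45 ** 2              # squared coefficient of variation of severity

def z_factor(exposure: float, standard: float) -> float:
    """Z = sqrt(exposure / standard), capped at 1 for full credibility."""
    return min(1.0, math.sqrt(exposure / standard))

cred_freq = z_factor(lam, n0)                     # Part 1
cred_sev = z_factor(observed_claims, n0 * cv2)    # Part 2
cred_agg = z_factor(lam, n0 * (1 + cv2))          # Part 3
print(cred_freq, round(cred_sev, 4), round(cred_agg, 4))
```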
Example (exercise 17.13 of the textbook ∗). The number of claims has the Poisson distribution. The
number of claims and the claim severity are independent. Individual claim amounts can be 1, 2, or
10 with probabilities 0.5, 0.3, and 0.2, respectively. Determine the expected number of claims needed
so that the total cost of claims is within 10% of the expected cost with 90% probability.
Solution.
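$$E(X) = (0.5)(1) + (0.3)(2) + (0.2)(10) = 3.1$$
$$E(X^2) = (0.5)(1)^2 + (0.3)(2)^2 + (0.2)(10)^2 = 21.7$$
$$\mathrm{Var}(X) = E(X^2) - E(X)^2 = 21.7 - (3.1)^2 = 12.09$$
$$z_{\frac{1+p}{2}} = z_{0.95} = 1.645$$
$$\left(\frac{z_{\frac{1+p}{2}}}{r}\right)^2 \left\{1 + \left(\frac{\sigma}{\mu}\right)^2\right\} = \left(\frac{1.645}{0.1}\right)^2 \left\{1 + \left(\frac{\sqrt{12.09}}{3.1}\right)^2\right\} = 611.04 \;\Rightarrow\; 612 \text{ claims}$$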