Thank you Paul, Sue, Gregor. I think I understand this problem now : )
Here is the definitive 11/97 Q20: (Feel free to add or challenge or question)
For a minute I'm gonna ignore the question that 11/97 Q20 is asking and focus
on interpreting the set up.
I wanted to calculate the DF as DF = (# of categories) - (# of estimated
parameters) - 1. This is appropriate when one is estimating parameters of
the distribution in addition to testing how well the sample fits the proposed
distribution. IN THIS CASE p (of the population) is known so use DF = (# of
categories) - 1.
>>>observe 100 risks, 80 have 0 claims, 20 have 1 claim. ==> this is the
sample, p-bar is .2, and is the sample probability of success (AKA the sample
mean). It is not directly relevant to this problem.
>>>Ho: number of claims per risk follow a Bernoulli with mean p ==> this
says assume we have a Bernoulli with parameter p, and assume that this
population parameter is a known value, which means p is not estimated from
the sample. This is the crux of my error. WHY ASSUME p IS KNOWN?
This is an *English* question, not a statistics one. I think we can assume
p is known for these reasons:
1) You get the correct answer (admittedly not very helpful in an exam
situation).
2) NOWHERE does the problem mention estimating p with the sample mean.
3) The question asks >>determine the smallest value of p for which Ho will
be accepted at the .01 significance level<< If the sample mean were to be
used (p-bar = .2) there would be NO need to determine the smallest value of
p! (If the sample mean WERE to be used, the question would probably ask
whether or not Ho is to be rejected.)
Now on to the problem:
Set up a table with two categories (rows): 0 claims and 1 claim.
There are two columns: observed and fitted.
Observed comes from the sample, 0 claims = 80 and 1 claim = 20
Fitted are calculated values, * use p * , 0 claims = 100(1-p) and 1 claim =
100p
(IF p is not known, then use p-bar to estimate p)
Calculate the intermediate values (obs - fitted)^2/fitted for each
category.
The chi square statistic is the sum of these intermediate values, which
gives a formula for the chi square statistic in terms of p.
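For anyone who wants the formula written out, here is my own algebra on the
table above (observed 80 and 20, fitted 100(1-p) and 100p). It is a sketch
of the simplification, not something taken from the exam solution:

```latex
\chi^2(p) = \frac{\bigl(80 - 100(1-p)\bigr)^2}{100(1-p)} + \frac{(20 - 100p)^2}{100p}
          = \frac{(100p - 20)^2}{100\,p(1-p)}
          = \frac{100\,(p - 0.2)^2}{p(1-p)}
```

The two numerators are the same because 80 - 100(1-p) = 100p - 20, so the
two terms combine over the common denominator 100p(1-p).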
The value of the chi square statistic must be less than 6.63, which is read
from the given table, 1DF, .01 significance level.
Set the formula for the chi square statistic to be less than 6.63.
Plug in test points from each interval given in the possible answers.
OR solve for p.
Select answer E (Ho is accepted for p >= .117, up to about .320, so the
smallest value of p that accepts Ho falls in E. at least .11)
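Both routes (test points or solving for p) can be checked with a few lines
of Python. This is only a sketch: the function names are mine, and the 6.63
critical value is the one read from the exam table. Setting the statistic
equal to the critical value c and clearing denominators gives the quadratic
(100 + c)p^2 - (40 + c)p + 4 = 0, whose roots bound the acceptance region.

```python
import math

# Critical value for 1 DF at the .01 level, as read from the exam table.
CRITICAL = 6.63


def chi_square(p, n=100, obs_zero=80, obs_one=20):
    """Chi square statistic for the two-row table, with p treated as known."""
    fitted_zero = n * (1 - p)  # fitted count of risks with 0 claims
    fitted_one = n * p         # fitted count of risks with 1 claim
    return ((obs_zero - fitted_zero) ** 2 / fitted_zero
            + (obs_one - fitted_one) ** 2 / fitted_one)


def acceptance_bounds(critical=CRITICAL):
    """Roots of (100 + c) p^2 - (40 + c) p + 4 = 0, for the 80/20 sample."""
    a = 100 + critical
    b = -(40 + critical)
    c = 4
    root = math.sqrt(b * b - 4 * a * c)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)


low, high = acceptance_bounds()
print("accept Ho for p between", round(low, 3), "and", round(high, 3))

# Test points, one from either side of the lower bound plus the sample mean.
for p in (0.10, 0.11, 0.12, 0.20):
    stat = chi_square(p)
    print(p, round(stat, 2), "accept" if stat < CRITICAL else "reject")
```

The test points at .11 and .12 land on opposite sides of the critical value,
which is what pins the smallest accepted p between them.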
An even more challenging question is trying to think of a situation where one
might actually use this particular procedure.... perhaps when trying to come
up with an initial guess for a population parameter. For instance, in the
above it could be said that, at the .01 significance level, the sample is
consistent with a Bernoulli population with p anywhere between about 0.117
and 0.320. So if the senior actuary was going to select a value to become
the Official Value, it would behoove him or her to select within that range.
Well, I had fun. I hope you did too.
-DP