Let's think about this in terms of "4B".
Let P be the portion of your population (your total data) that will fail
under Y2k.
So given a random element of data, the chance of its failure is P.
Given a set of N data elements taken at RANDOM, the # of failures will
follow a binomial distribution.
4B teaches us how to determine a confidence interval for the true value
of P, based on our sample - but I don't think that's what you want.
I think the normal approximation to the binomial, with F failures
observed in a sample of N, would give:
E[P] = F/N
Prob ( P within interval E[P] +/- 1.96 * sqrt( F*(N-F) / N^3 ) ) ~ 95%.
So if you had 32 failures in your sample of 3200, you could estimate
for the population [.007<p<.013].
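As a quick check on that arithmetic, here is a small Python sketch of the
normal-approximation interval. The function name and the z=1.96 default
are just my choices for illustration; the standard error is the usual
sqrt(p*(1-p)/N) written in terms of F and N.

```python
import math

def normal_approx_interval(failures, n, z=1.96):
    """Approximate confidence interval for P from F failures in N samples."""
    p_hat = failures / n                                   # point estimate F/N
    std_err = math.sqrt(failures * (n - failures) / n**3)  # sqrt(p(1-p)/N)
    return p_hat - z * std_err, p_hat + z * std_err

low, high = normal_approx_interval(32, 3200)
print(f"{low:.4f} < p < {high:.4f}")  # roughly 0.0066 < p < 0.0134
```

Note that with F=0 the standard error is zero and the interval collapses
to a single point, which is one way to see the approximation breaking down.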
But I think you are hoping to see ZERO failures, and would like the
confidence assigned to P=0. The confidence interval formulas don't work
well for that!
A second approach would try setting a null hypothesis
Ho: there ARE Y2k failures in the data
and try to reject it. You won't like this result -- you'd need 90%+
of the data!
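One way to see where a figure like 90% comes from (this is my own
reading, assuming the worst case for the tester of exactly ONE bad record
among the 320,000 policies, sampled without replacement): the chance that
a sample of n contains that record is n divided by the population size.
A rough Python sketch, with a made-up function name:

```python
from fractions import Fraction
import math

def sample_needed_one_bad(population, confidence=Fraction(9, 10)):
    """Smallest sample that catches a single bad record with the given
    probability, assuming sampling without replacement.
    P(sample of n contains the one bad record) = n / population."""
    return math.ceil(confidence * population)

population = 320_000
needed = sample_needed_one_bad(population)
print(needed, "records, or", float(Fraction(needed, population)), "of the data")

# For comparison, the chance the proposed 3,200-policy sample catches it:
catch_chance = Fraction(3200, population)
print(float(catch_chance))
```

Exact fractions are used so the 90% threshold is not blurred by floating
point rounding.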
So here's a more realistic (but complicated and subjective) Bayesian
approach:
Estimate the Prior distribution of P,
perhaps you and your company believe there is:
20% of P=1.000 (Total system collapse)
20% of P=0.100 (One major section of data flawed)
20% of P=0.010 (One interaction of fields flawed)
20% of P=0.001 (One or more errant records)
and 20% of P=0.000 (Serendipitous Y2K compliance)
Now the question becomes: after how many error-free samples does the
P=0.000 case have a 90% posterior probability?
In this case the answer is just 2,200 samples, but it depends heavily on
the choices made, especially the failure rate and prior weight assigned
to "one or more errant records".
For example if you used 20% of P=0.0001 you would need 22,000 samples.
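Here is a rough Python sketch of that posterior calculation, in case you
want to try your own priors. It assumes each sampled record fails
independently with probability P, so n error-free samples have likelihood
(1-P)^n, and Bayes' rule weights each prior case by that likelihood. The
function names and the dictionary layout are mine, just for illustration.

```python
def posterior_zero(n, prior):
    """Posterior probability of P=0 after n error-free samples.
    prior maps each candidate failure rate P to its prior weight."""
    weights = {p: w * (1 - p) ** n for p, w in prior.items()}
    return weights[0.0] / sum(weights.values())

def samples_needed(prior, target=0.90):
    """Smallest n at which the P=0 case reaches the target posterior."""
    n = 0
    while posterior_zero(n, prior) < target:
        n += 1
    return n

# The five equally weighted cases listed above.
prior = {1.0: 0.2, 0.1: 0.2, 0.01: 0.2, 0.001: 0.2, 0.0: 0.2}
n_first = samples_needed(prior)
print(n_first)   # roughly 2,200

# Same prior, but with "errant records" moved from P=0.001 to P=0.0001.
prior_rare = {1.0: 0.2, 0.1: 0.2, 0.01: 0.2, 0.0001: 0.2, 0.0: 0.2}
n_rare = samples_needed(prior_rare)
print(n_rare)    # roughly 22,000
```

Almost all of the work is done by the smallest nonzero P in the prior:
the other cases are ruled out after a few hundred clean samples.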
[Just as a disclaimer, this is just a discussion about the mathematics
involved. It does not represent an opinion on behalf of myself or my
company about your Y2k readiness or testing procedures!]
Jason Israel
Increased Limits and Rating Plans
16-1
E-mail: JIsrael@ISO.COM
Phone:(212)-898-5829, Fax -5565
Visit our website: http://www.iso.com
> ----------
> From: Rich Sieger[SMTP:RichSieger@cnico.com]
> Sent: Wednesday, August 11, 1999 7:00 PM
> To: studygroup4b@lists.casact.org
> Subject: Confidence interval
>
> Here's a work application. My company wants to test our data for Y2K
> compliance. The systems guy wants to test 1% of the data (3200 of 320000
> policies). Would the 3200 policies (1% of data) predict the validity of
> our Y2K conversion with 90% accuracy? i.e. is this within the 90%
> confidence interval? If not, what percentage of data would represent the
> data with 90% accuracy?
>
> Thanks,
>
> Rich Sieger
>
> richsieger@cnico.com
>
>
>