Statistics - Lecture 02
Nicodème Paul, Faculté de médecine, Université de Strasbourg
9/12/2018 · statnipa.appspot.com/cours/02/02.pdf


Statistical inference


Probability - motivation

Suppose we have a drug that we know, from long experience, cures a patient with some specific illness in 70% of cases. A new drug is proposed as having a higher cure rate than the present one. To assess this claim, the new drug is given to 1000 people suffering from the illness; among these, 741 are cured. Do we have significant evidence that this new drug is better than the current one?

· Consider the following hypotheses:
  - $H_0$: the new drug is as effective as the current one (hypothesis of no effect, no difference, or not better)
  - $H_1$: the new drug is better than the current one
· Probability calculation - If the new drug is as effective as the current one, how likely is it that, by chance, 741 or more people given the new drug will be cured?
· Statistical inference - Based on the above probability calculation, the data may provide convincing evidence that the new drug is better than the current one.


Probability - motivation

· $X_1, X_2, \ldots, X_{1000}$: an observable random sample
· $X_1, X_2, \ldots, X_{1000}$ are independent and identically distributed (i.i.d.)
· $\mathcal{L}(X_i) = B(p)$
· Let $Y = X_1 + X_2 + \ldots + X_{1000}$; we want to calculate $P_{p_0}(Y \ge 741)$
· $\mathcal{L}(Y) = B(1000; p_0)$
· It follows:

$$P_{p_0}(Y \ge 741) = \sum_{i=741}^{1000} \binom{1000}{i} p_0^{i} (1 - p_0)^{1000-i} = 0.002335$$

· Alternatively, let $\bar{X}_{1000} = Y/1000$; $P_{p_0}(\bar{X}_{1000} \ge 0.741)$ is easier to calculate.
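The exact binomial tail probability can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `binomial_upper_tail` is hypothetical and chosen for illustration, it is not part of the lecture.

```python
from math import comb


def binomial_upper_tail(n: int, p: float, k: int) -> float:
    """Return P(Y >= k) for Y ~ B(n; p) by summing the probability mass function."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Drug trial: n = 1000 patients, p0 = 0.7 under H0, 741 cures observed.
p_value = binomial_upper_tail(1000, 0.7, 741)
print(f"P(Y >= 741) = {p_value:.6f}")
```

The sum has 260 terms, which is cheap for a computer but tedious by hand; this is one reason the normal approximation is attractive.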


Population distribution


Sampling distribution of the sample proportion


Population distribution


Sampling distribution of the sample mean


Sampling distribution

· $\mu$: population mean and $\sigma^2$: population variance
· $p_0$: population mean and $n p_0 (1 - p_0)$: population variance


Check yourself

Imagine that you have a very large barrel that contains tens of thousands of M&M's. According to the official M&M website, 20% of the M&M's produced by the Mars Corporation are orange. 5 students each take a random sample of 50 M&M's and record the percentage of orange in each sample. Which sequence is the most plausible for the percentage of orange candies obtained in these 5 samples?

0.20, 0.20, 0.20, 0.20, 0.20

0.15, 0.25, 0.22, 0.20, 0.28

0.5, 0.80, 0.8, 0.65, 0.70

Each of the sequences is equally plausible.


Check yourself

According to the official M&M website, 24% of the plain milk chocolate M&M's produced by Mars Corporation are blue. Annie buys a large family-size bag of M&M's. Sarah buys a small fun-size bag. Which bag is more likely to have more than 40% blue M&M's?

Annie, because there are more M&M's in her bag, so she will have more blue ones.

Annie, because there is more variability in the proportion of blues among largersamples.

Sarah, because there is more variability in the proportion of blues among smallersamples.

Both have the same chance because they are both random samples.


Sampling distribution of the mean

· Let $X_1, X_2, \ldots, X_n$ be a random sample, independent and identically distributed (i.i.d.), from a distribution with mean value $\mu$ and standard deviation $\sigma$.

$$E(\bar{X}_n) = \mu, \qquad V(\bar{X}_n) = \frac{\sigma^2}{n}$$

· $\frac{\sigma}{\sqrt{n}}$ is the standard deviation of the mean
· If $T_0 = X_1 + X_2 + \ldots + X_n$ then $E(T_0) = n\mu$, $V(T_0) = n\sigma^2$
· If $\mathcal{L}(X_i) = N(\mu; \sigma^2)$ then:
  - $\mathcal{L}(\bar{X}_n) = N(\mu; \frac{\sigma^2}{n})$
  - $\mathcal{L}(T_0) = N(n\mu; n\sigma^2)$


Example

The time that it takes a randomly selected rat of a certain subspecies to find its way through a maze is a normally distributed random variable with $\mu = 1.5$ min and $\sigma = 0.35$ min. Suppose five rats are selected. Let $X_1, X_2, \ldots, X_5$ denote their times in the maze. Assuming the $X_i$'s to be a random sample from this normal distribution, what is the probability that the total time $T_0 = X_1 + X_2 + \cdots + X_5$ for the five is between 6 and 8 min?

· $T_0 = 5\bar{X}_5$ and $T_0 \sim N(5 \times 1.5;\ 5 \times 0.35^2) = N(7.5;\ 0.6125)$
· $\sigma_{T_0} = \sqrt{5} \times 0.35 = 0.783$
· It follows:

$$
\begin{aligned}
P(6 \le T_0 \le 8) &= P\left(\frac{6 - 7.5}{0.783} \le \frac{T_0 - 7.5}{0.783} \le \frac{8 - 7.5}{0.783}\right) \\
&= P(-1.92 \le Z \le 0.64) \\
&= \Phi(0.64) - (1 - \Phi(1.92)) \\
&= 0.7115
\end{aligned}
$$
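The maze calculation can be reproduced without a normal table. This is a minimal sketch that relies on `statistics.NormalDist` from the Python standard library; the variable names are illustrative only. Because it does not round the z-values to two decimals, the result may differ from the table-based value in the last digits.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1.5, 0.35, 5  # per-rat mean and standard deviation (minutes), number of rats

# T0 = X1 + ... + X5 is normal with mean n*mu and standard deviation sqrt(n)*sigma.
total_time = NormalDist(mu=n * mu, sigma=sqrt(n) * sigma)

prob = total_time.cdf(8) - total_time.cdf(6)
print(f"P(6 <= T0 <= 8) = {prob:.4f}")
```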


Central Limit Theorem (CLT)

· Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean $\mu$ and variance $\sigma^2$. Then:

$$\lim_{n \to \infty} P\left(\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \le z\right) = P(Z \le z) = \Phi(z)$$

And

$$\lim_{n \to \infty} P\left(\frac{T_0 - n\mu}{\sigma\sqrt{n}} \le z\right) = P(Z \le z) = \Phi(z)$$

· Rule of thumb:
  - For continuous random variables, if $n > 30$ the Central Limit Theorem can be used.
  - If the $X_i$'s are Bernoulli random variables $B(p)$, $n \ge 50$, $np \ge 15$, and $n(1 - p) \ge 15$, then the Central Limit Theorem can also be used.
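The theorem can be illustrated by simulation. The sketch below is not from the lecture: the Uniform(0, 1) population, the sample size of 40, the number of replications and the seed are all assumptions chosen for illustration. It standardizes many sample means and compares the empirical probability $P(Z_n \le 1)$ with $\Phi(1)$.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so the run is reproducible

n, reps = 40, 20_000
mu, sigma = 0.5, sqrt(1 / 12)  # mean and standard deviation of Uniform(0, 1)


def standardized_mean() -> float:
    """Draw one sample of size n and return (X_bar - mu) / (sigma / sqrt(n))."""
    sample = [random.random() for _ in range(n)]
    return (fmean(sample) - mu) / (sigma / sqrt(n))


z_values = [standardized_mean() for _ in range(reps)]

z = 1.0
empirical = sum(v <= z for v in z_values) / reps
theoretical = NormalDist().cdf(z)
print(f"empirical P(Z_n <= {z}) = {empirical:.3f}, Phi({z}) = {theoretical:.3f}")
```

Even though the population is far from normal, the two probabilities should be close for this sample size.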


Check yourself

The proportion of left-handed people in the general population is about 0.1. Suppose a random sample of 225 people is observed. Approximately, what is the sampling distribution of the sample proportion?

Binomial(225, 0.1)

Normal(0.1, 0.02)

Normal(0, 0.1)

Normal(0.1, 0.0004)


Drug trial example

· Recall that $p_0 = 0.7$ and:

$$P_{p_0}(Y \ge 741) = \sum_{i=741}^{1000} \binom{1000}{i} p_0^{i} (1 - p_0)^{1000-i} = 0.002335$$

· Checking conditions: $X_1, X_2, \ldots, X_n$ are i.i.d. with distribution $B(0.7)$, $n = 1000 \ge 50$, $1000 \times p_0 = 700 \ge 15$, $1000 \times (1 - p_0) = 300 \ge 15$. By CLT, we have:

$$
\begin{aligned}
P_{p_0}(\bar{X}_{1000} \ge 0.741) &= 1 - P_{p_0}(\bar{X}_{1000} < 0.741) \\
&= 1 - P\left(\frac{\bar{X}_{1000} - 0.7}{\sqrt{\frac{0.7(1 - 0.7)}{1000}}} < \frac{0.741 - 0.7}{\sqrt{\frac{0.7(1 - 0.7)}{1000}}}\right) \\
&= 1 - P(Z < 2.83) \\
&= 0.0023
\end{aligned}
$$
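The normal approximation for the drug trial can be sketched in a few lines. This is an illustrative check using the Python standard library, not the lecturer's own code; the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

n, p0, p_hat = 1000, 0.7, 0.741

# Rule-of-thumb conditions for applying the CLT to Bernoulli variables.
assert n >= 50 and n * p0 >= 15 and n * (1 - p0) >= 15

standard_error = sqrt(p0 * (1 - p0) / n)   # standard deviation of X_bar under H0
z = (p_hat - p0) / standard_error          # standardized observed proportion
p_value = 1 - NormalDist().cdf(z)          # P(Z >= z)
print(f"z = {z:.2f}, approximate P(X_bar >= 0.741) = {p_value:.4f}")
```

The approximation lands close to the exact binomial value of 0.002335, at a fraction of the computational effort.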


Chi-square distribution

· Let $(X_i, i = 1, \ldots, n)$ be a sequence of random variables independently distributed from a normal distribution $N(0; 1)$. Then $K = \sum_{i=1}^{n} X_i^2$ has a chi-square distribution with $n$ degrees of freedom. We note $\mathcal{L}(K) = \chi^2_n$.
· If $\mathcal{L}(K) = \chi^2_n$ then $E(K) = n$ and $Var(K) = 2n$.
· Parameters of the sampling distribution of $S^2$:

$$S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$

$$\mathcal{L}\left(\frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \bar{X})^2\right) = \chi^2_{n-1}$$

$$E(S^2) = \sigma^2 \quad \text{and} \quad Var(S^2) = \frac{2\sigma^4}{n - 1}$$
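The moments $E(K) = n$ and $Var(K) = 2n$ can be checked empirically. The following simulation is a sketch only: the choice of 5 degrees of freedom, the number of draws and the seed are assumptions made for illustration.

```python
import random
from statistics import fmean, variance

random.seed(1)  # fixed seed so the run is reproducible
n, reps = 5, 20_000


def chi_square_draw(df: int) -> float:
    """Return one draw of K: the sum of df squared independent N(0; 1) variables."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))


draws = [chi_square_draw(n) for _ in range(reps)]
mean_k, var_k = fmean(draws), variance(draws)
print(f"mean of K = {mean_k:.2f} (theory {n}), variance of K = {var_k:.2f} (theory {2 * n})")
```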


Chi-Square Distribution


Chi-square table

Example

$P(\chi^2_{(11)} \le 4.575) = 0.05$, $P(\chi^2_{(9)} \le 12.242) = 0.8$


Parameter versus estimate

· A population characteristic is called a parameter. For example $\mu$ and $\sigma^2$ are population parameters.
· A point estimate or an estimate of a parameter is a single number that is based on sample data and represents a plausible value of the characteristic.

                     Parameter   Estimate
PROPORTION           $p$         $\hat{p}$
MEAN                 $\mu$       $\bar{x}$
STANDARD DEVIATION   $\sigma$    $s$


Estimate and Estimator

· Let $X_1, X_2, \ldots, X_n$ be an observable random sample. Let $r$ be an arbitrary real-valued function of $n$ real variables. Then the random variable $T = r(X_1, X_2, \ldots, X_n)$ is called a statistic. Its distribution is the sampling distribution of $T$.
· An estimator of a population parameter $\theta$ is a statistic $r(X_1, X_2, \ldots, X_n)$ used to evaluate $\theta$. For an observed sample $x_1, x_2, \ldots, x_n$, the estimate of $\theta$ is the value $r(x_1, x_2, \ldots, x_n)$.
· For example, if the $X_i$'s have a common distribution with population mean $\mu$ and variance $\sigma^2$, let $x_1, x_2, \ldots, x_n$ be the observed sample data. Then $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ is an estimator of $\mu$ and $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ is an estimate of $\mu$.
· $\frac{\sigma}{\sqrt{n}}$ is the standard error of the mean.


Estimate and Estimator

· In the drug trial example, the success rate $p$ of the new drug is unknown. We know that there is strong evidence for $p > 0.7$. Using the sample data, an estimate of $p$ is $0.741$. The standard error on the estimated proportion is $\sqrt{p(1 - p)/1000}$. As $p$ is unknown, an estimate of the standard error is $\sqrt{0.741(1 - 0.741)/1000} = 0.014$.
· Suppose that $X_1, X_2, \ldots, X_n$ form a random sample from a common distribution with population mean $\mu$ and population variance $\sigma^2$, and let $x_1, x_2, \ldots, x_n$ be an observation. We have: $S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2$ is an estimator of $\sigma^2$ and $s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$ is an estimate of $\sigma^2$.


Check yourself

Which of the following is false?

As the sample size increases, the variability of the sampling distribution decreases.

Standard error computed based on a sample standard deviation will always be lowerthan the standard deviation of that sample.

Standard error measures the variability in means of samples of the same size takenfrom the same population.

In order to reduce the standard error by half, sample size should be doubled.


Check yourself

What is the main problem with point estimates of population parameters?

Nothing. Point estimates are 100% accurate estimates of their respective populationparameters

They do not account for sampling error

They tell us nothing about the population parameters they are estimating

We can't use them to estimate population means


Estimator - property

· Let $T_n$ be an estimator of the parameter $\theta$. We call bias of $T_n$ for $\theta$ the value:

$$b(T_n) = E(T_n) - \theta$$

If $b(T_n) = 0$, we say that $T_n$ is an unbiased estimator for $\theta$.

· If $X_1, X_2, \ldots, X_n$ is an observable random sample, i.i.d. with mean $\mu$ and variance $\sigma^2$:
· $\bar{X}$ is an unbiased estimator of $\mu$ as $E(\bar{X}) - \mu = 0$.
· Similarly $S^2$ is an unbiased estimator of $\sigma^2$ as $E(S^2) - \sigma^2 = 0$.
· Good estimators are generally unbiased with minimum variance.
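Unbiasedness of $S^2$ can be illustrated by simulation. The sketch below is not part of the lecture: the normal population $N(10; 4)$, the sample size of 5, the number of replications and the seed are assumptions. It compares the average of $S^2$ (divisor $n - 1$) with the average of the alternative estimator that divides by $n$.

```python
import random
from statistics import fmean, pvariance, variance

random.seed(2)  # fixed seed so the run is reproducible
n, reps, sigma = 5, 10_000, 2.0  # true variance is sigma**2 = 4

s2_values, biased_values = [], []
for _ in range(reps):
    sample = [random.gauss(10, sigma) for _ in range(n)]
    s2_values.append(variance(sample))       # divides by n - 1: the estimator S^2
    biased_values.append(pvariance(sample))  # divides by n

mean_s2 = fmean(s2_values)
mean_biased = fmean(biased_values)
print(f"average S^2 = {mean_s2:.3f}, average with divisor n = {mean_biased:.3f}, sigma^2 = {sigma**2}")
```

The divisor-$n$ estimator has expectation $\frac{n-1}{n}\sigma^2$, so with $n = 5$ its average should sit near $3.2$ rather than $4$.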


Confidence interval

Let $X_1, X_2, \ldots, X_n$ be a sequence of random variables i.i.d. with probability density $f(x; \theta)$ where $\theta \in \mathbb{R}$. We call a confidence interval procedure of level $1 - \alpha$ ($0 \le \alpha \le 1$) each couple $(T_1, T_2)$ such that:

$$P(\theta \in [T_1, T_2]) \ge 1 - \alpha$$

The observation $[t_1, t_2]$ obtained from a sample is called a confidence interval with level $1 - \alpha$ for $\theta$, or a $100(1 - \alpha)$% confidence interval.


Confidence interval for the mean, $\sigma^2$ known

$X_1, X_2, \ldots, X_n$ are i.i.d. with the common distribution $N(\mu; \sigma^2)$

· Choose an estimator for $\mu$, obviously $\bar{X}$.
· Find a transformation of the estimator leading to a distribution that we know about and that we can use for probability calculation.

$$\mathcal{L}\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right) = N(0; 1)$$

· By using the normal distribution table, we find the quantile values $z_{\alpha/2} = -z_{1-\alpha/2}$ and $z_{1-\alpha/2}$ such that:

$$P\left(-z_{1-\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{1-\alpha/2}\right) = 1 - \alpha$$


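Confidence interval for the mean, $\sigma^2$ known

$$P\left(-z_{1-\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{1-\alpha/2}\right) = 1 - \alpha$$

$$P\left(\bar{X} - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

· The confidence interval for $\mu$ of confidence level $1 - \alpha$ is then:

$$\left[\bar{x} - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$

· The length of the confidence interval is:

$$l_c = 2 \times z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}$$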


Example

Given a sample of 9 people, we measure the weight of each of them. Suppose that the weight is randomly distributed as a normal distribution $N(\mu, 170)$. Given $\bar{x} = 67.70$, we can find a confidence interval with confidence level 95%. We have:

$$\left[67.70 - 1.96\frac{\sqrt{170}}{\sqrt{9}};\ 67.70 + 1.96\frac{\sqrt{170}}{\sqrt{9}}\right] = [59.18;\ 76.22]$$

Exercise

Calculate the length of the interval. Compare the lengths for $n = 15$ and $n = 20$.

Answer

$l_9 = 2 \times 1.96\frac{\sqrt{170}}{\sqrt{9}} = 17.037$, $l_{15} = 2 \times 1.96\frac{\sqrt{170}}{\sqrt{15}} = 13.197$ and $l_{20} = 2 \times 1.96\frac{\sqrt{170}}{\sqrt{20}} = 11.429$

[Precision has increased]


Check yourself

Which confidence level gives a confidence interval with larger length, $n$ and $\sigma$ held constant?

95%

98%


Check yourself

As the sample size increases, the length of the confidence interval

increases

decreases


Check yourself

As the population standard deviation increases, the length of the confidence interval

increases

decreases

remains constant


Check yourself

A given confidence interval is calculated based on a random sample of n observations. If we want to make this interval narrower (1/3 of what it is now), how many observations should we sample?

9n

3n

4n

1/3n

1/9n


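Confidence interval for the variance, $\sigma^2$ unknown

· $S^2$ is an estimator of $\sigma^2$.
· $\mathcal{L}\left(\frac{n - 1}{\sigma^2} S^2\right) = \chi^2_{n-1}$

By using the chi-square distribution table, we can find the quantile value $\chi^2_{(n-1),\,\alpha/2}$ of order $\alpha/2$ and the quantile value $\chi^2_{(n-1),\,1-\alpha/2}$ of order $1 - \alpha/2$ such that:

$$P\left(\chi^2_{(n-1),\,\alpha/2} \le \frac{n - 1}{\sigma^2} S^2 \le \chi^2_{(n-1),\,1-\alpha/2}\right) = 1 - \alpha$$

$$P\left(\frac{n - 1}{\chi^2_{(n-1),\,1-\alpha/2}} S^2 \le \sigma^2 \le \frac{n - 1}{\chi^2_{(n-1),\,\alpha/2}} S^2\right) = 1 - \alpha$$

· The confidence interval for $\sigma^2$ with a confidence level $1 - \alpha$ is then:

$$\left[\frac{n - 1}{\chi^2_{(n-1),\,1-\alpha/2}} s^2,\ \frac{n - 1}{\chi^2_{(n-1),\,\alpha/2}} s^2\right]$$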


Example

Given a sample of 9 people, we measure the weight of each of them. Suppose that the weight is randomly distributed as a normal distribution $N(\mu, \sigma^2)$. We want to find a confidence interval for $\sigma^2$ with a confidence level of 95%, given that from the sample we have $s^2 = 171.09$. We obtain:

$$\left[\frac{(9 - 1) \times 171.09}{17.53};\ \frac{(9 - 1) \times 171.09}{2.18}\right] = [78.08;\ 627.85]$$

Question

Calculate a confidence interval for $\sigma$ with a confidence level of 95%.

Answer

CI = [8.84; 25.06]
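The arithmetic of this example can be sketched as below. The Python standard library has no chi-square quantile function, so the two quantiles for 8 degrees of freedom are taken directly from the values used in the example (2.18 and 17.53); the variable names are illustrative assumptions.

```python
from math import sqrt

n, s2 = 9, 171.09

# Chi-square quantiles with n - 1 = 8 degrees of freedom, as read from the table in the example.
chi2_lower = 2.18    # order alpha/2 = 0.025
chi2_upper = 17.53   # order 1 - alpha/2 = 0.975

# The larger quantile gives the lower bound, the smaller quantile the upper bound.
var_low = (n - 1) * s2 / chi2_upper
var_high = (n - 1) * s2 / chi2_lower
print(f"95% CI for sigma^2: [{var_low:.2f}; {var_high:.2f}]")

# Taking square roots of both bounds gives the interval for sigma.
sd_low, sd_high = sqrt(var_low), sqrt(var_high)
print(f"95% CI for sigma:   [{sd_low:.2f}; {sd_high:.2f}]")
```

The interval for $\sigma$ follows from the one for $\sigma^2$ because the square root is an increasing function, so it preserves the inequalities.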


Confidence interval for a proportion, $n \ge 50$, $np \ge 15$, $n(1 - p) \ge 15$

· $\bar{X}_n$ is an estimator of $p$.
· By the central limit theorem, we have:

$$\mathcal{L}\left(\frac{\bar{X}_n - p}{\sqrt{\frac{p(1 - p)}{n}}}\right) = N(0; 1)$$

· Using the normal distribution table, we find the quantile values $z_{\alpha/2} = -z_{1-\alpha/2}$ and $z_{1-\alpha/2}$ such that:

$$P\left(-z_{1-\alpha/2} \le \frac{\bar{X}_n - p}{\sqrt{\frac{p(1 - p)}{n}}} \le z_{1-\alpha/2}\right) = 1 - \alpha$$


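Confidence interval for a proportion, $n \ge 50$, $np \ge 15$, $n(1 - p) \ge 15$

$$P\left(-z_{1-\alpha/2} \le \frac{\bar{X}_n - p}{\sqrt{\frac{p(1 - p)}{n}}} \le z_{1-\alpha/2}\right) = 1 - \alpha$$

$$P\left(\bar{X}_n - z_{1-\alpha/2}\sqrt{\frac{p(1 - p)}{n}} \le p \le \bar{X}_n + z_{1-\alpha/2}\sqrt{\frac{p(1 - p)}{n}}\right) = 1 - \alpha$$

· The confidence interval for $p$ with confidence level $1 - \alpha$ is then:

$$\left[\hat{p} - z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right]$$

Where $\hat{p} = \bar{x}_n$.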


Drug trial example

· $p$ is unknown and its estimate is $\hat{p} = 0.741$.
· Using the estimate to check for CLT conditions: $n = 1000 \ge 50$, $1000 \times \hat{p} = 741 \ge 15$, $1000 \times (1 - \hat{p}) = 259 \ge 15$.
· A 95% confidence interval is written as:

$$CI = \left[\hat{p} - z_{0.975}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + z_{0.975}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right]$$

· As $z_{0.975} = 1.96$, it follows:

$$CI = \left[0.741 - 1.96 \times \sqrt{\frac{0.741(1 - 0.741)}{1000}},\ 0.741 + 1.96 \times \sqrt{\frac{0.741(1 - 0.741)}{1000}}\right]$$

· $CI = [0.714, 0.768]$
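The interval for the drug trial can be recomputed with a short helper. This is a minimal sketch of the normal-approximation interval described above; the function name `proportion_ci` is hypothetical, and the exact normal quantile is used in place of the rounded value 1.96.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes: int, n: int, level: float = 0.95):
    """Return (lower, upper) bounds of the normal-approximation interval for p."""
    p_hat = successes / n
    # Rule-of-thumb conditions, checked with the estimate p_hat.
    if not (n >= 50 and n * p_hat >= 15 and n * (1 - p_hat) >= 15):
        raise ValueError("CLT rule-of-thumb conditions are not met")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # quantile z_{1 - alpha/2}
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


low, high = proportion_ci(741, 1000)
print(f"95% CI for p: [{low:.4f}, {high:.4f}]")
```

The whole interval lies above 0.7, which is consistent with the earlier conclusion that the new drug's cure rate exceeds that of the current one.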


See you next time
