symboisis statistics

Upload: pradeep-joshi

Post on 03-Apr-2018


  • 7/28/2019 Symboisis Statistics

    1/102


    Median and Mean of a Density Curve

    The median of a density curve is the equal-areas point, the point that divides the area under the curve in half.

    The mean of a density curve is the balance point, at which the curve would balance if made of solid material.

    The median and mean are the same for a symmetric density curve. They both lie at the center of the curve. The mean of a skewed curve is pulled away from the median in the direction of the long tail.
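The pull of the long tail on the mean can be seen on a small data set. The sketch below uses made-up, right-skewed values (the list `incomes` is purely illustrative and not from the slides) and compares the two measures with Python's standard `statistics` module.

```python
import statistics

# Hypothetical right-skewed data: most values are small, a few are large.
incomes = [20, 22, 23, 25, 26, 28, 30, 35, 60, 150]

mean_value = statistics.mean(incomes)      # balance point
median_value = statistics.median(incomes)  # equal-areas (middle) point

print(f"mean   = {mean_value}")
print(f"median = {median_value}")
# The long tail is on the right, so the mean is pulled to the right of the median.
print("mean > median:", mean_value > median_value)
```

For a symmetric data set the two values would coincide; here the two large values drag the mean well above the median.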


    Statistics

    Founded in 1890, the Literary Digest magazine was famous for its success in conducting polls to predict winners in presidential elections. The magazine correctly predicted the winners in the presidential elections of 1916, 1920, 1924, 1928, and 1932. In the 1936 presidential contest between Alf Landon and Franklin D. Roosevelt, the magazine sent out 10 million ballots and received 1,293,669 ballots for Landon and 972,897 ballots for Roosevelt, so it appeared that Landon would capture 57% of the vote.

    Instead, Landon received 16,679,583 votes to the 27,751,597 votes cast for Roosevelt. Rather than getting 57% of the vote as suggested by the Literary Digest poll, Landon received only about 37% of the vote. In that same 1936 presidential election, George Gallup used a much smaller poll of 50,000 subjects, and he correctly predicted that Roosevelt would win.
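The percentages quoted above can be checked directly from the ballot and vote counts. This is a minimal sketch; the helper name `share` is an illustrative choice, and the shares are computed over the two candidates only, which is an assumption about how the slide arrived at its figures.

```python
# Counts quoted in the text.
digest_landon = 1_293_669
digest_roosevelt = 972_897
actual_landon = 16_679_583
actual_roosevelt = 27_751_597

def share(part, other):
    """Fraction of the two-candidate total that goes to `part`."""
    return part / (part + other)

poll_share = share(digest_landon, digest_roosevelt)
actual_share = share(actual_landon, actual_roosevelt)

print(f"Literary Digest poll, Landon: {poll_share:.1%}")
print(f"Actual vote, Landon:          {actual_share:.1%}")
print(f"Poll error: {poll_share - actual_share:.1%} points")
```

The poll overstated Landon's share by roughly twenty percentage points despite its huge size, which is the point of the story: a large biased sample is worse than a small well-chosen one.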


    Flipping of coin


    Data: A plural noun (the singular form is datum) which means a set of known or given things, facts. Note that data can be numerical (e.g. age of people) or non-numerical (e.g. gender of people).

    statistics: Without a capital letter, i.e. in its lower-case form, this means a set of numerical data or figures that have been collected systematically.

    Statistics: With a capital letter this is a proper noun that means the set of methods and theories that can be used to arrange, analyse and interpret statistics.

    A variable: A quantity that varies, the opposite of a constant. For example, the number of mobile phones sold per day in a shop is a variable, whereas the number of hours in a day is a constant. In the expressions that we will use to summarize methods, a capital letter, usually X or Y, will be used to represent a variable.

    Value: A specific amount that it is possible for a variable to be. For example, the number of mobile phones sold per day could be 25 or 43 or 51. These are all possible values of the variable "number of phones sold".


    Random: This adjective refers to something that occurs in an unplanned way. A random variable is a variable whose observed values arise by chance. The number of new accounts a bank opens during a month is a variable that is random, whereas the number of days in a month is a variable that is not random, i.e. its observed values are pre-determined.

    Distribution: The pattern exhibited by the observed values of a variable when they are arranged in order of magnitude. A theoretical distribution is one that has been deduced, rather than compiled from observed values.

    Population: Generally this means the total number of persons residing in a defined area at a given time. In Statistics a population is the complete set of things we want to investigate. These may be human, such as all the people who have visited a supermarket, or inanimate, such as all the policies issued by an insurance company.

    Sample: A subset of the population, that is, a smaller number of items picked from the population. A random sample is a sample whose components have been chosen in a random way, that is, on the basis that any single item in the population has no more or less chance than any other to be included in the sample.


    Copyright 2004 Pearson Education, Inc.

    Business: The etymology of "business" relates to the state of being busy, either as an individual or as society as a whole, doing commercially viable and profitable work.

    A business (also known as an enterprise or firm) is an organization engaged in the trade of goods, services, or both to consumers.

    Business statistics can be described as the collection, summarization, analysis, and reporting of numerical findings relevant to a business decision or situation.


    Why Statistics

    Time has three phases: past, present and future.

    The continuity and growth of any business depend on strategic decisions about finance, operations or the market.

    Decision making is crucial, whether it is based on intuition or on information and knowledge.

    Data (facts of the present) → Analysis → Information → Knowledge

    Knowledge-based decisions rely on some model.

    There is a time lag between awareness of an impending event or need and the occurrence of that event.

    This is the lead time, and it is why planning and forecasting are needed.

    An occurrence is either random or has a causal relation.

    This is where statistics helps.


    Properties of Estimators

    Parameters: Describe the population.

    Statistics: Describe samples. But we use them to estimate population parameters.

    Desirable properties of estimators:

    1. Sufficiency
    2. Unbiasedness
    3. Resistance
    4. Efficiency


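    Samples of two from the above population

    Sample y: 1, 2

    If the divisor is n:

    $s^2 = \frac{\sum (y - \bar{y})^2}{n} = 0.25$

    If the divisor is n − 1:

    $s^2 = \frac{\sum (y - \bar{y})^2}{n - 1} = 0.50$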
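The two results for the sample 1, 2 can be reproduced with Python's standard `statistics` module, where `pvariance` divides by n and `variance` divides by n − 1. This is only a quick check of the arithmetic, not part of the original slides.

```python
import statistics

sample = [1, 2]

# Divisor n: sum of squared deviations (0.5) divided by 2.
var_divisor_n = statistics.pvariance(sample)

# Divisor n - 1: the same sum of squared deviations divided by 1.
var_divisor_n_minus_1 = statistics.variance(sample)

print("divisor n:    ", var_divisor_n)
print("divisor n - 1:", var_divisor_n_minus_1)
```

The n − 1 version is the one normally used when a sample is used to estimate the population variance, because dividing by n tends to underestimate it.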


    (1) Carefully defining the situation, (2) gathering data, (3) accurately summarizing the data, and (4) deriving and communicating meaningful conclusions.

    Statistics: The science of collecting, describing, and interpreting data.

    Population: A collection, or set, of individuals, objects, or events whose

    properties are to be analyzed.

    Sample: A subset of a population.

    Variable (or response variable): A characteristic of interest about each individual element of a population or sample.

    Data value: The value of the variable associated with one element of a population or sample. This value may be a number, a word, or a symbol.

    Data: The set of values collected from the variable from each of the elements

    that belong to the sample.

    Experiment: A planned activity whose results yield a set of data.

    Parameter: A numerical value summarizing all the data of an entire population.

    Statistic: A numerical value summarizing the sample data.

    Qualitative, or attribute, or categorical, variable: A variable that describes or categorizes an element of a population.


    A variable is simply something that can vary: that is, it can take on many different values or categories. Examples of variables are gender, typing speed, top speed of a car, number of reported symptoms of an illness, temperature, attendances at rock festivals (e.g. the Download festival), level of anxiety, number of goals scored in football matches, intelligence, number of social encounters while walking your dog, amount of violence on television, occupation and favourite colours. These are all things that we can measure and record and that vary. We are generally interested in variables because we want to understand why they vary as they do.


    Ordinal variable: A qualitative variable that incorporates an ordered position, or ranking.

    Discrete variable: A quantitative variable that can assume a countable number of values. Intuitively, the discrete variable can assume any values corresponding to isolated points along a line interval. That is, there is a gap between any two values.

    Continuous variable: A quantitative variable that can assume an uncountable number of values. Intuitively, the continuous variable can assume any value along a line interval, including every possible value between any two values.

    Biased sampling method: A sampling method that produces data that systematically differ from the sampled population. An unbiased sampling method is one that is not biased.

    Sampling frame: A list, or set, of the elements belonging to the population from which the sample will be drawn.


    Data is numerical information.

    Data → Analysis → Information → Knowledge

    Data alone is useless. It has to be organized, summarized and presented, and then analyzed or used for estimation. These are the functions of statistics.

    Measurement is either quantitative or qualitative.

    Scales used

    Nominal scale

    Ordinal scale

    Interval scale

    Ratio scale


    Probabilities closer to 1 indicate that the event is more likely to occur. Probabilities closer to 0 indicate that the event is less likely to occur.

    P(A), read "P of A", denotes the probability of event A.

    If P(A) = 1, the event A is certain to occur.

    If P(A) = 0, the event A is certain not to occur.

    Probability is the basis for inferential statistics.

    An event is an outcome of an experiment.

    The sample space is the collection of all possible outcomes (sample points).

    1. All sample point probabilities lie between 0 and 1.

    2. The sum of the probabilities of all sample points within the sample space = 1.


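    Mutually exclusive events cannot occur together, so P(A and B) = 0. They are not statistically independent: if both have nonzero probability, knowing that one occurred tells us the other did not.

    When two events are mutually exclusive, the probability of A or B occurring is given by the addition rule for mutually exclusive events: P(A or B) = P(A) + P(B).

    For example, in a single draw from a 52-card deck, the ace of spades and the queen of spades are mutually exclusive, so P(As or Qs) = 1/52 + 1/52 = 2/52. Replacement matters only for successive draws: the probability of drawing the ace of spades and then the queen of spades is 1/52 × 1/52 with replacement and 1/52 × 1/51 without replacement.

    If two events are not mutually exclusive, the addition rule becomes P(A or B) = P(A) + P(B) − P(AB), where P(AB) is the joint probability.

    For statistically independent events, the joint probability is the product of the individual marginal probabilities: P(AB) = P(A) × P(B).

    The concept of statistical dependence implies that the probability of a certain event depends on the occurrence of another event.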
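The addition rules can be checked with exact fractions on a standard 52-card deck. The "ace or spade" case is an illustrative example added here, not one from the slides; it is chosen because drawing an ace and drawing a spade are not mutually exclusive (the ace of spades is both) and happen to be independent.

```python
from fractions import Fraction

# Mutually exclusive events in one draw: ace of spades, queen of spades.
p_ace_of_spades = Fraction(1, 52)
p_queen_of_spades = Fraction(1, 52)
p_either_card = p_ace_of_spades + p_queen_of_spades  # P(A or B) = P(A) + P(B)

# Non-mutually-exclusive events in one draw: "ace" and "spade".
p_ace = Fraction(4, 52)
p_spade = Fraction(13, 52)
p_ace_and_spade = Fraction(1, 52)                     # only the ace of spades
p_ace_or_spade = p_ace + p_spade - p_ace_and_spade    # general addition rule

# For independent events the joint probability is the product of the marginals.
product_of_marginals = p_ace * p_spade

print("P(As or Qs)     =", p_either_card)
print("P(ace or spade) =", p_ace_or_spade)
print("P(ace) * P(spade) == P(ace and spade):", product_of_marginals == p_ace_and_spade)
```

Using `Fraction` keeps the results exact, so the subtraction of the overlap in the general rule is visible without rounding noise.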

    [...] of successes to the total number of outcomes.

    The classic theory assumes that all outcomes have equal likelihood of occurring. In the example just cited, each card must have an equal chance of being chosen: no card is larger than any other or in any way more likely to be chosen than any other card. The classic theory pertains only to outcomes that are mutually exclusive (or disjoint), which means that those outcomes may not occur at the same time. For example, one coin flip can result in a head or a tail, but one coin flip cannot result in a head and a tail. So the outcome of a head and the outcome of a tail are said to be mutually exclusive in one coin flip, as are the outcome of an ace and the outcome of a king when one card is drawn.

    A probability assignment based on equally likely outcomes uses the formula

    $P(E) = \frac{n_E}{N}$

    where $n_E$ is the number of outcomes in which event E occurs and N is the total number of possible outcomes.

    Chapter 11: Introduction to Hypothesis Testing

    Nonstatistical Hypothesis Testing

    A criminal trial is an example of hypothesis testing without the statistics.

    In a trial a jury must decide between two hypotheses. The null hypothesis is

    H0: The defendant is innocent

    The alternative hypothesis or research hypothesis is

    H1: The defendant is guilty


    There are two possible errors.

    A Type I error occurs when we reject a true null hypothesis. That is, a Type I error occurs when the jury convicts an innocent person. We would want the probability of this type of error [maybe 0.001, beyond a reasonable doubt] to be very small for a criminal trial where a conviction results in the death penalty, whereas for a civil trial, where conviction might result in someone


    A Type II error occurs when we don't reject a false null hypothesis [accept the null hypothesis]. That occurs when a guilty defendant is acquitted.

    In practice, this type of error is by far the most serious mistake we normally make.

    For example, if we test the hypothesis that the amount of medication in a heart pill is equal to a value which will cure your heart problem and accept the null hypothesis


    The probability of a Type I error is denoted as α (Greek letter alpha). The probability of a Type II error is β (Greek letter beta).

    The two probabilities are inversely related. Decreasing one increases the other, for a fixed sample size.

    In other words, you can't have α and β both really small for any old sample size.
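The trade-off between α and β can be illustrated numerically. The sketch below assumes a right-tailed z-test with a known standard deviation; every number in it (null mean 170, true mean 180, σ = 65, n = 400) is a hypothetical choice for illustration, and the function name `type_ii_error` is not from the slides.

```python
from statistics import NormalDist

def type_ii_error(alpha, mu0, mu_true, sigma, n):
    """P(fail to reject H0) in a right-tailed z-test when the true mean is mu_true."""
    se = sigma / n ** 0.5
    # Critical value of the sample mean: reject H0 when x-bar exceeds it.
    critical = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Beta is the chance that x-bar falls below the critical value under the true mean.
    return NormalDist(mu_true, se).cdf(critical)

# Hypothetical scenario with a fixed sample size.
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, mu0=170, mu_true=180, sigma=65, n=400)
    print(f"alpha = {alpha:.2f}  ->  beta = {beta:.3f}")
```

With n held fixed, each reduction in α moves the critical value further from the null mean, so a false null is rejected less often and β grows.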


    Types of Errors

    A Type I error occurs when we reject a true null hypothesis (i.e. reject H0 when it is TRUE).

                        H0 true      H0 false
    Reject H0           Type I       (correct)
    Do not reject H0    (correct)    Type II


    The critical concepts are these:

    1. There are two hypotheses, the null and the alternative hypotheses.

    2. The procedure begins with the assumption that the null hypothesis is true.

    3. The goal is to determine whether there is enough evidence to infer that the alternative hypothesis is true, or the null is not likely to be true.

    4. There are two possible decisions:

    Conclude that there is enough evidence to support the alternative hypothesis. Reject the null.

    Conclude that there is not enough evidence to support the alternative hypothesis. Fail to reject the null.


    Concepts of Hypothesis Testing (1)

    There are two hypotheses. One is called the null hypothesis and the other the alternative or research hypothesis. The usual notation is:

    H0: the null hypothesis (pronounced "H nought")

    H1: the alternative or research hypothesis


    Consider the mean demand for computers during assembly lead time. Rather than estimate the mean demand, our operations manager wants to know whether the mean is different from 350 units. In other words, someone is claiming that the mean demand is 350 units and we want to check this claim out to see if it appears reasonable. We can rephrase this request into a test of the hypothesis:

    H0: μ = 350


    For example, if we're trying to decide whether the mean is not equal to 350, a large value of x̄ (say, 600) would provide enough evidence.

    If x̄ is close to 350 (say, 355), we could not say that this provides a great deal of evidence to infer that the population mean is different from 350.

    Concepts of Hypothesis Testing (4)

    The two possible decisions that can be made:

    Conclude that there is enough evidence to support the alternative hypothesis (also stated as: reject the null hypothesis in favor of the alternative).

    Conclude that there is not enough evidence to support the alternative hypothesis (also stated as: failing to reject the null hypothesis in favor of the alternative).

    NOTE: we do not say that we accept the null hypothesis if a statistician is around.

    Concepts of Hypothesis Testing (2)

    The testing procedure begins with the assumption that the null hypothesis is true.

    Thus, until we have further statistical evidence, we will assume:

    H0: μ = 350 (assumed to be TRUE)

    The next step will be to determine the

    Is the Sample Mean in the Guts of the Sampling Distribution?


    Three ways to determine this: First way

    1. Unstandardized test statistic: Is x̄ in the guts of the sampling distribution? That depends on what you define as the guts of the sampling distribution.

    If we define the guts as the center 95% of the distribution [this means α = 0.05], then the critical values that define the guts will be 1.96 standard deviations of X-bar on either side of the mean of the


    1. Unstandardized Test Statistic Approach


    Three ways to determine this: Second way

    2. Standardized test statistic: Since we defined the guts of the sampling distribution to be the center 95% [α = 0.05]:

    If the Z-score for the sample mean is greater than 1.96, we know that x̄ will be in the reject region on the right side, or

    If the Z-score for the sample mean is less than −1.96, we know that x̄ will be in the reject region on the left side.


    2. Standardized Test Statistic Approach


    Three ways to determine this: Third way

    3. The p-value approach (which is generally used with a computer and statistical software): Increase the rejection region until it captures the sample mean.

    For this example, since x̄ is to the right of the mean, calculate

    P(x̄ > 370.16) = P(Z > 1.344) = 0.0901

    Since this is a two-tailed test, you must double this area for the p-value.

    p-value = 2 × (0.0901) = 0.1802

    Since we defined the guts as the center 95% [α = 0.05], the reject region is the other 5%. Since our sample mean, x̄, is in the 18.02% region, it cannot be in our 5% reject region.
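The three approaches can be run side by side for the test of H0: μ = 350 with x̄ = 370.16. The slides as transcribed do not state the standard error, so the value of 15 used below is an assumption, chosen because it reproduces both the quoted Z of 1.344 and the lower critical value of 320.6 quoted with the statistical conclusions.

```python
from statistics import NormalDist

mu0 = 350.0      # hypothesized mean under H0
x_bar = 370.16   # sample mean quoted in the text
se = 15.0        # ASSUMED standard error of x-bar (not stated in the transcript)
alpha = 0.05

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for the center 95%

# 1. Unstandardized: is x-bar between the lower and upper critical values?
lcv = mu0 - z_crit * se
ucv = mu0 + z_crit * se
in_guts = lcv < x_bar < ucv

# 2. Standardized: is the z-score between -1.96 and +1.96?
z = (x_bar - mu0) / se

# 3. p-value: double the upper-tail area because the test is two-tailed.
p_value = 2 * (1 - std_normal.cdf(abs(z)))

print(f"LCV = {lcv:.1f}, UCV = {ucv:.1f}, x-bar in guts: {in_guts}")
print(f"z = {z:.3f}, |z| < {z_crit:.2f}: {abs(z) < z_crit}")
print(f"p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

All three lead to the same decision, fail to reject H0. The software p-value comes out near 0.179 rather than 0.1802 because the slide's 0.0901 is a table lookup at z = 1.34.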


    3. p-value approach


    Statistical Conclusions:

    Unstandardized Test Statistic:

    Since LCV (320.6) < x̄ (370.16) [...]

    [...] μ > 170 (this is what we want to determine)


    Example 11.1

    What we want to show:

    H1: μ > 170

    H0: μ ≤ 170 (we'll assume this is true)

    Normally we put H0 first.

    We know:

    n = 400, x̄ = 178, and σ = 65

    σ_x̄ = 65/SQRT(400) = 3.25

    Example 11.1 Rejection Region

    The rejection region is a range of values such that if the test statistic falls into that range, we decide to reject the null hypothesis in favor of the alternative hypothesis.

    The critical value of x̄ marks the boundary of that range: beyond it, we reject H0.


    Example 11.1

    At a 5% significance level (i.e. α = 0.05), we get [all in one tail]

    z_α = z_0.05 = 1.645

    Therefore, UCV = 170 + 1.645 × 3.25 = 175.35

    Since our sample mean (178) is greater than the critical value we calculated (175.35), we reject the null hypothesis in favor of H1, OR

    z = (178 − 170)/3.25 = 2.46 (> 1.645), so reject the null.
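The arithmetic of Example 11.1 can be reproduced in a few lines. This is a sketch of the right-tailed z-test with the numbers given in the example; the variable names are illustrative, and the p-value line is an addition that goes one step beyond the critical-value comparison above.

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, sigma = 400, 178.0, 65.0   # values given in Example 11.1
mu0, alpha = 170.0, 0.05             # null value and significance level

se = sigma / sqrt(n)                         # standard error of x-bar
z_alpha = NormalDist().inv_cdf(1 - alpha)    # one-tail critical z, about 1.645
ucv = mu0 + z_alpha * se                     # upper critical value of x-bar
z = (x_bar - mu0) / se                       # standardized test statistic
p_value = 1 - NormalDist().cdf(z)            # area to the right of z

print(f"standard error = {se}")
print(f"UCV = {ucv:.2f}; x-bar > UCV: {x_bar > ucv}")
print(f"z = {z:.2f}; z > z_alpha: {z > z_alpha}")
print(f"p-value = {p_value:.4f}")
```

Both comparisons agree: the sample mean lies beyond the critical value and z exceeds 1.645, so H0 is rejected at the 5% level.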

    Example 11.1 The Big Picture

    [Figure: sampling distribution of x̄ for H0: μ = 170 against H1: μ > 170, with the critical value at 175.35 and the sample mean x̄ = 178 in the rejection region. Reject H0 in favor of H1.]


    Interpreting the p-value

    The smaller the p-value, the more statistical evidence exists to support the alternative hypothesis.

    If the p-value is less than 1%, there is overwhelming evidence that supports the alternative hypothesis.

    If the p-value is between 1% and 5%, there is strong evidence that supports the alternative hypothesis.

    If the p-value is between 5% and 10%, there is weak evidence that supports the alternative hypothesis.

    If the p-value exceeds 10%, there is no evidence that supports the alternative hypothesis.

    Interpreting the p-value

    p-value below .01: Overwhelming evidence (highly significant)
    p-value from .01 to .05: Strong evidence (significant)
    p-value from .05 to .10: Weak evidence (not significant)
    p-value above .10: No evidence (not significant)

    For Example 11.1, p = .0069, which falls in the overwhelming-evidence range.
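The interpretation bands for the p-value translate directly into a small lookup function. The function name `describe_p_value` is hypothetical, and the treatment of values exactly on a boundary (for instance p = 0.05 counted as weak rather than strong) is an assumption, since the slides do not say which band the endpoints belong to.

```python
def describe_p_value(p):
    """Map a p-value to the evidence wording used on the slide."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.01:
        return "overwhelming evidence (highly significant)"
    if p < 0.05:
        return "strong evidence (significant)"
    if p < 0.10:
        return "weak evidence (not significant)"
    return "no evidence (not significant)"

# The two p-values that appear in the worked examples.
for p in (0.0069, 0.1802):
    print(f"p = {p}: {describe_p_value(p)}")
```

The wording describes evidence for the alternative hypothesis; a large p-value is not evidence that the null hypothesis is true.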

    Conclusions of a Test of Hypothesis

    If we reject the null hypothesis, we conclude that there is enough evidence to infer that the alternative hypothesis is true.

    If we fail to reject the null hypothesis, we conclude that there is not enough statistical evidence to infer that the alternative hypothesis is true. This does not mean that we have proven that the null hypothesis is true!

    One tail test with rejection region on right

    The last example was a one tail test, because the rejection region is located in only one tail of the sampling distribution.

    More correctly, this was an example of a right-tail test.

    One tail test with rejection region on left

    The rejection region will be in the left tail.

    Two tail test with rejection region in both tails

    The rejection region is split equally between the two tails.


    Example 11.2 Students work

    AT&T argues that its rates are such that customers won't see a difference in their phone bills between AT&T and its competitors. They calculate the mean and standard deviation for all their customers at $17.09 and $3.87 (respectively). Note: we don't know the true value for σ, so we estimate σ from the data [σ ~ s = 3.87]; the sample is large, so don't worry.

    They then sample 100 customers at


    The rejection region is set up so we can reject the null hypothesis when the test statistic is large or when it is small.

    [Figure: sampling distribution with rejection regions in both tails, labelled "stat is small" and "stat is large".]


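    At a 5% significance level (i.e. α = .05), we have α/2 = .025. Thus, z_.025 = 1.96 and our rejection region is:

    z < −1.96 or z > 1.96

    [Figure: standard normal curve centered at 0 with rejection regions below −z_.025 and above +z_.025.]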


    From the data, we calculate x̄ = 17.55

    Using our standardized test statistic:

    $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{17.55 - 17.09}{3.87 / \sqrt{100}} = 1.19$

    We find that:

    Since z = 1.19 is not greater than 1.96, nor less than −1.96, we cannot reject the null hypothesis.
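Example 11.2 can be checked the same way as a two-tailed z-test, using the figures quoted in the example (mean $17.09, standard deviation $3.87, a sample of 100 customers with x̄ = 17.55). The p-value at the end is an addition for illustration and does not appear on the slides.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 17.09    # mean bill under H0
sigma = 3.87   # standard deviation, estimated by s
n = 100        # sample size
x_bar = 17.55  # sample mean calculated from the data
alpha = 0.05

z_crit = NormalDist().inv_cdf(1 - alpha / 2)        # about 1.96
z = (x_bar - mu0) / (sigma / sqrt(n))               # standardized test statistic
reject = z > z_crit or z < -z_crit                  # two-tailed rejection region
p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # both tails

print(f"z = {z:.2f}, critical values = ±{z_crit:.2f}")
print(f"reject H0: {reject}")
print(f"p-value = {p_value:.3f}")
```

Because z falls between the two critical values, the data give no reason to conclude that the mean bill differs from $17.09.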
    Summary of One- and Two-Tail Tests

    [Figure: three panels showing the rejection regions for a one-tail test (left tail), a two-tail test, and a one-tail test (right tail).]


    Probability of a Type II Error

    A Type II error occurs when a false null hypothesis is not rejected, or you accept the null when it is not true (but don't say it this way if a statistician is around).

    In practice, this is by far the most serious error you can make in most cases, especially in the quality field.


    Judging the Test

    A statistical test of hypothesis is effectively defined by the significance level (α) and the sample size (n), both of which are selected by the statistics practitioner.

    Therefore, if the probability of a Type II error (β) is too large [we have insufficient power], we can reduce it by

    increasing α, and/or

    increasing the sample size, n.
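The effect of sample size on β and on power can be sketched for a right-tailed z-test. The scenario below is hypothetical (null mean 170, true mean 175, σ = 65, α = 0.05) and the function name is illustrative; only the general relationship, a larger n giving a smaller β, comes from the text.

```python
from statistics import NormalDist

def beta_right_tailed(alpha, mu0, mu_true, sigma, n):
    """Type II error probability of a right-tailed z-test for a given true mean."""
    se = sigma / n ** 0.5
    critical = mu0 + NormalDist().inv_cdf(1 - alpha) * se  # reject H0 above this
    return NormalDist(mu_true, se).cdf(critical)           # P(x-bar below it)

# Hypothetical scenario: hold alpha fixed and grow the sample.
for n in (100, 400, 1600):
    beta = beta_right_tailed(alpha=0.05, mu0=170, mu_true=175, sigma=65, n=n)
    print(f"n = {n:4d}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Quadrupling n halves the standard error, which pulls the critical value toward the null mean and separates the two sampling distributions, so power rises without loosening α.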


    The power of a test is defined as 1 − β. It represents the probability of rejecting the null hypothesis when it is false and the true mean is something other than the null value for the mean.

    If we are testing the hypothesis that the average amount of medication in blood pressure pills is equal to 6 mg (which is good), and we fail to reject the null hypothesis, ship the pills to patients worldwide, only to find out later that the true average amount of medication is really 7 mg and people die, we get in trouble. This occurred because P(reject the null | true mean = 7 mg) = 0.32, which would mean that we have a 68% chance of not rejecting the null for these BAD pills and shipping them to patients worldwide.


    Probability you ship pills whose mean amount of medication is 7 mg approximately 67%

    Definition: When we select a sample from a population and then try to estimate the population parameter from the sample, we will not be entirely accurate. The difference between the population parameter and the sample statistic is the sampling error.


    Data collection


    Statistics is the study of how to collect, organize, analyze, and interpret numerical information from data.

    The goal of statistics is to gain understanding from data.

    Individuals are the people or objects included in the study.

    A variable is a characteristic of the individual to be measured or observed.

    A quantitative variable has a value or numerical measurement for which operations such as addition or averaging make sense.

    A qualitative variable describes an individual by placing the individual into a category or group, such as male or female. It is also called a categorical variable.

    In population data, the data are from every individual of interest.

    In sample data, the data are from only some of the individuals of interest.

    A parameter is a numerical measure that describes an aspect of a population.

    A statistic is a numerical measure that describes an aspect of a sample.

    DATA


    Summarizing the data: Summarization is a process in which the data is reduced for interpretation without sacrificing any important information.

    Data analysis tasks: finding hidden relationships, anomalies and trends; estimating; predicting.

    [Diagram: each data element is described by variables. A variable is either qualitative (measured on a nominal or ordinal scale) or quantitative (discrete or continuous, measured on an interval or ratio scale).]

    [...] investigator is interested. The population is also called the universe.


    A sample is a subset of measurements selected from the population. Sampling from the population is often done randomly, such that every possible sample of n elements will have an equal chance of being selected. A sample selected in this way is called a simple random sample, or just a random sample. A random sample allows chance to determine its elements.

    A survey by an electric company contains questions on the following:

    1. Age of household head.
    2. Sex of household head.
    3. Number of people in household.
    4. Use of electric heating (yes or no).
    5. Number of large appliances used daily.
    6. Thermostat setting in winter.
    7. Average number of hours heating is on.
    8. Average number of heating days.
    9. Household income.
    10. Average monthly electric bill.
    11. Ranking of this electric company as compared with two previous electricity suppliers.

    Describe the variables implicit in these 11 items as quantitative or qualitative, and describe the scales of measurement.

    Given a set of numerical observations, we may order them according to magnitude. Once we have done this, it is possible to define the boundaries of the set. Any student who has taken a nationally administered test, such as the Scholastic Aptitude Test (SAT), is familiar with percentiles. Your score on such a test is compared with the scores of all people who took the test at the same time, and your position within this group is defined in terms of a percentile. If you are in the 90th percentile, 90% of the people who took the test received a score lower than yours. We define a percentile as follows.

    The Pth percentile of a group of numbers is that value below which lie P% (P percent) of the numbers in the group. The position of the Pth percentile is given by (n + 1)P/100, where n is the number of data points.

    The magazine Forbes publishes annually a list of the world's wealthiest individuals. For 2007, the net worth of the 20 richest individuals, in billions of dollars, in no particular order, is as follows:

    33, 26, 24, 21, 19, 20, 18, 18, 52, 56, 27, 22, 18, 49, 22, 20, 23, 32, 20, 18

    Find the 50th and 80th percentiles of this set of the world's top 20 net worths.
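One way to work the exercise is to sort the data, compute the position (n + 1)P/100, and interpolate linearly when the position falls between two observations. The interpolation step and the clamping at the ends are assumptions (the definition above gives only the position), and other textbooks and software use slightly different percentile conventions.

```python
def percentile(values, p):
    """Pth percentile at position (n + 1) * p / 100, with linear interpolation."""
    data = sorted(values)
    n = len(data)
    position = (n + 1) * p / 100      # 1-based position in the sorted data
    lower = int(position)             # index of the observation just below
    fraction = position - lower       # how far to move toward the next one
    if lower < 1:                     # position before the first observation
        return data[0]
    if lower >= n:                    # position at or beyond the last one
        return data[-1]
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])

# Net worth of the 20 richest individuals, in billions of dollars.
net_worth = [33, 26, 24, 21, 19, 20, 18, 18, 52, 56,
             27, 22, 18, 49, 22, 20, 23, 32, 20, 18]

print("50th percentile:", percentile(net_worth, 50))
print("80th percentile:", round(percentile(net_worth, 80), 2))
```

For the 50th percentile the position is 10.5, between the 10th and 11th sorted values (both 22). For the 80th the position is 16.8, which lies 0.8 of the way from the 16th value (32) to the 17th (33).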


    Basic concept of Probability


    It is better to be roughly right than precisely wrong.

    John Maynard Keynes

    You all have probably heard the story about Malcolm Forbes, who once got lost floating for miles in one of his famous balloons and finally landed in the middle of a cornfield. He spotted a man coming toward him and asked, "Sir, can you tell me where I am?" The man said, "Certainly, you are in a basket in a field of corn." Forbes said, "You must be a statistician." The man said, "That's amazing, how did you know that?" "Easy," said Forbes, "your information is concise, precise, and absolutely useless!"


    Nominal scale: A scale of measurement for a variable that uses a label or name to identify an attribute of an element. Nominal data may be nonnumeric or numeric.

    Ordinal scale: A scale of measurement for a variable that has the properties of nominal data and can be used to rank or order the data. Ordinal data may be nonnumeric or numeric.

    Interval scale: A scale of measurement for a variable that has the properties of ordinal data and the interval between observations is expressed in terms of a fixed unit of measure. Interval data are always numeric.

    Ratio scale: A scale of measurement for a variable that has all the properties of interval data and the ratio of two values is meaningful. Ratio data are always numeric.

    Measure of variation


    Data

    [Diagram: types of data with examples. Qualitative or attribute: type of car owned, color of pens. Discrete: number of children. Continuous: time taken for an exam.]

    Presentation


    The pineapples are the objects (individuals) of the study. If the researchers are interested in the individual weights of pineapples in the field, then the variable consists of weights. At this point, it is important to specify units of measurement and degree of accuracy of measurement. The weights could be measured to the nearest ounce or gram. Weight is a quantitative variable because it is a numerical measure. If weights of all the ready-to-harvest pineapples in the field are included in the data, then we have a population. The average weight of all ready-to-harvest pineapples in the field is a parameter.

    (b) Suppose the researchers also want data on taste. A panel of tasters rates the pineapples according to the categories "poor," "acceptable," and "good." Only some of the pineapples are included in the taste test. In this case, the variable is taste. This is a qualitative or categorical variable. Because only some of the pineapples in the field are included in the study, we have a sample. The proportion of pineapples in the sample with a taste rating of "good" is a statistic.


    Numerical (Quantitative) Data Presentation

    [Diagram: ways of presenting numerical data: ordered array, stem-and-leaf display, frequency distributions, histogram, polygon, ogive.]


    Descriptive statistics involves methods of summarizing information from samples or populations.

    Inferential statistics involves methods of using information from a sample to draw conclusions regarding the population.

    A simple random sample of n measurements from a population is a subset of the population selected in a manner such that every sample of size n from the population has an equal chance of being selected.
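A minimal sketch of drawing a simple random sample, assuming the population can be listed in full. The helper name `simple_random_sample`, the population of 20 numbered units, and the seed are all hypothetical and only there to make the illustration repeatable.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # seeded generator, for repeatability only
    return rng.sample(population, n)   # sampling without replacement


# Hypothetical population: 20 units labelled 1..20.
population = list(range(1, 21))
sample = simple_random_sample(population, 5, seed=42)
print(sample)
```

Sampling without replacement is what makes the result a subset: no unit can appear twice in the sample.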

    Probability


    Central tendency

    The mean summarizes the data in one figure: how can a wide range of measurements be summarized with a single value? Mean × number of values = Total.

    When there is no trend and the values merely fluctuate, the arithmetic mean is the best representative. This assumes the distribution is roughly normal and not skewed. For positive values that are not all equal, Arithmetic mean > Geometric mean > Harmonic mean.

    Probably the least understood, the harmonic mean is best used in situations where extreme outliers exist in the population. The harmonic mean can be calculated manually; however, most people will find it much easier to just use Excel. In Excel, the harmonic mean can be calculated by using the HARMEAN() function.

    The arithmetic mean is best used in situations where:

    the data are not skewed (no extreme outliers)
    the individual data points are not dependent on each other (see the section below for examples of where data are interrelated, e.g., financial analysis)

    Geometric means are often useful summaries for highly skewed data. When growth or a trend is observed, the geometric mean is best.
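The three means and their ordering can be checked with a small sketch. The function names and the sample values 2, 4, 8 are hypothetical and chosen only for illustration; the geometric and harmonic means as written here assume strictly positive values.

```python
import math


def arithmetic_mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product, computed through logarithms (positive values only)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals (positive values only)."""
    return len(values) / sum(1 / v for v in values)


values = [2, 4, 8]  # hypothetical data
am = arithmetic_mean(values)
gm = geometric_mean(values)
hm = harmonic_mean(values)

print(round(am, 3), round(gm, 3), round(hm, 3))
print(am > gm > hm)              # ordering of the three means
print(am * len(values))          # mean x number of values gives back the total
```

For these values the arithmetic mean is 14/3, the geometric mean is 4, and the harmonic mean is 24/7, so the ordering holds.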

    Functions of statistics

    Some important functions of statistics are as follows:

    1. To collect and present facts in a systematic manner.
    2. Helps in the formulation and testing of hypotheses.
    3. Helps in facilitating the comparison of data.
    4. Helps in predicting future trends.
    5. Helps to find the relationship between variables.
    6. Simplifies the mass of complex data.
    7. Helps to formulate policies.
    8. Helps governments to take decisions.

    Limitations of statistics

    1. Does not study qualitative phenomena.
    2. Does not deal with individual items.
    3. Statistical results are true only on average.
    4. Statistical data should be uniform and homogeneous.
    5. Statistical results depend on the accuracy of the data.
    6. Statistical conclusions are not universally true.
    7. Statistical results can be interpreted only if the person has sound knowledge of statistics.

    Data collection

    Central tendency and Dispersion

    Central tendency is the middle point of a distribution. Measures of central tendency are also called measures of location.
    Dispersion is the spread of the data in a distribution, that is, the extent to which the data are scattered.
    There are two more characteristics: skewness and kurtosis.

    Mean of individual data: Σx / n
    Mean for grouped data: Σ(f × x) / n, where x = midpoint of the class, f = class frequency, and n = Σf

    The arithmetic mean has the following advantages:
    1. It is simple to understand.
    2. There is one and only one mean for a data set.
    3. The mean is suitable for further statistical procedures.

    Disadvantages:
    1. It is affected by extreme observations.
    2. It may not be representative of the whole data set.
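The grouped-data formula can be illustrated as follows. The function name `grouped_mean`, the class intervals, and the frequencies are hypothetical; the sketch assumes each class is given as a (lower limit, upper limit) pair and that n is the total frequency.

```python
def grouped_mean(classes, frequencies):
    """Mean for grouped data: sum of f * x over n, with x the class midpoint."""
    n = sum(frequencies)  # total number of observations
    total = sum(f * (low + high) / 2
                for (low, high), f in zip(classes, frequencies))
    return total / n


# Hypothetical frequency distribution.
classes = [(0, 10), (10, 20), (20, 30)]
frequencies = [2, 5, 3]

mean = grouped_mean(classes, frequencies)
print(mean)  # (2*5 + 5*15 + 3*25) / 10
```

Because every observation in a class is represented by the class midpoint, the result approximates the mean of the underlying individual values.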

    Weighted average mean
    Geometric mean
    Median

    Basic concept of Probability

    Rolling of a die
    Probability of a 5 or a 6: P(5 or 6) = P(5) + P(6) = 1/6 + 1/6 = 1/3

    Taking out a card from a deck of cards
    Probability of a spade or a queen: P(s or q) = P(s) + P(q) - P(s and q) = 13/52 + 4/52 - 1/52 = 16/52 = 4/13

    "And" and "or" are called operators.
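Both examples can be checked by counting outcomes directly. This is a sketch that assumes equally likely outcomes, a fair six-sided die, and a standard 52-card deck; the helper name `probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def probability(event, sample_space):
    """Favourable outcomes over total outcomes, assuming all are equally likely."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))


# Rolling a die: 5 and 6 cannot occur together, so the probabilities simply add.
die = [1, 2, 3, 4, 5, 6]
p_5_or_6 = probability(lambda face: face in (5, 6), die)

# Drawing a card: the queen of spades is both a spade and a queen,
# so it must not be counted twice.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))

p_spade = probability(lambda card: card[1] == "spades", deck)
p_queen = probability(lambda card: card[0] == "Q", deck)
p_both = probability(lambda card: card == ("Q", "spades"), deck)
p_spade_or_queen = probability(
    lambda card: card[1] == "spades" or card[0] == "Q", deck)

print(p_5_or_6)
print(p_spade_or_queen, p_spade + p_queen - p_both)
```

Counting the "or" event directly and applying the addition rule give the same fraction, which shows why the overlap P(s and q) is subtracted.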
