# Management homework help


As you know, we’ve spent a lot of time studying key concepts in business analytics, but we haven’t spent much time focusing on “how” our learning takes place. So, in one page or less, please elaborate upon some of the things that you’ve done well to achieve your learning goals and objectives in our course. In other words, is there anything that you’ve found that works particularly well – e.g., a specific studying strategy used to get through the material? More importantly, is there anything that you can do to improve upon the process by which your learning takes place? For example, a refined approach used to enhance peer-to-peer engagement in our discussion boards? Again, think of learning as a process, and just like any process in business or life, there are things that we can do to improve upon it!

Here are the notes to help with writing the reflection paper.

Q1: Consider the following sample of six gasoline mileages:

32.3,30.5,31.7,31.6,31.4,32.6

Calculate the mean

Calculate the sample variance

Calculate the sample standard deviation

Note: to receive full credit for this problem, show all of your work (that is, solve by hand).

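a. $\bar{y} = \frac{190.1}{6} \approx 31.68$

b. $s^2 = \frac{\sum_{i=1}^{6} (y_i - \bar{y})^2}{n - 1} = \frac{2.7083}{5} \approx 0.5417$

c. $s = \sqrt{0.5417} \approx 0.736$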

Q2. Use the first 30 observations in the mileages data attached to solve for the measurements below.  Explain what each measurement means and how decision-makers can use this information.

Calculate the sample mean

Calculate the sample variance

Calculate the sample standard deviation

Construct a histogram for the data.

b. $s^2 = 0.769$

c. $s = 0.877$

d.

Q3. Suppose that the population of all gasoline mileages is normally distributed with mean µ = 31.5 mpg and standard deviation σ = 0.8 mpg.  Let y denote a mileage randomly selected from this population.  Find the probabilities:

P(30.7 ≤ y ≤ 32.3)

P(31.0 ≤ y ≤ 31.3)

P(y  ≤29.5)

P(y ≥ 33.4)

1. We wish to find $P(30.7 \le y \le 32.3)$.
2. We wish to find $P(31.0 \le y \le 31.3)$.
3. We wish to find $P(y \le 29.5)$.
4. We wish to find $P(y \ge 33.4)$.
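As a rough check on the four probabilities in Q3, the sketch below evaluates them with Python's standard-library `NormalDist` instead of a printed normal table. The helper name `prob_between` is made up for illustration, and the results are exact to floating-point precision, so they can differ in the third or fourth decimal from answers obtained by rounding z to two places before using a table.

```python
from statistics import NormalDist

# Population of mileages from Q3: mean 31.5 mpg, standard deviation 0.8 mpg.
mileage = NormalDist(mu=31.5, sigma=0.8)

def prob_between(dist, a, b):
    """P(a <= y <= b), computed as a difference of cumulative probabilities."""
    return dist.cdf(b) - dist.cdf(a)

p_a = prob_between(mileage, 30.7, 32.3)  # one standard deviation either side of the mean
p_b = prob_between(mileage, 31.0, 31.3)
p_c = mileage.cdf(29.5)                  # P(y <= 29.5)
p_d = 1 - mileage.cdf(33.4)              # P(y >= 33.4), an upper-tail area

for label, p in [("a", p_a), ("b", p_b), ("c", p_c), ("d", p_d)]:
    print(f"({label}) {p:.4f}")
```

Because y is continuous, using ≤ or < in these statements makes no difference to the answers.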

Q4. Assume that the population of all mileages is normally distributed.  Use the following sample of n = 5 mileages:

32.3,30.5,31.7,31.4,32.6

to find 95% and 99% confidence intervals (please make sure to show all of your work).

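A $100(1-\alpha)\%$ confidence interval for µ is $\left[\bar{y} \pm t_{[\alpha/2]}^{(n-1)} \frac{s}{\sqrt{n}}\right]$. Here $\bar{y} = 31.7$, $s = \sqrt{2.70/4} \approx 0.8216$, and $n - 1 = 4$.

A 95% confidence interval for µ is $\left[31.7 \pm 2.776 \frac{0.8216}{\sqrt{5}}\right] = [31.7 \pm 1.020] = [30.68, 32.72]$.

A 99% confidence interval for µ is $\left[31.7 \pm 4.604 \frac{0.8216}{\sqrt{5}}\right] = [31.7 \pm 1.692] = [30.01, 33.39]$.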

Q5. The bad debt ratio for a financial institution is defined to be the dollar values of loans defaulted divided by the total dollar values of all loans made.  A random sample of seven Ohio banks is selected.  The bad debt ratios for these banks are 7, 4, 6, 7, 5, 4, and 9%.

The mean bad debt ratio for all federally insured banks is 3.5%. Federal banking officials claim that the mean bad debt ratio for Ohio banks is higher than the mean for all federally insured banks. Set up the null and alternative hypotheses that should be used to justify this claim statistically.

Assuming bad debt ratios for Ohio banks are normally distributed, use the sample results given above to test the hypotheses you set up in part (a) with α = .01. Interpret the outcome of the test.

1. The hypotheses are $H_0: \mu = 3.5$ versus $H_a: \mu > 3.5$, where µ is the mean bad debt ratio (in percent) for Ohio banks.
2. The sample gives $\bar{y} = 6$ and $s = \sqrt{20/6} \approx 1.826$, so $t = \frac{\bar{y} - 3.5}{s/\sqrt{n}} = \frac{6 - 3.5}{1.826/\sqrt{7}} \approx 3.62$. The rejection point is $t_{[.01]}^{(6)} = 3.143$.

Since $t = 3.62 > 3.143$, we can reject $H_0: \mu = 3.5$ in favor of $H_a: \mu > 3.5$ at α = .01. The sample provides strong evidence that the mean bad debt ratio for Ohio banks exceeds 3.5%.
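The one-sided test in Q5 can be sketched in a few lines of Python. This is only an illustration: the critical value 3.143 (6 degrees of freedom, upper-tail area .01) is copied from a standard t-table because the Python standard library has no t-distribution, and the variable names are chosen here rather than taken from the course.

```python
import math
import statistics

ratios = [7, 4, 6, 7, 5, 4, 9]  # bad debt ratios (%) for the seven Ohio banks
c = 3.5                         # hypothesized mean under H0
T_CRIT = 3.143                  # t-table value for df = 6, right-tail area .01 (assumed from a table)

n = len(ratios)
y_bar = statistics.fmean(ratios)        # sample mean
s = statistics.stdev(ratios)            # sample standard deviation (divides by n - 1)
t = (y_bar - c) / (s / math.sqrt(n))    # test statistic

# Ha: mu > 3.5 is one-sided, so only large positive t values count against H0.
reject_h0 = t > T_CRIT
print(f"y_bar = {y_bar:.2f}, s = {s:.4f}, t = {t:.4f}, reject H0: {reject_h0}")
```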

Basic Statistical Concepts

POPULATIONS

Population – the entire collection of elements about which information is desired.

The size of a finite population – i.e., a population that contains a finite number of elements – is denoted by the symbol N.

Elements can also be referred to as units of analysis, and units of analysis are usually people, places, things, or times – e.g., individuals, countries, organizations, students, years, semesters, or even student-semesters.

We are interested in studying properties of some numerical characteristic of the population elements.

Table 2.1 lists examples of population elements and numeric characteristics (another example might be students’ grade point averages).

Once we have identified a population of numerical values of interest, we often take a sample to estimate the following parameters:

1. Mean – denoted as µ, is the average of the values in the population.
2. Range – denoted as RNG, is the difference between the largest value and the smallest value in the population.
3. Variance – denoted as σ², is the average of the squared deviations of the values in the population from the population mean µ.
4. Standard deviation – denoted as σ, is the positive square root of the population variance.
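The four parameters above can be computed directly when every value in the population is known. The sketch below uses a made-up population of N = 5 GPAs (these values are purely illustrative and do not come from the course data). Note that the population variance divides by N, not by N − 1.

```python
import math

# Hypothetical population of N = 5 GPAs (illustrative values only).
population = [3.2, 2.8, 3.9, 3.5, 3.1]
N = len(population)

mu = sum(population) / N                              # population mean
rng = max(population) - min(population)               # population range
sigma2 = sum((y - mu) ** 2 for y in population) / N   # population variance
sigma = math.sqrt(sigma2)                             # population standard deviation

print(f"mu = {mu:.2f}, RNG = {rng:.2f}, sigma^2 = {sigma2:.4f}, sigma = {sigma:.4f}")
```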

Note: We take a sample b/c in most situations, obtaining data on a population parameter would be too costly (e.g., the time associated with measuring the GPAs of all students in the United States).

Statistical inference – the science of using sample data to make a generalization about a population.

Statistical estimation – the science of using information contained in a sample to 1) find an estimate of an unknown population parameter and 2) to place a bound on how far the estimate might deviate from an unknown population parameter.

Example 1:

Consider the population of data analytics students enrolled during the 2016 spring semester. Solve for the mean, range, variance, and standard deviation of GPA – the numerical characteristic of each element (students enrolled in econometrics).

PROBABILITY

The concept of probability is employed in describing populations and in using sample information to make statistical inferences.

Experiment – any process of observation that has an uncertain outcome (flipping a coin, drawing a card, rolling a die, asking for a number, etc.).

Event – an experimental outcome that may or may not take place (heads, ace of spades, etc.).

Probability – a number that measures the chance that the event will occur when the experiment is performed.

If A is an event that may or may not occur when an experiment is performed (e.g., a coin landing on heads when performing the experiment of flipping), then the probability that the event A will occur is denoted P(A).

If the experiment is performed $n_{EXP}$ times and the event A occurs $n_A$ of these $n_{EXP}$ times, then the proportion of the time event A has occurred is

$$\frac{n_A}{n_{EXP}}$$

If we repeat the experiment a number of times approaching infinity, the limit of the sequence of numbers obtained by calculating the ratio $n_A / n_{EXP}$ after each repetition is the probability P(A), or

$$P(A) = \lim_{n_{EXP} \to \infty} \frac{n_A}{n_{EXP}}$$

So, what’s P(A) for the experiment of flipping a coin an infinite number of times?

For all practical purposes, the probability of an event is simply the proportion of the time the event would occur if the experiment were performed a large number of times (since we can’t perform an experiment an infinite number of times).
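The long-run-proportion idea can be illustrated with a small simulation. This is a sketch that assumes a fair coin (P(heads) = .5) and uses Python's pseudo-random generator in place of real flips; the function name `relative_frequency` is invented for this example.

```python
import random

random.seed(0)  # fixed seed so that repeated runs give the same flips

def relative_frequency(n_exp):
    """Flip a simulated fair coin n_exp times and return n_A / n_EXP for A = heads."""
    n_a = sum(random.random() < 0.5 for _ in range(n_exp))
    return n_a / n_exp

# The ratio n_A / n_EXP should settle near .5 as the number of flips grows.
for n_exp in (10, 100, 10_000, 1_000_000):
    print(f"{n_exp:>9} flips: proportion of heads = {relative_frequency(n_exp):.4f}")
```

Small runs can land far from .5, while the large runs should sit very close to it, which is the practical meaning of the limit.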

Note: 0 ≤ P(A) ≤ 1 (an impossible event has probability 0 and a certain event has probability 1).

We can also estimate probabilities from experience – subjective probability.

We can also estimate probabilities from continuous probability distributions – i.e., if we know how a variable is distributed, we can solve for the probability of observing certain values.

RANDOM SAMPLES AND SAMPLE STATISTICS

Remember, the calculation of population parameters requires knowledge of all the values in the population.  If we don’t know all the values, we must randomly select a sample of n values from the population.

We can use sample data to make inferences about the population.

Random sample – we obtain a random sample of n elements from the population if each element has the same probability, or chance, of being selected in the sample.

With replacement – elements selected on a particular selection are placed back into the population for future selections.

Without replacement – we do not place selected elements back into the population for future selections (this is the preferred method).


If we have randomly selected a sample of n elements, the values of the numerical characteristic of interest possessed by these elements make up a randomly selected sample of n values.

For i = 1, 2, …, n we let $y_i$ denote the value of the numerical characteristic under study possessed by the ith randomly selected element. The set of values

$$\{y_1, y_2, \ldots, y_n\}$$

denotes the randomly selected sample of n numerical values.

Sample statistic – a descriptive measure of the randomly selected sample of numerical values.

We use sample statistics as point estimates, or estimates that are single numerical values, of population parameters.

Commonly used sample statistics that are used as point estimates include the mean, variance, and standard deviation.

Suppose that the sample

$$\{y_1, y_2, \ldots, y_n\}$$

has been randomly selected from a population.

The sample mean is calculated as

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

and is a point estimate of the mean µ.

The sample variance is defined as

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$

and is a point estimate of the variance σ².

The sample standard deviation is defined to be

$$s = \sqrt{s^2}$$

and is a point estimate of the population standard deviation σ.

Note: Make sure that you can reproduce sample statistic calculations in the text. 😉
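One way to check a hand calculation is to code the three definitions step by step. The sketch below applies them to the six gasoline mileages from Q1 at the top of these notes and then compares the result with Python's built-in `statistics` functions; the variable names are chosen for this example.

```python
import math
import statistics

sample = [32.3, 30.5, 31.7, 31.6, 31.4, 32.6]  # the six mileages from Q1
n = len(sample)

y_bar = sum(sample) / n                              # sample mean
squared_devs = [(y - y_bar) ** 2 for y in sample]    # squared deviations from the mean
s2 = sum(squared_devs) / (n - 1)                     # sample variance divides by n - 1
s = math.sqrt(s2)                                    # sample standard deviation

# Cross-check the by-hand steps against the standard library.
assert math.isclose(s2, statistics.variance(sample))
assert math.isclose(s, statistics.stdev(sample))

print(f"y_bar = {y_bar:.4f}, s^2 = {s2:.4f}, s = {s:.4f}")
```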

CONTINUOUS PROBABILITY DISTRIBUTIONS

Before we randomly select a value y from a population, y potentially can be any of the values in the population.

We can use continuous probability distributions to calculate probabilities concerning the value y might attain.

Consider the closed interval from a to b (a<b) on the real number line. Denoting this interval as [a, b], we often wish to find

$$P(y \text{ will be in the interval } [a, b])$$

which can be written more simply as

$$P(a \le y \le b)$$

This probability can be interpreted as the proportion of values in the population that are greater than or equal to a and less than or equal to b.

Continuous probability distributions assign probabilities to intervals of numbers on the real number line.

Suppose f(v) is a continuous function of numbers on the real number line.  Consider the continuous curve that results when f(v) is graphed.

We can say that the curve f(v) is the continuous probability distribution of y if the probability that y will be in the interval [a,b] is the area under the curve f(v) corresponding to the interval [a,b].

As an example, suppose that the curve f(v) illustrated in Figure 2.1 is the continuous probability distribution of y, the mileage of a randomly selected automobile.

Assume that we wish to find the probability that y will be between 31 and 33 mpg.  Then we would find the area under the curve f(v) corresponding to the interval [31,33].

If this area is equal to .7023, then 70.23% of all mileages are between 31 and 33 mpg. Below is a hypothetical continuous probability curve f(v):

Suppose that the curve f(v) is the continuous probability distribution of y.

Then we say that the population of values from which y will be randomly selected is distributed according to the continuous probability curve f(v).

In other words, the population has the continuous probability distribution defined by f(v).

The height of the curve f(v) at a given point on the real line represents the relative probability, or chance, that y will be in a small interval of numbers around the given point.

In other words, the height represents the relative proportion of values in the population that are in a small interval of numbers around the given point.

Two properties satisfied by continuous probability distributions:

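1. For any number v, f(v) ≥ 0. Remember the height of the curve represents a relative probability, and a relative probability cannot be negative. (The height itself may exceed 1; only areas under the curve are probabilities and must lie between 0 and 1.)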
2. The total area under a continuous probability curve equals 1. This holds b/c the total area under f(v) equals the probability that y will fall between -∞ and +∞, and y is sure to fall b/w -∞ and +∞.
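Both properties can be checked numerically for a specific curve. The sketch below uses a bell-shaped (normal) curve as a stand-in for f(v), with a deliberately small spread of 0.2 chosen for illustration, and approximates the area with a simple trapezoid rule. The function names are invented for this example.

```python
import math

def normal_pdf(v, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at v."""
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def area(f, a, b, steps=10_000):
    """Trapezoid-rule approximation of the area under f on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h

mu, sigma = 31.5, 0.2  # sigma is an assumed, deliberately small value

def f(v):
    return normal_pdf(v, mu, sigma)

peak_height = f(mu)                                      # curve height at the mean
total_area = area(f, mu - 10 * sigma, mu + 10 * sigma)   # effectively the whole curve
lowest_height = min(f(mu + k * sigma) for k in range(-10, 11))

print(f"peak height = {peak_height:.4f}")   # a height, not a probability
print(f"total area  = {total_area:.6f}")
```

The peak height comes out near 2, which is fine because heights are not probabilities, while the total area is 1 to within numerical error.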

THE NORMAL PROBABILITY DISTRIBUTION

Many of the variables we study in business and economics are normally distributed. The normal distribution is a continuous probability distribution that is defined by the probability curve

$$f(v) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{v - \mu}{\sigma}\right)^2}$$

If a population is distributed according to a normal distribution, we say that y is normally distributed with mean µ and standard deviation σ.

The normal probability distribution is denoted by N(µ,σ), which means that the shape of the normal probability curve depends on the mean and standard deviation of the population.

Important properties of the normal curve are:

1. The normal curve is centered at the population mean µ.
2. The mean µ corresponds to the highest point on the normal curve.
3. The normal curve is symmetrical around the mean (50% above and below the mean).
4. The total area under the normal curve is equal to 1 (since it is a probability distribution).

The population mean centers the normal curve on the real number line.

The variance measures the spread of the normal curve.

Note: Draw two normal curves with different means but the same variance.  Draw two normal curves with the same mean but different variances.

If a variable y is normally distributed with mean µ and standard deviation σ, then

$$P(a \le y \le b)$$

equals the area under the normal curve with mean µ and standard deviation σ corresponding to the interval [a,b]. This is depicted below:

Three important areas under the normal curve are emphasized in the figure below.

1. $P(\mu - \sigma \le y \le \mu + \sigma) = .6826$, or 68.26% of the values in the population are within plus or minus 1 standard deviation from the mean.
2. $P(\mu - 2\sigma \le y \le \mu + 2\sigma) = .9544$, or 95.44% of the values in the population are within plus or minus 2 standard deviations from the population mean.
3. $P(\mu - 3\sigma \le y \le \mu + 3\sigma) = .9973$, or 99.73% of the values in the population are within plus or minus 3 standard deviations from the population mean.

Example 2:

Suppose the population mean is 31.5 and the population standard deviation is .8, solve for the 68.26%, 95.44%, and 99.73% intervals.

What if we don’t know the true values of the population mean and standard deviation?  Can we obtain sample estimates of the mean and standard deviation to solve for the 68.26%, 95.44%, and 99.73% intervals? Yes!

Perform these calculations if the sample mean is 31.2 and the sample standard deviation is .7517.
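The intervals asked for in Example 2 are simply the center plus or minus 1, 2, and 3 spreads. The sketch below computes them for the population values (31.5, .8) and for the sample estimates (31.2, .7517). The helper `k_sigma_interval` is a name invented here, and the coverage figures printed by `NormalDist` are exact, so they differ slightly in the fourth decimal from table-based values such as .6826 and .9544.

```python
from statistics import NormalDist

def k_sigma_interval(center, spread, k):
    """Interval [center - k*spread, center + k*spread]."""
    return (center - k * spread, center + k * spread)

population = NormalDist(mu=31.5, sigma=0.8)

# Intervals from the known population mean and standard deviation.
population_intervals = {}
for k in (1, 2, 3):
    low, high = k_sigma_interval(population.mean, population.stdev, k)
    coverage = population.cdf(high) - population.cdf(low)
    population_intervals[k] = (low, high)
    print(f"k={k}: [{low:.1f}, {high:.1f}] contains {coverage:.4f} of the population")

# Same calculation with the sample estimates substituted for mu and sigma.
sample_intervals = {k: k_sigma_interval(31.2, 0.7517, k) for k in (1, 2, 3)}
for k, (low, high) in sample_intervals.items():
    print(f"k={k}: [{low:.4f}, {high:.4f}] (estimated from the sample)")
```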

Note: The results in example 2 depend on what’s referred to as the normality assumption, or the assumption that the variable under analysis is normally distributed.

We can verify the normality assumption using a large sample (n > 30 observations) and graphical analyses.

GRAPHICAL ANALYSES

Use mileages.dta to create a histogram and a stem-and-leaf diagram.

A frequency distribution allows us to examine how a variable is distributed, and therefore whether the normality assumption is appropriate.

A frequency distribution is simply an arrangement of the values of a variable in ascending order.  Each entry in the table contains the frequency of the occurrences of values within an interval.

In Stata we can create a frequency table using the “tabulate” command.  Using the “plot” option can give you an even better feel for the data.

Note: Do this using mileages.dta

Before you can create a histogram using Stata, please move to the correct directory.

First, create a “projects” folder on your c: drive (go to your c drive, right click, and create a folder named “Projects”).

In your projects folder, create a folder for our class – e.g., mgt6460.

In your mgt6460 folder, create a “statsreview” folder that contains the mileages.dta, or our “mileages” data.

Once you do this, you can create a histogram in Stata using the following commands:

. cd c:\projects\mgt6460\statsreview

. use mileages, clear

. list

. histogram mileage

In a histogram, the frequency of values of a variable within an interval is measured on the vertical axis, and the intervals, or bins, are laid out along the horizontal axis.

We can use a frequency distribution to “see” the chances of occurrence of a particular value of a variable, and how the chances change from one bin to the next.

This can be easily accomplished by creating a relative frequency distribution.

In a relative frequency distribution, the percentage of the occurrence of a particular range of values is plotted on the vertical axis and the intervals are laid out along the horizontal axis.

Note: The relative frequency for a bin is simply the frequency of the occurrences of values within that interval divided by the sample size.

Note: The number of intervals or “bins” chosen should be the smallest integer K such that $2^K > n$. Here n is the total number of observations. A trick is to set $2^K = n$, solve for $K = \log_2 n$, and then round up to the nearest integer (if the result is already an integer, add 1).

Note: Interval widths, also known as “class” widths, are formed by dividing the range of the data by the number of class intervals. The class width for mileage is found as

$$\frac{33.3 - 29.8}{6} = .5833 \approx 0.6$$

Note: The class width is the distance between the lower limits of consecutive classes.

Once we have the class width, you can create the first interval by adding 0.6 to the smallest value of the variable, or the smallest mileage.

The result provides the lower class, or the base of the following interval.  Repeat this process until the full table is constructed as follows:

| Interval | Frequency |
| --- | --- |
| [29.8, 30.4) | 3 |
| [30.4, 31.0) | 9 |
| [31.0, 31.6) | 12 |
| [31.6, 32.2) | 13 |
| [32.2, 32.8) | 9 |
| [32.8, 33.4) | 3 |

Or you can construct the table by adding 0.5 to the smallest value to create bin sizes as follows:

| Interval (both limits included) | Frequency |
| --- | --- |
| 29.8 to 30.3 | 3 |
| 30.4 to 30.9 | 9 |
| 31.0 to 31.5 | 12 |
| 31.6 to 32.1 | 13 |
| 32.2 to 32.7 | 9 |
| 32.8 to 33.3 | 3 |

This means that to include the smallest measurement and the largest measurement in the K=6 classes, each class should contain 6 measurement values moving from the lower class to the upper class in increments of 0.1.

For example, the first class includes the smallest measurement (the lower class), the largest measurement (the upper class), and all values in between in increments of .1, or 29.8, 29.9, 30.0, 30.1, 30.2, and 30.3.
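The class-construction rules above can be sketched in code. The inputs used here are read off the frequency tables rather than from the data file: n = 49 is the sum of the listed frequencies, 29.8 is the smallest mileage, and 33.3 is the upper limit of the last class. The function names are invented for this illustration.

```python
import math

def number_of_classes(n):
    """Smallest integer K such that 2**K > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def class_limits(smallest, largest, n, decimals=1):
    """Return (K, class width, lower class limits) for data recorded to `decimals` places."""
    k = number_of_classes(n)
    step = 10 ** -decimals
    raw_width = (largest - smallest) / k
    # Round the width UP to the precision of the data so K classes cover the full range.
    width = round(math.ceil(round(raw_width / step, 9)) * step, decimals)
    lowers = [round(smallest + i * width, decimals) for i in range(k)]
    return k, width, lowers

k, width, lowers = class_limits(smallest=29.8, largest=33.3, n=49)
print(f"K = {k}, class width = {width}")
for low in lowers:
    print(f"[{low}, {round(low + width, 1)})")
```

Rounding the raw width (.5833) up rather than to the nearest value is what guarantees that the largest measurement still falls inside the last class.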

Finally, use Stata’s “stem” command to create a stem-and-leaf diagram as follows:

. use mileages, clear

. stem mileage

Basic Statistical Concepts Part 2

THE STANDARD NORMAL PROBABILITY DISTRIBUTION (finding normal probabilities)

For every y value there is a z value, thus there is a population of z values corresponding to the population of y values.

If y is randomly selected from a normally distributed population with mean µ and standard deviation σ, the z value corresponding to the y value is

$$z = \frac{y - \mu}{\sigma}$$

Standard normal distribution – if y is normally distributed with mean µ and standard deviation σ, then z is normally distributed with mean 0 and standard deviation 1.

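If we subtract µ from the inequality

$$a \le y \le b$$

and divide by σ, we obtain the following inequalities:

$$\frac{a - \mu}{\sigma} \le \frac{y - \mu}{\sigma} \le \frac{b - \mu}{\sigma}, \quad \text{or} \quad z_a \le z \le z_b$$

Note $z_a$ is the z value corresponding to a, etc. We see that

$$P(a \le y \le b) = P(z_a \le z \le z_b)$$

This is the area under the standard normal curve corresponding to the interval $[z_a, z_b]$.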

Example 1:

If the population of all mileages is normally distributed with mean µ = 31.5 and standard deviation σ = .8, what is $P(29.9 \le y \le 33.1)$?

Hints: Solve for the z value corresponding to 29.9 and 33.1.  What do these tell us?  Now, use the normal table in the appendix of your book (you can also get one online) to solve for the probability.
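The hint can be followed step by step in code: standardize the two endpoints, then read the area from the standard normal distribution. This sketch uses `NormalDist` from the Python standard library in place of a printed table, and the variable names are chosen for illustration.

```python
from statistics import NormalDist

mu, sigma = 31.5, 0.8
a, b = 29.9, 33.1            # the two mileages named in the hint

z_a = (a - mu) / sigma       # number of standard deviations a lies from the mean
z_b = (b - mu) / sigma

std_normal = NormalDist()    # mean 0, standard deviation 1
p_via_z = std_normal.cdf(z_b) - std_normal.cdf(z_a)

# Same probability computed on the original mileage scale, for comparison.
original = NormalDist(mu, sigma)
p_direct = original.cdf(b) - original.cdf(a)

# A printed table that lists areas between 0 and z would give this number for z = z_b.
table_area = std_normal.cdf(z_b) - 0.5

print(f"z_a = {z_a:.2f}, z_b = {z_b:.2f}")
print(f"P via z = {p_via_z:.4f}, P direct = {p_direct:.4f}, table area = {table_area:.4f}")
```

The z values say that 29.9 and 33.1 lie two standard deviations below and above the mean, so the answer is the familiar two-standard-deviation area.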

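Note: A normal distribution table often gives

$$P(0 \le z \le z_0)$$

for values of $z_0$ ranging from .00 to 3.09. Looking at this table, we see that

$$P(0 \le z \le 2) = .4772$$

Since the standard normal curve is symmetrical, it must be that

$$P(-2 \le z \le 0) = .4772, \quad \text{so} \quad P(-2 \le z \le 2) = .4772 + .4772 = .9544$$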

To perform the calculations involved in some statistical inference procedures, we need to find the z value such that the area to its right under the standard normal curve is γ, or $z_{[\gamma]}$.

The point $z_{[\gamma]}$ on the scale of the standard normal distribution is called the critical z value. This value is often compared to the z values for sample statistics.

Example 2:

Find $z_{[.025]}$.

Hints: The area under the standard normal curve between zero and $z_{[.025]}$ must equal .5-.025 = .475. A normal distribution table shows that the z value corresponding to an area of .4750 is 1.96.

This says that we must be 1.96 standard deviations above the mean to obtain a right-hand tail area of .025.
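A critical z value can also be obtained without a table by inverting the standard normal cumulative distribution. The sketch below does this with `NormalDist.inv_cdf` from the Python standard library; the function name `critical_z` is made up for this example.

```python
from statistics import NormalDist

def critical_z(gamma):
    """z value whose right-hand tail area under the standard normal curve is gamma."""
    # A right-tail area of gamma means a cumulative (left) area of 1 - gamma.
    return NormalDist().inv_cdf(1 - gamma)

z_025 = critical_z(0.025)
z_05 = critical_z(0.05)
print(f"z[.025] = {z_025:.4f}")
print(f"z[.05]  = {z_05:.4f}")
```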

THE t-DISTRIBUTION, THE F-DISTRIBUTION, AND THE CHI-SQUARE DISTRIBUTION

Sometimes a population has what is called a t-distribution.  We use a t-distribution for small sample sizes.

Google search “t-distribution” to review the properties of a t-distribution.

To perform calculations involved in some statistical inference procedures, we need to find the t value such that the area to its right under the t-distribution is γ, or $t_{[\gamma]}$.

In general, we refer to the point $t_{[\gamma]}^{(df)}$ as the point on the scale of the t-distribution having df degrees of freedom such that the area under the curve to the right of this point is γ.

The point $t_{[\gamma]}^{(df)}$ on the scale of the t-distribution is called the critical t value.

This value is often compared to the t values for sample statistics.

We can find this point using a t-table (there is one in the back of the book, but you can find one online, too).

The F-Distribution

Sometimes a population has an F-distribution.

What are the properties of the F-distribution?

To perform calculations involved in some statistical inference procedures, we need to find the point on the scale of the F-distribution having $r_1$ and $r_2$ degrees of freedom such that the area under this curve to the right of this point is γ.

This point is denoted as $F_{[\gamma]}^{(r_1, r_2)}$.

The point $F_{[\gamma]}^{(r_1, r_2)}$ on the scale of the F-distribution is called a critical F value.

This value is often compared to the F values for sample statistics.

We can find this point using an F-table similar to the one in the back of the book (again, there are lots of examples online, too, just “Google” F-table).

The Chi-square Distribution

Sometimes a population has a chi-square distribution.


The exact form of the curve depends on a parameter that is called the number of degrees of freedom and is denoted df.

We refer to the point $\chi^{2\,(df)}_{[\gamma]}$ as the point on the scale of the chi-square distribution having df degrees of freedom such that the area under this curve to the right of this point is γ.

CONFIDENCE INTERVALS FOR A POPULATION MEAN

We can use the sample mean $\bar{y}$ and sample standard deviation s to solve for an interval estimate of the population mean, or a confidence interval.

Why? The sample mean $\bar{y}$ by itself does not provide any indication of how close it is to the population mean µ.

A 100(1-α)% confidence interval for the population mean µ based on the t-distribution can be solved for as follows (α is known as the level of significance and 100(1-α)% is known as the level of confidence):

$$\left[\bar{y} \pm t_{[\alpha/2]}^{(n-1)} \frac{s}{\sqrt{n}}\right]$$

Remember, $t_{[\alpha/2]}^{(n-1)}$ is simply a critical t value.

Also note that as the number of degrees of freedom approaches 30 or so, we can use the critical z value.

Example 3:

Federal gasoline standards state that µ, the mean gasoline mileage obtained by the fleet of all Hawks, must be at least 30 mpg. To demonstrate that this standard is being met, National Motors randomly selects a sample of n=5 Hawks and tests them for gasoline mileage. If the sample

$$y_1 = 30.7,\ y_2 = 31.8,\ y_3 = 30.2,\ y_4 = 32.0,\ y_5 = 31.3$$

is obtained, solve for the 95% confidence interval for µ.

Input this data in Stata using the “input” command as follows:

. input mpg

1. 30.7
2. 31.8
3. 30.2
4. 32.0
5. 31.3
6. end

Solve for the confidence interval using the command “ameans” as follows:

. ameans mpg

You can also solve for the confidence interval by hand using the formula above. Since 100(1-α)%=95% implies that α=.05, we use $t_{[\alpha/2]}^{(n-1)} = t_{[.025]}^{(4)} = 2.776$.
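The by-hand calculation can be mirrored in a short script. As a hedged illustration, the sketch below applies the t-based formula to the n = 5 mileages from Q4 at the top of these notes. The critical values 2.776 and 4.604 (4 degrees of freedom) are copied from a standard t-table because the Python standard library has no t-distribution, and the function name `t_interval` is invented here.

```python
import math
import statistics

# Critical t values for 4 degrees of freedom, taken from a standard t-table (assumed inputs).
T_CRIT_DF4 = {0.95: 2.776, 0.99: 4.604}

def t_interval(sample, t_crit):
    """Confidence interval y_bar +/- t_crit * s / sqrt(n) for the population mean."""
    n = len(sample)
    y_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)              # sample standard deviation (n - 1 divisor)
    half_width = t_crit * s / math.sqrt(n)
    return (y_bar - half_width, y_bar + half_width)

mileages = [32.3, 30.5, 31.7, 31.4, 32.6]     # the Q4 sample, n = 5

ci95 = t_interval(mileages, T_CRIT_DF4[0.95])
ci99 = t_interval(mileages, T_CRIT_DF4[0.99])
print(f"95% CI: [{ci95[0]:.2f}, {ci95[1]:.2f}]")
print(f"99% CI: [{ci99[0]:.2f}, {ci99[1]:.2f}]")
```

The 99% interval is wider than the 95% interval because a larger critical value is needed to capture µ more often.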

The Derivation of the Interval

Sampling distribution – A sampling distribution can be obtained by taking repeated samples of a population, calculating the mean for each sample, and examining the distribution of the sample means.

An important result is that the mean of a sampling distribution is equal to the population mean, and that the sampling distribution of the mean is approximately normally distributed when the sample size is large, regardless of the distribution of the population of individual values.

To derive the 100(1-α)% confidence interval, we state several important results.

The population of all possible sample means

1. Has mean $\mu_{\bar{y}} = \mu$ (the mean of the means is the population mean)
2. Has variance $\sigma^2_{\bar{y}} = \sigma^2/n$ (the variance of the sampling distribution, or the variance of the distribution of sample means; if the population sampled is infinite)
3. Has standard deviation $\sigma_{\bar{y}} = \sigma/\sqrt{n}$ (the standard deviation of the sampling distribution; if the population sampled is infinite)
4. Has a normal distribution (if the population sampled has a normal distribution)

Since $\mu_{\bar{y}} = \mu$, or since the average of the sample means equals the true population mean, we say that the sample mean $\bar{y}$ is unbiased, or that we are using an unbiased estimation procedure (in arriving at $\bar{y}$).

Also since $\sigma_{\bar{y}} = \sigma/\sqrt{n}$, the standard deviation of the population of all sample means decreases as the sample size n increases.

Since each possible sample mean is an average of n sample values, the sample mean “averages out” high and low sample values.

Thus, we’d expect the sample means to be more closely clustered around µ than the individual population values. That is, intuitively, $\sigma_{\bar{y}}$, or the standard deviation of all sample means, should be smaller than σ, the standard deviation of the individual population values.

This is evident in the formula of the standard deviation of the population of all sample means, $\sigma_{\bar{y}} = \sigma/\sqrt{n}$, since we divide the population standard deviation σ by $\sqrt{n}$.
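These results can be illustrated by simulating a sampling distribution. The sketch below assumes a normal population with µ = 31.5 and σ = .8 and draws many samples; the sample size of 5, the number of repetitions, and the seed are arbitrary choices made for the illustration.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the simulation is repeatable

MU, SIGMA = 31.5, 0.8   # assumed population mean and standard deviation
N = 5                   # observations per sample (illustrative choice)
REPS = 20_000           # number of repeated samples (illustrative choice)

# Draw REPS samples of size N and record each sample mean.
sample_means = [
    statistics.fmean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPS)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to mu
sd_of_means = statistics.pstdev(sample_means)    # should be close to sigma / sqrt(n)
theory_sd = SIGMA / math.sqrt(N)

print(f"mean of sample means: {mean_of_means:.4f} (mu = {MU})")
print(f"sd of sample means:   {sd_of_means:.4f} (sigma/sqrt(n) = {theory_sd:.4f})")
```

The simulated standard deviation of the sample means is well below σ = .8, which is the "averaging out" effect described above.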

Example 4:

If the true values of µ, σ², and σ are 31.5, .64, and .8, then what are the values for $\mu_{\bar{y}}$, $\sigma^2_{\bar{y}}$, and $\sigma_{\bar{y}}$?

Moreover, assume that the population of all mileages is normally distributed. If this is the case, then the population of all possible sample means is also normally distributed. Thus, 95.44% of all possible sample means lie in the interval

$$[\mu_{\bar{y}} \pm 2\sigma_{\bar{y}}] = \left[\mu \pm 2\frac{\sigma}{\sqrt{n}}\right]$$

Note: this interval is narrower than the interval containing 95.44% of the individual mileages (example 2.3 in the text).

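Results 1, 2, and 3 from above imply that if the population that is sampled is normally distributed, then the population of all possible values of

$$z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$$

has a standard normal distribution.

Note: we estimate $\sigma_{\bar{y}} = \sigma/\sqrt{n}$ by $s_{\bar{y}} = s/\sqrt{n}$, which is called the standard error of the estimate $\bar{y}$.

Then it can be proved that if the population that is sampled is normally distributed, the population of all possible values of

$$t = \frac{\bar{y} - \mu}{s/\sqrt{n}}$$

has a t-distribution with n-1 degrees of freedom. This implies that

$$P\left(-t_{[\alpha/2]}^{(n-1)} \le \frac{\bar{y} - \mu}{s/\sqrt{n}} \le t_{[\alpha/2]}^{(n-1)}\right)$$

is the area under the curve of the t-distribution having n-1 degrees of freedom between $-t_{[\alpha/2]}^{(n-1)}$ and $t_{[\alpha/2]}^{(n-1)}$.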

Just as before, the probability a particular t-value is greater than a lower critical t-value, but less than an upper critical t-value is simply the area under the curve in between the lower and upper critical t-values.

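We can define these values for any given significance level, or for any given α. Thus,

$$P\left(-t_{[\alpha/2]}^{(n-1)} \le \frac{\bar{y} - \mu}{s/\sqrt{n}} \le t_{[\alpha/2]}^{(n-1)}\right) = 1 - \alpha$$

For example, with α = .05, the probability that a particular t-value is greater than the lower critical t-value with .05/2 = .025 in the lower tail of the distribution, but less than the upper critical t-value with .05/2 = .025 in the upper tail of the distribution, is 1-.05, or .95.

Multiply the inequality in the probability statement by $s/\sqrt{n}$ to get

$$P\left(-t_{[\alpha/2]}^{(n-1)} \frac{s}{\sqrt{n}} \le \bar{y} - \mu \le t_{[\alpha/2]}^{(n-1)} \frac{s}{\sqrt{n}}\right) = 1 - \alpha$$

Subtracting through by $\bar{y}$ implies

$$P\left(-\bar{y} - t_{[\alpha/2]}^{(n-1)} \frac{s}{\sqrt{n}} \le -\mu \le -\bar{y} + t_{[\alpha/2]}^{(n-1)} \frac{s}{\sqrt{n}}\right) = 1 - \alpha$$

Multiplying the above inequality by -1 (which reverses the inequality signs) gives

$$P\left(\bar{y} + t_{[\alpha/2]}^{(n-1)} \frac{s}{\sqrt{n}} \ge \mu \ge \bar{y} - t_{[\alpha/2]}^{(n-1)} \frac{s}{\sqrt{n}}\right) = 1 - \alpha$$

This can be written as

$$P\left(\bar{y} - t_{[\alpha/2]}^{(n-1)} \frac{s}{\sqrt{n}} \le \mu \le \bar{y} + t_{[\alpha/2]}^{(n-1)} \frac{s}{\sqrt{n}}\right) = 1 - \alpha$$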

This says that the proportion of confidence intervals containing the population mean µ in the population of all possible 100(1-α)% confidence intervals for µ is equal to 1-α.

Thus, if we compute a 100(1-α)% confidence interval for µ by using the formula

$$\left[\bar{y} \pm t_{[\alpha/2]}^{(n-1)} \frac{s}{\sqrt{n}}\right]$$

then 100(1-α)% of the confidence intervals in the population of all possible 100(1-α)% confidence intervals for µ contain µ, and 100(α)% of the confidence intervals in this population do not contain µ.

Confidence Intervals Based on the Normal Distribution

The preceding confidence interval is based on the t-distribution.  It assumes that the population sampled is normally distributed.  We now consider a confidence interval that is valid for any population.

The central limit theorem states that if the sample size n is large (greater than 30), then the population of all possible sample means is approximately normal with mean $\mu_{\bar{y}} = \mu$ and standard deviation $\sigma_{\bar{y}} = \sigma/\sqrt{n}$, no matter what probability distribution describes the population sampled.


Therefore, if n is large, the population of all possible values of[1]

$$z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$$

approximately has a standard normal distribution (the distribution of z values is a standard normal distribution). This implies that

$$\left[\bar{y} \pm z_{[\alpha/2]} \frac{\sigma}{\sqrt{n}}\right] \quad \text{and} \quad \left[\bar{y} \pm z_{[\alpha/2]} \frac{s}{\sqrt{n}}\right]$$

are approximately correct 100(1-α)% confidence intervals for µ, no matter what probability distribution describes the population sampled. We can derive these intervals using a process similar to that followed above.

Note: the 2nd interval follows from the first by approximating σ by s.

Note: a more precise statement of the central limit theorem says that the larger the sample size n is, the more nearly normally distributed is the population of all possible sample means. Also, the larger n is, the smaller is $\sigma_{\bar{y}}$.
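The central limit theorem can be illustrated with a population that is clearly not normal. The sketch below samples from a uniform population on [0, 1] (an assumption made only for the demonstration), with a sample size of 40 and a fixed seed, and checks how many sample means fall within two standard deviations of the population mean. For a normal sampling distribution that share would be about .9544.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the simulation is repeatable

N = 40           # sample size, larger than 30 (illustrative choice)
REPS = 20_000    # number of repeated samples (illustrative choice)

POP_MEAN = 0.5                 # mean of a uniform population on [0, 1]
POP_SD = math.sqrt(1 / 12)     # standard deviation of that uniform population

sample_means = [
    statistics.fmean([random.random() for _ in range(N)])
    for _ in range(REPS)
]

sd_of_means_theory = POP_SD / math.sqrt(N)
sd_of_means_observed = statistics.pstdev(sample_means)

# Share of sample means within two standard deviations of the population mean.
within_two = sum(
    abs(m - POP_MEAN) <= 2 * sd_of_means_theory for m in sample_means
) / REPS

print(f"observed sd of means = {sd_of_means_observed:.4f}, theory = {sd_of_means_theory:.4f}")
print(f"share within 2 sd    = {within_two:.4f}")
```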

In summary, when we do not know the population standard deviation σ, we should use the 100(1-α)% confidence interval for µ based on the normal distribution

$$\left[\bar{y} \pm z_{[\alpha/2]} \frac{s}{\sqrt{n}}\right]$$

if the sample size n is large.

If the sample size n is small and the population is normally distributed, we should use the 100(1-α)% confidence interval for µ based on the t-distribution

$$\left[\bar{y} \pm t_{[\alpha/2]}^{(n-1)} \frac{s}{\sqrt{n}}\right]$$

In both cases, if we know the population standard deviation, use σ, and if not, use s.

Note: please review example 2.7 in the text.  Also, please review the Stata code for this example.

HYPOTHESIS TESTING FOR A POPULATION MEAN

We sometimes wish to test the null hypothesis, H0: µ=c versus the alternative hypothesis, Ha: µ ≠ c. Here µ is the population mean, which is estimated by $\bar{y}$, and c is an arbitrary constant.

If our sample statistic has a normal distribution – i.e., if the distribution of all of the sample means is normal, then we can see how far apart our sample statistic, or our estimate of the population mean, is from the hypothesized value by calculating

$$t = \frac{\bar{y} - c}{s/\sqrt{n}}$$

Remember, $\bar{y}$ and s are the mean and standard deviation of a sample of size n that has been randomly selected from the population having mean µ.

Further, the sample mean is an unbiased estimator of the population mean – i.e., although the sample mean $\bar{y}$ does not necessarily equal the population mean µ, the average of all of the different sample means that we could have calculated is equal to µ.

If the population is normally distributed, then we know the sampling distribution of the mean, and once we know the sampling distribution of the mean, we can make probabilistic statements about sampled values versus hypothesized values.

Think of the hypothesized value as a reference point – our test centers the sampling distribution of the mean on this value.

So, if the calculated value of our sample statistic is far from the hypothesized value (where distance is measured in standard deviations), then we have statistical evidence that the population mean is different than the hypothesized value c.

For example, if the t-statistic is a large positive value, this provides evidence to support rejecting H0 in favor of Ha b/c the point estimate $\bar{y}$ indicates that µ is greater than c.

This likely warrants some sort of intervention by the firm.  Interventions can be costly, but waiting too long to intervene can be even more costly.

A test statistic nearly equal to zero results when $\bar{y}$ is nearly equal to c – such a test statistic provides little or no evidence to support rejecting H0 in favor of Ha. This is so b/c the point estimate $\bar{y}$ indicates that µ is nearly or exactly equal to c.

A type 1 error is committed if we reject H0 when it is true.  The probability of a type 1 error is α.  Why?  If a researcher chooses a significance level of α = .05, we reject the null hypothesis when it is true about 5% of the time.[2]

If the distribution of all possible sample means is normal with µ = c, then we’d expect the test statistic to fall above the upper critical t-value or below the lower critical t-value about 5% of the time.

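That is, if the population is normally distributed with mean µ, we can reject H0: µ = c in favor of Ha: µ ≠ c by setting the probability of a type 1 error equal to α if and only if

$$|t| > t_{\alpha/2}, \quad \text{that is,} \quad t > t_{\alpha/2} \ \text{ or } \ t < -t_{\alpha/2}$$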

The points $-t_{\alpha/2}$ and $t_{\alpha/2}$ are called rejection points, or critical values, because they tell us how different from zero t must be for us to be able to reject H0 by setting the probability of a type 1 error equal to α.

A type 2 error is committed if we do not reject H0 when it is false.

Why does using this rejection point procedure ensure that the probability of a type 1 error equals α?

Recall that if the population sampled is normally distributed with mean µ, then the population of all possible values of

$$\frac{\bar{x} - \mu}{s/\sqrt{n}}$$

has a t-distribution with n-1 degrees of freedom.

It follows that if the null hypothesis H0: µ = c is true, then the population of all possible values of the test statistic

$$t = \frac{\bar{x} - c}{s/\sqrt{n}}$$

has a t-distribution with n-1 degrees of freedom.

Thus, using the above rejection points says that if H0: µ = c is true, then the probability that

$$-t_{\alpha/2} \le t \le t_{\alpha/2}$$

is 1-α.  With α = .05, 95% of all possible values of t are in between these points.

Further, the probability that

$$t < -t_{\alpha/2} \quad \text{or} \quad t > t_{\alpha/2}$$

is α.  With α = .05, 5% of all possible values of t are to the left or to the right of the rejection points, so we reject the null hypothesis when it is true 5% of the time.
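
The argument above can be checked empirically: when H0 is true, the share of samples whose test statistic falls beyond the rejection points should be close to α. The sketch below is a simulation under stated assumptions, not part of the original notes. It uses n = 6 and the t-table value 2.571 (α = .05, 5 degrees of freedom); the population standard deviation of 0.15, the number of trials, and the seed are hypothetical choices.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable (arbitrary choice)

C = 16.0        # hypothesized mean; the data are generated so that H0 is true
SIGMA = 0.15    # population sd (assumed for illustration)
N = 6           # sample size
T_CRIT = 2.571  # t-table value for alpha = .05, two-sided, 5 degrees of freedom
TRIALS = 20000  # number of simulated samples (assumed for illustration)


def t_statistic(sample, c):
    """Return (xbar - c) / (s / sqrt(n)) for one sample."""
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample sd, divides by n - 1
    return (xbar - c) / (s / math.sqrt(len(sample)))


# Count how often a true H0 is rejected by the two-sided rule |t| > T_CRIT.
rejections = 0
for _ in range(TRIALS):
    sample = [random.gauss(C, SIGMA) for _ in range(N)]
    if abs(t_statistic(sample, C)) > T_CRIT:
        rejections += 1

type1_rate = rejections / TRIALS
print("share of true nulls rejected:", type1_rate)
```

The printed share should sit near .05, which is the type 1 error probability the rejection points were chosen to deliver.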

Follow these steps to conduct a hypothesis test:

1. State in advance the significance level, or α.
2. State in advance the null and alternative hypotheses, H0 and Ha.
3. Compute the test statistic.
4. Compare the test statistic to the critical value(s), or the rejection point(s).
5. Reject or fail to reject the null hypothesis H0: µ = c

Example 5:

1. Use a significance level of α = .05.
2. G&B Corporation will randomly select a sample of n = 6 bottle fills from its bottle-filling process to test the following hypotheses[3]:

H0: µ = 16

Ha: µ ≠ 16

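3. If G&B observes the following sample of n = 6 bottle fills:

15.68, 16.00, 15.61, 15.93, 15.86, 15.72

then $\bar{x} = 15.80$ and $s \approx 0.1532$. Compute the test statistic as

$$t = \frac{\bar{x} - c}{s/\sqrt{n}} = \frac{15.80 - 16}{0.1532/\sqrt{6}} \approx -3.20$$

4. Compare the absolute value of the test statistic to the critical value of $t_{.025} = 2.571$ (n - 1 = 5 degrees of freedom).
5. Reject the null hypothesis H0: µ = 16 in favor of Ha: µ ≠ 16 since 3.2 > 2.571.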

How can we solve for this in Stata?

First, input the data as

. input fills

15.68

16.00

15.61

15.93

15.86

15.72

end

. ttest fills = 16

. save fills, replace

That’s it!  You even get a 95% confidence interval with your ttest. Since the hypothesized value of 16 is not contained in the interval, we can reject the null hypothesis at α = .05.
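
For readers without Stata at hand, the numbers in Example 5 can be cross-checked with a short Python sketch. It is not a replacement for ttest: the critical value 2.571 is hardcoded from the example rather than computed, and the variable names are illustrative.

```python
import math
import statistics

fills = [15.68, 16.00, 15.61, 15.93, 15.86, 15.72]  # sample from Example 5
c = 16.0        # hypothesized mean under H0
t_crit = 2.571  # t_{.025} with 5 degrees of freedom, taken from the example

n = len(fills)
xbar = statistics.fmean(fills)
s = statistics.stdev(fills)   # sample sd, divides by n - 1
se = s / math.sqrt(n)         # standard error of the mean

t = (xbar - c) / se           # test statistic
ci_low = xbar - t_crit * se   # 95% confidence interval for the mean
ci_high = xbar + t_crit * se
reject = abs(t) > t_crit      # two-sided decision rule

print(f"xbar = {xbar:.2f}, s = {s:.4f}, t = {t:.3f}")
print(f"95% CI = ({ci_low:.3f}, {ci_high:.3f})")
print("reject H0:", reject)
```

The interval and the rejection rule always agree for a two-sided test at the same α: the hypothesized value falls outside the 95% interval exactly when |t| exceeds the critical value.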

One-Tailed Hypothesis Tests

Note: in the gasoline problem, recall that mileage standards state that the mean mileage µ must be at least 30 mpg.

Here we might be tempted to say that National Motors can “prove” that µ ≥ 30 if it can accept the null hypothesis H0: µ ≥ 30 instead of the alternative hypothesis Ha: µ < 30.

However, hypothesis testing seeks to find how confident we can be that the null hypothesis should be rejected in favor of the alternative hypothesis.  It does not seek to find how confident we can be that the null hypothesis should be accepted.

Therefore, we cannot use hypothesis testing to “prove” that a null hypothesis is true.

In conducting one-tail tests, do the following:

1. State what you wish to justify in the form of a strict inequality (<, >, or ≠), and make it the alternative hypothesis Ha.
2. State what we’d expect if the alternative hypothesis is false, and make this statement the null hypothesis H0.

Example 6:

1. Use a significance level of .05.
2. National Motors will randomly select a sample of n = 5 mileages to test the following hypotheses:

H0: µ ≤ 30

Ha: µ > 30

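3. Using our mpg data, we saw that $\bar{x} = 31.2$ and that s = .7517.  Thus

$$t = \frac{\bar{x} - c}{s/\sqrt{n}} = \frac{31.2 - 30}{.7517/\sqrt{5}} \approx 3.569$$

4. Compare the value of the test statistic to the critical value of $t_{.05} = 2.132$ (n - 1 = 4 degrees of freedom).[4]
5. Reject the null hypothesis H0: µ ≤ 30 in favor of Ha: µ > 30 since 3.569 > 2.132.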

In Stata, do the following:

. use mpg, clear

. ttest mpg = 30

Note: This gives results for one-sided and two-sided tests.
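
The one-sided test in Example 6 can also be cross-checked from the summary statistics alone. The Python sketch below assumes only the values quoted in the example; the critical value 2.132 is hardcoded from the text, and the variable names are illustrative.

```python
import math

# Summary statistics quoted in Example 6
xbar = 31.2     # sample mean mileage
s = 0.7517      # sample standard deviation
n = 5           # sample size
c = 30.0        # hypothesized value in H0: mu <= 30
t_crit = 2.132  # t_{.05} with 4 degrees of freedom, taken from the example

t = (xbar - c) / (s / math.sqrt(n))

# Right-tailed test: only large positive t values count against H0.
reject = t > t_crit

print(f"t = {t:.3f}, critical value = {t_crit}, reject H0: {reject}")
```

Because Ha points to the right (µ > 30), the test statistic is compared with the critical value directly rather than in absolute value.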

Example 7:

Suppose National Motors wishes to claim in an advertisement that the Hawk’s mean stopping distance is less than 60 feet, the value claimed by its competitors.  Do the following:

1. Use a significance level of .05.
2. National Motors will randomly select a sample of n = 64 stopping distances to test the following hypotheses:

H0: µ ≥ 60

Ha: µ < 60

3. Suppose the sample mean and standard deviation are $\bar{x} = 58.12$ feet and s = 6.13 feet.  Thus,

$$z = \frac{\bar{x} - c}{s/\sqrt{n}} = \frac{58.12 - 60}{6.13/\sqrt{64}} \approx -2.45$$

4. Compare the value of the test statistic to the critical value of $-z_{.05} = -1.645$.[5]
5. Since -2.45 < -1.645, reject the null hypothesis H0: µ ≥ 60 in favor of the alternative hypothesis Ha: µ < 60.

National Motors will be allowed to make the television claim that µ < 60.

Note: Remember to use z-scores instead of t-scores for large samples.
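
For a large-sample test such as Example 7, the critical value can be computed rather than looked up, since the standard normal distribution is available in Python's standard library. The following is a minimal sketch using the summary statistics from the example; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Summary statistics quoted in Example 7
xbar = 58.12   # sample mean stopping distance, in feet
s = 6.13       # sample standard deviation, in feet
n = 64         # sample size
c = 60.0       # hypothesized value in H0: mu >= 60
alpha = 0.05   # significance level

z = (xbar - c) / (s / math.sqrt(n))

# Left-tailed test: the rejection point is the z-value with area alpha below it.
z_crit = NormalDist().inv_cdf(alpha)
reject = z < z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

The computed critical value agrees with the table value of -1.645 used in the example.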

Using p-Values

For a two-sided test, the p-value is twice the area under the curve of the t-distribution having n-1 df to the right of the absolute value of the calculated t-value.

You can think of the p-value as the area to the left or to the right of the calculated t-value for a one-sided test (in the direction of Ha), or twice the area to the right of the absolute value of the calculated t-value for a two-sided test.

In other words, if the null hypothesis is true, the p-value is the probability that we observe a test statistic at least as extreme as the one actually observed.

Thus, if p < α, reject the null hypothesis H0: µ = c.

P-values are automatically calculated in Stata.

. use fills, clear

. ttest fills = 16

What is the p-value?
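
One way to check the answer without Stata is to compute the two-sided p-value directly from its definition as twice the tail area of the t-distribution. The sketch below is an illustration under stated assumptions: it integrates the t density numerically with Simpson's rule (Python's standard library has no t-distribution), and the function names and step count are hypothetical choices.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=20000):
    """Twice the area to the right of |t|, using Simpson's rule on [0, |t|].

    steps must be even for Simpson's rule.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3     # area between 0 and |t|
    return 2 * (0.5 - central)  # the density is symmetric, so each half has area 0.5


fills = [15.68, 16.00, 15.61, 15.93, 15.86, 15.72]  # sample from Example 5
n = len(fills)
t = (statistics.fmean(fills) - 16.0) / (statistics.stdev(fills) / math.sqrt(n))
p_value = two_sided_p(t, n - 1)

print(f"t = {t:.3f}, two-sided p-value = {p_value:.4f}")
```

The result should fall below α = .05, consistent with the decision to reject H0 in Example 5.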

[1] Remember, these are just z-values.

[2] Note: The significance level is chosen by the researcher in advance of conducting a hypothesis test.

[3] We assume that the infinite population of bottle fills is normally distributed, or at least mound shaped.

[4] Our critical t-value is smaller because we are putting all of α in one tail, rather than splitting it between two tails.

[5] Why do we use the z-value instead of the t-value?