User:StatsLab/LabNotebook

Lab 1
The objective of this lab is to get accustomed to the various aspects of graphing with Minitab, and to use various types of graphs to analyse data about aphasia. To do this we start by drawing a bar chart of the data to see which form of aphasia is the most recurrent:

Using a bar chart instead of a list of written values, it is far easier for the user to analyse the data. In this case we can clearly see that the most recurrent type of aphasia is anomic. The second most common is the conduction type, followed by Broca's. Next we will take into consideration the same data, except that in this case a pie chart will be used. The following is the pie chart for the types of aphasia data:

We can see that this graph is even more "graphically friendly", as the proportions of the results stand out more visibly than in the bar chart. However, it lacks explicit numerical detail: we obviously cannot work out the exact counts for each aphasia type just by looking at the image. The next part of this lab consists of analysing the miles per gallon results from a sample. To do this we will use a histogram and a stem-and-leaf display.

Histogram for the MPG data:

The advantage of this type of graph is that it allows a more in-depth study of the data. In this case, thanks to the histogram, we can see that the sample roughly follows a normal distribution with a mean of 36.99 and a standard deviation of 2.418. Next we use a stem-and-leaf diagram to display the same data.
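As a quick sanity check on those figures, a normal sample with the reported parameters can be simulated and summarised outside Minitab. This is only a sketch: the MPG values themselves are not reproduced here, just the mean and standard deviation reported above.

```python
import random
import statistics

random.seed(1)
# Simulate a sample resembling the MPG data: the lab reports a roughly
# normal shape with mean 36.99 and standard deviation 2.418.
sample = [random.gauss(36.99, 2.418) for _ in range(1000)]

print(round(statistics.mean(sample), 2))   # close to 36.99
print(round(statistics.stdev(sample), 2))  # close to 2.418
```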

As we can see, this is a less user-friendly display, but it is still very useful as it shows the exact values and the frequency of each.

Lab 2
Question 1

Here are the different graphs we did for this lab:

There is a noticeable difference between the graphs: as the sample size increases, the sample standard deviation approaches the value given. This is also true of the mean. As the sample size increases, the outliers generated by Minitab have less of an impact on the normality of the graphs.
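This convergence can be sketched numerically. Below, the generating parameters 36.99 and 2.418 are an assumption borrowed from the Lab 1 MPG histogram; the point is only that the sample statistics settle towards the generating values as the sample grows.

```python
import random
import statistics

random.seed(0)
# Sample statistics approach the generating parameters as n grows.
for n in (10, 100, 10000):
    sample = [random.gauss(36.99, 2.418) for _ in range(n)]
    print(n, round(statistics.mean(sample), 3), round(statistics.stdev(sample), 3))
```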

Comparing the histogram of MPG with 4 other similar graphs which are not shown here, we saw that there was greater variation in the mean and standard deviation values between the graphs.

Question 2

Below are the results from the descriptive statistics tool in Minitab. This allows us to view simple statistical measures such as the mean, standard deviation and median.

Because of the small amount of data listed below, the results returned are not as useful as they would be with a larger sample set.
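The same measures are easy to reproduce by hand. A minimal sketch using a made-up data set (the lab's actual values are not reproduced here):

```python
import statistics

# Small hypothetical data set, for illustration only.
data = [12, 15, 11, 19, 14, 13, 22, 16]

print("mean  :", statistics.mean(data))            # 15.25
print("median:", statistics.median(data))          # 14.5
print("stdev :", round(statistics.stdev(data), 3)) # sample sd (n - 1 denominator)
```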

Lab 3
For this lab we had to generate normally distributed numbers with a mean of 30 and a standard deviation of 4 and then analyse them using the descriptive statistics option in Minitab.

First we used columns of only 10 random normally distributed values.

Then we used columns of 100 and 10,000 values.

We found that the values for the standard error of the mean in our columns agreed with the values predicted by the Bienaymé formula.

Bienaymé formula: SE(x̄) = σ / √n, i.e. the standard error of the mean equals the population standard deviation divided by the square root of the sample size.

The larger our sample size, the closer our values came to the Bienaymé formula's predictions. We noticed during this lab that the larger the sample size, the smaller the standard error of the mean; in fact, the standard error of the mean is inversely proportional to the square root of the number of samples.
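This agreement can be checked directly by simulation, using the lab's parameters (mean 30, sd 4): the empirical standard error is just the standard deviation of many sample means, which should track σ/√n. A sketch:

```python
import random
import statistics

random.seed(2)
mu, sigma = 30, 4  # the lab's parameters: normal, mean 30, sd 4

def se_of_mean(n, reps=2000):
    """Empirical standard error: the sd of `reps` sample means of size n."""
    means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
             for _ in range(reps)]
    return statistics.stdev(means)

for n in (10, 100):
    # empirical SE vs. the Bienaymé prediction sigma / sqrt(n)
    print(n, round(se_of_mean(n), 3), round(sigma / n ** 0.5, 3))
```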

We repeated this experiment for exponentially distributed numbers.

We noted that the standard error of the mean for the exponential distribution was much smaller, and that it too reduced significantly with a large increase in sample size.

We also repeated this experiment with a lognormal distribution.

We noted that the standard error of the mean for the lognormal distribution was quite variable for N = 100, but its variability reduced significantly with a large increase in sample size.
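The σ/√n behaviour holds regardless of the distribution's shape, which is what the exponential and lognormal runs show. The sketch below uses a rate-1 exponential (sd = 1) and a standard lognormal (sd ≈ 2.16); these parameter choices are assumptions, not necessarily the ones used in the lab.

```python
import random
import statistics

random.seed(3)

def se_mean(draw, n, reps=500):
    """Empirical standard error: sd of `reps` sample means of size n."""
    means = [statistics.mean(draw() for _ in range(n)) for _ in range(reps)]
    return statistics.stdev(means)

# SE of the mean shrinks with sample size for both skewed distributions.
for n in (100, 1000):
    print(n,
          round(se_mean(lambda: random.expovariate(1.0), n), 4),
          round(se_mean(lambda: random.lognormvariate(0.0, 1.0), n), 4))
```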

Lab 4
In this lab we will analyse Mendenhall's claims by graphing different types of distribution as the sample size n approaches 30 (for simplicity n will actually be 25, as Minitab doesn't support n = 30 for the exponential distribution). For all the distributions the numbers will range from -1.0 to 1.0 to make them comparable.

First, a uniform distribution of numbers is taken into account. When we consider a list of 100 numbers we see that the result is not even close to a normal distribution:

In fact we can see that the end result is closer to the "triangle" shown in the picture.

The previous graphs were generated with n = 2. Now we will graph the same distribution with n = 25 to confirm Mendenhall's claims:

It is clear that the distribution shown here approaches a normal distribution. For further confirmation of this we will plot the normality test for this distribution:

Now we will show the graph for n = 2 and n = 25 if a normal distribution of data is taken into account.

For n = 2:

and for n = 25:

Next we will analyse Mendenhall's theory on the exponential distribution. In this case, though, it should take far more data for the exponential distribution to approach a normal one. In fact, we can see that the first graph, drawn with n = 2, is far from symmetric:

After generating the exponential distribution with n = 25 we see that it is close to the result predicted by Mendenhall.

To conclude, our results support the claim that a sample size of 30 is enough because:

1. As the data set approaches 30, the distribution of the means of the data becomes more normal (e.g. with n = 2 we see a triangular shape in the data; however, as n approaches 30, the data becomes more normal).

2. As the data set approaches 30, the normality test produces data which closely resembles a straight line. However, in the case of the exponential graph, more samples would be required to get a better approximation.
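The shrinking asymmetry behind conclusion 1 can also be checked numerically: the mean of n exponential draws has theoretical skewness 2/√n, falling from about 1.41 at n = 2 to 0.4 at n = 25. A sketch (the rate-1 exponential is an illustrative assumption):

```python
import random
import statistics

random.seed(4)

def skewness(xs):
    """Sample skewness: third central moment divided by the sd cubed."""
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

# Means of n rate-1 exponentials: skewness should fall as n grows,
# i.e. the sampling distribution of the mean becomes more symmetric.
for n in (2, 25):
    means = [statistics.mean(random.expovariate(1.0) for _ in range(n))
             for _ in range(4000)]
    print(n, round(skewness(means), 2))
```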

Lab 5
We got Minitab to calculate the regression equation on our AdSales data; this is our result:

We calculated the regression equation ourselves in Excel, using the formulae for the least squares estimates:

As you can see, our values for B0 and B1 in Excel and Minitab match.
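The same least squares estimates are straightforward to compute anywhere. A minimal sketch of the standard formulae, using made-up x/y pairs (the AdSales values are not reproduced here):

```python
def least_squares(xs, ys):
    """Least squares estimates: b1 = SSxy / SSxx, b0 = ybar - b1 * xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    ss_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    ss_xx = sum((x - xbar) ** 2 for x in xs)
    b1 = ss_xy / ss_xx
    b0 = ybar - b1 * xbar
    return b0, b1

# Hypothetical advertising/sales pairs, for illustration only.
b0, b1 = least_squares([1, 2, 3, 4, 5], [1, 1, 2, 2, 4])
print(round(b0, 3), round(b1, 3))  # → -0.1 0.7
```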

We got Minitab to calculate the regression equation on our OJuice data; this is our result:

These are our Excel calculations for the Sweetness of Orange Juice data:

As you can see, our values for B0 and B1 in Excel and Minitab match.

(6)

(a) y = -0.0023x + 6.25

(b) The value for B1 indicates that as the pectin level in the orange juice increases the sweetness index decreases. The value for B0 indicates that if there was no pectin present in the orange juice it would have a sweetness index of 6.25.

(c) y = 5.558880144
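The prediction in (c) just plugs a pectin level into the fitted line from (a). In the sketch below the pectin level of 300 is a hypothetical input; note that using the rounded coefficients from (a) gives a slightly different figure than the Excel value in (c), which used the unrounded estimates.

```python
def predict_sweetness(pectin):
    """Fitted line from part (a): y = -0.0023 * x + 6.25."""
    return -0.0023 * pectin + 6.25

# Hypothetical pectin level of 300, for illustration only.
print(round(predict_sweetness(300), 4))  # → 5.56
```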

Lab 7
The objective of this lab is to use Excel to reproduce the results obtained with Minitab for the statistical analysis of a set of values. The values come from a regression analysis relating fire damage to distance from the fire.

Lab 8
Our Minitab regression on the ADELMAN AND WATKINS data:

The regression equation for this model is: Tx Value (MM$) = 5.37 Oil Reserves (MMBBL) + 0.444 Gas Reserves (BCF)

4) The data shown by Minitab gives us very strong evidence that the prices of gas and oil are very close to the figures already calculated. The t-values of 6.80 and 8.33 tell us that we are more than 99.9% confident that the coefficients for oil and gas reserves are good estimates. In fact, to obtain a 99% confidence interval the t-value need only equal 2.457 (computed with 30 d.f., which is the closest figure available in the t-table).

5) In this section we analyse the Minitab output for Confidence Intervals (C.I.) and Prediction Intervals (P.I.). A 99% confidence interval means that we are 99% confident that the population mean will lie within the range of the confidence interval. A 99% prediction interval means that we are 99% confident that the next observation will lie within the range of the prediction interval.

Now we will analyse a few outputs obtained from Minitab:

Obs     Fit  SE Fit            99% CI            99% PI
  1  394.97   19.61  (341.27, 448.67)  (291.21, 498.73) XX
  4  132.90    7.37  (112.72, 153.08)  ( 41.86, 223.94)
 10   89.50   12.95  ( 54.03, 124.97)  ( -6.10, 185.10)
 25  116.52    6.48  ( 98.79, 134.26)  ( 25.99, 207.05)

In the first case we are 99% confident that the population mean equals 394.97 ± 53.7. As regards the P.I., the data tells us that the next observation will lie between 291.21 and 498.73 with 99% confidence. The same reasoning applies to the next few values. An exception occurs in the tenth observation, where the prediction interval ranges from -6.10 to 185.10. The value -6.10 has only statistical relevance; in real life it is not a realistic result (it would mean giving the oil away and having to pay someone to take it).
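The structural reason the P.I. is always wider than the C.I. is that it adds the residual variance s² on top of the variance of the fitted value. The sketch below illustrates this for the first row of the output; the values t ≈ 2.738 and s ≈ 32.4 are rough back-solved approximations from the printed intervals, not figures reported by Minitab.

```python
import math

# Approximations back-solved from the first printed row (assumed values).
t_crit = 2.738          # roughly the t multiplier for the 99% intervals
s = 32.4                # roughly the residual standard deviation
fit, se_fit = 394.97, 19.61

ci_half = t_crit * se_fit                           # half-width of the C.I.
pi_half = t_crit * math.sqrt(s ** 2 + se_fit ** 2)  # half-width of the P.I.
print((round(fit - ci_half, 2), round(fit + ci_half, 2)))
print((round(fit - pi_half, 2), round(fit + pi_half, 2)))
```

Both intervals come out close to the Minitab row above, and the P.I. is wider because of the extra s² term.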

Lab 9
Analysis of Regression

This regression model is a good fit because it has a high F-value (124.287) and a low P-value (0.00000) for the regression. It also has a high R-sq value (95.98%).

However, we feel that this model could be improved, as it has many variables, some of which are not statistically significant; e.g. AGE*AGE has a P-value of 0.914.
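The R-sq figure Minitab reports is just the share of variation explained by the model, 1 - SSE/SST. A minimal sketch with made-up numbers (not the lab's data):

```python
def r_squared(ys, fitted):
    """R^2 = 1 - SSE/SST: the share of variation explained by the model."""
    ybar = sum(ys) / len(ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual sum of squares
    sst = sum((y - ybar) ** 2 for y in ys)               # total sum of squares
    return 1 - sse / sst

# Tiny illustrative example: observed values and hypothetical fitted values.
y = [2.0, 4.0, 6.0, 8.0]
fit = [2.1, 3.9, 6.2, 7.8]
print(round(r_squared(y, fit), 4))  # → 0.995
```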

Here is an improved general regression model using just two variables, AGE and AGE-Bid, which are the two most important terms according to our scatterplots.