1(1)

Based on the given data, a scatter plot of CCI versus Median Household Income is constructed.

From the scatter plot, the data points appear to be scattered around a straight line with a positive slope, indicating a linear trend between Median Household Income and CCI. It is therefore likely that there is a positive linear relationship between Median Household Income and CCI.

1(2)

A correlation analysis is carried out between the variables Median Household Income and CCI, and the result is shown below:

**Correlation: CCI, Median Household Income ($1000)**

Pearson correlation of CCI and Median Household Income ($1000) = 0.829

P-Value = 0.006

From the result, the correlation coefficient is 0.829, which indicates a strong positive linear relationship between Median Household Income and CCI. The p-value of 0.006 is below 0.05, so this correlation is statistically significant.
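For a single predictor, the p-value attached to a Pearson correlation is based on the statistic t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. The short sketch below, assuming n = 9 (the N reported for CCI in 1(4)), applies this formula to r = 0.829. The result of about 3.92 agrees, up to the rounding of r to three decimals, with the slope t value of 3.93 in the regression output of 1(3), which is why both tests report p = 0.006.

```python
import math

r = 0.829  # Pearson correlation from the output above
n = 9      # number of observations (assumed from N = 9 in the CCI summary)

# t statistic for testing H0: rho = 0, with n - 2 degrees of freedom
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(f"t = {t_stat:.2f} on {n - 2} degrees of freedom")
```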

The coefficient of determination r² is obtained by squaring the correlation coefficient:

r² = 0.829² = 0.6872

An r² of 0.6872 implies that 68.72% of the variation in CCI can be explained by Median Household Income.

Also, 1 − r² = 1 − 0.6872 = 0.3128

This implies that 31.28% of the variation in CCI cannot be explained by Median Household Income.
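The arithmetic above can be checked with a few lines of Python. As a cross-check, the sketch also computes R-sq directly as SS(Regression)/SS(Total) from the ANOVA table in 1(3); the small gap between 0.6872 and 0.6880 comes only from rounding r to three decimals.

```python
r = 0.829  # Pearson correlation from the output

r_squared = r ** 2           # proportion of variation in CCI explained
unexplained = 1 - r_squared  # proportion left unexplained

print(f"r^2 = {r_squared:.4f}, 1 - r^2 = {unexplained:.4f}")

# Cross-check with the regression ANOVA: R-sq = SS(Regression) / SS(Total)
r_sq_anova = 2829 / 4112
print(f"SSR/SST = {r_sq_anova:.4f}")
```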

1(3)

A regression analysis is carried out to fit a linear model that predicts CCI using Median Household Income as the independent variable. The result is shown below:

**Regression Analysis: CCI versus Median Household Income ($1000)**

Analysis of Variance

| Source | DF | Adj SS | Adj MS | F-Value | P-Value |
|---|---|---|---|---|---|
| Regression | 1 | 2829 | 2829.0 | 15.43 | 0.006 |
| Median Household Income ($1000) | 1 | 2829 | 2829.0 | 15.43 | 0.006 |
| Error | 7 | 1283 | 183.3 | | |
| Total | 8 | 4112 | | | |

Model Summary

| S | R-sq | R-sq(adj) | R-sq(pred) |
|---|---|---|---|
| 13.5389 | 68.80% | 64.34% | 50.08% |

Coefficients

| Term | Coef | SE Coef | T-Value | P-Value | VIF |
|---|---|---|---|---|---|
| Constant | -599 | 176 | -3.41 | 0.011 | |
| Median Household Income ($1000) | 19.22 | 4.89 | 3.93 | 0.006 | 1.00 |

Regression Equation

CCI = -599 + 19.22 Median Household Income ($1000)
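The quantities in the ANOVA table and Model Summary are linked by standard identities: each mean square is SS/DF, F is MS(Regression)/MS(Error), R-sq is SS(Regression)/SS(Total), R-sq(adj) is 1 − MS(Error)/(SS(Total)/DF(Total)), and S is the square root of MS(Error). The sketch below recomputes them from the rounded sums of squares shown above; because those inputs are rounded, S comes out as about 13.538 rather than exactly 13.5389.

```python
import math

# Values read from the ANOVA table (sums of squares are rounded by Minitab)
ss_reg, df_reg = 2829.0, 1
ss_err, df_err = 1283.0, 7
ss_tot, df_tot = 4112.0, 8

ms_reg = ss_reg / df_reg                   # mean square for regression
ms_err = ss_err / df_err                   # mean square error (MSE)
f_value = ms_reg / ms_err                  # F = MS(Regression) / MS(Error)
r_sq = ss_reg / ss_tot                     # R-sq
r_sq_adj = 1 - ms_err / (ss_tot / df_tot)  # R-sq(adj)
s = math.sqrt(ms_err)                      # standard error of the estimate

print(f"MSE = {ms_err:.1f}, F = {f_value:.2f}")
print(f"R-sq = {r_sq:.2%}, R-sq(adj) = {r_sq_adj:.2%}, S = {s:.4f}")
```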

From the normal probability plot of the residuals, the points all lie close to a straight line, which implies that the residuals are approximately normally distributed. In the plot of residuals versus fitted values, the points appear randomly scattered about the horizontal line at zero, which implies that the residuals show no systematic relationship with the fitted values. Therefore the linear regression model appears adequate.

Based on the result of regression analysis, the fitted model is found to be:

CCI = -599 + 19.22 Median Household Income ($1,000)

The slope coefficient of 19.22 implies that when Median Household Income increases by $1,000, CCI is expected to increase by 19.22.
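The slope interpretation can be seen directly by evaluating the fitted equation at two incomes $1,000 apart. In this sketch the function name `predicted_cci` and the base income of 35 ($1000) are hypothetical choices made only for illustration.

```python
def predicted_cci(income_thousands: float) -> float:
    """Fitted CCI from the regression equation; income in $1,000 units."""
    return -599 + 19.22 * income_thousands

# Hypothetical income level chosen only for illustration
base = 35.0
change = predicted_cci(base + 1) - predicted_cci(base)
print(f"Change in predicted CCI for +$1,000 income: {change:.2f}")
```

The difference is the same whichever base income is chosen, which is what a constant slope in a linear model means.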

1(4)

From the regression analysis, the standard error of the estimate is 13.5389. This is the estimated standard deviation of the residuals, i.e. the typical size of the prediction error when Median Household Income is used to predict CCI. The descriptive statistics of CCI alone are shown below:

**Descriptive Statistics: CCI**

| Variable | N | N* | Mean | SE Mean | StDev | Minimum | Q1 | Median | Q3 | Maximum |
|---|---|---|---|---|---|---|---|---|---|---|
| CCI | 9 | 0 | 91.66 | 7.56 | 22.67 | 61.60 | 67.20 | 91.50 | 110.70 | 125.40 |

The standard deviation of CCI alone is 22.67. Therefore, using Median Household Income as a predictor reduces the standard deviation of the prediction errors from 22.67 to 13.5389.
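Both standard deviations can be recovered from the ANOVA sums of squares in 1(3): S is √(SS(Error)/(n − 2)) and the standard deviation of CCI alone is √(SS(Total)/(n − 1)). The sketch below computes both and the relative reduction, about 40%.

```python
import math

ss_err, df_err = 1283.0, 7  # from the ANOVA table
ss_tot, df_tot = 4112.0, 8

s_model = math.sqrt(ss_err / df_err)  # standard error of the estimate
s_y = math.sqrt(ss_tot / df_tot)      # standard deviation of CCI alone

print(f"S = {s_model:.2f}, StDev(CCI) = {s_y:.2f}")
print(f"Reduction: {1 - s_model / s_y:.1%}")
```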

1(5)

To determine whether there is a significant linear relationship between CCI and Median Household Income, a hypothesis test is carried out on the slope of the regression.

Null hypothesis: The slope term of the regression is zero.

Alternative hypothesis: The slope term of the regression is not zero.

The level of significance for the test is taken to be 0.05.

From the regression output, the t statistic for the slope is 3.93.

The degrees of freedom are 9 − 2 = 7, and the corresponding p-value is 0.006.

Since the p-value of 0.006 is smaller than the level of significance of 0.05, the null hypothesis is rejected. Therefore it can be concluded that the slope of the regression is not zero, and thus there is a significant linear relationship between CCI and Median Household Income.
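The test statistic and p-value can be reproduced from the Coefficients table. The sketch below is an illustrative stand-in for Minitab's calculation, not its actual method: it forms t = Coef / SE Coef, evaluates the Student t density using Python's `math.gamma`, and integrates it numerically with Simpson's rule to get the two-sided p-value. It also shows that, in simple regression, t² equals the ANOVA F value up to rounding.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|), using Simpson's rule for P(0 <= T <= |t|); steps must be even."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights 4, 2, 4, ..., 4
        total += weight * t_pdf(i * h, df)
    central = total * h / 3
    return 1 - 2 * central

coef, se = 19.22, 4.89  # slope estimate and its standard error
n = 9                   # number of observations
df = n - 2

t_stat = coef / se
p_value = two_sided_p(t_stat, df)
print(f"t = {t_stat:.2f}, df = {df}, two-sided p = {p_value:.4f}")
print(f"t^2 = {t_stat ** 2:.2f} (compare F = 15.43 in the ANOVA table)")
```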

2.

Based on the given data, a multiple linear regression model is built to predict the average price of gold from the prices of copper, silver and aluminum. The result is shown below:

**Regression Analysis: Gold ($ per oz.) versus Copper (cents per lb.), Silver ($ per oz.), Aluminum (cents per lb.)**

Analysis of Variance

| Source | DF | Adj SS | Adj MS | F-Value | P-Value |
|---|---|---|---|---|---|
| Regression | 3 | 104445 | 34815.0 | 12.19 | 0.002 |
| Copper (cents per lb.) | 1 | 18 | 18.1 | 0.01 | 0.939 |
| Silver ($ per oz.) | 1 | 69634 | 69633.6 | 24.38 | 0.001 |
| Aluminum (cents per lb.) | 1 | 26208 | 26208.3 | 9.18 | 0.016 |
| Error | 8 | 22845 | 2855.7 | | |
| Total | 11 | 127291 | | | |

Model Summary

| S | R-sq | R-sq(adj) | R-sq(pred) |
|---|---|---|---|
| 53.4386 | 82.05% | 75.32% | 66.28% |

Coefficients

| Term | Coef | SE Coef | T-Value | P-Value | VIF |
|---|---|---|---|---|---|
| Constant | -51.6 | 87.8 | -0.59 | 0.573 | |
| Copper (cents per lb.) | 0.070 | 0.875 | 0.08 | 0.939 | 1.44 |
| Silver ($ per oz.) | 18.78 | 3.80 | 4.94 | 0.001 | 1.08 |
| Aluminum (cents per lb.) | 3.54 | 1.17 | 3.03 | 0.016 | 1.41 |

Regression Equation

Gold ($ per oz.) = -51.6 + 0.070 Copper (cents per lb.) + 18.78 Silver ($ per oz.) + 3.54 Aluminum (cents per lb.)

Fits and Diagnostics for Unusual Observations

| Obs | Gold ($ per oz.) | Fit | Resid | Std Resid | |
|---|---|---|---|---|---|
| 10 | 448.0 | 341.4 | 106.6 | 2.11 | R |

R: Large residual

First, the points in the normal probability plot of the residuals lie close to a straight line, which implies that the residuals are approximately normally distributed. The histogram of the residuals, whose shape is close to a normal distribution, supports this. In the plot of residuals versus fitted values, the points appear randomly scattered about the horizontal line at zero, which implies that the residuals show no systematic relationship with the fitted values. Observation 10 is flagged as having a large standardized residual (2.11), which is only slightly beyond the cutoff of 2 used for this flag. Therefore the linear regression model appears adequate.

The coefficient of determination R² is 82.05%, which implies that 82.05% of the variation in the price of gold can be explained by the model. The multiple correlation coefficient is therefore R = √0.8205 = 0.9058. R is the correlation between the observed and fitted gold prices and is non-negative by definition, so R = 0.9058 indicates a strong linear association between the independent variables, taken together, and the price of gold. Also, 1 − R² = 1 − 82.05% = 17.95%, which implies that 17.95% of the variation in the price of gold cannot be explained by the model.

The adjusted R², which accounts for the number of independent variables in the model, is 75.32%. After adjusting for the three predictors, the model is therefore estimated to explain about 75.32% of the variation in the price of gold.
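These figures follow from the ANOVA sums of squares with n = 12 observations and k = 3 predictors: R² = SS(Regression)/SS(Total), R = √R², and adjusted R² = 1 − [SS(Error)/(n − k − 1)]/[SS(Total)/(n − 1)]. A minimal sketch of the calculation:

```python
import math

ss_reg, ss_err, ss_tot = 104445.0, 22845.0, 127291.0  # ANOVA sums of squares
n, k = 12, 3  # observations and predictors

r_sq = ss_reg / ss_tot
r_mult = math.sqrt(r_sq)  # multiple correlation R (non-negative)
r_sq_adj = 1 - (ss_err / (n - k - 1)) / (ss_tot / (n - 1))

print(f"R-sq = {r_sq:.2%}, R = {r_mult:.4f}, 1 - R-sq = {1 - r_sq:.2%}")
print(f"R-sq(adj) = {r_sq_adj:.2%}")
```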

The regression equation is found to be:

Price of gold ($ per oz.) = -51.6 + 0.070 Copper (cents per lb.) + 18.78 Silver ($ per oz.) + 3.54 Aluminum (cents per lb.)

The coefficient of Copper implies that, holding the prices of Silver and Aluminum constant, when the price of Copper increases by 1 cent per lb., the price of gold is expected to increase by $0.07 per oz. The coefficient of Silver implies that, holding the prices of Copper and Aluminum constant, when the price of Silver increases by $1 per oz., the price of gold is expected to increase by $18.78 per oz. The coefficient of Aluminum implies that, holding the prices of Copper and Silver constant, when the price of Aluminum increases by 1 cent per lb., the price of gold is expected to increase by $3.54 per oz.
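The "holding the others constant" interpretation can be illustrated by changing one price at a time in the fitted equation. The function name `predicted_gold` and the baseline prices below are hypothetical and chosen only for illustration; any baseline gives the same one-unit differences.

```python
def predicted_gold(copper: float, silver: float, aluminum: float) -> float:
    """Fitted gold price ($/oz) from the regression equation.
    copper and aluminum in cents per lb., silver in $ per oz."""
    return -51.6 + 0.070 * copper + 18.78 * silver + 3.54 * aluminum

# Hypothetical baseline prices, chosen only for illustration
c, s, a = 80.0, 5.0, 60.0
base = predicted_gold(c, s, a)

# Raise one price by one unit while holding the other two fixed
print(f"Copper +1 cent:   {predicted_gold(c + 1, s, a) - base:.2f}")
print(f"Silver +$1:       {predicted_gold(c, s + 1, a) - base:.2f}")
print(f"Aluminum +1 cent: {predicted_gold(c, s, a + 1) - base:.2f}")
```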

The standard error of the estimate is 53.4386, which is the estimated standard deviation of the residuals. The descriptive statistics for the gold price alone are shown below:

**Descriptive Statistics: Gold ($ per oz.)**

| Variable | N | N* | Mean | SE Mean | StDev | Minimum | Q1 | Median | Q3 | Maximum |
|---|---|---|---|---|---|---|---|---|---|---|
| Gold ($ per oz.) | 12 | 0 | 388.1 | 31.1 | 107.6 | 161.1 | 328.8 | 379.3 | 445.5 | 613.0 |

The standard deviation of the gold price alone is 107.6. With the regression model, the standard deviation of the prediction errors therefore falls from 107.6 to 53.4386, a reduction of slightly more than half.
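Both values can be recomputed from the ANOVA sums of squares: S = √(SS(Error)/(n − k − 1)) and the standard deviation of the gold price alone is √(SS(Total)/(n − 1)). The sketch below shows that their ratio is just under 0.5, which is what "reduced by slightly more than half" means here.

```python
import math

ss_err, ss_tot = 22845.0, 127291.0  # from the ANOVA table
n, k = 12, 3                        # observations and predictors

s_model = math.sqrt(ss_err / (n - k - 1))  # standard error of the estimate
s_y = math.sqrt(ss_tot / (n - 1))          # StDev of gold price alone
print(f"S = {s_model:.2f}, StDev = {s_y:.1f}, ratio = {s_model / s_y:.3f}")
```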

To check the overall significance of the model, an F test on all the slope coefficients jointly is carried out.

Null hypothesis: All the slope coefficients (for Copper, Silver and Aluminum) are zero.

Alternative hypothesis: At least one of the slope coefficients is not zero.

The level of significance for the test is set to be 0.05.

From the regression output, the F statistic is 12.19.

The numerator degrees of freedom are 3 and the denominator degrees of freedom are 8. The p-value is 0.002.

Since the p-value of 0.002 is smaller than the level of significance of 0.05, the null hypothesis is rejected. Therefore it can be concluded that at least one of the slope coefficients is non-zero, and thus the overall model is significant.
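An equivalent decision rule compares F with the upper 5% critical value of the F(3, 8) distribution instead of using the p-value. The sketch below recomputes F as MS(Regression)/MS(Error) from the ANOVA table; the critical value 4.07 is taken from a standard F table rather than from the Minitab output.

```python
ms_reg, df_num = 34815.0, 3  # Adj MS and DF for Regression
ss_err, df_den = 22845.0, 8  # Adj SS and DF for Error

f_value = ms_reg / (ss_err / df_den)  # F = MS(Regression) / MS(Error)
f_crit = 4.07  # upper 5% point of F(3, 8), taken from a standard F table
print(f"F = {f_value:.2f}, critical value = {f_crit}")
print("Reject H0" if f_value > f_crit else "Fail to reject H0")
```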

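For the individual regression coefficients (t tests at the 0.05 level of significance):

For Copper, the t value is 0.08 and the p-value is 0.939. Since the p-value is greater than 0.05, Copper is not a significant predictor of the gold price once Silver and Aluminum are in the model.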

For Silver, the t value is 4.94, and the p-value is 0.001. Since the p-value is smaller than 0.05, the variable Silver is significant.

For Aluminum, the t value is 3.03, and the p-value is 0.016. Since the p-value is smaller than 0.05, the variable Aluminum is significant.
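The three t tests can be summarized by computing t = Coef / SE Coef for each predictor and comparing |t| with the two-sided 5% critical value of t with n − k − 1 = 12 − 3 − 1 = 8 degrees of freedom. The critical value 2.306 comes from a standard t table, not from the Minitab output; the decisions agree with the p-value comparisons above.

```python
# (coefficient, standard error) pairs from the Coefficients table
coefficients = {
    "Copper": (0.070, 0.875),
    "Silver": (18.78, 3.80),
    "Aluminum": (3.54, 1.17),
}
t_crit = 2.306  # two-sided 5% critical value of t with 8 df, from a standard t table

results = {}
for name, (coef, se) in coefficients.items():
    t_stat = coef / se
    results[name] = abs(t_stat) > t_crit  # True means significant at 0.05
    verdict = "significant" if results[name] else "not significant"
    print(f"{name}: t = {t_stat:.2f} -> {verdict}")
```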