MS9003 Applied Statistical Methods
Official (Closed), Non-Sensitive
Assignment 2b
This is an individual assignment. Do not share your work with others, and submit only
your own work. Plagiarism will result in a loss of marks.
Question 1 (25 marks)
All necessary graphs and output are to be included in your answers.
A research company was tasked by a government agency to analyse how 229 applicants for
unemployment benefits felt about the service they have received from the agency. The
company created a survey consisting of nine statements that the applicants were asked to
indicate their agreement with. Each statement response is on a scale of 1 to 7, with 1
indicating “strongly disagree” and 7 indicating “strongly agree”. The data is in worksheet Q1
of the Excel file Data_2b.xlsx, with each row representing the responses of one applicant.
The nine statements are indicated by S1 to S9 in the Excel file. The full statements are given
in the table below.
Variable Full statement
S1 My privacy was taken into account.
S2 I received clear information about my unemployment benefits.
S3 The reception desk staff were friendly.
S4 I felt that I was taken seriously.
S5 It is clear to me what my rights are.
S6 It is easy to find information regarding my unemployment benefits.
S7 I have been told clearly how my application process will continue.
S8 I know who can answer my questions about my unemployment benefits.
S9 The letters I receive have an appropriate tone of voice.
(a) The research company has hypothesized that there are two underlying factors that can
explain the applicants’ responses to the nine statements. Perform a two-factor
principal component factor analysis on the dataset, and rotate the factors using
varimax rotation. Ensure that “Correlation” is selected for “Matrix to Factor” in the
“Options…” menu. Generate both the loading plot and the score plot.
(b) Reproduce both the unrotated and rotated factor loadings and communalities output
from Minitab. Compare and contrast the two sets of values, making sure to state any
similarities and differences you observe between them.
(c) Would you recommend that the factors be rotated in this analysis? Explain.
(d) Using the generated output, classify the nine statements into two groups, with the
statements in each group being explained by the same underlying factor.
(e) Based on how you have grouped the statements in part (d), suggest a suitable
descriptor for each of the two underlying factors.
(f) On the score plot in part (a), mark out the area of the plot that contains applicants with
high overall satisfaction with the service they have received. Indicate this area with a
. Similarly, mark out the area of the plot that contains applicants with low overall
satisfaction. Indicate this area with a . Briefly explain your answers.
(g) Give two reasons as to why the factor analysis model may be improved by including
more than two common factors.
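The extraction and rotation steps in Question 1 can be illustrated outside Minitab. The following is a minimal sketch, assuming NumPy is available and using synthetic data in place of worksheet Q1. The function names `pc_loadings` and `varimax`, the loading pattern in `true_loadings`, and the random data are all hypothetical. It uses raw varimax without any normalization, so signs, column order, and exact values may differ from Minitab's output.

```python
import numpy as np


def pc_loadings(data, n_factors=2):
    """Principal component loadings taken from the correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_factors]  # largest eigenvalues first
    return eigvecs[:, order] * np.sqrt(eigvals[order])


def varimax(loadings, max_iter=500, tol=1e-10):
    """Raw varimax: find an orthogonal rotation that simplifies the columns."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        column_ss = np.diag((rotated ** 2).sum(axis=0))
        gradient = loadings.T @ (rotated ** 3 - rotated @ column_ss / p)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt  # product of orthogonal matrices, so orthogonal
        new_criterion = s.sum()
        if new_criterion <= criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation


# Synthetic stand-in for the survey: 229 rows, 9 variables, 2 latent factors.
# The loading pattern is invented and says nothing about the real statements.
rng = np.random.default_rng(0)
factors = rng.normal(size=(229, 2))
true_loadings = np.array([[0.8, 0.1]] * 5 + [[0.1, 0.8]] * 4)
data = factors @ true_loadings.T + 0.5 * rng.normal(size=(229, 9))

unrotated = pc_loadings(data)
rotated = varimax(unrotated)

# Communality of each variable = sum of its squared loadings across factors.
communality_unrotated = (unrotated ** 2).sum(axis=1)
communality_rotated = (rotated ** 2).sum(axis=1)
```

Printing `unrotated`, `rotated`, and the two communality vectors side by side gives the same kind of comparison that part (b) asks for, although only the Minitab output on the real data answers the question.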
Question 2 (18 marks)
A sample of size n = 150 was taken of random variables X1 and X2 from a population with
three groups (Class = A, B, and C). The objective is to find a relationship between X1 and
X2, and the grouping of the population. The data is in worksheet Q2 of the Excel file
Data_2b.xlsx.
(a) Linear discriminant analysis was carried out, resulting in the following unstandardized
discriminant functions. Group centroids are also given. Show step-by-step how to use
the discriminant functions to classify the observation X1 = 0.582, X2 = 3.617.
D1 = 2.23 x1 − 1.43 x2 + 3.65
D2 = 0.117 x1 + 0.344 x2 − 0.152
(b) Use Minitab to perform linear discriminant analysis on the data. Do not specify prior
probabilities. Complete the following classification matrix (showing counts). Calculate
the misclassification rate of the model on the sample data.
                 Actual
Predicted     A     B     C
A
B
C
Misclassification rate: ___________
(c) Write down the classification functions necessary for classifying new observations.
Show how to use these functions to classify the observation X1 = 2.742, X2 = 0.411.
Group Centroids
Group    D1        D2
A        −0.647    0.896
B        −2.89     −0.583
C        3.54      −0.313
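The nearest-centroid step in Question 2(a) and the misclassification rate in part (b) can be sketched in a few lines. This is an illustration only: it uses the coefficients and centroids printed in the question, and it assumes that an observation is assigned to the group whose centroid is closest in Euclidean distance in the (D1, D2) plane, which corresponds to equal prior probabilities. The function names and the example confusion matrix counts are hypothetical.

```python
import math

# Group centroids in discriminant space, copied from the table in the question.
CENTROIDS = {
    "A": (-0.647, 0.896),
    "B": (-2.89, -0.583),
    "C": (3.54, -0.313),
}


def discriminant_scores(x1, x2):
    """Evaluate the two unstandardized discriminant functions from part (a)."""
    d1 = 2.23 * x1 - 1.43 * x2 + 3.65
    d2 = 0.117 * x1 + 0.344 * x2 - 0.152
    return d1, d2


def classify(x1, x2):
    """Return the nearest group and the distance to every centroid.

    Assumption: equal priors, so the closest centroid wins.
    """
    d1, d2 = discriminant_scores(x1, x2)
    distances = {
        group: math.hypot(d1 - c1, d2 - c2)
        for group, (c1, c2) in CENTROIDS.items()
    }
    return min(distances, key=distances.get), distances


def misclassification_rate(matrix):
    """Share of off-diagonal counts in a square classification matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


group, distances = classify(0.582, 3.617)
print(group, {g: round(d, 3) for g, d in distances.items()})

# Hypothetical counts, NOT the Minitab result for worksheet Q2.
example_matrix = [[48, 2, 0], [1, 47, 2], [1, 1, 48]]
print(misclassification_rate(example_matrix))
```

Part (c) asks for the linear classification functions that Minitab reports, which are a different set of functions from the discriminant functions used here, so this sketch does not replace that output.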
Question 3 (17 marks)
A marketing analyst has data on the following attributes for 20 brands of canned beer:
calories (per 12 oz.), sodium (mg/12 oz.), alcohol (%), and wholesale price ($ per 12 oz.).
The analyst would like to segment the brands into groups with similar attributes to aid in
product marketing. The data is in worksheet Q3 of the Excel file Data_2b.xlsx.
(a) Explain why the data should be standardized before it is clustered.
(b) Use Minitab to perform hierarchical clustering using complete linkage and Euclidean
distance. Include all four quantitative variables, and select the option to standardize the
variables.
Review the table of “Amalgamation Steps”, and explain why k = 4 may be a good
choice for the number of clusters.
(c) Rerun the hierarchical clustering with the selected number of clusters (k = 4). Fill in
the following table (centroids to 2 decimal places).
Cluster Centroids
Variable Cluster1 Cluster2 Cluster3 Cluster4
Calories
Sodium
Alcohol
Price
Cluster Size
(d) Match the following cluster names to the four clusters. Justify your answer using
results of part (c).
(i) “Extra Light” beers (very low calories, very low alcohol content)
(ii) “Light” beers (low calories, low alcohol content)
(iii) “Premium” beers (based on price)
(iv) “Mass market” beers (standard beers with affordable prices)
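The workflow in Question 3 (standardize, cluster with complete linkage, inspect the merge distances, then summarize each cluster) can be sketched as follows. This is a naive illustration, assuming NumPy is available. The data are invented stand-ins for worksheet Q3, the function names are hypothetical, and standardization by the sample standard deviation is an assumption about what Minitab does.

```python
import numpy as np

# Hypothetical data: 4 groups x 5 brands; columns are calories, sodium,
# alcohol, price. These numbers are invented for illustration only.
CENTERS = np.array([
    [70.0, 8.0, 2.3, 0.40],
    [100.0, 12.0, 4.0, 0.45],
    [150.0, 18.0, 4.8, 0.78],
    [148.0, 20.0, 4.7, 0.44],
])
OFFSETS = np.arange(-2, 3)[:, None] * np.array([1.0, 0.3, 0.03, 0.005])
beers = np.vstack([center + OFFSETS for center in CENTERS])  # shape (20, 4)


def standardize(x):
    """Z-scores per column, so no variable dominates because of its units."""
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)


def complete_linkage(x, k):
    """Naive agglomerative clustering down to k clusters.

    Returns (labels, merge_distances); cluster distance is the largest
    pairwise Euclidean distance between members (complete linkage).
    """
    n = len(x)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    clusters = [[i] for i in range(n)]
    merge_distances = []
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].max()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merge_distances.append(d)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(n, dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels, merge_distances


z = standardize(beers)

# Merging all the way to one cluster gives the full list of merge distances,
# comparable to the distance column of an amalgamation table.
_, merge_distances = complete_linkage(z, k=1)

# Cut at four clusters and report centroids in the original units.
labels, _ = complete_linkage(z, k=4)
centroids = np.array([beers[labels == c].mean(axis=0) for c in range(4)])
sizes = np.bincount(labels)
print(np.round(centroids, 2), sizes)
```

A large jump in `merge_distances` between consecutive steps is the usual signal for where to stop merging. For real work, `scipy.cluster.hierarchy.linkage` with `method="complete"` followed by `fcluster` would be the more idiomatic route than this hand-written loop.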