Description
Q1.
A study by the Ministry of Manpower found that in 2018 the median monthly salary of a woman in full-time work was 16.3 percent less than that of a man in full-time work. Jerry, a student at SUSS, would like to understand whether this is consistent with his own observations. He collected a sample of annual salary data from an online survey distributed through various social media platforms.
State the values of three measures of location and three measures of dispersion for annual salaries of both male and female workers in the sample. Explain which of the three location measures is the most appropriate.
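As a sketch of how the six measures could be computed, the Python snippet below uses the standard-library `statistics` module (Python 3.8 or later). The salary list is hypothetical, since the survey data is not reproduced here, and quartile conventions differ between software packages, so quartile-based values may not match Excel or the TI-84 exactly.

```python
import statistics

def summarise(salaries):
    """Three measures of location and three measures of dispersion."""
    q1, _, q3 = statistics.quantiles(salaries, n=4, method="inclusive")
    return {
        # Measures of location
        "mean": statistics.mean(salaries),
        "median": statistics.median(salaries),
        "mode": statistics.mode(salaries),
        # Measures of dispersion
        "range": max(salaries) - min(salaries),
        "iqr": q3 - q1,
        "std_dev": statistics.stdev(salaries),  # sample standard deviation
    }

# Hypothetical salaries, for illustration only (not Jerry's survey data)
sample_salaries = [42000, 48000, 48000, 55000, 61000, 70000, 150000]
for name, value in summarise(sample_salaries).items():
    print(f"{name:>8}: {value:,.1f}")
```

In this hypothetical sample, the single salary of 150,000 pulls the mean (about 67,714) well above the median (55,000), which illustrates why the median is often preferred as the location measure for skewed data such as salaries.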
Interpret the results to examine the gender gap in annual salaries in 2021.
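One way to put the sample gap on the same footing as the Ministry of Manpower figure is to express it as the percentage by which the female median falls below the male median, assuming the gap is measured relative to the male median as the wording "16.3 percent less than a man" implies. A minimal sketch, where the two medians are hypothetical placeholders for the sample medians:

```python
def gender_gap_percent(male_median: float, female_median: float) -> float:
    """Percentage by which the female median falls below the male median."""
    return (male_median - female_median) / male_median * 100

# Hypothetical medians, for illustration only
gap = gender_gap_percent(60000, 50220)
print(f"Gender gap: {gap:.1f}%")  # Gender gap: 16.3%
```

The Ministry of Manpower figure refers to monthly salaries in 2018, whereas Jerry's sample records annual salaries in 2021, so the comparison is indicative only.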
Describe the interquartile range and explain whether there are outliers in both male and female samples.
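The interquartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1), so it measures the spread of the middle 50% of the data. A large gap between Q1 and Q3 indicates that salaries vary widely. As observed, the ranges for the two genders are 127,200 and 118,000. Such wide ranges suggest that extreme salaries may be present, but a wide range alone does not prove that there are outliers: an observation is an outlier only if it lies below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.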
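The 1.5 × IQR rule can be sketched in Python with the standard-library `statistics.quantiles` function. The salaries are hypothetical, and the function name `iqr_outliers` is illustrative; the same check would be run separately on the male and female samples.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the IQR, the outlier fences and any values outside the fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return iqr, (lower, upper), outliers

# Hypothetical salaries, for illustration only
sample_salaries = [42000, 48000, 48000, 55000, 61000, 70000, 150000]
iqr, fences, outliers = iqr_outliers(sample_salaries)
print(iqr, fences, outliers)  # 17500.0 (21750.0, 91750.0) [150000]
```

Here only 150,000 lies above the upper fence, so it is the sole outlier even though the other salaries also differ considerably.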
Explain two concerns about the data collection step.
Question 2
a) A tennis match requires that a player win three of five sets to win the match. If a player wins the first two sets, what is the probability that the player wins the match, assuming that the player has an 80% chance of winning each set? (Binomial distribution)
Success: the player wins a set
Failure: the player loses a set
P(Success) = 0.80 and P(Failure) = 1 - 0.80 = 0.20
To win the match, the player needs to win 3 of the 5 sets.
Having won the first two sets, the player needs to win only 1 of the remaining 3 sets to win the match. Assuming the sets are independent, the player loses the match only by losing all 3 remaining sets.
P(losing all 3 remaining sets) = 0.2 * 0.2 * 0.2 = 0.008
P(winning at least one of the 3 sets) = 1 - P(losing all 3 sets) = 1 - 0.008 = 0.992
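As a check, the three calculations below should agree: the complement rule, the binomial probability of at least one win in three sets, and a direct sum over the ways the match can actually end (win set 3; lose set 3 and win set 4; lose sets 3 and 4 and win set 5). Each assumes independent sets with a constant 80% win probability.

```python
from math import comb

p_win = 0.8
p_lose = 1 - p_win

# Complement rule: imagine all three remaining sets are played
p_complement = 1 - p_lose ** 3

# Binomial: at least 1 success in n = 3 trials
p_binomial = sum(comb(3, k) * p_win**k * p_lose**(3 - k) for k in range(1, 4))

# Sequential: the match stops as soon as the player wins a third set
p_sequential = p_win + p_lose * p_win + p_lose**2 * p_win

print(round(p_complement, 3), round(p_binomial, 3), round(p_sequential, 3))  # 0.992 0.992 0.992
```

The sequential version shows why it is valid to pretend all three sets are played: stopping early does not change the probability of winning.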
Question 2(b): The mean amount of time students spend per week on the learning system is 5.4 hrs. Assume the time spent per week on the learning system follows a normal distribution with a standard deviation of 1.6 hrs (using the TI-84 calculator).
(i) Identify the share of students who spend less than 5 hrs per week on the learning system.
(ii) If 5% of students spend more than V hours per week on the learning system, what is the value of V?
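On the TI-84, part (i) corresponds to normalcdf(-1E99, 5, 5.4, 1.6), and part (ii) to invNorm(0.95, 5.4, 1.6), because 5% of students above V means 95% below V. The same calculations can be checked in Python with the standard-library `statistics.NormalDist` class:

```python
from statistics import NormalDist

time_spent = NormalDist(mu=5.4, sigma=1.6)  # weekly hours on the learning system

# (i) P(X < 5): share of students spending less than 5 hours per week
share_below_5 = time_spent.cdf(5)

# (ii) P(X > V) = 0.05 means P(X < V) = 0.95, so V is the 95th percentile
v_hours = time_spent.inv_cdf(0.95)

print(f"P(X < 5) = {share_below_5:.4f}")  # P(X < 5) = 0.4013
print(f"V = {v_hours:.2f} hours")          # V = 8.03 hours
```

So roughly 40.1% of students spend less than 5 hours per week, and V is about 8.03 hours.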
Question 4
According to the ninth World Happiness Report 2021 compiled by the Gallup World Poll team, Singapore is the happiest country in Southeast Asia, followed by Thailand. To understand how happiness is associated with age, gender and health, Mr Tan surveyed 114 people in Singapore and collected data on happiness, age, gender and self-reported health. Table 4-1 shows the variable names and notes. Table 4-2 shows the regression output obtained using …
The linear equation that shows the relationship between happiness and age, gender and health is given by:
Happiness = β_0 + β_1(Age) + β_2(Male) + β_3(Health), where Male is the gender dummy variable
From the regression output in Table 4-2, the estimated linear equation is:
Predicted Happiness = 2.508 + 0.005(Age) - 0.299(Male) + 0.234(Health)
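Each coefficient measures the expected change in happiness for a one-unit change in that variable, holding the other variables constant.
Age: holding Gender and Health constant, each additional year of age is associated with a 0.005-unit increase in predicted happiness, so happiness rises slightly with age (not falls).
Male: holding Age and Health constant, men are predicted to be 0.299 units less happy than women (assuming Male = 1 for men and 0 for women).
Health: holding Age and Gender constant, each one-unit increase in self-reported health is associated with a 0.234-unit increase in predicted happiness.
The intercept of 2.508 is the predicted happiness when all independent variables equal zero; it has no practical meaning on its own because Age = 0 lies outside the sample.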
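The coefficient interpretations can be seen as differences between predictions from the estimated equation when one variable changes and the others stay fixed. A minimal sketch, assuming Male is coded 1 for men and 0 for women; the function name and the respondent profiles (age 40, health score 3) are hypothetical.

```python
def predicted_happiness(age: float, male: int, health: float) -> float:
    """Estimated equation: 2.508 + 0.005*Age - 0.299*Male + 0.234*Health."""
    return 2.508 + 0.005 * age - 0.299 * male + 0.234 * health

# Change one variable at a time, holding the other two fixed
age_effect = predicted_happiness(41, 1, 3) - predicted_happiness(40, 1, 3)
gender_effect = predicted_happiness(40, 1, 3) - predicted_happiness(40, 0, 3)
health_effect = predicted_happiness(40, 1, 4) - predicted_happiness(40, 1, 3)

print(f"{age_effect:+.3f} {gender_effect:+.3f} {health_effect:+.3f}")  # +0.005 -0.299 +0.234
```

Because the model is linear, these differences equal the coefficients regardless of the baseline profile chosen.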
Question 4(a): Describe the relationship between happiness and age, gender and health by writing down the linear equation. Interpret the coefficients to show how each of the independent variables is associated with happiness.
Question 4(c): Execute appropriate hypothesis tests to advise Mr Tan which variables are significant at the 5% significance level. Write down the steps.
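For each variable j the usual steps are: state H0: β_j = 0 against H1: β_j ≠ 0; compute t = b_j / se(b_j) with df = n - k - 1 = 114 - 3 - 1 = 110; and reject H0 if the two-sided p-value is below 0.05 (equivalently, if |t| exceeds the critical value of about 1.98). If Table 4-2 already reports p-values, the decision is simply p < 0.05. The sketch below computes the p-value with only the standard library by integrating the t density with Simpson's rule. The standard errors are not reproduced in the source, so the values 0.05 and 0.01 used in the example are hypothetical, as are the function names.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t_stat|), integrating the density over [0, |t_stat|] by Simpson's rule."""
    x = abs(t_stat)
    if x == 0:
        return 1.0
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t_stat|)
    return max(0.0, 2 * (0.5 - area))

def t_test(coef: float, se: float, n: int, k: int, alpha: float = 0.05):
    """Test H0: beta_j = 0 against H1: beta_j != 0 for one regression coefficient."""
    df = n - k - 1
    t_stat = coef / se
    p = two_sided_p_value(t_stat, df)
    return t_stat, p, p < alpha  # reject H0 (variable significant) if p < alpha

# Health coefficient from the estimated equation, with a HYPOTHETICAL standard error
t_stat, p, significant = t_test(0.234, 0.05, n=114, k=3)
print(f"t = {t_stat:.2f}, p = {p:.4f}, significant: {significant}")
```

Simpson's rule is used here only to keep the example dependency-free; a statistics package or the TI-84 tcdf function would give the same decision.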
Question 4(d): To test for multicollinearity, that is, whether health is correlated with age and gender, Mr Tan ran a regression of health on age and male and calculated a variance inflation factor (VIF) of 1.032. Interpret the result and comment on multicollinearity.
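The VIF for Health is VIF = 1 / (1 - R²), where R² comes from the auxiliary regression of Health on Age and Male, so inverting it shows how much of the variation in Health the other two variables explain. A minimal sketch; the threshold of 5 is a common rule of thumb (some texts use 10), not a fixed standard.

```python
def r_squared_from_vif(vif: float) -> float:
    """Invert VIF = 1 / (1 - R^2) to recover the auxiliary regression's R^2."""
    return 1 - 1 / vif

vif_health = 1.032
r2_health = r_squared_from_vif(vif_health)

print(f"R^2 of Health on Age and Male = {r2_health:.3f}")  # R^2 of Health on Age and Male = 0.031
# Rule of thumb: VIF above 5 (or 10) signals problematic multicollinearity
print("Serious multicollinearity" if vif_health > 5 else "No serious multicollinearity")
```

Age and Male explain only about 3.1% of the variation in Health, and a VIF of 1.032 is close to the minimum possible value of 1, so multicollinearity is not a concern in this model.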
Question 4(e): What are the assumptions required for multiple linear regression? How should Mr Tan verify the assumptions? (5 marks)