MS9002-Data Mining Techniques Using Knime/Weka Assignment 2

MS9002: Data Mining Techniques

Assignment 2

Instructions
You should hand in a report of no more than 20 pages in length, inclusive of
charts (if any), together with your KNIME workflows, through Politemall’s
assignment submission portal. Submit your files in .doc, .docx or .knwf format.
Late submissions will be penalised.

Task description
The data provided were used to study credit card default behaviour in Taiwan. The
variables in this dataset are described below.
The original study used a binary variable, default payment (Yes = 1, No = 0), as
the response variable. After reviewing the literature, it used the following
23 variables as explanatory variables:

ID   Name       Description

X1   LIMIT_BAL  Amount of given credit in NT dollars (includes individual and
                family/supplementary credit)
X2   SEX        Gender (1=male, 2=female)
X3   EDUCATION  Education (1=graduate school, 2=university, 3=high school, 4=others)
X4   MARRIAGE   Marital status (1=married, 2=single, 3=others)
X5   AGE        Age in years
X6   PAY_0      Repayment status in September 2005
X7   PAY_2      Repayment status in August 2005
X8   PAY_3      Repayment status in July 2005
X9   PAY_4      Repayment status in June 2005
X10  PAY_5      Repayment status in May 2005
X11  PAY_6      Repayment status in April 2005
X12  BILL_AMT1  Amount of bill statement in September 2005
X13  BILL_AMT2  Amount of bill statement in August 2005
X14  BILL_AMT3  Amount of bill statement in July 2005
X15  BILL_AMT4  Amount of bill statement in June 2005
X16  BILL_AMT5  Amount of bill statement in May 2005
X17  BILL_AMT6  Amount of bill statement in April 2005
X18  PAY_AMT1   Amount of previous payment in September 2005
X19  PAY_AMT2   Amount of previous payment in August 2005
X20  PAY_AMT3   Amount of previous payment in July 2005
X21  PAY_AMT4   Amount of previous payment in June 2005
X22  PAY_AMT5   Amount of previous payment in May 2005
X23  PAY_AMT6   Amount of previous payment in April 2005

Default payment next month: Default payment next month (1=yes, 0=no)

The measurement scale for the repayment status is:
-2  balance paid in full and no transactions this period
-1  balance paid in full, but account has a positive balance at the end of the
    period due to recent transactions for which payment is not yet due
 0  customer paid the minimum due amount but not the entire balance
 1  payment delay for one month
 2  payment delay for two months
 ...
 8  payment delay for eight months
 9  payment delay for nine months and above
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
0 = Customer paid the minimum due amount but not the entire balance, i.e., the
customer paid enough for the account to remain in good standing but left a
balance.
-1 = Balance paid in full, but account has a positive balance at the end of the
period due to recent transactions for which payment is not yet due.
-2 = Balance paid in full and no transactions this period (we may refer to this
credit account as having been ‘inactive’ this period).
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
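The repayment status scale lends itself to simple lookup and summary features. The following is a minimal Python sketch for illustration only, not a required part of the assignment. The helper names `describe_status` and `delay_features`, the feature names `max_delay`, `months_delayed` and `months_inactive`, and the sample record are all hypothetical. It also assumes that codes 3 to 7 follow the same pattern as codes 1, 2 and 8.

```python
# Column names follow the variable table; everything else here is illustrative.
PAY_COLUMNS = ["PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"]


def describe_status(code: int) -> str:
    """Translate a repayment status code into the wording of the scale."""
    if code == -2:
        return "paid in full, no transactions this period"
    if code == -1:
        return "paid in full, positive balance from recent transactions"
    if code == 0:
        return "paid minimum due, balance remaining"
    if 1 <= code <= 8:
        # Assumption: codes 3 to 7 mean a delay of that many months.
        return f"payment delay of {code} month(s)"
    if code == 9:
        return "payment delay of nine months or more"
    raise ValueError(f"unknown repayment status code: {code}")


def delay_features(record: dict) -> dict:
    """Derive hypothetical summary features from the six repayment status fields."""
    codes = [record[col] for col in PAY_COLUMNS]
    return {
        "max_delay": max(max(codes), 0),  # longest delay in months, 0 if never late
        "months_delayed": sum(1 for c in codes if c >= 1),
        "months_inactive": sum(1 for c in codes if c == -2),
    }


# Made-up record for illustration only.
sample = {"PAY_0": 2, "PAY_2": 2, "PAY_3": 0, "PAY_4": -1, "PAY_5": -2, "PAY_6": -2}
features = delay_features(sample)
print(describe_status(sample["PAY_0"]))
print(features)
```

Whether such derived fields actually help predict default is something to test and justify in the report. In KNIME the same logic could be built with rule or math nodes rather than code.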
Your task
1. Load Assignment 2 Data.arff using the ARFF Reader node and ensure the headers
are defined. Evaluate and compare at least two predictive models for predicting
customer default. Applying the same technique through both WEKA and H2O will
not be accepted as two separate models.
2. Study and report on the risk factors associated with credit default.
Possible risk factors include the features/attributes of the data set,
including the variables relating to payment history and status (variables
X6-X23).
3. Note that you may also derive additional features from the underlying
dataset and study their relation to default risk. You must explain why any
additional fields are required. If you have dropped any attribute(s), also
explain why these attributes are redundant. You must describe in detail the
techniques you have adopted for data cleaning, transformation, conditioning,
etc.
4. You will be assessed on whether you have chosen a suitable evaluation
metric, on your optimisation, and on the persuasiveness of your
recommendation. Your report must include a final predictive model together
with a profile of a typical credit card defaulter. Create a cost matrix
scenario for the chosen prediction model and use the cost matrix as part of
your evaluation.
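
As an illustration of how a cost matrix can enter the evaluation in task 4, the following is a minimal Python sketch. The cost values, the function names `confusion_counts` and `total_cost`, and the two sets of predictions are hypothetical and chosen only to show the mechanics. In your report the costs should come from a scenario you can justify.

```python
# Hypothetical costs in arbitrary units, keyed by (actual, predicted); 1 = default.
COST_MATRIX = {
    (0, 0): 0.0,  # non-defaulter correctly identified
    (0, 1): 1.0,  # non-defaulter flagged as defaulter (e.g. lost business)
    (1, 0): 5.0,  # defaulter missed (e.g. credit loss)
    (1, 1): 0.0,  # defaulter correctly flagged
}


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) pair occurs."""
    counts = {key: 0 for key in COST_MATRIX}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts


def total_cost(actual, predicted, cost_matrix=COST_MATRIX):
    """Weight each cell of the confusion matrix by its cost and sum."""
    counts = confusion_counts(actual, predicted)
    return sum(counts[key] * cost_matrix[key] for key in counts)


# Made-up labels and predictions for two hypothetical models.
actual = [1, 0, 0, 1, 0, 1, 0, 0]
model_a = [1, 0, 0, 0, 0, 1, 0, 0]  # misses one defaulter
model_b = [1, 1, 0, 1, 1, 1, 0, 0]  # raises two false alarms

print(total_cost(actual, model_a))
print(total_cost(actual, model_b))
```

With these made-up numbers, model A is more accurate (7 of 8 correct against 6 of 8) yet incurs the higher total cost, because a missed defaulter is assumed to cost five times as much as a false alarm. This is why the choice of evaluation metric and cost scenario needs to be argued rather than defaulting to accuracy.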