University of Huddersfield Data Mining and Data Report

Published by: Admin
Last Updated: 10-May-24
Q1. Study the dataset: find its size and the number of variables, and describe the variable types. Check whether any data is missing (if so, apply an appropriate cleaning technique). Perform a descriptive statistical analysis of the dataset: choose a range of variables of interest and find their frequencies and dependencies through bar plots, grouped bar plots, pie charts, etc. Draw conclusions.
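The inspection and cleaning steps above could be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame is a small synthetic stand-in, the column names (sex, studytime, absences, G3) and the commented file name are guesses at what a UCI student dataset with a G3 grade might contain, and median imputation is just one of several possible cleaning choices.

```r
# Hypothetical stand-in for the real data. With the actual file one might use
# something like (file name and separator are assumptions):
# students <- read.csv("student-mat.csv", sep = ";", stringsAsFactors = TRUE)
set.seed(1)
n <- 200
students <- data.frame(
  sex       = factor(sample(c("F", "M"), n, replace = TRUE)),
  studytime = sample(1:4, n, replace = TRUE),
  absences  = rpois(n, 5),
  G3        = pmin(20, pmax(0, round(rnorm(n, 11, 4))))
)
students$absences[c(3, 17)] <- NA   # inject two missing values for illustration

dim(students)                       # size: number of rows and columns
str(students)                       # type of each variable
missing_per_column <- colSums(is.na(students))
print(missing_per_column)

# One simple cleaning choice: replace missing numeric values with the median
students$absences[is.na(students$absences)] <-
  median(students$absences, na.rm = TRUE)

# Frequencies of study time within each sex, shown as a grouped bar plot
studytime_by_sex <- table(students$sex, students$studytime)
barplot(studytime_by_sex, beside = TRUE, legend.text = TRUE,
        xlab = "Weekly study time (1-4)", ylab = "Number of students",
        main = "Figure 1: Study time by sex")
```

If many values were missing, or they were not missing at random, dropping rows with na.omit() or a model-based imputation might be a better fit than the median.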

Advanced: Perform a factor analysis. Comment on your findings.
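A factor analysis of the numeric variables could be run with factanal() from base R's stats package. The sketch below is an assumption-laden illustration: the data are synthetic, generated from two made-up latent factors, and the column names (G1, G2, G3, goout, Dalc, Walc) are hypothetical stand-ins for numeric columns of the real dataset. The choice of two factors and a varimax rotation is also just an example.

```r
# Synthetic numeric data driven by two hypothetical latent factors
set.seed(42)
n <- 300
academic <- rnorm(n)   # hypothetical latent factor
social   <- rnorm(n)   # hypothetical latent factor
scores <- data.frame(
  G1    = 0.8 * academic + rnorm(n, sd = 0.6),
  G2    = 0.8 * academic + rnorm(n, sd = 0.6),
  G3    = 0.7 * academic + rnorm(n, sd = 0.7),
  goout = 0.7 * social   + rnorm(n, sd = 0.7),
  Dalc  = 0.8 * social   + rnorm(n, sd = 0.6),
  Walc  = 0.8 * social   + rnorm(n, sd = 0.6)
)

# Maximum-likelihood factor analysis with two factors and varimax rotation
fa <- factanal(scores, factors = 2, rotation = "varimax")

print(fa$loadings, cutoff = 0.3)   # hide small loadings for readability
print(fa$uniquenesses)             # variance not explained by the factors
```

Variables that load strongly on the same factor can be interpreted together; on real data the number of factors would normally be chosen with a scree plot or the test statistic that factanal() reports.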

(The dataset comes from the UCI repository.)

Q2. Split the dataset into training and testing parts. Build a Random Forest Regression model (using the randomForest R library) to predict the final year grade (G3). Evaluate your model using the test dataset.

Plot an importance graph. Estimate accuracy. Comment on your results.
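The regression task might look roughly like the sketch below. It assumes the randomForest package is installed, uses synthetic data with hypothetical predictor names in place of the real dataset, and picks a 70/30 split and 500 trees purely for illustration. Since "accuracy" is not defined for regression in the brief, test-set RMSE and R-squared are used here as one reasonable reading.

```r
library(randomForest)  # assumes the randomForest package is installed

# Synthetic stand-in data; column names are hypothetical
set.seed(123)
n <- 300
students <- data.frame(
  studytime = sample(1:4, n, replace = TRUE),
  failures  = sample(0:3, n, replace = TRUE, prob = c(0.7, 0.15, 0.1, 0.05)),
  absences  = rpois(n, 5),
  G1        = round(runif(n, 3, 19))
)
students$G2 <- pmin(20, pmax(0, round(students$G1 + rnorm(n, 0, 1.5))))
students$G3 <- pmin(20, pmax(0, round(students$G2 + rnorm(n, 0, 1.5))))

# 70/30 train/test split (the ratio is an assumption)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- students[train_idx, ]
test  <- students[-train_idx, ]

# Random Forest regression of G3 on all other columns
rf_reg <- randomForest(G3 ~ ., data = train, ntree = 500, importance = TRUE)

# Evaluate on the held-out test set
pred <- predict(rf_reg, newdata = test)
rmse <- sqrt(mean((test$G3 - pred)^2))
r_squared <- 1 - sum((test$G3 - pred)^2) / sum((test$G3 - mean(test$G3))^2)
cat("Test RMSE:", rmse, " Test R-squared:", r_squared, "\n")

# Importance graph and the underlying importance table
varImpPlot(rf_reg, main = "Figure 2: Variable importance for G3")
print(importance(rf_reg))
```

On the real data, G1 and G2 are likely to dominate the importance graph if they are kept as predictors, so it may be worth commenting on a second model fitted without them.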

Advanced: Divide the students into 3 categories based on the final grade: poor achieving, average achieving, and well achieving students. Build a classification Random Forest model. Evaluate your model using the test dataset. Print the confusion matrix. Draw conclusions.
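The classification variant could be sketched as below, again on synthetic data with hypothetical column names and assuming the randomForest package is installed. The grade cut-offs (0-9 poor, 10-14 average, 15-20 well) are an assumption, since the brief does not specify them.

```r
library(randomForest)  # assumes the randomForest package is installed

# Synthetic stand-in data; column names are hypothetical
set.seed(321)
n <- 300
students <- data.frame(
  studytime = sample(1:4, n, replace = TRUE),
  absences  = rpois(n, 5),
  G1        = round(runif(n, 3, 19))
)
students$G2 <- pmin(20, pmax(0, round(students$G1 + rnorm(n, 0, 1.5))))
students$G3 <- pmin(20, pmax(0, round(students$G2 + rnorm(n, 0, 1.5))))

# Assumed cut-offs: 0-9 poor, 10-14 average, 15-20 well
students$level <- cut(students$G3, breaks = c(-Inf, 9, 14, Inf),
                      labels = c("poor", "average", "well"))
students$G3 <- NULL  # drop G3 so the class is not predicted from itself

# 70/30 train/test split (the ratio is an assumption)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- students[train_idx, ]
test  <- students[-train_idx, ]

# A factor response makes randomForest() fit a classification forest
rf_cls <- randomForest(level ~ ., data = train, ntree = 500)
pred <- predict(rf_cls, newdata = test)

# Confusion matrix and overall accuracy on the test set
conf_mat <- table(Predicted = pred, Actual = test$level)
print(conf_mat)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
cat("Test accuracy:", accuracy, "\n")
```

Reading the off-diagonal cells of the confusion matrix shows which categories are confused with each other, which is usually more informative than the overall accuracy alone.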

Recommended reading: Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324

Structure of the evaluative report:

1. Cover page with your name, the name of the chosen dataset and the corresponding Data Mining method.

2. Introduction containing a short description of the chosen method.

3. Answers to the stated questions and conclusions.

4. A literature review, which should include a reference to the original method, its extensions and improvements (if applicable) and a few recent applications of the method. You must use APA 7th for referencing.

5. Appendix, which must include the R commands you used in your analysis.

An exported R file is also required.

All plots, figures and graphs must be numbered and clearly labelled.