Educational Data Mining (EDM) is the application of data mining (DM) techniques, many of them drawn from machine learning, with the objective of improving the process of education and the outcomes of educational systems. DM is the process of extracting and interpreting useful information from data; it has long been used in finance, science, commerce, and other fields for decision making and related purposes. DM is used extensively in the retail sector, for example in preparing market research reports (Han, Kamber and Pei, 2016).
EDM explores interesting and unique data on educational interactions and settings in order to understand clearly where and what improvements are necessary (Barker and Yacef, 2019). Two trends are clearly visible in today’s educational system: one is the increased use of software and large databases, which is changing the learning process, and the other is e-learning, in which the internet gives access to many kinds of educational interaction. In EDM, raw data are extracted, processed, and developed into useful information for educational software developers, teachers, academic researchers, and other stakeholders of the education system. EDM thus forms a cycle: it collects raw data from the education system, processes the data into knowledge, and feeds that knowledge back into the system for its improvement.
Article 1: Early Prediction of Students Performance using Machine Learning Techniques, by Anal Acharya and Devadatta Sinha.
EDM converts raw data into knowledge that can be used to evaluate the education system, to make qualitative improvements to it, and to prepare the ground for future research. DM was first implemented with great success in e-commerce, but not much research has been undertaken on its application in the education system. In higher education, particularly in medicine and engineering, the authorities need to know the ability of their students in order to gauge who is capable of scoring high and eligible for merit scholarships, and who is less likely to achieve distinction in examinations, so that the system can be made more inclusive.
In this study, 5 machine learning algorithm (MLA) classifiers are chosen, each representing a different group of algorithms. The reason for choosing different MLAs is to bring in variation, so that the MLAs do not produce identical results. This is a sensible precaution on the part of the authors. The college course requires a student to appear in six semesters, each divided into two examinations. The authors selected semester 1 for the purpose of result prediction. A questionnaire was prepared for each group, and data were collected from the principal, faculty members, students, and selected literature.
The authors used 14 independent variables and 1 dependent variable. A 15-variable model is difficult to fit well because of computational cost and the burden of excess data. To deal with these problems, the authors studied filter-based and wrapper-based techniques and their attributes in detail before choosing a feature selection technique. The feature selection techniques used in the study are correlation based feature selection (CBFS), information gain attribute evaluation (IGATE), and chi-square based feature selection (CSBFS). The MLAs used in the study include Naive Bayes, 1-Nearest Neighbour, and C4.5. The authors observed that the CBFS algorithm gave the best prediction. To keep the results uniform, the authors used the CBFS algorithm with the 8 highest-ranking features to select the classifier for each of the examination papers.
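The idea behind filter-based feature selection can be illustrated with a short sketch. The example below ranks categorical attributes by information gain, one of the three techniques named above (it is not CBFS, which the authors found to work best). The attribute names, records, and the helper top_k_features are hypothetical and chosen only for illustration; the study itself kept the 8 highest-ranking features.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_k_features(rows, labels, k):
    """Rank feature names by information gain and keep the k best."""
    names = list(rows[0].keys())
    gains = {name: information_gain([r[name] for r in rows], labels) for name in names}
    return sorted(gains, key=gains.get, reverse=True)[:k]


# Hypothetical toy records; attribute names and values are invented.
rows = [
    {"attendance": "high", "gender": "f", "tuition": "yes"},
    {"attendance": "high", "gender": "m", "tuition": "no"},
    {"attendance": "low", "gender": "f", "tuition": "yes"},
    {"attendance": "low", "gender": "m", "tuition": "no"},
]
labels = ["pass", "pass", "fail", "fail"]

best = top_k_features(rows, labels, k=1)
print(best)
```

In this toy data only "attendance" separates the two classes, so it is the one attribute retained. A filter of this kind scores each attribute independently of any classifier, which is what keeps it cheap compared with wrapper-based methods.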
The existing literature uses MLAs to classify students’ data. The debate remains whether they can be used efficiently in educational settings, given the large volumes of data and the large number of features involved.
Article 2: Predicting Students’ Academic Performance using Artificial Neural Network: A Case Study of an Engineering Course by V.O. Oladokun, Ph.D., A.T. Adebanjo, B.Sc., and O.E. Charles-Owaba, Ph.D.
The graduation scores of students at some Nigerian universities have been a matter of widespread concern. The root of such underperformance could be traced to flaws in the admission examination system. In this study the authors developed an Artificial Neural Network (ANN) system to predict the performance of a candidate in the admission test. The input variables used in the model included secondary level examination scores, family background, schools attended, combination of subjects, and gender. A model based on the Multilayer Perceptron topology was developed with data on 5 generations of Ibadan University graduates.
The purpose of admission tests is to screen applicants and admit those who can later live up to the expectations of the university and the broader community. The quality of research and training in a university depends upon the quality of the students admitted. It has been observed in the recent past that the quality of students graduating from Ibadan University, Nigeria’s first university, is abysmally low. The main cause of this deterioration was attributed to the growing gap between the number of students seeking admission and the number of approved seats in the universities. As a result, students manipulate the system to gain admission, which has led to a number of admission-related frauds. Against this background, it was felt necessary to improve the admission test system in order to restore the past glory of Nigeria’s premier university.
Complaints against the traditional system of admission tests in Nigerian universities are nothing new, and the phenomenon is visible in many parts of the world where university authorities are not satisfied with the quality of education provided. While the authorities in many developing countries have embraced advances in information technology in an effort to improve students’ learning process and outcomes, Nigeria has been left behind in using technology to improve the quality of educational settings. This study was undertaken to explore ways to close the loopholes in the system and make it more robust.
The authors undertook extensive research, including interviews and discussions with subject matter experts. Through these interactions, certain social, economic, biological, and environmental factors were identified which, according to academicians, certainly affect the performance of students. These data were then structured to develop the ANN model. The factors were put into the model as independent inputs, and the performance of students under the present ranking system served as the output.
After data collection and selection of the training method, the topology of the network has to be determined. The topology is the arrangement of the network, that is, how its units are organised and connected. There are numerous topologies, and selecting one is a difficult task (Bose and Liang, 2016). After careful consideration of the advantages and disadvantages of each, the Multilayer Perceptron, the recurrent network, and the time-lagged recurrent network were considered.
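To make the notion of topology concrete, the following is a minimal sketch of a Multilayer Perceptron forward pass, in which the topology is simply a list of layer sizes. The sizes [5, 4, 3] (five inputs, one hidden layer of four units, three output classes), the sigmoid activation, and the function names are assumptions made for illustration; the source does not report the layer sizes the authors used, and the weights here are random rather than trained.

```python
import math
import random


def make_network(layer_sizes, seed=0):
    """Create random weights for a fully connected feed-forward network.

    layer_sizes, e.g. [5, 4, 3], is the topology: inputs, hidden units, outputs.
    """
    rng = random.Random(seed)
    network = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        # One row of weights per unit; the last weight in each row is the bias.
        layer = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_out)]
        network.append(layer)
    return network


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(network, inputs):
    """Propagate one input vector through every layer in turn."""
    activations = list(inputs)
    for layer in network:
        activations = [
            sigmoid(sum(w * a for w, a in zip(unit[:-1], activations)) + unit[-1])
            for unit in layer
        ]
    return activations


# Hypothetical topology and a hypothetical, already-scaled input vector.
network = make_network([5, 4, 3])
outputs = forward(network, [0.1, 0.5, 0.9, 0.0, 1.0])
print(len(outputs))
```

Changing the list passed to make_network changes the topology without touching the forward pass, which is why topology selection can be treated as a separate design decision from training.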
The network correctly predicted 82% of the good cases (students securing a 1st class or upper 2nd class), 53.3% of the average cases (students securing a 2nd class or lower result), and 88% of the bad cases (3rd class or pass-only scores). Overall, the ANN model was successful in predicting about 74% of the examination results.
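The summary does not state how the overall 74% figure was obtained. One reading that is consistent with the reported numbers is an unweighted mean of the three per-class accuracies, as the short check below shows; this is an assumption, not a calculation described in the source.

```python
# Per-class accuracies (in percent) as reported above.
per_class_accuracy = {"good": 82.0, "average": 53.3, "bad": 88.0}

# Unweighted mean: every class counts equally, regardless of its size.
overall = sum(per_class_accuracy.values()) / len(per_class_accuracy)
print(round(overall, 1))  # 74.4
```

If the three classes contained different numbers of students, a mean weighted by class size would generally give a different overall figure.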
The study was fairly successful in constructing an ANN model for predicting the examination results of students. The model can therefore be used to predict the admission test results of students in Nigerian universities. One limitation of the model is that the input data are obtained from records supplied by the students before the admission process starts, and these data may not have been supplied correctly. The model can now be extended to non-engineering admission tests, and it can be used to predict a student’s performance in graduate level admission tests. The study shows that the system of admission should be improved, and that the data of students taking the admission test could be fed into the model in order to admit students with better potential.
Article 3: A Comparative Analysis of Techniques for Predicting Academic Performance by Nguyen Thai Nghe, Paul Janecek, and Peter Haddawy, Computer Science and Information Management Program, Asian Institute of Technology (AIT), Thailand 12120
In an educational setup with limited financial and infrastructural resources, it is imperative to predict the performance of students with a high degree of accuracy. There are two reasons for this: firstly, the university has to make provision for awarding distinction to the best performing students, and secondly, it has to allocate tutorial resources optimally in order to be fair to the poorly performing students. In this paper two case studies examine the suitability of data mining techniques for predicting students’ performance in examinations. The first case study involves performance prediction for second year students of Can Tho University (CTU), Viet Nam, and the second uses input variables of students of the Asian Institute of Technology (AIT), Thailand. In both case studies, students’ performance is predicted using Decision Trees (DT) and Bayesian Networks (BN).
This study holds an important place among other research because it makes two contributions to the application of EDM in predicting students’ performance. Firstly, the study covers the whole process of EDM application in educational settings and emphasizes the method of application, with scope for refining the data and achieving a high degree of accuracy in predictions. Secondly, the study revealed that the DT algorithm is capable of predicting students’ performance more accurately than BN.
The authors have described the methodology under the following sections:
Data mining tool selection: In the first step the authors made a detailed study of data mining tools (DMT) and selected 30, which were later filtered to 10. Next the authors applied the methodology suggested by Collier et al. (1999) and identified the criteria for computability, functionality, and support for the project to run. The authors ensured that the system operates on an open source platform. This further reduced the number of DMT to 6. Selecting 30 DMT and filtering them to 10 is a very effective first step in the course of the study. It would allow the authors to add more variables as and when the need arises.
Preparation of data: In this step the authors collected historical data on students of the two institutes. For CTU, 20,042 complete records were collected for students of the university from 1995 to 2002. For the AIT case study, 936 complete records were collected for students of the institute from 2003 to 2005. From the database, records with meaningful attributes were selected and grouped, and new attributes were derived by the authors on the basis of their past experience.
Modelling the performance prediction problem: The authors used DT and BN to construct the prediction models.
Tuning algorithm parameters: In this step the authors configured the algorithms by fine-tuning their parameters.
The results of the prediction models using the DT and BN algorithms show that the predictions for CTU students were visibly more accurate than those for AIT students. The results also show that the DT algorithm gave consistently better accuracy than the BN algorithm throughout the application process. It was found during the experiment that the output classes were unevenly distributed, and that the larger classes showed greater accuracy than the smaller classes. In the case of CTU, the accuracy in predicting the students who would perform well was 79%, whereas the accuracy in predicting the students who would fail was 34%. Greater accuracy in larger classes was observed consistently in both data sets and at all academic levels.
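The effect of unevenly distributed classes can be seen by computing accuracy separately for each class. The sketch below uses a small set of invented labels (not data from the paper) in which the larger class is predicted well and the smaller class poorly; the function names are hypothetical.

```python
def per_class_accuracy(actual, predicted):
    """Fraction of correctly predicted cases, computed separately per actual class."""
    totals, correct = {}, {}
    for a, p in zip(actual, predicted):
        totals[a] = totals.get(a, 0) + 1
        if a == p:
            correct[a] = correct.get(a, 0) + 1
    return {cls: correct.get(cls, 0) / totals[cls] for cls in totals}


def overall_accuracy(actual, predicted):
    """Fraction of all cases predicted correctly, ignoring class membership."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Invented example: 8 students in the large class, 2 in the small class.
actual = ["good"] * 8 + ["fail"] * 2
predicted = ["good"] * 7 + ["fail"] + ["good"] + ["fail"]

by_class = per_class_accuracy(actual, predicted)
overall = overall_accuracy(actual, predicted)
print(by_class, overall)
```

Here the overall accuracy is 0.8 even though only half of the failing students are identified, which is why per-class figures such as those reported for CTU are more informative than a single overall number when the aim is to find students at risk.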
Article 4: A CHAID Based Performance Prediction Model in Educational Data Mining by M. Ramaswami and R. Bhaskaran
The performance of students in higher education is influenced by a number of factors; some are highly relevant, such as intelligence level, socio-economic factors, and educational background. A performance prediction model that includes all these variables is highly beneficial for identifying in advance the weak students who need better tutoring to improve on their predicted results, and the best students who should be rewarded for achievement. With the importance of prediction accuracy in mind, a number of studies have constructed input-output algorithms. Evidence suggests that an inductive approach with EDM application gives greater prediction accuracy than a deductive approach (Kabakchieva, 2015). The authors adopted a quantitative method of inductive research. The paper reports a research project on performance prediction for a group of Indian higher education students.
The authors applied survey and interview methods to gather input for the model. The primary source of information was the regular students, while some secondary data were collected from school authorities. Interviewing the students directly enabled the researchers to collect more reliable and up-to-date information.
The authors took great care in preparing the questionnaire, designing the questions on the basis of input from teachers, parents, directors of studies, and educational experts in universities and colleges. This enabled the authors to determine the range of variables directly from the stakeholders of the education system (Writer and Frank, 2015). On the basis of these guidelines, the most relevant variables were selected for the study. The questionnaire was then administered directly to the students, and the remaining data were collected from the school authorities.