Data mining: Practical machine learning tools and techniques (2nd ed.). In the latter, input features are linked to a variable of interest through a functional relationship. Random Forest is used for the classification of PQ disturbances [18] and for fault record detection in the data center of a large power grid [19]. The coefficient of determination, R², is the percentage of variance in Y explained by the regression model.
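The percentage of variance in Y explained by a regression model is conventionally written as the coefficient of determination. The symbols below are assumptions introduced for illustration: y_i are the observed values of Y, \hat{y}_i the model predictions, and \bar{y} the mean of the observations.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

Multiplying R^2 by 100 gives the value as a percentage.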
Sequence Clustering. From the data collected, online or offline analysis must be carried out to classify the disturbances [4,5,6,7]. Power quality disturbances classification using data mining technique. Even a small power outage has a great economic impact on industrial consumers. Witten, I. H., & Frank, E. (2005).
14th International Joint Conference on Artificial Intelligence (IJCAI); 2. pp. 1137-1143. in EEE from Bhoj Reddy Engineering College for Women (BRECW), Hyderabad, in 2006 and M.E. The risk of overfitting the training data exists because learning algorithms are trained on finite samples: the model may memorize the training samples rather than learn a general rule, i.e., the data-producing model.
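The overfitting risk described above is usually checked by scoring a model on samples it did not see during training. The following is a minimal sketch of k-fold splitting in Python. The helper name `k_fold_indices` and its parameters are hypothetical, and it works on plain index lists rather than any WEKA API.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Return k (train, test) index pairs covering range(n_samples).

    Hypothetical helper for illustration: every sample appears in exactly
    one test fold, so each model is scored only on unseen data.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits
```

Averaging the accuracy over the k test folds gives an estimate of how well the model generalizes, rather than how well it memorized the training samples.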
relevant to different algorithms. The goal of data mining is to construct learning models that can automatically extract knowledge from vast amounts of complex data. Asha Kiranmai, S., & Jaya Laxmi, A. However, since we are using data mining outcomes for better business decisions,
The effect of data attributes on the classification accuracy and time taken for training the decision trees is also discussed. WEKA supports many different standard data mining tasks such as data pre-processing, classification, clustering, regression, visualization and feature selection. Detailed classification of various categories of power quality problems. She is a Senior Member of IEEE, Member of International Accreditation Organization (MIAO), Fellow member of Institution of Electrical Engineers Calcutta (FIE), Life Member of System Society of India (MSSI), Life Member of Indian Society of Technical Education (MISTE), Life Member of Electronics and Telecommunication Engineering (MIETE) and Life Member of Indian Science Congress (MISC).
This gives us the error rate. During Data Mining project creation, Create a Testing Data Set is an important option for assessing accuracy. Let us look at the Decision tree classification matrix. An analysis of machine learning techniques (J48 & AdaBoost)-for classification (pp. Similarly, there are 2024 actual bike buyers and which are predicted
Analysis of WEKA data mining algorithm REP tree, simple cart and random tree for classification of Indian news.
Sampson, A. Data mining methods are well equipped to handle large amounts of data and to detect useful patterns in these data that allow us to improve performance. The following are the basic measures that can be derived from the classification matrix. Therefore, it is essential to find out how accurate your data mining models are. Comparison between Random Forest algorithm and J48 decision trees applied to the classification of power quality disturbances. www.nilc.icmc.usp.br/elc-ebralc2012/minicursos/WekaManual-3-6-8.pdf. Pandey, P., & Prabhakar, R. (2016). Montreal, Quebec, Canada: In Proc. Data mining techniques, instead, can analyze and cope intelligently with records containing missing values, as well as a mixture of qualitative and quantitative data, without tedious manual manipulation [31, 32]. Role of attribute selection in classification algorithms. (2015).
Zhou, J., Ge, Z., Gao, S., & Yanli, X. the Specify a different data set. Let us look at different evaluation parameters for the different algorithms. possible buyers who have the highest probability of buying. WEKA manual. Consequences of poor power quality: An overview. Cruz & Jordan Rel C. Orillaza (2017). It has been found that whenever the correct attributes are selected before classification, the accuracy of data mining algorithms improves significantly [23, 24]. In a 3-phase system, voltage unbalance takes place when the magnitudes of the phase or line voltages are different, or the phase angles differ from the balanced conditions, or both. The 3-phase RMS voltages calculated at the Point of Common Coupling (PCC) are used as the main data for classification of the power quality problems. Isinkaye, F. O., Folajimi, Y. O., & Ojokoh, B. These algorithms are implemented on two sets of voltage data using WEKA software.
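The RMS voltages and the magnitude part of the unbalance definition above can be illustrated with a short Python sketch. The function names are hypothetical, and the unbalance measure shown (maximum deviation from the mean magnitude, as a percentage of the mean) is one common magnitude-only convention assumed here. It is not necessarily the measure used by the authors, and it ignores phase-angle unbalance.

```python
import math


def rms(samples):
    """Root-mean-square value of a sequence of instantaneous voltage samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def magnitude_unbalance_percent(va, vb, vc):
    """Maximum deviation of the three RMS magnitudes from their mean, in percent.

    Assumed convention for illustration only; phase angles are not considered.
    """
    mean = (va + vb + vc) / 3.0
    max_deviation = max(abs(v - mean) for v in (va, vb, vc))
    return 100.0 * max_deviation / mean
```

For three equal magnitudes the measure is zero, and it grows as one phase departs from the other two.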
Fixed Cost, Individual cost and the expected revenue. International Journal of Advances in Engineering & Technology, 1(2), 111. The authors declare that they have no competing interests.
The internal nodes test an input attribute/feature against a decision constant and, in this way, determine which descending node comes next. ACEEE Int J on Electrical and Power Engineering, 3(1), 55-61. Pre-process stage of data mining in WEKA with 7 attributes. Kalmegh, S. (2015). Data mining technology is an effective tool to deal with massive data, and to detect the useful patterns in those data. In the next case, along with Va, Vb, Vc and the class attribute, three additional numeric attributes are included. Predicting a class label from instances in a problem domain is called classification predictive modeling. This paper focuses on how the data mining techniques of J48, Random Tree and Random Forest decision trees are applied to classify the power quality problems of voltage sag, swell, interruption and unbalance.
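The way an internal node compares an attribute with a decision constant and selects the next node can be sketched in Python. The `Node` and `Leaf` classes, the example tree and its thresholds (0.9 and 1.1 on `Va`) are all hypothetical illustrations, not values or structures taken from the paper or from WEKA.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class predicted when this leaf is reached


@dataclass
class Node:
    attribute: str               # input attribute tested at this node
    threshold: float             # decision constant it is compared with
    low: Union["Node", Leaf]     # branch taken when value <= threshold
    high: Union["Node", Leaf]    # branch taken when value > threshold


def classify(tree, record):
    """Walk from the root to a leaf, following one branch per internal node."""
    node = tree
    while isinstance(node, Node):
        value = record[node.attribute]
        node = node.low if value <= node.threshold else node.high
    return node.label


# Illustrative tree only: the thresholds are assumptions, not learned values.
example_tree = Node("Va", 0.9, Leaf("sag"),
                    Node("Va", 1.1, Leaf("normal"), Leaf("swell")))
```

Calling `classify(example_tree, {"Va": 0.5})` follows the low branch at the root and ends at the `sag` leaf.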
Power quality problems like voltage sag, swell, unbalance, interruption, flicker, harmonics, etc., create poor power quality. It is also clear that the training time taken by the Random Tree is only 1.91 s, which is much lower than that of J48 and Random Forest. It employs a top-down, greedy search through all possible branches to construct a decision tree that models the classification process. Manimala, K., Selvi, K., & Ahila, R. (2008). We have different terminology for each case we saw above. Voltage sag is defined as a decrease in RMS voltage between 0.1 p.u. Decision trees such as J48, Logistic Model Tree (LMT), Reduced Error Pruning (REP) Tree, Random Tree, Simple Cart and Random Forest are used for the classification purpose [15,16,17]. As a result, the model's capacity to generalize is measured by its accuracy on unseen data. Power quality monitoring requires storing large amounts of data for analysis. Data mining applied to the electric power industry: Classification of short-circuit faults in transmission lines (p. 2007). The Random Tree algorithm takes the least training time among the three algorithms, and its accuracy is good. The ultimate goal of data mining is to discover useful information from large amounts of data in many different ways using rules, patterns and classification [27].
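The top-down, greedy construction mentioned above picks, at each node, the split that looks best on the data reaching that node. The sketch below shows one such step in Python using plain information gain on a single numeric attribute. This is a simplification: J48 is based on C4.5, which uses the related gain ratio, and the function names here are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing gains nothing
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


def best_threshold(values, labels):
    """Greedy choice: the candidate threshold with the highest gain, or None."""
    candidates = sorted(set(values))[:-1]  # the maximum would leave one side empty
    if not candidates:
        return None
    return max(candidates, key=lambda t: information_gain(values, labels, t))
```

A full tree builder would apply `best_threshold` to every attribute, split on the winner, and recurse on each branch until the labels in a branch are pure or a stopping rule is met.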
The classification matrix, or confusion matrix, is used to derive various classification accuracy metrics. need to test the built model for the customers with age over 40 years.
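The basic measures derived from a two-class classification matrix can be written out in a few lines of Python. This is a minimal sketch: the function name and the dictionary keys are hypothetical, and the four inputs are the usual true positive, false positive, false negative and true negative counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive basic measures from the four cells of a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,            # share of all predictions that are correct
        "error_rate": 1.0 - accuracy,    # share of all predictions that are wrong
        # Of the instances predicted positive, how many really are positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the instances that really are positive, how many were found.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

The error rate is simply the complement of the accuracy, so a model that classifies 70% of the test cases correctly has an error rate of 30%.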