Analysis and Implementation of Data Mining Algorithms for Deploying ID3, CHAID and Naive Bayes for Random Dataset
Objectives: Effective data processing for fast retrieval of information has become a pressing issue. A modern document contains not only text but also images, video and audio. This paper presents a brief history of storage devices, from the Vedic period to the era of digitization, along with some important inventions. Method/Statistical Analysis: The paper also discusses how data is transformed for the decision-making process, together with the preprocessing techniques involved. Findings: A comparative analysis of various techniques is presented, covering their specific algorithms, uses, advantages, limitations and the applications in which they can be implemented. Finally, experimental results for three algorithms (ID3, CHAID and Naive Bayes) run in Rapid Miner are evaluated to compare their performance on three parameters (accuracy, precision and recall). Applications/Improvements: The empirical results show ID3 to be the most accurate of the three, with 95.95% accuracy, while CHAID reaches 89.11% and Naive Bayes classifies 81.77% of the data correctly.
CHAID, Development of Storage Devices, ID3, Information Retrieval, Rapid Miner, Retrieval Techniques, Visualization.
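The three evaluation parameters named in the abstract (accuracy, precision and recall) can be illustrated with a minimal Python sketch. This is not the Rapid Miner workflow used in the paper: the function name `evaluate`, the choice of a single positive class and the toy labels are all assumptions made for illustration, and the numbers it produces are unrelated to the reported results.

```python
def evaluate(y_true, y_pred, positive):
    """Return (accuracy, precision, recall) with respect to one positive class."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against empty input and empty denominators.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy labels invented for illustration only (not data from the paper).
actual = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no", "yes", "no"]

acc, prec, rec = evaluate(actual, predicted, positive="yes")
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

Accuracy is the share of all records classified correctly, precision is the share of predicted positives that are truly positive, and recall is the share of true positives that the classifier finds. Comparing classifiers on all three, as the paper does, guards against a model that scores well on accuracy alone because one class dominates the dataset.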
This work is licensed under a Creative Commons Attribution 3.0 License.