
Diagnosing Diabetic Dataset using Hadoop and K-means Clustering Techniques


  • Department of Computer Science, Vels University, Chennai - 600117, Tamil Nadu, India


Objectives: This article shows how the enormous volume of data generated by healthcare systems can be analyzed using a clustering technique. Extracting useful information from such massive data is highly complex, costly, and time-consuming, and data mining can play a key role here. In particular, standard data mining algorithms for analyzing huge data volumes can be parallelized for faster processing. Methods/Statistical Analysis: This paper focuses on how a clustering algorithm, namely K-means, can be used on a parallel processing platform, namely an Apache Hadoop cluster (MapReduce paradigm), to analyze massive data faster. Findings: As an initial step, we carry out an analysis to evaluate the effectiveness of parallel processing platforms in terms of performance. Applications/Improvements: The final results indicate that Apache Hadoop with K-means clustering is a promising approach for scalable performance in predicting and diagnosing diabetic diseases from large volumes of data. The proposed work offers insight into big data prediction on a diabetic dataset using Hadoop. In future, this technology should be extended to the cloud so as to connect various geographic districts across Tamil Nadu to predict diabetes-related diseases.
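The abstract describes running K-means under the MapReduce paradigm. As a minimal, single-process sketch (not the authors' Hadoop implementation), one K-means iteration can be split into a map step that assigns each record to its nearest centroid and a reduce step that averages each cluster's records into a new centroid; a driver loop repeats the pair until the centroids stop moving. The function names (`map_phase`, `reduce_phase`, `kmeans_mapreduce`) and the two-feature records (glucose in mg/dL, BMI) are hypothetical illustrations, not names or data from the study.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points: Iterable[Point], centroids: List[Point]) -> Iterator[Tuple[int, Tuple[Point, int]]]:
    """Hypothetical mapper: emit (cluster_id, (point, 1)) for every input record."""
    for point in points:
        yield nearest(point, centroids), (point, 1)


def reduce_phase(pairs: Iterable[Tuple[int, Tuple[Point, int]]]) -> Dict[int, Point]:
    """Hypothetical reducer: sum the points per cluster, then divide by the count."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for cid, (point, n) in pairs:
        acc = sums.setdefault(cid, [0.0] * len(point))
        for j, value in enumerate(point):
            acc[j] += value
        counts[cid] += n
    return {cid: tuple(v / counts[cid] for v in acc) for cid, acc in sums.items()}


def kmeans_mapreduce(points: List[Point], centroids: List[Point],
                     max_iter: int = 20, tol: float = 1e-6) -> List[Point]:
    """Driver loop: one map + reduce pass per K-means iteration."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids))
        # A cluster that received no records keeps its previous centroid.
        new = [updated.get(i, c) for i, c in enumerate(centroids)]
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:  # converged: no centroid moved noticeably
            break
    return centroids


if __name__ == "__main__":
    # Hypothetical (glucose, BMI) records, for illustration only.
    records = [(85, 22.0), (90, 24.0), (88, 23.0), (180, 34.0), (200, 36.0), (190, 35.0)]
    print(kmeans_mapreduce(records, [records[0], records[3]]))
```

With these illustrative records the loop converges after two passes to centroids of roughly (87.7, 23.0) and (190.0, 35.0). On a real Hadoop cluster the mapper and reducer would run as separate tasks over HDFS input splits, with the current centroids shipped to every mapper (for example through Hadoop's distributed cache) and a combiner producing partial sums before the shuffle; that wiring is omitted here.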


Keywords: Apache Hadoop, K-means, MapReduce.






This work is licensed under a Creative Commons Attribution 3.0 License.