
Application of Big Data Analysis with Decision Tree for Road Accident


  • Laboratory of Intelligent Energy Management and Information Systems, Faculty of Sciences Semlalia, Cadi Ayyad University, Marrakech, Morocco


Objectives: In the transportation field, the huge amount of data collected by IoT systems, remote sensing, and other data collection tools brings new challenges: the data has become too large and too complex for traditional data mining techniques. To deal with this challenge, Apache Spark stands as a powerful large-scale distributed computing platform that can be used successfully for machine learning on very large databases. This work employs large-scale machine learning techniques, specifically a Decision Tree built with the Apache Spark framework for big data analysis, to build a model that can predict the factors that lead to road accidents from several input variables related to traffic accidents. To this end, the predictive model first preprocesses the big accident data and analyzes it to create data for a learning system. Empirical results show that the proposed model can provide new information to help decision makers analyze and improve road safety.
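How a decision tree ranks candidate accident factors can be illustrated with a minimal sketch. The Python below is not the authors' Spark pipeline; it is a toy, single-machine illustration of Gini-based split selection, the impurity measure that Spark MLlib's decision tree classifier uses by default. The records, the feature names (`weather`, `light`), and the `severity` target are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini_after_split(rows, feature, target):
    """Weighted Gini impurity after grouping rows by one categorical feature."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    n = len(rows)
    return sum(len(g) / n * gini(g) for g in groups.values())


def best_split(rows, features, target):
    """Return the feature with the largest impurity reduction, plus all gains."""
    parent = gini([row[target] for row in rows])
    gains = {
        f: parent - weighted_gini_after_split(rows, f, target) for f in features
    }
    return max(gains, key=gains.get), gains


# Hypothetical toy accident records (not taken from the paper's data set).
ACCIDENTS = [
    {"weather": "rain", "light": "night", "severity": "severe"},
    {"weather": "rain", "light": "day", "severity": "severe"},
    {"weather": "clear", "light": "night", "severity": "slight"},
    {"weather": "clear", "light": "day", "severity": "slight"},
    {"weather": "rain", "light": "night", "severity": "severe"},
    {"weather": "clear", "light": "day", "severity": "slight"},
]

best_feature, gains = best_split(ACCIDENTS, ["weather", "light"], "severity")
print(best_feature, gains)
```

In this toy data, splitting on `weather` separates the two severity classes completely, so it is selected as the root split. A full tree repeats the same selection recursively on each partition, and a distributed framework such as Spark computes the per-feature statistics in parallel across the data set.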


Keywords: Data mining, Big Data, Road accident, Decision Tree, Apache Spark, MLlib






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.