CATs-Clustered k-Anonymization of Time Series Data with Minimal Information Loss and Optimal Re-identification Risk

Affiliations

  • Department of Computer Science and Engineering, St. Peter’s University, Avadi, Chennai - 600054, Tamil Nadu, India
  • Department of Computer Science and Engineering, C. Abdul Hakeem College of Engineering and Technology, Melvisharam - 632509, Tamil Nadu, India
  • Department of Computer Science and Engineering, SCSVMV University, Kanichipuram - 631561, Tamil Nadu, India

Abstract


Background/Objectives: Time series are an important type of data, widely used in diverse applications such as financial, medical, and weather analysis, and they often contain a great deal of private personal information. Methods/Statistical Analysis: Protecting the privacy of time series data is a prerequisite for encouraging data holders to take part in these applications without privacy threats. The k-anonymization of time series data has gained attention in recent years; a key requirement of such an approach is to guarantee anonymization of the time series data while minimizing the information loss that the anonymization causes. Findings: In this article, we implement a novel methodology called CATs (Clustered k-Anonymization of Time Series Data) that applies clustering to time series data and ensures anonymization with minimal information loss while retaining acceptable utility. The fundamental observation is that time series tuples that are alike ought to belong to the same cluster, and de-identification is then applied to the tuples of each cluster. We formulate this approach as CATs and implement it through a combination of WEKA and the ARX anonymization tool. We evaluated the solution on two benchmark time series data sets available from UCR. Our experimental results show that CATs reduces information loss by 18% to 24% compared with existing TSA (Time Series Anonymization) approaches. Applications/Improvements: Based on our experiments, we suggest that our approach can play a significant role in fields such as financial management and online medical process monitoring and management.
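
The idea of grouping similar time series and then de-identifying each group can be illustrated with a minimal sketch. It is not the CATs implementation, which the abstract says is built on WEKA and ARX. The sketch assumes equal-length series, a greedy nearest-neighbour grouping, release of each cluster's pointwise mean in place of its members, and mean squared distortion as the information-loss measure. All function names are hypothetical, and none of these choices is taken from the article.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_k(series, k):
    """Greedily partition series indices into clusters of at least k members.

    Assumption: a simple seed-plus-nearest-neighbours heuristic, chosen for
    brevity; it is not the clustering used in the article.
    """
    if k < 1 or len(series) < k:
        raise ValueError("need at least k series")
    remaining = list(range(len(series)))
    clusters = []
    while len(remaining) >= 2 * k:
        seed = remaining[0]
        # The seed and its k-1 nearest remaining neighbours form one cluster.
        nearest = sorted(
            remaining, key=lambda i: euclidean(series[seed], series[i])
        )[:k]
        clusters.append(nearest)
        remaining = [i for i in remaining if i not in nearest]
    clusters.append(remaining)  # leftover group holds between k and 2k-1 series
    return clusters


def anonymize(series, k):
    """Replace every series by the pointwise mean of its cluster."""
    clusters = cluster_k(series, k)
    released = [None] * len(series)
    for members in clusters:
        length = len(series[members[0]])
        centroid = [
            sum(series[i][t] for i in members) / len(members)
            for t in range(length)
        ]
        for i in members:
            released[i] = centroid
    return released, clusters


def information_loss(original, released):
    """Mean squared distortion per value (an illustrative measure only)."""
    total = sum(
        (x - y) ** 2
        for o, r in zip(original, released)
        for x, y in zip(o, r)
    )
    count = sum(len(o) for o in original)
    return total / count


# Toy data (made up for illustration): two visibly different groups.
series = [[1.0, 2.0, 3.0], [1.2, 2.1, 2.9], [9.0, 8.0, 7.0], [9.1, 8.2, 7.1]]
released, clusters = anonymize(series, k=2)
print(clusters)
print(information_loss(series, released))
```

Because every released series is shared by at least k records, no record can be distinguished from fewer than k-1 others on its time series values. Grouping similar series before averaging is what keeps the distortion low.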

Keywords

Clustering, Information Loss, k-Anonymization, Privacy Preserving Data Mining, Re-Identification Risks, Time Series Data Mining.


Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.