Data Integration - Challenges, Techniques and Future Directions: A Comprehensive Study

Affiliations

  • Faculty of Computer Science and Engineering, Sathyabama University, Chennai - 600119, Tamil Nadu, India
  • School of Information Technology and Engineering, VIT University, Vellore - 632 014, Tamil Nadu, India

Abstract


Objectives: This paper studies various query reformulation techniques, which are used to convert the intermediate schema to the target schema. Ontology-based information integration and data integration languages are also reviewed.

Methods/Statistical Analysis: This paper discusses the techniques used to integrate data and to resolve inconsistencies in the integrated data. Data integration techniques focus mainly on integrating data at several levels and applying independent or unified queries over the available data.

Findings: The analysis of these techniques identified several shortcomings and scope for improvement. The identified research directions include the vertical enhancement of wrappers by using a single unified wrapper for all data sources. Optimizing queries according to the data source is another major requirement for delivering faster results and reducing data retrieval latency. The paper also advocates identifying duplicates in the retrieved data and applying effective elimination strategies to reduce space consumption. Identifying conflicts and applying strategies to eliminate them is another major area with considerable scope for improvement.

Application/Improvements: The survey also recommends further work in the area of data integration techniques.

Keywords

Conflict Identification, Conflict Resolution, Data Integration, Data Conflicts, Inconsistency Resolution.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.