
An Improved Indexing Method for XPath Queries


  • Institute of Information and Technology, Vietnam Academy of Science and Technology (VAST), Hanoi, Viet Nam
  • Food Industrial College, Phu Tho, Viet Nam


Today, XML is used as a storage format for complex data models such as bioinformatics data. Bioinformatics systems deal with large data sets and complex queries, so efficient access methods for XML data are needed. XPath is a query language for locating the required information in XML (tree-structured) data, navigating from a context node, such as the root, into its subtrees. In this paper, we propose a system model for storing XML data more efficiently, together with an improved indexing method for XPath queries. The system model integrates a big data model with the relational data model in order to combine the benefits of both. The new indexing method is an improvement of the R-tree that makes XPath queries run more efficiently along some axes. Our experiments show that the proposed method outperforms the R-tree for node queries on transformed XML data. The method is intended to be applied to phylogenetic queries on the TreeFam database.
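The abstract does not describe how the XML data is "transformed" for R-tree indexing. For illustration only, this sketch assumes one widely used transformation, the pre/post-order plane of the XPath Accelerator. Each element becomes a 2-D point (preorder rank, postorder rank). The descendant, ancestor, following and preceding axes of a context node then each become a rectangular window around that node's point, which is the kind of query an R-tree answers.

The helpers `prepost_encode` and `axis_step` are hypothetical names. A linear scan stands in for the R-tree. This is not the authors' improved index.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

Ranks = Dict[ET.Element, Tuple[int, int]]


def prepost_encode(root: ET.Element) -> Ranks:
    """Hypothetical helper: map every element to its (pre, post) rank pair."""
    ranks: Ranks = {}
    counters = {"pre": 0, "post": 0}

    def visit(node: ET.Element) -> None:
        pre = counters["pre"]          # rank assigned when the node is entered
        counters["pre"] += 1
        for child in node:
            visit(child)
        ranks[node] = (pre, counters["post"])  # post rank assigned on exit
        counters["post"] += 1

    visit(root)
    return ranks


# Each major axis is one quadrant of the pre/post plane relative to the
# context point (vp, vq); an R-tree would answer these windows directly.
AXIS_WINDOWS: Dict[str, Callable[[int, int, int, int], bool]] = {
    "descendant": lambda p, q, vp, vq: p > vp and q < vq,
    "ancestor":   lambda p, q, vp, vq: p < vp and q > vq,
    "following":  lambda p, q, vp, vq: p > vp and q > vq,
    "preceding":  lambda p, q, vp, vq: p < vp and q < vq,
}


def axis_step(ranks: Ranks, context: ET.Element, axis: str) -> List[ET.Element]:
    """Hypothetical helper: evaluate one axis step as a window query.

    A linear scan over all points stands in for the spatial index here.
    Results are returned in document order (ascending pre rank).
    """
    vp, vq = ranks[context]
    inside = AXIS_WINDOWS[axis]
    hits = [(p, n) for n, (p, q) in ranks.items() if inside(p, q, vp, vq)]
    return [n for _, n in sorted(hits, key=lambda t: t[0])]


doc = ET.fromstring("<a><b><c/><d/></b><e><f/></e></a>")
ranks = prepost_encode(doc)
b = doc.find("b")
print([n.tag for n in axis_step(ranks, b, "descendant")])  # ['c', 'd']
print([n.tag for n in axis_step(ranks, b, "following")])   # ['e', 'f']
```

Because every axis step reduces to a 2-D range query over fixed points, the cost of a step depends on how well the spatial index prunes those windows. This is where an R-tree, or an improved variant of it, replaces the scan.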


Keywords: Bioinformatics, Hadoop, Indexing, XML Data, XPath Queries.






This work is licensed under a Creative Commons Attribution 3.0 License.