A hierarchical data replication method in scientific data grid

 

Weizhong Lv1,2, Yuanchun Zhou1, Kaichao Wu1, Baoping Yan1

 

1Computer Network Information Center, Chinese Academy of Sciences, Beijing, China

2Graduate University of Chinese Academy of Sciences, Beijing, China

Email: lvweizhong@sdb.cnic.cn

 

 

The Scientific Data Grid (SDG) of the Chinese Academy of Sciences (CAS), built on scientific databases hosted and supported by different institutes, is a fundamental infrastructure for many data-intensive research projects in the natural sciences. These databases are a major scientific and technological information resource: they hold hundreds of terabytes of data in total and consist of geographically distributed, heterogeneous, multidisciplinary data resources. In grid and distributed computing environments, data replication is an effective way to improve data accessibility and access efficiency, because data-intensive applications produce large numbers of datasets and demand both reliability and performance. This paper proposes a data replication method called the Hierarchical Replication Model (HRM). The method selects grid nodes as replica holders based on data access frequencies, network topology, and link bandwidth, and it groups the network into three hierarchies to improve data accessibility. The paper presents the framework of HRM and describes in detail the scheme used to simulate it. Simulation results show that the method speeds up data transfer and increases data access efficiency, which supports its effectiveness.
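To make the selection criteria concrete, the following is a minimal sketch of how access frequencies, topology, and link bandwidth could be combined when choosing a replica holder. It is an illustration only and not the HRM algorithm itself. The cost rule (access frequency times dataset size divided by the bottleneck bandwidth of the path), the tree-shaped topology, the node names, and the functions `path_bandwidth` and `choose_replica_holder` are all assumptions introduced here.

```python
from collections import deque

def path_bandwidth(links, src, dst):
    """Bottleneck bandwidth of the path from src to dst.

    Assumes a tree-shaped network, so the path between two nodes is unique.
    `links` maps an undirected pair (a, b) to its bandwidth in Mbit/s.
    """
    if src == dst:
        return float("inf")  # local access, no network transfer
    adj = {}
    for (a, b), bw in links.items():
        adj.setdefault(a, []).append((b, bw))
        adj.setdefault(b, []).append((a, bw))
    seen = {src}
    queue = deque([(src, float("inf"))])
    while queue:
        node, bw = queue.popleft()
        for nxt, link_bw in adj.get(node, []):
            if nxt in seen:
                continue
            narrowest = min(bw, link_bw)
            if nxt == dst:
                return narrowest
            seen.add(nxt)
            queue.append((nxt, narrowest))
    return 0.0  # dst is unreachable from src

def choose_replica_holder(candidates, access_freq, links, size_mbit):
    """Pick the candidate with the lowest frequency-weighted transfer time.

    Hypothetical cost rule: sum over clients of freq * size / bandwidth.
    """
    def cost(holder):
        total = 0.0
        for client, freq in access_freq.items():
            bw = path_bandwidth(links, holder, client)
            if bw == 0.0:
                return float("inf")
            total += freq * size_mbit / bw
        return total
    return min(candidates, key=cost)

# Hypothetical three-level topology: root -> regional nodes -> clients.
links = {
    ("root", "r1"): 10, ("root", "r2"): 10,
    ("r1", "a"): 100, ("r1", "b"): 100, ("r2", "c"): 100,
}
access_freq = {"a": 50, "b": 40, "c": 5}  # requests per unit time
holder = choose_replica_holder(["root", "r1", "r2"], access_freq, links, 100)
print(holder)
```

In this toy setup the sketch selects `r1`, the regional node closest to the two most active clients, because placing the replica there avoids the slow backbone links for most requests.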