IMPLEMENTING THE FILE STORAGE SYSTEM

IN THE UKRAINIAN ACADEMIC GRID INFRASTRUCTURE

 

Salnikov A.O.1, Sliusar I.A.1, Sudakov O.O.1, Boyko Yu.V.1 and Kornelyuk O.I.2

 

1Information and Computer Center, National Taras Shevchenko University of Kyiv, Glushkova prosp., 2, Kyiv, Ukraine, cluster@cluster.kiev.ua

2The Institute of Molecular Biology and Genetics of the National Academy of Sciences of Ukraine, Academician Zabolotny str., 150, Kyiv, Ukraine, kornelyuk@imbg.org.ua

 

 

HPC systems operate on huge amounts of data, which must be stored. The Grid infrastructure implements access control for the computing elements (CE) and their scratch data storage. To schedule a computation job, a user must pass Grid authentication and upload the data to a CE. Upon job completion, the user can retrieve the output by means of middleware tools. If the user wants to continue research on the same dataset, each iteration requires the data to be transferred again. Another case is when a group of users wants to process the same dataset: without a unified storage system, the dataset has to be duplicated for each user.

The Grid infrastructure therefore needs a flexible and reliable data storage scheme. We have identified two requirements for such a system: (1) access to the storage element (SE) should be controlled on the basis of the user's distinguished name (DN) and virtual organization (VO) membership, and (2) data distribution between SEs should be transparent to the user.

The implemented solution relies on three major components. (1) The “Smart” Storage Element (SSE) is an autonomous service that manages data without the user's intervention. It is part of HTTPSD, one of the ARC middleware servers. To specify authorization policies, the SSE uses (2) Grid Access Control Lists (GACL), which allow policies to be written in terms of common Grid credentials: GSI proxies, VOMS attribute certificates and lists of X.509 identities.
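The access decision described above, based on the user's DN and VO membership, can be illustrated with a minimal sketch. It is a simplified in-memory model loosely inspired by GACL, not the actual GACL file format or the evaluation code of the SSE. The names Credentials, Entry and is_permitted, the action names, and the rule that a matching denial overrides any allowance are assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Credentials:
    """Hypothetical view of a user's Grid identity."""
    dn: str                       # X.509 distinguished name
    vos: frozenset = frozenset()  # VO memberships (e.g. from VOMS attributes)


@dataclass
class Entry:
    """One hypothetical policy entry: who it applies to and what it grants."""
    dns: set = field(default_factory=set)    # DNs this entry applies to
    vos: set = field(default_factory=set)    # VOs this entry applies to
    allow: set = field(default_factory=set)  # actions granted
    deny: set = field(default_factory=set)   # actions refused

    def matches(self, cred: Credentials) -> bool:
        # An entry applies if the DN is listed or the user is in a listed VO.
        return cred.dn in self.dns or bool(self.vos & cred.vos)


def is_permitted(acl: list, cred: Credentials, action: str) -> bool:
    """Return True if some matching entry allows the action and none denies it."""
    allowed = False
    denied = False
    for entry in acl:
        if entry.matches(cred):
            allowed = allowed or action in entry.allow
            denied = denied or action in entry.deny
    # Assumption of this sketch: a denial always wins over an allowance.
    return allowed and not denied
```

With such a model, a policy that grants read access to a whole VO and write access to a single DN takes two entries, and a user who matches no entry is refused by default.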

The client part of the SSE is integrated into the utilities provided by the middleware. To provide clients with data locations, the SSE must be complemented with (3) Data Indexing Services (IS). The SSE can register stored files with two such services: the Globus Replica Catalog (RC) and the Replica Location Service (RLS).
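The role of an indexing service, mapping a logical file name to the storage elements that hold copies of the file, can be sketched as follows. This is a toy in-memory index for illustration only. It does not reproduce the Globus RC or RLS interfaces, and the class name ReplicaIndex, its methods, and the example URLs are hypothetical.

```python
class ReplicaIndex:
    """Toy index: logical file name -> list of physical replica URLs."""

    def __init__(self):
        self._replicas = {}

    def register(self, lfn: str, url: str) -> None:
        """Record that a copy of the logical file is stored at the given URL."""
        urls = self._replicas.setdefault(lfn, [])
        if url not in urls:  # ignore repeated registrations of the same copy
            urls.append(url)

    def unregister(self, lfn: str, url: str) -> None:
        """Forget one replica; drop the logical name when no copies remain."""
        urls = self._replicas.get(lfn, [])
        if url in urls:
            urls.remove(url)
        if lfn in self._replicas and not self._replicas[lfn]:
            del self._replicas[lfn]

    def locate(self, lfn: str) -> list:
        """Return all known replica URLs, in registration order."""
        return list(self._replicas.get(lfn, []))


# Usage: a storage element registers a stored file, a client resolves it.
index = ReplicaIndex()
index.register("dataset/run1.dat", "https://se1.example.org/data/run1.dat")
index.register("dataset/run1.dat", "https://se2.example.org/data/run1.dat")
locations = index.locate("dataset/run1.dat")
```

In this picture the client only needs to know where the index is; the storage elements holding the data are discovered through the lookup.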

In summary, a Grid user has to provide only the URL of the Grid IS. The client tools then determine the SSE automatically and let the user set access permissions and transfer data between the client and an SE, or directly between SEs.