IPSJ Digital Courier
Online ISSN : 1349-7456
ISSN-L : 1349-7456
A Distributed-Processing System for Accelerating Biological Research Using Data-Staging
Yoshiyuki Kido, Shigeto Seno, Susumu Date, Yoichi Takenaka, Hideo Matsuda
JOURNAL FREE ACCESS

2008 Volume 4 Pages 250-256

Abstract

The number of biological databases has been increasing rapidly as a result of progress in biotechnology. As the amount and heterogeneity of biological data increase, it becomes more difficult to manage the data in a few centralized databases. Moreover, the number of sites storing these databases is growing, and the databases are becoming more widely distributed geographically. In addition, biological research tends to require a large amount of computational resources, i.e., a large number of computing nodes, and this computational demand has been increasing with the rapid progress of biological research. Methods that enable computing nodes to use such widely distributed database sites effectively are therefore desired. In this paper, we propose a method for providing data from the database sites to computing nodes. Since it is difficult to decide in advance which program will run on a node and which data it will request as input, we have introduced the notion of “data-staging” in the proposed method. Data-staging dynamically searches the database sites for the input data and transfers the data to the node where the program runs. We have developed a prototype system with data-staging using grid middleware. The effectiveness of the prototype system is demonstrated by measuring the execution time of a similarity search of several hundred gene sequences against 527 prokaryotic genomes.
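
The data-staging idea described in the abstract (find the input data at run time, then move it to the node that runs the program) can be illustrated with a minimal in-memory sketch. This is not the authors' prototype and uses no grid middleware; the names `DatabaseSite`, `ComputeNode`, `find_site`, `stage_in`, and `run_job`, as well as the sample site and dataset names, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseSite:
    """Hypothetical remote site holding named datasets (name -> contents)."""
    name: str
    datasets: dict = field(default_factory=dict)


@dataclass
class ComputeNode:
    """Hypothetical computing node with a local staging area."""
    name: str
    local_data: dict = field(default_factory=dict)


def find_site(dataset, sites):
    """Search the sites at run time; return the first one holding the dataset."""
    for site in sites:
        if dataset in site.datasets:
            return site
    raise LookupError(f"no site provides {dataset!r}")


def stage_in(node, inputs, sites):
    """Copy each requested input to the node; report where each one came from."""
    served_by = {}
    for dataset in inputs:
        if dataset in node.local_data:
            served_by[dataset] = node.name  # already present, no transfer needed
            continue
        site = find_site(dataset, sites)
        node.local_data[dataset] = site.datasets[dataset]  # stands in for a transfer
        served_by[dataset] = site.name
    return served_by


def run_job(node, program, inputs, sites):
    """Stage the inputs onto the node, then run the program on the local copies."""
    stage_in(node, inputs, sites)
    return program(*(node.local_data[name] for name in inputs))


# Assumed example data: two sites, each holding one toy sequence.
sites = [
    DatabaseSite("site_a", {"genome_1": "ACGTACGT"}),
    DatabaseSite("site_b", {"genome_2": "TTGACA"}),
]
node = ComputeNode("node_1")
total_length = run_job(
    node, lambda a, b: len(a) + len(b), ["genome_1", "genome_2"], sites
)
```

In this sketch the job names only its inputs; which site serves each input is resolved when the job starts, which is the property the abstract attributes to data-staging. A real system would replace the dictionary copy with a network transfer and the linear search with a catalog query.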

© 2008 by the Information Processing Society of Japan