Background

Computational science is a rapidly growing multidisciplinary field that uses advanced computing capabilities to understand and solve complex problems. An integral part of the future success of computational science is the availability of high-performance computers and a supporting infrastructure. Because of the scale of the resources needed and the distributed location of users, a highly distributed infrastructure is expected to be employed. Currently the most likely candidate is the Computational Grid, a distributed infrastructure that spans organizational boundaries but appears to the end user as one large computing resource.

The Computational Grid is based on the concept of a network of computers and storage systems that makes computational power readily available. As with the power grid, users should not need to know where this power is actually "made" (i.e., where the code is executed): a computing task is simply submitted to the Grid, and the result is returned after a while.

So far, Grid computing has gained some maturity with respect to the actual computation. However, data management in Grid networks is still a very immature area; in general, simple files are used. As experience in computer science over the last decades has shown, database management systems (DBMSs), which reduce the coupling between programs and data, offer many advantages. The resulting data independence can give higher performance, but more importantly it eases program maintenance and improves the sharing and security of data. Typical Grid applications also use relatively complex, structured data containing many references. Furthermore, much of the data will be local data that should be made available to the outside world for querying, but for various reasons (including the size of the data volumes) the raw/source data itself should not be distributed.

It is thus evident that DBMSs should be an integral part of the Grid infrastructure. However, a centralized DBMS is not applicable in the heavily distributed Grid context, and because of the need for autonomy, high availability, and loose coupling between participating sites, a traditional distributed DBMS is not a good solution either. Data management in a Grid context has two aspects that distinguish it from more traditional approaches: a) large amounts of data are created and used by their creator, and b) part of the data, mostly summary data, can also be accessed and used by other Grid participants. One example of such an application is weather forecasting: national weather forecasting institutions have large amounts of locally collected data, produce forecasts, and make the resulting data available. They also store historical data, and both the summary data and the historical data will be of interest to, and used by, other weather forecasting institutions. The data will also be of interest to researchers in other areas, for example environmental researchers trying to correlate historical weather data with other observations such as agricultural production and urban development.

Our solution to the Grid database support problem is a Grid DBMS based on the peer-to-peer (P2P) paradigm, where P2P technology is used to support scalability, availability, and efficient querying in the presence of loose coupling. A challenge when dealing with data from different research areas is how to find data sources and how to combine them, in particular for historical data, where schema or metadata changes may have occurred. In a highly distributed context, and with the aim of minimizing human effort, creating wrappers and using similar traditional technologies is not feasible. A solution to this problem is an ontology-based approach for mapping between data sources. The DASCOSA project is a research project funded by the Norwegian Research Council under the eVITA research and infrastructure programme.

For more information please contact the project leader, Dr. Kjetil Nørvåg.

For more information about the research group and the department, please visit the respective home pages:
The Data and Information Management Group
Department of Computer and Information Science