Background
Computational science is a rapidly growing multidisciplinary field
that uses advanced computing capabilities to understand and solve
complex problems. An integral part of the future success of the
computational science field is the availability of high-performance
computers and a supporting infrastructure. Because of the scale of
resources needed, and the distributed location of users, it is
expected that a highly distributed infrastructure should be
employed. Currently the most likely candidate is the Computational
Grid, which is a distributed infrastructure that appears to an end
user as one large computing resource across organization boundaries.
The Computational Grid is based on the concept of a network of
computers and storage systems, making computational power readily
available. As with the electrical power grid, users should not have to
know where this power is actually "made" (where the code is executed).
The computing task is simply submitted to the Grid and the result is
returned after a while. So far, Grid computing has gained some maturity
with respect to the actual computation. However, the management of
data in Grid networks is still a very immature area. In general,
simple files are used. As experience in computer science over the last
decades has shown, database management systems (DBMSs), which reduce
the coupling between programs and data, have many advantages. The
resulting data independence can give higher performance, but more
importantly it improves program maintainability and the sharing and
security of data. It should also be mentioned that
typical applications of Grids use relatively complex, structured data,
containing lots of references. It is also the case that much of the
data will be local data that should be made available to the outside
world for querying, but for various reasons (including the size of the
data volumes) the raw/source data itself should not be distributed.
It is thus evident that DBMSs should be an integral part of the Grid
infrastructure.
However, a centralized DBMS is not applicable in the heavily
distributed Grid context. Because of the need for autonomy, high
availability, and loose coupling between participating sites in the
Grid, a traditional distributed DBMS is not a good solution either.
Data management in a Grid context has two aspects that make it differ
from more traditional approaches: a) large amounts of data are created
and used by the creator, and b) part of the data, mostly
summary data, can also be accessed and used by other Grid
participants. An example of such an application is weather forecasting,
where the national weather forecasting institutions have large amounts
of locally collected data, produce forecasts, and make the resulting data
available. They also store historical data, and both the summary data
and historical data will be of interest to, and used by, other weather
forecasting institutions. The data will also be of interest to
researchers in other areas. One example is environmental research
that tries to correlate historical weather data with other observations
such as farm produce and urban development.
Our solution to the Grid database support problem is a Grid DBMS based
on the peer-to-peer (P2P) paradigm, where the use of P2P technology
aims at supporting scalability, availability, and efficient
querying in the presence of loose coupling.
A challenge when dealing with data from different research areas is
how to find data sources and how to combine them, in particular in
the case of historical data where schema or metadata changes may have
occurred. In a highly distributed context, and with the desire to use
few human resources, creating wrappers and using similar traditional
techniques is not practical. A solution to this problem is an
ontology-based approach for mapping between data sources.
The DASCOSA project is a research project funded by the Norwegian
Research Council under the eVITA
research and infrastructure programme.
For more information please contact the project leader, Dr. Kjetil Nørvåg.
For more information about the research group and the department,
please visit the respective home pages:
The Data and Information Management Group
Department of Computer and Information Science