Artemis: Integrating Scientific Data on the Grid

Rattapoom Tuchinda, Snehal Thakkar, Yolanda Gil, and Ewa Deelman

Grid technologies provide a robust infrastructure for distributed computing and are widely used in large-scale scientific applications that generate terabytes (soon petabytes) of data. These data are described by metadata attributes that capture their properties and provenance, and are organized in a variety of metadata catalogs distributed over the grid. To find a collection of data that share certain properties, a user must identify the relevant metadata catalogs and query each one individually. This paper introduces Artemis, a system developed to integrate distributed metadata catalogs on the grid. Artemis exploits several AI techniques, including a query mediator, a query planning and execution system, ontologies and semantic web tools to model metadata attributes, and an intelligent user interface that guides users through these ontologies to formulate queries. We describe our experiences using Artemis with large metadata catalogs from two projects in the physics domain.

