ECS 268: Scientific Data & Workflow Management

Subject
ECS 268
Title
Scientific Data & Workflow Management
Status
Active
Units
4.0
Effective Term
2008 Winter Quarter
Learning Activities
Lecture: 3 hours
Discussion: 1 hour
Description
Scientific data integration, metadata, knowledge representation, ontologies, scientific workflow design and management.
Prerequisites
ECS 165A

Summary of Course Content

I. Introduction to scientific data management: goals and challenges

II. Scientific data models, transformations

  • Generic data exchange formats (XML)
  • Specialized data/file formats (netCDF, HDF5, FASTA, Nexus)
  • Tree-based data transformations (XPath, XQuery, XSLT)
  • XML model and query/transformation languages
    • Database integration, query rewriting
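
As an illustration of the tree-based transformations listed above, the following is a minimal sketch using the limited XPath subset in Python's standard xml.etree.ElementTree module. The sample document, its element and attribute names, and the helper functions counts_by_site and to_summary_xml are assumptions made for this example; they are not taken from the course materials.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element and attribute names are illustrative only.
SAMPLE = """
<observations>
  <obs site="A" species="Quercus lobata" count="12"/>
  <obs site="B" species="Quercus lobata" count="7"/>
  <obs site="A" species="Pinus sabiniana" count="3"/>
</observations>
""".strip()


def counts_by_site(xml_text, site):
    """Select observations for one site with an XPath attribute predicate."""
    root = ET.fromstring(xml_text)
    # ElementTree implements only a subset of XPath; attribute
    # predicates such as [@site='A'] are part of that subset.
    matches = root.findall(f".//obs[@site='{site}']")
    return {m.get("species"): int(m.get("count")) for m in matches}


def to_summary_xml(xml_text):
    """Transform the input tree into a new tree with one total per site."""
    root = ET.fromstring(xml_text)
    totals = {}
    for obs in root.iter("obs"):
        site = obs.get("site")
        totals[site] = totals.get(site, 0) + int(obs.get("count"))
    summary = ET.Element("summary")
    for site in sorted(totals):
        ET.SubElement(summary, "site", name=site, total=str(totals[site]))
    return ET.tostring(summary, encoding="unicode")
```

Calling counts_by_site(SAMPLE, "A") selects the two site-A observations, and to_summary_xml(SAMPLE) builds a new document with per-site totals. A full XQuery or XSLT processor would express the same transformation declaratively.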

III. Knowledge representation with ontologies

  • From controlled vocabularies and taxonomies to description logic ontologies
  • Reasoning with ontologies
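
To make the simplest form of reasoning over a taxonomy concrete, the sketch below computes subsumption ("is-a") by following subclass links transitively. The class names and the helpers ancestors and is_a are hypothetical. This covers only taxonomy-level inference; description logic reasoners handle far richer constructs.

```python
# Hypothetical taxonomy: each class maps to its direct superclasses.
SUBCLASS_OF = {
    "Oak": {"Tree"},
    "Pine": {"Tree"},
    "Tree": {"Plant"},
    "Plant": {"Organism"},
}


def ancestors(cls, hierarchy):
    """Return all direct and indirect superclasses of cls."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in hierarchy.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_a(sub, sup, hierarchy):
    """Subsumption check: is every instance of sub also an instance of sup?"""
    return sub == sup or sup in ancestors(sub, hierarchy)
```

For example, is_a("Pine", "Organism", SUBCLASS_OF) holds even though that link is never stated directly; it is inferred through Tree and Plant.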

IV. Data integration

  • Schema-mapping based approaches: Global-as-View (GAV), Local-as-View (LAV); Extensions
  • Ontology-based extensions for data integration
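
As a rough illustration of the Global-as-View idea, the sketch below defines a global relation as a query over two sources with different schemas, so that a user query against the global schema is answered by unfolding the view definition. The source records, field names, and the functions global_measurement and warm_samples are assumptions made for this example.

```python
# Two hypothetical sources with different local schemas.
SOURCE_LAB = [
    {"sample_id": "s1", "temp_c": 21.5},
    {"sample_id": "s2", "temp_c": 18.0},
]
SOURCE_FIELD = [
    {"id": "s3", "temp_f": 68.0},
]


def global_measurement():
    """GAV view: the global relation Measurement(sample, temp_c),
    defined as a union of queries over the two sources."""
    for row in SOURCE_LAB:
        yield {"sample": row["sample_id"], "temp_c": row["temp_c"]}
    for row in SOURCE_FIELD:
        # Convert Fahrenheit to Celsius to match the global schema.
        celsius = round((row["temp_f"] - 32) * 5 / 9, 2)
        yield {"sample": row["id"], "temp_c": celsius}


def warm_samples(threshold_c):
    """A query posed against the global schema only; evaluating it
    unfolds the view definition above into accesses to the sources."""
    return sorted(
        m["sample"] for m in global_measurement() if m["temp_c"] > threshold_c
    )
```

In Local-as-View the direction is reversed: each source is described as a view over the global schema, and answering a query requires rewriting it in terms of those views rather than simple unfolding.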

V. Scientific Workflows

  • Introduction/motivation: capturing in silico experiments as scientific workflows
  • Application examples from diverse domains (e.g., bioinformatics, ecoinformatics, particle physics)
  • Formal models for scientific workflows: Petri nets, Kahn process networks, Synchronous Dataflow
  • Scientific workflow design paradigms: Collection-Oriented Modeling & Design (COMAD), higher order/functional programming patterns
  • Data and workflow provenance models
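
One of the formal models listed above, the Kahn process network, can be sketched as independent processes that communicate only through FIFO channels with blocking reads. The version below uses Python threads and queue.Queue. The END marker, the function names, and the three-stage pipeline are assumptions of this sketch and do not reflect how any particular workflow system is implemented.

```python
import queue
import threading

END = object()  # end-of-stream marker (a convention of this sketch)


def source(values, out):
    """Process that emits a fixed sequence of tokens."""
    for v in values:
        out.put(v)
    out.put(END)


def mapper(fn, inp, out):
    """Process that applies fn to each token it reads."""
    while True:
        token = inp.get()  # blocking read, as in a Kahn process network
        if token is END:
            out.put(END)
            return
        out.put(fn(token))


def run_pipeline(values):
    """Wire source -> square -> increment and collect the output stream."""
    a, b, c = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=source, args=(values, a)),
        threading.Thread(target=mapper, args=(lambda x: x * x, a, b)),
        threading.Thread(target=mapper, args=(lambda x: x + 1, b, c)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        token = c.get()
        if token is END:
            break
        results.append(token)
    for t in threads:
        t.join()
    return results
```

Because each channel has a single writer and a single reader and reads block until a token is available, the output stream does not depend on thread scheduling: run_pipeline([1, 2, 3]) yields [2, 5, 10] on every run.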

Projects

There are two kinds of projects: implementation projects and research projects. In implementation projects, students work with Java-based open source systems such as the Kepler workflow system (www.kepler-project.org) to design and implement example workflows, e.g., a bioinformatics workflow that connects several "bio web services". Students thus work with existing software systems, but they will typically also implement project-specific extensions to that software.

For research projects, students read one or more research papers from a list of offered research topics (e.g., scientific data integration, ontologies and knowledge representation in scientific data management, scientific workflows). Students then apply the results of those papers to a specific problem (e.g., applying a certain query rewriting algorithm to a given integration scenario and set of queries). In general, the deliverable of a research project is a technical report that summarizes and compares the results of the studied papers and describes their application to the given problem. Depending on the topic, the presented algorithms might have to be implemented and applied to the given problem instance.

Illustrative Reading

A selection of technical papers addressing specific topics will be used. No textbook is required.

Potential Course Overlap

There is no significant overlap with any other course.

Course Category