CSI 710 - Fall 2006 - WebCT Course Syllabus Details


Course Information
Course title: Scientific Databases
Course number: CSI710/IT864
Course discipline: Computing Sciences
Course description:

This is an interdisciplinary course in advanced databases for the purpose of facilitating scientific research. The course focuses on:

  • Issues related to database support for scientific data management;
  • Requirements and properties of scientific databases;
  • Data models, domain models, and ontologies for scientific and statistical databases;
  • Semantic and object-oriented modeling of application domains;
  • Data structures for scientific and statistical database management;
  • Statistical databases and On-line Analytical Processing (OLAP);
  • Case studies such as the Human Genome Project, NASA's Earth Observing System, the Hubble Space Telescope, and the National Virtual Observatory.

This is a hands-on course. You will learn by doing as your team specifies and develops a scientific database application.

Course date: Tuesday, August 29, 2006 through Tuesday, December 12, 2006
Location: Innovation Hall room 222 and On-line via WebCT
Meeting day(s): Tuesdays
Meeting time(s): 7:20 to 10:00 PM
Prerequisite(s): INFS 614, or equivalent, or Permission of Instructor
Instructor Information
Name: Drs. Ruixin Yang, Kirk Borne, & John Tan
Email: yang@yang.gmu.edu, kborne@gmu.edu, jtan.gmu@gmail.com
Office location: R. Yang: Research I, room 226
Office hours: Tuesdays 5:00-7:00pm
Phone: 703-993-3615
Biography: More information regarding the other instructors is available here: http://classweb.gmu.edu/kborne/csi710/
Grading Policy
Introduction:

Your grade in the course will be determined by grades obtained on the homework assignments, your research project, peer evaluation, class participation, and a take-home final exam.

The percentage breakdown is as follows:

Assignments and Class Participation: 20%
Research Project Team Paper (including peer evaluation): 20%
Research Project Team Presentation (including peer evaluation): 20%
Final Exam (take-home): 40%
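
As an illustration of how the four weights above combine into a single course score, here is a minimal sketch. The component names, the 0-100 scoring scale, and the sample scores are assumptions made for this example only; the syllabus does not state how a numeric score maps to a letter grade, so none is computed.

```python
# Weights (in percent) taken from the breakdown above; they sum to 100.
# The dictionary keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "assignments_participation": 20,
    "team_paper": 20,
    "team_presentation": 20,
    "final_exam": 40,
}


def course_score(scores):
    """Return the weighted course score for component scores on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Multiply each component by its percentage weight, then divide by 100.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Made-up sample scores, for illustration only.
example = {
    "assignments_participation": 90,
    "team_paper": 85,
    "team_presentation": 80,
    "final_exam": 75,
}
print(course_score(example))  # 18 + 17 + 16 + 30 = 81.0
```

Because the take-home final carries twice the weight of any other component, a change of ten points on the final moves the course score by four points, while the same change on any other component moves it by two.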
NOTE: Instructors may submit Term Papers to the TurnItIn.com plagiarism-detection service, in compliance with GMU policy, Provost approval, and the GMU Honor Code.
Web sites
Syllabus URL: http://classweb.gmu.edu/kborne/csi710/
Course Goals
Course goals:

Course Objectives

Students will learn how database management tools and techniques can be used: 1) to model scientific and statistical databases, 2) to access and query them, and 3) to manage both their meta-data and their data.

Scientific applications pose different problems from those of traditional transaction-oriented database applications. The course will examine the requirements of scientific databases, advanced data modeling techniques to capture the semantics of scientific applications (especially the temporal, spatial, and statistical aspects of the data), the need for data repositories, and the need for advanced retrieval capabilities.

The course will have several invited lecturers who will focus on scientific applications such as Bioinformatics and the Human Genome Project; the Earth Observing System Data and Information System; Geospatial Databases and GIS; and Knowledge Sifter and Semantic Databases.

Students will also have homework assignments that focus on the design, construction, and access of on-line scientific databases.

Group Research Projects

The field of Scientific and Statistical Database Management is evolving rapidly, and much research and development remains to be done.

The goal of the research project is to supplement the course material by allowing students to study a particular area in depth. One outcome of the research project might be a grant proposal to an agency (such as NASA, the Office of Naval Research, or the National Science Foundation) to support the student's doctoral dissertation research.

Projects will be developed by interdisciplinary teams of students who work together to address a scientific database problem. The minimal goal of the research project is to survey the topic and to identify the following: major papers in the field, research requirements on the system, existing database implementations, new research directions, and your suggestions for new database solutions (e.g., schemas, architectures, interfaces, data products, metadata products, retrieval options).

The preferred goal for the Group Research Project is to do all of those things and also to develop an operational implementation of a scientific database system that addresses one or more of the research requirements in that scientific discipline.

A more ambitious goal, one that some more advanced students might attempt, is a research paper of publishable quality for a professional research conference or journal. One such conference is the International Conference on Scientific and Statistical Database Management (SSDBM); the 2001 SSDBM conference was hosted by George Mason University. The latest such conference was held in Vienna, Austria (http://www.ocg.at/ssdbm2006/). The next will be held in Banff, Canada (http://ssdbm2007.cpsc.ucalgary.ca/). A refereed journal for possible publication of your research results is the Data Science Journal.

We are interested in seeing implementation-oriented projects based on the development of prototype systems that take advantage of technology here at GMU. Projects will provide research that defines user models, domain models, database architectures, proposed standards, interfaces with research endeavors, etc. Teams will be composed of natural science and computer science majors to facilitate the development of a relevant solution to a scientific information management or distribution problem. We expect a thorough and solid survey, with a substantial list of references, and a written document that will be useful to others in the class. Note that the GMU Honor Code is in effect, and we expect the paper to be in your own writing style!

The research project may be chosen from the list below, or an entirely different one may be proposed. The first step is to agree upon a topic with the instructor. Next, a short proposal should be written to delineate the topic and to document your proposed approach, the expected results, and the resources needed (including computing facilities and software). Consider this a pre-proposal for the project that includes the specific aims and the division of work among the team members. This short proposal, when approved, will serve as the contract between the student and the instructor for the Research Project. This is the same process that grant agencies use for submissions. The project deadlines are firm, and you will not receive full credit for late work. The best team projects will be considered for presentation at a national meeting on scientific databases.

So as to avoid embarrassing situations at the end of the semester, particularly requests for a grade of incomplete, there will be well-defined milestones with deliverables. There are no incompletes in CSI 710/INFT864! No excuses will be accepted.

We have several goals for these projects:

  1. To help you to understand the relevant issues in Scientific Databases,
  2. To produce some technical reports or usable software code if appropriate,
  3. To publish the results of your projects in workshops or conferences involving Scientific Databases, and
  4. To experience how to work with your peers on a scientific research project.

Finally, we want to stress that you will learn by doing in this course.

Proposed Research Project Subtopics

The following are some topics that might lead to projects:

  • BioDAS Distributed Annotation System for Bioinformatics;
  • Web Services for scientific data sharing, interoperability, and discovery
    (see Dr. Kerschberg's INFS 770 web site (http://classweb.gmu.edu/kersch/infs770/) for information on Web services,
    and the E-Center for E-Business web site (http://eceb.gmu.edu/publications.html) for relevant publications);
  • Knowledge Sifter: Knowledge Acquisition and Integration from multiple heterogeneous information sources;
  • Ontology Specification for Scientific Database Domains
    (see the Protégé system from Stanford University, http://protege.stanford.edu/);
  • XML for Data and Process Interchange in Scientific Databases;
  • Development of XML language and/or Web Services for scientific data sharing, interoperability, and discovery;
  • Electronic Marketplaces for Sharing, Collaboration, and Auctioning of Scientific Data;
  • Knowledge Portals for Multimedia Scientific Information (see the San Diego Supercomputer Center for science projects);
  • The integration and interchange of information among multiple Scientific Databases;
  • Schema Integration across multiple Scientific Databases;
  • Meta-Data Management for Scientific Databases (examples are the Earth Observing System and the Genome Project);
  • Query language primitives for domain-specific Scientific Databases;
  • Grid Services for Scientific Data Manipulation, Mining, and/or Analysis;
  • Intelligent Query Formulation for Scientific Databases;
  • The application of Artificial Intelligence in Scientific Databases;
  • Architectures for Scientific Database Systems;
  • Agents, Knowledge Rovers, and Mediators for Scientific Databases;
  • Collaborative Research Environments;
  • Active Scientific Databases and Dictionaries: Integrating Production Systems, Meta-data, and Databases;
  • The application of logic query languages for Scientific Databases;
  • QBE (Query-By-Example) interface to Scientific Databases;
  • Rule-based query optimization techniques over complex scientific data;
  • Multimedia and Hypermedia Data Models for Scientific Databases;
  • Data Mining and Knowledge Discovery in Scientific Databases;
  • Application of CRM, target marketing, or interactive marketing concepts to Scientific Database user support;
  • Incorporating scientific data types in extensible databases;
  • Model Management in Scientific Databases;
  • Temporal and/or spatial data models and languages for Scientific Databases;
  • Physical database structures for Scientific Databases;
  • Metadata browser for an FTP-accessible Scientific Data Repository;
  • Parallel and Distributed Scientific Databases;
  • Extensible Object-Oriented Scientific Databases;
  • World Wide Web Scientific Database systems;
  • Laboratory Information Management systems;
  • Database Support for large-scale Scientific Programs:
    1. Human Genome Project
    2. Earth Observing System Data and Information System
    3. Materials Properties Databases
    4. Global Change Modeling
    5. Virtual Observatories @ http://www.ivoa.net/ or http://www.us-vo.org/
    6. Comprehensive Atmospheric Modeling Program
    7. Living With a Star @ http://lwsde.gsfc.nasa.gov/
    8. eGY (Electronic Geophysical Year) @ http://www.egy.org/
    9. IHY (International Heliophysical Year) @ http://ihy2007.org/
    10. Peer-to-Peer Science Data Exchange (SETI@Home, Folding@home, Einstein@home, LHC@home, prediction@home)
Textbooks
Required reading: Database System Concepts, Silberschatz, Korth, and Sudarshan, McGraw-Hill, 5th edition, 2005, ISBN 0-07-295886-3
Required reading: The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management, Daconta, Obrst, and Smith, Wiley, 2003, ISBN 0-471-43257-1