CSI 710 - Fall 2007 - WebCT Course Syllabus Details


Course Information
Course title: Scientific Databases
Course number: CSI710/IT864
Course discipline: Computing Sciences
Course description:

This is an interdisciplinary course in advanced databases for the purpose of facilitating scientific research. The course focuses on:

  • Issues related to database support for scientific data management;
  • Requirements and properties of scientific databases;
  • Data models, domain models, and ontologies for scientific and statistical databases;
  • Semantic and object-oriented modeling of application domains;
  • Data Structures for scientific and statistical database management;
  • Statistical databases and On-line Analytical Processing (OLAP);
  • Scientific Data Mining and Discovery Informatics;
  • The new e-Science research paradigm;
  • Case studies such as the Human Genome Project, NASA's Earth Observing System, the Hubble Space Telescope, and the National Virtual Observatory.

This is a hands-on course. You will learn-by-doing as your team specifies and develops a scientific database application.

Course date: Tuesday, August 28, 2007 through Tuesday, December 11, 2007
Location: Robinson Hall room B113, and On-line via WebCT
Meeting day(s): Tuesdays
Meeting time(s): 7:20 to 10:00 PM
Prerequisite(s): INFS 614, or equivalent, or Permission of Instructor
Instructor Information
Name: Drs. Ruixin Yang and Kirk Borne
Email: yang@yang.gmu.edu, kborne@gmu.edu
Office location: R. Yang: Research I, room 226
Office hours: Tuesdays 5:00-7:00pm
Phone: 703-993-3615
Biography: More information regarding the other instructors is available here: http://classweb.gmu.edu/kborne/csi710/
Grading Policy
Introduction:

Your grade in the course will be determined by grades obtained on the homework assignments, your research project, peer evaluation, class participation, a take-home final exam, and an in-class final exam (December 11, 2007).

The percentage breakdown is as follows:

Assignments, Class Participation: 20%
Research Project Team Paper (including peer evaluation): 20%
Research Project Team Presentation (including peer evaluation): 20%
Final Exam (take-home plus in-class portions): 40%
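
As a rough illustration of how the percentage breakdown above combines into a single course score, the following is a minimal sketch. It assumes each component is scored on a 0-100 scale; the component names and the sample scores are hypothetical, and the instructors' actual grading procedure may differ.

```python
# Weights taken from the percentage breakdown above (they sum to 100%).
WEIGHTS = {
    "assignments_participation": 0.20,
    "team_paper": 0.20,
    "team_presentation": 0.20,
    "final_exam": 0.40,
}


def course_score(scores):
    """Return the weighted course score for component scores on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical component scores, for illustration only.
example = {
    "assignments_participation": 90,
    "team_paper": 85,
    "team_presentation": 80,
    "final_exam": 75,
}
print(round(course_score(example), 2))
```

Because the final exam carries twice the weight of any other component, a change of one point on the exam moves the course score as much as a two-point change on the paper or presentation.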
NOTE: Instructors may submit Term Papers to the TurnItIn.com plagiarism-detection service, in compliance with GMU policy, Provost approval, and the GMU Honor Code. Note that the GMU Honor Code is in effect for all of your course assignments, and we expect your homework, papers, and exams to be in your own writing style, containing your own creative content!
Web sites
Syllabus URL: http://classweb.gmu.edu/kborne/csi710/
Course Goals
Course goals:

Course Objectives

Students will learn how database management tools and techniques can be used: 1) to model scientific and statistical databases, 2) to access and query them, and 3) to manage both their meta-data and their data.

Scientific applications pose different problems from those of traditional transaction-oriented database applications. The course will examine the requirements of scientific databases, advanced data modeling techniques to capture the semantics of scientific applications (especially the temporal, spatial, and statistical aspects of the data), the need for data repositories, and the need for advanced retrieval capabilities.
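
To make the temporal, spatial, and statistical aspects concrete, here is a minimal sketch using Python's standard sqlite3 module. The table names, column names, and sample values are hypothetical and chosen only for illustration; they are not part of the course materials. The sketch keeps descriptive metadata (the dataset table) separate from the measurements (the observation table) and runs a query that restricts by time window and bounding box before computing a summary statistic.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset (              -- metadata: what was measured and how
    dataset_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    instrument TEXT,
    units      TEXT
);
CREATE TABLE observation (          -- data: one row per measurement
    obs_id     INTEGER PRIMARY KEY,
    dataset_id INTEGER NOT NULL REFERENCES dataset(dataset_id),
    obs_time   TEXT NOT NULL,       -- ISO 8601 timestamp (sorts as text)
    lat        REAL NOT NULL,
    lon        REAL NOT NULL,
    value      REAL NOT NULL
);
""")

conn.execute(
    "INSERT INTO dataset VALUES (1, 'sea_surface_temp', 'hypothetical_sensor', 'K')"
)
conn.executemany(
    "INSERT INTO observation (dataset_id, obs_time, lat, lon, value) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2007-09-01T00:00:00", 38.8, -77.3, 295.0),
        (1, "2007-09-02T00:00:00", 38.9, -77.2, 297.0),
        (1, "2007-10-01T00:00:00", 10.0, 20.0, 300.0),
    ],
)


def regional_summary(conn, t0, t1, lat_min, lat_max, lon_min, lon_max):
    """Count and average observations inside a time window and bounding box."""
    return conn.execute(
        """
        SELECT d.name, d.units, COUNT(*), AVG(o.value)
        FROM observation AS o
        JOIN dataset AS d ON o.dataset_id = d.dataset_id
        WHERE o.obs_time >= ? AND o.obs_time < ?
          AND o.lat BETWEEN ? AND ?
          AND o.lon BETWEEN ? AND ?
        GROUP BY d.dataset_id
        """,
        (t0, t1, lat_min, lat_max, lon_min, lon_max),
    ).fetchall()


summary = regional_summary(conn, "2007-09-01", "2007-10-01", 38.0, 40.0, -78.0, -76.0)
print(summary)
```

A plain relational schema like this handles small cases; the bounding-box filter scans every row, which is one reason scientific databases often need specialized spatial and temporal index structures.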

The course will have several invited lecturers who will focus on scientific applications such as Bioinformatics and the Human Genome Project; the Earth Observing System Data and Information System; Geospatial Databases and GIS; Space Sciences and Virtual Observatories; e-Science; Scientific Data Mining; and Knowledge Sifter and Semantic Databases.

Students will also have homework assignments that focus on the design, construction, and access of on-line scientific databases.

Group Research Projects

The goal of the research project is to supplement the course material by allowing students to study a particular area in depth. One outcome from the research project might be a grant proposal to an agency (such as NASA, the Office of Naval Research, or the National Science Foundation) to support the student's doctoral dissertation research, or the goal might be the submission of your research paper to a peer-reviewed journal.

Projects will be developed by interdisciplinary teams of students that work together to address a scientific database problem.

The minimal goal of the Group Research Project is to do a survey of the topic and to identify the following characteristics: major papers in the field, research requirements on the system, existing database implementations, new research directions, your suggestions for new database solutions (e.g., schema, architectures, interfaces, data products, metadata products, retrieval options), and applications (e.g., what real scientific questions can be answered with the data/tools).

The preferred goal for the Group Research Project is to do all of those things, plus (a) to develop an operational implementation of a scientific database system to address one or more of the research requirements in that scientific discipline (referred to as an "implementation project"), or (b) (alternatively) to develop a real scientific research application based on the existing global cyber-infrastructure and specific scientific database systems (referred to as an "application project").

A more ambitious goal for the Group Research Project, and one that some more advanced students might attempt, would be a research paper of publishable quality for a professional research conference or journal. One such conference is the International Conference on Scientific and Statistical Database Management (SSDBM; the 2001 SSDBM conference was hosted by George Mason University). The latest such conferences were held in Vienna, Austria (http://www.ocg.at/ssdbm2006/ ) and in Banff, Canada (http://ssdbm2007.cpsc.ucalgary.ca/ ). The next such conference will be held in Hong Kong (http://i.cs.hku.hk/~ssdbm/ ). A refereed journal for possible publication of your implementation project research results is the Data Science Journal. An application project paper may appear in a specific scientific research journal, depending on the application area.

For implementation projects, we are interested in seeing projects based on the development of prototype systems that would take advantage of technology here at GMU. Projects will include research activities that define user models, domain models, database architectures, proposed standards, interfacing with research endeavors, etc. Teams will be composed of natural science and computer science majors to facilitate the development of a relevant solution to the scientific information management/distribution problem.

For application projects, we are interested in seeing science research projects that invoke data and/or tools that are based on the global cyber-infrastructure, existing data systems, tools with remote data access capabilities, and other SSDBMs. Projects should define a current scientific research problem (a question or a hypothesis), identify data sources that are available via the Internet for answering the question (or confirming the hypothesis), and then utilize (and enhance) tools to access and analyze the data that address the scientific research problem. Since the existing systems and tools may not fully satisfy the research needs for real science, programming for system/tool enhancement is expected. Examples include, but are not limited to, programs for automatically accessing large amounts of data, on-the-fly data processing, scripts for server-side data manipulation, and other client science applications.

Our expectations for both types of projects are a thorough and comprehensive survey, with a substantial list of references, plus a well-written scholarly document that will be useful to others in the class. The research project may be chosen from the list below, or an entirely different one may be proposed. The first step is to agree upon a topic with the instructor. Next, a short proposal should be written to delineate the topic and to document your proposed approach, the expected results, and the resources needed (including computing facilities and software). Consider this as a pre-proposal for the project that includes the specific aims and the division of work between the team members. Once approved, this short proposal will serve as the contract between the students and the instructors for the Research Project. This is the same process that the grant agencies use for submissions. The project deadlines are firm and you will not receive full credit for late work.

So as to avoid embarrassing situations at the end of the semester, particularly requests for a grade of incomplete, there will be well-defined milestones with deliverables. There are no incompletes in CSI 710/INFT864! No excuses will be accepted.

We have several goals for these projects:

  1. To help you to understand the relevant issues in Scientific Databases,
  2. To leverage existing cyber-infrastructure data systems for real scientific research,
  3. To produce some technical reports or usable software code if appropriate,
  4. To publish the results of your projects in workshops or conferences involving Scientific Databases, and
  5. To experience how to work with your peers on a scientific research project.

Finally, we want to stress that you will learn by doing in this course.

Proposed Research Project Subtopics

The following are some topics that might lead to implementation projects:

  • Web Services for scientific data sharing, interoperability, and discovery;
  • Knowledge Sifter: Knowledge Acquisition and Integration from multiple heterogeneous information sources;
  • Ontology Specification for Scientific Database Domains
    (see Protégé system from Stanford University http://protege.stanford.edu/ );
  • XML for Data and Process Interchange in Scientific Databases;
  • Development of XML language and/or Web Services for scientific data sharing, interoperability, and discovery;
  • Electronic Marketplaces for Sharing, Collaboration, and Auctioning of Scientific Data;
  • Knowledge Portals for Multimedia Scientific Information. Please see the San Diego Supercomputer Center for Science Projects;
  • The integration and interchange of information among multiple Scientific Databases;
  • Schema Integration across multiple Scientific Databases;
  • Meta-Data Management for Scientific Databases. Examples are the Earth Observing System and the Genome Project;
  • Query language primitives for domain-specific Scientific Databases;
  • Grid Services for Scientific Data Manipulation, Mining, and/or Analysis;
  • Intelligent Query Formulation for Scientific Databases;
  • The application of Artificial Intelligence in Scientific Databases;
  • Architectures for Scientific Database Systems;
  • Agents, Knowledge Rovers, and Mediators for Scientific Databases;
  • Collaborative Research Environments;
  • Active Scientific Databases and Dictionaries: Integrating Production Systems, Meta-data, and Databases;
  • The application of logic query languages for Scientific Databases;
  • QBE (Query-By-Example) interface to Scientific Databases;
  • Rule-based query optimization techniques over complex scientific data;
  • Multimedia and Hypermedia Data Models for Scientific Databases;
  • Data Mining and Knowledge Discovery in Scientific Databases;
  • Application of CRM, target marketing, or interactive marketing concepts to Scientific Database user support;
  • Incorporating scientific data types in extensible databases;
  • Model Management in Scientific Databases;
  • Temporal and/or spatial data models and languages for Scientific Databases;
  • Physical database structures for Scientific Databases;
  • Metadata browser for an FTP-accessible Scientific Data Repository;
  • Parallel and Distributed Scientific Databases;
  • Extensible Object-Oriented Scientific Databases;
  • World Wide Web Scientific Database systems;
  • Laboratory Information Management systems.

The following systems/tools/projects may be leveraged for application projects:

Textbooks
Required reading: Database System Concepts, Silberschatz, Korth, and Sudarshan, McGraw-Hill, 5th edition (published in 2005), 0-07-295886-3
Required reading: The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management, Daconta, Obrst, and Smith, Wiley, 2003, 0-471-43257-1