| Course Information | |||||||||
| Course title: | Scientific Databases | ||||||||
| Course number: | CSI710/IT864 | ||||||||
| Course discipline: | Computing Sciences | ||||||||
| Course description: |
This is an interdisciplinary course in advanced databases for the purpose of facilitating scientific research. The course focuses on:
This is a hands-on course. You will learn by doing as your team specifies and develops a scientific database application. |
||||||||
| Course date: | Tuesday, August 28, 2007 through Tuesday, December 11, 2007 | ||||||||
| Location: | Robinson Hall room B113, and On-line via WebCT | ||||||||
| Meeting day(s): | Tuesdays | ||||||||
| Meeting time(s): | 7:20 to 10:00 PM | ||||||||
| Prerequisite(s): | INFS 614, or equivalent, or Permission of Instructor | ||||||||
| Instructor Information | |||||||||
| Name: | Drs. Ruixin Yang and Kirk Borne | ||||||||
| Email: | yang@yang.gmu.edu, kborne@gmu.edu | ||||||||
| Office location: | R. Yang: Research I room 226 | ||||||||
| Office hours: | Tuesdays 5:00-7:00pm | ||||||||
| Phone: | 703-993-3615 | ||||||||
| Biography: | More information regarding the other instructors is available here: http://classweb.gmu.edu/kborne/csi710/ | ||||||||
| Grading Policy | |||||||||
| Introduction: |
Your grade in the course will be determined by grades obtained on the homework assignments, your research project, peer evaluation, class participation, a take-home final exam, and an in-class final exam (December 11, 2007). The percentage breakdown is as follows:
|
||||||||
| Web sites | |||||||||
| Syllabus URL: | http://classweb.gmu.edu/kborne/csi710/ | ||||||||
| Course Goals | |||||||||
| Course goals: |
Course Objectives
Students will learn how database management tools and techniques can be used: 1) to model scientific and statistical databases, 2) to access and query them, and 3) to manage both their metadata and their data. Scientific applications pose different problems from those of traditional transaction-oriented database applications. The course will examine the requirements of scientific databases, advanced data modeling techniques to capture the semantics of scientific applications (especially the temporal, spatial, and statistical aspects of the data), the need for data repositories, and the need for advanced retrieval capabilities. The course will have several invited lecturers who will focus on scientific applications such as Bioinformatics and the Human Genome Project; the Earth Observing System Data and Information System; Geospatial Databases and GIS; Space Sciences and Virtual Observatories; e-Science; Scientific Data Mining; Knowledge Sifter and Semantic Databases. Students will also have homework assignments that focus on the design, construction, and access of on-line scientific databases.
Group Research Projects
The goal of the research project is to supplement the course material by allowing students to study a particular area in depth. One outcome from the research project might be a grant proposal to an agency (such as NASA, the Office of Naval Research, or the National Science Foundation) to support the student's research for the doctoral dissertation, or perhaps the goal might be the submission of your research paper to a peer-reviewed journal. Projects will be developed by interdisciplinary teams of students that work together to address a scientific database problem.
The minimal goal of the Group Research Project is to do a survey of the topic and to identify the following characteristics: major papers in the field, research requirements on the system, existing database implementations, new research directions, your suggestions for new database solutions (e.g., schema, architectures, interfaces, data products, metadata products, retrieval options), and applications (e.g., what real scientific questions can be answered with the data/tools).
The preferred goal for the Group Research Project is to do all of those things, plus either (a) to develop an operational implementation of a scientific database system to address one or more of the research requirements in that scientific discipline (referred to as an "implementation project"), or (b) to develop a real scientific research application based on the existing global cyber-infrastructure and specific scientific database systems (referred to as an "application project").
A more ambitious goal for the Group Research Project, and one that some more advanced students might attempt, would be a research paper of publishable quality for a professional research conference or journal. One such conference is the International Conference on Scientific and Statistical Database Management (SSDBM; the 2001 SSDBM conference was hosted by George Mason University). The latest such conferences were held in Vienna, Austria (http://www.ocg.at/ssdbm2006/) and in Banff, Canada (http://ssdbm2007.cpsc.ucalgary.ca/). The next such conference will be held in Hong Kong (http://i.cs.hku.hk/~ssdbm/). A refereed journal for possible publication of your implementation project research results is the Data Science Journal. An application project paper may appear in a specific scientific research journal, depending on the application area.
For implementation projects, we are interested in seeing projects based on the development of prototype systems that would take advantage of technology here at GMU.
Projects will include research activities that define user models, domain models, database architectures, proposed standards, interfacing with research endeavors, etc. Teams will be composed of natural science and computer science majors to facilitate the development of a relevant solution to the scientific information management/distribution problem.
For application projects, we are interested in seeing science research projects that invoke data and/or tools that are based on the global cyber-infrastructure, existing data systems, tools with remote data access capabilities, and other SSDBMs. Projects should define a current scientific research problem (a question or a hypothesis), identify data sources that are available via the Internet for answering the question (or confirming the hypothesis), and then utilize (and enhance) tools to access and analyze the data that address the scientific research problem. Since the existing systems and tools may not fully satisfy the research needs for real science, programming for system/tool enhancement is expected. Examples include, but are not limited to, programs for automatically accessing a large amount of data, on-the-fly data processing, scripts for server-side data manipulation, and other client science applications.
For both types of projects, we expect a thorough and comprehensive survey, with a substantial list of references, plus a well-written scholarly document that will be useful to others in the class.
The research project may be chosen from the list below, or an entirely different one may be proposed. The first step is to agree upon a topic with the instructor. Next, a short proposal should be written to delineate the topic and to document your proposed approach, the expected results, and the resources needed (including computing facilities and software). Consider this as a pre-proposal for the project that includes the specific aims and the division of work between the team members.
This short proposal, when approved, will serve as the contract between the students and the instructors for the Research Project. This is the same process that the grant agencies use for submissions.
The project deadlines are firm and you will not receive full credit for late work. To avoid embarrassing situations at the end of the semester, particularly requests for a grade of incomplete, there will be well-defined milestones with deliverables. There are no incompletes in CSI 710/INFT864! No excuses will be accepted.
We have several goals for these projects:
Finally, we want to stress that you will learn by doing in this course.
Proposed Research Project Subtopics
The following are some topics that might lead to implementation projects:
The following systems/tools/projects may be leveraged for application projects:
|
||||||||
| Textbooks | |||||||||
| Required reading: | Database System Concepts, Silberschatz, Korth, and Sudarshan, McGraw-Hill, 5th edition (published in 2005), ISBN 0-07-295886-3 | ||||||||
| Required reading: | The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management, Daconta, Obrst, and Smith, Wiley, 2003, ISBN 0-471-43257-1 | ||||||||