Data and databases used by scientists. Covers the basics of
database organization, queries, and distributed data systems.
Student exercises will include queries of existing systems,
along with basic design of simple database systems.
Stat 354 (Probability and Statistics for Engineers and Scientists II).
Detailed Course Overview:
This course will review data and databases used by scientists.
Modern scientific research projects are often characterized by
the collection, organization, and analysis of data in databases.
Topics to be covered will include basic concepts of user
requirements analysis, data modeling, database organization,
metadata, query languages, markup languages, data registries,
web services, and distributed data systems. Application-related
concepts will be presented, including data discovery,
data fusion, data integration, data mining, data grids,
user query interfaces, decision support, knowledge representation,
ontologies, the semantic web, inference from databases,
and machine intelligence. Science-related concepts will be
presented, including science data formats, data dictionaries,
informatics, noise and error handling, multi-level data products,
science use cases, data-driven discovery, data-centric science,
spatio-temporal databases, e-Science, virtual observatories,
annotation systems, and grid computing. Case studies will be
presented from various science disciplines, including
space science, bioinformatics, earth science, geographic
information systems, and numerical simulation research.
Student exercises will include queries of existing systems,
along with basic design of simple database systems.
Grading:
30% = Homework and Lab Exercises
10% = Class Participation
20% = Midterm Exam
40% = Final Exam
Course Objectives:
to become familiar with a variety of large scientific database projects,
their goals and implementation;
to become capable of using database and data management techniques
to solve scientific problems; and
to acquire knowledge of database and data management techniques that
will enable the student to progress to more advanced courses,
research projects, and employment opportunities that require database
skills and science data understanding.