CS 177: Introduction to Bioinformatics
Course Homepage: Spring 2003
Also see the new university
website for bioinformatics.
NOTE: Important information for Spring 2003:
- CS 177 enrollment is capped at 32 students. Register early!
- If you are interested in bioinformatics as a career, consider
the Hughes program for undergraduates: see the Academics
section of the new university
website for bioinformatics for information and for instructions
on how to apply.
- Students accepted to the Hughes program may be given priority
in registering for CS 177. See below for registration permission.
- To get permission:
- If you apply for the Hughes program, you do not need permission to
register for CS 177, because we will interpret your
application to the Hughes program as intent to register for CS 177.
If you are interested in the Hughes program but NOT in CS 177,
then you will have to explain why in your Hughes program application.
You may download the application form here in plain text,
or print the HTML version. Either way, submit a hard copy
as instructed on the form.
- CS students who are not applying to the Hughes program
should send an email to Prof. Simha explaining
their background, why they want to take the course, and why they ought to
be selected.
- CCAS students who are not applying to the Hughes program
can send a similar email to Prof. Rob Donaldson
in Biology (robdon@gwu.edu).
Spring 2003 course information:
- Instructor:
Dr. Tom Wilke (mtmtxw@gwumc.edu).
Co-instructor: Prof. Rahul Simha
- Office Hours:
- Dr. Wilke: by appointment (send email: mtmtxw@gwumc.edu).
- Prof. Simha: Tue 1.30-3.30pm (Fall 2002 office hours) (simha@seas.gwu.edu)
- Class Time/Place:
- CS 177: Mondays, 3.30 - 6.10 pm, Tompkins 405
- Prerequisites:
Permission of instructor.
- Course description:
This course will provide a broad introduction to the area of
bioinformatics.
Topics include: biochemistry overview, databases, the alignment
problem, proteins and protein structure-function,
introductory phylogenetics, and use of public databases.
- Textbook:
S.M.Brown. Bioinformatics: A Biologist's Guide to Biocomputing
and the Internet. Eaton Publishing.
Supplementary (optional) textbooks:
- D.W.Mount. Bioinformatics: Sequence and Genome Analysis. Cold
Spring Harbor Lab Press.
- C.Gibas and P.Jambeck. Developing Bioinformatics Computer Skills.
O'Reilly.
- Lecture schedule:
- Lecture 1 (Sep 9): Introduction
(Powerpoint slides)
- Motivating problem: manufacture of the Polio virus
- Description of problem, and synthesis of virus
- Informal "information perspective" of problem: string searching
- History: Traditional biology vs. new information-based biology
- What is bioinformatics? Narrow (genomics) definition and broad definition.
- Example using GenBank.
- The future: bioinformatics careers.
- Course goals.
- Lab tour.
- Lecture 2 (Sep 16): Computing overview for biologists
- Parts of the bioinformatics infrastructure: internet, databases.
- What is a database?
Structure of a database, traditional databases and the problems they solve.
Relations and tables. Interacting with a database: SQL.
Interacting programmatically: Java, JDBC.
- Web/internet
How does the internet work? What is web content (HTML etc)? What is a browser?
Web interface to a database. Search engines. Scripts.
- Lecture 3 (Sep 23): DNA/RNA overview
(Powerpoint slides)
(Homework 3 - in Word format)
- DNA and its components.
- DNA replication, transcription, translation, protein synthesis.
- DNA cloning technology: PCR, sequencing.
- Inheritance, mutation, recombination.
- Lecture 4 (Sep 30): Nucleotide and protein databases
(Powerpoint slides)
(Homework 4 - in Word format)
- Public sequence databases.
- Sequence retrieval and examples.
- Similarity searching.
- Gene identification.
- Genetic and physical maps.
- Protein databases.
- Data exchange and management.
- Lecture 5 (Oct 7): Hands-on lab with databases
(Powerpoint slides)
(Homework 5 - in Word format)
- Motivating problem: a paper from the literature.
- Bio Background for CS students (to understand the paper).
- Dissecting their approach: using GenBank.
- Lab exercises in using GenBank
- Lecture 6 (Oct 14): The Alignment problem
(Two-sequence alignment - HTML)
(Multiple alignment - Powerpoint)
- Pairwise alignment problem.
- Dynamic programming algorithm (from CS151 notes).
- Multiple alignment.
- Editing and formatting alignments.
- Examples of using alignments: a paper from the literature.
- Lecture 7 (Oct 21): Structure-Function Relationships
(Powerpoint slides)
- Review of protein structure and function.
- Review of experimental techniques to determine structure
- Mitochondrial DNA and the coalescent.
- RNA structure.
- Protein structure.
- The protein folding problem.
- Motifs.
- Lecture 8 (Oct 28): Phylogenetic Inference I
(Powerpoint slides)
- Taxonomy and phylogenetics.
- Cladistic vs. phenetic analyses.
- Phylogenetic signals.
- Models of sequence evolution.
- Lecture 9 (Nov 4): Phylogenetic Inference II
(Powerpoint slides)
- Phylogenetic trees.
- Phylogenetic networks.
- Cladistic methods.
- Computer software and demos.
- Lecture 10: Lab exercises
(Powerpoint slides)
- Analysis of a paper from the literature (in phylogenetics).
- Using phylogenetic software.
- Lecture 11 (Nov 18): Applications
- Population genetics and history.
- Reconstruction of disease transmission.
- Host-parasite coevolution.
- Human origins.
- Lecture 12 (Nov 25): Simulation
- What is a simulation?
- A simple 2D landscape simulation.
- Discrete-event simulation.
- Examples: membrane simulation.
- Esoteric topics: The Game of Life and reproduction,
Boolean and catalytic networks, Origin of life theories.
- Lecture 13 (Dec 2): Biological metaphors in computing
- Combinatorial problems
- The genetic algorithm
- Neural networks
- Lecture 14 (Dec 9): Case studies
- 2-3 case studies in using bioinformatics
- Pathway databases
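For students previewing Lecture 6, the dynamic programming idea behind pairwise alignment can be sketched in a few lines of Python. This is a minimal illustration and not the course's own code: the function name global_alignment_score and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example. It computes only the optimal global alignment score, not the alignment itself.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best global alignment score of strings a and b.

    The scoring values are illustrative assumptions, not course settings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            # Pair a[i-1] with b[j-1] (match or mismatch).
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Pair a[i-1] with a gap in b.
            up = score[i - 1][j] + gap
            # Pair b[j-1] with a gap in a.
            left = score[i][j - 1] + gap
            score[i][j] = max(diag, up, left)

    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

The table has one cell per pair of prefixes, so the running time and memory both grow with the product of the two sequence lengths. Recovering the alignment itself would require a traceback through the table, which this sketch omits.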