ECS 124: Theory & Practice of Bioinformatics

Subject
ECS 124
Title
Theory & Practice of Bioinformatics
Status
Active
Units
4.0
Effective Term
2019 Winter Quarter
Learning Activities
Lecture - 3.0 hours
Laboratory - 1.0 hours
Description
Fundamental biological, mathematical and algorithmic models underlying bioinformatics and systems biology; sequence analysis, database search, genome annotation, clustering and classification, functional gene networks, regulatory network inference, phylogenetic trees, applications of common bioinformatics tools in molecular biology and genetics. GE Prior to Fall 2011: SciEng. GE: SE.
Prerequisites
(ECS 010 or ECS 032A or ECS 030 or ECS 036A or ENG 006); (STA 012 or STA 013 or STA 013Y or STA 032 or STA 100 or STA 131A or MAT 135A or BIM 105); (BIS 002A or MCB 010)
Enrollment Restrictions
Pass One open to Computer Science, Computer Science Engineering, and Biotechnology majors only.

Summary of Course Content

I. Initial examples of the power of bioinformatics in modern biology

  • The importance of sequence and structure comparison and of database search
  • The use of sequence analysis in laboratory protocols
  • The use of phylogenetics in evolution and non-evolutionary areas of biology

II. Sequence analysis

  • Probabilistic and biological models underlying sequence alignment
  • Computational efficiency and the need for compromises in the models
  • The general technique of dynamic programming
  • Pairwise sequence alignment - algorithms for global and local alignment, and variations
  • Algorithms for multiple sequence alignment and the identification/use of motifs
  • Database search, FASTA, BLAST, PSI-BLAST, scoring matrices, statistical significance and its interpretation
  • Multiple sequence alignment
  • Genome assembly and high-throughput transcriptional profiling
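
As a minimal illustration of the dynamic programming approach to global pairwise alignment listed above, the following Python sketch computes a Needleman-Wunsch alignment score with a linear gap penalty. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration and are not taken from the course materials; Python is used for brevity, although the course labs use Perl or Java.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b.

    Scoring values are illustrative assumptions, not course-specified.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap in b.
            up = score[i - 1][j] + gap
            # Align b[j-1] with a gap in a.
            left = score[i][j - 1] + gap
            score[i][j] = max(diag, up, left)

    return score[-1][-1]
```

Under these assumed values, global_alignment_score("ACGT", "AGT") evaluates to 1 (three matches and one gap). The table has (len(a) + 1) × (len(b) + 1) cells, which is the quadratic cost that motivates the heuristic compromises used in database search.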

III. Systems Biology

  • Clustering (K-means, hierarchical clustering)
  • Classification (naive Bayes, Support Vector Machines)
  • Machine Learning in Biology
    • Overfitting, bias-variance trade-off, curse of dimensionality
    • Validation methods
  • Biological Networks
    • Introduction to network biology
    • Functional gene networks
    • Inference of gene regulatory networks
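
The K-means clustering topic listed above can be sketched as follows. This is a minimal pure-Python version of Lloyd's algorithm under stated assumptions: the function name, the caller-supplied starting centroids, and the rule for empty clusters are all illustrative choices, not course-specified.

```python
def kmeans(points, centroids, iterations=100):
    """Cluster tuples of floats with Lloyd's algorithm.

    Starts from caller-supplied centroids (an assumption made so that the
    sketch is deterministic) and returns (labels, centroids).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its old centroid
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return labels, centroids
```

For example, the points (0, 0), (0, 1), (10, 10), (10, 11) with starting centroids (0, 0) and (10, 10) are split into two clusters with centroids (0.0, 0.5) and (10.0, 10.5). The result depends on the starting centroids, which is one reason validation methods matter in practice.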

IV. Phylogenetic algorithms

  • Probabilistic and ideal-data models underlying phylogenetic algorithms
  • Distance-based methods
  • Character/parsimony-based methods
  • Maximum-likelihood methods
  • Evolutionary and non-evolutionary uses for phylogenetics
  • Multiscale modeling and simulation of evolutionary systems
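
As one example of the distance-based methods listed above, the following Python sketch builds a rooted tree with UPGMA, which repeatedly merges the two closest clusters. UPGMA is chosen here only for illustration (the outline does not name a specific distance-based method); the function name and the nested-tuple tree representation are assumptions, and branch lengths are omitted.

```python
def upgma(names, dist):
    """Build a rooted tree topology by UPGMA from a symmetric distance matrix.

    Returns nested tuples such as ('C', ('A', 'B')); branch lengths are omitted.
    """
    # Each cluster id maps to (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)

    while len(clusters) > 1:
        # Pick the closest pair of clusters and merge them.
        i, j = min(d, key=d.get)
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        del d[(i, j)]
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1

    (tree, _size), = clusters.values()
    return tree
```

With taxa A, B, C, D where A and B are at distance 2, C is at distance 6 from both, and D is at distance 10 from all others, the sketch returns ('D', ('C', ('A', 'B'))). UPGMA implicitly assumes a constant rate of evolution, which is one of the ideal-data assumptions such methods rest on.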

Illustrative Reading

  • R. Durbin et al., Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998.
  • A. Baxevanis and B. Ouellette, Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Wiley-Interscience, 1998.
  • M. Bishop and C. Rawlings, DNA and Protein Sequence Analysis: A Practical Approach, IRL Press, 1997.
  • D. Gusfield, Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997.
  • N. Jones and P. Pevzner, An Introduction to Bioinformatics Algorithms, MIT Press, 2004.

Potential Course Overlap

The laboratory section of ECS 124 will overlap to some extent with Animal Genetics 212 (taught by J. Medrano), a graduate course offered every other year. The overlap in the laboratory section is only partial: ANG 212 looks at more sequence analysis packages than ECS 124 does, while the lab portions of ECS 124 also look at packages for cluster enrichment, biological networks and phylogenetic analysis, which are not covered in ANG 212. The ECS 124 labs also involve computer programming in Perl or Java, whereas no programming is involved in ANG 212. The theoretical parts of ECS 124 (the lecture part of the course) will have no essential intersection with ANG 212, being either on different material entirely or a much more mathematical and algorithmic treatment of the material, i.e., fully explaining and developing the logic of the techniques rather than focusing on learning to use these techniques in the form of packaged computer programs.

A good analogy to explain the partial intersection is that ANG 212 is a course on “flying an airplane” while ECS 124 will be on the “physics of flight” with exercises in flying to make the ideas concrete.

ECS 124 intersects with ECS 221 (taught by I. Tagkopoulos), a graduate seminar course that focuses on machine learning methods in systems and synthetic biology. The overlap is mainly in the areas of clustering, classification and network analysis. ECS 221 assumes advanced computer science knowledge and treats these topics with more mathematical rigor than ECS 124, which includes lectures on fundamental computer science methods (e.g., suffix trees and dynamic programming). In contrast to ECS 124, ECS 221 focuses on the presentation of research papers in the field and does not require any laboratories or homework.

ECS 124 intersects EVE 298 (taught by M. Sanderson and S. Nadler) in the subarea of phylogenetics. However, the emphasis of the two courses in that overlapping subarea is again different. EVE 298 is oriented towards teaching biology graduate students to use computerized phylogenetic tools effectively in their biological (phylogenetic) research, while ECS 124 will have a more algorithmic and mathematical orientation.

ECS 124 intersects GGG 298D (taught by C. Warden and M. Syvanen) in the subarea of sequence analysis. Again the emphasis is quite different. GGG 298D is oriented towards teaching biology graduate students to use computerized sequence analysis tools effectively in their (biological sequence analysis) research, while ECS 124 will have a more algorithmic and mathematical orientation. There are no undergraduate courses on campus that have any substantial intersection with ECS 124.

Final Exam

Yes

Course Category