Instructor(s): András Aszódi, Peter Sarkozy
Weeks: 1-14
Contact hours: 2x2 hours/week
Credit: 4 credits

Short Description of the Course:
The Computational Biology and Medicine (CBM) program of AIT helps create a new breed of computer experts who can apply computational and analytical methods to solve complex problems in biomedical research. The CBM course offers a study program that introduces the students to computational biology, with an emphasis on major high-throughput -omics methodologies and databases. The main focus is on the development and application of mathematical modeling and computational simulation techniques for studying biological systems in health and disease.

The first half of the course discusses in detail gene expression regulation, the qualitative and quantitative description of complex kinetic phenomena, metabolic control, information processing in biochemical reactions, homeostasis and robustness. The second half focuses on the data analysis tasks associated with high-throughput genomic sequencing experiments, with special emphasis on medical decision support.

Motivation and perspective: 
The application of systems biology and computational medicine to translational research in the pharmaceutical and biotech industries is one of the most important recent developments in computational biology. Building on applications of computer science in the field of biology, bioinformatics research requires input from the diverse disciplines of mathematics and statistics, physics and chemistry, and medicine and pharmacology. AIT students interested in computational biology and medicine will be introduced to this multidisciplinary perspective and its applications in academic and industrial environments.

Prerequisites:

  • Calculus, in particular differential equations and Fourier transform methods.
  • Linear algebra: vectors, matrices, eigenproblems.
  • Basic probability theory: discrete distributions, Markov chains, Bayes’ Rule and its applications.
  • Python programming skills. Knowledge of SageMath is an advantage.
  • High-school biology.

 

Topics in detail

Part I: Biocybernetics (A. Aszódi)

Introduction to biocybernetics
Definition of systems. Comparison of natural and artificial systems. Applicability of systems theory and engineering in biology. Basic principles of regulation: positive and negative feedback. Mechanism elucidation: experimental and theoretical methodologies.

Modelling and simulation
Mechanistic models based on differential equations. Exact and approximate solutions of certain important classes of differential equations using SageMath. Stochastic simulations, Markov chains.

System kinetics
Qualitative description of dynamic processes: equilibria, steady states, periodic processes, deterministic chaos, and their relevance to biology. Quantitative description with differential equations. Predator-prey models as examples of complex kinetic phenomena.
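As an illustration of the kind of predator-prey model listed above, the following is a minimal sketch of the classic Lotka-Volterra equations integrated with a hand-written fourth-order Runge-Kutta step. It is not taken from the course material: the function names, the parameter values and the initial populations are all assumptions chosen for demonstration only.

```python
import math


def lotka_volterra(state, alpha, beta, delta, gamma):
    """Right-hand side of the Lotka-Volterra equations.

    dx/dt = alpha*x - beta*x*y   (prey)
    dy/dt = delta*x*y - gamma*y  (predator)
    """
    prey, predator = state
    d_prey = alpha * prey - beta * prey * predator
    d_predator = delta * prey * predator - gamma * predator
    return (d_prey, d_predator)


def rk4_step(f, state, dt, *params):
    """Advance the state by one classical Runge-Kutta (RK4) step."""
    k1 = f(state, *params)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), *params)
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), *params)
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)), *params)
    return tuple(
        s + dt / 6.0 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(initial, dt, steps, params):
    """Return the list of (prey, predator) states, including the initial one."""
    trajectory = [initial]
    state = initial
    for _ in range(steps):
        state = rk4_step(lotka_volterra, state, dt, *params)
        trajectory.append(state)
    return trajectory


def invariant(state, alpha, beta, delta, gamma):
    """Quantity conserved along exact Lotka-Volterra orbits (sanity check)."""
    prey, predator = state
    return (delta * prey - gamma * math.log(prey)
            + beta * predator - alpha * math.log(predator))


# Hypothetical parameters (alpha, beta, delta, gamma) and initial populations.
PARAMS = (1.0, 0.1, 0.075, 1.5)
trajectory = simulate((10.0, 5.0), dt=0.01, steps=2000, params=PARAMS)
```

The populations cycle around the steady state (gamma/delta, alpha/beta) instead of settling into it. Checking that `invariant` stays nearly constant along the trajectory is a simple way to judge whether the step size is small enough.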

Biochemical kinetics
Stochastic kinetics and simulations. Macroscopic (deterministic) kinetics. First and second-order reactions. Enzyme kinetics, Michaelis-Menten approximation. Kinetics of genomic regulation: the lambda phage circuitry. Oscillatory and chaotic biochemical reactions.

The theory of evolution
Fundamental concepts. Lamarckian and Darwinian evolution. Evolution of macromolecular sequences, molecular phylogeny. Epigenetic inheritance.

Computing with biomolecules
Quasi-digital approaches: Adleman’s DNA-based solution of the Hamiltonian path problem (a close relative of the travelling salesman problem) and related efforts. Molecular implementations of Boolean logic gates. Computing with enzymatic reaction networks. Simple learning phenomena.

Regulation in spacetime
Turing’s theory of morphogenesis. Robustness of pattern formation in living systems. Algorithmic models of plant growth, applications in computer graphics.

Artificial life
Chemoton theory: self-reproducing autocatalytic reaction networks. Cellular automata, Conway’s “Game of Life”. In silico models of simulated evolution: the Tierra and Avida systems.

 

Part II: Data analysis of high-throughput genome sequencing (P. Sarkozy)

Introduction to molecular genetics
The role and characteristics of DNA in organisms. Mutation types, population genetics, linkage disequilibrium, transcription and translation of DNA to proteins, gene expression, epigenetic modifications, the path to personalized medicine.

Overview of DNA sequencing technologies
From Sanger sequencing to single-molecule real-time DNA sequencing, in vitro diagnostics, high-throughput measurement methods, partial genetic association studies, genome-wide association studies.

High-throughput measurements
Quality control, filtering, common failure modes and platform-specific error profiles of common measurement methods, sample multiplexing and study design.

Mapping and assembly of large, complex genomes
De novo assembly, reference mapping, local and global alignment, the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, understanding and correcting alignment bias in DNA sequencing.
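To make the difference between global and local alignment concrete, here is a minimal sketch of the dynamic programming recurrence shared by the Needleman-Wunsch (global) and Smith-Waterman (local) algorithms. It computes only the best score, uses a simple linear gap penalty, and is not taken from the course material: the function name `alignment_score` and the scoring values are assumptions chosen for illustration.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best pairwise alignment score by dynamic programming.

    local=False: Needleman-Wunsch (global alignment of the full sequences).
    local=True:  Smith-Waterman (best-scoring pair of substrings).
    The scoring values are illustrative assumptions, with a linear gap cost.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
            if local:
                # Local alignment: a negative prefix is dropped (restart at 0).
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Global: score of aligning both full sequences. Local: best cell anywhere.
    return best if local else score[-1][-1]
```

The only differences between the two variants are the initialisation of the first row and column, the clamping of cell values at zero, and where the final score is read from. Recovering the alignment itself would require an additional traceback step, which is omitted here.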

Interpretation of results
Identifying variants, detecting somatic mutations, heterogeneous population sequencing, construction of local phylogenetic trees for cancer evolution, resolving haplotypes, copy number variations, large-scale genomic rearrangements.

Medical decision support
Functional effect prediction, the current state and outlook of personalized medicine, decision support through Bayesian networks, common systems modeling approaches, the utility function of false positives/negatives, integrating genetic evidence, legal, ethical and privacy issues in DNA sequencing.

 

Homework

Part I:
Data analysis and simulation tasks (reaction kinetics of biochemical pathways, predator-prey models, biological pattern formation models, etc.) using the SageMathCloud online computer algebra system.

Part II:
Data analysis of an in silico genetic association study using various open-source software packages. Simulation of a DNA sequencing experiment and analysis of the results.

 

Exam

Part I:
The students prepare an essay (10 pages minimum) on biological information processing, choosing from a topic list provided by the lecturer. Those who propose a topic of their own will get an extra half grade (e.g. B+ instead of B). Instead of writing an essay, it is also possible to write small programs that simulate biological regulation phenomena.

Part II:
The students write a research report that summarizes their homework and provides an objective overview of their results. The report must be at least 12,000 characters long (excluding spaces). An additional grade (e.g. B to A, B+ to A+) will be given for the use of publicly available real measurement data instead of simulated data.

Grading Criteria:

  • Essay and research report (70%): the students must demonstrate that they have understood the principles discussed in the lectures and can apply their knowledge in a practical context. Originality and a critical approach are especially important.
  • Course activity (20%): students are required to ask questions and challenge the lecturer and each other.
  • Homework (10%): timely completion of the tasks with correct results is required.

Textbooks:

Instructors' bio:

András Aszódi (born 1964) studied chemistry at Eötvös Loránd University in Budapest, where he graduated in 1988. He then studied molecular neurobiology at the University of Oxford, supported by a Soros scholarship. He received his Ph.D. in 1991 for work on kinetic models of simple learning processes. From 1992 to 1996 he developed protein structure prediction methods at the National Institute for Medical Research in London. In 1996 he joined the Novartis Research Institute in Vienna as a computational modeller. There he built up the In Silico Sciences unit, which provided bioinformatics and computational chemistry tools to researchers. In 2006 he joined the Research Institute of Molecular Pathology in Vienna, where he developed data analysis tools and databases for high-throughput sequencing projects. He is currently working in the BioComp group of the CSF GmbH and also teaches a systems biology course at the University of Vienna. He has over 35 scientific publications, including a book with W.R. Taylor on protein structure prediction.

Peter Sarkozy (born 1984) received his degree in Computer Science from the Budapest University of Technology and Economics (BUTE) in 2009, and continued his graduate studies at the Department of Measurement and Information Systems. During his graduate studies from 2009 to 2012 he participated in multiple projects together with the Department of Genetics, Cell and Immunobiology at Semmelweis University. His areas of interest include the measurement and error characteristics of next-generation DNA sequencing technologies. He is the first person in Hungary to apply Oxford Nanopore Technologies’ single-molecule real-time sequencing technology. He is currently working as a research assistant at the Department of Measurement and Information Systems at BUTE.