The dilemma of modern biology, and of molecular biology in particular, is that staggering amounts of data have been and are still being accumulated and stored, while our understanding of what to actually do with this vast data set is still in its infancy. A great deal of emphasis in biological science is placed upon bioinformatics, essentially the storing of information and data in library-like catalogs from which it can be retrieved. The main examples here are the mapping of the human genome and the vast stores of information on protein structure that are held in various protein data banks. The DNA sequences of hundreds of organisms have been decoded and stored in this way, and massive sequencing efforts seek to identify mutations in a variety of genes in cancers. The sheer volume of data produced requires automated computer systems to read it and to compare sequencing results; consequently there have been considerable advances in computational biology, bioinformatics and biostatistics, with major efforts in gene finding, sequence alignment, genome assembly, protein structure etc.

This is of course very important, but in itself it may never deliver a deeper understanding of what actually makes life function, and of how these processes can go awry and produce pathological states, disease and cancer. The list of ‘working parts’ is essentially complete, but a deep and fundamental mathematical-physical understanding of how the parts actually function together to generate the underlying processes of life is still essentially lacking: the whole is still much greater than the sum of the parts. The explosion of information produced by the genomics revolution will be difficult to understand without the continued application of powerful mathematical methods. Knowing how to properly describe and fully utilize this data using such methods could also open up new and powerful applications in medicine, genetic engineering, drug design and cancer therapy.

This incompleteness of understanding is not meant as a criticism of biologists or the biological sciences, or to suggest some kind of inadequacy. It is rather a statement that traditional research in molecular biology probably cannot reach the goals it seeks purely on its own, but requires continued collaboration with the mathematical sciences. And while a good deal of work has been done in applying mathematics to solve various problems, one could argue that the field is still in its infancy compared to what might actually be possible. On the other hand, in the pure mathematical sciences a great deal of mathematical machinery and technology has now been developed and continues to be developed, and much of it is waiting for new applications. This has the ongoing potential to provide general principles and powerful new approaches to organize, describe and utilize this biological data in more comprehensible and powerful ways. In particular, geometry and topology are perhaps the fundamental keys to ultimately reaching a deeper understanding of the underlying processes of life at the level of molecular biology.

However, comparatively few researchers work at this interface, and such work does not generally have a very large audience. Even outstanding and innovative papers at this interface can have very few or no citations after many years. The situation continues to improve, however. Professional training in either the mathematical sciences or molecular biology is very long and demanding in itself, and so it is difficult to become highly proficient in both areas. There is in a sense a gulf between the two cultures, but at this time this gulf can perhaps best be bridged from the mathematical side. (I won’t talk about the often depressing realities of formulating research proposals and obtaining grants but will focus purely on the scientific issues.) The phenomena that mathematical biology seeks to understand are exceptionally rich, diverse and complex and, unlike much of physics, are not often derivable from a few simple or fundamental principles. However, for applied mathematicians there is certainly no end of complex and interesting problems and applications that can be considered and tackled (even in your spare time if you like).

In general, the study of biological processes, like physics, spans many spatial and temporal scales, from the microscopic to the mesoscopic and macroscopic. On increasing spatial scales we encounter the nucleic and amino acids, DNA, biopolymers, genes; then proteins, networks, cellular and neural structures, tissues and organs, organisms and finally entire populations and ecosystems. On temporal scales, we have the time scales on which DNA replicates and proteins are synthesized in ribosomes and folded, time scales for protein translation, various biological oscillations, cell growth and division, biological cycles, gestation periods, time scales for disease processes, growths or extinctions of entire populations and finally the very large (often geological) timescales over which evolution operates. Often, however, larger spatial scales go together with larger temporal scales. Ordered by increasing spatial scale, the challenges for mathematics within biology generally fall into roughly five areas:

1. Molecular biology: the geometry and topology of DNA, its dynamics and structure, information content and processing, mutations, replication, enzymes, biochemical pathways, genes/genomes; proteins and protein structure, shapes of proteins and how geometry and topology dictate function, protein folding, protein engineering and designing specific drug molecules.

2. Cellular structure and function: how cells reproduce, function, signal, consume and excrete, and regulate themselves internally; membrane transport etc.

3. Multicellular structure: how cells work together and form more complex structures, tissues, organs etc.

4. Physiology: circulation, biomechanics, neural systems, brain functioning, immunology, healthcare and medicine, the spread of disease etc.

5. Ecosystems and populations: evolution, biodiversity, epidemics, extinctions etc.

Although work is done in area 1, most of the work in mathematical biology has tended to focus on areas 2 to 5, and the subject really began with area 5. Population dynamics was traditionally the main focus of mathematical biology and dates back to the 19th century. A famous example is the Lotka-Volterra predator-prey equations. This has been supplemented now with concepts like evolutionary game theory and mathematical epidemiology, the study of how infectious diseases and epidemics spread. It was clear to many biologists even in the 19th century that mathematics could prove highly useful. Indeed, Charles Darwin lamented his lack of mathematical knowledge, stating, “I have deeply regretted that I did not proceed far enough at least to understand something of the great leading principles of mathematics: for men thus endowed seem to have an extra sense.” Darwin never really understood the issues of heredity, for example, and it was Gregor Mendel who initially made progress here using what is essentially a simple mathematical model. The need for mathematical understanding generally increases as we penetrate to deeper scales: to the level of individual organisms, then cells and tissues, and then to the fundamental level of biology at the molecular level. During the 20th century, however, mathematical applications slowly progressed to these deeper levels of biology, and the main challenge now in the 21st century is really within molecular biology and the vast data sets produced by the genomics revolution.
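To make the predator-prey example concrete, here is a minimal sketch of the Lotka-Volterra equations integrated numerically in plain Python. The parameter values, the initial populations and the choice of a fourth-order Runge-Kutta integrator are all illustrative assumptions, not taken from any particular study. The sketch also tracks a quantity that the exact equations conserve, which is a handy check that the closed cycles of predator and prey are being reproduced rather than drifting numerically.

```python
import math


def lotka_volterra(state, alpha, beta, delta, gamma):
    """Right-hand side of the predator-prey equations.

    dx/dt = alpha*x - beta*x*y   (prey x)
    dy/dt = delta*x*y - gamma*y  (predators y)
    """
    x, y = state
    return (alpha * x - beta * x * y, delta * x * y - gamma * y)


def rk4_step(state, dt, params):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = lotka_volterra(state, *params)
    k2 = lotka_volterra(shifted(k1, dt / 2), *params)
    k3 = lotka_volterra(shifted(k2, dt / 2), *params)
    k4 = lotka_volterra(shifted(k3, dt), *params)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(x0, y0, params, dt=0.01, steps=5000):
    """Return the list of (prey, predator) states, including the initial one."""
    trajectory = [(x0, y0)]
    for _ in range(steps):
        trajectory.append(rk4_step(trajectory[-1], dt, params))
    return trajectory


def invariant(state, alpha, beta, delta, gamma):
    """Quantity that is constant along exact solutions (for x, y > 0)."""
    x, y = state
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)


# Illustrative (hypothetical) values for alpha, beta, delta, gamma.
PARAMS = (1.0, 0.1, 0.075, 1.5)
trajectory = simulate(10.0, 5.0, PARAMS)
drift = abs(invariant(trajectory[-1], *PARAMS) - invariant(trajectory[0], *PARAMS))
print(f"final state: {trajectory[-1]}, invariant drift: {drift:.2e}")
```

With these assumed parameters the coexistence equilibrium sits at x = gamma/delta and y = alpha/beta, and the populations cycle around it. A library integrator would serve equally well; the hand-written step is used here only to keep the sketch self-contained.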

Mathematical biology has also traditionally used (and still uses) highly useful and well-established mathematical tools such as linear algebra, probability theory and statistics, calculus, numerical analysis, linear and nonlinear ODE and PDE, discrete math, computer simulations, diffusion and random walk theory, and many ideas from physics too. However, the actual arsenal of mathematical machinery available that can potentially be exploited and applied to areas 1 to 5 listed above is now much greater than just these traditional tools. These would include the following (in no particular order): advanced stochastic tools and methods; stochastic aspects of dynamics and dynamical systems; random walks, random surfaces and stochastic differential geometry; stochastic differential equations; fluctuating/random geometric and topological structures; differential geometry and topology; geometric analysis and topological analysis; knot and braid theory; perturbation theory; bifurcation theory; graph theory; nonlinear systems and dynamical systems theory; nonlinear problems in elasticity; advanced fluid dynamics and the nonlinear theory of incompressible viscous and nonviscous fluids; partial functional differential equations; algebraic topology; algebraic geometry; computing and Monte Carlo methods; cellular automata; integro-differential equations; linear integral equations; information theory; various numerical methods; the theory of complex systems; self-organisation; as well as continued use of advanced computing and simulations. The list is not exhaustive, but the possibility exists for the kind of powerful collaboration and cross-fertilization between biology and mathematics in the 21st century that existed (and still exists) for mathematics and physics in the 20th century.

Mathematics already seems to permeate all of biology and at all length scales. To mention just a few examples, a colony of ants seems to function as a parallel processing computer, insects know about prime numbers and geometry, and red blood cells and other membranes seem to already know about the Poincaré conjecture. To elaborate, periodical cicadas emerge every 13 or 17 years, both prime numbers, and bees build hexagonal honeycombs. Red cells are biconcave discoid membranes containing haemoglobin, and their unique geometry seems to be forced upon them by the elastic stresses generated by the membrane. This is the optimum shape that is the most resilient to the stresses encountered in blood flow and in passing through capillaries, and it also maximises the surface area exposed to diffusing oxygen. The geometry can be described using Cassini ovals and Jacobi elliptic functions. However, when placed in certain solutions they can deform smoothly through various geometries back to a sphere. The membrane is a closed, simply connected surface and hence topologically a 2-sphere, which is the two-dimensional analogue of what the (now proved) Poincaré conjecture asserts for the 3-sphere.
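As a small illustration of the Cassini-oval description, the sketch below computes the cross-sectional profile of a dimpled Cassini oval; rotating that profile about its short axis gives a biconcave disc. The focal parameter a and product parameter b are illustrative values, not fitted to the dimensions of a real red cell, and the Jacobi elliptic function description is not attempted here.

```python
import math


def cassini_half_thickness(x, a, b):
    """Height y >= 0 of the Cassini oval at horizontal position x.

    The oval is the set of points whose distances to the foci (-a, 0) and
    (a, 0) have product b**2:
        ((x - a)**2 + y**2) * ((x + a)**2 + y**2) = b**4
    Solving for y**2 gives y**2 = sqrt(4*a**2*x**2 + b**4) - x**2 - a**2.
    """
    y_squared = math.sqrt(4 * a**2 * x**2 + b**4) - x**2 - a**2
    return math.sqrt(max(y_squared, 0.0))  # clamp negatives (rounding, or x past the rim)


def profile(a, b, samples=200):
    """Sample the upper-right quarter of the profile from the axis to the rim."""
    if not a < b < a * math.sqrt(2):
        raise ValueError("a dimpled single oval needs a < b < a*sqrt(2)")
    rim = math.sqrt(a**2 + b**2)  # where the curve meets y = 0
    xs = [rim * i / samples for i in range(samples + 1)]
    return [(x, cassini_half_thickness(x, a, b)) for x in xs]


# Hypothetical shape parameters chosen only to produce a visible dimple.
a, b = 1.0, 1.2
points = profile(a, b)
centre = points[0][1]
thickest = max(y for _, y in points)
print(f"centre: {centre:.3f}, thickest: {thickest:.3f}, rim radius: {points[-1][0]:.3f}")
```

The profile is thinner on the axis than part-way out to the rim, which is the characteristic dimple of the biconcave shape. For b at or above a times the square root of 2 the dimple disappears and the oval becomes convex.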

Topology and geometry also impose powerful constraints on biology and biological processes. As a simple example, consider a cell like the red blood cell, which is essentially a 3-dimensional domain $C$ whose boundary is a 2-surface $S$, so that $\partial C = S$. The ratio of area to volume is simply $r = A(S)/V(C)$. The ‘machinery’ within the cell fills most of the volume, but for this internal machinery to function optimally and properly it must absorb and excrete molecules, nutrients, gases, ions etc. through the surface membrane of area $A(S)$. (For a red cell there is no metabolic ‘machinery’ as such, however, just haemoglobin that binds to diffusing oxygen.) This limits the size and efficiency of the cell within evolution, since surface area grows only as the square of the linear size while volume grows as the cube. The ‘isovolume’ problem for cells in general (which nature has solved) is to find the value of $r$ that optimises cellular function. For red blood cells, for example, the biconcave discoid differential geometry seems to optimise the function of the cell. These basic geometrical and topological constraints imposed on the cell therefore tend to ensure that a single-cell organism remains microscopic and that larger, more complex biological structures and organisms are necessarily multi-cellular. The area-volume ratio also limits the size of insects (thankfully), since they rely largely on diffusion to acquire oxygen, and so the process would become inefficient and then fail if the volume of the insect were to increase. Geometrical and topological properties of biological systems, at all scales, therefore impose very powerful constraints on the evolutionary adaptations that nature can make.
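As a rough worked example of why surface area lags behind volume, consider an idealised spherical cell of radius $R$. The spherical shape is an assumption made purely for simplicity; real cells, and red cells in particular, are not spherical.

```latex
\[
A(S) = 4\pi R^{2}, \qquad V(C) = \tfrac{4}{3}\pi R^{3}, \qquad
r = \frac{A(S)}{V(C)} = \frac{3}{R}.
\]
% More generally, scaling every length of a fixed shape by a factor \lambda gives
\[
A \to \lambda^{2} A, \qquad V \to \lambda^{3} V, \qquad r \to \frac{r}{\lambda}.
\]
```

Doubling the linear size of a cell of fixed shape therefore halves $r$, so each unit of interior volume is served by half as much membrane.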

Nature’s most powerful uses of geometry and topology, however, are at the scale of molecular biology, when we consider the dynamics and replication of DNA, protein folding and 3-dimensional protein geometries and topologies. It is likely that it is geometry and topology that underpin the processes of life at the molecular level, and this really should have been obvious from the time the structure of DNA was discovered and its astonishing and beautiful helical geometry was first revealed. Continued future applications of powerful mathematics to these problems, especially geometric and topological analysis, could very well deepen our understanding of the processes of life and perhaps also help solve difficult and crucial problems in cancer research, disease, and the efficient design of artificial molecules, antigens, proteins and drugs with desired shapes and functions. In part II, current and possible future applications will be considered for protein folding; and in part III for the structure of DNA and its supercoiled and knotted states.