In search of standards: An official language for Biology

Simulating biological events on a personal computer

     Engineers all over the world communicate with each other through standardized diagrams that incorporate qualitative and quantitative information about their systems. They design their work computationally and evaluate it with specialized software.

    The biological community could benefit tremendously from an official language, an official representation standard (a lingua franca) that incorporates the elusive quantitative aspect. Such a standard would allow scientists to evaluate their systems computationally in an objective, generally accepted manner: to represent experimental data, explore system dynamics, discover hidden correlations and patterns, and use the power of computer science to predict outcomes. The scientific method, which consists of formulating a hypothesis and addressing it experimentally, can benefit significantly from narrowing an unlimited set of alternative hypotheses down to the computationally promising ones, which are then tested experimentally. Intellectual and manual labor as well as material resources can be saved, significantly decreasing the human and economic cost. Biological behaviors that fall into well-characterized patterns, such as positive or negative feedback, switches or oscillations, can be addressed adequately through a computational approach. There also appear to be specific behaviors in specialized knowledge domains that cannot be approached experimentally but only through computation. With the experiment remaining the proof of research, computation can provide significant aid in focusing research correctly and in interpreting, elaborating on and extending experimental outcomes. In parallel, biologists researching different biological systems can communicate with each other in an objective, standardized way.

     The official language must be able to incorporate the much needed quantitative aspect: how many molecules there are, how quickly the events occur, and how strong the interactions between molecules are (for example, how tightly they bind each other). However, its mathematical structure should be extremely basic and accessible with high school mathematics. For example, just as one says that one drives a car at 50 km/h or m.p.h., one would only need a velocity (a kinetic constant) describing how quickly a biological process occurs and an expression describing whether this velocity changes; specialists could determine this expression in accordance with experimental data. One shouldn’t have to actually write in this language but could simply draw in specialized software that provides a graphical interface to it. On a white canvas, as in any common design software, one could draw proteins, genes, RNAs and simple molecules and then enter the quantity and time information for the represented events.

     At this moment many different computational languages exist. The biological problems they can address both overlap and diverge. It is not necessary to have only one language; on the contrary, it is highly beneficial to have several, and specialized tools exist for converting one language into another. There is, however, a need for a standard that allows biologists to communicate their everyday practice. The Systems Biology Markup Language (SBML) is a very promising candidate.

     There is also the need for a standardized way for graphical representation of biological systems. The Systems Biology Graphical Notation (SBGN) provides satisfactory solutions.

     A popular software package providing a graphical interface to SBML is CellDesigner. After a couple of hours of self-training with a basic tutorial, one can become familiar with the necessary features and perform one’s first computational simulations of biological events.



A language for biology (for instance SBML) and a standard graphical notation combined in a simulation software package, CellDesigner, which provides a graphical interface to the language.

The notion of a simulation through an example related to a fundamental biological mechanism implicated in many different diseases.

Let’s acquaint ourselves with the notion of a simulation, defined as the representation of cellular events with the progression of time. Let’s suppose that there exists a microscope that allows us to make observations at the molecular level. We set our timer to zero, we focus on a specific biological process and we follow its evolution with time.

Let’s choose a very common biological process that occurs continuously in all our cells: a protein has been damaged and has become toxic for the cell. The inspection machinery of the cell has identified this event and will add a sign or tag on the protein, a small protein with the characteristic name “ubiquitin”. This tag will function as a signal that will direct the protein to the clearance/degradation machinery of the cell so that the toxic protein can be removed.

The mechanism by which toxic proteins are generated and degraded is extremely important and is central to the pathogenesis of many different disease conditions, as diverse as neurodegenerative diseases (Parkinson’s disease, among others), liver disease, myopathies and cardiovascular disease.

We want to represent the addition of a ubiquitin molecule, a process termed “ubiquitination”, in CellDesigner. This process is catalysed by an enzyme called ubiquitin ligase and results in a ubiquitinated molecule. We can similarly represent any other modification, such as a phosphorylation, or more generally any other enzymatic reaction.

In CellDesigner, the notation for a protein is a rectangle with rounded corners. We shall represent the damaged, unmodified protein as one species and the damaged, ubiquitinated protein as a different species. The notation for the transition of one species into another is an arrow, so we will draw an arrow between the two species. The notation for catalysis is a line with a circle at the end; we will add the ubiquitin ligase and draw a catalysis line ending on the transition arrow.



The experimentalists tell us that the velocity of this reaction is represented by a kinetic constant k equal to 0.1 molecules per minute, which means that we have to wait ten minutes for one molecule to be produced (modified). They also tell us that this velocity refers to one molecule of enzyme and one molecule of damaged protein. So, if we have twice as much enzyme (acting on the same abundant pool of substrate protein), twice as many molecules will be generated, and if we have twice as much protein (with abundant enzyme), we will get twice the amount of tagged protein. If both quantities are doubled, the product will be quadrupled. We therefore understand that, to express how this velocity changes over time, we have to multiply k by the amount of damaged protein and by the amount of enzyme present.
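
The multiplication rule described above can be sketched in a few lines of Python. This is only an illustration: the function and argument names are hypothetical, and the only value taken from the text is k = 0.1.

```python
def ubiquitination_rate(k, damaged_protein, ligase):
    """Velocity of the tagging reaction: k multiplied by both amounts."""
    return k * damaged_protein * ligase


k = 0.1  # per minute, for one enzyme molecule and one damaged protein

base = ubiquitination_rate(k, damaged_protein=1, ligase=1)
double_enzyme = ubiquitination_rate(k, damaged_protein=1, ligase=2)
double_both = ubiquitination_rate(k, damaged_protein=2, ligase=2)

# Doubling one quantity doubles the velocity; doubling both quadruples it.
print(base, double_enzyme, double_both)
```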

However, until now there has been no evident scientific reason for scientists to measure the absolute abundance of all the molecules in their systems in grams, moles or particles. They were only interested in changes in the relative abundance of components, and so they used arbitrary units. In the absence of data we will assign values that seem reasonable, e.g. 50 as the initial value of the unmodified protein, 0 for the modified protein and 1 for the enzyme. We enter these values in the "species" tab.

Now we right-click on the arrow, define the parameter characterizing the velocity and enter its value, and then formulate the kinetic law by multiplying the parameter by the amounts of the damaged protein and the enzyme.




Having done that, we shall click on the simulation button after having specified a duration for our simulation.

What does the result tell us? At each time point our microscope has counted how many molecules have been tagged and how many damaged proteins remain, and has drawn curves showing how these amounts change with time. The curves characterize the dynamics of our system: is the velocity constant or does it change over time? How much protein will be directed to degradation? How large is the toxic load burdening the cell and potentially leading to its malfunction? How much toxicity can the cell tolerate, and how much leads to irreversible damage?
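
For readers who want to see what such a simulation amounts to numerically, here is a minimal sketch in Python. It assumes the kinetic law and the initial values given in the text (k = 0.1, 50 unmodified, 0 modified, 1 enzyme) and advances time in small steps (the explicit Euler method). The function name, the step size and the 30-minute duration are illustrative choices, and this is not a description of how CellDesigner itself computes its curves.

```python
def simulate_tagging(k, unmodified, modified, enzyme, duration, dt=0.01):
    """Explicit Euler integration of the tagging reaction.

    Returns a list of (time, unmodified, modified) tuples.
    """
    steps = int(round(duration / dt))
    trajectory = [(0.0, unmodified, modified)]
    for step in range(1, steps + 1):
        velocity = k * unmodified * enzyme  # kinetic law from the text
        converted = velocity * dt           # amount tagged during this step
        unmodified -= converted
        modified += converted
        trajectory.append((step * dt, unmodified, modified))
    return trajectory


# Initial values used in the text: 50 unmodified, 0 modified, 1 enzyme.
trajectory = simulate_tagging(k=0.1, unmodified=50.0, modified=0.0,
                              enzyme=1.0, duration=30.0)

# Print one point every 5 simulated minutes.
for time, unmodified, modified in trajectory[::500]:
    print(f"t = {time:5.1f} min  unmodified = {unmodified:6.2f}  tagged = {modified:6.2f}")
```

Because the velocity is proportional to the amount of unmodified protein still present, tagging slows down as the pool is used up, which is why the curve flattens rather than rising in a straight line.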




Imagine now a network of hundreds of species with hundreds of interconnecting reactions. Also imagine the possibility of acting upon the network, with interventions that are simulating a disease condition and with interventions that are simulating therapeutic approaches. The implications are tremendously beneficial.

However, this example is very simple and in many cases the specialists will tell us that the behavior of our system based on the experimental data that they were given can be described by a very complicated law.

That is quite all right: all we have to do is copy such a formula into the appropriate box. Therein lies the power of interdisciplinarity: the experimentalists, through intellectual and manual labour, generate data, and the theoreticians, through their own expertise, generate expressions or laws that formulate the behavior of biological systems in a way that can be used to view and examine a biological system in silico and also to perturb it from the outside to test its behavior.

A very important aspect is also that specific mechanisms are common to different systems. For example, the mechanism mentioned above appears to share significant similarities across diseases as different as neurodegenerative disease (Parkinson’s disease, among others), liver disease, myopathy and cardiovascular disease (the protein aggregation diseases). Progress made in one discipline can provide significant insight into another. Researchers and health professionals can store years of research on their system in a compact model, a kind of briefcase, and evaluate and compare their systems computationally at the press of a button. Communication between different disciplines becomes objective, simple and effective, and potential implications are made clear, which encourages interdisciplinary interaction. A neurologist, for example, might never have thought that his research could be enriched by discussing the mechanism of toxicity of protein aggregation with a cardiologist. In a larger perspective, interaction between experimentalists and theoreticians, the bench people and the modelers, is greatly favored. The experimentalists generate a biological story and explain it to the theoreticians, who perform or assist the computational representation. Here lies the great challenge of interdisciplinarity: experimentalists with basic mathematical knowledge, who have for example a general idea of what a differential equation represents, have to explain the practice of experimentation to theoreticians who have a general idea of fundamental cellular components and biological processes, and discuss strategies of computational representation with them. The art of explaining is a tremendous asset in this case. Both parties should follow pedagogical principles: approach the interlocutor or audience, evaluate their background and, most importantly, adapt to the knowledge of the other party; take nothing for granted; and encourage the interest and attention of the other by referring to appropriate notions. In parallel, individuals trained in both disciplines can offer significant help in making the interdisciplinary story a successful one.






A biological and computational journal club 

A journal club is a common activity in the biological community consisting of the presentation and critical analysis of a selected publication.

Let's acquaint ourselves with the computational representation of the biological mechanism of protein aggregation that is implicated in many different diseases as diverse as neurodegenerative diseases (e.g. Parkinson's), liver disease, myopathies and cardiovascular disease. We will be describing the following study by Carole Proctor, Maria Tsirigotis and Douglas A. Gray. 

Proctor, C. J., et al. (2007). "An in silico model of the ubiquitin-proteasome system that incorporates normal homeostasis and age-related decline." BMC Syst Biol 1: 17. 

Why was this study chosen? Because it addresses a pathogenic mechanism that is common to different systems and so illustrates how insight gained from one system can be applied to another. This demonstrates the immense potential of systems biology approaches.

Additionally, there are many ways or “styles” of computational representation. Many of them involve mathematical approaches that cannot be followed by an audience of non-specialists who have received only basic mathematical training. This specific “style”, however, can be followed by the whole biological community, either directly, after some clarification of the rationale of the approach, or after some personal training or “getting used” to this representation.

In the description that follows, a commentary as well as passages from the publication are given.

We will suppose that a protein can be damaged by a common mechanism that is central to many different disease conditions: oxidative stress. We live in an atmosphere with a high oxygen concentration, which provides a necessary component for energy production in our cells' combustion engine but which is also damaging to them; fortunately, defence mechanisms have evolved to counteract this effect. Our cells take up oxygen and use it in their combustion machinery, a small cellular organelle called the “mitochondrion”, to generate energy through a biochemical process that has, interestingly, been called “respiration”. During this process, however, certain “leaks” can occur, resulting in the generation of reactive species that can cause oxidation. These have been collectively called “reactive oxygen species”, abbreviated “ROS”. These species can damage the structure of proteins and, as structure is tightly linked to function, they can compromise protein function. It is essential to note that, for a protein to be functional, it has to have a specific folding pattern in three-dimensional space (the native protein). A harmful stimulus acting on the protein can perturb this folding pattern. In that case the protein becomes misfolded and its functionality is compromised.

When a protein is misfolded, there are cellular components that can assist its refolding; these are called protein chaperones, as they can accompany a protein and intervene to protect it. However, when the damage is irreversible, the protein has to be degraded. In this case the damage inspection machinery of the cell will identify the damaged protein and add a “tag” or signal, a small protein with the characteristic name “ubiquitin”, that will direct the protein to the degradation machinery of the cell, the proteasome. At least four ubiquitin molecules have to be added for the protein to gain affinity for the proteasome.

Let's start developing a biological scenario. After each step of our biological description we will add a velocity, a rate for the described process and a kinetic expression related to the quantities of the species that participate in the process. 

Let's assume that native proteins appear in our system (are produced) at a certain velocity, or rate, k1, which depends on the cellular production machinery.

Let's assume that at any time point, a native protein can become misfolded with the rate of this reaction depending on the level of reactive oxygen species (ROS) within the cell (higher levels of ROS lead to an increase in the rate of misfolding). Therefore, we will write the following expression: 

Misfolding: k2 [Nat] [ROS] 

The defence mechanisms of the cells (e.g. chaperone systems) can intervene to refold the protein, therefore we will write the following expression: 

Refolding: k3 [MisP] 
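
The two expressions above can be written as small Python functions to see how they pull against each other. This is a sketch under strong simplifying assumptions: the values of k2, k3 and the native protein amount are hypothetical, the native pool is treated as constant, and every other process in the model (tagging, degradation, aggregation) is ignored. Under those assumptions, misfolding and refolding balance when k2 [Nat] [ROS] = k3 [MisP].

```python
def misfolding_rate(k2, native, ros):
    """Misfolding: k2 [Nat] [ROS]."""
    return k2 * native * ros


def refolding_rate(k3, misfolded):
    """Refolding: k3 [MisP]."""
    return k3 * misfolded


def balanced_misfolded_level(k2, k3, native, ros):
    """Level of misfolded protein at which refolding exactly offsets misfolding."""
    return k2 * native * ros / k3


# Hypothetical values, chosen only for illustration.
k2, k3 = 0.002, 0.01
native = 500.0

for ros in (1.0, 2.0, 4.0):
    balance = balanced_misfolded_level(k2, k3, native, ros)
    print(f"ROS = {ros}: misfolding rate = {misfolding_rate(k2, native, ros):.2f}, "
          f"balance at MisP = {balance:.0f}")
```

In this reduced picture, doubling the ROS level doubles both the misfolding rate and the amount of misfolded protein at which the chaperones keep pace.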

As mentioned previously, the inspection machinery of the cell will identify the protein and will tag it with a ubiquitin molecule. However, for the ubiquitin molecule to “stick” to the protein, it first has to be activated.

Ubiquitin is activated through ATP hydrolysis mediated by an E1 enzyme. The specialist determined that this process is described by a slightly more complicated law than usual:

E1/Ub binding: k62 [E1] [Ub] [ATP]/(5000+[ATP]) 
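
The extra factor [ATP]/(5000+[ATP]) in the expression above is a saturation term: it is close to 0 when ATP is scarce, equals one half when [ATP] = 5000, and approaches 1 when ATP is abundant. The short Python sketch below evaluates it for a few ATP levels. The function names and the ATP values are illustrative assumptions; only the constant 5000 and the form of the law come from the expression above.

```python
def atp_factor(atp, half_saturation=5000.0):
    """Saturation term [ATP] / (5000 + [ATP]); always between 0 and 1."""
    return atp / (half_saturation + atp)


def e1_ub_binding_rate(k62, e1, ub, atp):
    """E1/Ub binding: k62 [E1] [Ub] [ATP] / (5000 + [ATP])."""
    return k62 * e1 * ub * atp_factor(atp)


# Illustrative ATP levels spanning scarce to abundant.
for atp in (500.0, 5000.0, 50000.0, 500000.0):
    print(f"ATP = {atp:>8.0f}  factor = {atp_factor(atp):.3f}")
```

When ATP is plentiful the law therefore reduces to the familiar product k62 [E1] [Ub], and the reaction only slows down when the cell runs short of energy.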

The activated ubiquitin linked to E1 (E1_Ub) is then transferred to an E2 (ubiquitin-conjugating) enzyme.

E2/Ub binding: k63 [E1_Ub][E2] 

E2 bound to ubiquitin then forms a complex with an E3 (ubiquitin ligase) enzyme. E3 binds to the protein which is to be degraded. 

The activated ubiquitin is then transferred to the substrate with the release of E2 and E3. At this stage the protein is mono-ubiquitinated. 

Monoubiquitination: k64 [E2_Ub][E3_MisP]

More ubiquitin is then attached to form a chain by the further action of E1 and E2 enzymes. A ubiquitin chain of at least four ubiquitin molecules has physical affinity for the proteasome, the machinery responsible for degrading the protein, and delivers the substrate for degradation.

Polyubiquitination1: k65 [E2_Ub][MisP_Ub]

A large number of de-ubiquitinating enzymes (DUBs) are also found within all eukaryotic cells. DUBs may edit polyubiquitin chains on substrates and thereby affect proteasome binding.

De-ubiquitination1: k66 [DUB][MisP_Ub8]

Once a damaged protein acquires 4 ubiquitin molecules it can bind to the proteasome.

Proteasome binding1: k67 [MisP_Ub4][Proteasome]

Even when the misfolded, ubiquitinated protein is bound to the proteasome, DUBs can still act upon it and remove ubiquitin.

De-ubiquitinationBoundMisP1: k68 [DUB][MisP_Ub4_Proteasome]

The proteasome will cleave the misfolded protein into small peptides (which will be reduced to amino acids by other enzymes) in an ATP-dependent manner. The ubiquitin, for its part, will be released and recycled.

ProteasomeActivity1: k69 [MisP_Ub4_Proteasome][ATP]/(5000+[ATP])

Until now we have described the events that take place continuously in the cell. Let's now describe the events that can take place when a malfunction occurs.



Protein aggregation 

If a misfolded protein is not removed immediately by refolding or degradation, then there is a chance that it will interact with another misfolded protein to form a small aggregate 

Aggregation1: k71 [MisP]([MisP] - 1)/2,

or it may interact with an existing aggregate to form a larger aggregate. 

Aggregation2: k71 [MisP][AggP]
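
A word on the first aggregation expression: it is read here as k71 multiplied by [MisP] and by ([MisP] minus 1), divided by 2. Under that reading, the factor after k71 is simply the number of distinct pairs that can be formed from the misfolded molecules present, since a molecule cannot pair with itself and each pair should be counted once. The Python sketch below checks this against the standard "n choose 2" count; the function names are hypothetical.

```python
from math import comb


def aggregation1_rate(k71, misfolded):
    """k71 times the number of distinct pairs of misfolded molecules."""
    return k71 * misfolded * (misfolded - 1) / 2.0


def aggregation2_rate(k71, misfolded, aggregates):
    """A misfolded molecule joining an existing aggregate: k71 [MisP] [AggP]."""
    return k71 * misfolded * aggregates


# Compare the formula with the combinatorial count of pairs.
for n in (0, 1, 2, 3, 10):
    pairs = n * (n - 1) / 2.0
    print(f"{n} misfolded molecules -> {pairs:.0f} possible pairs (n choose 2 = {comb(n, 2)})")
```

With a single misfolded molecule the rate is zero, as it should be: there is nothing for it to pair with.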

An increase in protein aggregation has been shown to inhibit the proteasome leading inevitably to even more aggregation. If the proteasome is inhibited, but the inspection machinery is working normally and tagging the damaged proteins, then damaged proteins can be ubiquitinated but they will not be degraded. Instead they will accumulate and form aggregates. 

Aggregation3: k72 [MisP_Ub]([MisP_Ub] - 1)/2

An aggregate may be sequestered so as not to interfere with the cellular machinery and to be kept away from causing harm. 

Sequestering of aggregates: k73 [AggP]

Alternatively, it may bind to the proteasome. 

Proteasome inhibition: k74 [AggP]

We have now mentioned all steps of this biological story and added the kinetic expressions. 

The specialist has searched the literature for the rates of the above reactions or for data that can be useful for inferring these rates. Time-course data are extremely important for computational simulation. The specialist has also searched for the quantities of the molecules that are implicated in the above steps. 

A model has been generated to represent what is happening normally in a healthy cell as well as what happens if a harmful intervention such as oxidative stress acts upon the system leading to increased protein misfolding. 

The model will examine whether an increase in misfolding (for example by an increase in levels of ROS) leads to an increase in aggregation and inhibition of the proteasome which in turn leads to an even greater level of aggregated protein. 

In the following figure, we can see a healthy cell functioning under normal conditions. Proteins are produced at a generally stable rate (pink). There is no stress and therefore only a small amount of misfolding occurs due to minimal events (black line). The inspection machinery functions normally and tags the majority of the misfolded proteins (E3 bound to misfolded proteins).


Let's now simulate a disease that decreases the activity of the proteasome, the degradation machinery of the cell. We will set the rate constant k representing the activity of the proteasome to 0. Similarly, we can simulate a stress insult, for example ROS, by entering a high ROS concentration in the second process described above (misfolding). Through this process, the stress intervention will lead to increased misfolding.
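
The idea of switching off a process by setting its rate constant to 0 can be tried on a deliberately reduced toy model, sketched below in Python. This is not the published model: it assumes only a constant inflow of misfolded protein and a single removal step whose rate is proportional to the amount present, and all names and numbers are hypothetical.

```python
def misfolded_over_time(misfolding_input, k_degradation, duration, dt=0.01):
    """Euler integration of: change = inflow - k_degradation * misfolded."""
    steps = int(round(duration / dt))
    misfolded = 0.0
    levels = [misfolded]
    for _ in range(steps):
        change = misfolding_input - k_degradation * misfolded
        misfolded += change * dt
        levels.append(misfolded)
    return levels


# Hypothetical numbers: same inflow, with and without a working proteasome.
healthy = misfolded_over_time(misfolding_input=1.0, k_degradation=0.1, duration=100.0)
inhibited = misfolded_over_time(misfolding_input=1.0, k_degradation=0.0, duration=100.0)

print(f"working proteasome:   misfolded level after 100 min = {healthy[-1]:.1f}")
print(f"inhibited proteasome: misfolded level after 100 min = {inhibited[-1]:.1f}")
```

With degradation active, the misfolded protein settles at a plateau where removal matches inflow; with the rate constant at 0 nothing is removed and the level keeps rising for as long as the simulation runs.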

What is the computational behavior of the cell upon proteasome inhibition? It is shown in the following figure:


As before, proteins are produced at a specific rate (pink line). However, as misfolded proteins cannot be degraded, they accumulate and their concentration increases, as shown by the black line. The inspection machinery keeps tagging the misfolded proteins with ubiquitin until it reaches maximum capacity, that is, until the red line becomes horizontal. Until then, misfolded proteins accumulate somewhat more slowly; after this point, their level increases steadily.

How will the cell behave under the increased load of misfolded protein? How will it handle the misfolded proteins? Will the misfolded proteins form aggregates? If so, what will be the fate of the aggregates? Will the cell direct them into large aggregates and store them in that form, so that they do not circulate in the cell and cause harmful effects? Will this procedure mimic the generation of the aggregates seen in disease? Or will the cell try to direct them to the proteasome even though it is not functional? The answer is provided in the next figure:


The blue line shows the generation of aggregates, and we can compare the process of directing them into large aggregates (cyan line) with the process of directing them to the non-functional proteasome.

The specialist has submitted the computational model with the publication as well as to the public repository. We can download it from either source and perform our own in silico experiments. Of note, this repository is like a computational Wikipedia, as it is maintained by a team of volunteers who curate the submitted models.