-
Notifications
You must be signed in to change notification settings - Fork 17
tutorial
This section describes gro using a variety of examples that highlight its features. Studying these examples should teach you most of what you need to know to write your own gro programs. Later chapters on the precise definition of the gro language will make more sense after you write a few simple programs.
Probably the simplest gro program is the following:
include gro
set ( "dt", 0.1 ); // fast and inaccurate
program p() := {
skip();
};
ecoli ( [ x := 0, y := 0 ], program p() );
This program is found in examples/growth.gro of the distribution. You
can load it by choosing growth.gro from the Examples menu with the
right mouse button. You might also want to open the program with an
editor of some kind of editor. The program works as follows.
Line 1 includes the gro libary of functions (discussed later). For now, just put this at the top of all of your code.
On line 3 we set the size of the time step, dt, for the simulation1. If you set dt very large, you will not get accurate simulations. If you set dt too small, you will have to wait a long time for your results. Try running gro with different values for dt.
Lines 5-9 declare a simple program p(). On line 11, the program
is associated with a simulated E. coli cell initialized at the
position (0,0) in the simulated environment. By default, gro defines
10 pixels in the window to be 1 μm.
Loading this program with the gro simulator places a single simulated cell in the middle of the gro window. When you run the program, by clicking Start from the sidebar or the menu, you will see that cell start to grow and divide. The simulation will run until you stop it, or until you get to 1000 cells, at which point gro stops simulating.
Here is what is happening under the hood:
-
A cell is assumed to be 1 μm in diameter and, initially, 2 μm long2, so that it its volume is V = 1.57 fL (that's femtoliters).
-
The cell grows so that by default it doubles in size, to 3.14 fL, in 20 minutes. Using dV/dt = k V and solving for k gives the default rate of 0.034 fL / min. In the simulation, the volume is updated at each step according to V' = V + δ k V where δ is approximately dt (to model random fluctuations in the rate of growth, the actual update chooses δ randomly from a gamma distribution with mean dt).
-
The cell grows until it is approximately doubled in volume, at which point it divides. By default, the volume at which the cell divides is a random variable with mean 3.14 fL and variance 0.005 fL2.
-
When the cell divides, it produces two cells of approximately the same size, but with a bit of randomness. One cell might get 51% or the old volume, the other 49%.
-
One of the cells is chosen arbitrarily to be the parent and the other the daughter. The program running on the parent is copied and the copy is associated with the daughter cell. The values of the variables in each program are adjusted as well - exactly how is described later.
Note: The simulation does not proceed in real time, or even with a consistent real time step. Because the number of cells increases exponentially, gro has to do exponentially more work as simulated time goes by, and so the simulation gets slower and slower. If you were to save a snapshot of the simulation periodically and then play the snapshots back, only then would you at least see a consistent real time step.
If you don't like the default parameters, you can redefine them. Here is a variation on the program above that makes lots of little cells with a greater variation of division size.
program p() := {
set ( "ecoli_division_size_mean", 2.0 ); // fL
set ( "ecoli_division_size_variance", 0.02 ); // fL^2
};
Note that the set statements are only called when the first cell is initialized, which will make more sense after you see a few examples of programs that actually do something.
To determine the division time of a cell, gro uses does approximately the following. When a minimum volume is reached, it starts counting to n, with a probability of λdt of advancing the counter in each dt minutes. The minimum volume, n, and λ are chosen so that the mean and variance of the volume at division time match the desired parameters.
You might not like that model. That's okay, because you can code up your own. Here is an example of a simple division model.
program p() := {
set ( "ecoli_division_size_mean", 1000 );
rate 1 & volume > 3.14 : {
divide()
}
};
Line 3 sets the mean division volume to be so huge that it will never be reached by the simulator. Lines 5-6 describe a guarded command. The part before the colon is the guard. The part after is the command. It states: when the volume of the cell is greater than 3.14 fL then there is probability of 1 dt that the cell will divide in the next simulation iteration.
Here is a more precise description:
-
The symbol
rateis defined (in include/standard.gro) to be a function that takes an argument k and returns true if a uniformly generated random number is less than kdt, where dt is duration of a simulated timestep (0.05 minutes by default). This is the standard meaning of "rate" in stochastic chemical kinetics: an event occurs at rate k if the probability that it occurs in the next dt seconds is kdt. -
The symbol
volumeis a variable defined by gro and set to the current volume just before the programp()is called at each simulated time step. You can try to set it, but the next time gro runs throughp(), volume will have been redefined to be the (possibly new) current volume of the cell. A few other variables are made available in this way, and will be discussed later. -
The function
divide()tells gro to divide the cell and include the randomness associated with division (see the next section).
Try adding
chemostat ( true );
to the top the growth.gro example. You will see that a simple chemostat arrangement shows up, reminiscent of the flow chamber used by the Hasty group with their quorum sensing circuits.
You might also notice some jitter in the simulation. This is because we set dt to 0.1. That's fine for micro-colony growth where there are fewer constraints on movement. If you dial dt down to 0.01, then the chemostat mode will look a bit better - although it will run pretty slowly.
1 Because gro includes both stochastic low copy number reactions and continuous diffusion of signals (in later examples), the simulation is unfortunately not done using the SSA. Instead, the simulation is a vanilla Euler integration scheme with a time step of dt.
2 This is on the large side for an E. coli, but believable accordining to bionumbers.
Suppose you start with a single cell with 1000 copies of GFP in it and also assume that, for whatever reason, the GFP is neither being produced nor degraded. As the cell grows, the GFP becomes more dilute. When it divides, some number of the GFP molecules end up in the parent, the rest in the daughter. After enough divisions, there will 0 or 1 copies of GFP in each cell.
To model this in gro, we can use the following code, which can be found in examples/dilution:
include gro
set ( "dt", 0.1 );
program dilute(m) := {
gfp := m;
};
ecoli ( [ x := 0, y := 0 ], program dilute(1000) );
In line 5, the amount of GFP is initialized to m, which is an argument to the program. In line 9, the program is called with 1000 for its argument.
When this program is run, the cell starts out green. As it grows and divides, the amount of green per cell decreases exponentially until you can't see much green at all. This is because gro automatically distributes all numerical quantities in the cell between the parent and the daughter upon division.
To see how much GFP is in each cell, you can do something like the following:
program report() := {
needs gfp;
selected : { message ( 1, tostring(id) <> ": " <> tostring (gfp) ) }
};
program p(m) := dilute(m) + report() sharing gfp;
ecoli ( [ x := 0, y := 0 ], program p(1000) );
Here, we have defined a new program, report, that sends a message
to the gro window whenever the cell is selected by the mouse. By
default, selected cells are outlined in red. In particular, the
variable selected is true whenever the cell is selected. When
a selected cell divides, only the parent remains selected. The message
function takes two arguments. The first is a position index (0, 1 or 2)
for where to write the message. The second is a string. The <>
operator stands for string concatenation. The variables id
and gfp are numeric, so the tostring function is
used to convert them to strings.
In line 11, we compose the report program with the dilute program
to get a new program p(m). Note that we need to share the variable
gfp and the report() program has to declare that it needs gfp to be
initialized somewhere else. This is one way gro deals with
parallelism - by using shared variables.
Try selecting the first cell and then running the program. You should see the number of GFP molecules in the cell printed in the window.
Now let's add protein production and aim for 100 GFP molecules per cell at steady state. The cells are dividing every 20 minutes. Since the growth rate is 0.0345674 / min, producing protein at 100 times that amount should do it, assuming that the GFP doesn't degrade too fast (below we set it to 0.001). The gro code is as follows:
program make_gfp ( k1, k2, m ) := {
// initialize gfp copy number
gfp := m;
// produce gfp
rate ( k1 ) : { gfp := gfp + 1 }
// degrade gfp
rate ( k2 * gfp ) : { gfp := gfp - 1 }
};
// compute production rate needed for 100 copies of gfp per cell
alpha := - log ( 0.5 ) / 20.0; // dilution rate
k1 := 100 * alpha; // production rate
// make a new cell
ecoli ( [ x := 0, y := 0 ], program make_gfp ( k1, 0.001, 0 ) );
The average volume of a cell in this simulation is about 2.36 fL so that the concentration of gfp should approach [GFP]* = 42.4 proteins / fL. The plot below shows that this is indeed the case, on average - although there is quite a bit of noise.
By the way, to make the figure above, you can do something like
the following. First, define an output function that writes data to standard
output (i.e. the terminal) every delta seconds:
program output(delta) := {
needs gfp;
p := [ t := 0, s := 0 ];
true : {
p.t := p.t + dt,
p.s := p.s + dt
}
p.s >= delta : {
print ( id, ", ", p.t, ", ", gfp / volume, "\n" ),
p.s := 0
}
};
program p() := make_gfp ( k1, 0.001, 0 ) + output(50) sharing gfp;
ecoli ( [ x := 0, y := 0 ], program p() );
This program raises a interesting issue. To keep track of time, the
program needs a couple of variables that it continuously increments
and checks. The problem is that gro will chop numerical variables
roughly in half when the cell divides. The solution is to put the time
variables, s and t, in a record, as in line 4. Since gro doesn't chop
records in half, s and t are safe and sound. When the cell divides,
the record p and its contents are simply copied to the daughter cell.
To store the data that this program generates, you can copy it
from the sub-window at the bottom of the gro GUI and save it
somewhere. Then you will have a comma
separated value (csv)
file with the data in it that you can deal with however you want.
I loaded it up in Mathematica and wrote
a bit of code to display it. To print directly to a file you can use
fopen and fprint. To open a file and get a file
pointer, do
fp := fopen ( "path" );
near the top of your code, and outside of any gro programs. Then replace the print statement above with
fprint ( fp, id, ", ", p.t, ", ", gfp / volume, "\n" )
Building on the previous example, we now introduce a bit more reality into the gene expression model by adding messenger RNA. We can then encode the central dogma of biology: That DNA makes RNA and RNA makes protein.
By the way, when you start gro without any arguments, it opens the program "examples/gfp.gro" by default. This program is a model of the genetic construct synthetic biologists first build in their introductory classes. Here is the program.
include gro
set ( "dt", 0.01 );
alpha_r := 69.4 / 2.35; // mRNA / min / fL
beta_r := - log ( 0.5 ) / 3.69; // 1/min
alpha_p := 3.0; // protein/min/fL/RNA
beta_p := 0.01; // 1/min
program gfp() := {
mRNA := 0;
gfp := 0;
rate ( alpha_r * volume ) : { mRNA := mRNA + 1 };
rate ( beta_r * mRNA ) : { mRNA := mRNA - 1 };
rate ( alpha_p * mRNA ) : { gfp := gfp + 1 };
rate ( beta_p * gfp ) : { gfp := gfp - 1 };
};
set ( "gfp_saturation_max", 1000 );
set ( "gfp_saturation_min", 800 );
ecoli ( [ x := 0, y := 0 ], program gfp() );
Here is an explanation of the choices of rates. Note that this model is not intended to be super accurate. I am sure you can do better. But hopefully it is in the ballpark. More importantly, you should be able to see that you can use gro to encode this process at whatever level of detail you like.
-
mRNA Production: RNA is transcribed in an E. coli at about 85 nt / second1. The gfp gene with non-coding parts, is about 800 nt long, so that's 69.4 mRNA / min. I assume the numbers are for an average sized cell (2.35 fL), so we want 2.35 alpha_r = 69.4.
-
mRNA Degradation: The half life of an mRNA is reported to be 3.69 minutes in one paper2 we found. Solving for the rate gives beta_r.
-
Protein Production: According to another paper2 we found, it takes about 80 seconds to translate lacZ, which is about four times as big as GFP. So, if it takes 20 seconds for GFP, that's 3 per minute per mRNA.
-
Protein Degradation: In our experience, GFP sticks around for a long time. So we choose a small number for the degradation rate.
Notice that in this simulation we set dt=0.01 and that the simulation runs more slowly. How do you decide where to put dt? In this program, there is a rate of 69.4 mRNA / min on average that should be produced. If dt were 0.1 min, then the guard on line 15 would only get called 10 times a minute, and so could not actually produce enough mRNA each minute. With dt = 0.01 the guard gets called 100 times a minute and the rate function ensures that it comes up true about 69.4 percent of the time. An even smaller dt will do an even better job.
Another thing to notice that on lines 22-23 we set the way that the green channel of cells are rendered. By running the simulation a couple of times, we see that we get about 900 gfp molecules per fL. To see the variation better, line 22 maps 1000 gfp / fL and above to the maximum intensity in the green channel, and maps 800 gfp / fL and below to the minimum. This is akin to tweaking the gamma value on your microscope software.
1 Bremer, H., Dennis, P. P. (1996)
2 Bernstein JA, Lin PH, Cohen SN, Lin-Chao S. (2004)
3 Solomovici J, Lesnik T, Reiss C. (1997)
When you declare a signal, you specify its diffusion rate and its degradation rate. Then gro sets up a finite element model of the environment with a (hard coded for now) 160x160 grid with each element of the grid 5 pixels = 0.5 μm wide. At each time step, gro numerically integrates the rate of change of concentration of the signal molecule in each grid point. For now, the units are not well-defined -- although you can associate units with numbers you choose.
As an example, the code below shows how to make something like a concentration bandpass filter1. This example is in examples/bandpass.gro with a few more bells and whistles.
include gro
set ( "dt", 0.1 );
ahl := signal ( 1, 0.01 ); // args = diffusion rate and degradation rate
fun f a . 0.1 < a & a < 0.6;
program sensor() := {
rfp := 0.0;
f ( get_signal ( ahl ) ) : { rfp := rfp + 1 };
rate ( 0.01 * rfp ) : { rfp := rfp - 1 }
};
set ( "rfp_saturation_max", 50 );
set ( "rfp_saturation_min", 0 );
ecoli ( [ x:= 0, y:= 0 ], program sensor() );
program main() := {
// hold the concentration at (x,y)=(0,0) at 10
true : { set_signal ( ahl, 0, 0, 10 ) }
};
In line 5, we declare a new signal with diffusion rate 1 and degradation rate 0.01. The return value of the signal function is an integer that you should use when referring to the same signal later. You can declare as many signals as you want to. In line 7, we declare a function that returns true if its argument is between 0.1 and 0.6 -- this is the band of concentration we want. In the program on line 13, we produce rfp if get_signal(ahl) returns a concentration in the band that we defined.
This program also introduces another feature of gro, which is the main program. If you define a program called main, gro will call it once every time it updates. This can be very useful for controlling the simulation (starting, stopping, saving or analyzing data, etc). In this case, we use the main program to hold the value of the signal at 10 (concentration units). If we call set_signal just once, the signal concentration there would quickly decrease due to diffusion and degradation.
We define an oscillator in the leader program. It basically consists of a timer that increments each time through the program. When it reaches a pre-set limit, it emits a bunch of signal, and then resets the counter. Next, we define the follower to have two modes. In mode 0, the cells are waiting for the signal level to go above some threshold. When it does, the cells emit a signal, and go into mode 1, where they sit in a refractory mode for a while. If they didn't have this second mode, they would respond to their own pulse, which would make a poor wave. The program final looks like this:
include gro
set ( "dt", 0.075 );
ahl := signal ( 1, 1 );
program leader() := {
p := [ t := 2.4 ];
set ( "ecoli_growth_rate", 0.00 );
true : { p.t := p.t + dt }
p.t > 10 : {
emit_signal ( ahl, 100 ),
p.t := 0
}
};
program follower() := {
p := [ mode := 0, t := 0 ];
set ( "ecoli_growth_rate", 0.04 );
p.mode = 0 & get_signal ( ahl ) > 0.01 : {
emit_signal ( ahl, 100 ),
p.mode := 1,
p.t := 0
}
p.mode = 1 : { p.t := p.t + dt }
p.mode = 1 & p.t > 9 : { p.mode := 0 }
};
ecoli ( [ x:= 0, y:= 0 ], program leader() );
ecoli ( [ x:= 0, y:= 10 ], program follower() );
A few other details: We set the growth rate of the leader to 0.0 so there is not more than one cell starting waves. Also note the we called the ecoli function twice. In fact, you can call it as many times as you want, with the same program or different programs each time.
1 Basu et al. made something like this, but at the macro level with lawns of bacteria: Basu, 2005.
To encode evolution in gro we need to be able to model mutation. To do this, we use the daughter variable. This variable is true in the daughter cell only in the time step immediately after the cell has divided. At that point, we can take a parameter from the parent cell and "mutate" it to pass on to the daughter cell. This same parameter will then get mutated again and again as the cells divide, allowing the population to explore the space of possible values.
Here is a gro program that demonstrates this idea. The parameter that is mutated is the production rate k of the enzyme.
include gro
chemostat ( true );
set ( "dt", 0.075 );
nutrient := 1;
kinit := 0.25;
dk := 0.05;
fun cost e n . 0.2 * e * n / ( 50.0 + n );
fun benefit e n . 0.002 * e / ( 1.0 - 0.01 * e );
fun fitness e n . cost e n - benefit e n;
program evolver() := {
p := [ k := kinit ];
E := 25;
t := 0;
rate ( p.k * volume ) : { E := E + 1 }
rate ( 0.05 * E ) : { E := E - 1 }
true : { set ( "ecoli_growth_rate", 0.001 + fitness E nutrient ), t := t + dt }
daughter : {
p.k := p.k + dk * ( rand ( 1000 ) - 500 ) / 1000.0
}
};
ecoli ( [ x := 0, y := 0 ], program evolver() );
Lines 9-11 define the fitness function, using the same functional form as in the Alon paper1, but with pretty much arbitrary values chosen so that the resulting growth rate falls within a reasonable range for gro. (Of course, you can figure out the optimal value using some calculus, but hey: why not do it 1000 times more slowly with a graphical simulation tool?).
Line 22 is where the growth rate is adjusted based on the amount of enzyme and nutrient. Thus, the growth rate fluctuates considerably, which may or may not be realistic.
Besides the daughter variable, the only new featured introduced in this example is the rand function, which returns a random integer between 0 and its argument minus one.
1Dekel, E. and Alon, U., 2005.
Suppose your cells are programmed to respond to a small molecule inducer, such as IPTG or Arabinose. In many flow chambers you can control the concentration of inducers by switching media sources. You can simulate this in grow using global variables and a main program as follows.
include gro
iptg := 0;
program p() := {
gfp := 0;
rate ( 1 + 10 * iptg / ( 1 + iptg ) ) : { gfp := gfp + 1 };
rate ( 0.001 * gfp ) : { gfp := gfp - 1 };
};
program main() := {
t := 0;
true : { t := t + dt };
t > 50 : {
t := 0,
iptg := 1.0 - iptg,
clear_messages(1),
message ( 1, "IPTG at " <> tostring(iptg) <> "uM/L" )
};
};
ecoli ( [], program p() );
Line 3 initializes a global variable, iptg. This variable is accessible to programs, changable by programs, etc. On line 9, the rate of GFP production is a function of the global variable iptg.
To change the levels of iptg, lines 14 to 27 define a main() program. If you define a main() program, gro will call the main program once every cycle, after updating the programs associated with any cells. The main() program is not supposed to be associated with any cell, but rather should be used to control the simulation in general. To communicate between main() and other programs, the easiest thing to do is to use global variables, as is done above.
You can also use a main function to start and stop simulations. For example, suppose you want to run 50 simulations of growth, each 150 simulated minutes long, and count the number of cells at the end of the each simulation (which will be random). You could do something like the following:
include gro
num_cells := 1;
program p() := {
daughter : { num_cells := num_cells + 1 }
};
program main() := {
mode := 0;
t := 0;
n := 0;
mode = 0 & n < 50 : {
ecoli ( [], program p() ),
num_cells := 1,
mode := 1
}
mode = 0 & n = 50 : {
exit()
}
mode = 1 & t > 150 : {
print ( "simulation ended with ", num_cells, " cells\n" ),
reset(),
t := 0,
n := n + 1,
mode := 0
}
true : { t := t + dt }
}
This program shows the use of the reset() function, which erases all cells and resets time. Alse note the use of exit(), which completely exits gro. And finally, note how the global variable num_cells is shared among all the cells running p() and also with main. This is an easy way for you to gather information from the programs as they are running.