I am currently a member of the Royal Society Te Apārangi’s expert panel for gene editing technologies. This panel was convened to consider the implications of genome editing technologies for NZ and to develop accessible materials for stakeholders and the public based on those deliberations.

The panel is multidisciplinary and moderately diverse, although it is made up mainly of scientists. I represent a perspective that comes from knowledge of evolution, population genetics, computational biology and computer science.

Being a computer scientist (among other things) I have a natural tendency to compare and contrast genetic engineering with software engineering. In a very real way genome editing will allow humanity to more effectively reprogram biological life for our own purposes¹.

Radical parallelism, feedback loops and self-modifying source code

The analogy between a living organism’s genome and the source code of a computer program is not a straightforward one. For a start, the order in which the genomic “source code” is executed is far less obvious than it is for code written in most commonly used programming languages. During the development of a human body from a single cell, the execution of the genomic code occurs in a radically parallel way. In every cell, including the first one, thousands of genes are being simultaneously executed (i.e. transcribed and translated), and the results of those parallel executions include feedback to the “source code” itself via gene regulatory networks that regulate subsequent gene expression levels, so that certain parts of the code (genome) are subsequently executed (expressed) more or less often.

These feedbacks can include the activation or inactivation of entire cascades of “subroutines” (i.e. metabolic pathways, signalling pathways and further gene regulation pathways). In certain species, tissues and cell-types this self-modification can even involve direct and permanent physical edits to the genomic “source code” itself (e.g. CRISPR in bacteria, self-modification of genome sequence in immune cells such as B lymphocytes in the adaptive immune system, and telomere shortening).

Parallel computing and self-modifying source code are concepts that computer scientists are familiar with. But they are also programming paradigms that are often regarded as cryptic and difficult to understand, and that bring known complications to analysis, such as race conditions in parallel computing. Typically, software designed by humans employs a tiny fraction of the level of parallel computing, message passing (signalling) and self-modifying code that multicellular life does.

Replication and self-assembly of a physical cellular automaton

Part of the standard genomic program shared by all biological life involves the complete replication of the genomic source code and bifurcation of the genome’s cell into two daughter cells, each with its own copy of the source code. In multicellular life, like humans, this cell division happens over and over during the development of a single organism. Initially the process is very symmetrical, resulting in an exponentially growing ball of nearly-identical cells. But eventually the symmetry is broken through small perturbations and subtle differences in cellular context and location, so that individual cells begin to diverge in their fates, differentiating into different cell types, tissues and organs.

All the while, in every one of the growing multitude of cells, the genomic code is being executed. It is the slight perturbations in the biochemical state of individual cells which lead to bifurcations in cellular behaviour (i.e. which parts of the genomic code are being executed), so that some cells will start executing a very different set of genes than other cells. It is these differences that drive cellular differentiation, and tissue, organ and body development.
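How a tiny perturbation can tip two otherwise identical cells into different fates can be sketched with a toy model. This is only an illustration, not a model of any real pathway: two cells each repress the other's activity, loosely in the spirit of lateral inhibition. The function names, the Hill-type repression curve and all parameter values (`k`, `n`, `dt`, the size of the perturbation) are assumptions chosen for the example.

```python
def inhibition(x, k=0.5, n=4):
    """Hill-type repression: output falls as the neighbour's activity x rises."""
    return 1.0 / (1.0 + (x / k) ** n)

def develop(a, b, steps=1000, dt=0.1):
    """Let two mutually repressing cells relax towards a steady state."""
    for _ in range(steps):
        # Both cells run the same rule; only their inputs differ.
        a, b = a + dt * (inhibition(b) - a), b + dt * (inhibition(a) - b)
    return a, b

# Perfectly identical cells stay identical forever.
identical = develop(0.5, 0.5)

# A one-in-a-million difference is amplified until the cells
# settle into opposite states (one high, one low).
perturbed = develop(0.5, 0.5 + 1e-6)

print(identical)
print(perturbed)
```

In this sketch the symmetric state is an equilibrium, but an unstable one: any difference between the two cells grows at each step until one cell is almost fully "on" and the other almost fully "off".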

As this cell mass grows it takes on a form that is somewhat recognisable to computer scientists: that of a 3-dimensional cellular automaton. To a computer scientist a cellular automaton is an abstract model made up of cells arranged on a lattice. Every cell executes the same standard set of rules (in this analogy, the genomic source code), but each cell also has its own specific state. A cell’s state, along with the state of its immediate neighbours, determines how its state will change in the next moment. All cells in the automaton run their own source code simultaneously and in parallel.

Cellular automata are celebrated in computer science for being able to generate arbitrarily complex emergent behaviour at a global level of the lattice, even when the individual cells in the lattice are following very simple local rules. In particular it has been shown that many cellular automata are formally capable of any computation that a standard computer is capable of; it is just a matter of setting up the correct initial lattice configuration (i.e. some cellular automata, including famous examples such as Conway’s Game of Life, are equivalent to Turing machines).
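As a concrete illustration of simple local rules producing global behaviour, here is a minimal sketch of Conway’s Game of Life on an unbounded 2-dimensional grid. Representing the lattice as a set of live-cell coordinates, and the name `life_step`, are choices made for this example. The starting pattern is the well-known "glider", which none of the local rules mention, yet which travels across the lattice as a coherent object.

```python
from collections import Counter

def life_step(live):
    """Advance Conway's Game of Life by one generation.

    `live` is a set of (row, col) coordinates of live cells.
    """
    # For every position next to a live cell, count its live neighbours.
    neighbour_counts = Counter(
        (r + dr, c + dc)
        for (r, c) in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # A cell is live in the next generation if it has exactly 3 live
    # neighbours, or if it has exactly 2 and is already live.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }

glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}

state = glider
for _ in range(4):
    state = life_step(state)

# After four generations the glider has reassembled itself,
# displaced by one cell down and one cell to the right.
shifted = {(r + 1, c + 1) for (r, c) in glider}
print(state == shifted)  # True
```

Every cell applies the same two-line rule to its own state and that of its neighbours. The glider’s motion is a property of the lattice as a whole.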

Same genes, different contexts, different behaviours

Whereas in the early parts of development a multicellular organism’s cells behave very similarly to each other, and have essentially the same biochemical state as each other, over time as this physical “cellular automaton”² self-assembles, the same source code (i.e. the same genomic rules) will start behaving differently in different cells due to different inputs. So the genomic source code will, in some cells, program that cell to die, whereas the same code will, in other cells, program them to proliferate, depending on what precise cellular states the code is executed on. Those cellular states are themselves contingent on the unique (and partially stochastic) details of that individual cell’s developmental history and the biochemical signals it has received and is receiving from its neighbouring cells.

The same gene will do one thing in one cell type, and another thing in another. The same gene will do one thing during development and another thing at maturity. The same gene will do one thing with one set of neighbour cells and another thing with different neighbours.

Whereas computer scientists recognize that choosing an appropriate initial configuration of a cellular automaton can in some cases be shown to be formally equivalent to describing a computer program and its inputs, it is nonetheless a much more challenging and non-intuitive way to program computers. So while we could, we don’t generally use cellular automata to program computers :)

Programming life is done by reverse genetics - a trial-and-error approach

Although not usually put in these terms, measuring and understanding biology’s parallelism, self-modifying feedback loops, contingency and locally-driven emergence of global behaviour are central themes in the biological sciences. This is especially true of molecular biology, cell biology and developmental biology. But despite amazing progress in measuring and understanding the cellular and molecular underpinnings of biological life, the consequences of perturbations to the source code of living systems can be extraordinarily difficult to predict. Given the genome sequence and the biochemical context of an organism’s first cell (the zygote), it is a staggeringly difficult task to predict the end result in the organism of most changes to its genomic source code. So in the best tradition of science, geneticists and genome biologists have adopted a trial-and-error approach called reverse genetics, which employs the tools of genetic engineering.

In reverse genetics the standard approach to understanding the “function” of a particular gene in the genome is to just go ahead and make a small change to it and see how the resulting organism differs from the unmodified “wild type”. Unsurprisingly, there are considerable ethical concerns about doing this to a human zygote that will develop to adulthood. On the other hand it is already done rather regularly to so-called model organisms. To the extent that these model organisms behave similarly to the species of interest, they can be used to focus more expensive, careful and in-depth research on a smaller set of candidate regions of the genome in the species of specific interest.
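The logic of a knockout experiment can be sketched with a toy Boolean gene network. Everything here is hypothetical: the four genes `A` to `D`, their regulatory rules and the `grow` function are invented for illustration and do not describe any real organism. The sketch disables one gene at a time, lets the network run, and compares the outcome with the wild type.

```python
# Hypothetical regulatory rules: each gene's next on/off state is a
# function of the current state of the whole network.
RULES = {
    "A": lambda s: True,                   # always on
    "B": lambda s: s["A"],                 # activated by A
    "C": lambda s: s["A"] and not s["B"],  # activated by A, repressed by B
    "D": lambda s: s["B"] or s["C"],       # activated by either B or C
}

def grow(knockout=None, steps=10):
    """Run the network from an all-off start, optionally with one gene disabled."""
    state = {gene: False for gene in RULES}
    for _ in range(steps):
        # All genes are updated at once from the previous state.
        state = {
            gene: False if gene == knockout else rule(state)
            for gene, rule in RULES.items()
        }
    return state

wild_type = grow()

# For each knockout, list the genes whose final state differs from wild type.
effects = {}
for gene in RULES:
    mutant = grow(knockout=gene)
    effects[gene] = sorted(g for g in RULES if mutant[g] != wild_type[g])

for gene, changed in effects.items():
    print(gene, changed)
```

Even in this tiny network the results are not what a naive reading of the rules might suggest. Knocking out `C` changes nothing in the final state, and knocking out `B` leaves `D` switched on because `C` takes over. Both effects are found by running the mutant and comparing it with the wild type.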

Genetic engineering is still far removed from software engineering

This approach to genetic engineering is quite different from how software engineers generally approach computer programming. Generally speaking, a software engineer has a very precise understanding of what will be achieved by modifying the source code in a particular way, before they do it. Software is usually designed in a way to make it easy to understand and modify, using software design principles like modularity, data structures, abstraction and documentation to make manipulation of the code easy and predictable.

As a result, in many instances, many characteristics and bounds on the resulting change in the program’s behaviour can be precisely analyzed and predicted before the code is executed. However, it has also been shown in computability theory that, for sufficiently complex computer programs, the only sure way to know what the program will do when given a particular input is to run it. This approach of running the code to find out what it does is the approach that reverse genetics takes to understand our genomic source code.
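A small, well-known example gives a flavour of this. The formal results in question are the undecidability of the halting problem and related theorems. The Collatz iteration below is not one of those results, only an illustration of a very short program whose behaviour nobody currently knows how to predict in general without running it. The function name and the `max_steps` safety cap are assumptions made for this sketch.

```python
def collatz_steps(n, max_steps=10_000):
    """Count the steps the Collatz iteration takes to reach 1 from n.

    Returns None if 1 has not been reached within max_steps.
    """
    steps = 0
    while n != 1:
        if steps >= max_steps:
            return None  # gave up; this says nothing about what happens later
        # Halve even numbers; map odd numbers to 3n + 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print(collatz_steps(6))   # 8
print(collatz_steps(27))  # 111
```

Nothing about the inputs 6 and 27 obviously signals that one takes 8 steps and the other 111. Whether the loop terminates for every positive integer is still an open question.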

Are genome editing tools a game changer for programming biological life?

From a purely technical perspective, the main thing that new genome editing tools will provide scientists and biotechnologists is a much faster, cheaper and more precise way to make specific changes to genomes in living organisms, so that they can pursue trial-and-error experiments at a far greater pace and scale. This may sound trivial, but history shows us that the ability to do something routinely, cheaply and at large scale can be truly transformative.

Over time the routine application of genome editing tools will allow us to better understand precisely how the genome actually encodes functions in specific species. This will build on decades of existing understanding and require decades more research by thousands of scientists all around the world to try to reverse engineer the subtlest details of how evolved systems like the human genome can be effectively manipulated (beyond the obvious applications, such as correcting single-gene Mendelian diseases in carriers of the mutant genotype).

In the more immediate future, genome editing won’t make us any more or less naïve about what changes to the genomic source code are required to reprogram life to obtain the specific outcomes we are after in particular species and individuals.