Why your DNA is not like a blueprint

By Rich Feldenberg:

When future historians look back on the 20th century, they may be critical of humanity’s violent tendencies, and rightly so. We drove ourselves to the point of near self-extinction with global warfare and the creation of nuclear weapons. But maybe they will also see a flowering of our more noble side, with the 20th century ushering in a new understanding and appreciation of nature and ourselves. Many areas of science saw rapid advances, with special relativity, general relativity, and quantum mechanics all being born in the last century. As important as these fields have become, the life sciences have arguably had an even larger impact on society and our everyday lives. It was in the 20th century that we learned that DNA is the molecule of heredity, and the structure of DNA was described by Watson and Crick in 1953 as the now famous double helix. With improvements in the methods of molecular biology in the 1980s and 90s, genomics has led us to a more complete understanding of the underlying mechanisms of disease and how the normal processes of life operate, develop, and evolve.

The word DNA is now part of the everyday vernacular, even if not everyone remembers that it stands for deoxyribonucleic acid. Nearly everyone also has some idea that DNA is vital to genes and inheritance, and that it is used in forensics, paternity testing, genetic testing for disease mutations, and mapping phylogenetic trees to understand the relatedness of all life. Somehow, though, we’ve been repeatedly told that DNA is like our blueprint, that it gives the plans for creating you and me, and anything else that has DNA. That analogy is misleading, because DNA doesn’t act as a blueprint at all. Looking at the full genetic code of an organism wouldn’t tell you very much about what that organism looks like. The only way you might really infer this from the genetic information would be by comparing the DNA sequences to those of other organisms that you already know a lot about. If the DNA you are looking at were very close to the DNA of octopus and squid, then you could guess that this organism looks cephalopod-like. Analyzing the code on its own, however, would not tell you the body plan.

So how should we think about DNA? Is there something else we can compare it to that would make a more accurate analogy? Well, instead of being a blueprint, a technical drawing that lays out the structural relationships of each part to the other parts, it is really more like a running computer program. More precisely, DNA is like a large collection of computer programs: some are always running, and others run only at certain times or in certain cell types. The DNA gives instructions that are carried out by hardware running the code. In this analogy the DNA is the software, and the cell and its molecular machinery are the hardware running the software. Software without hardware is hopelessly ineffectual, and hardware without software is nonfunctional. They need each other to function. The DNA needs a living cell to carry out its instructions. In the proper setting these instructions are powerful, producing a whole human being from just a single cell, as they did with you during your 9 months of gestation in the womb. So how does it work?

Well, it’s important to recall how the information stored in DNA is interpreted by the cell’s internal machinery. The DNA itself is made of two long strands, forming the famous double helix. Each strand is made of a sequence of nucleotide bases, and there are four nucleotide bases to choose from in the DNA alphabet: A, C, G, and T. These letters are chemically distinct nucleotides, and you can picture a gene as a string of these letters that makes a unique sentence. A typical gene may be hundreds to thousands of these letters in length. For example, the human gene for the AVP-2 receptor, found on the X chromosome, codes for a protein located on the cell surface of certain kidney cells, and is critical to regulating normal water balance. The gene contains 4676 of these DNA letters.


The first ten letters of the AVP-2 receptor DNA code read CTGCCCAGCC, but all 4676 letters of the DNA code for this gene are known and can be found in genetic databanks. Each strand of DNA has a complementary strand, where every base in one strand pairs with a base in the other strand. A pairs with T, and C pairs with G. In other words, if you know the sequence of one strand you can easily deduce the sequence of the other strand, so for our first ten bases in the AVP-2 receptor, CTGCCCAGCC, we know that the complementary strand would have to be GACGGGTCGG, based on the pairing rule. It is this complementary base pairing that makes it possible for DNA to replicate itself. Each strand serves as the template for making a new DNA strand. The double helix just needs to be unwound at the right time and the complementary bases added to each of the now single-stranded DNA strands, and you end up with two identical double helix DNA molecules where you initially had just one. This has to happen before a cell divides, so that both of the new cells created from the original single cell have the same DNA as the original.
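
The pairing rule is simple enough to write down as a few lines of code. The following is only a minimal sketch in Python: the function name `complement` and the lookup table `PAIRS` are made up for this illustration, and the strand is read position by position, exactly as in the paragraph above, without modeling the direction in which each strand runs.

```python
# Pairing rule from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand)


# First ten letters of the gene, as quoted in the article.
first_ten = "CTGCCCAGCC"
print(complement(first_ten))  # GACGGGTCGG
```

Applying `complement` twice returns the original sequence, which is the idea behind replication: each strand carries enough information to rebuild its partner.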

There are two major cell processes involved in turning the DNA code into protein. For the most part it is the protein that does the actual work in the cell, while the DNA is the code-like programming being run. The first process is transcription, where the DNA code is converted, or transcribed, into an RNA code. The second process is translation, where the RNA code is converted, or translated, into the protein product. For the sake of simplicity, we are only talking about protein-coding genes, but there are many non-coding RNA genes as well; we’ll save that topic for another day.

During transcription a molecular machine known as RNA polymerase reads the DNA code and converts it into an RNA code, in the form of a single strand of messenger RNA. RNA is quite similar to DNA except for a few key differences. One distinction is that it is single stranded rather than double stranded like its DNA cousin. Another is that it contains an extra chemical group, called a hydroxyl, that is lacking in DNA. A third distinction is that whereas the letters in DNA are A, C, G, and T, in RNA the T is missing and a U is there in its place. The RNA alphabet therefore has the letters A, C, G, and U, with A:U forming pairs and C:G forming pairs. The RNA can then be transported to the cell machinery used to make protein, while the DNA (the original code, or source code) remains safe in the chromosome; only the transcribed copy is sent out.
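
Transcription can be sketched the same way as base pairing, with U taking the place of T. This is a simplified illustration in Python, not a model of RNA polymerase. The function name `transcribe` is made up here, strand direction is ignored, and treating the strand GACGGGTCGG as the template is an assumption made purely for the example, since the article does not say which strand of the gene is read.

```python
# DNA-to-RNA pairing: like DNA pairing, except that A on the DNA
# template pairs with U (not T) in the RNA copy.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template: str) -> str:
    """Build the RNA strand that pairs with a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template)


# Assumed template strand, chosen only for illustration.
template = "GACGGGTCGG"
print(transcribe(template))  # CUGCCCAGCC
```

The RNA copy has the same letters as the opposite DNA strand (CTGCCCAGCC), with every T replaced by U.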


The messenger RNA (mRNA) finds its way to the ribosomes, which are complex molecular machines that take the RNA code and make the actual protein. The RNA code is read by the ribosomes three bases at a time, with each group of three forming a codon that specifies an amino acid. The protein is a string of amino acids. A few codons also tell the ribosome where the protein ends, and are called stop codons. The protein may still have a few steps to go before it is fully functional. For example, it may need to have certain sugars or other chemical groups added at particular locations. It also needs to be folded into a very specific three-dimensional shape, and it may need to associate with other proteins to form part of a larger protein complex. Then it may need to be transported to specific sites in the cell, or even exported out of the cell, to do a job located in a different place in the body.
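
Reading codons three bases at a time can be sketched as a table lookup. The Python below is a rough illustration only: it uses the standard genetic code with one-letter amino acid abbreviations, where `*` marks a stop codon. The function name `translate` and the short mRNA string are invented for the example, and reading simply begins at the first base, which skips over how a real ribosome finds its starting point.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
# Each character is a one-letter amino acid code; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read an mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the protein ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up mRNA for illustration: AUG GCC UGG UAA
print(translate("AUGGCCUGGUAA"))  # MAW
```

The result is only the raw chain of amino acids. The later steps described above (added chemical groups, folding, assembly into complexes, transport) are not captured by a lookup like this.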

So why is our DNA not like a blueprint? Well, even if you could read the entire DNA code for all the protein-producing genes, you would only see the ingredients that the DNA was coding for. That is far from a blueprint that might show you the structure of a building, where its doors, windows, elevators, stairwells, and so on are located in spatial relation to one another. Knowing the protein products only gives you a list of ingredients. How those ingredients interact, in time and space, is what creates an organism. The genetic code is a set of instructions that is executed on the code-reading machinery of a living cell. The beauty of it is that not all the genes are transcribed at the same time or in the same amounts. Only a fraction of genes are active at any given time, and in a complex multicellular organism, only certain genes will ever be transcribed in any particular cell type. That is what makes a kidney cell different from a brain cell, or from a cell in the heart muscle, and so on. Every cell has all the genetic programs, but runs only the subset necessary for its own type.

DNA without the code-reading cell machinery can do nothing on its own, which is why the vital flame of life must be passed down from living cell to living cell, uninterrupted since the very beginning of life itself. The genetic program is sophisticated enough that it causes genes to be transcribed that produce signaling proteins, which are secreted out of the cell to instruct neighboring cells as to which of their genetic programs to begin running, typically by activating transcription factors in those cells. It is this complex coordination, leading to the switching on or off of particular genes in other cells, that starts the process of building a whole multicellular organism. In this way it is not just the genetic program that is necessary for building an animal, or person, or plant, but also the local chemical environment that the program of each cell finds itself living in. The chemical neighborhood is just as important as the genetic constituency.

In the language of object-oriented programming, as in Java for example, we might say that the complete genome of an organism is a program with many classes (genes), and that when these classes are run they instantiate objects (proteins). Each and every cell in a body has the same program, but depending on its interactions with neighboring objects it will call only certain classes for use at any given time, and in some cases will never use particular classes that it has access to. These objects then go on to run all the functions necessary for that cell, including, in some cases, prompting other cells to call on certain classes. A human kidney cell has the entire “Human Program” as part of its software, but will only call on the classes used by a kidney cell, because it was derived from a cell that at one time could use all classes (a pluripotent stem cell), but at a certain point was instructed by its chemical environment to only allow use of the kidney classes. In other words, it differentiated into a kidney cell, thereby losing the ability to become a different cell type.
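
The analogy can be made concrete with a toy program. The sketch below uses Python rather than Java, purely to keep it short, and every name in it (`Protein`, `WaterChannel`, `NeurotransmitterReceptor`, `HousekeepingEnzyme`, `Cell`) is hypothetical and chosen only for illustration. It is a picture of the analogy, not of real gene regulation.

```python
class Protein:
    """Base class: each gene 'class' below instantiates a protein 'object'."""

    def __repr__(self):
        return f"<{type(self).__name__}>"


# Hypothetical gene classes, named only for illustration.
class WaterChannel(Protein):
    pass


class NeurotransmitterReceptor(Protein):
    pass


class HousekeepingEnzyme(Protein):
    pass


# The whole "program": every cell carries all of these classes.
GENOME = (WaterChannel, NeurotransmitterReceptor, HousekeepingEnzyme)


class Cell:
    def __init__(self, usable=GENOME):
        self.genome = GENOME       # the full program is always present
        self.usable = set(usable)  # classes this cell may still call

    def differentiate(self, permitted):
        """Return a cell restricted to the classes its environment permits."""
        return Cell(self.usable & set(permitted))

    def express(self):
        """Instantiate one object for each class the cell may call."""
        return [gene() for gene in self.genome if gene in self.usable]


stem_cell = Cell()  # can call every class
kidney_cell = stem_cell.differentiate({WaterChannel, HousekeepingEnzyme})
print(kidney_cell.express())  # [<WaterChannel>, <HousekeepingEnzyme>]
```

Note that `kidney_cell.genome` still holds all three classes. What changed is only which of them the cell is allowed to call, and because `differentiate` can only narrow that set, a kidney cell in this toy model cannot turn back into a cell that uses everything.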

This is one reason that, even though we have completely sequenced the human genome, we still have a very incomplete understanding of what most of the genes are doing. Just by looking at their code it is not easy to determine what their effect is in a whole organism. The computer analogy may not be perfect, but it does illustrate the problem much better than the typical blueprint analogy does.

 

Other interesting things about DNA, and other fun topics:

  1. “Intron Retention: a common cause for cancer”, by Rich Feldenberg. ZME Science. 1/25/2016.
  2. Alternative Splicing. Wikipedia.
  3. Non-coding RNA. Wikipedia.
  4. “Why the Horta would not have looked like a rock monster”. Darwin’s Kidneys. June 18, 2015.