For more information on the Evolution 2.0 technology prize, see the book Evolution 2.0: Breaking the Deadlock Between Darwin and Design (BenBella Books, September 2015).
Where did the information in DNA come from? This is one of the most important and valuable questions in the history of science. Cosmic Fingerprints has issued a challenge to the scientific community:
“Show an example of Information that doesn’t come from a mind. All you need is one.”
“Information” is defined as digital communication between an encoder and a decoder, using agreed-upon symbols. To date, no one has shown an example of a naturally occurring encoding/decoding system, i.e., one that has demonstrably come into existence without a designer.
A private equity investment group is offering a technology prize for this discovery. We will financially reward and publicize the first person who can solve this; details to be announced with the release of the forthcoming book Evolution 2.0. Solving this problem is far more than a matter of abstract religious or philosophical discussion. It would demonstrate a mechanism for producing coding systems, thus opening up new channels of scientific discovery. Such a find would have sweeping implications for Artificial Intelligence research.
Such a discovery would solve the most perplexing problem currently faced by the origin-of-life field, namely the origin of coded information. How could the genetic code (or any coding system) come into being? An answer would represent a landmark discovery in the history of science and alter our fundamental understanding of the universe.
Essential Components of a Communication System (after Shannon, 1948):
DNA matches the pattern in the above diagram. Cosmic Fingerprints is seeking discovery and proof of a naturally occurring code, which also matches this pattern.
The following specification defines the criteria for identifying a naturally occurring code:
1. Humans can design the experiment, with all manner of state-of-the-art laboratory equipment, ideal conditions, etc. They just can’t cheat: the submitted system cannot be pre-programmed with any form of code whatsoever.
2. Since the origin of DNA is unknown, the submitted system cannot be a direct derivative of DNA or produced by a living organism. Bee waggles, dogs barking, RNA strands and mating calls of birds don’t count. Such codes are products of animal intelligence, genetically hard-coded and/or instinctual.
3. The origin of the submitted system must be documented such that its process of origin can be observed in nature and/or duplicated in a real-world laboratory according to the scientific method.
4. The submitted system must be digital, not analog.
5. The submitted system must have the three integral components of communication functioning together: encoder, code, decoder.
6. The message passed between encoder and decoder must be a sequence of symbols from a finite alphabet.
7. A symbol is a group of k bits considered as a unit. We refer to this unit as a message symbol m_i (i = 1, 2, ..., M) from a finite symbol set or alphabet. The size of the alphabet M is M = 2^k, where k is the number of bits in the symbol. For a binary symbol, k = 1, M = 2. For a quaternary symbol in DNA, k = 2, M = 4.
8. A character is a group of n symbols considered as a unit. We refer to this unit as a message character c_i (i = 1, 2, ..., C) from a finite word set or vocabulary. The maximum size of the character set C is C = M^n. For a standard computer byte, M = 2, n = 8, C = 256. For a triplet group of quaternary symbols in DNA, M = 4, n = 3, C = 64.
9. The submitted system must be labeled with values of both encoding table and decoding table filled out.
10. For the submitted system, it must be possible to objectively determine whether encoding and decoding have been carried out correctly. For example, when you press the “A” key on the keyboard, a letter “A” is supposed to appear on the screen, and there is an observable correspondence between the two. In defining biological gender, a combination of X and Y chromosomes should correspond to male, while XX should correspond to female. For any given system, a procedure should exist for determining whether input correctly corresponds to output.
(Above definitions adapted from Digital Communications: Fundamentals and Applications by Bernard Sklar, page 13, Prentice Hall, 2nd edition, 2001)
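The arithmetic in definitions 7 and 8 can be checked with a short sketch. The function names below (alphabet_size, max_character_set, bits_per_character) are illustrative choices, not part of the specification or of Sklar's text; the code only restates M = 2^k and C = M^n.

```python
import math


def alphabet_size(k: int) -> int:
    """Alphabet size M for a symbol made of k bits: M = 2^k."""
    return 2 ** k


def max_character_set(M: int, n: int) -> int:
    """Maximum character-set size C for n symbols drawn from M: C = M^n."""
    return M ** n


def bits_per_character(M: int, n: int) -> float:
    """Bits needed to label every possible character: n * log2(M)."""
    return n * math.log2(M)


# Worked examples taken from the definitions above.
byte_chars = max_character_set(alphabet_size(1), 8)   # binary symbols, n = 8
codon_chars = max_character_set(alphabet_size(2), 3)  # quaternary symbols, n = 3

print(byte_chars, codon_chars, bits_per_character(4, 3))
```

Under these definitions a byte and a DNA triplet differ only in the values of k and n, which is why the same two formulas cover both cases.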
Isomorphism between Shannon’s Communication System and DNA:
Example Communication Systems:
Example #1: The ASCII Code
Keyboard > ASCII > Computer Screen: When you press the letter “A” on the keyboard, the letter is encoded into ASCII and decoded by the computer, and a letter “A” appears on the screen.
ASCII characters contain 7 symbols, so n = 7. The ASCII character set C is 2^7 or 128 characters.
Encoding tables for ASCII (letter on keyboard > binary code):
|Input (letter on keyboard)||Encoded Message|
The complete ASCII table is available at http://en.wikipedia.org/wiki/Ascii#ASCII_printable_characters
Decoding tables for ASCII (binary code > letter on screen or printer):
|Encoded Message||Output (displayed as an arrangement of pixels on screen or printer)|
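As a rough illustration of the encoding and decoding tables for Example #1, the sketch below builds both tables for a few letters and checks that input corresponds to output, in the spirit of criterion 10. The helper names (encode_ascii, decode_ascii, round_trip_ok) are hypothetical; the 7-bit width follows from n = 7 above.

```python
def encode_ascii(ch: str) -> str:
    """Encode one character as a 7-symbol binary string (n = 7)."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is outside the 128-character ASCII set")
    return format(code, "07b")


def decode_ascii(bits: str) -> str:
    """Decode a 7-symbol binary string back into its character."""
    if len(bits) != 7 or set(bits) - {"0", "1"}:
        raise ValueError(f"{bits!r} is not a 7-bit binary string")
    return chr(int(bits, 2))


def round_trip_ok(ch: str) -> bool:
    """Criterion 10: does the decoded output correspond to the input?"""
    return decode_ascii(encode_ascii(ch)) == ch


# Encoding table (letter > binary code) and its inverse, the decoding table.
encoding_table = {ch: encode_ascii(ch) for ch in "ABC"}
decoding_table = {bits: ch for ch, bits in encoding_table.items()}

for ch, bits in encoding_table.items():
    print(ch, ">", bits, ">", decoding_table[bits])
```

The decoding table here is simply the inverse of the encoding table, which is possible because ASCII assigns each character exactly one code.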
Example #2: The Genetic Code
Nucleotides > mRNA > Proteins: Base pairs are grouped into codons and encoded (transcribed) into messenger RNA, then decoded (translated) by the ribosomes into proteins.
The DNA symbol unit is a nucleotide, forming a 4-letter alphabet of adenine, cytosine, guanine, and thymine. Each base pair contains k = 2 bits of information. A character consists of n = 3 symbol units. Character set C is 4^3, which is 64 characters. DNA’s redundancy scheme maps these 64 characters to 20 amino acids.
Encoding tables for DNA (base pairs > mRNA):
|Nucleotides (Input)||Amino Acid (Encoded Message)|
The complete genetic code chart is available at http://en.wikipedia.org/wiki/Genetic_code#RNA_codon_table
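The 64-to-20 mapping described above can be sketched in a few lines. This is a minimal illustration, assuming the standard genetic code written over DNA coding-strand codons (T in place of U) and skipping the mRNA step. The names BASES, AMINO_ACIDS, CODON_TABLE and translate are illustrative, and the example DNA string is hypothetical: it was built by picking one codon per residue, and is not the actual gene sequence of any peptide.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# All C = 4^3 = 64 characters (codons), each mapped to an amino acid or stop.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Read a coding-strand DNA string three symbols at a time until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical coding sequence spelling out a 7-residue peptide.
print(translate("ATGCGTACTGGTAATGCTAATTAA"))
```

Counting the distinct letters in AMINO_ACIDS (excluding "*") gives the 20 amino acids; the remaining codons are synonyms or stop signals, which is the redundancy the text refers to.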
Decoding tables for DNA (amino acids > proteins):
|Amino Acid Sequence (encoded message)||Peptide/Protein* (organism name) (output)||**#|
|MRTGNAN||Microcin C7 (EC)||7|
|DRVYIHPF||Angiotensin 2 (HS)||8|
|RPKPQQFFGLM||Substance P (HS)||11|
|GGAGHVPEYFVGIGTPISFYG||Microcin J25 (EC)||21|
|RSCCPCYWGGCPWGQNCYPEGCSGPKV||Neurotoxin 3 (AS)||27|
|APLEPVYPGDNATPEQMAQYAADLRRYINMLTRPRY||Pancreatic Hormone (HS)||36|
|KCNTATCATQRLANFLVHSSNNFGAILSSTNVGSNTY||Islet amyloid polypeptide (HS)||37|
|CTPGSRKYDGCNWCTCSSGGAWICTLKYCPPSSGGGLTFA||Serine protease inhibitor 3 (SG)||40|
|DDGLCYEGTNCGKVGKYCCSPIGKYCVCYDSKAICNKNCT||Pollen allergen Amb t 5 (AT)||40|
|VGIGGGGGGGGGGSCGGQGGGCGGCSNGCSGGNGGSGGSGSHI||Microcin B17 (EC)||43|
|ATYNGKCYKKDNICKYKAQSGKTAICKCYVKKCPRDGAKCEFDSYKGKCYC||Antifungal protein (AG)||51|
|GIVEQCCTSICSLYQLENYCN-FVNQHLCGSHLVEALYLVCGERGFFYTPKT||Insulin A-B chains (HS)||51|
|DIPEVVVSLAWDESLAPKHPGSRKNMACYCRIPACIAGERRYGTCIYQGRLWAFCC||Neutrophil defensin 1 (HS)||56|
|CSSNAKIDQLSSDVQTLNAKVDQLSNDVNAMRSDVQAAKDDAARANQRLDNMATKYRK||Major outer membrane lipoprotein (EC)||58|
|RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA||Pancreatic trypsin inhibitor (BT)||58|
|EEYVGLSANQCAVPAKDRVDCGYPHVTPKECNNRGCCFDSRIPGVPWCFKPLQEAECTF||Trefoil factor 3 (HS)||59|
|IRCFITPDITSKDCPNGHVCYTKTWCDAFCSIRGKRVDLGCAATCPTVKTGVDIQCCSTDNCNPFPTRKRP||Long neurotoxin 1 (NK)||71|
*Mature form; **#: number of amino acids. Source: http://www.uniprot.org
This is only a partial listing of the simplest proteins. There are about a million known proteins, many of them extremely complex. More information on protein structures is available at http://www.uniprot.org and http://www.ncbi.nlm.nih.gov/.
Both ASCII and DNA are formal communication systems according to Shannon’s model because they encode and decode messages using a system of symbols. DNA is not like a communication system, or analogous to a communication system; it is formally defined as a communication system.
“Information, transcription, translation, code, redundancy, synonymous, messenger, editing, and proofreading are all appropriate terms in biology. They take their meaning from information theory (Shannon, 1948) and are not synonyms, metaphors, or analogies.” (Hubert P. Yockey, Information Theory, Evolution, and the Origin of Life, Cambridge University Press, 2005).
Similar tables are easily made for other codes and communication systems, like HTML, bar codes, postal codes, Morse code, computer file formats and programming languages.
Miller-Urey Experiment and the Origin of Life
The 1953 “Miller-Urey” experiment*** produced organic compounds from gases thought to be present in Earth’s early atmosphere. It is widely cited in textbooks as an explanation of how early life was formed in the ocean.
This experiment only attempted to explain where a handful of the chemicals came from, and it certainly didn’t begin to explain how replication got started. Still, it provided useful insights.
If the Miller-Urey experiment had produced encoding, decoding, and information transmission as defined here, it would most certainly qualify as meeting this challenge.
Public recognition will be awarded to the first person who demonstrates a naturally occurring communication system that meets the engineering specification outlined in this document. Submissions must be identical in format to the above examples of ASCII and DNA. Submissions must include a definition of all symbols, alphabet and the associated encoding / decoding tables.
Download the application form and instructions for submission here.
All submissions, along with our evaluations of those submissions, are available in their entirety for public review at the following page:
***Miller, Stanley L. (May 1953). “Production of Amino Acids Under Possible Primitive Earth Conditions” Science 117 (3046): 528.