Our DNA Research Project
When time permits (which isn’t often), we continue our research into the characteristics of information in DNA sequences using techniques developed for a military application. Specifically, target-recognition algorithms developed for television-guided bombs can be used to distinguish “natural” objects (trees, rivers, rocks) from “artificial” objects designed by man (trucks, tanks, bridges, buildings) that are likely targets.
From an abstract point of view, a TV picture is simply an encoded message that contains some information. A DNA sequence is also an encoded message that contains some information. Similar techniques should exist for recognizing the information content of the sequence and evidence of design. We have three goals.
1. Develop an algorithm that can distinguish actual DNA sequences from random sequences of base pairs.
Many genes have been sequenced and published on the Internet. They are represented as sequences of the letters A, C, G, and T. Our goal is to develop an algorithm that can distinguish actual DNA sequences from nonsense sequences constructed from these four letters.
2. Determine how much corruption has occurred in actual DNA sequences.
If one took an actual DNA sequence and randomly changed 1% of the letters in the sequence, the computer algorithm should recognize that it is an actual DNA sequence, but at a somewhat lower confidence level. This is generally equivalent to “bit error detection” in a digital signal processing application.
3. Correct corrupted DNA sequences.
If one can determine which base pairs are probably in error, one could replace them with the proper base pairs. This is generally equivalent to “bit error correction” in a digital signal processing application.
If we can identify corrupted DNA sequences (that is, mutations) and figure out what they were before they were corrupted, we could feed this information back to biologists who, using gene splicing techniques, might be able to recreate what we believe to be the original, perfect sequence. Then they can determine experimentally if the corrected DNA sequence does in fact create an organism that is somehow superior to the mutation.
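To make the first two goals concrete, here is a minimal sketch in Python. It is not our algorithm (we have not solved the problem). It swaps in a simple, well-known stand-in: count how often each letter follows each short run of letters in known sequences, then score a new sequence by how typical its letters are. The names `train_markov`, `score`, and `mutate`, the context length `k`, and the repetitive toy reference string are all assumptions made for illustration; no real gene data is used.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"

def train_markov(seqs, k=2):
    """Count how often each base follows each k-letter context."""
    counts = Counter()
    context_totals = Counter()
    for seq in seqs:
        for i in range(k, len(seq)):
            ctx, nxt = seq[i - k:i], seq[i]
            counts[(ctx, nxt)] += 1
            context_totals[ctx] += 1
    return counts, context_totals, k

def score(seq, model):
    """Average log-probability per base under the model (add-one smoothing)."""
    counts, context_totals, k = model
    total, n = 0.0, 0
    for i in range(k, len(seq)):
        ctx, nxt = seq[i - k:i], seq[i]
        p = (counts[(ctx, nxt)] + 1) / (context_totals[ctx] + len(ALPHABET))
        total += math.log(p)
        n += 1
    return total / n if n else float("-inf")

def mutate(seq, rate, rng):
    """Change round(rate * len(seq)) randomly chosen letters to a different letter."""
    letters = list(seq)
    n_changes = round(rate * len(letters))
    for pos in rng.sample(range(len(letters)), n_changes):
        letters[pos] = rng.choice([b for b in ALPHABET if b != letters[pos]])
    return "".join(letters)

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-in for a library of known sequences (NOT real DNA).
    model = train_markov(["ACGT" * 100], k=2)
    clean = "ACGT" * 25
    damaged = mutate(clean, 0.01, rng)  # 1% of the letters changed
    noise = "".join(rng.choice(ALPHABET) for _ in range(len(clean)))
    for name, seq in [("clean", clean), ("damaged", damaged), ("noise", noise)]:
        print(name, round(score(seq, model), 3))
```

On this toy data the slightly damaged sequence scores a little below the clean one, and the random sequence scores far below both, which is the behavior the second goal asks for. Whether anything this simple separates real genes from nonsense is exactly the open question.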
Three analogies, one for each goal, may make this clearer.
1. Suppose that you don’t speak any Russian, and that your web browser has a Russian font installed. Suppose I provided two links for you to click on. One link goes to a real Russian text file. The other link goes to a file of random Russian characters. We bet that you could instantly tell which was the real text file, even without understanding a single word. The best clue would probably be that spaces appear at frequent intervals in the real Russian text. In the random Russian letter file, there might be many “words” that are more than 100 characters long. You would instantly recognize that Russian words could not possibly be that long. This is analogous to our first goal. We want to devise a way to tell real DNA sequences from false ones.
2. Suppose that an extra-terrestrial life form is trying to determine if there is intelligent life on Earth. It suspects there might be signs of information on the Internet, and somehow downloads millions of English text documents. The life-form doesn’t know what any of the documents mean, but it does build a dictionary. While scanning the documents it comes across the sentence, “He ix here.” It suspects the word “ix” is misspelled, because no other document contains the word “ix”. It recognizes that the sentence is damaged, but doesn’t know what is wrong with it. This corresponds to our second goal.
3. Suppose the extra-terrestrial life form tries to figure out how to correct the sentence, “He ix here.” From its dictionary, it knows that the second word might be “it”, “in” or “is”. It searches the entire database without ever finding “He it” or “it here”, and concludes that “it” is not the proper correction for “ix”. The database does contain some sentences with the phrase “in here”, but none with the phrase “He in”, so “in” might be correct, but it is doubtful. When the extra-terrestrial intelligence finds many sentences containing the phrase “He is”, and many other sentences containing “is here”, the extra-terrestrial intelligence can decide with a high degree of confidence that the sentence should read “He is here.” even without knowing what it means. This corresponds to our third goal.
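The reasoning in the second and third analogies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six-sentence corpus is invented, the function names (`build_tables`, `flag_unknown`, `correct`) are hypothetical, candidates are limited to known words that differ from the damaged word by one letter, and a candidate supported by its neighbors on both sides is preferred over one supported on only one side.

```python
from collections import Counter

def tokenize(sentence):
    """Lowercase, drop a trailing period, split on spaces."""
    return sentence.lower().rstrip(".").split()

def build_tables(sentences):
    """Build the 'dictionary' (word counts) and counts of adjacent word pairs."""
    words, pairs = Counter(), Counter()
    for s in sentences:
        tokens = tokenize(s)
        words.update(tokens)
        pairs.update(zip(tokens, tokens[1:]))
    return words, pairs

def flag_unknown(sentence, words):
    """Second goal: detect damage by listing words never seen elsewhere."""
    return [t for t in tokenize(sentence) if t not in words]

def one_letter_variants(word, vocabulary):
    """Known words of the same length that differ in exactly one letter."""
    return [w for w in vocabulary
            if len(w) == len(word) and sum(a != b for a, b in zip(w, word)) == 1]

def context_support(tokens, i, cand, pairs):
    """(number of sides with evidence, total pair count) for cand at position i."""
    left = pairs[(tokens[i - 1], cand)] if i > 0 else 0
    right = pairs[(cand, tokens[i + 1])] if i + 1 < len(tokens) else 0
    return (int(left > 0) + int(right > 0), left + right)

def correct(sentence, words, pairs):
    """Third goal: replace each unknown word with its best-supported variant."""
    tokens = tokenize(sentence)
    for i, tok in enumerate(tokens):
        if tok in words:
            continue
        ranked = sorted(one_letter_variants(tok, words),
                        key=lambda c: context_support(tokens, i, c, pairs),
                        reverse=True)
        # Leave the word alone if no candidate has any supporting context.
        if ranked and context_support(tokens, i, ranked[0], pairs) > (0, 0):
            tokens[i] = ranked[0]
    return " ".join(tokens)

if __name__ == "__main__":
    # Invented toy corpus standing in for the downloaded documents.
    corpus = ["He is here.", "She is here.", "He is late.",
              "It is in here.", "Put it in here.", "He saw it."]
    words, pairs = build_tables(corpus)
    print(flag_unknown("He ix here.", words))    # ['ix']
    print(correct("He ix here.", words, pairs))  # he is here
```

With this corpus, “it” has no support on either side, “in” is supported only by “in here”, and “is” is supported by both “He is” and “is here”, so “is” wins. Nothing in the code knows what any of the words mean.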
We Are Giving Away the Farm
These are the problems we are trying to solve. But we don’t have much time to work on them, and it might take quite a while. Meanwhile, someone else might read this web page, have some good ideas, and solve the problems first. That would upset us if we were motivated by the desire to win a Nobel Prize for solving them. But we really don’t care who solves them, as long as somebody does. So, feel free to take our ideas and run with them.