Do Cells use Passwords? Do they Encrypt Information?

Living organisms must maintain proper regulation including defense and healing. Life-threatening problems may be caused by pathogens or an organism’s own cells’ deficiency or hyperactivity, in cancer or auto-immunity. Life evolved solutions to these problems that can be conceptualized through the lens of information security, which is a well-developed field in computer science. Here I argue that taking an information security view of cell biology is not merely semantics, but useful to explain features of cell signaling and regulation. It also offers a conduit for cross-fertilization of advanced ideas from computer science, and the potential for biology to inform computer science. First, I consider whether cells use passwords, i.e., precise initiation sequences that are required for subsequent signals to have any effect, by analyzing chromatin regulation and cellular reprogramming. Second, I consider whether cells use the more advanced security feature of encryption. Encryption could benefit cells by making it more difficult for pathogens to hijack cell networks. Because the ‘language’ of cell signaling is unknown, i.e., similar to an alien language detected by SETI, I use information theory to consider the general case of how non-randomness filters can be used (1) to recognize that a data stream encodes a language, rather than noise, and (2) to derive quantitative criteria for whether an unknown language is encrypted. This leads to the result that an unknown language is encrypted if efforts at decryption produce sharp decreases in entropy and increases in mutual information. A fully decrypted language should have minimum entropy and maximum mutual information, the magnitude of which should scale with language complexity. I demonstrate this with a simple numerical experiment on English language text encrypted with a basic polyalphabetic cipher. I conclude with unanswered questions for future research.

amplification, memory, modularity, feedforward and other motifs, which are reviewed by Krakauer and colleagues 2 , Uda and Kuroda 3 , Mousavian and colleagues 4,5 , Walterman and Klipp 6 , Azeloglu and Iyengar 1 , and Antebi and colleagues 7 . Cell networks can become dysfunctional through somatic mutation, chemical injury, infection, or other processes that achieve varying degrees of control over the network 8 . Here, I begin to consider these processes through the lens of information security, an approach that, as far as I can determine, is not common. This is notable for its stark contrast to human telecommunications, where cybersecurity is of paramount importance 9 . In an elegant and trenchant examination of theoretical biology, Krakauer and colleagues argue "before we can look for patterns, we often need to know what kinds of patterns to look for, which requires some fragments of theory to begin with 10 ." Therefore, I propose fragments of theory for information security in cells so that the community can begin to hunt for patterns and test predictions.
By explicitly incorporating information security concepts into thinking about biological systems, several outcomes are possible: (1) distinctions without differences: rephrasing familiar concepts of immunity and regulation in terms of information security adds no value; (2) cross-disciplinary fertilization occurs as information security concepts are imported into biological theory; (3) new information security knowledge arises from examination of biological systems.
Recent studies on network controllability provide one framework for examining information security in biochemical networks [11][12][13][14][15] . In this essay, a different perspective is taken to analyze whether cells use passwords and encrypt information.

Immune systems and biological security
The evolution of immune systems and self-defense against injury and mutation are major innovations in the history of life on earth [16][17][18] . By total volume, life on earth has its largest habitat in the deep ocean with an abundance of bacteriophages, suggesting that evolution leads to a proliferation of simple life forms, with consciousness as a kind of statistical accident 19 . Single-celled and multicellular organisms evolved a wide variety of defense systems, often dichotomized into innate and adaptive systems 20 . These systems can be conceptualized more generally to include protective mechanisms against both external and internal damage. The connection between external and internal injury is seen in the study of viruses, which led to insights in cancer biology and the discovery of oncogenes 17 . Organisms developed the ability to distinguish self from non-self and destroy xenobiotic material. However, not all foreign genetic material is completely destroyed, because it can increase fitness, e.g., antibiotic resistance plasmids 20,21 . On the intracellular level, bacterial defense mechanisms include blocking receptor binding (surface modification), genome injection (superinfection exclusion), viral replication (restriction modification, CRISPR-Cas, and prokaryotic Argonaute), and abortive infection (programmed cell death) 21 . Similar mechanisms exist in eukaryotic cells, including RIG-like receptor proteins that recognize RNA 16 , xenophagy 22 , advanced intracellular nucleic acid recognition systems and other cell-autonomous mechanisms 23 . In plants, sophisticated DICERs defend against retroviruses 24 . Similarly, pathogens use a variety of mechanisms to co-opt, hijack, and counteract host defenses [25][26][27][28] . Mutations leading to oncogenes reprogram signaling networks 29 . All of these attacks and counter-attacks involve changes in signaling and regulatory networks, and therefore, changes in information.

Information security in computer science
Information security has been critically important for millennia, with the Caesar substitution cipher being a prominent early example 30 . (The cipher works by shifting each letter of the English alphabet by 3, i.e., A->D, B->E,...,X->A 30 .) Computer viruses achieved notoriety in 1987 when the Brain, Lehigh, and April Fool viruses came to worldwide attention 31 . Hackers achieved infamy and also contributed to the advancement of information technology 32 . Information security depends on the use of passwords for system access and encryption to alter information so that its meaning is obfuscated 33 . Development of secure encryption systems, e.g., the RSA asymmetric public key cryptography, was an essential innovation in the history of the internet 33 and must constantly evolve to meet new threats 9 . Steganography is an altogether different approach that conceals the existence of information, e.g., writing with invisible ink, and appears to have played a less important role in the history of information technology than cryptography 33 . Attacks on encrypted systems can involve interception, modification, fabrication, or interruption of information 33 . There has been considerable work in adapting biomolecules for use in information security in human telecommunications using biosteganography 34 , where information is invisible, and molecular cryptography, where synthetic biology is used to re-engineer molecules to decode and encode information 35 . Despite obvious parallels in the world of computers, less explicit attention appears to have been paid to theoretical descriptions of cells in terms of their native information security systems, prompting me to ask: Do cells use passwords? Do they encrypt information?
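The shift cipher mentioned at the start of this section can be illustrated with a minimal Python sketch. The function name `caesar_shift` and the choice to leave non-letters untouched are assumptions made for illustration only; they do not come from the cited sources.

```python
import string

ALPHABET = string.ascii_uppercase  # the 26-letter English alphabet


def caesar_shift(text: str, shift: int = 3) -> str:
    """Shift each letter A-Z by `shift` positions, wrapping around after Z."""
    shifted = []
    for ch in text.upper():
        if ch in ALPHABET:
            shifted.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            shifted.append(ch)  # assumption: spaces and punctuation pass through
    return "".join(shifted)


ciphertext = caesar_shift("ABX")          # A->D, B->E, X->A
plaintext = caesar_shift(ciphertext, -3)  # shifting back recovers the input
```

Because every letter is moved by the same amount, the cipher only relabels the alphabet, which is why it is considered easy to break.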

Information systems in cells
Individual cells have a variety of sophisticated information systems. They encode information through the genetic code, which utilizes double-stranded complementary base pairing to provide built-in error correction, which is a type of backup or repair security system. At the proteome level, cells can greatly expand on the genetic code with a few hundred different post-translational modifications in various combinations that give rise to numerous proteoforms 36 , which form components of signaling and regulatory networks. Somatic recombination in immunoglobulins and T-cell receptors can vastly increase protein variants in certain cell types 37 . Interactions of these macromolecules form networks that store and transmit information 6 . There is a context specificity to many signaling pathways, including TGF-beta and AKT, which means that cells respond differently to pathway activation depending on the cell type 38,39 . Many intracellular signaling pathways do not match one receptor to a single ligand, but instead use multiple receptors and ligands that interact combinatorially 40 , or use combinations of numerous nuclear receptor cofactors to regulate activity 41 . Therefore, genetic, epigenetic, transcriptomic and proteomic variation gives rise to a large repertoire of interacting components. These mechanisms are present in complex multicellular organisms, where advanced regulation is needed to control differentiation 42 , and also in bacteria for quorum sensing 2 .
Cancer has been shown to involve rewiring of cellular networks by oncogenes, which, in some sense, represents altered information transmission and compromised security 29,43 . Cells can be reprogrammed through microRNAs and gene regulatory networks in cancer to oncogenic states with distinct metabolism 44 . Similarly, viruses can substantially rewire signaling and regulatory networks to hijack cellular machinery for viral benefit 45 . In the early days of cancer research, similarities between the two systems led the scientific community to think that viruses cause cancer, and studies into viral biology provided insights into cancer 17,46 . Both pathogenic and pathological processes involve hijacking cellular networks.
In multicellular organisms, combinations of histone modifications give rise to varying chromosomal accessibility and epigenetic states, which are read, written, and erased by chromatin modifiers 47,48 . This epigenetic regulation is capable of encoding memory at the single-cell level 49 . Redundancy and correlation among epigenetic marks, transcription factors, and coregulators provide a system of information compression to specific cell state 50 . For example, ligand identity can be encoded as pulsatile (DLL1-Notch1) or sustained (DLL4-Notch1) signaling to induce opposite cell fates. In the adult human body, several hundred distinct cell types exist in "cell states", some of which can be dynamically reprogrammed from one state to the next using sophisticated perturbations [51][52][53] . The language used to describe these cellular properties (code, encode, read, write, memory, erase, reprogram, compression, rewire) points to their aspects as information systems.

Do cells use passwords?
Password authorization systems allow access based upon entry of a correct code out of many possible entries. They can be viewed conceptually as an initiation sequence of signals without which the system will not respond to subsequent signals. Typically, passwords function as a logical AND operation, i.e., each character must be entered correctly to allow system access.
However, a logical AND gate is not strictly required. For example, a bouncer at a nightclub may listen for the password "more cheese" but accept partial matches, such as "more these" or "Moishe's". I consider whether there is evidence for the existence of passwords, i.e., an initiation sequence of signals without which the system will not respond to subsequent signals, using the example of transcription factor-chromatin accessibility.
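The contrast between a strict AND-gate password and a tolerant one can be sketched in Python. This is an illustrative model only: the function names, the use of edit distance as the similarity measure, and the tolerance `max_edits = 2` are all assumptions, not a claim about how cells or bouncers actually score a match.

```python
def strict_password(entered: str, expected: str) -> bool:
    """Logical AND over positions: every character must match."""
    return len(entered) == len(expected) and all(
        a == b for a, b in zip(entered, expected)
    )


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def tolerant_password(entered: str, expected: str, max_edits: int = 2) -> bool:
    """Accept near matches; `max_edits` is a hypothetical tolerance."""
    return edit_distance(entered, expected) <= max_edits
```

Under these assumptions "more these" is rejected by the strict check but accepted by the tolerant one, since it differs from "more cheese" by one substitution and one deletion. A phonetic match such as "Moishe's" would need a different similarity measure than character edits.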
Organization of chromatin into highly compact, inaccessible regions, and open, accessible regions appears on its face to be a form of cellular information security because some genes are reprogramming.

Do cells encrypt information?
If cell signaling networks use encryption, how might we know? Put another way, if we do not know the underlying language, i.e., the unencrypted information, how can we recognize encrypted information? To explore this question, several concepts from information theory are useful. The Shannon entropy is defined as 55 :

$$H(X) = -\sum_{i} p(x_i) \log_2 p(x_i) \qquad (1)$$

where H is the entropy in bits, defined as the expected information of a random variable X whose outcomes x_i occur with probability p(x_i). The entropy can be thought of as how unpredictable the next character in a transmitted message is. A message that is purely random characters, and therefore not meaningful language, will have the highest entropy 55 . Considering only the 26 letters in the English alphabet, the maximum entropy is log_2(26) ≈ 4.7 bits. Shannon analyzed words of size N up to 8 letters and found the entropy of the English language to be roughly 2.3 bits per letter, about a 50% reduction relative to random text 56 . This reduction reflects redundancy: for example, the English alphabet could replace the letter c with either k or s without any meaningful effects. Moreover, English text can be re-coded and stored in smaller file sizes without loss of information (lossless compression) using sophisticated algorithms 55 . Entropy provides a limit on lossless compression 55 .
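The entropy calculation described above can be sketched in a few lines of Python. This is a minimal plug-in estimate from observed letter counts, not the estimator used in the cited work; the function name `entropy_bits` and the sample sentence are illustrative assumptions.

```python
import math
from collections import Counter


def entropy_bits(symbols) -> float:
    """Plug-in Shannon entropy (bits per symbol) of an observed sequence."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Upper bound for a 26-letter alphabet: all letters equally likely.
max_entropy = math.log2(26)

# Illustrative input only; a short sample gives a rough estimate at best.
sample = "it is a truth universally acknowledged"
letters = [ch for ch in sample if ch.isalpha()]
sample_entropy = entropy_bits(letters)
```

Single-letter frequencies capture only part of the redundancy of English. The lower per-letter figure quoted above comes from conditioning on preceding letters, which this sketch does not attempt.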
A related concept to entropy is Zipf's law, which states that a word's probability is inversely proportional to its rank and has been found in English language phrases, and also other fields, e.g., city sizes, firm sizes, and neural activity 57 :

$$p(r) \propto \frac{1}{r} \qquad (2)$$

where p(r) is the probability of the word with frequency rank r.
A large number of explanations has been proposed for why Zipf's law exists, which are reviewed by Piantadosi 58 . Purely random texts do not follow Zipf's law 59 . Salge and colleagues found that Zipf's law emerges through minimization of communication inefficiency and direct signal cost 60 .
Williams and colleagues found that Zipf's law held more generally for phrases in English than words, which is intriguing because phrases are "the most coherent units of meaning in language 61 ." Language has additional structure that can be captured through analysis of pairwise and higher-order interactions 62 . One measure of association is mutual information 6 . It can be defined between two variables X and Y, e.g., adjacent letters in English text, as

$$MI(X,Y) = H(X) + H(Y) - H(X,Y) \qquad (3)$$

where H(X,Y) is the joint entropy of X and Y, which is defined as

$$H(X,Y) = -\sum_{i,j} p(x_i, y_j) \log_2 p(x_i, y_j) \qquad (4)$$

When X and Y are statistically independent, H(X,Y) = H(X) + H(Y) and the mutual information is zero. The more strongly X and Y depend on each other, the lower the joint entropy H(X,Y) and the higher the mutual information.
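As a rough illustration of mutual information between adjacent letters, the following Python sketch estimates it from pair counts using the identity MI = H(X) + H(Y) - H(X,Y). The function names are hypothetical, and the plug-in estimate is biased upward on short texts, so it should be read as a sketch rather than a careful estimator.

```python
import math
from collections import Counter


def entropy_bits(counts: Counter) -> float:
    """Shannon entropy in bits of the distribution implied by `counts`."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def adjacent_mutual_information(text: str) -> float:
    """MI (bits) between each symbol X and the symbol Y that follows it."""
    pairs = list(zip(text, text[1:]))
    h_x = entropy_bits(Counter(x for x, _ in pairs))   # first symbol of each pair
    h_y = entropy_bits(Counter(y for _, y in pairs))   # second symbol of each pair
    h_xy = entropy_bits(Counter(pairs))                # joint entropy of the pair
    return h_x + h_y - h_xy
```

A strictly alternating string such as "abababab" gives a high value because each letter determines the next, whereas "aabba", in which every ordered pair of the two letters occurs exactly once, gives zero.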
Doyle and colleagues describe the search for extraterrestrial intelligence (SETI) as fundamentally applying Zipf's law and higher-order information-entropic filters to received sources of electromagnetic radiation 63 . Cell signaling and gene expression have been shown to pass both of these non-randomness filters 6,64 . These non-randomness filters can also be applied to any sort of data stream to check whether it is non-random.
If a simple substitution cipher is applied to an unknown language, the frequency distributions of letters, words, and phrases do not change, and therefore, given enough text, the stream would be recognizable as language, although perhaps untranslatable. For a more complex cipher, e.g., a polyalphabetic cipher, the entropy will increase and frequency distributions will deviate from Zipf's law. In other words, if SETI receives a long stream of an alien communication that is encrypted by relatively simple methods, its non-randomness filters should recognize it as a language. If the alien language is encrypted with a polyalphabetic cipher and is subsequently decrypted, the recovered plaintext would have lower, but non-trivial, entropy than the ciphertext.
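The claim that a simple substitution cipher leaves frequency distributions unchanged can be checked with a short Python sketch. The substitution key (the alphabet rotated by 3) and the sample sentence are assumptions chosen for illustration; any one-to-one relabeling of the alphabet would behave the same way.

```python
import string
from collections import Counter

LOWER = string.ascii_lowercase
# Hypothetical substitution key: the alphabet rotated by 3 positions.
KEY = LOWER[3:] + LOWER[:3]
SUBSTITUTE = str.maketrans(LOWER, KEY)


def rank_frequency(text: str) -> list:
    """Letter counts sorted from most to least frequent (rank-frequency profile)."""
    return sorted(Counter(ch for ch in text if ch in LOWER).values(), reverse=True)


message = "the rain in spain stays mainly in the plain"  # illustrative text only
encoded = message.translate(SUBSTITUTE)

# The letters are relabeled, but the rank-frequency profile is identical.
same_profile = rank_frequency(message) == rank_frequency(encoded)
```

Because the profile is preserved, any statistic that depends only on the counts, including the per-letter entropy, is the same for the plaintext and the ciphertext.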
A quantitative test for whether a text is encrypted is whether there is a decryption, such that:

$$\exists\, d \in D \ \text{such that} \ MI(E) = \max \ \text{and} \ H(E) = \min \qquad (5)$$

where d is a decryption out of the set of all possible decryptions D, E is the decrypted plaintext produced by d, MI is the mutual information in the decrypted plaintext, e.g., the mutual information in adjacent letters, and H is the entropy of the decrypted plaintext, e.g., per letter. In other words, a signal stream is encrypted if a decryption can be found, such that the entropy is minimized and the mutual information is maximized.
To demonstrate this, I provide a simple numerical example. The text of Jane Austen's novel Pride and Prejudice was downloaded from Project Gutenberg 65 , processed and cleaned of special characters in the R programming language using the textclean package 66 , and encrypted with a simple polyalphabetic substitution cipher using shifts of 0, +1, +2. Figure 1A shows the frequency distributions of adjacent letters in the plaintext. Figure 1B shows how the frequency distributions of adjacent letters in the encrypted text result in an increase in entropy. The entropy R package was used to compute entropy per letter and mutual information for adjacent letters 67 . Figure 1C shows how applying varying levels of decryption using several different methods results in changing entropy per letter and mutual information of adjacent letters. As the text is decrypted more completely, the entropy per letter decreases and the mutual information per pair of adjacent letters increases.
Complete decryption produces a maximum of this mutual information and a minimum of entropy.
Therefore, we can begin to look for patterns that may involve encryption in very rich data of cell signaling by applying this quantitative criterion.
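The numerical experiment above can be approximated with the following Python sketch, which is a stand-in for, not a reproduction of, the R workflow described in the text. Several details are assumptions: the shifts 0, +1, +2 are applied cyclically by letter position, the text is reduced to lowercase letters with spaces removed, and the candidate keys stand for "varying levels of decryption". All function names are hypothetical.

```python
import math
import string
from collections import Counter

LOWER = string.ascii_lowercase
SHIFTS = (0, 1, 2)  # assumed reading of the 0,+1,+2 key: cycle by position


def poly_shift(text: str, shifts=SHIFTS, sign: int = 1) -> str:
    """Shift the i-th letter by sign * shifts[i % len(shifts)], modulo 26."""
    return "".join(
        LOWER[(LOWER.index(ch) + sign * shifts[i % len(shifts)]) % 26]
        for i, ch in enumerate(text)
    )


def entropy_bits(items) -> float:
    """Plug-in Shannon entropy in bits of the observed items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def adjacent_mi(text: str) -> float:
    """Mutual information (bits) between adjacent letters: H(X)+H(Y)-H(X,Y)."""
    pairs = list(zip(text, text[1:]))
    return (entropy_bits(x for x, _ in pairs)
            + entropy_bits(y for _, y in pairs)
            - entropy_bits(pairs))


def score_candidates(ciphertext: str, candidates):
    """Return (key, entropy per letter, adjacent-letter MI) for each trial key."""
    rows = []
    for key in candidates:
        trial = poly_shift(ciphertext, key, sign=-1)
        rows.append((key, entropy_bits(trial), adjacent_mi(trial)))
    return rows


raw = "It is a truth universally acknowledged"  # illustrative sample only
plaintext = "".join(ch for ch in raw.lower() if ch in LOWER)
ciphertext = poly_shift(plaintext)
# No decryption, a partial key, and the full key.
rows = score_candidates(ciphertext, [(0, 0, 0), (0, 1, 0), (0, 1, 2)])
```

On a sample this short the scores are noisy, so the direction of change is not guaranteed; the trend reported in the text is for the full novel. With a long text, the criterion would select the candidate key whose trial decryption has the lowest entropy per letter and the highest adjacent-letter mutual information.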

Conclusions and open questions
Evolutionary potential is vast, and a complex interplay among environmental change, ecosystems, speciation, niche diversification, extinctions, and innovation has shaped life on earth 68,69 .
Considering how rapidly passwords and encryption evolved in human telecommunications, it is natural to ask whether they are used in nature by cells. This theoretical exploration suggests that cells may use passwords to lock in cell state, which must be unlocked through the right