TY - JOUR
T1 - Not All Experimental Questions Are Created Equal: Accelerating Biological Data to Knowledge Transformation (BD2K) via Science Informatics, Active Learning and Artificial Intelligence
JF - bioRxiv
DO - 10.1101/155150
SP - 155150
AU - Simon Kasif
AU - Stan Letovsky
AU - Richard J. Roberts
AU - Martin Steffen
Y1 - 2017/01/01
UR - http://biorxiv.org/content/early/2017/06/25/155150.abstract
N2 - Pablo Picasso, when first told about computers, famously quipped, “Computers are useless. They can only give you answers.” Indeed, the majority of effort in the first half-century of computational research has focused on methods for producing answers. Incredible progress has been achieved in computational modeling, simulation and optimization, across domains as diverse as astrophysics, climate studies, biomedicine, architecture, and chess. However, the use of computers to pose new questions, or to prioritize existing ones, has thus far been quite limited. Picasso’s comment highlights the point that good questions can sometimes be more elusive than good answers. The history of science offers numerous examples of the impact of good questions. Paul Erdős, the wandering monk of mathematical graph theory, offered small prizes to anyone who could prove conjectures he identified as important (1). The prize amounts varied with the perceived difficulty of the problem Erdős posed. Posing technical questions and allocating resources to answer them has taken on a new guise in the Internet age. The X-Prize Foundation (http://www.xprize.org/) offers multi-million-dollar bounties for grand technological goals, including genome sequencing and space exploration. Several companies provide portals where customers can place cash bounties on educational, scientific or technological challenges, while potential problem solvers compete to produce the best solutions. Amazon’s Mechanical Turk site (https://www.mturk.com/mturk/welcome) links people requesting the performance of intellectual tasks with people willing to work on them for a fee. Such crowd-sourcing systems create markets of questions and answers, and can help allocate resources and capabilities efficiently. This paradigm suggests a number of interesting questions for scientific research. In a resource-limited environment, can funds and research capacity be allocated more efficiently? Can knowledge demand provide an alternative or complementary mechanism to traditional investigator-initiated research grants? The fathers of Artificial Intelligence (AI), Herbert Simon in particular, envisioned the application of AI to scientific discovery in various forms and styles (focusing on physics). We follow up on these early dreams and describe a novel approach aimed at remodeling the biomedical research infrastructure and catalyzing gene function determination. We aim to start a bold discussion of new ideas for increasing the efficiency of research-capacity allocation, improving reproducibility and provenance tracking, removing redundancy, and maximizing the knowledge gained with each experiment. In particular, we describe a tractable computational framework and infrastructure that can help researchers assess the potential information gain of millions of experiments before conducting them. The utility of an experiment is modeled as the predictive knowledge (formalized as information) to be gained by performing it. The experimentalist would then be empowered to select experiments that maximize information gain if they wish, recognizing that other considerations, such as a specific technological or medical utility, may frequently override that priority. The conceptual approach we develop is general; here we apply it to the study of gene function.
ER -