Dissecting Complexity: The Hidden Impact of Application Parameters on Bioinformatics Research

Biology is a quest; an ongoing inquiry about the nature of life. How do the different forms of life interact? What makes up an ecosystem? How does a tiny bacterium work? To answer these questions, biologists turn increasingly to sophisticated computational tools. Many of these tools are highly configurable, allowing customization in support of a wide range of uses. For example, algorithms can be tuned for precision, efficiency, type of inquiry, or for specific categories of organisms or their component subsystems. Ideally, configurability provides useful flexibility. However, the complex landscape of configurability may be fraught with pitfalls. This paper examines that landscape in bioinformatics tools. We propose a methodology, SOMATA, to facilitate systematic exploration of the vast choice of application parameters, and apply it to three different tools on a range of scientific inquiries. We further argue that the tools themselves are complex ecosystems. If biologists explore these, ask questions, and experiment just as they do with their biological counterparts, they will benefit both by finding improved solutions to their problems and by increasing repeatability and transparency. We end with a call to the community for an increase in shared responsibility and communication between tool developers and the biologists who use them, in the context of complex system decomposition.

This manuscript has been co-authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (https://energy.gov/downloads/doe-public-access-plan).

Scientific investigation in biology has become increasingly dependent on computational discovery. Modern research workflows rely on various evolving bioinformatics software applications, which has led to a sophisticated computational ecosystem that accommodates increasingly diverse and integrated data sets along with their associated experimental probes, as well as comprehensive conceptual models of scientific understanding. As an example, consider an investigation of the mechanisms promoting growth in plant and microbe interactions as described in Finkel et al. [1]. They found that a trait, such as the inhibition of root growth, can be significantly reversed by interaction with a single operon found in the bacteria. Each of these tools may be considered almost as complex as the biological subsystems being explored.

The above example illustrates a single software workflow. Different research questions will require different workflows with different application parameters. These application parameters (or settings) can be coefficients or rate constants for biological processes, or they can be parameters to algorithms. This has led to the inclusion of many parameters in bioinformatics tools.
We use the term configuration to refer to a specific set of chosen parameters. For instance, BLAST (Basic Local Alignment Search Tool) [4], one of the most commonly used tools to study DNA and protein sequences, has approximately 50 parameters a user can modify and customize. Changing these can impact core functionality, such as increasing the string match length or modifying the output format. Users can simply leave the default settings, or modify a few settings based on their laboratory norms, but this practice may keep the full power of these tools from being leveraged, with less than optimal results or missed observations.

However, modifying parameters without a full understanding of their purpose can also be problematic. For example, in a letter to the editor of Bioinformatics, Shah et al. [5] discuss a misunderstanding of a commonly used BLAST parameter (max_target_seqs) by another researcher. The BLAST documentation's definition of max_target_seqs [6] notes that ties are broken by order of sequences in the database. Instead of this parameter filtering at the end of the search, as is commonly assumed, it actually filters results during the algorithm's iterative search, so it may not be clear to a user how it impacts their results. The authors of the referenced paper stated their explicit reason for limiting max_target_seqs to a value of 1, under the assumption that it filtered out only the best (top) result at the end of the BLAST search. However, they misunderstood the parameter's purpose. In a response from the NCBI team [7], they acknowledge that Shah discovered a bug (which they fixed) and that they have updated their documentation. Follow-on work by González-Pech et al. [8] points to a number of misunderstood parameters of the tool.

In addition, several studies have examined how users interact with common bioinformatics tools. Morrison-Smith et al.
noted that users are unsure of how to select the correct parameters and have to trust a tool's correctness without real confidence [9]. For example, participants said "At the end of the day, you just have faith that the program's working or not." Others have examined how changing parameters impacts system performance [10] or correctness [11]; however, there is a lack of research on how to train scientists to efficiently leverage the potential of parameters. We argue that if scientific software is viewed as a magic box, the community loses scientific potential. Instead, the bioscience user community needs to peel away the covers and develop techniques that explore the impact these parameters have on their results, managing them using a controlled and scientific approach. Reading documentation alone may not be sufficient.

Without a set of systematic techniques, it can be difficult to manage this software complexity. Since scientists use these tools to help form the basis of their conclusions, a lack of understanding of how parameters change the outcome of the results is both a missed opportunity and potentially problematic. In fact, it has already led to publication retractions [12,13]. Biologists, hence, often develop a laboratory-specific routine protocol of what to change, leaving the rest of the options alone. In this article we present a systematic and exploratory protocol (SOMATA) for working with scientific software. We also argue for more interaction between tool developers and end-users to build a shared body of knowledge. This will aid the advancement of computational methods to match the rigor that already exists in the documentation of lab work.

In the following sections, we first present an example and approach for working with bioinformatics software that matches the rigor and practice that already exists in the documentation and execution of laboratory work. Then we discuss background on software configurability and the bioinformatics tools used in this work. Next, we propose a systematic protocol (SOMATA) for breaking down the complexity of software tools. We then apply this protocol in a series of experimental studies, with one study focused on empirically measuring the effect of using different configurations (sets of chosen parameters when using a tool), and a second study demonstrating an approach to exploring the parameter space. Finally, we end with a discussion mapping out three key takeaways.

Systems View - A Motivating Example

A systems view in the biological context is a useful approach to study complex and potentially interacting systems. In this paper we take a systems view of software itself. Instead of using software as a single tool, we demonstrate how users can approach software with the same mindset as the organisms they study: as a complex system that can be optimized based on one's environment and goals. We demonstrate this idea with an example showing how application parameters can change how a researcher perceives and interacts with bioinformatics tools while conducting a common experimental scenario. In this scenario we are trying to understand how different chemical compounds in a growth media change the metabolic pathways utilized in Escherichia coli (E. coli). For our growth medium we use the minimal media formulation Carbon-D-Glucose.

Before running extensive laboratory experiments, we might want to exploit a common computational method simulating an organism's growth using a genome-scale metabolic network [14]. We can run a Flux Balance Analysis (FBA), which calculates the flow of metabolites through the metabolic network in a specific growth media to optimize for growth (measured by flux through the biomass reaction) using a linear programming algorithm. The FBA tool used here (further discussed in the experimental study methodology) provides input options such as the FBA model and the media, as well as algorithmic options such as the reaction to maximize, custom flux bounds, expression thresholds, and maximum uptakes of different nutrient sources. One output of this tool is the objective value (OV), which represents the maximum flow through the biomass reaction as a measurement of growth [15]. If we do not manually change any of the parameters (i.e., run the default configuration), E. coli produces an OV of 0.164692 mmol/g CDW hr.

For the purpose of this example we will explore the effect of the max oxygen uptake parameter, which is described in the documentation as the "maximum number of moles of oxygen permitted for uptake (default uptake rates varies from 0 to 100 for all nutrients)". This parameter has no fixed default value since the maximum is constrained by the media input. We will change this to a value of 0 to see what the behavior is. Under this new configuration, E. coli no longer grows (an OV of 0 is returned). Next we increase this value in increments of 10 to observe how the OV changes. Results for values from 0 to 100 can be seen in Table 1. We see the OV increases linearly until it reaches the maximum (also the default growth) of 0.164692 when max oxygen is set to 40. If we set this value past 40, there is no added effect. We will explore why this is in a later study.

Some reactions carry flux on both sides, resulting in a net of zero. For example, reaction 1, below, shows the conversion of glucose to glucose-6-phosphate, a reaction with positive flux, from glycolysis. We can observe these changes to the reactions and corresponding pathways directly in the KEGG pathway maps. Green represents a positive net flux, red is a negative net flux, and blue means the enzyme is present but the net flux is zero. The shade of the color represents the magnitude of the flux (a darker color represents a larger value).
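The single-parameter sweep described above can be sketched in a few lines. The FBA run here is a stub, a hypothetical function that only reproduces the qualitative shape reported in Table 1 (a linear rise that saturates at the default growth of 0.164692 once max oxygen reaches 40); it illustrates the sweep pattern, not the real tool's API.

```python
def sweep(run_tool, param_values):
    """Run a tool once per candidate value of a single parameter and
    record the objective value (OV) for each setting."""
    return {v: run_tool(v) for v in param_values}

def fba_stub(max_oxygen):
    """Stand-in for the real FBA run, mimicking the shape reported for
    E. coli: OV rises linearly, then saturates at 0.164692 at 40."""
    return round(min(max_oxygen / 40, 1.0) * 0.164692, 6)

results = sweep(fba_stub, range(0, 101, 10))
assert results[0] == 0.0            # no oxygen, no growth
assert results[40] == 0.164692      # default growth reached
assert results[100] == results[40]  # no added effect past 40
```

The same `sweep` helper works unchanged for any callable that wraps a real tool invocation.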

These configuration changes can be considered a "state change" to the system.

By approaching this software tool with a systems view, we have demonstrated that not only can the ultimate objective change based on the parameters used (e.g., the objective value), but the internal biological state (the direction and magnitude of the internal metabolic reactions) can change in significant ways as well. In the rest of this paper we demonstrate via case studies how using a systems mindset allows the scientist to explore and understand dependencies between application parameters and the final scientific result. This improved understanding can build confidence and lead to better science.

Next we present some background on configurability to lay the groundwork for our case studies. In a web browser such as Firefox, the homepage, the display of the toolbar, and whether to enable JavaScript (or not) are all options that can be customized. Current versions of Firefox have more than 2,000 different options (e.g., parameters) a user can manipulate [16,17]. Configurability benefits developers by allowing them to create software for a broad audience through building on reusable components, of which the cost of development can be amortized over the full software development lifecycle [18]. It has also been demonstrated to lead to higher quality of individual components, since these are tested and used in multiple systems [18]. This type of development stands in contrast to building many unique specialized programs, each with individual code and only slight variations in behavior. From a user's perspective, this would require them to switch between different software tools as their goals change.

A configuration of a software system is the set of choices, one for each of the system's parameters (also known as configuration options or settings). For instance, in Firefox a configuration would be defined by the set of choices for all 2,000+ options. Most will have a default value that is assigned when the application starts, hence the user rarely changes more than a few of these on their own; in theory any combination of parameters can be used together. Hence, the complete configuration space of a configurable system is the set of all possible unique sets of parameter values (or the Cartesian product of the options for the individual values). In the simplest case, where all parameters are binary (turn the option on/off), the full configuration space would be 2^|p|, where |p| represents the number of parameters of the system. Returning to the Firefox example, assuming 2,000 parameters, this would be 2^2000.
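The size of a configuration space follows directly from the Cartesian-product definition above; a quick sketch makes the scale of a Firefox-like system concrete:

```python
from math import prod

def config_space_size(choices_per_param):
    """Number of unique configurations: the product of the number of
    choices for each parameter (the Cartesian product's cardinality)."""
    return prod(choices_per_param)

# A system with only binary (on/off) parameters grows as 2**|p|.
assert config_space_size([2] * 10) == 2 ** 10  # 1,024 configurations

# Firefox-scale: 2,000 binary options is an astronomically large space.
firefox_like = config_space_size([2] * 2000)
print(len(str(firefox_like)))  # 2**2000 has 603 decimal digits
```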

By design, changing parameters modifies how an application behaves. The possible space of behavior of a configurable software system is usually infeasible to exhaustively enumerate [11,16,20]. The range of system behaviors can lead to unexpected failures [19]. For example, one parameter change might increase some data that is being cached. When the new data is cached, it might overflow the internal structures for caching, which have now been reduced in size. There has been a large body of research that has identified efficient ways to sample a configurable software system's configuration space and discover a large portion of the possible behavior [16,21,22]. We do not attempt to survey that literature here.

In this paper we focus on configurable software in the bioinformatics domain. We consider a software parameter (herein shortened to parameter) to be an option that can be specified by the user and then used by the program or application when it is running. As noted previously, users have highlighted difficulties understanding parameters, and this has even led to retracted articles [12,13]. For a concrete biological example of a parameter, consider the graphical user interface of NCBI's BLASTn implementation (Figure 3). This interface has more than 15 parameters for a user to adjust. In fact, BLASTn has over 50 parameters that can be modified; not all are exposed in the NCBI GUI. Some of the parameters are in the form of a drop-down menu or a set of options, such as Max target sequences and gap costs. Others are check boxes or binary parameters, such as short queries and Mask lower case letters. Others are open text fields, such as Expect threshold, which can be set to a numeric value. Any parameter that is not explicitly defined will use the default value as determined by the program. While many laboratories have their own custom set of parameter options they regularly choose, there is no single configuration that is best for all research questions; hence, understanding ways to navigate and explore these is an important part of scientific discovery.

A common bioinformatics workflow, as shown in Figure 4, starts with read files output from a DNA sequencer that are then assembled (1) into reconstructions of the underlying genomes. The term functional objective is used here to refer to the biological output of the methods relative to the user's biological query.

The read file output from a DNA sequencer contains a set of short sequences of base-pairs. Each of the short sequences is called a read, and each is an exact copy of a short segment of the DNA in the original sample. Depending on the sequencing technology and methods, each read ideally overlaps with many others. Assembly is the computational method to align and recombine reads to reconstruct the original genome sequence. When this is not fully accomplished due to sequencing error or lack of sufficient coverage, and especially with metagenomic samples containing genomes of many different species, the result is a set of assembled contigs: longer contiguous sequences reflecting the underlying genomes with hopefully high fidelity. A common functional objective of DNA assembly is to obtain a few long contigs, ideally a complete one, per species. For metagenomics there can be a trade-off between completeness of individual species' genomes and a more comprehensive representation of most species in the sample [23,24]. Here we will focus on and optimize for the first.

DNA alignment compares an unknown sequence against a database of known sequences [25]. The unknown sequence is often referred to as the query sequence. The alignment algorithm compares the query with all other sequences in the database, and calculates the significance of each match. We investigate a common use case: searching for the most similar known sequences in order to infer the likely evolutionary source or biological function of the unknown query sequence. The matches or hits found in the database for the query sequence are returned with a variety of quality scores that indicate how confident the alignment is with respect to each possible match. Two common metrics are the percentage identity (how identical two sequences are) and the expected value (or e-value), which is a measure of how likely the hits would be found in the database by chance.
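As an illustration of how such quality scores might be filtered, the sketch below counts hits that pass thresholds on percentage identity and e-value. The record keys (`pct_identity`, `evalue`) are hypothetical; real BLAST tabular output would first need parsing into this shape.

```python
def count_quality_hits(hits, min_pct_identity=100.0, max_evalue=0.0):
    """Count alignment hits meeting a quality definition based on
    percentage identity and e-value thresholds."""
    return sum(
        1 for h in hits
        if h["pct_identity"] >= min_pct_identity and h["evalue"] <= max_evalue
    )

hits = [
    {"pct_identity": 100.0, "evalue": 0.0},    # perfect match
    {"pct_identity": 98.7,  "evalue": 1e-30},  # close, but not identical
    {"pct_identity": 100.0, "evalue": 0.0},    # perfect match
]
assert count_quality_hits(hits) == 2  # strict definition: only perfect matches
```

Relaxing the thresholds (e.g., identity at 98% and e-value at 1e-20) would admit the middle hit as well, which is how the definition of "quality" can vary by use case.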
The functional objective of DNA alignment can vary by use case. We define one such objective that covers a range of use cases as the "number of quality hits", where the exact definition of quality may vary by use case, but typically involves a combination of several of the quality metrics, such as percentage identity and e-value.

We propose a process, SOMATA, a core methodology to analyze any complex software system and systematically break it down. SOMATA involves Selecting tools and data, identifying Objective metrics, Modeling the parameter space, choosing a sample design Approach, Testing, and Analyzing. SOMATA includes techniques from the field of software testing as its foundation. The details of each step include the following:

• Objective metric: the metric to evaluate is an effect or output of the tool. The metric should be related to the user's objective of interest. This could be a functional output, such as a measurement associated with growth as in our motivating example, or a non-functional output such as runtime. Several metrics may be selected depending on what is important in a particular use case. In our example the objective metric was the OV (a measurement of growth).

• Analyze: once complete, observe the effect of changing the model parameters given by Step 3, and conduct deeper analysis if desired. Given the results, it may be beneficial to return to a previous step and alter the selections to gain further insight. For our example we examine the differences in OVs of growth and the impact on various associated pathways.

We want to emphasize the highly customizable nature of this process. For instance, in Step 4 there are several sampling design approaches that can be used. We can use techniques from software testing and design of experiments, or simply use a random method of investigation. In Step 5, simulation can replace experimentation.
Finally, in Step 6 we could add additional methods of analysis, such as tools from machine learning. We follow this methodology in our Experimental Evaluation.
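The SOMATA loop can be sketched as a small driver function. All names here are illustrative, not any real tool's API: the Select and Objective steps are the `tool` and `metric` arguments, and the function covers Model, Approach, Test, and Analyze.

```python
import itertools
import random

def somata(tool, metric, model, approach="exhaustive", n_samples=100, seed=0):
    """Minimal sketch of the SOMATA loop.

    tool:   callable taking a configuration dict, returning an output
    metric: callable scoring that output (higher is better)
    model:  dict mapping parameter name -> list of candidate values
    """
    names = list(model)
    if approach == "exhaustive":
        combos = itertools.product(*(model[n] for n in names))
    else:  # random sampling for spaces too large to enumerate
        rng = random.Random(seed)
        combos = ([rng.choice(model[n]) for n in names] for _ in range(n_samples))
    results = []
    for combo in combos:
        cfg = dict(zip(names, combo))             # Test: run one configuration
        results.append((cfg, metric(tool(cfg))))  # Objective: score its output
    return max(results, key=lambda r: r[1])       # Analyze: best config found

def growth_tool(cfg):
    # Toy stand-in for an FBA run: growth saturates once max_oxygen hits 40.
    return min(cfg["max_oxygen"] / 40, 1.0)

best_cfg, best_score = somata(growth_tool, lambda out: out,
                              {"max_oxygen": list(range(0, 101, 10))})
```

With these toy inputs, the first configuration reaching the maximum score is `max_oxygen = 40`, mirroring the motivating example.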

This process does not in itself contain any unprecedented elements individually.

We further split Experiment 2 into two studies (Experiment 2a and Experiment 2b). We follow up our study by sharing several lessons learned during this work that can lead to a set of best practices for users of highly configurable scientific software tools.

In Experiment 1 we leverage prior work which explored a set of four bioinformatics tools [27]. That work demonstrated the effect of parameters on various objectives and identified software failures, but did not analyze the impact on the scientific outcome in depth. In this work, we provide an alternative analysis of the data and focus on the extent of the changes in the results, and whether they conform to scientific expectations.

We use an exemplar tool for each of the three bioinformatics applications described in the Background. For one of these applications we use two different versions of a tool, resulting in four subjects in total. Criteria for choosing the tool subjects include: (1) the tool is frequently used in the community, (2) it has a set of modifiable parameters, (3) it can be run using automated scripts, and (4) it includes a tutorial (or sample data) and documentation. We explicitly use data from others to avoid biasing our findings and community dynamics [28].

The exemplar DNA alignment subject is Nucleotide BLAST (Basic Local Alignment Search Tool, BLASTn), a popular bioinformatics tool developed by the National Center for Biotechnology Information (NCBI) [4,29]. We use the command line implementation of nucleotide BLAST (BLASTn), referred to simply as BLAST from here on [30]. The second subject is a DNA assembly tool. We chose the MEGAHIT algorithm [31,32], and use a version implemented within KBase (version 2.2.8) [33].

The last two subjects perform Flux Balance Analysis (FBA). The first is the graphical user interface (GUI) app, Run Flux Balance Analysis [34], from within KBase; the second is a command line version, referred to as FBA-CLI.

The methodology here follows the experimental approach previously outlined, SOMATA: Select tool and data, choose Objective metric, Model parameter space, choose sample design Approach, Test, and Analyze. Starting with step 1, we Select the tools to be evaluated; we chose four subjects: BLAST, FBA-GUI, FBA-CLI, and MEGAHIT.

In step 2 we define a set of functional outputs as Objective metrics for each tool. For BLAST we use the number of quality hits, using a strict definition of 100% identity and an e-value of 0.0. This is a common measurement used to determine how well the query has performed. For FBA-GUI and FBA-CLI we use an objective metric that represents growth or biomass, called the objective value. For MEGAHIT we selected four different metrics. The first two metrics count the number of contigs after a filter is applied. The metric gt1kbp is the number of contigs of length greater than one thousand base pairs, and the metric gt10kbp is the count greater than ten thousand base pairs.

Metric N50 is the N50 score provided by MEGAHIT, which is a weighted median statistic on the contig length. The last metric, Max-bp, is the length of the longest contig found. Note that these metrics can be adjusted to suit multiple use cases. In this work we are simply restricting that choice.
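N50 is a standard assembly statistic, so a minimal reference implementation may help make the "weighted median" description concrete: it is the contig length at which the cumulative sum of lengths, taken from longest to shortest, first reaches half of the total assembly size.

```python
def n50(contig_lengths):
    """N50: the length of the contig at which the running total of
    descending-sorted lengths first reaches half the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    total = 0
    for length in lengths:
        total += length
        if total >= half:
            return length
    return 0

# Toy assembly: total = 100, half = 50; cumulative 40, then 70 -> N50 = 30.
assert n50([40, 30, 15, 10, 5]) == 30
```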

Next, in step 3, we define a Model of the parameter space for each subject tool by selecting a sample of its parameters to test, as seen in Table 3. For each tool we show the number of parameters, and for each parameter we list the type (i.e., Boolean, integer, float, or string) and state the number of choices used. For instance, we have two choices for Boolean parameters, three for strings, and five each for integers or floats. As an example, if we examine BLAST we can see the model has three Boolean, three float, and one string parameter, for a total of seven parameters. This results in a combinatorial parameter space of 2^3 × 5^3 × 3, or 3,000 different configurations. Please refer to our supplementary data for the exact parameters tested.

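A model of this shape can be enumerated directly with a Cartesian product. The parameter names and values below are hypothetical placeholders that only mirror the counts described for BLAST in Table 3 (three Booleans, three floats, one string); the exact parameters are in the supplementary data.

```python
import itertools

# Hypothetical parameter model mirroring the BLAST counts in Table 3.
model = {
    "bool_a": [True, False],
    "bool_b": [True, False],
    "bool_c": [True, False],
    "float_a": [0.1, 0.5, 1.0, 5.0, 10.0],
    "float_b": [0.1, 0.5, 1.0, 5.0, 10.0],
    "float_c": [0.1, 0.5, 1.0, 5.0, 10.0],
    "str_a": ["low", "medium", "high"],
}

# Every unique combination of one choice per parameter.
configs = [dict(zip(model, combo)) for combo in itertools.product(*model.values())]
assert len(configs) == 2**3 * 5**3 * 3  # 3,000 configurations
```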
In step 4 we design an Approach for sampling the parameter space. For BLAST, MEGAHIT, and FBA-GUI we exhaustively test all combinations of their configuration options. For FBA-CLI the total configuration space is 1.28 × 10^16, which is too many to exhaustively test; thus we took a random sample of 125,000 configurations. Note that the configuration space for MEGAHIT of size 260 has been reduced from the full space of 625 due to constraints between parameters. Please refer to our supplementary data for more information.

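For a space as large as FBA-CLI's, configurations must be drawn without ever materializing the full Cartesian product. A sketch of that sampling approach, with a hypothetical 16-parameter model standing in for the real one:

```python
import random

def random_configs(model, n, seed=0):
    """Draw n configurations uniformly at random, one independent
    choice per parameter, without enumerating the full space."""
    rng = random.Random(seed)
    for _ in range(n):
        yield {name: rng.choice(values) for name, values in model.items()}

# Hypothetical model: 16 parameters with 10 choices each -> 10**16 combos,
# comparable in scale to the FBA-CLI space, far too many to list.
model = {f"p{i}": list(range(10)) for i in range(16)}
sample = list(random_configs(model, 1000))
```

A fixed seed keeps the sample reproducible, which matters when the configuration list itself becomes part of the published methodology.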
To measure the impact of changing parameters, we measure several data points to observe the number of configurations that change the output (for better or for worse). We can also look at the entire configuration space given our input data and research objective, and ask how hard it is to find a better, or even the best, result compared to the default value.

After executing the experiment (Test in step 5), Table 4 presents the results for our Analysis of the effects in step 6. Each row represents one subject tool and objective metric combination. Table 4a reports the number of configurations tested, the number of unique output metrics, and a summary of the range of output metrics (default, best, worst, and most common). This demonstrates the impact on the raw objective metric, showing the effect the configuration space has on the scientific output. Table 4b breaks down the percentage variation to give an idea of the landscape of the outputs.

In Table 4a we can see the number of unique objective values is low compared with the total number of configurations tested in most subjects, suggesting random sampling may be very helpful. However, in MEGAHIT, for the N50 and gt1kbp metrics, we see about half of the configurations leading to unique objective values. This may suggest we need to create larger samples to fully explore all potential behaviors of those objectives. Looking at the range of the objective metrics (last four columns), we can see in some cases there is a large increase in the result comparing the default output to the best (BLAST, FBA-CLI, MEGAHIT gt1kbp). Other subjects show a smaller range. However, all subjects have a substantially lower worst metric, and all except BLAST have an output of zero as their most common output metric.

In Table 4b we can see that the landscape of the distribution of the scores varies greatly by subject and objective. For BLAST, 72% of the configurations improve over the default. For MEGAHIT, the odds of improving over the default range from 2.69% (for the N50 score) to 66.15% for the number of contigs greater than 1,000 base-pairs. The odds of finding the optimal value for any MEGAHIT metric are low, ranging from 0.38% to 1.54%. We see that the number of configurations that achieve the same result as the default ranges across subjects, from as low as 0.38% for MEGAHIT (measured by maximum base pairs) to as high as 18.43% for FBA-CLI. In all subjects more than 80% of the results are different from the default.

Table 4. Analysis of objective metric impacts.

Summary
These results show that the default configuration does not always provide the best objective value, and that instead of randomly choosing configurations, a more systematic method of investigation is needed. For example, in all subjects except BLAST, if a user randomly chooses a configuration, most often the worst value (0) will be obtained. The variation in the value of the result also demonstrates the importance of reporting the exact configuration used in any documentation or publications; otherwise the results are not reproducible.
The ranges in the configuration distributions in MEGAHIT tell us several things. We can see that the odds of improving over the default N50 score are significantly lower than for the other use cases. This signals that MEGAHIT may be optimized for the N50 score over other use cases. However, if a user is more concerned with achieving more contigs with sizes greater than 1,000 base-pairs, they may need to consider other configurations. Configurations are not one-size-fits-all. This demonstrates how essential it is for the user to understand their particular use case, have an effective metric that is representative of the objective value, and consider how different parameters may affect it.

Motivated by the results of Experiment 1, demonstrating that tool configurations can have a large impact on the scientific conclusion, in this next study we ask how a user might use an experimental approach to explore the parameter space and infer the internal behavior of the system. We do so through two studies. In Experiment 2a, we focus on a single parameter to explore in depth how this parameter affects an objective. In the process we uncover some unexpected behavior. In Experiment 2b we explore the overlap in functionality between multiple parameters. We noticed that some parameters can be changed either by setting a single parameter within the GUI interface or through alternative sources. We wanted to understand the interplay between these choices. Experiment 2a demonstrates how a user can systematically probe a software tool to verify and learn its behavior.

We tested values from 0 to 100 in steps of 10 for max oxygen. In order to test the bounds of the parameter, we also tested values larger than the maximum according to the documentation (100); thus we also tested values of 101, 250, 500, and 1000. We consider a use case where the user may want to identify what setting for the max flux of oxygen results in the highest growth.
However, they do not want to oversupply the levels of oxygen, so they want to identify the minimal value of the flux of oxygen that achieves the highest OV.

After executing the experimental Test in step 5, Table 5 presents our results for all three organisms for Analysis in step 6. E. coli and R. sphaeroides both showed similar behavior. The growth for each starts at 0 when max oxygen is set to zero, then increases to the default growth at a max oxygen value of 40 for E. coli and 50 for R. sphaeroides. Any higher value results in no additional growth.

S. oneidensis also grows at a linear rate, starting from 0 growth at a max oxygen value of 0. However, it does not reach the default growth at the documented maximum value of 100. If we go beyond the documentation-specified range, the default growth can be achieved by setting max oxygen to 500. Surprised by this behavior, we asked the developers of this tool for further information, and learned that the documentation was misleading and that 100 is not a hard maximum value.
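The use-case question (the smallest setting that still achieves the best OV) reduces to a simple search over the sweep results. The sketch below uses a synthetic OV curve with the same shape as the E. coli sweep in Table 5, not the measured data itself:

```python
def minimal_setting_for_max(results):
    """Given {parameter value: OV}, return the smallest parameter value
    that already achieves the maximum observed OV."""
    best = max(results.values())
    return min(v for v, ov in results.items() if ov == best)

# OVs shaped like the E. coli sweep: linear rise, flat at 0.164692 from 40 on.
ecoli = {v: min(v / 40, 1.0) * 0.164692 for v in range(0, 101, 10)}
assert minimal_setting_for_max(ecoli) == 40
```

Applied to the real Table 5 data, the same search yields 40 for E. coli, 50 for R. sphaeroides, and 500 for S. oneidensis.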

Summary
Using a systematic approach, we were able to verify the expected behavior that increasing the maximum allowed oxygen flux allows for an increase in organism growth. This increases user trust in the tool and verifies that our understanding is in line with the tool's functionality. We were also able to identify the smallest value of this setting that results in the best scientific output (40 for E. coli, 50 for R. sphaeroides, and 500 for S. oneidensis). This answers our use case question. Furthermore, we identified some unexpected behavior that was not in line with the documentation. Tools are not without bugs or mistakes in documentation, and it is useful for users to be equipped with alternative methods to identify any unexpected behaviors on their own data and use case, and clarify them with developers.

As previously noted, we identified that the flux limits can be altered in more than one manner. In this experiment we further investigate how these different methods interplay. One way to control flux is within the media input file, via a field for max uptake of each compound with units "mol/per CDW hr". A second method is available from the GUI.

We Analyzed (step 6) and went back to add to the Model (step 3) and re-evaluate. Since there was no change as the max uptake of N was increased from 10 to 20, and the max exchange flux was observed to be 1.5551, further exploration was performed by limiting the max uptake of N to values between 1 and 2.

Results for exploring the MaxC parameter can be seen in Table 6. The rows upper bound, max exchange flux, exchange flux, and OV are all output values of the FBA object.

Based on these results we can infer how the tool is working. The upper bound is consistently at a value of 5, which must mean the upper bound is set by the media file. The field for max exchange flux, however, varies.

Under the default parameters it is identical to the upper bound. However, when we constrain the amount of MaxC this value varies: it is reduced at MaxC=10 and MaxC=20, but at MaxC=30 it is again equal to the default of 5. When E. coli has at least 5 units of carbon, it grows at its maximum amount (0.675).

We repeat this process for nitrogen, as seen in Table 7. In this case we vary the nitrogen-supplying compound (NH3), initially at levels of 10 and 20. In follow-up exploration, we also used levels of 1, 1.5, 1.5551, and 2. At 2 units and above, E. coli grows at its maximum (default) amount.

We observe that the exchange flux for NH3 is limited to 1.55528, as this is the amount of uptake for 2 through 20 units of nitrogen as well as in the default case. Further, E. coli grows at its maximum (default) amount for these amounts of NH3. If we set a value slightly smaller than that as our MaxN (e.g. 1.5551), it grows at just 0.000002 less than its maximum potential. Further, at MaxN=1 it grows at 70.74% of its maximum amount, while at MaxN=1.5 it grows at 99.74% of its maximal potential.
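The narrowing search that located the 1.55528 threshold can be automated as a bisection. This is a sketch only: the `growth` function below is a mock built from the values in Table 7 (a real run would invoke the FBA tool with MaxN set), and the saturating-curve shape is our assumption:

```python
def growth(max_n, required_flux=1.55528, max_growth=0.675):
    """Mock growth curve: growth scales with the available nitrogen flux
    and saturates once the required exchange flux is met (Table 7 values).
    Stand-in for a real FBA run with MaxN set to max_n."""
    return max_growth * min(max_n, required_flux) / required_flux

def threshold(lo, hi, eps=1e-4):
    """Bisect for the smallest MaxN at which growth saturates."""
    target = growth(hi)
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if growth(mid) >= target - 1e-12:
            hi = mid
        else:
            lo = mid
    return hi

print(round(threshold(1.0, 2.0), 3))  # -> 1.555
```

A user titrating by hand follows the same logic; scripting it simply makes the exploration systematic and repeatable.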

Summary
This experiment reverse engineered the effect of changing the max flux parameters in FBA, and investigated how the application parameter interacts with the media options. We were able to extract the meaning of two output values that were unclear from the documentation: upper bound is the flux bound set by the media, and max exchange flux is the bound set by the application parameter. We found that the max exchange flux depends on which compounds in the media supply the relevant nutrients, and that the minimum of the two bounds is used as the limiting value. We can also identify the minimal required exchange flux by looking at the actual exchange flux reported in the output.
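The inferred rule can be written down directly. A deliberately simplified sketch (in the real tool the element limit is distributed across media compounds, so the mapping is not always this direct):

```python
def effective_flux_bound(media_max_uptake, app_max_flux=None):
    """The limiting behavior we inferred: the media file sets an upper
    bound, and the application parameter (if given) can only tighten it."""
    if app_max_flux is None:
        return media_max_uptake
    return min(media_max_uptake, app_max_flux)

# Upper bound from the media file is 5 (as in Table 6).
print(effective_flux_bound(5))      # default: media bound applies -> 5
print(effective_flux_bound(5, 3))   # tighter application limit -> 3
print(effective_flux_bound(5, 30))  # looser limit has no effect -> 5
```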
This process shows how a user can systematically perturb a scientific software system in order to reverse engineer pieces of the underlying algorithm. It was initially unclear from the documentation how the interaction between these options operated and what the different output fields meant. By using this experimental approach, the user was able to discover the answers on their own. Similar experimental designs could be set up to ask different types of questions.
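The perturb-and-observe pattern generalizes: hold every setting at its default, vary one parameter at a time, and tabulate the outputs. A minimal sketch, where `run` stands in for an invocation of whatever tool is under study (the toy tool here is hypothetical):

```python
def one_at_a_time(run, defaults, param, values):
    """Vary a single parameter while holding all others at their defaults,
    recording the tool's output for each tested value."""
    results = {}
    for v in values:
        settings = dict(defaults, **{param: v})
        results[v] = run(settings)
    return results

# Toy stand-in for a tool whose output depends on two settings.
def toy_tool(settings):
    return min(settings["max_flux"], settings["media_bound"])

table = one_at_a_time(toy_tool, {"max_flux": 10, "media_bound": 5},
                      "max_flux", [1, 3, 5, 7])
print(table)  # -> {1: 1, 3: 3, 5: 5, 7: 5}
```

Tabulating results this way makes it immediately visible where an output stops responding to a parameter, which is exactly the signal we used above.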

We have seen, using three exemplar experiments, that biologists can benefit from approaching their computational tools the same way they approach their bench experiments. Instead of using tools as magic boxes, we propose SOMATA, an exploratory approach that leads to computational understanding and expertise. By leveraging application parameters (which were developed for specific use cases), a biologist can better understand the range of a tool's application and thereby optimize the results for their use case. However, this is also a cautionary tale. Modifying the configuration of a tool may change how the tool works in unintended ways, and doing so in an ad-hoc manner can lead to results that are not reproducible or are potentially sub-optimal.

We also demonstrated that some parameters, while well documented, provide inaccurate information. While the tool authors document the valid values, these were in fact incorrect limits for the tool. We discovered via experimentation, and confirmed with the developers, that we can use larger values in the application. We advocate a FAIR (Findable, Accessible, Interoperable and Reusable) approach [36] not only to data collection but also to using and reporting interactions and final settings with all tools, just as FAIR has been proposed to be extended to research tools [37].
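One concrete, low-cost step toward this kind of reporting is to record the exact settings used for every tool run alongside the results. A minimal sketch; the file name and record fields are our own choice for illustration, not any established standard:

```python
import json
import time

def record_run(tool, version, settings, path="run_settings.jsonl"):
    """Append a provenance record of the exact settings used for a run."""
    record = {
        "tool": tool,
        "version": version,
        "settings": settings,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    with open(path, "a") as fh:
        fh.write(json.dumps(record) + "\n")
    return record

rec = record_run("fba-tool", "1.2", {"max_oxygen": 40, "media": "minimal"})
print(rec["settings"]["max_oxygen"])  # -> 40
```

Shipping such a log with a publication lets another researcher rerun the workflow with the same settings, which is the heart of the reproducibility argument below.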

The FAIR initiative came about in recognition that science was being impeded by the lack of reproducibility caused by lax data standards. In a similar way, the advancement of research may also be hampered by ad hoc use of software tools that are increasingly becoming an integral part of scientific investigation. For instance, Shade and Teal proposed a generic method for approaching workflows which could be considered a high-level abstraction of SOMATA [38]. As noted in the original article on FAIR [36], the principles should be applied equally to workflows, software tools, and data: "All scholarly digital research objects [39] -from data to analytical pipelines- benefit from application of these principles, since all components of the research process must be available to ensure transparency, reusability and reproducibility." Therefore, it is incumbent on researchers to have a working understanding of the tools used in the context of their research, and to properly investigate and communicate their findings in a manner adhering to the principles of the scientific method.

We now synthesize and discuss some of the takeaways (or key lessons learned) during this study.

1. Configurations Matter. The creation, documentation, and experimentation with configurations should be a community effort. We envision community resources where information from both experimentalists and tool developers interacts. Rather than simply FAIR, we suggest we also need to adopt a SHARE (Simple, Helpful, Accessible, Readable with Explanations) principle. First, we need methods that are easy to implement and understand so that biologists use them. Second, our systems should provide more hints (or help) to aid users in understanding the various configuration options. Third, the differences (or impact) of changing application parameters should be annotated with readable explanations.
None of this can be achieved in isolation; hence we think this is a community effort.

2. Mapping Configurations to Behavior May Not Be Obvious. If we return to the documented max target sequence parameter in BLAST (discussed in the Introduction), we saw that a user could easily interpret that option as an output filter applied after the search returns some number of hits; hence setting it to a small number would seem to provide just the best results. In fact, this parameter is used internally to filter results early in the algorithm, so it may cause the search to perform sub-optimally. Why this parameter exists is out of the scope of this paper, but it was likely added to improve performance (i.e. make the search run faster). When that objective is required, a less-than-optimal search may be acceptable. Had the author of that paper used our approach to explore parameter options, they would have seen inconsistent results when setting this parameter (in fact, the single hit it returns would be the best hit only 'by chance'). The lesson here is that relying solely on simplified documentation can be deceptive.

3. We Need Better Tools. While we propose the use of exploration in this paper, we also see the need for next-generation tools which provide explanations and expose configuration behavior. As science continues to advance, computational tools are increasingly vital to discovery. In order to maximize their benefit, we should strive to ensure they perform in an explainable, transparent, and intuitive way.
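The instability described in lesson 2 can be detected mechanically: rerun the search at several values of the parameter and check whether the top-ranked hit agrees. A sketch with hypothetical result lists standing in for real BLAST output (real runs would come from the command line, varying the max target sequences setting):

```python
def top_hit_stable(results_by_setting):
    """Given {setting: ranked list of hit IDs}, report whether the
    top-ranked hit agrees across all tested settings."""
    tops = {s: (hits[0] if hits else None)
            for s, hits in results_by_setting.items()}
    return len(set(tops.values())) == 1, tops

# Hypothetical outputs illustrating the failure mode described above:
# early internal filtering changes which hit "wins" at a small setting.
stable, tops = top_hit_stable({
    1:   ["hitB"],
    50:  ["hitA", "hitB", "hitC"],
    500: ["hitA", "hitB", "hitC"],
})
print(stable)  # -> False
```

A disagreement like this is precisely the signal that a parameter is not the simple output filter its name suggests.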

In biological research, as the variety and resolution of data grow to increasingly reflect the complexity of biology, so too are the associated computational tools becoming more comprehensive mirrors of the same complexity. To investigate this increasing biological complexity, laboratory protocols are shifting not just through reductionism, but also toward an increasingly comprehensive systems approach. In a similar spirit, we propose through SOMATA taking a systems view of bioinformatics research and tool development that mimics how biologists explore organisms in the laboratory. This provides a common basis for communication and interaction that will help the research community more productively develop and use those tools, and thereby advance their research efforts.

We have demonstrated that different results can be obtained using different algorithm and tool parameters, and that this is a potential minefield for the novice user. Rather than limit exploration, we argue that the community of bioinformatics researchers should provide open, explainable tools, and that this information should be shared and crowd-sourced to provide a more repeatable environment. While the FAIR principles are of value here, we see a need for a more comprehensive way to view configurability in software tools.

A systems approach to biology implies the need for this same kind of interchange