Abstract
Quality control of MR images is essential for excluding problematic acquisitions and avoiding bias in subsequent image processing and analysis. However, the visual inspection of individual images is time-consuming and limited by both intra- and inter-rater variance. The difficulty of visual inspection scales with study size and with the heterogeneity of multi-site data. Here, we describe a tool for the automated assessment of T1-weighted MR images of the brain – MRIQC. MRIQC calculates a set of quality measures from each image and uses them as features in a binary (include/exclude) classifier. The classifier was designed to ensure generalization to new samples acquired in different centers and with scanning parameters different from those of our training dataset. To achieve that goal, the classifier was trained on the Autism Brain Imaging Data Exchange (ABIDE) dataset (N=1102), acquired at 17 locations with heterogeneous scanning parameters. We selected random forests from a set of models and pre-processing options using nested cross-validation on the ABIDE dataset. We report a performance of ~89% accuracy of the best model evaluated with nested cross-validation. The best performing classifier was then evaluated on a held-out (unseen) dataset, unrelated to ABIDE and labeled by a different expert, yielding ~73% accuracy. The MRIQC software package and the trained classifier are released as an open source project, so that individual researchers and large consortia can readily curate their data regardless of the size of their databases. Robust QC is crucial for identifying structured imaging artifacts early in ongoing acquisition efforts, and helps detect individual substandard images that may bias downstream analyses.
Introduction
Image analysis can lead to erroneous conclusions when the original data are of low quality. MR images are unlikely to be artifact-free, and assessing the quality of images produced by MR scanning systems has long been a challenging issue [1]. Traditionally, all images in the sample under analysis are visually inspected by one or more experts, and those showing an insufficient level of quality are excluded (some examples are given in Fig 1A). Visual assessment is time-consuming and prone to variability due to inter-rater differences (see Fig 1B), as well as intra-rater differences arising from factors such as practice or fatigue. An additional concern is that some artifacts evade human detection entirely [2], for example those caused by an improper choice of acquisition parameters. Even though magnetic resonance (MR) systems undergo periodic inspections and service, some machine-related artifacts persist unnoticed due to lenient vendor quality checks and to drift from the system calibration settings. In our experience, automated Quality Control (QC) protocols help detect these issues early in the processing stream. The current trend of neuroimaging towards acquiring very large samples across multiple scanning sites [3–5] introduces additional concerns. These large-scale imaging efforts render the visual inspection of every image infeasible and add the possibility of between-site variability. Therefore, there is a need for fully-automated, robust, and minimally biased QC protocols. These properties are difficult to achieve for three reasons: 1) the absence of a gold standard impedes the definition of sensitive quality metrics; 2) human experts introduce biases with their visual assessment; and 3) cross-study and inter-site acquisition differences also introduce uncharacterized variability.
Machine-specific artifacts have been traditionally tracked down using phantoms [6] in a quantitative manner. However, many forms of image degradation are participant-specific or arise from practical settings (see Fig 1, panel A). Woodard and Carley-Spencer [7] conducted one of the earliest evaluations of automated quality assessment on a large dataset of 1001 T1w images from 143 participants. They defined a set of 239 no-reference1 image-quality metrics (IQMs). The IQMs belonged to two families depending on whether they were derived from Natural Scene Statistics or from quality indices defined by the JPEG consortium. The IQMs were calculated on image pairs with and without several synthetic distortions. In an analysis of variance, the IQMs from both families reliably discriminated among undistorted images, noisy images, and images distorted by intensity non-uniformity (INU). Mortamet et al. [8] proposed two quality indices, one focused on detecting artifacts in the air region surrounding the head, and the other on analyzing the goodness-of-fit of a model for the background noise in that air region. One principle underlying their proposal is that most of the artifact signal propagates over the image and into the background. They applied these two IQMs to 749 T1w scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Different cutoff thresholds were defined for the two IQMs and compared to a binary (high/low quality) classification performed by a human rater, concluding that more specific research was required to determine these thresholds and generalize them to different datasets. They achieved an 85% accuracy in an intra-site validation approach. However, many potential sources of uncontrolled variability exist between studies and sites, including magnetic resonance imaging (MRI) protocols (scanner manufacturer, MR sequence parameters, etc.), scanning settings, participant instructions, inclusion criteria, etc.
For these reasons, the thresholds they proposed on their IQMs are unlikely to generalize beyond the ADNI database. Recently, Pizarro et al. [9] proposed the use of a support-vector machine classifier (SVC) trained on 1457 structural MRI images acquired in one site with constant scanning parameters. They proposed three volumetric features and three features targeting particular artifacts. The volumetric features were the normalized histogram, the tissue-wise histogram, and the ratio of the modes of gray matter (GM) and white matter (WM). The artifacts addressed were the eye motion spillover in the anterior-to-posterior phase-encoding direction, the head-motion spillover over the nasion-cerebellum axis (which they call ringing artifact) and the so-called wrap-around (which they refer to as aliasing artifact). They reported a prediction accuracy around 80%, assessed using 10-fold cross-validation. Some other recent efforts to develop IQMs appropriate for MRI include the Quality Assessment Protocol2 (QAP) under the preprocessed-connectomes project (PCP), and the UK Biobank [10].
The hypothesis behind this study is that we can predict the quality ratings of an expert on previously unseen datasets (with dataset-specific scanning parameters) in a supervised learning approach that uses features derived from a broad selection of IQMs. To demonstrate that the trained classifier correctly predicts the quality of new data, we used two unrelated databases to configure the training and held-out (test) datasets [11]. We first select the best performing model on the training dataset using a grid strategy in a nested cross-validation setup. We use the ABIDE database [4] for the training set because data were acquired in 17 different scanning sites with varying acquisition parameters (Table 1). These data show great variability in terms of imaging settings and parameters, which represents the heterogeneity of real-world data. The best performing classifier is then trained on the full ABIDE dataset, and tested on the held-out dataset [12] to assess whether the performance on unseen data falls within the range predicted by the nested cross-validation.
The contributions of this work are summarized as follows. First, we release a software tool called MRIQC (described in The MRIQC tool) that extracts a number of IQMs (Extracting the Image Quality Metrics) characterizing each input image. Second, MRIQC includes a visual reporting system (described in the Visual reports section) to ease the manual investigation of potential quality issues. These visual reports allow researchers to quickly evaluate the cases flagged by the MRIQC classifier, or to visually identify potential images to be flagged by looking at the group distributions of IQMs. Finally, we report the results from a pre-registered analysis of this study (https://osf.io/haf97/) on the feasibility of automatic quality control labeling (sections Supervised classification and Results).
Materials and Methods
Training and test datasets
A total of 1375 T1w scans are used as training (1102 from ABIDE) and test (273 from ds030) samples. These databases were intentionally selected for their heterogeneity to match the purpose of the study. A brief summary illustrating the diversity of acquisition parameters is presented in Table 1, and a full-detail table in Table SI1.
Labeling protocol
The labeling process is aided by surface reconstruction, using the so-called white (WM-GM interface) and the pial (delineating the outer interface of the cortex) surfaces as visual cues for the rater. We utilize FreeSurfer [13] to reconstruct the surfaces. FreeSurfer has been recently proposed as a visual aid tool to assess T1w images [14]. For run-time considerations, and to avoid circular evaluations of FreeSurfer, this tool is not used in the MRIQC workflow (see The MRIQC tool section).
The following protocol was used for the manual assessment of T1w images: 1) The 3D cortical surfaces were reconstructed using FreeSurfer 5.3.0. 2) An animated GIF (graphics interchange format) file was generated from the coronal slices of the 3D volume, including the projection of the 3D cortical surfaces in each slice3. Each animation had a duration of around 20s. 3) A trained expert inspected the animation several times (generally, three times), and assigned a quality level (“exclude”/“doubtful”/“accept”).
During the visualization, the rater assessed the overall quality of the image. The white and pial contours were used as evaluation surrogates, given that “exclude” images usually exhibit imperfections and inaccuracies on these surfaces. When the expert found general quality issues or the reconstructed surfaces revealed more specific artifacts, the “exclude” label was assigned and the rater noted a brief description, for example: “low signal-to-noise ratio (SNR)”, “poor image contrast”, “ringing artifacts”, “head motion”, etc. The images in ds030 were randomized before rating.
Software instruments and calculation of the IQMs
The MRIQC tool
MRIQC is an open-source project, developed under the following software engineering principles. 1) Modularity and integrability: MRIQC implements a nipype [15] workflow (see Fig 2) to integrate modular sub-workflows that rely upon third party software toolboxes such as FSL [16], ANTs [17] and AFNI [18]. 2) Minimal preprocessing: the workflow described before should be as minimal as possible, to estimate the IQMs on the original data or their minimally processed derivatives. 3) Interoperability and standards: MRIQC follows the brain imaging data structure (BIDS, [19]), and it adopts the BIDS-App [20] standard. An example of the ease of running MRIQC is presented in Listing SI1. 4) Reliability and robustness: the software undergoes frequent vetting sprints by testing its robustness against data variability (acquisition parameters, physiological differences, etc.) using images from the OpenfMRI resource. Reliability is checked and maintained with the use of a continuous integration service.
Extracting the Image Quality Metrics
The final steps of the MRIQC workflow compute the different IQMs, and a summary JSON file per subject is generated. The IQMs can be grouped in four broad categories (see Table 2), providing a vector of 56 features per anatomical image. Some measures characterize the impact of noise and/or evaluate the fitness of a noise model. A second family of measures uses information theory and prescribed masks to evaluate the spatial distribution of information. A third family of measures looks for the presence and impact of particular artifacts. Specifically, the INU artifact and the signal leakage due to rapid motion (e.g. eye motion or blood vessel pulsation) are identified. Finally, some measures that do not fit within the previous categories characterize the statistical properties of tissue distributions, volume overlap of tissues with respect to the volumes projected from MNI space, the sharpness/blurriness of the images, etc. The ABIDE and ds030 datasets were processed with MRIQC v0.9.0-rc2 using the Lonestar5 supercomputer at the Texas Advanced Computing Center, University of Texas, TX, USA.
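As an illustration of the noise-related family of measures, the sketch below estimates a simplified signal-to-noise ratio from a foreground (tissue) mask and a background (air) mask. The function name and the toy data are ours for illustration only; MRIQC's actual SNR definitions include additional corrections (e.g. for the non-Gaussian distribution of background noise).

```python
import numpy as np

def snr_estimate(img, fg_mask, bg_mask):
    """Simplified SNR: mean foreground intensity over background noise spread.

    Illustrative only -- not MRIQC's exact formula, which corrects for the
    distribution of noise in the air background.
    """
    signal = img[fg_mask].mean()
    noise = img[bg_mask].std(ddof=1)
    return signal / noise

# Toy 3D "image": a bright block of tissue inside a noisy air background
rng = np.random.default_rng(0)
img = rng.normal(20.0, 5.0, size=(32, 32, 32))   # air/background noise
fg_mask = np.zeros(img.shape, dtype=bool)
fg_mask[8:24, 8:24, 8:24] = True                 # crude "tissue" box
img[fg_mask] += 500.0                            # add tissue signal
print(round(snr_estimate(img, fg_mask, ~fg_mask), 1))
```

In a real run the masks would come from the minimal preprocessing described above (head mask, tissue segmentation), not from synthetic boxes.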
Visual reports
In order to ease the screening process of individual images, MRIQC generates individual reports with mosaic views of a number of cutting planes and supporting information (for example, segmentation contours). The most straightforward use-case is the visualization of those images flagged as low-quality by the classifier.
After the extraction of IQMs in all the images of our sample, a group report is generated (Fig 3). The group report shows a scatter plot for each of the IQMs, so it is particularly easy to identify the cases that are outliers for each metric. The plots are interactive, such that clicking on any particular sample opens the corresponding individual report of that case. Examples of group and individual reports for the ABIDE dataset are available online at mriqc.org.
Supervised classification
Our supervised learning approach to predicting the binary ratings of a human expert is structured in two steps. First, we perform a preliminary model selection and evaluation using repeated (x1000) and nested cross-validation, on the ABIDE dataset (see Step 1: Tested models and selection). Then, a second optimization in a refined grid of hyper-parameters for the model selected previously is performed with a single-loop cross-validation on the ABIDE dataset. The best performing model of this second cross-validation step is evaluated using the held-out dataset (see Step 2: Validation on the held-out dataset). The cross-validation workflows are built upon scikit-learn [26] and run using the Stampede supercomputer at the Texas Advanced Computing Center, University of Texas, TX, USA.
Step 1: Tested models and selection
Based on the number of features (56) and training data available (~1100 data points), we compare two families of classifiers: SVCs and random forests classifiers (RFCs). Given the diversity of scanning sites, in the model selection loop we also investigate the need for normalizing (zscoring) features. In the following, models including a preliminary zscoring will show the suffix “−zs” while those using the original features without such transformation are noted with the suffix “−nzs”.
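The "−zs" preprocessing step amounts to standardizing each feature column. A minimal sketch with scikit-learn's `StandardScaler` (the toy feature matrix is ours):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix: 4 samples x 3 IQM-like features on very different scales
X = np.array([[1200.0, 0.5, 3.1],
              [1100.0, 0.7, 2.9],
              [1300.0, 0.4, 3.3],
              [1250.0, 0.6, 3.0]])

scaler = StandardScaler()         # the "-zs" model variants prepend this step
Xz = scaler.fit_transform(X)
print(Xz.mean(axis=0).round(6))   # each feature now has zero mean
print(Xz.std(axis=0).round(6))    # and unit variance
```

In cross-validation, the scaler must be fitted on the training folds only and then applied to the test fold, which a scikit-learn pipeline handles automatically.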
The support-vector machine classifier (SVC)
A support-vector machine [27] finds a hyperplane in the high-dimensional space of the features that robustly separates the classes. The SVC then uses the hyperplane to decide the class that is assigned to new samples in the space of features. Two hyper-parameters define the support-vector machine algorithm: a kernel function that defines the similarity between data points to ultimately compute a distance to the hyperplane, and a regularization weight C. In particular, we analyzed here the linear SVC implementation (hereafter, “SVC-lin”) and the one based on radial basis functions (denoted by “SVC-rbf”). During model selection, we evaluated the regularization weight C and the γ parameter (kernel width) of the SVC-rbf.
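A minimal sketch of how such a model could be set up with scikit-learn, here the "SVC-rbf−zs" variant with a small grid over C and γ (the synthetic data and the grid values are ours, not the paper's actual grid):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 56-feature IQM matrix
X, y = make_classification(n_samples=300, n_features=56, random_state=0)

# "SVC-rbf-zs": z-scoring followed by an RBF-kernel SVC; C and gamma
# are the two hyper-parameters explored during model selection
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
grid = GridSearchCV(model,
                    {"svc__C": [0.1, 1.0, 10.0],
                     "svc__gamma": ["scale", 0.01]},
                    scoring="roc_auc", cv=5)
grid.fit(X, y)
print(grid.best_params_)
```

The "SVC-lin" variant would simply swap `kernel="rbf"` for `kernel="linear"` and drop the γ axis of the grid.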
The random forests classifier (RFC)
Random forests [28] are a nonparametric ensemble learning method that builds multiple decision trees. An RFC then assigns to each new sample the mode of the predicted classes of all decision trees in the ensemble. In this case, random forests are driven by a larger number of hyper-parameters. Particularly, in this work we analyze the maximum tree depth, the minimum number of samples per split and the total number of decision trees.
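In scikit-learn terms, the three hyper-parameters named above are `max_depth`, `min_samples_split` and `n_estimators`. A minimal sketch on synthetic data (the values shown are those that model selection ultimately favored; the data are ours):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the 56-feature IQM matrix
X, y = make_classification(n_samples=300, n_features=56, random_state=0)

# The three hyper-parameters explored in this work
rfc = RandomForestClassifier(n_estimators=50, max_depth=20,
                             min_samples_split=2, random_state=0)
rfc.fit(X, y)
# Each new sample receives the majority vote of the 50 trees
print(rfc.predict(X[:3]))
```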
Objective function
The performance of each given model and parameter selection can be quantified with different metrics. Given the imbalance of positive and negative cases –with lower prevalence of “reject” samples–, we select the area under the curve (AUC) of the receiver-operator characteristic as objective score. We also report the classification accuracy as an additional performance measure.
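A toy illustration of the two scores, computed with scikit-learn on made-up ratings (the probabilities below are ours; the class imbalance mimics the lower prevalence of "exclude" samples):

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Toy ratings: 1 = "exclude"; the positive class is the rarer one
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])
y_prob = np.array([0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.2, 0.9, 0.35, 0.55])

auc = roc_auc_score(y_true, y_prob)          # objective score (threshold-free)
acc = accuracy_score(y_true, y_prob > 0.5)   # reported alongside
print(auc, acc)
```

Unlike accuracy, the AUC does not depend on a particular decision threshold, which makes it a more robust objective under class imbalance.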
Cross-validation and nested cross-validation
Cross-validation is a model selection and validation technique robust to inhomogeneities [29]. We use nested cross-validation, which divides the process in two validation loops: an inner loop for selecting the best model and hyper-parameters, and an outer loop for evaluation. In cross-validation, the data are split into a number of folds, each containing a training and a test set. For each fold, the classifier is trained on the first set and evaluated on the latter. When cross-validation is nested, the training set is split again into folds within the inner loop, and training/evaluation are performed to optimize the model parameters. Only the best performing model of the inner loop is then cross-validated in the outer loop. In order to increase the robustness against model variability, we repeat the nested cross-validation procedure 1000 times.
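The two loops can be sketched with scikit-learn by nesting a `GridSearchCV` (inner loop) inside `cross_val_score` (outer loop); the data and the tiny grid are ours for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=300, n_features=56, random_state=0)

# Inner loop: grid search selects hyper-parameters on the training folds only
inner = GridSearchCV(RandomForestClassifier(random_state=0),
                     {"n_estimators": [10, 50]},
                     scoring="roc_auc", cv=3)
# Outer loop: the winner of each inner search is re-evaluated on data
# it has never seen, yielding an unbiased performance estimate
outer_scores = cross_val_score(inner, X, y, scoring="roc_auc", cv=5)
print(outer_scores.mean().round(3))
```

Repeating this whole procedure many times (1000 in our case) with different shufflings averages out the variability of any single partition.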
Data split scheme
Since we wanted to estimate the performance in datasets acquired at sites and with parameters different from those in the ABIDE dataset, we selected a leave-one-site-out (LoSo) partition strategy for the outer loop of cross-validation. The LoSo split leaves a whole site out as a test set at each cross-validation iteration. Therefore, no knowledge of the test set is leaked into the training set (the remaining N-1 sites). For the inner loop (model selection) we compared the performance of a stratified 10-fold and LoSo over the remaining 16 sites (one site is held out by the outer loop). All the possible combinations of models and their hyper-parameters (over 5000) are evaluated repeatedly (1000 times) in a grid search for the best average AUC score in the inner cross-validation loop.
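One way to implement a LoSo split with scikit-learn is `LeaveOneGroupOut`, using the site label of each scan as the group; the site labels below are hypothetical:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Hypothetical site labels for six scans from three sites
sites = np.array(["A", "A", "B", "B", "C", "C"])
X = np.arange(12).reshape(6, 2)   # toy feature matrix

loso = LeaveOneGroupOut()         # one whole site held out per fold
for train_idx, test_idx in loso.split(X, groups=sites):
    print(sites[test_idx])        # the test fold contains exactly one site
```

Such a group-aware splitter can be passed as the `cv` argument of `GridSearchCV` or `cross_val_score`, so that no site contributes samples to both sides of a split.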
Feature ranking
One tool to improve the interpretability of the RFC is the calculation of feature rankings [28] by means of variable importance or Gini importance. Since we use scikit-learn, the implementation uses Gini importance, defined for a single tree as the total decrease in node impurity weighted by the probability of reaching that node. We finally report the median feature importance over all trees of the ensemble.
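Since scikit-learn's `feature_importances_` attribute averages the Gini importance over trees, the median over trees can be obtained from the individual estimators; a sketch on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=3, random_state=0)
rfc = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Gini importance per tree, then the median across the ensemble
per_tree = np.array([t.feature_importances_ for t in rfc.estimators_])
median_importance = np.median(per_tree, axis=0)
ranking = np.argsort(median_importance)[::-1]
print(ranking[:3])  # indices of the three highest-ranked features
```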
Step 2: Validation on the held-out dataset
In the second step, we cross-validated the model selected in step 1, optimizing over a grid refined around the parameters selected before. For this second cross-validation, we use the LoSo split strategy, given the results obtained in the previous step. The best performing model is then trained on the full ABIDE dataset and the resulting classifier is used to predict the quality ratings of the held-out dataset (ds030).
Results
All images included in the selected datasets were processed with MRIQC. After extraction of the IQMs from the ABIDE, a total of 1102 images had both quality ratings and quality features (ten T1w images of the ABIDE are missing in the database). In the case of ds030, 265 images had the necessary quality ratings and features (eight images were not rated and/or failed during feature extraction).
Model selection
The results of step 1 (nested cross-validation) are summarized in Fig 4. The best performing model, regardless of inner loop split strategy, was the random forests classifier without zscoring (RFC-nzs). The RFC-nzs using LoSo in the inner loop yielded the following averaged scores over all repeated outer loops: AUC=0.862 (σ=±0.121) and accuracy of 89.4% (σ=±9.95%). The corresponding averaged scores for the 10-fold strategy were: AUC=0.848 (σ=±0.135) and accuracy of 88.6% (σ=±11.5%). These results indicated that there is no practical difference between the two split strategies as regards model selection through cross-validation on this dataset. Therefore, since the averaged scores using LoSo cross-validation in the inner loop were slightly higher, it was selected as the split strategy for the cross-validation in step 2. Note that the split strategy is not a model feature, and thus this decision can be made based on the results of the outer loop of nested cross-validation, as opposed to the model selection that is done based on the inner cross-validation loops.
The best performing model and parameters, selected as the maximum average AUC score in the inner loop across all repetitions of the nested cross-validation, was the RFC-nzs with 50 trees (n_estimators), a maximum tree depth (max_depth) of 20, and a minimum of 2 samples per split (min_samples_split). However, the cross-validation was very variable in the selection of hyper-parameters, indicating that there was little difference in performance across the points in the hyper-parameters grid.
Evaluation on held-out data
In the second cross-validation step, only the previously selected RFC-nzs model was optimized, in a refined grid centered around the best performing parameters of step 1 (n_estimators=50, max_depth=20, min_samples_split=2). The AUC on the evaluation set was 0.695 and the accuracy 72.83%. We also analyzed the relevance of each feature in the overall forest decision (Fig 5). The most relevant features are the coefficient of joint variation (CJV) and the SNR measured on the WM tissue mask.
Discussion
We propose a quantitative approach to the quality control of T1w MRI of the brain, enabling the automatic identification of sub-standard acquisitions. Quality control protocols are implemented to exclude faulty datasets that can bias the final results. Human brain images can be degraded by various sources of artifacts, related to the scanning device and settings or due to the participants themselves. Machine-derived artifacts are efficiently mitigated in a quantitative manner with calibration. However, due to the lack of reliable quality quantification tools, subject-specific artifacts and drifts from the service settings are assessed visually. The visual inspection of every MRI acquisition of the brain is a time-consuming and bias-prone task that would be ideally replaced by decision algorithms. Automating the QC process is particularly necessary for ongoing studies such as the UK Biobank that will collect data from tens of thousands of individuals.
Previous efforts [7, 30] in quantifying image quality for automated assessment included the definition of image-quality metrics (IQMs). However, full automation was infeasible due to the limited sensitivity of the available IQMs to the most prevalent artifacts. Subsequent efforts [8] were focused on specific samples, setting generalization to new datasets as a future line of work. Pizarro et al. [9] recently presented a similar approach to the quality control of images. They obtained a cross-validated accuracy of ~80% for their support-vector machine classifier (SVC), in a single-site sample with homogeneous acquisition parameters.
In this work, we train a random forests classifier and evaluate its performance in predicting the quality assessment of human raters on completely novel samples. We show that linear SVCs do not perform well on heterogeneous samples with diverse acquisition parameters, and that they always require normalization of features derived from multi-site data. Our results invariably indicated a better performance of a random forests classifier (RFC), with and without normalization of features. Particularly, the best performing model (RFC-nzs, for “not zscored”) achieved a ~89.4% (σ=±9.95%) accuracy. This improved performance over the one reported by Pizarro et al. may also be related to the selection of classification features proposed in this paper. Even though they reported that classification improved with the addition of features addressing certain artifacts, in our feature importance analysis the first IQM addressing a specific artifact was ranked in 9th position. This result suggests that there are complex relationships between the features (in multi-site studies) that may not be captured by SVCs. When tested on unseen data, the RFC-nzs classifier yielded an area under the curve (AUC) score of ~0.695 and an accuracy of 73%. This performance falls within the range estimated previously with nested cross-validation. We could not compare these results with [9] since they did not test their resulting classifier on a held-out dataset. The performance drop between the nested cross-validated score (~89%) and the score obtained on the held-out data (~73%) may be explained by the interplay of several factors. First, we introduced an unplanned inter-rater bias, since the held-out dataset could not be rated by the same expert who rated the ABIDE dataset. This limitation could be reduced by calibrating the ratings of the held-out data, having the second expert rate a random subsample of the training dataset.
Second, the distribution of scanner vendors, models and corresponding images in the ABIDE dataset is not uniform. The use of a more uniform training dataset could potentially help generalize better to new datasets.
We used nested cross-validation to select the most predictive classifier, ensuring that the evaluation loop was unbiased by using a leave-one-site-out (LoSo) splitting strategy. In this cross-validation scheme, the accuracy is bounded below by that measured during the test validation loop. Therefore, the final classifier is ultimately trained using all the available data to push its predictive accuracy above the evaluated performance.
This quantitative assessment of quality is the central piece of the three-fold contribution of this paper. The first outcome of this study is the MRIQC toolbox, a set of open-source tools which compute quality features. Second, MRIQC generates interactive visual reports that allow further interpretation of the decisions made by the classifier. Finally, we propose the automated quality control tool described before to generate include/exclude decisions. The MRIQC toolbox is a fork of the Quality Assessment Protocol (QAP). Since MRIQC was started as a standalone project, the implementations of most of the IQMs have been revised, and some are supported with unit tests. Like QAP, MRIQC also implements a functional MRI (fMRI) workflow to extract IQMs and generate their corresponding visual reports. Some new IQMs have been added (for instance, the CJV, the features measuring the INU artifacts, or the rPVEs). The group and individual visual reports for structural and functional data are also new contributions of MRIQC with respect to the fork from QAP. The last diverging feature of MRIQC with respect to QAP is the cross-validation work and the release of the trained classifier.
MRIQC is one effort to standardize methodologies that make data-driven and objective QC decisions. Automated QC can provide unbiased exclusion criteria for neuroimaging studies, helping avoid “cherry-picking” of data. A second potential application is the use of automated QC predictions as data descriptors to support the recently created “data papers” track of many journals and public databases like OpenfMRI [31]. The ultimate goal of the proposed classifier is its inclusion in automatic QC workflows, before image processing and analysis. Ideally, by minimizing the run time of MRIQC, the extraction and classification process could be streamlined into the acquisition session, allowing for the immediate repetition of ruled-out scans. Integrating MRIQC in our research workflow allowed us to adjust reconstruction methodologies, tweak the instructions given to the participant during scanning, and minimize the time required to visually assess one image with the visual reports.
Conclusion
We propose MRIQC, a quality control software tool to assess structural MRI of the human brain. MRIQC generates visual reports to speed up the screening process, and a set of features which were used to train an automated decision tool. We trained a random forests classifier on the ABIDE dataset (N=1102), acquired at 17 scanning sites with diverse acquisition parameters. We utilized repeated-and-nested cross-validation, with a leave-one-site-out splitting strategy. This avoided hidden feature relationships leaking from the site under test to the training set, ensuring that the evaluated performance was agnostic to site and ultimately represented well the generalization to unseen data. The nested cross-validation evaluation yielded a ~89.4% (σ=±9.95%) accuracy. We double-checked this generalization by evaluating the performance of the classifier in a previously unseen dataset (N=265) unrelated to ABIDE. The performance on the held-out dataset was ~73% accuracy. This performance fell within the spread of the cross-validated evaluation. We release MRIQC open-source, along with the best performing classifier. The automatic QC of MRI scans and the implementation of tools to assist the visual assessment of individual images are in high demand for neuroimaging research.
Author contributions
OE led the development of MRIQC, implemented the cross-validation workflow, pre-registered the report, drafted the manuscript, ran the experiments and interpreted the results. KJG devised the machine learning approach to quality control, coordinated the project, contributed to MRIQC and the cross-validation workflow, pre-registered the report, and interpreted the results. MS rated the ABIDE dataset, and helped in understanding the problems of inter- and intra-rater variability. DB rated the ds030 dataset. OOK contributed to the design of the cross-validation workflow and interpreted the results. RAP devised and coordinated the project, advised in all aspects of MRIQC, the cross-validation workflow and the manuscript design, pre-registered the report and interpreted the results. All the authors have read and edited the manuscript.
Availability of MRIQC and the trained classifier
MRIQC is available under the BSD 3-clause license. Source code is publicly accessible through GitHub (https://github.com/poldracklab/mriqc). We provide four different installation options: 1) using the source code downloaded from the GitHub repository; 2) using the PyPi distribution system of Python; 3) using the poldracklab/mriqc Docker image; or 4) using BIDS-Apps [20]. For detailed information on installation and the user guide, please access http://mriqc.rtfd.io. A distributable version of the classifier is also released, trained on all the available data (including the full-ABIDE and the ds030 datasets).
Supporting Information
Table SI1 Image acquisition parameters
A table containing all the acquisition parameters is maintained in GitHub: https://github.com/oesteban/mriqc/blob/c9bdfa863ca47894d5cdcb605071a5088840afcc/mriqc/data/csv/scan_parameters.tsv.
Listing SI1 Running MRIQC
The BIDS standard makes MRIQC compatible with almost any input dataset without the need for custom settings. Since all the metadata associated with the dataset are found in bids-data/, the following example runs without further settings. The second positional argument, out/, indicates where the outputs will be written, and finally, the participant keyword instructs MRIQC to run the first level analysis as specified in BIDS Apps.
mriqc bids-data/ out/ participant
mriqc bids-data/ out/ participant --participant_label S001 S002
Listing SI2 Running MRIQC – Group Level
If the participant level was run setting some --participant_label, the group level is not triggered by default. It can be done manually, pointing the input data folder to the derivatives folder generated with the participant level analysis:
mriqc out/derivatives/ out/ group
Listing SI3 Predicting quality
Although the group level run will generate a CSV table with the quality label predicted for each sample, it is also possible to run the classifier individually:
mriqc_clf --load-classifier -X aMRIQC.csv -o mypredictions.csv
The default classifier can be replaced by a custom one using:
mriqc_clf --load-classifier my_custom_classifier.pklz -X aMRIQC.csv -o mypredictions.csv
The documentation website contains more detailed information on how to train custom classifiers, or generate refined results from prediction: http://mriqc.readthedocs.io/en/latest/classifier.html.
Figure SI1 Extended caption of Fig 1A
An example scan (top) is shown with severe motion artifacts. The reduced contrast between tissues and the ringing intensity waves in the anterior region of the brain in the presented slice suggest a large head movement occurred during acquisition. The green arrows point to signal spillover due to eye movements through the phase-encoding axis (in this case, right-to-left –RL–). Oftentimes, the RL or LR axes are selected for phase-encoding because the signal leakage from the eyeballs does not overlap with brain tissue, as opposed to selecting anterior-posterior directions. However, the red arrows point to signal spillover caused by vessel pulsations. Given the location of the vessel, in this case the signal leakage overlaps brain tissue and affects the quality of this image. The phase-encoding axis has less bandwidth and is thus more sensitive to movement; for that reason, it is generally assigned the shortest dimension of the field of view. A second example scan (bottom) shows severe coil artifacts.
Acknowledgments
This work was supported by the Laura and John Arnold Foundation. The authors want to thank the QAP developers (C. Craddock, S. Giavasis, D. Clark, Z. Shezhad, and J. Pellman) for the initial base of code which MRIQC was forked from, W. Triplett and CA. Moodie for their initial contributions with bugfixes and documentation, and J. Varada for his contributions to the source code. JM. Shine and PG. Bissett reviewed the first draft of this manuscript, and helped debug early versions of MRIQC. S. Bhogawar, J. Durnez, I. Eisenberg and JB. Wexler routinely use and help debug the tool.
Footnotes
↵* phd{at}oscaresteban.es
↵1 A measure is called “no-reference” when no ground-truth of the same image without degradation is available.
↵2 Available online: http://preprocessed-connectomes-project.org/quality-assessment-protocol/.
↵3 We distribute with MRIQC the script fs2gif which produces such animations. The animations used to evaluate the ds030 dataset are found here: https://drive.google.com/drive/u/1/folders/0BxI12kyv2olZTDhiUVVMc2FyRDg.