Abstract
With the advance of next-generation sequencing technologies, non-invasive prenatal testing (NIPT) has been developed and employed to screen for fetal aneuploidies, namely trisomies 13, 18, and 21, by detecting cell-free fetal DNA (cffDNA) in maternal blood. Although the Z test is widely used in NIPT today, its accuracy still needs to be improved in order to reduce (a) false negatives and false positives and (b) the proportion of unclassified samples, thereby lowering both the potential harm these inaccuracies cause patients and the cost of retesting.
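The conventional Z test scores a sample by comparing its fraction of sequencing reads that map to a target chromosome with the mean and standard deviation of that fraction across euploid reference samples. The Python sketch below illustrates this under stated assumptions: the read fractions are made up, and the cut-offs (Z ≥ 3 positive, 2.5 ≤ Z < 3 unclassified) are hypothetical values chosen for illustration, not thresholds taken from this study.

```python
from statistics import mean, stdev

# Hypothetical cut-offs, assumed for illustration only.
POSITIVE_CUTOFF = 3.0
GRAY_ZONE_LOW = 2.5


def z_score(sample_fraction, reference_fractions):
    """Z score of a sample's chromosome read fraction against euploid references."""
    mu = mean(reference_fractions)
    sigma = stdev(reference_fractions)  # sample standard deviation (n - 1)
    return (sample_fraction - mu) / sigma


def classify_one_z(z):
    """One-Z-value call: positive, unclassified (gray zone), or negative."""
    if z >= POSITIVE_CUTOFF:
        return "positive"
    if z >= GRAY_ZONE_LOW:
        return "unclassified"
    return "negative"


# Made-up chr21 read fractions (%) from euploid reference pregnancies.
reference_chr21 = [1.30, 1.31, 1.29, 1.30, 1.32, 1.28, 1.30, 1.31]

for fraction in (1.305, 1.338, 1.370):
    z = z_score(fraction, reference_chr21)
    print(f"{fraction:.3f} -> Z = {z:.2f}, {classify_one_z(z)}")
```

A single threshold on one Z value leaves a gray zone in which samples cannot be called, which is the "unclassified" category the abstract refers to.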
Combining multiple Z tests with a machine-learning algorithm could provide better predictions from NIPT data. Features comprising the multiple Z values together with clinical-sign and quality-control indexes were collected from samples of known outcome and scaled to train a support vector machine (SVM) classifier. The trained model was then applied to predict unknown samples and showed a significant improvement over the one-Z-value-based method. In 4752 qualified NIPT samples, our method reached 100% accuracy on all three chromosomes, including 151 samples that the one-Z-value-based method had left unclassified. Moreover, four false positives and four false negatives were corrected by this machine-learning model.
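The training-and-prediction workflow described above can be sketched in Python under several assumptions. The feature layout (two Z values plus fetal fraction) and all values are hypothetical; features are min-max scaled with bounds fitted on the training samples only; and the SVM is trained with kernelized Pegasos, a stochastic subgradient solver for the hinge-loss SVM objective without a bias term, using an RBF kernel. Pegasos is swapped in only so the sketch runs with the standard library; it is not the authors' implementation, and a production pipeline would more likely rely on an established solver such as LIBSVM.

```python
import math
import random

# Hypothetical feature layout (an assumption, not the paper's feature set):
# [Z value from method A, Z value from method B, fetal fraction in %].
TRAIN_X = [
    [0.0, 0.5, 8.0], [0.5, -0.5, 12.0], [-0.5, 0.0, 16.0], [1.0, 1.0, 10.0], [-1.0, -1.0, 14.0],
    [7.0, 6.5, 8.0], [6.5, 7.5, 12.0], [7.5, 7.0, 16.0], [6.0, 6.0, 10.0], [8.0, 8.0, 14.0],
]
TRAIN_Y = [-1] * 5 + [1] * 5  # -1 = euploid, +1 = trisomy (made-up labels)


def fit_min_max(rows):
    """Per-feature (min, max), fitted on the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale(row, bounds):
    """Min-max scale one feature vector; training values land in [0, 1]."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]


def rbf(a, b, gamma):
    """Radial basis function kernel: exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def train_kernel_pegasos(xs, ys, lam=0.1, epochs=200, gamma=1.0, seed=0):
    """Kernelized Pegasos (hinge loss, no bias term); returns update counts alpha."""
    rng = random.Random(seed)
    alpha = [0] * len(xs)
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(xs)), len(xs)):  # one shuffled pass
            t += 1
            s = sum(alpha[j] * ys[j] * rbf(xs[j], xs[i], gamma) for j in range(len(xs)) if alpha[j])
            if ys[i] * s / (lam * t) < 1:  # margin violated: add this sample's support
                alpha[i] += 1
    return alpha


def predict(x, xs, ys, alpha, gamma=1.0):
    """+1 (trisomy) or -1 (euploid) from the sign of the kernel expansion."""
    score = sum(a * y * rbf(xj, x, gamma) for a, y, xj in zip(alpha, ys, xs) if a)
    return 1 if score > 0 else -1


bounds = fit_min_max(TRAIN_X)
scaled_train = [scale(row, bounds) for row in TRAIN_X]
alpha = train_kernel_pegasos(scaled_train, TRAIN_Y)

# "Unknown" samples are scaled with the training bounds before prediction.
for sample in ([0.0, 0.0, 12.0], [7.0, 7.0, 12.0]):
    print(sample, predict(scale(sample, bounds), scaled_train, TRAIN_Y, alpha))
```

Reusing the training bounds for unknown samples keeps every feature on a comparable range, which matters for an RBF kernel because distances would otherwise be dominated by whichever feature has the largest raw spread.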
To our knowledge, this is the first study to employ a support vector machine in NIPT data analysis. We expect this method to replace the current one-Z-value-based NIPT analysis in clinical use.
List of Abbreviations
- NGS: Next-Generation Sequencing
- NIPT: Non-Invasive Prenatal Testing
- SVM: Support Vector Machine
- RBF: Radial Basis Function
- CFDA: China Food and Drug Administration
- DNA: Deoxyribonucleic Acid
- LDA: Linear Discriminant Analysis
- QDA: Quadratic Discriminant Analysis
- QC: Quality Control