bioRxiv
New Results

Atria: An Ultra-fast and Accurate Trimmer for Adapter and Quality Trimming

Jiacheng Chuan, Aiguo Zhou, Lawrence Richard Hale, Miao He, Xiang Li
doi: https://doi.org/10.1101/2021.09.07.459340
Jiacheng Chuan
1 Canadian Food Inspection Agency, Charlottetown, PE C1A5T1, Canada
2 Department of Biology, University of Prince Edward Island, Charlottetown, PE C1A4P3, Canada
Aiguo Zhou
1 Canadian Food Inspection Agency, Charlottetown, PE C1A5T1, Canada
3 Guangdong Laboratory for Lingnan Modern Agriculture, South China Agricultural University, Guangzhou 510642, China
Lawrence Richard Hale
2 Department of Biology, University of Prince Edward Island, Charlottetown, PE C1A4P3, Canada
Miao He
4 School of Life Sciences, Sun Yat-sen University, Guangzhou, Guangdong, 510275, China
Xiang Li
1 Canadian Food Inspection Agency, Charlottetown, PE C1A5T1, Canada
  • For correspondence: sean.li@inspection.gc.ca

Abstract

Background As next-generation sequencing (NGS) takes a dominant role in output capacity and sequence length, adapters attached to reads and low-quality bases degrade downstream analysis both directly and indirectly, for example by producing false-positive single nucleotide polymorphisms (SNPs) and fragmented assemblies. A fast trimming algorithm is needed to remove adapters precisely, especially in read tails of relatively low quality.

Findings We present a trimming program named Atria. Atria matches adapters in paired reads and finds possible overlapping regions using a fast, carefully designed byte-based matching algorithm (O(n) time, O(1) space). Atria also supports multi-threading in both sequence processing and file compression, and handles single-end reads.
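
The abstract does not spell out the matching algorithm, so the following is only a minimal sketch of one way a byte-based matcher with O(n) time and O(1) extra space could work, not Atria's actual implementation. It assumes a one-hot 4-bit code per base (so 16 bases would fit in one UInt64), and the names `CODE`, `pack`, `count_matches`, and `best_adapter_hit` are hypothetical. Two packed words are compared with a single bitwise AND followed by a bit count, and the read window is updated by shifting in one base at a time.

```python
# Assumed one-hot 4-bit codes: two bases match when their codes share a set bit.
# N (and any unknown symbol) is encoded as 0, so it never matches anything.
CODE = {"A": 0b0001, "C": 0b0010, "G": 0b0100, "T": 0b1000, "N": 0b0000}


def pack(seq: str) -> int:
    """Pack a DNA string into an integer, 4 bits per base, first base in the lowest bits."""
    word = 0
    for i, base in enumerate(seq):
        word |= CODE.get(base, 0) << (4 * i)
    return word


def count_matches(a: int, b: int) -> int:
    """Count positions at which two packed words carry the same base."""
    return bin(a & b).count("1")


def best_adapter_hit(read: str, adapter: str, k: int = 16):
    """Slide the first k adapter bases along the read.

    Returns (position, matches) for the first window with the highest
    match count, or (-1, 0) if either sequence is empty.
    """
    k = min(k, len(adapter), len(read))
    if k == 0:
        return -1, 0
    probe = pack(adapter[:k])
    window = pack(read[:k])
    best_pos, best_score = 0, count_matches(window, probe)
    top = 4 * (k - 1)  # bit offset of the newest base in the window
    for end in range(k, len(read)):
        # Drop the oldest base (lowest 4 bits) and append the new one on top.
        window = (window >> 4) | (CODE.get(read[end], 0) << top)
        score = count_matches(window, probe)
        if score > best_score:
            best_pos, best_score = end - k + 1, score
    return best_pos, best_score


if __name__ == "__main__":
    # Illustrative sequences only: 10 insert bases, 16 adapter bases, 4 trailing bases.
    read = "ACGTACGTAC" + "AGATCGGAAGAGCACA" + "TTTT"
    adapter = "AGATCGGAAGAGCACACGTC"
    print(best_adapter_hit(read, adapter))  # (10, 16)
```

Because the window is a fixed-width word that is updated in place, each step does a constant amount of work and no per-position buffers are allocated. A real trimmer would additionally apply a match threshold and consult base qualities before cutting the read.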

Conclusions Atria compares favorably with other cutting-edge trimmers in trimming and runtime benchmarks on both simulated and real data. We also provide an ultra-fast and lightweight byte-based matching algorithm that can be used in a broad range of short-sequence matching applications, such as primer search and seed scanning before alignment.

Availability & Implementation The Atria executables, source code, and benchmark scripts are available at https://github.com/cihga39871/Atria under the MIT license.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • E-mail: jiacheng.chuan@inspection.gc.ca (Chuan J)
    aiguozhou@scau.edu.cn (Zhou A)
    lhale@upei.ca (Hale L)
    lsshem@mail.sysu.edu.cn (He M)
    sean.li@inspection.gc.ca (Li X)

  • Research Area: Software and Workflows

  • Abbreviations

    CPU: Central processing unit
    DNA: Deoxyribonucleic acid
    GB: Gigabyte
    MCC: Matthews correlation coefficient
    NGS: Next-generation sequencing
    PPV: Positive predictive value
    RAM: Random-access memory
    RNA: Ribonucleic acid
    SNP: Single nucleotide polymorphism
    SSD: Solid-state drive
    TB: Terabyte
    UInt: Unsigned integer
    UInt64: Unsigned 64-bit integer
    WGS: Whole-genome sequencing
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
    Posted September 09, 2021.