Abstract
European hazelnut (Corylus avellana L.) is an economically important tree crop worldwide, particularly in northern Turkey, where the majority of production takes place. Hazelnut production is currently challenged by environmental stresses such as a recent outbreak of severe powdery mildew disease; furthermore, allergy to hazelnuts is an increasing health concern in some regions.
To provide a foundation for using the available hazelnut genetic resources in crop improvement, we produced the first fully assembled genome sequence and annotation for a hazelnut species, from Corylus avellana cv. ‘Tombul’, one of the most important Turkish varieties. A hybrid sequencing strategy combining short reads, long reads and proximity ligation methods enabled us to resolve heterozygous regions and produce a high-quality 370 Mb assembly that agrees closely with cytogenetic studies and genetic maps of the 11 C. avellana chromosomes and covers 97.8% of the estimated genome size. The genome includes 28,409 high-confidence protein-coding genes, over 20,000 of which were functionally annotated based on homology to known plant proteins. We focused particularly on the gene families encoding hazelnut allergens and the MLO proteins, which are an important susceptibility factor for powdery mildew. The complete assembly enabled us to differentiate between members of these families and to identify novel homologs that may be important in mildew disease and hazelnut allergy. These findings illustrate how the genome can be used to guide research and develop effective strategies for crop improvement in C. avellana.