First Resolution of the Complete Telomere-to-Telomere Genome of Lettuce

Lettuce (Lactuca sativa L.) is an annual plant of the genus Lactuca in the family Asteraceae. It is widely sold as a fresh-cut vegetable and is one of the most popular salad ingredients. In 2021, the global output value of lettuce reached 16.6 billion US dollars, with China, the United States and Western Europe as the main producers. However, long-term domestication has narrowed the genetic diversity of cultivated lettuce, making it vulnerable to various abiotic and biotic stresses. Molecular breeding of lettuce therefore aims mainly to improve yield, quality, and resistance to diseases and pests, and it relies heavily on genetic and genomic resources such as molecular markers and reference genomes. In 2017, Michelmore's team at the University of California, Davis, published the first lettuce genome (Salinas), and in 2022 it released the improved reference genome v11. In 2023, the Beijing Academy of Agriculture and Forestry Sciences assembled the genome of stem lettuce (L. sativa var. Augustana). Although these assemblies have greatly facilitated the study of lettuce genetics, they remain highly fragmented and incomplete: they contain hundreds of gaps, and important regions such as centromeres, ribosomal DNA and telomeres have not been reported. This leaves bottlenecks for functional genomics, gene cloning and molecular design breeding in lettuce.

On June 26, 2024, Li Guo's research team at the Peking University Institute of Advanced Agricultural Sciences published a research paper entitled "The complete telomere-to-telomere genome assembly of lettuce" in Plant Communications. The paper reports the first complete, gap-free telomere-to-telomere (T2T) genome sequence of lettuce (2n = 18), 2.59 Gb in size. It reveals the highly complex structure of the lettuce genome and the repeat composition of its centromeres, and it describes the three-dimensional genome conformation and epigenetic features of lettuce for the first time, offering insight into the genome complexity of higher plants. The study also systematically predicted the nucleotide-binding site leucine-rich repeat (NLR) family of disease resistance genes in lettuce and analyzed their expression during gray mold infection, providing new clues to the mechanisms of disease resistance in lettuce.

The complete genome of lettuce reveals the centromere structure, epigenetics and disease resistance gene landscape of the NLR family.

Figure 1. Genomic and epigenomic landscapes of the complete telomere-to-telomere genome assembly of lettuce. (Wang et al., 2024)

This study used a highly pure Romaine lettuce variety, PKU06, to generate 112.4× coverage of PacBio high-fidelity (HiFi) long reads, 42.9× coverage of Oxford Nanopore Technology (ONT) ultra-long reads, and 118.8× coverage of Hi-C reads for assembling the complete lettuce genome. After the initial assembly of the HiFi and ONT reads and scaffolding with the Hi-C data, only two gaps remained in the genome. These two gaps were filled with the original ONT ultra-long reads, after which the nucleolus organizer region (NOR) sequences were assembled and the whole genome was polished, yielding a complete assembly of all nine chromosomes. The final genome is 2.59 Gb in size with a contig N50 of 320.7 Mb, and it fills 384 gaps present in the Salinas version (mostly repetitive and centromeric sequences), a significant improvement in the quality of the lettuce genome assembly. Quality assessment gave the lettuce T2T genome a QV of 58, indicating high base accuracy. The study annotated a large number of repetitive elements in the lettuce genome (81.4% of the total genome, mainly transposons), predicted 45,507 protein-coding genes based on the full-length transcriptome, and annotated their functions. Whole-genome prediction further showed that the lettuce genome encodes 514 members of the disease resistance-related NLR gene family, 4 of which are located in newly assembled sequence. In addition, transcriptome analysis found that 58 NLR genes were significantly upregulated during gray mold infection, including 36 genes encoding TIR-NB-ARC (-LRR) domains. This suggests that this type of NLR gene may have important disease resistance functions, which will need further study.
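To make the two assembly quality figures above more concrete, the following is a minimal Python sketch of how a contig N50 and a Phred-scaled quality value (QV) are conventionally defined. The function names and the toy contig lengths are illustrative assumptions, not taken from the paper, and the sketch does not reproduce whatever tools the study used to compute these metrics.

```python
import math
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L together cover
    at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-empty input


def qv_to_error_rate(qv: float) -> float:
    """Standard Phred relation: QV = -10 * log10(per-base error probability)."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Inverse of qv_to_error_rate."""
    return -10 * math.log10(error_rate)


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print("toy N50:", n50(toy_contigs))

# Per-base error probability implied by the reported QV of 58.
print("error rate at QV 58:", qv_to_error_rate(58))
```

Under this standard definition, a QV of 58 corresponds to an error probability of about 1.6 × 10^-6 per base, which for a 2.59 Gb assembly is on the order of four thousand expected base errors genome-wide.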

Three-dimensional genome structure is an important factor affecting plant gene expression and function. This study used high-coverage Hi-C data to examine the three-dimensional structure of the lettuce genome for the first time and modeled the spatial folding of the lettuce chromosomes. Further analysis revealed that the lettuce genome has clear topologically associating domains (TADs) and A/B compartments. Interestingly, switching between the A and B compartments is infrequent, and the centromeres are mostly located in the B compartment, which may be related to their heterochromatic nature. In addition, the A compartment has a higher gene density and a lower transposable element density. ChIP-seq analysis of histone modifications showed that H3K4me3 and H3K27me3, which mark transcriptional activation and repression of genes, respectively, are enriched in the A compartment, while the B compartment is enriched with heterochromatin marks such as H3K9me2. These results show that the three-dimensional genome structure of lettuce resembles that of most plants studied so far, but also has distinctive features.
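A/B compartments are commonly called from the sign of the leading eigenvector of the correlation matrix of a distance-normalized Hi-C contact map, with the sign oriented by a feature such as gene density. The sketch below illustrates that general idea in plain Python on a toy matrix. It is not the pipeline used in the study: the contact values, gene densities and function names are hypothetical, and the toy matrix is assumed to be already normalized for genomic distance.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long numeric rows."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def leading_eigenvector(matrix, iterations=100):
    """Power iteration; the start vector is deliberately non-uniform."""
    n = len(matrix)
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec


def call_compartments(contacts, gene_density):
    """Label each bin 'A' or 'B' from the sign of the first eigenvector,
    oriented so that 'A' is the group with the higher mean gene density."""
    n = len(contacts)
    corr = [[pearson(contacts[i], contacts[j]) for j in range(n)]
            for i in range(n)]
    pc1 = leading_eigenvector(corr)
    pos = [gene_density[i] for i in range(n) if pc1[i] > 0]
    neg = [gene_density[i] for i in range(n) if pc1[i] <= 0]
    if pos and neg and sum(pos) / len(pos) < sum(neg) / len(neg):
        pc1 = [-v for v in pc1]
    return ["A" if v > 0 else "B" for v in pc1]


# Toy data: six bins with a plaid contact pattern (hypothetical values).
truth = "AABBAB"
contacts = [[10.0 if truth[i] == truth[j] else 2.0 for j in range(6)]
            for i in range(6)]
gene_density = [30, 28, 5, 6, 27, 4]  # hypothetical genes per bin
compartments = call_compartments(contacts, gene_density)
print("".join(compartments))
```

The sign of an eigenvector is arbitrary, which is why the sketch orients it with gene density, in line with the observation above that the A compartment is the gene-rich one.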

The centromere is an important functional region of the genome that determines whether chromosomes segregate correctly during cell division. The study of centromeres and of CENH3, the histone that binds them specifically, is therefore of great significance for understanding genome evolution and for haploid induction breeding and genome synthesis engineering. To identify the centromere sequences in the complete lettuce genome, this study performed ChIP-seq experiments with a CENH3 antibody and analyzed the CENH3-bound sequences to delimit the centromeric regions. The average centromere length is 3.425 Mb. The repeat content of lettuce centromeres is very complex, consisting of a mixture of Gypsy (56.6%), Copia (13.1%) and satellite (16.3%) sequences. Lettuce centromeres have a clear higher-order repeat structure, composed mainly of 62-bp satellite DNA monomers and some other short repeat sequences. Centromeric Gypsy elements derive mainly from members of the Tekay, Angela and centromeric retrotransposons of maize (CRM) subfamilies. Among these, CRM sequences have undergone specific, rapid expansion and evolution in the centromeres and differ significantly from repetitive sequences outside the centromeres. ChIP-seq signal enrichment analysis found that CENH3 binds preferentially to centromeric Gypsy elements and satellite sequences, indicating the important role of these two types of repetitive sequences in centromere function.
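As a rough illustration of what a 62-bp satellite monomer means in sequence terms, the Python sketch below estimates the repeat period of a tandem array by comparing the sequence with shifted copies of itself and picking the smallest shift with the highest identity. This is a simplified stand-in and not the repeat-annotation method of the study; the random monomer, the array and the function names are hypothetical.

```python
import random


def shift_identity(seq: str, period: int) -> float:
    """Fraction of positions whose base equals the base `period` positions downstream."""
    compared = len(seq) - period
    matches = sum(1 for i in range(compared) if seq[i] == seq[i + period])
    return matches / compared


def estimate_monomer_period(seq: str, max_period: int) -> int:
    """Smallest shift in 1..max_period with the highest self-identity.

    max() returns the first maximal item, so multiples of the true
    period (which score equally in a perfect array) are not selected.
    """
    if max_period >= len(seq):
        raise ValueError("max_period must be shorter than the sequence")
    return max(range(1, max_period + 1),
               key=lambda period: shift_identity(seq, period))


# Hypothetical satellite array: a random 62-bp monomer repeated 20 times.
rng = random.Random(0)
monomer = "".join(rng.choice("ACGT") for _ in range(62))
array = monomer * 20
print(estimate_monomer_period(array, max_period=200))
```

Real satellite arrays contain diverged monomers and nested higher-order repeats, so identity scores at multiples of the monomer length differ from one another and a dedicated tandem-repeat tool would normally be used instead of this toy scan.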

This study deciphered the complete genome sequence of lettuce for the first time and described its three-dimensional genome structure together with the complex structure and epigenetic landscape of its centromeres, providing an important resource for accelerating lettuce research and genetic improvement.

Reference

  1. Wang, K., et al. The complete telomere-to-telomere genome assembly of lettuce. Plant Commun. 2024 Jun 27: 10101.