Communications in Information and Systems
Volume 21 (2021)
Inverted repeats in coronavirus SARS-CoV-2 genome and implications in evolution
Pages: 125 – 145
The coronavirus disease (COVID-19) pandemic, caused by the coronavirus SARS‑CoV‑2, has resulted in 60 million infections and 1.38 million fatalities. Genomic analysis of SARS‑CoV‑2 can provide insights into drug design and vaccine development for controlling the pandemic. Inverted repeats in a genome greatly impact the stability of the genome structure and regulate gene expression. They are involved in cellular evolution, genetic diversity, genome arrangements, and diseases. Here, we investigate the inverted repeats in the SARS‑CoV‑2 genome. We find that the SARS‑CoV‑2 genome has an abundance of inverted repeats, which are mainly located in the gene of the spike protein. This result suggests that the spike protein gene undergoes recombination events and is therefore essential for fast evolution. Comparison of the inverted repeat signatures in human and bat coronaviruses suggests that SARS‑CoV‑2 is most closely related to the SARS-related coronavirus SARSr‑CoV/RaTG13. The study also reveals that the recently identified SARS-related coronavirus SARSr‑CoV/RmYN02 has a high number of inverted repeats in the spike protein gene. In addition, this study demonstrates that the distribution of inverted repeats in a genome can be considered a genomic signature. This study highlights the significance of inverted repeats in the evolution of SARS‑CoV‑2 and presents inverted repeats as a genomic signature for genome analysis.
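To make the notion of an inverted repeat concrete, the following is a minimal Python sketch and not the method used in the paper, which the abstract does not describe. It assumes an inverted repeat is a short segment (an "arm") followed downstream, within a bounded gap, by its reverse complement, and that the genome is given as a cDNA string over A, C, G, T. The function names and the parameters `k` and `max_gap` are hypothetical choices made for illustration.

```python
from collections import defaultdict

# Complement table for a cDNA alphabet (assumption: T is used instead of U).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(genome, k=4, max_gap=10):
    """Return (left_start, right_start, arm) triples, 0-based.

    A hit means genome[left_start:left_start + k] == arm and the reverse
    complement of arm starts at right_start, with 0 <= gap <= max_gap
    bases between the two arms. Exact matches only (illustrative).
    """
    genome = genome.upper()
    # Index every k-mer by its start positions.
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)

    hits = []
    for i in range(len(genome) - k + 1):
        arm = genome[i:i + k]
        if set(arm) - set("ACGT"):
            continue  # skip arms containing ambiguous bases such as N
        for j in positions.get(reverse_complement(arm), []):
            gap = j - (i + k)
            if 0 <= gap <= max_gap:
                hits.append((i, j, arm))
    return hits


# Toy sequence: ACGA at position 2, its reverse complement TCGT at position 9.
print(find_inverted_repeats("TTACGAAAATCGTTT", k=4, max_gap=10))
# [(2, 9, 'ACGA')]
```

Counting such hits per gene or per sliding window would give a simple positional distribution of the kind the abstract refers to as a signature. A realistic analysis would likely also handle longer arms, mismatches, and overlapping hits, which this sketch omits.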
This research is partially supported by the National Natural Science Foundation of China (NSFC) grant (91746119, to S.S.-T. Yau), Tsinghua University Spring Breeze Fund (2020Z99CFY044, to S.S.-T. Yau), Tsinghua University start-up fund, and Tsinghua University Education Foundation fund (042202008, to S.S.-T. Yau).
Received 25 November 2020