Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations

Recently, a distinct phylogenetic cluster (named lineage B.1.1.7) was detected within the COG-UK surveillance dataset. This cluster has grown rapidly over the past 4 weeks and has since been observed in other UK locations, indicating further spread.

Several aspects of this cluster are noteworthy for epidemiological and biological reasons, and we report preliminary findings below. In summary:
• The B.1.1.7 lineage accounts for an increasing proportion of cases in parts of England. The number of B.1.1.7 cases, and the number of regions reporting B.1.1.7 infections, are growing.
• B.1.1.7 has an unusually large number of genetic changes, particularly in the spike protein.
• Three of these mutations have potential biological effects that have been described previously to varying extents:

  • Mutation N501Y affects one of six key contact residues within the receptor-binding domain (RBD) and has been identified as increasing binding affinity to human and murine ACE2.
  • The spike deletion 69-70del has been described in the context of evasion of the human immune response, but has also occurred a number of times in association with other RBD changes.
  • Mutation P681H is immediately adjacent to the furin cleavage site, a known location of biological significance.

The rapid growth of this lineage indicates the need for enhanced genomic and epidemiological surveillance worldwide, and for laboratory investigations of antigenicity and infectivity.


The two earliest sampled genomes belonging to lineage B.1.1.7 were collected on 20-Sept-2020 in Kent and on 21-Sept-2020 in Greater London. B.1.1.7 infections have continued to be detected in the UK through early December 2020. Genomes belonging to lineage B.1.1.7 form a monophyletic clade that is well supported by a large number of lineage-defining mutations (Figure 1). As of 15th December, there are 1623 genomes in the B.1.1.7 lineage. Of these, 519 were sampled in Greater London, 555 in Kent, 545 in other regions of the UK including both Scotland and Wales, and 4 in other countries.
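As a quick consistency check, the per-location counts above can be tallied in Python. The dictionary keys are our own shorthand labels, and each percentage is simply that location's count divided by the reported total of 1623.

```python
# Counts of B.1.1.7 genomes by sampling location as of 15 December 2020,
# copied from the text above (keys are shorthand labels, not official names).
counts = {
    "Greater London": 519,
    "Kent": 555,
    "Other UK regions (incl. Scotland and Wales)": 545,
    "Other countries": 4,
}

total = sum(counts.values())  # should equal the reported lineage total of 1623

for region, n in counts.items():
    share = 100 * n / total  # percentage of all B.1.1.7 genomes
    print(f"{region:<45} {n:>5} {share:5.1f}%")
```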

Figure 1 | Phylogenetic tree of the B.1.1.7 lineage and its nearest outgroup sequences, for samples collected up until 30-Nov-2020. Tips from the same location have been collapsed into circles whose area is proportional to the number of genomes represented. Three large subclades are evident within the B.1.1.7 lineage, each defined by one nucleotide change. One of these clades is defined by a further stop codon in ORF8.

Lineage-defining mutations & rate of evolution

The B.1.1.7 lineage carries a larger-than-usual number of virus genetic changes. The accrual of 14 lineage-specific amino acid replacements prior to its detection is, to date, unprecedented in the global virus genomic data for the COVID-19 pandemic. Most branches in the global phylogenetic tree of SARS-CoV-2 show no more than a few mutations, and mutations accumulate at a relatively consistent rate over time. Estimates suggest that circulating SARS-CoV-2 lineages accumulate nucleotide mutations at a rate of about 1-2 mutations per genome per month (Duchene et al. 2020).
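A monthly per-genome mutation count can be related to the per-site rates used in Figure 2 by multiplying by 12 and dividing by genome length. The sketch below assumes a genome length of 29,903 nt (the length of the Wuhan-Hu-1 reference genome); that constant and the helper name `per_site_per_year` are our assumptions, not values or code from this report.

```python
# Assumption: genome length taken as that of the Wuhan-Hu-1 reference (29,903 nt).
GENOME_LENGTH_NT = 29_903


def per_site_per_year(mutations_per_month: float, genome_length: int = GENOME_LENGTH_NT) -> float:
    """Convert genome-wide mutations per month into substitutions/site/year."""
    return mutations_per_month * 12 / genome_length


for m in (1, 2):
    print(f"{m} mutation(s)/month ~ {per_site_per_year(m):.1e} substitutions/site/year")
```

Under this assumption, 1-2 mutations per month corresponds to roughly 4.0E-4 to 8.0E-4 substitutions/site/year, a range that contains the 5.3E-4 and 5.6E-4 rates given in the Figure 2 caption.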

A preliminary analysis of these observations is provided in Figure 2, which shows a regression of root-to-tip genetic distances against genome sampling date, for lineage B.1.1.7 and for a selection of related outgroup genomes. The rate of molecular evolution within lineage B.1.1.7 is similar to that of other related lineages. However, lineage B.1.1.7 is more divergent from the phylogenetic root of the pandemic, indicating a higher rate of molecular evolution on the phylogenetic branch immediately ancestral to B.1.1.7. Further, the inferred changes on this branch are predominantly amino acid-altering (14 non-synonymous mutations and 3 deletions), compared with only 6 synonymous mutations. This is suggestive of a process involving adaptive molecular evolution, although a role for increased fixation rates through relaxed selective constraint cannot currently be ruled out.
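Whether an inferred nucleotide change is synonymous or amino acid-altering depends on the codon in which it falls. The Python sketch below classifies a single codon change using the standard genetic code (NCBI translation table 1). The function name `classify` is hypothetical and the codon pairs in the usage lines are illustrative. A raw tally of this kind is not a normalised dN/dS estimate, which would also need to account for the numbers of synonymous and non-synonymous sites.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as synonymous, non-synonymous or nonsense (new stop)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "non-synonymous"


print(classify("AAT", "TAT"))  # non-synonymous (Asn -> Tyr, an N-to-Y change)
print(classify("CCT", "CAT"))  # non-synonymous (Pro -> His, a P-to-H change)
print(classify("TTT", "TTC"))  # synonymous (Phe -> Phe)
print(classify("CAA", "TAA"))  # nonsense (Gln -> stop)
```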

Figure 2 | Regression of root-to-tip genetic distances against sampling dates, for sequences belonging to lineage B.1.1.7 (blue) and those in its immediate outgroup in the global phylogenetic tree (brown). The regression lines are fitted to the two sets independently. The regression gradient is an estimate of the rate of sequence evolution. These rates are 5.6E-4 and 5.3E-4 nucleotide changes/site/year for the B.1.1.7 and outgroup data sets, respectively.
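The rates above come from regressing root-to-tip distance on sampling date: the slope estimates the evolutionary rate, and the x-intercept estimates the date of the root. The report does not specify its exact fitting procedure, so the sketch below assumes a plain ordinary least-squares fit. The helper names and the noise-free example data are hypothetical and do not represent real B.1.1.7 genomes.

```python
from datetime import date


def decimal_year(d: date) -> float:
    """Convert a calendar date to a decimal year, e.g. 2020-01-01 -> 2020.0."""
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days
    return d.year + (d - start).days / days_in_year


def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of root-to-tip distance against sampling date.

    Returns (rate, root_year): the slope in substitutions/site/year and the
    x-intercept, i.e. the date at which the fitted line reaches zero distance.
    """
    xs = [decimal_year(d) for d in dates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(distances) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, distances))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    rate = sxy / sxx
    intercept = mean_y - rate * mean_x
    return rate, -intercept / rate


# Hypothetical, noise-free data for illustration only (not real genomes).
sample_dates = [date(2020, 9, 20), date(2020, 10, 15), date(2020, 11, 10), date(2020, 11, 30)]
true_rate, true_root = 5.6e-4, 2019.9
distances = [true_rate * (decimal_year(d) - true_root) for d in sample_dates]

rate, root_year = root_to_tip_regression(sample_dates, distances)
print(f"rate = {rate:.1e} substitutions/site/year, root ~ {root_year:.2f}")
```

With real data the points scatter around the line, and fitting B.1.1.7 and its outgroup separately, as in Figure 2, gives one slope per set.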

What evolutionary processes or selective pressures might have given rise to lineage B.1.1.7?
High rates of mutation accumulation over short time periods have been reported previously in studies of immunodeficient or immunosuppressed patients who are chronically infected with SARS-CoV-2 (Choi et al. 2020; Avanzato et al. 2020; Kemp et al. 2020). These infections exhibit detectable SARS-CoV-2 RNA for 2-4 months or longer (although there are also reports of long infections in some immunocompetent individuals). The patients are treated with convalescent plasma (sometimes more than once) and usually also with the drug remdesivir. Virus genome sequencing of these infections reveals unusually large numbers of nucleotide changes and deletion mutations and often high ratios of non-synonymous to synonymous changes. Convalescent plasma is often given when patient viral loads are high, and Kemp et al. (2020) report that intra-patient virus genetic diversity increased after plasma treatment was given.

Under such circumstances, the evolutionary dynamics of, and selective pressures upon, the intra-patient virus population are expected to be very different from those in a typical infection. First, selection from natural immune responses in immunodeficient or immunosuppressed patients will be weak or absent. Second, selection arising from antibody therapy may be strong owing to high antibody concentrations. Third, if antibody therapy is administered after many weeks of chronic infection, the virus population may be unusually large and genetically diverse when antibody-mediated selective pressure is applied, creating suitable circumstances for the rapid fixation of multiple virus genetic changes through direct selection and genetic hitchhiking.

These considerations lead us to hypothesise that the unusual genetic divergence of lineage B.1.1.7 may have resulted, at least in part, from virus evolution within a chronically infected individual. Although such infections are rare, and onward transmission from them presumably rarer still, such an origin is not improbable given the large number of ongoing new infections.

Although we speculate here that chronic infection played a role in the origins of the B.1.1.7 variant, this remains a hypothesis and we cannot yet infer the precise nature of this event.

Potential biological significance of mutations

Table 1 provides details of the B.1.1.7 lineage-specific non-synonymous mutations and deletions. We note that many occur in the virus spike protein. These include N501Y at spike position 501, one of the key contact residues in the receptor-binding domain (RBD); experimental data suggest that N501Y can increase ACE2 receptor affinity (Starr et al. 2020). They also include P681H, at one of the 4 residues comprising the insertion that creates a furin cleavage site between S1 and S2 in spike. The S1/S2 furin cleavage site of SARS-CoV-2 is not found in closely related coronaviruses and has been shown to promote entry into respiratory epithelial cells and transmission in animal models (Hoffmann, Kleine-Weber, and Pöhlmann 2020; Peacock et al. 2020; Zhu et al. 2020). N501Y has been associated with increased infectivity and virulence in a mouse model (Gu et al. 2020). N501Y and P681H have each been observed independently before, but not, to our knowledge, in combination.

Also present is the deletion of two amino acids at sites 69-70 in spike. This mutation is one of a number of recurrent deletions observed in the N-terminal domain of spike (McCarthy et al. 2020; Kemp et al. 2020) and has been seen in multiple lineages linked to several RBD mutations. For example, it arose in the mink-associated outbreak in Denmark on the background of the Y453F RBD mutation, and in humans in association with the N439K RBD mutation; these occurrences account for its relatively high frequency in the global genome data (~3000 sequences).
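The mutation labels used throughout follow a compact convention: N501Y means the residue N (asparagine) at position 501 is replaced by Y (tyrosine), and 69-70del means residues 69 to 70 are deleted. The hypothetical parser below handles only these two label forms (it is not a full HGVS nomenclature implementation) and confirms that 69-70del removes two residues.

```python
import re

# Substitutions such as "N501Y": reference residue, position, new residue ("*" = stop).
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z*])")
# Deletions written as a range ("69-70del") or as a single position followed by "del".
DELETION = re.compile(r"(\d+)(?:-(\d+))?del")


def parse_mutation(label: str) -> dict:
    """Split an amino acid mutation label into its components (hypothetical helper)."""
    if m := SUBSTITUTION.fullmatch(label):
        ref, pos, alt = m.groups()
        return {"kind": "substitution", "ref": ref, "position": int(pos), "alt": alt}
    if m := DELETION.fullmatch(label):
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else start
        return {"kind": "deletion", "start": start, "end": end, "length": end - start + 1}
    raise ValueError(f"unrecognised mutation label: {label!r}")


# Spike mutation labels mentioned in this section.
for label in ["N501Y", "P681H", "69-70del", "Y453F", "N439K"]:
    print(label, parse_mutation(label))
```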

