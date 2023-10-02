Not long after the first M. tuberculosis genome sequence was assembled, additional studies revealed the global diversity of M. tuberculosis genotypes. Earlier methods of typing M. tuberculosis strains had achieved some success by exploiting variation in the number or position of repetitive genomic elements. These methods included IS6110-RFLP (restriction fragment length polymorphism), spoligotyping (spacer oligonucleotide typing, now recognized to target spacers in the CRISPR array), and MIRU-VNTR (mycobacterial interspersed repetitive unit variable number tandem repeat) typing. However, they were limited by the low abundance and occasional non-uniqueness of these markers (40, 41). Indeed, attempts to predict clinical strain characteristics from IS6110 copy number proved of limited value, as the so-called low-copy and high-copy groups were later shown to be polyphyletic (42, 43). In contrast, comparing single-nucleotide polymorphisms (SNPs) across the genome allowed unambiguous, higher-resolution assignment of M. tuberculosis strains to lineage clusters (Figure 1) (41–43). Strikingly, the broad lineages of M. tuberculosis also clustered by the birthplaces of the patients they infected. This suggested that the M. tuberculosis phylogeny, and indeed the M. tuberculosis genome itself, had captured geographic information (43, 44). Comparisons of M. tuberculosis strains from diverse lineages revealed a paucity of large genomic deletions and rearrangements. This was evidence of a genome that is exceptionally stable by bacterial standards and that has preserved historical information (44). By the turn of the century, the M. tuberculosis genome was recognized as a powerful spatiotemporal resource for understanding the epidemiology of the disease.

Figure 1 Phylogeny of M. tuberculosis lineage strains. Simplified maximum-likelihood phylogeny of the nine lineages of M. tuberculosis, together with the related M. bovis and the M. canettii outgroup used to root the tree. Adapted with permission from Microbial Genomics (60).

M. tuberculosis has been classified into nine lineages, mainly on the basis of large-scale genomic variation, variably described as regions of difference (RDs) or large sequence polymorphisms (LSPs) (45, 46). Lineages 2–4 (L2–L4) constitute a monophyletic group defined by the TbD1 deletion. This deletion eliminates the membrane proteins MmpS6 and MmpL6 and appears to confer enhanced resistance to oxidative and hypoxic stress (47, 48). Collectively termed the "modern" lineages, L2–L4 cause the majority of TB epidemics worldwide and hence most of the global TB disease burden (Figure 2) (49). Among these, L4 has the broadest range, spanning Africa, Asia, Europe, and the Americas. L2 causes the largest proportion of TB cases in East Asia and accounts for some cases in Central Asia. It also notably includes the hypervirulent Beijing strains. L3 occurs mainly in India, with an additional presence in East Africa. Among the remaining "ancestral" lineages, L1 has the widest distribution and occurs predominantly in Southeast Asia and India. L5–L9 appear to be restricted to Africa, and L5–L6 (classically dubbed M. africanum) cause up to 40%–60% of TB cases in West Africa (50). Notably, the animal-adapted lineages of the broader M. tuberculosis complex, including M. bovis, share ancestry with L6 (51). Beyond these overarching categories, the modern lineages can be divided into sublineages that also correlate with human population geography (52, 53). These sublineages can be further classified as "generalist" or "specialist." Generalist sublineages have a broader global distribution and greater variability in T cell epitopes than the more geographically confined specialist sublineages (53). Consistent with these observations, recent evidence points to preferentially sympatric transmission of specialist sublineages, that is, transmission among hosts from the same region as the strain. This suggests that specialist strains have adapted to the genetics of human hosts in their endemic regions (54).
Recent methods go beyond RDs, incorporating SNPs into a highly granular sublineage classification scheme (55, 56).

Figure 2 Cartogram of global TB burden by M. tuberculosis lineage. Country areas are scaled to reflect TB incidence in 2021 (1) using the go-cart.io algorithm (172). Pie charts show the distribution of M. tuberculosis lineages L1–L9 among clinical isolates by geographic region, as previously described by Napier et al. (56). The charts also include the animal-adapted M. bovis, M. caprae, and M. orygis.

Researchers have drawn on the diversity of modern M. tuberculosis genomes to infer the origin of human-adapted TB disease and to trace its evolution alongside changing human migration and behavior. Mycobacterium canettii was discovered in the Republic of Djibouti and subsequently characterized genomically as an M. tuberculosis ancestor. This placed the origin of ancient TB in the Horn of Africa (57–59). Clinically, M. canettii strains are of relatively low virulence, and their genomes generally lack the RDs/LSPs that define lineages L1–L9 (46, 60). Further support for an East African emergence came in 2012 with the identification in Ethiopia of L7, a new ancestral lineage with very deep phylogenetic branching. L7 appears limited to residents of, and immigrants from, the region (61). In the past few years, two more lineages with similarly restricted East African distributions (L8 and L9) have been characterized (60, 62). Although horizontal gene transfer is not believed to occur in modern M. tuberculosis, the ancestral genome contains a mosaic of genetic material, likely acquired from nonpathogenic bacteria (59). Earlier sequencing of large M. tuberculosis genomic regions found a low rate of silent nucleotide mutations relative to other human pathogens. This suggested that a population bottleneck accompanied the adaptation of M. tuberculosis to a parasitic lifestyle (63). Collectively, these findings suggest a model in which environmental bacteria supplied genomic material to what would become the obligate human pathogen M. tuberculosis. This model supplants earlier arguments for a zoonotic origin (59, 64). From East Africa, M. tuberculosis would have spread globally alongside its human host populations. Indeed, the phylogeny of human mitochondrial DNA shows a topology similar to that of the M. tuberculosis lineages (64). Like its human hosts, M. tuberculosis underwent several population bottlenecks during its geographic spread (65).
These bottlenecks were followed by periods of diversification featuring many non-synonymous SNPs, notably in cell envelope proteins. These changes may have enhanced virulence by adapting the bacterium to the host immune system (65, 66). Shifts in human demography correlate well with M. tuberculosis evolution on both ancient and recent time scales. On the ancient scale, the L2 Beijing sublineage disseminated from China across East Asia 3,000–5,000 years ago as an agricultural lifestyle spread. On the recent scale, this sublineage spread to Afghanistan during recent wars, and TB drug resistance rose in former Soviet states after the collapse of the USSR (64, 67). These findings demonstrate the remarkable capacity of the M. tuberculosis genome to record and adapt to host changes throughout human history. At the same time, the recent examples in particular show how M. tuberculosis can exploit periods of social instability and forced migration to become a greater public health threat.

The non-random expansion of particular M. tuberculosis lineages into global pandemics suggests that these strains may have acquired properties that influence disease outcomes. Animal model studies validating the TbD1 deletion as a gain-of-virulence event support this hypothesis (48). Relatedly, different lineages have been found to elicit distinct immune responses in cellular and animal model systems. There is also some evidence that the modern lineages differ in how they modulate host immunity (68–73). L2, for example, induced lower levels of inflammatory cytokines than L4 in some, but not all, studies (68, 69, 74). However, care must be taken when synthesizing results across such studies, particularly when generalizing results from individual strains to entire lineages. For example, a virulence factor may be present in only some subgroups of a lineage. One case is the cell wall phenolic glycolipid (PGL), which promotes immune evasion only by the subsets of the Beijing sublineage that express it (75, 76). Consistent with this explanation, a comparatively modern Beijing strain induced less cytokine secretion in a macrophage model than a more ancestral Beijing strain, despite similar bacterial burden and growth rate (71). All M. tuberculosis strains isolated from patients with active TB are virulent by definition. Nevertheless, the observed differences in global transmissibility point to subtler, genetically determined differences in virulence among lineages, as high-throughput studies have also suggested (73).