Genome sequences of several pathogens have revealed a partitioning of chromosomes, with housekeeping genes often being located in the central core and antigen genes being located in subtelomeric regions5,6. These assemblies suggest that the linear organization of the genome may be important for restricting high levels of recombination to regions that code for antigens and for ensuring that all but one antigen is repressed.

Recently, genome-wide chromosome conformation capture (Hi-C) analyses have begun to uncover the 3D organization of chromosomes at high resolution4, highlighting the critical role of spatial organization and compartmentalization of DNA in the regulation of gene expression and recombination2,3. In addition, microscopy-based analyses of the unicellular eukaryotic parasites Plasmodium falciparum and Trypanosoma brucei have indicated that nuclear organization may be important for the mutually exclusive expression of antigens7,8,9. However, to our knowledge, the proteins that are involved in shaping genome architecture and controlling antigen expression have not yet been identified in any organism.

This study aimed to identify the process that restricts antigen expression. Specifically, we sought to identify proteins that are important for maintaining genome architecture and to determine whether global and/or local changes in chromatin conformation affect antigen expression.

In T. brucei, the causative agent of human sleeping sickness, the key antigens are the variant surface glycoproteins (VSGs). Most of the approximately 2,500 VSG genes are found in long subtelomeric arrays of megabase chromosomes6. In addition, about 65 VSG genes are located on mini-chromosomes (50–150 kb in length)10, and a smaller subset of VSG genes is located in distinct telomere-proximal polycistronic transcription units, called expression sites11. Expression sites are grouped into metacyclic-form and bloodstream-form expression sites (MESs and BESs, respectively) on the basis of the life-cycle stage during which they can be activated. VSG genes are transcribed only when they are located within an expression site, and only one of about 15 BESs is transcribed at a time, which ensures that the expression of VSG genes is mutually exclusive11. Therefore, elucidating the molecular link between genome architecture and antigenic variation requires a genome sequence that contains both subtelomeric VSG gene arrays and telomeric expression sites; the available T. brucei genome (isolate TREU 927)6 does not provide this.

Using PacBio single-molecule real-time (SMRT) sequencing technology, we generated an approximately 100-fold genome-sequence coverage of the T. brucei 427 Lister isolate (the most commonly used laboratory isolate) and assembled the reads into the 11 megabase chromosomes (96 contigs; Fig. 1, Extended Data Table 1). To order and orient contigs without relying on scaffolds of related parasite isolates (which may have undergone genome rearrangements), we took advantage of two ubiquitous features of chromosome organization: a distance-dependent decay of DNA–DNA interaction frequency, and substantially higher interaction frequencies between DNA loci located on the same chromosome than between loci on different chromosomes4. The high degree of subtelomeric heterozygosity enabled us to assemble the complete T. brucei genome with phased diploid subtelomeric regions (Extended Data Figs. 1, 2, Supplementary Data). In addition, RNA sequencing (RNA-seq) revealed a notable partitioning of the genome into a transcribed homozygous core and non-transcribed heterozygous subtelomeric regions, which encode the vast repertoire of antigens (Fig. 1).
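
The ordering principle described above (contigs that are adjacent on a chromosome contact each other more often than distant ones) can be illustrated with a minimal sketch. This is not the scaffolding pipeline used in the study. The function names, the greedy strategy and the toy contact values are assumptions made for illustration, and the sketch ignores contig orientation and the assignment of contigs to chromosomes.

```python
from typing import Dict, List, Tuple

Contacts = Dict[Tuple[str, str], float]


def contact(contacts: Contacts, a: str, b: str) -> float:
    """Symmetric lookup of the contact frequency between two contigs."""
    return contacts.get((a, b), contacts.get((b, a), 0.0))


def order_contigs(contigs: List[str], contacts: Contacts) -> List[str]:
    """Greedily order contigs of one chromosome by contact frequency.

    Hypothetical helper: seed the path with the most strongly
    interacting pair, then repeatedly attach the unplaced contig that
    has the strongest contact with either end of the growing path.
    """
    if len(contigs) < 2:
        return list(contigs)

    # Seed: the pair of contigs with the highest contact frequency.
    pairs = [
        (contact(contacts, a, b), a, b)
        for i, a in enumerate(contigs)
        for b in contigs[i + 1:]
    ]
    _, first, second = max(pairs)
    path = [first, second]
    remaining = set(contigs) - {first, second}

    while remaining:
        # side 0 = left end of the path, side 1 = right end.
        _, side, best = max(
            (contact(contacts, end, c), side, c)
            for side, end in ((0, path[0]), (1, path[-1]))
            for c in remaining
        )
        if side == 0:
            path.insert(0, best)
        else:
            path.append(best)
        remaining.remove(best)
    return path


# Toy data: true order is c1-c2-c3-c4, contacts decay with distance.
toy_contacts = {
    ("c1", "c2"): 90.0, ("c2", "c3"): 100.0, ("c3", "c4"): 95.0,
    ("c1", "c3"): 30.0, ("c2", "c4"): 28.0, ("c1", "c4"): 9.0,
}
ordered = order_contigs(["c3", "c1", "c4", "c2"], toy_contacts)
```

Because contact frequencies alone do not define which end of a chromosome comes first, the recovered order is only meaningful up to reversal.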

Analysis of the frequency of intra-chromosomal DNA–DNA interaction suggested a strong compartmentalization of the T. brucei genome: centromeres and junctions between the core and subtelomeres function as the most prominent boundaries of DNA compartments. In addition, the frequency of DNA–DNA contact was substantially higher across subtelomeric regions compared to core regions, which indicates that subtelomeres are more compact than the core region (Fig. 2a, Extended Data Fig. 3). Therefore, the partitioning of the genome into transcribed housekeeping genes and non-transcribed antigen genes that is observed in the genome assembly and transcriptome data is mirrored by the 3D organization of the genome. In T. brucei, RNA polymerase II transcription can occur in the absence of canonical promoter motifs12,13. Thus, the high degree of compaction across subtelomeric regions probably prevents the spurious initiation of transcription and ensures mutually exclusive expression of a single VSG gene from a BES. In addition, BES–BES interactions were much more frequent than interactions among randomly chosen genomic loci, suggesting a clustering of BESs (Fig. 2b). Taken together, the Hi-C data suggest a distinct compartmentalization of the T. brucei nucleus.
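
The comparison between BES–BES interactions and interactions among randomly chosen loci can be sketched as a simple resampling test. The code below is an illustrative assumption and not the analysis performed in the study: the function names, the bin indices and the toy matrix are hypothetical, and a real Hi-C analysis would also need to control for genomic distance and chromosomal context.

```python
import random
from itertools import combinations
from statistics import mean
from typing import List, Sequence, Tuple


def mean_pairwise_contact(matrix: Sequence[Sequence[float]],
                          bins: Sequence[int]) -> float:
    """Mean contact frequency over all pairs of the given bins."""
    return mean(matrix[i][j] for i, j in combinations(bins, 2))


def enrichment_vs_random(matrix: Sequence[Sequence[float]],
                         bins_of_interest: List[int],
                         n_random: int = 1000,
                         seed: int = 0) -> Tuple[float, float, float]:
    """Compare loci of interest with random sets of the same size.

    Returns (observed mean, mean of the random sets, empirical P value).
    """
    rng = random.Random(seed)
    observed = mean_pairwise_contact(matrix, bins_of_interest)
    all_bins = range(len(matrix))
    k = len(bins_of_interest)
    null = [
        mean_pairwise_contact(matrix, rng.sample(all_bins, k))
        for _ in range(n_random)
    ]
    # Empirical one-sided P value with a pseudocount of one.
    p_value = (1 + sum(v >= observed for v in null)) / (n_random + 1)
    return observed, mean(null), p_value


# Toy matrix: uniform background, with four hypothetical "BES" bins
# that contact each other ten times more often than background.
n_bins = 20
bes_bins = [2, 7, 11, 15]
toy = [[1.0] * n_bins for _ in range(n_bins)]
for i, j in combinations(bes_bins, 2):
    toy[i][j] = toy[j][i] = 10.0

observed, null_mean, p_value = enrichment_vs_random(toy, bes_bins)
```

In this toy example the observed mean exceeds that of the random sets, which is the pattern that a clustering of BESs would be expected to produce.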

Higher-order genome structures are established and maintained by architectural proteins such as CCCTC-binding factor (CTCF) and cohesin14. Histone variants are also enriched at many compartment boundaries15, but the role of these variants in shaping genome architecture remains unknown. Although CTCF appears to be absent in non-metazoans16, the major subunit of cohesin (SCC1) is present in T. brucei and the depletion of this subunit causes deregulation of VSG expression17. However, it has remained unclear whether this is a direct effect because SCC1 depletion strongly affects cell-cycle progression and growth rate, leading to rapid parasite death18.

Chromatin immunoprecipitation with sequencing (ChIP–seq) revealed that in T. brucei SCC1 is enriched across tRNA and rRNA genes, termination sites of RNA polymerase II transcription and most of the 3′ ends of BESs (Fig. 2c, d, Extended Data Fig. 4). This pattern of cohesin enrichment is reminiscent of its distribution in humans and yeast, in which cohesin is found at insulator and boundary elements such as tRNA genes19,20. The observed distribution of SCC1 is also similar to that of histone variants H3.V and, to a lesser extent, H4.V in T. brucei (Fig. 2d, Extended Data Fig. 4; also see ref. 21). This raised the possibility that these two histone variants function together with SCC1 in shaping genome organization and the regulation of antigen expression.

Genome organization and DNA accessibility control antigenic variation in trypanosomes

To investigate a possible link between these histone variants, genome architecture and antigen expression, we determined the expression of VSG genes and genome architecture in ΔH3.V, ΔH4.V and ΔH3.VΔH4.V cells. No cell cycle defect was observed in these cell lines (Extended Data Fig. 5).

Laboratory-adapted isolates, such as the one used here, switch their expression of VSG isoforms at very low frequency (about 10⁻⁶ per population doubling) and homogeneously express VSG-2 (Fig. 3a; also see ref. 22). Thus, an increase in heterogeneity of VSG gene expression can be caused either by a loss of mutually exclusive expression of VSG genes in individual cells (that is, heterogeneity in antigen expression at the single-cell level) or by an increased switching frequency in expression of VSG genes in different parasites (heterogeneity at the population level).

To distinguish between these possibilities and to identify the VSG genes that are expressed, we performed single-cell RNA-seq (scRNA-seq) of individual T. brucei cells. scRNA-seq data from a total of 40 wild-type and 378 ΔH3.VΔH4.V cells revealed that—whereas all wild-type cells expressed VSG-2—in 74% of the ΔH3.VΔH4.V cells, VSG-2 transcript levels contributed less than 20% of the total VSG mRNA; this indicates a switch in expression of VSG genes (Fig. 3a, Extended Data Figs. 6, 7). Activation of new VSG genes was not random, with VSG-11 being the dominant newly activated VSG gene in 230 out of 378 cells. In addition, several cells contained transcripts from multiple VSG genes, which points to a partial loss of mutually exclusive expression. To determine the stability of VSG-2 expression, we analysed ΔH3.VΔH4.V cells at two time points that were about 50 population doublings apart. Although the overall pattern remained the same (Fig. 3a, Extended Data Fig. 6), the percentage of cells that expressed only VSG-2 mRNA, or multiple VSG mRNAs, had declined by the second time point. This suggests that the process of VSG-2 deactivation had progressed further, and that the simultaneous expression of multiple VSG genes may have been a transient intermediate state. Analyses based on immunofluorescence and flow cytometry confirmed that the loss of VSG-2 mRNA resulted in a loss of VSG-2 expression (Extended Data Fig. 8). No major effect on the expression of VSG genes was observed upon deletion of H3.V or H4.V alone (Extended Data Fig. 8).
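
The per-cell criterion described above (VSG-2 contributing less than 20% of the total VSG mRNA) can be expressed as a short sketch. The 20% threshold is taken from the text; the function names, the input format (one dictionary of VSG transcript counts per cell) and the toy counts are assumptions for illustration and do not reproduce the processing of the actual scRNA-seq data.

```python
from collections import Counter
from typing import Dict, List, Optional


def vsg_fraction(counts: Dict[str, float],
                 original: str = "VSG-2") -> Optional[float]:
    """Fraction of a cell's total VSG transcripts that come from `original`.

    Returns None when no VSG transcripts were detected in the cell.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    return counts.get(original, 0) / total


def summarize_switching(cells: List[Dict[str, float]],
                        original: str = "VSG-2",
                        threshold: float = 0.20) -> dict:
    """Count cells in which `original` falls below `threshold`.

    For each such cell, the most abundant VSG transcript is tallied as
    the dominant newly expressed VSG (a simplifying assumption).
    """
    scored = 0
    switched = 0
    dominant: Counter = Counter()
    for counts in cells:
        frac = vsg_fraction(counts, original)
        if frac is None:
            continue  # cell without detectable VSG transcripts
        scored += 1
        if frac < threshold:
            switched += 1
            dominant[max(counts, key=counts.get)] += 1
    return {
        "scored": scored,
        "switched": switched,
        "fraction_switched": switched / scored if scored else None,
        "dominant_new_vsg": dominant,
    }


# Toy cells with invented counts (not data from the study).
toy_cells = [
    {"VSG-2": 100},
    {"VSG-2": 5, "VSG-11": 95},
    {"VSG-2": 10, "VSG-11": 60, "VSG-8": 30},
    {},
]
summary = summarize_switching(toy_cells)
```

A summary of this kind separates heterogeneity at the population level (different cells dominated by different VSG transcripts) from cells that contain several VSG transcripts at once, which would need an additional per-cell criterion.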

In T. brucei, the switching of expression of VSG genes occurs by two distinct mechanisms11: either by switching transcription from one BES to another (transcriptional switch) or by a recombination-based event that leads to the replacement of the previously active VSG gene with a new VSG gene from a different genomic location (recombinational switch, Fig. 3b).

To gain insight into the mechanism by which histone variants affect antigen expression, we sequenced ΔH3.VΔH4.V genomic DNA using SMRT sequencing technology. The SMRT data indicated that, in most cells, recombination had occurred between an expression-site-associated gene 8 (ESAG8) gene pair that was present in both BES1 and BES15. The data also revealed that the new chimeric BES contained three copies of ESAG8, one from BES1 and two from BES15 (Fig. 3c). scRNA-seq and Hi-C data support a recombination event (Fig. 3d, e). Hi-C data revealed that, upon deletion of H3.V and H4.V, the interaction frequency between VSG-11 and the 5′ end of chromosome 6—where VSG-2 is located in wild-type cells—increased, indicating that VSG-11 had relocated to chromosome 6.

Studies in different organisms have shown that the frequency of recombination is affected by spatial proximity and DNA accessibility23,24. Thus, to determine whether histone variants contribute to genome architecture and/or local DNA accessibility, we performed Hi-C and assays for transposase-accessible chromatin using sequencing (ATAC-seq). Hi-C data from ΔH3.V cells revealed marked changes in inter-chromosomal interactions (Fig. 4a, top) and a significant increase in interactions among repressed BESs (Fig. 4b), pointing to a loss of constraints that may have ‘anchored’ the BESs to specific nuclear sites. In support of these Hi-C data, fluorescence in situ hybridization (FISH) data revealed a strong clustering of telomeric repeats upon deletion of H3.V (Fig. 4c, d). By contrast, deletion of H4.V affected genome architecture only modestly (Fig. 4a, bottom). Unlike the Hi-C data, our ATAC-seq data indicated that promoter-proximal DNA accessibility increased upon H3.V or H4.V deletion (Fig. 4e). However, only ΔH3.VΔH4.V cells exhibited high DNA accessibility across the entire length of transcriptionally repressed BESs (Fig. 4e bottom, Extended Data Fig. 9).

In summary, the Hi-C and ATAC-seq data indicate that although deletion of H3.V was responsible for the majority of genome architectural changes and increased BES clustering, this alone was not sufficient to induce a switch in expression of VSG genes. Only the concurrent deletion of H3.V and H4.V, which also strongly increased DNA accessibility across transcriptionally repressed BESs, enhanced the rate of recombination-based switching of VSG genes.

The depletion of histone H3 was previously shown to upregulate BES proximal-promoter activity—presumably via a general increase in DNA accessibility—but did not cause deregulation of VSG genes25. We hypothesize that the marked increase in switching frequency of VSG gene expression results from the combination of decreased spatial distance between BESs and increased local DNA accessibility (Fig. 4f).

The activation of new VSG genes did not occur at random; this non-random activation has previously been observed for infections of different hosts26,27. In a small number of cells, we detected transcripts from different VSG isoforms. This loss of mutually exclusive expression of VSG genes may be caused by increased DNA accessibility upon the deletion of histone variants, which may result in promiscuous RNA polymerase II transcription. Our observations that, even in ΔH3.VΔH4.V cells, not all expression sites are transcribed and that specific ‘pairs’ of VSG genes tend to be co-expressed suggest that there are additional constraints imposed by genome organization or VSG protein structure28. At the genome level, co-activated VSG genes may have to be localized in close proximity to ensure sufficient levels of an activating factor8,29; alternatively, differences in VSG protein structure may make it impossible for the parasite to tolerate certain mosaic surface coats.

In this study, we have demonstrated how evolutionarily conserved features of genome architecture can be exploited for the de novo scaffolding of phased diploid genomes. The use of Hi-C, scRNA-seq and ATAC-seq—to our knowledge, all used here for the first time in T. brucei—opened opportunities for genome assembly and the characterization of the mechanism that underlies VSG switching in ΔH3.VΔH4.V cells. Our data reveal that histone variants can function as architectural proteins, and that changes in global genome architecture and local chromatin configuration can induce extensive switches in antigen expression.