The adaptive immune system of vertebrates modifies the genome of individual B cells to encode antibodies that bind specific antigens. In most mammals, antibodies are composed of heavy and light chains that are produced sequentially by recombination of V, D (for heavy chains), J and C gene segments. Each chain contains three complementarity-determining regions (CDR1 to CDR3) that contribute to antigen specificity. Certain heavy and light chains are preferred for particular antigens2–22. Here we consider pairs of B cells that share the same heavy chain V gene and CDRH3 amino acid sequence and were isolated from different donors, also referred to as public clonotypes23,24. We show that for naive antibodies (antibodies that have not yet adapted to antigen), the probability that the two cells use the same light chain V gene is about 10%, whereas for memory (functional) antibodies it is about 80%, even when only one cell per clonotype is used. This property of functional antibodies is a phenomenon that we call light chain coherence. We also observed this phenomenon when similar heavy chains recurred within a donor. Thus, while naive antibodies appear to recur by chance, the recurrence of functional antibodies reveals unexpected constraint and determinism in V(D)J recombination and immune selection. For most functional antibodies, the heavy chain determines the light chain.
A central challenge in immunology is the grouping of antibodies by function. Ideally, the antibodies in such groups would share epitopes and paratopes defined by their protein sequences. In practice, small numbers of antibodies are tested in vitro, for example for functional activity such as neutralizing capacity, and simple binding to specific antigens can be assayed for larger numbers of antibodies. In the future, it may be possible to understand the properties of antibodies at scale from sequence information alone, perhaps through structural modeling, which could lead to clustering of antibodies25. However, the effectiveness of any functional grouping scheme cannot currently be assessed without sufficiently large datasets that cover multiple antigen specificities and use cells from multiple individuals or donors. Innovative approaches such as mitochondrial lineage tracing could be used to validate computed clonotypes.
Nevertheless, some conclusions can be drawn. All antibodies in a clonotype (a group of antibodies that share a common ancestral recombined cell in a single donor) usually perform the same function. The clonotype can therefore be regarded as the smallest functional group of antibodies. Further, as has been observed, nature repeats itself, creating similar clonotypes that seem to perform the same function2–13,16–22 and that can be combined into groups. Such recurrences have been observed between donors but, as we shall show, they also occur within individual donors. In either case, recurrence follows the random generation, by recombination, of a large pool of potential antibodies. Recurrences arise by selection from this pool.
Specific examples show that sequence similarity can inform how functional groups are understood. For example, in the case of influenza virus, antibodies that bind the anchor epitope of the hemagglutinin stem domain reuse four heavy chain V genes (IGHV3-23, IGHV3-30, IGHV3-30-3 and IGHV3-48) and two light chain V genes (IGKV3-11 and IGKV3-15)21. Similar observations were made for Zika virus, where a protective heavy and light chain gene pair, IGHV3-23 with IGKV1-5, was observed in several people and also cross-reacts with dengue virus. Even in the context of HIV infection, which gives rise to many different viruses in a single individual, recurrent and ultra-broadly neutralizing antibodies arise, such as the VRC01 lineage, whose members use heavy chains encoded by IGHV1-2 or IGHV1-46 paired with light chains encoded by IGKV1-5, IGKV1-33, IGKV3-15 or IGKV3-20 (ref. 22).
Inspired by these examples, we set out to determine whether unrelated B cells with similar heavy chains also have similar light chains. We excluded related cells (that is, cells in the same clonotype) because they use the same V(D)J genes by definition.
We generated a large set of paired V(D)J data to investigate this question. Using peripheral blood samples from four unrelated individuals (Methods and Extended Data Table 1), we captured and sequenced paired, full-length antibody sequences from 1.6 million single B cells of four flow cytometric phenotypes27,28: naive, unswitched memory, class-switched memory and plasmablasts (Extended Data Fig. 1). For each cell, we obtained nucleotide sequence from the V gene leader through enough of the constant region to determine antibody isotype and subclass.
We computationally classified these antibody sequences into two types: naive and memory. To this end, we inferred the V alleles of each of the four donors (Methods) and then, for each B cell, used the inferred alleles to assess the somatic hypermutation (SHM) that had occurred outside the junction region, in both chains. If the antibody sequence was unmutated relative to germline (that is, showed no SHM), we labeled it naive; otherwise we labeled it memory, with the understanding that this classification is a biological oversimplification and does not account for, for example, class switching. We compared these categories with the flow-sorted categories (Extended Data Table 2). Approximately 80.0% of cells sorted as naive were classified as naive by the computational analysis, with a maximum of 90.9% for donor 2. In contrast, only 0.6% of cells sorted as memory were classified as naive by the computational analysis. During library preparation, we exhausted our supply of memory cells and intentionally mixed populations in some libraries (for example, switched memory B cells and naive B cells) to maximize capacity. Our computational classification also allowed us to make efficient use of all the data.
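The labeling rule described above can be illustrated with a minimal sketch. This is not the enclone implementation: the function names `count_shm` and `classify_cell` are hypothetical, and the sketch assumes that each chain has already been aligned, base for base and outside the junction region, to the donor allele inferred for it.

```python
def count_shm(observed: str, germline: str) -> int:
    """Count mismatches between a sequence and its inferred germline allele.

    Assumption: both strings cover the same region outside the junction
    and are aligned position by position (no indels).
    """
    if len(observed) != len(germline):
        raise ValueError("sequences must be aligned to the same length")
    return sum(o != g for o, g in zip(observed, germline))


def classify_cell(heavy: str, heavy_germline: str,
                  light: str, light_germline: str) -> str:
    """Return 'naive' if neither chain shows SHM, otherwise 'memory'."""
    total = count_shm(heavy, heavy_germline) + count_shm(light, light_germline)
    return "naive" if total == 0 else "memory"
```

A single mismatch in either chain is enough to label the cell as memory, which is why errors in allele inference can move cells between the two categories.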
We next examined whether similar heavy chains imply similar light chains for unrelated B cells. We investigated this separately for memory cells and naive cells by looking at pairs of cells that were either both memory or both naive. We considered only pairs of cells from different donors that had the same heavy chain V gene and the same CDRH3 length. We divided the pairs into 11 groups based on their percentage CDRH3 amino acid identity, rounded to the nearest 10%. Then, for each group, we calculated its light chain coherence: the percentage of cell pairs with the same light chain gene name. In this work, we treated paralogs of light chain V genes with the letter D in the name as identical (for example, IGKV1-17 and IGKV1D-17 were considered identical), as described previously29.
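The pairwise calculation described above can be sketched as follows. This is an illustrative sketch rather than the code used for the paper: the `Cell` record, the function names and the regular expression for collapsing D paralogs are assumptions, ties in the rounding are rounded up, and all pairs are enumerated directly, which is only practical for small inputs.

```python
import re
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Cell(NamedTuple):
    donor: str
    heavy_v: str
    cdrh3: str    # CDRH3 amino acid sequence
    light_v: str


def normalize_light_v(name: str) -> str:
    """Collapse 'D' paralogs, for example IGKV1D-17 -> IGKV1-17."""
    return re.sub(r"^(IG[KL]V\d+)D-", r"\1-", name)


def identity_bin(a: str, b: str) -> int:
    """Percent identity of two equal-length sequences, rounded to the nearest 10."""
    matches = sum(x == y for x, y in zip(a, b))
    # Integer arithmetic for round-half-up of 10 * matches / len(a).
    return (20 * matches + len(a)) // (2 * len(a)) * 10


def light_chain_coherence(cells):
    """Map each identity bin to the percentage of pairs sharing a light chain V gene."""
    same, total = defaultdict(int), defaultdict(int)
    for a, b in combinations(cells, 2):
        if (a.donor == b.donor or a.heavy_v != b.heavy_v
                or len(a.cdrh3) != len(b.cdrh3)):
            continue  # keep cross-donor pairs with same V gene and CDRH3 length
        k = identity_bin(a.cdrh3, b.cdrh3)
        total[k] += 1
        same[k] += normalize_light_v(a.light_v) == normalize_light_v(b.light_v)
    return {k: 100.0 * same[k] / total[k] for k in total}
```

The function would be called once on the memory cells and once on the naive cells, so that every pair consists of two cells of the same type.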
We present the results of this analysis in Fig. 1a. For memory B cells found in different donors with the same heavy chain V gene and 100% CDRH3 amino acid identity (2,813 cells), we found 82% light chain coherence, whereas for naive cells (754 cells) light chain coherence was only 10%. This makes sense, since naive cells have generally not been selected for function or undergone SHM during an immune response. This finding implies that for memory cells, which carry functional antibodies and are often the product of thymic and peripheral selection, heavy chain coherence implies light chain coherence. Next, we tested light chain coherence in memory B cells using independent datasets, finding 93% coherence for a heterogeneous collection of old data (data generated at 10x Genomics and distributed with enclone30, together with several other datasets related to multiple sclerosis31, COVID-1932,33, Kawasaki disease34 and LIBRA-seq35) and 79% coherence for a recent dataset36 (Methods, Supplementary Data). We have distributed these data as part of this article and provide details of each dataset in Supplementary Table 1.
Pairs of B cells are shown if (1) they have the same heavy chain V gene name, (2) they have the same CDRH3 length and (3) both cells are memory (red) or both are naive (blue). The percentage of cell pairs using the same light chain V gene (or paralog) is shown as a function of CDRH3 amino acid identity, rounded to the nearest 10%. a, Light chain coherence for public antibodies: the two cells in each pair are from different donors. One curve is shown for each pair of donors. Additional curves (gray and black; the black curves are hidden beneath the gray ones in the figure) show light chain coherence when heavy and light chains are randomly permuted. Data are mean ± s.e.m. We tested for differences between the slopes of the regression curves using a sum-of-squares F test (P < 0.0001, F = 20.89, numerator d.f. = 13, denominator d.f. = 140).
b, Light chain coherence for private antibodies: the two cells in each pair are from the same donor but from different computed clonotypes, with additional evidence that they belong to different true clonotypes (Methods). One curve is shown for each donor.
We also analyzed these data using only one cell per clonotype and found 79% light chain coherence in memory B cells for the data in this work (Methods and Supplementary Table 2). In addition, to completely rule out the possibility that sample cross-contamination could explain light chain coherence, we performed a pairwise comparison of our data, the old data and the recent data, treating each as a single superdonor, and found 70% coherence. We tested the effect of cell misclassification by randomly swapping the memory and naive labels of 10% of cells and found 82% coherence for memory cells and 17% for naive cells (versus 10% without swapping), indicating that the observed naive coherence may itself be inflated by actual labeling errors. We note that light chain coherence does not imply heavy chain coherence (Extended Data Table 3). We also note that if we instead define naive (CD19+IgD+CD27±CD38±CD24±) and memory cells by flow cytometry (Methods), we find 86% light chain coherence for memory cells (87.3% for switched and 84.4% for unswitched) and 16% for naive cells. Supplementary Figs. 1 and 2 show significant associations between VH and VL genes and superfamilies, as well as the frequency of VL superfamily usage as a function of CDRH3 length.
Even when light chain V gene paralogs were not treated as the same gene, light chain coherence persisted, at 64% in memory cells (Extended Data Table 4). More sophisticated methods may yield higher coherence, since sufficiently similar V genes should be functionally indistinguishable. In fact, light chain coherence can be observed without reference genes at all. Given pairs of cells from different donors, we can calculate the edit distances between their heavy chains and between their light chains. If the two cells had the same CDRH3 amino acid sequence, their light chain edit distance was less than or equal to 20 in 78% of cases, compared with 9% of cases without the CDRH3 restriction (Extended Data Fig. 2).
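As an illustration of the reference-free comparison, the sketch below computes a standard Levenshtein edit distance and the fraction of sequence pairs within a distance threshold. It is a sketch under stated assumptions: the text does not specify which form of edit distance or which sequence region was used, so unit costs for substitutions, insertions and deletions are assumed, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit cost per substitution, insertion or deletion."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion from a
                           cur[j - 1] + 1,             # insertion into a
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def fraction_within(pairs, max_distance=20):
    """Fraction of (sequence, sequence) pairs with edit distance <= max_distance."""
    pairs = list(pairs)
    close = sum(edit_distance(a, b) <= max_distance for a, b in pairs)
    return close / len(pairs)
```

Applied to the light chain sequences of cell pairs, `fraction_within` would be evaluated once for pairs sharing a CDRH3 amino acid sequence and once for unrestricted pairs.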
Recurrences (separate recombination events) occur both between different donors and within the same donor; from first principles, light chain coherence is expected to occur at the same rate within a single donor. This is difficult to investigate because SHM can make two cells from the same recombination event appear to come from different events. We address this problem by considering cells in different computed clonotypes, which would arise from independent recombination events if the computation were perfect. We imposed additional conditions to further increase the likelihood that the cells represent true recurrences (Methods). For example, we consider it sufficient that two cells have different CDRL3 lengths, as this is most likely caused by separate recombination events, although this cannot be confirmed by V(D)J sequencing alone. Using this approach, we observed 65% light chain coherence in memory cells with 100% CDRH3 amino acid identity from the same donor (Fig. 1b). This is lower than that observed for cells from different donors (Fig. 1a), probably because the additional precautionary conditions (Methods) excluded many cases of true recurrence in the antigen-specific responses expected within an individual.
As an analytical tool, cell pairs contribute to the understanding of V(D)J biology. However, non-overlapping groups of memory B cells that may share a function are of greater biological interest. The following grouping scheme is an example of a general approach to this problem: all memory B cells that have the same heavy chain V gene and a 100% identical CDRH3 amino acid sequence are placed in one group. This is unsatisfactory because it separates cells of the same clonotype that share a function but differ in amino acid sequence, and because some amino acids can be changed without affecting the ability of the antibody to bind its antigen.
These limitations are easy to overcome. For clarity, we give a specific example using 90% identity. We consider only memory cells and computed clonotypes that consist entirely of memory cells. We define groups by first defining when two cells are similar. We call two cells similar if they belong to the same clonotype, or if they have the same heavy chain V gene and 90% identical CDRH3 amino acid sequences. Then, if there is a chain of cells X = X1, …, Xn = Y such that, for each i, Xi is similar to Xi+1, we place the two cells X and Y in the same group. The multiple 'hops' between the two cells make them transitively similar. This process is a well-known mathematical concept, which we call transitive grouping. It places every cell in a group, and the groups do not overlap. We analyzed the light chain coherence of these groups by examining pairs of cells from different donors within the same group.
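Transitive grouping as defined above amounts to finding the connected components of the similarity relation, which can be sketched with a union-find structure. The sketch below is illustrative rather than the code used in this work: the `Cell` record and function names are hypothetical, '90% identical' is read as at least 90% identity between CDRH3 sequences of equal length (an assumption), and all pairs are compared directly, which does not scale to millions of cells.

```python
from collections import defaultdict
from typing import NamedTuple


class Cell(NamedTuple):
    clonotype: int
    heavy_v: str
    cdrh3: str  # CDRH3 amino acid sequence


def similar(a: Cell, b: Cell, min_pct: int = 90) -> bool:
    """Same clonotype, or same heavy chain V gene and sufficiently identical CDRH3."""
    if a.clonotype == b.clonotype:
        return True
    if a.heavy_v != b.heavy_v or len(a.cdrh3) != len(b.cdrh3):
        return False
    matches = sum(x == y for x, y in zip(a.cdrh3, b.cdrh3))
    return 100 * matches >= min_pct * len(a.cdrh3)


def transitive_groups(cells, min_pct: int = 90):
    """Return non-overlapping groups (lists of cell indices) linked by chains of similarity."""
    parent = list(range(len(cells)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(cells)):
        for j in range(i + 1, len(cells)):
            if similar(cells[i], cells[j], min_pct):
                parent[find(j)] = find(i)  # merge the two groups

    groups = defaultdict(list)
    for i in range(len(cells)):
        groups[find(i)].append(i)
    return sorted(groups.values())
```

In the test case used to check this sketch, the first and third cells are only 80% identical to each other, yet they fall in the same group because each is 90% identical to the second cell. This is the 'hop' behavior described in the text.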
Transitivity allows large groups of clonotypes to form. As shown in Fig. 2a, the light chain coherence of these groups decreased with decreasing percentage CDRH3 identity. The decline is steep because transitive grouping involves multiple hops, each of which can link antibodies with different functions. Even so, large coherent groups can form. Figure 2b shows a transitive group of clonotypes computed at 90% identity, using data from this work and from Phad et al.36, that uses the heavy chain V gene IGHV3-9. The light chain coherence for this group was 99.7% (726 of 728 cells), and all but four cells used the light chain genes IGKV2-30 and IGKJ2. We note that only 7% (1,203 of 16,880) of memory B cells using the heavy chain V gene IGHV3-9 in these datasets used IGKV2-30. The cells in this group belong to 122 computed clonotypes, each of which would represent an independent recurrence (recombination event) if the clonotypes were computed perfectly. Both the heavy and light chain CDR3 sequences were highly conserved. Notably, 93% of the cells were of the IgA1 or IgA2 subclass.
a, We grouped clonotypes transitively at a given percentage of CDRH3 amino acid identity (box), while requiring the same heavy chain V gene. The graph shows the relationship between light chain coherence and the number of cells in each group. b, Top, a transitive group computed at 90% identity, using the data presented here and those of Phad et al. (2022)36, that includes cells with the CDRH3 sequence CIKDILPGGADSW. These cells use the heavy chain V gene IGHV3-9. Each dot represents a cell and each cluster represents a computed clonotype. All but three computed clonotypes used the light chain gene IGKV2-30, and cells from all six donors (d1, d2, d3, d4, d5 and d6) were present. Bottom, sequence logos of the CDRH3 (top) and CDRL3 (bottom) amino acid sequences for the group shown in the panel.
Antibody recurrences are expected to arise predominantly from relatively frequent recombination events. We analyzed each junction sequence by finding the most likely D region, allowing for the absence of a D region or for two concatenated D regions to account for VDDJ junctions37,38, and aligning the antibody nucleotide sequence to the concatenated V(D)J reference sequence (Fig. 3a and Tables 1 and 2). We first tested whether the observed recurrence rates were comparable to those expected by chance. Answering this question is challenging because it requires an understanding of recombination deep enough to recapitulate the process accurately through simulation. Other researchers have addressed this difficult problem39,40,41. We generated random antibody junction sequences with the soNNia41 simulation program, using deduplicated naive heavy chain nucleotide sequences from our data as training data. We also explored variant models (see Methods and Supplementary Table 3). Our simulations used two features not used in previous studies23,24: we identified and used truly naive junction sequences for training (no detectable SHM in the other CDR or FR regions), and we used a post-selection model to account for both central and peripheral selection.
The heavy chain junction sequence was aligned to a concatenated reference sequence comprising VJ, VDJ or VDDJ (with two different D genes), whichever was the most likely reference. The number of inserted bases was determined relative to this reference (deletions were counted separately), as was the number of substituted bases. a, The heavy chain junction region of a memory cell with the junction amino acid sequence CARDGGYGSGSYDAFDIW. We found that IGHD3-10 is the most likely D gene. There are 8 inserted bases and 7 substitutions in the junction. The substitution rate is 7 out of 46, where the denominator (46) is the total number of matching and mismatching bases. b, For each of the four categories of antibodies in the data, we counted the number of bases inserted in the heavy chain junction region relative to the concatenated VDJ reference sequence (or VJ or VDDJ in some cases). Most of the inserted bases are insertions in non-templated regions 1 or 2. Frequency is shown as a function of the number of inserted bases. See also Table 1.
We then calculated how many naive antibody recurrences would be predicted by simulation. We created four sets of simulated antibodies of the same sizes as the sets of naive cells in our data and then counted cross-donor recurrences of the heavy chain V gene and CDRH3 amino acid sequence pair. Each simulation was repeated ten times. Recurrences of naive antibodies are reported as cell counts, which makes sense since naive cells rarely appear in multi-cell clonotypes. The actual number of recurrences we observed was 754, whereas the median predicted number was 1,190. Given the difficulty of the modeling, these numbers are close. This supports the view that the soNNia simulator reproduces statistical properties of the real repertoire, including the frequency of VDDJ recombination (Table 1) and junction recurrence, even when the model was trained on memory rather than naive sequences (Supplementary Table 3).
We next examined whether the junction regions of recurrent antibodies were as complex as those of arbitrary antibodies (Fig. 3b). We found an order of magnitude fewer inserted bases for recurrent antibodies, for both naive and memory cells. This suggests that recurrent antibodies inherently have simpler junctions than arbitrary antibodies and are therefore, as expected, more likely to recur by chance. In fact, most of the recurrent antibodies, both naive and memory, had no inserted bases (Fig. 3b).
Finally, we examined light chain coherence in antibodies that recurred in a small number of people. On average, these antibodies have relatively simple junctions (Fig. 3b). Antibodies with more complex junctions should recur less frequently. They would be clearly visible in large cohorts, but they could nonetheless be observed in our data. We found that recurrent antibodies with more inserted nucleotides (non-templated regions 1 and 2 and other bases inserted relative to the concatenated reference sequence) had greater light chain coherence than antibodies without inserted nucleotides (Table 2). This allays the concern that recurrent antibodies are anomalous with respect to light chain coherence. In fact, our results suggest that all antibodies recur, but at frequencies that vary with the complexity of their junctions and the prevalence of the relevant antigens. Our results also show that, frequency aside, more complex antibodies do not behave differently with respect to light chain coherence.
This work supports the following model, which unifies previously observed restrictions on gene usage by some antibodies21,22,23,24. In nature, many heavy chain configurations result in effective binding to a given antibody target. However, for each of these heavy chain configurations, the cognate light chains are largely determined: about 80% at the level of the light chain gene or paralog. We call this phenomenon light chain coherence, and we observed it by looking for recurrence of heavy chains in memory and naive cells from four donors. The small number of donors biased our analysis towards junctions of lower complexity, although the same phenomenon occurs for the more complex junctions that do appear in our data. Our results suggest that light chain coherence may apply universally to memory B cells.
Although we analyzed V(D)J data for only about 2 million cells in this work, much deeper data have been generated for heavy and light chains separately. Such data allow recurrences (public clonotypes) to be identified using strict definitions based on 100% CDRH3 amino acid identity23,24. Because of differences in size and technical approach between studies, we interpret these data with caution here. We show that recurrences in these and other previously described studies arise from two types of B cells: naive B cells, which produce 'not yet functional' antibodies with minimal light chain coherence, and memory B cells, which produce antibodies with light chain coherence. Given that naive B cells have not acquired function or undergone selection, they would not be expected to exhibit light chain coherence. Conversely, light chain coherence in memory B cells is expected, as they have acquired function and undergone selection.
The simplest explanation for the recurrences that we and others have observed among naive cells is that their sequence duplications are purely random. We show that recurrent naive cells (as well as recurrent memory cells) have significantly lower junction complexity in our data, so it is not surprising that they recur. One might ask whether the observed recurrence rates are consistent with the mechanistic biology of V(D)J recombination. Answering this question requires precise quantitative knowledge of this extremely complex process, and such knowledge does not exist. However, we show here that simulations of this process do predict recurrence at a rate similar to our observations. We therefore propose that recurrent naive sequences arise by chance. In contrast, recurrent memory sequences are the product of both chance and frequent exposure to related antigens. We believe that recurrent naive sequences are informative about recombination, whereas recurrent memory sequences are informative about antibody function, central selection and peripheral selection.
We propose that light chain coherence implies functional coherence. However, we do not claim to have solved the problem of functional grouping. There are at least two obstacles to this. First, all methods based on direct sequence comparison, including ours, are naive about the structural consequences of amino acid changes. Instead of comparing sequences, a more effective approach to functional grouping may be to first compute model structures of antibodies from their sequences and then compare these structures25,42,43,44,45. Second, better real-world data are needed to evaluate any method. Although similar antibodies to the same antigen are widely observed across many people, the frequency with which antibodies to different antigens are identical or similar remains unknown. Truth data for such problems might comprise a large set of natural antibodies for each antigen, together with binding data for each antibody. These data would be most reliable if they included nucleotide sequences (for example, to allow consistent identification of V(D)J genes) and sufficient information about the donor to distinguish true recurrence from clonal expansion within an individual. Generating such truth data at scale is possible using existing methods21,35,46,47,48. Immunoglobulin loci and their products are complex and difficult to analyze. It is reasonable to assume that some of the sequences considered naive in this study are in fact antigen-specific, as others have described for SARS-CoV-2 infection. Conversely, despite progress in identifying new germline alleles50,51, some of the sequences considered memory in this study may in fact be naive sequences derived from new alleles that our inference method did not detect.
Because of how V(D)J recombination works, the antibody light chain sequence carries less information than the heavy chain sequence. Our work shows that the light chains of functional antibodies are highly constrained, meaning that the light chains used in nature are the most functional: natural selection wins. The complex dance between heavy and light chains is best studied at the level of the individual cell, the environment in which antibodies are produced, selected and amplified. Although we do not yet understand why, we show that this interplay results in a limited number of acceptable light chains. We recommend that antibody developers actively search for the best light chains that are widely used in nature, rather than focusing solely on heavy chains. Similarly, for bispecific antibodies, it may be useful to find two heavy chains whose native light chains are similar.
For some details, we refer to ref. 51. We provide commands for applying the enclone executables and code to the data from this work at https://github.com/DavidBJaffe/enclone/blob/master/enclone_paper/what_code_does_what. The enclone executable can be installed and the data downloaded with a single command line (https://10xgenomics.github.io/enclone/). We used enclone version 0.5.175. Running enclone on the data for this work takes about 145 GB of memory and 20 to 40 minutes on a multi-core server (24 cores). As an intermediate, we created a file per_cell_stuff (https://plus.figshare.com/articles/dataset/Dataset_supporting_Functional_antibodies_exhibit_light_chain_coherence_/20338177?file=36366549), with an entry for each cell, to facilitate direct analysis of the data from this work by other methods. The files per_cell_stuff.old_data (https://plus.figshare.com/articles/dataset/Dataset_supporting_Functional_antibodies_exhibit_light_chain_coherence_/20338177?file=36366552) and per_cell_stuff.phad (https://plus.figshare.com/articles/dataset/Dataset_supporting_Functional_antibodies_exhibit_light_chain_coherence_/20338177?file=36366543) are the corresponding files for the other data used. The other executables must be compiled from the source code provided at https://github.com/DavidBJaffe/enclone, in its enclone_paper/src/bin directory. These calculations (which usually take files such as per_cell_stuff as input) were run on a MacBook Pro with 16 GB of memory.
We used a Sony MA900 cell sorter to purify single B cell suspensions from the PBMCs of the four donors described here. We used the following flow cytometry definitions for each population: naive: live, CD3-CD19+IgD+CD27±CD38±CD24±; unswitched memory: live, CD3-CD19+CD27+IgDlowIgM++CD38±CD24±; switched memory: live, CD3-CD19+CD27+IgD-CD38±CD24±CD95±; and plasmablasts: live, CD3-CD19+CD27+IgD-CD38++CD24-.
The antibody panel we used consisted of the following 8 clones in a staining volume of 200 µl, for a total of 1.25 µg of antibody, each at a 1:40 dilution (5 µl): LIVE/DEAD dye (Invitrogen, 7-AAD, 00-6993-50), anti-CD3 (BioLegend, BV711, 317327, clone OKT3, IgG2a/kappa, mouse), anti-CD19 (BioLegend, PE, 982402, clone HIB19, IgG1/kappa, mouse), anti-IgD (BioLegend, APC, 348221, clone IA6-2, IgG2a/kappa, mouse), anti-IgM (BioLegend, PE/Dazzle 594, 314529, clone MHM-88, IgG1/kappa, mouse), anti-CD24 (BioLegend, BV605, 311123, clone ML5, IgG2a/kappa, mouse), anti-CD27 (BioLegend, FITC, 302805, clone O323, IgG1/kappa, mouse), anti-CD38 (BioLegend, BV421, 356617, clone HB-7, IgG1/kappa, mouse) and anti-CD95 (BioLegend, BV510, 305639, clone DX2, IgG1/kappa, mouse).
We titrated and designed the B cell sorting panel using 20 million fresh PBMCs (AllCells, 3050363) from healthy donors whose cells were not used for the single-cell data in this study. We thawed cells according to the 10x Genomics Demonstrated Protocol for fresh frozen human peripheral blood mononuclear cells for single-cell RNA sequencing (CG00039 Revision D). Briefly, we resuspended the cells in 20 µl PBS/2% FBS and incubated them on ice for 30 minutes in the dark. Before sorting, we washed the cells three times with 1 ml PBS/2% FBS and then resuspended them in 300 µl PBS/2% FBS for the sorting step.
Cells from four donors were flow sorted into naive, switched memory, unswitched memory and plasmablast populations. Donor blood was collected with informed consent under an IRB-approved protocol administered by AllCells (a subsidiary of Discovery Life Sciences); no clinical trials were conducted in connection with this study or its data, and no personally identifiable information is reported. V(D)J sequences were generated on the 10x Genomics Immune Profiling Platform using six Chromium X HT chips and the manufacturer's standard methods. The cDNA libraries were sequenced on S4 flow cells on a NovaSeq 6000 platform. Certain populations of memory B cells are relatively rare in some donors. With this in mind, in some libraries we added naive B cells to the sorted memory cells from each donor in order to capture enough B cells and unique sequences from each donor (see also Extended Data Table 1). We targeted the recovery of 20,000 cells from each lane of the HT chip.
We used several other datasets. These included single-cell data from Phad et al.36 for 247,516 cells, which were approximately evenly distributed between two donors. The second, combined collection ('old data') includes data generated internally at 10x Genomics and distributed with enclone, together with several other datasets related to multiple sclerosis31, COVID-1932,33, Kawasaki disease34 and LIBRA-seq35.
Some of these data predate dual indexing and show evidence of mixing (that is, index hopping) on the flow cell. In such cases, we treated the cells from the affected donors as belonging to a single donor. After this merging, 23 donors remained, with a total of 280,669 cells in the old data.
Pairs of cells were selected from different computed clonotypes (Fig. 1b). To make it unlikely that the two cells belonged to the same true clonotype, we also required that at least one of three conditions be met. (1) The data support the use of different heavy chain J genes by the two cells. To establish this, we examined framework region 4 of the two cells. There had to be at least three positions at which the two reference sequences differ and each cell has the base matching its own reference, and no positions at which the reference sequences differ and both cells support only one of the references. (2) The same criterion as (1), but applied to the light chain instead of the heavy chain. (3) The two cells have different light chain CDR3 lengths.
We used soNNia v0.1.2 (commit 85c7169) to simulate 10 replicates of 1,408,939 heavy chain junctions. A reproducible Conda environment, the scripts used to create these files and the simulated data are available at https://plus.figshare.com/articles/dataset/Dataset_supporting_Functional_antibodies_exhibit_light_chain_coherence_/20338177?file=36354231. We trained two soNNia models using naive annotation sequences, in pre-selection and post-selection modes, and two models using learned annotation sequences, in pre-selection and post-selection modes.
We performed a permutation test of light chain coherence by permuting the light chain data while holding the heavy chain data fixed, and then calculated light chain coherence between pairs of cells as in Figure 1, using only one cell per clonotype to reduce potential bias from clonal expansion. We performed 1,000 permutations at light chain coherence levels from 0% to 100% in 10% steps and then calculated the standard error of the mean for each level. We present these curves in Figure 1a. We tested for differences between the slopes of linear regression models of these curves using the sum-of-squares F-test.
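The core of such a permutation test can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis code of the study. Coherence is taken here to be the fraction of cross-donor cell pairs sharing heavy chain V gene and CDRH3 amino acid sequence that also share the light chain V gene. The tuple layout and function names are hypothetical, and the construction of the 0% to 100% coherence levels is not reproduced.

```python
import random
import statistics
from collections import defaultdict
from itertools import combinations


def light_chain_coherence(cells):
    """cells: list of (donor, heavy_v, cdrh3_aa, light_v), one cell per clonotype."""
    groups = defaultdict(list)
    for donor, heavy_v, cdrh3, light_v in cells:
        groups[(heavy_v, cdrh3)].append((donor, light_v))
    same = total = 0
    for members in groups.values():
        for (d1, l1), (d2, l2) in combinations(members, 2):
            if d1 != d2:  # only pairs of cells from different donors
                total += 1
                same += l1 == l2
    return same / total if total else float("nan")


def permuted_coherences(cells, n_permutations=1000, seed=0):
    """Shuffle light chain V genes across cells; heavy chain data stay fixed."""
    rng = random.Random(seed)
    light = [cell[3] for cell in cells]
    results = []
    for _ in range(n_permutations):
        rng.shuffle(light)
        shuffled = [(d, hv, h3, lv) for (d, hv, h3, _), lv in zip(cells, light)]
        results.append(light_chain_coherence(shuffled))
    return results


def standard_error(values):
    """Standard error of the mean of the permuted coherence values."""
    return statistics.stdev(values) / len(values) ** 0.5
```

Comparing the observed value of `light_chain_coherence` with the distribution returned by `permuted_coherences` indicates how far the observed coherence lies from what random heavy and light chain pairing would produce.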
We tested the significance of the Vh/Vl contingency table (in which each count represents a clonotype with a given Vh/Vl pair) using P values obtained by Monte Carlo simulation with 100,000 replicates.
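A Monte Carlo test of this kind can be sketched in Python as follows. The text does not name the test statistic, so the use of the chi-squared statistic here is an assumption, as are the function names and the input layout (one Vh/Vl pair per clonotype). Shuffling the light chain labels keeps both margins of the contingency table fixed.

```python
import random
from collections import Counter


def chi_squared(pairs):
    """Chi-squared statistic of the Vh/Vl table built from (vh, vl) pairs."""
    n = len(pairs)
    rows = Counter(vh for vh, _ in pairs)
    cols = Counter(vl for _, vl in pairs)
    observed = Counter(pairs)
    stat = 0.0
    for vh, row_total in rows.items():
        for vl, col_total in cols.items():
            expected = row_total * col_total / n
            stat += (observed[(vh, vl)] - expected) ** 2 / expected
    return stat


def monte_carlo_p_value(pairs, n_replicates=100_000, seed=0):
    """Estimate the P value by shuffling Vl labels (assumed statistic: chi-squared)."""
    rng = random.Random(seed)
    observed_stat = chi_squared(pairs)
    heavy = [vh for vh, _ in pairs]
    light = [vl for _, vl in pairs]
    hits = 0
    for _ in range(n_replicates):
        rng.shuffle(light)
        # Small tolerance guards against floating-point ties.
        if chi_squared(list(zip(heavy, light))) >= observed_stat - 1e-12:
            hits += 1
    # Adding one to numerator and denominator avoids reporting P = 0.
    return (hits + 1) / (n_replicates + 1)
```

With 100,000 replicates the smallest P value this estimator can report is about 1e-5, which bounds the resolution of the test.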
We used the ggseqlogo (ref. 52) and msa (ref. 53) R packages to align the CDRH3 and CDRL3 sequences with the MUSCLE algorithm and to create logo plots based on positional entropy/Shannon information (y-axis units: bits). The letters are colored according to the properties of the amino acids. The color coding and other amino acid property codes needed to reproduce these figures are published as part of this paper.
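For orientation, the per-position information content in bits is commonly defined as log2(20) minus the Shannon entropy of the amino acid distribution in an alignment column. The Python sketch below computes this quantity. It is not the ggseqlogo implementation, it applies no small-sample correction, and the function name and gap handling are assumptions.

```python
import math
from collections import Counter


def information_bits(aligned, alphabet_size=20, gap="-"):
    """Information content (bits) for each column of equal-length aligned sequences."""
    bits = []
    for i in range(len(aligned[0])):
        column = [seq[i] for seq in aligned if seq[i] != gap]  # ignore gaps
        total = len(column)
        if total == 0:
            bits.append(0.0)  # a column of gaps carries no information
            continue
        counts = Counter(column)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        bits.append(math.log2(alphabet_size) - entropy)
    return bits
```

A fully conserved column reaches the maximum of log2(20), about 4.32 bits, whereas a column split evenly between two residues scores one bit less.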
Donor alleles of V genes were partially inferred using the following algorithm (implemented in enclone, in the file allele.rs). The basic idea is to pile up the observed sequences for a given V gene and identify variant bases. If we used all cells (one sequence per cell), the pile-up would be biased by clonal expansion, leading to incorrect alleles. Ideally, we would instead use only one cell per clonotype among those that use a given V gene. However, the order of operations is such that donor alleles are computed first and clonotypes afterwards. We therefore used a heuristic to select cells that are likely to be clonotype-independent: among cells that use a given V gene and share the same CDRH3 length, CDRL3 length and partner chain V and J genes, we select only one. The pile-up is then built from the V gene sequences of the selected cells.
Then, for each position along the V gene, excluding the last 15 bases (to avoid the junction region), we determined the distribution of bases observed in the selected cells. We considered only positions at which non-reference bases occurred at least four times and accounted for at least 25% of the total. Each cell then has a trace relative to these positions, that is, the list of its base calls at those positions; we required that these traces satisfy similar evidence criteria. Each such non-reference trace then defines an “alternative allele”. We do not limit the number of alternative alleles, because they may arise from duplicated gene copies. The algorithm’s ability to reconstruct alleles is limited by the depth of coverage (counted in “non-redundant” cells) for a given V gene. In addition, the algorithm cannot identify germline variants in the terminal bases of the V gene, which lie in the junction region. An example of allele calling can be found in ref. 51, in the Donor Analysis section.
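The allele inference steps described above can be sketched in Python as follows. This is not a port of allele.rs. The dictionary keys and function names are hypothetical, and the sequences are assumed to be aligned to the reference without indels. Two further assumptions are made: the thresholds are applied to each non-reference base separately, and traces reuse the same thresholds, because the text says only that the trace criteria are "similar".

```python
from collections import Counter


def select_nonredundant(cells):
    """Keep one cell per (V gene, CDRH3 length, CDRL3 length, partner V, partner J)."""
    chosen = {}
    for cell in cells:
        key = (cell["v_gene"], cell["cdrh3_len"], cell["cdrl3_len"],
               cell["partner_v"], cell["partner_j"])
        chosen.setdefault(key, cell)  # the first cell seen for a key is kept
    return list(chosen.values())


def variant_positions(reference, sequences, skip_tail=15,
                      min_count=4, min_fraction=0.25):
    """Positions (excluding the last skip_tail bases) with a supported variant."""
    positions = []
    for pos in range(max(0, len(reference) - skip_tail)):
        calls = Counter(seq[pos] for seq in sequences if pos < len(seq))
        total = sum(calls.values())
        if any(base != reference[pos] and n >= min_count
               and n >= min_fraction * total for base, n in calls.items()):
            positions.append(pos)
    return positions


def alternative_alleles(reference, sequences, min_count=4, min_fraction=0.25):
    """Non-reference traces over the variant positions that pass the thresholds."""
    positions = variant_positions(reference, sequences,
                                  min_count=min_count, min_fraction=min_fraction)
    ref_trace = tuple(reference[p] for p in positions)
    traces = Counter(tuple(seq[p] for p in positions) for seq in sequences)
    total = sum(traces.values())
    alleles = [trace for trace, n in traces.items()
               if trace != ref_trace and n >= min_count and n >= min_fraction * total]
    return positions, alleles
```

In use, `select_nonredundant` would be applied first and the V gene sequences of the retained cells passed to `alternative_alleles`, so that expanded clonotypes do not dominate the pile-up.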
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
All data are publicly available at https://doi.org/10.25452/figshare.plus.20338177, including processed full length V(D)J sequences and annotations.
All code that reproduces the main conclusions and data of the article is available at https://github.com/DavidBJaffe/enclone (Git hash 561e3ac); a separate copy of this code is also hosted on Figshare+ at https://plus.figshare.com/articles/dataset/Dataset_supporting_Functional_antibodies_exhibit_light_chain_coherence_/20338177?file=37819143.
Forgacs, D. et al. Evolution of convergent antibodies and expansion of clonotypes after vaccination against the influenza virus. PLoS ONE 16, e0247253 (2021).
Heilmann, C. & Barington, T. Distribution of κ and λ light chain isotypes among human blood immunoglobulin-secreting cells after vaccination with pneumococcal polysaccharides. Scand. J. Immunol. 29, 159–164 (1989).
Roy, B. et al. High-throughput single-cell analysis of B-cell receptor utilization in autoantigen-specific plasma cells in celiac disease. J. Immunol. 199, 782–791 (2017).
Zhu, D., Lossos, C., Chapman-Fredricks, J. R. & Lossos, I. S. Biased immunoglobulin light chain use in the Chlamydophila psittaci negative ocular adnexal marginal zone lymphomas. Am. J. Hematol. 88, 379–384 (2013).
Post time: Nov-15-2022