Human midbrain dopaminergic neuronal differentiation markers predict cell therapy outcomes in a Parkinson’s disease model

Human pluripotent stem cell–based (hPSC-based) replacement therapy holds great promise for the treatment of Parkinson’s disease (PD). However, the heterogeneity of hPSC-derived donor cells and the low yield of midbrain dopaminergic (mDA) neurons after transplantation hinder its broad clinical application. Here, we have characterized the single-cell molecular landscape during mDA neuron differentiation. We found that this process recapitulated the development of multiple but adjacent fetal brain regions including the ventral midbrain, the isthmus, and the ventral hindbrain, resulting in a heterogenous donor cell population. We reconstructed the differentiation trajectory of the mDA lineage and identified calsyntenin 2 (CLSTN2) and protein tyrosine phosphatase receptor type O (PTPRO) as specific surface markers of mDA progenitors, which were predictive of mDA neuron differentiation and could facilitate high enrichment of mDA neurons (up to 80%) following progenitor cell sorting and transplantation. Marker-sorted progenitors exhibited higher therapeutic potency in correcting motor deficits of PD mice. Different marker-sorted grafts had a strikingly consistent cellular composition, in which mDA neurons were enriched, while off-target neuron types were mostly depleted, suggesting stable graft outcomes. Our study provides a better understanding of cellular heterogeneity during mDA neuron differentiation and establishes a strategy to generate highly purified donor cells to achieve stable and predictable therapeutic outcomes, raising the prospect of hPSC-based PD cell replacement therapies.


Supplemental Figure 6. Graft fiber innervation across different brain regions and synaptic integration of mDA neurons.
(A) hNCAM + fibers distribution and extension in stage III or IV LMX1A + EN1 + groups. The white asterisk indicates graft site. Scale bar, 500 μm. (B) Examination of hNCAM fiber extension across different brain regions. Scale bar, 500 μm. (C) The graft volumes at 6 months. N = 5 (stage III LMX1A + EN1 + ) and 8 (stage IV LMX1A + EN1 + ). (D) Grafts co-labeling human-specific fiber STEM121 and TH. Scale bar, 100 μm. Inset box represents zoomed view of extended graft fiber. Scale bar, 20 μm. (E) Typical immunohistochemistry images by co-labeling of human-specific synaptophysin and TH in CLSTN2-derived graft. Boxed areas are magnified on the right. White arrows indicate co-localization of human-specific synaptophysin with TH along the TH + fibers. Scale bar, 20 μm.

Supplemental Figure 7. TH-tdT/CLSTN2-and TH-tdT/PTPRO-EGFP cell lines validation and electrophysiological recording of grafted mDA neurons.
(A) PCR genotyping of the TH-tdT/CLSTN2-EGFP (left) and TH-tdT/PTPRO-EGFP (right) hPSC lines. Red arrows indicate expected PCR products. The expected PCR products for LA-based insertion and homozygosity identification correctly targeted TH locus are ~1200 bp and ~1000 bp, respectively. The expected PCR products for homozygosity identification correctly targeted CLSTN2 locus and PTPRO locus are the same as Supplemental Figure 4A. The expected PCR products for LA-based insertion identification correctly targeted CLSTN2 and PTPRO locus are ~1700 bp and ~3000 bp, respectively. For homozygosity identification, clones without expected PCR products are homozygous and with expected PCR products are heterozygous. (A) PCR genotyping of the AAVS1-EGFPnls, CLSTN2-, PTPRO-tdT/AAVS1-EGFPnls hPSC lines. PCR identification strategy for CLSTN2 and PTPRO locus is the same as Supplemental Figure 4A. The expected AAVS1 locus PCR product for LA insertion is ~1500 bp; homozygous, ~600 bp. (B-G) Typical grafts were immunostained with oligodendrocyte or oligodendrocyte progenitor cell marker OLIG2 (B), astrocyte marker GFAP (D), VLMC marker COL1A1 (F), and hN in graft groups. Quantification of OLIG2 + cells (C) ratio, GFAP + cells (E) ratio, and COL1A1 + density (G) per area in graft. Scale bar for (B), 20 μm. Scale bar for (D, F): 100 μm. One-way ANOVA followed by Tukey's multiple comparisons test. (H) Further clustering of VLMC group and representative marker expression of each VLMC subcluster. (I) VLMC subtype ratio by each graft group calculated using scRNA-seq data. (J) Heatmap of AUROC scores between graft neuronal clusters in this study and neuron-only data set from a public data set (4). (K) Typical images of graft, immunostained by 5-HT or GABA, and TH (left). Quantification of 5-HT + and GABA + neurons ratio (right) in stage III or stage IV LMX1A + EN1 + groups.
Feeder-free hES cell maintenance. The CLSTN2-tdT hPSC line was switched into feeder-free conditions by seeding dissociated cells onto a six-well plate coated with vitronectin XF (Stemcell Technologies). Cells were fed daily with mTeSR1 medium (Stemcell Technologies) and passaged 5-6 days using TrypLE Express Enzyme (1X, Gibco). Cells were used for differentiation after 3-4 passages from transformation. The differentiation culture condition was the same as hPSCs on irradiated MEF feeder cells.
Construction of knock-in cell lines. For the OTX2-EGFP/FGF8-tdT/EN1-BFP-3x-HA-tag reporter line, the OTX2-EGFP hPSC line was first generated, then P2A and tdT sequences were inserted in frame into an FGF8 donor plasmid with the same structure as OTX2-EGFP, but using homology arms of FGF8 and neomycin resistance gene. At the final step, P2A and BFP-3x-HA-tag were inserted in frame into the EN1 donor plasmid with the same structure of OTX2-EGFP but using homology arms of EN1 and puromycin resistance gene. For the LMX1A-tdT/EN1-mNeonGreen reporter line, the LMX1A-tdT hPSC line was first generated. P2A and mNeonGreen were then inserted in frame into the EN1 donor plasmid with the same structure of LMX1A-tdT donor plasmid but using neomycin resistance gene. For surface marker reporter cell lines, P2A and tdT were inserted in frame into the CLSTN2 or PTPRO locus, with same structure of LMX1A-tdT donor plasmid using homology arms of CLSTN2 or PTPRO. To establish surface marker reporter/EGFPnls cell lines for scRNA-seq analysis of grafts, AAVS1-NeoR-CAG-EGFPnls-WPRE-polyA donor plasmids were constructed and electroporated into H9 ESCs or established surface marker reporter cell lines using transcription-activator-like effector nucleases (TALENs). We constructed TH-tdT/Surface marker-EGFP reporter cell lines based on TH-tdT that our lab previously generated (1). Briefly, we modified the surface marker-tdT donor plasmid by replacing tdT with EGFP but used the neomycin resistance gene. Targeting guide RNAs were the same as the surface marker-tdT. Electroporation, genomic DNA extraction, and genomic PCR identification were performed as described (1,2   appropriate for fresh-frozen tissues. For Figure 6C and Figure 12E immunohistochemistry was performed following RNA FISH. All antibodies used in this study are commercially available. The following antibodies were used in immunostaining: Mouse monoclonal anti-Human nuclei quantification, the total number of human nuclei (hN + ), TH + , and FOXA2 + cells within the grafts were counted from images obtained at 20x magnification to quantify the ratio of TH + , or FOXA2 + cells. For the percentage of GIRK2 + or CB + DA neuron within grafts, TH + cells were first identified. To quantify other cell types in grafts, such as 5-HT + , GABA + , OLIG2 + , or GFAP + , cells were co-labeled with TH and hN (except the GFAP group). Other-cell-types + hN + THwere counted manually with ImageJ.
hCOL1A1 + fiber area within the graft was analyzed using ImageJ software. All data are presented as the mean ± SEM. To estimate graft volume, the section with immunostaining of hN was acquired at 20x magnification and analyzed with ImageJ.
The graft area was extrapolated in every section of a 1:6 series, and the volume was Clustering and identification of cell populations. The filtered count matrix was analyzed and processed using Seurat (v3.1.4) and Scanpy (v3.0.46), including data filtering, normalization, highly variable genes selection, scaling, dimension reduction, and clustering. First, scRNA-seq data sampled from each time point was created as Seurat object separately; genes with less than 3 counts were removed and cells with less than 200 genes detected were removed. Second, each Seurat object was converted into a loom file and imported into Scanpy for clustering. Then, six Seurat objects were merged using the "merge" function in Seurat and converted into a loom file to proceed with cell type clustering in Scanpy. Details of the downstream analysis are described as the following: Data filtering: cells with a mitochondrial gene ratio 5% were excluded. Then, cells with more than 1000 genes detected, less than 6000 genes detected (cells with more than 6000 genes detected are potential doublets), more than 1000 counts detected, and less than 40000 counts detected (cells with more than 40000 counts detected are potential doublets) were kept.
Data normalization: for each cell, counts were log normalized with the ''NormalizeData'' function in Seurat, "scale.factor" was set for 40000.
Highly variable genes selection: 2000 highly variable genes were calculated using the "FindVariableFeatures" function in Seurat. Then, we identified genes correlated with cell-cycle marker TOP2A (Pearson correlation greater 0.15) and excluded them from 2000 highly variable genes.
Cell cycle scoring: a cell-cycle related gene set with 43 genes expressed during G1/S and 54 genes expressed during G2/M was used to calculate an S phase score and a G2M phase score using the 'CellCycleScoring" function in Seurat. Cell cycle difference score was calculated as the difference value of S phase score minus G2M phase score.
Data scaling: the Seurat object was performed "ScaleData" function with default parameters. Number of counts, number of genes, mitochondrial gene ratio, and the cell cycle difference score were variables to regress out in "ScaleData" function in Seurat.
Principal component analysis: highly variable genes were used to calculate principal components in the "RunPCA" function in Seurat. 100 principal components (PCs) were obtained and stored in Seurat object for computing neighborhood graph and umap in the next part.
Leiden clustering: Seurat object was converted into loom file and imported by Scanpy.
A neighborhood graph of observations was computed by the "scanpy.pp.neighbors" function in Scanpy. Then, the leiden algorithm was used to cluster cells by "scanpy.tl.leiden" function in Scanpy.
Cluster merging and trimming: the top 200 differentially expressed genes for each cluster were calculated by the "scanpy.tl.rank_genes_groups" function in Scanpy using the parameters method = 'wilcoxon' and n_genes = 200. Cluster annotation was done manually based on previous canonical markers of developing midbrain and developing mouse brain atlas. For cell clusters that expressed similar marker genes, we merged them into one cluster; for cell clusters with unknown marker gene expression, we used Metascape, a web-based gene annotation analysis tool (3), to define their cluster identity.
Combining filtered count matrix and analysis results of Metascape, we defined some cell clusters as low-quality cells (low counts and low genes detected), stressed, apoptotic, high ribosome protein genes detected ratio, and hypoxic (Supplemental Table   4). Then, we filtered them out to obtain the trimmed cells list. Finally, we redid analysis steps 1-8 to get consistent clustering annotation for the separate time points and merged dataset.
Regional gene module score analysis. Regional gene modules were curated based on detectable genes in our datasets, previous research, and developing mouse brain atlas.   Multi-Application Cell Sorter at stage III or IV. tdT + Neongreen + cells were collected as one group, whereas tdT -Neongreen + , tdT + Neongreen -, and tdT -Neongreencells were collected together as another group. Cells were pelleted at 400 g for 5 min and lysed with TRIzol (Thermo Scientific). Sequencing libraries were generated using NEBNext UltraTM RNA Library Prep Kit for Illumina (NEB, USA) and index codes were added to attribute sequences to each sample. Libraries were sequenced on an Illumina Hiseq PE150. Raw sequencing reads were processed by quality control and adaptor trimming.
Then, clean sequencing reads were mapped to the UCSC human GRCh38 genome with