FXYD3 functionally demarcates an ancestral breast cancer stem cell subpopulation with features of drug-tolerant persisters

The heterogeneity of cancer stem cells (CSCs) within tumors presents a challenge in therapeutic targeting. To decipher the cellular plasticity that fuels phenotypic heterogeneity, we undertook single-cell transcriptomics analysis in triple-negative breast cancer (TNBC) to identify subpopulations in CSCs. We found a subpopulation of CSCs with ancestral features that is marked by FXYD domain–containing ion transport regulator 3 (FXYD3), a component of the Na+/K+ pump. Accordingly, FXYD3+ CSCs evolve and proliferate, while displaying traits of alveolar progenitors that are normally induced during pregnancy. Clinically, FXYD3+ CSCs were persistent during neoadjuvant chemotherapy, hence linking them to drug-tolerant persisters (DTPs) and identifying them as crucial therapeutic targets. Importantly, FXYD3+ CSCs were sensitive to senolytic Na+/K+ pump inhibitors, such as cardiac glycosides. Together, our data indicate that FXYD3+ CSCs with ancestral features are drivers of plasticity and chemoresistance in TNBC. Targeting the Na+/K+ pump could be an effective strategy to eliminate CSCs with ancestral and DTP features that could improve TNBC prognosis.

To distinguish the data originated from grafted (human tumor) cells from those of host (mouse) cells, XenoCell package was exploited (2).First, using the generate_index function, mouse and human genome indexes were generated based on the mm10 and hg38 reference genomes, respectively.Next, the reads were classified into either mouse or human transcripts using the classify_reads function, and graft-specific cellular barcodes were extracted using the extract_cellular_barcodes function, with cellular barcodes, which contained 70-100% of graft reads, being extracted.The output fastq files were processed using the cellranger count function of Cell Ranger.The human transcriptome reference package used was obtained from 10× Genomic website.Finally, a unique molecular identified (UMI) count matrix was generated for downstream analysis.
The following processes were performed using Seurat vignettes of R package (version 4.1.0)(3).

Quality Control
Genes expressed in less than five cells were eliminated.Cells with < 500 or > 5,000 genes, or with a fraction of mitochondrial genes > 20% were discarded to eliminate low-quality cells with mitochondrial contamination.

Normalization
Gene transcript counts were normalized using NormalizeData within the LogNormalize method, in which gene transcript counts for each cell are divided by the total counts for that cell, multiplied by a scale factor (scale.factor = 10,000 by default), and then natural-log transformed using log1p method.A total of 2,000 highly variable genes were selected for further analysis.

Clustering
To reduce data dimension, principal component analysis (PCA) was applied using default parameters.Best dimensionality was determined using JackStrawPlot and ElbowPlot functions.Dim = 20 was used for integrated data and dim = 10 for P1, P2, and P3.Graphbased clustering approach was used to cluster cells using FindNeighbours and FindClusters functions.With respect to the parameter used for the FindClusters function, res = 0.15 (Figure 1B), res = 0.1 (Supplemental Figure 12A) were used for integrated data and res = 0.5 for P1, P2, and P3 data.RunUMAP and DimPlot functions were used to perform nonlinear dimension reduction and to visualize the data, respectively.

Heatmap of Top10 Genes for Each Cluster
Upregulated genes (marker genes) of each cluster were identified using the FindAllMarkers function (min.pct= 0.25, logfc.threshold= 0.2, test = wilcox for Wilcoxon rank sum test), and the top10 upregulated genes of each cluster were shown in heatmap.

Gene Set Variation Analysis (GSVA)
The mammary stem cell signature contains all the genes which are specifically upregulated in cluster 15 compared with those in all other clusters (clusters 1-14), described in publicly available single-cell data of mouse mammary glands (4) (Supplemental Table 5).Similarly, the luminal and alveolar progenitor signatures contain all the genes which are specifically upregulated in cluster 6 and 10, respectively, compared with those in all the other clusters (4)(Supplemental Table 5).GSVA algorithm (GSVA R package [version 1.42.0]), a Kolmogorov-Smirnov-like rank statistic based on kernel estimation of the cumulative density (5), was employed to calculate signature enrichment scores for each single cell.To define differentially enriched gene sets, limma difference analysis (limma v3.50.1, R package) (6) was performed between ancestor-like CSC signature gsva score > 0 and < 0 groups of integrated data of human tumor cells (Supplemental Figure 12A) (moderate t-tests, P-value was adjusted to the false discovery rate).

Samples with NAC
SnRNA-seq data were acquired from 20 tumor samples of eight TNBC patients included in the PROMIX trial (NCBI Sequence Read Archive accession no.SRP114962) (7).Clonal evolution analyses based on longitudinal samples (pre/mid/post NAC) indicated that clonal persistence (drug resistant) and extinction (drug sensitive) occurred in four patients each.Data matrix of log-transformed transcript per million (log[TPM/10+1]) value was used to conduct downstream analyses, filtered to include genes that were expressed in at least 30% of the cells.The data of 20 libraries were integrated using Harmony (8) from Seurat , according to sample label.UMAP dimensionality reduction was conducted based on the top principal components, and breast epithelial cells were identified by clusters with mean gene expression (i.e.tumor cell: EPCAM; normal breast cell: ACTA2, KRT18/19) with cutoff > 0.8.Two-way ANOVA with interaction tests was used to evaluate the GSVA score of immature mammary epithelial gene signatures, based on different time points (pre-treatment vs. mid/post-treatment) and response to NAC (resistant vs. sensitive).GSVA scores were calculated using bulk microarray gene expression data from the PROMIX trial (GSE87455), and the optimal cutoff GSVA score was determined by the maximally selected rank statistics when we evaluated the correlation between the luminal progenitor signature and event-free survival using the surv_cutpoint function in the surminer R package (Supplemental Figure 1D).

Fluidigm C1-based scRNA-seq
NRP1 high or IGF1R high single-cell capture and whole transcriptome amplification were performed using the Fluidigm's C1 system, which is commonly used for analysis of numerous genes in individual cells, along with the SMART-Seq v4 Ultra Low RNA Kit (Takara Bio, 6350525), according to the manufacturer's instructions.cDNA quantification was performed with the Quant-IT PicoGreen dsDNA Assay Kit (Thermo Fisher Scientific, P11496), and the samples were diluted to be 0.1-0.3ng/μL.Then, Nextera XT DNA sample preparation Kit (Illumina, FC-131-1096) and IDT for Illumina Nextera DNA UD Indexes (Illumina, 20027213, 20027214, 20027215, 20027216) were used for dual indexing and amplification, according to the manufacturer's protocols.Samples were sequenced using an Illumina HiSeq3000 sequencing system.

RNA-seq Data Processing
Fastq files were aligned and quantified using Kallisto (version 0.45.0)(10) in relation to the human reference transcriptome obtained from the GENCODE database (version 30) (11).
Gene transcript counts for each sample were summarized in a matrix.

Quality Control
Empty captures, dead cells, and cell aggregates were filtered out with records of microscope observation of C1 integrated fluidic circuit plates during the single-cell capture step.Genes expressed in less than five cells were removed.Then, cells with < 7,000 (P1-IGF1R high and P3-NRP1 high ) or < 10,000 (P3-IGF1R high and P4-NRP1 high ) genes, or with a fraction of mitochondrial genes > 5% for P1-IGF1R high , > 12.5% for P3-NRP1 high , > 20% for P3-IGF1R high , or > 15% for P4-NRP1 high , were discarded to eliminate low-quality cells with mitochondrial contamination.

Normalization
Normalization of the data was performed as described for 10× Genomics-based scRNA-seq data.

Pseudotime Trajectory Analysis
R package Monocle3 (version 1.0.0)(14) was used for pseudotime trajectory analysis by implanting dimension and cluster information obtained from the analysis with Seurat pipelines.Root cells were defined as those that had the highest GSVA score of mammary stem cell or luminal progenitor signatures, as described in Figure 2E.
Plot_genes_in_pseudotime function was used to visualize the dynamics of the genes, commonly upregulated in quiescent clusters of each individual data, along the pseudotime.

Differential Gene Expression Analysis
Upregulated genes (marker genes) of each cluster were identified using the FindMarkers function (one cluster vs. the other clusters; logfc.threshold= 0.2, test = wilcox for Wilcoxon rank sum test).

Cell Cycle Analysis
The cell cycle phase-specific gene signatures defined by Macosko et al. (21) and Oki et al. (22) were used to calculate phase-specific scores for G0, G1S, G2M, M, MG1, and S, by using a scoring method developed by Tirosh et al. (23) and is implemented in Seurat as "AddModuleScore" function (ctrl.size=100).Phase-specific scores were normalized by Zscore method and each cell was assigned to a cell-cycle stage based on its highest Z-score.

Cell Cycle Regression Analysis
The effects of cell cycle-related genes (difference between G2M and S phases) were removed using the vars.to.regress function in Seurat vignettes of cell cycle regression alternate workflow, which was adapted for analysis of stem cell development.

Data Analysis of Single-cell RNA seq (scRNA-seq) of TNBC Patient Samples
ScRNA-seq data were acquired from tumor samples of 6 TNBC patients (Gene Expression Omnibus (GEO), accession code GSE 118389) (24).The following processes were performed using Seurat vignettes R package.For quality control, cells with < 500 or > 10,000 genes were discarded.Normalization and clustering were performed as described above.
Data from PT039, PT058, PT081, PT084, PT089 and PT126 patients were integrated by using FindIntegrationAnchors and IntegrateData functions with k.weight = 50, respectively.Dim = 10 was used for individual data and integrated data.Resolution = 0.2 was used for integrated data.Difference analysis between ancestor-like CSC signature gsva score > 0 and < 0 groups of integrated data of tumor cells (Supplemental Figure 12D) was performed as described above.

Flow Cytometry and Cell Sorting of FXYD3 high or FXYD3 low CSCs and non-CSCs
Cultured patient-derived cancer cells or cell lines were labeled by anti-FXYD3 antibody (1:100, Abcam, ab205534) at 4 °C for 30 min, and washed twice with FACS buffer.Next, these cells were incubated with Alexa Fluor 647-conjugated secondary antibody (1:1,000, Cell Signaling Technology, #4414) at 4 °C for 30 min, and washed twice with FACS buffer.
With respect to NRP1 and IGF1R, cells were labeled with PE-conjugated NRP1 antibody (R&D systems, FAB3870P) or PE-conjugated IGF1R antibody (BD Biosciences, 555999), respectively, at 4 °C for 30 min, and washed twice with FACS buffer.Cell viability solution (BD Biosciences, 555816) was used to eliminate dead cells.

Quantitative Real-time PCR
Total RNA was isolated using RNeasy Mini Kit (QIAGEN, 74106) and transcribed into cDNA ATGCTATCACCTCCCCTGTG) was used for internal control.The primer nucleotide sequences are detailed in Supplemental Table 4.

Cell Growth Assay
Cells were plated at 1,000 (patient-derived cells) or 300 (cell lines) cells/well onto collagencoated 96-well plates in the organoid medium (for patient-derived cells) or ultra-low 96-well plates in the sphere formation medium (for cell lines).After 4 days, relative cell number was measured by CellTiter-Glo Luminescent Cell Viability Assay (Promega, G7572).

Flow Cytometry Analysis after Drug Treatment
Cells were treated with drugs at a concentration of approximately IC50 for 48 h.Then cells were then incubated with anti-FXYD3, NRP1 or IGF1R antibodies and cell viability solution as described above.Analysis was carried out by flow cytometry cell sorter (BD FACSAria III).
To prevent the effects of Doxorubicin fluorescence on flow cytometry data, different gating thresholds were defined for negative control (NC) and Doxorubicin-treated groups by using isotype control.The flow cytometry results were evaluated with FlowJo Software 10.7.1 (BD Biosciences).

Intracellular Ca 2+ Detection
Intracellular Ca 2+ was detected and quantified by flow cytometry using Oregon Green 488 BAPTA-1, AM, cell permeant (Thermo Fisher Scientific, O6807).Cells were incubated with the indicator reagent (5 μM) at 37 °C for 30 min.Then cells were washed, collected and stained with anti-FXYD3, NRP1 or IGF1R antibodies and cell viability solution for flow cytometry analysis as described above.The fluorescence signal of intracellular Ca 2+ was quantified by geometric mean of fluorescence (GeoMFI).

ROS Detection
ROS production was detected and quantified by flow cytometry using ROS assay kit (Dojindo, R252) according to manufacturer's instructions.Briefly, cells were incubated with highly sensitive DCFH-DA Dye (1:50,000) in Hank's balanced salt solution (HBSS) at 37 °C for 30 min.After treatment, the cells were washed, collected, and then stained with anti-FXYD3, NRP1 or IGF1R antibodies and cell viability solution for flow cytometry analysis as described above.The ROS production per cell was quantified by GeoMFI.

Knockdown Experiments Using siRNAs for FXYD3
Two different siRNAs (Thermo Fisher Scientific, #1, HSS143336, #3, HSS182369) duplexes for FXYD3 and a nonspecific control siRNA duplex with similar GC contents (Thermo Fisher Scientific, siRNA Negative Control Med GC Duplex #2, 12935112) were transfected to cells by using Lipofectamine RNAiMAX Transfection Reagent (Thermo Fisher Scientific, 13778075), according to the manufacturers' instructions.qPCR or growth assay were performed after 48 h.

Knockdown Experiments Using Lentivirus-based System
Two independent shRNAs specific for ATP1B1 were designed and constructed into the

Immunofluorescence for Tissue Sections (Frozen Tissue)
Immunofluorescence for tumor tissue sections was performed as described previously(32).
Then, samples were embedded into Tissue Teck OCT compound (SAKURA, 4583) and cut into 4 μm cryosections (Leica CM1950 Cryostat).For staining, frozen tissue sections were fixed with pre-cold (−20 °C) acetone (FUJIFILM Wako, 016-00346) until evaporation.Tissue Teck OCT compound were melted at room temperature.Then, tissues were permeabilized by 0.4% Triton X-100 (Nacalai Tesque, 25987-85) at room temperature for 30 min.Next, the sections were incubated with 1% BSA blocking buffer (Nacalai Tesque, 01863-48) at room temperature for 1 h followed by incubating with primary antibodies at optimized concentrations at 4 °C overnight.On the second day, the sections were incubated with secondary antibodies conjugated with fluorescence at appropriate dilutions at room temperature for 1 h.Nuclei were counterstained with Hoechst33342 (Thermo Fisher Scientific, H3570).Images were acquired using a Zeiss LSM900 confocal microscope, and were analyzed using Image J software.The primary antibodies used in this experiment are as follows: NRP1 antibody (1:200, Thermo Fisher Scientific, 14-3042-82), IGF1R antibody Fisher Scientific, Z25308) was used for combining anti-FXYD3 antibody (final concentration 1:100, Abcam, ab205534), according to the manufacturers' instructions.

Ecocardiography
Transthoracic echocardiography and electro-cardiogram (ECG) was performed using the Vevo2100 ultrasound system (FUJIFILM VisualSonics).M-mode echocardiographic images were obtained from a short-axis view to analyze the left ventricle (LV) size and LV contraction.
Echocardiographic and ECG evaluations were carried out after one week and two weeks of treatment for each mouse.

FindIntegrationAnchors
and IntegrateData functions (3) introduced in Seurat with default parameters.