Nov 18, 2021 / Oncology

Behind the research: Building a breast cancer cell atlas with single cell and spatial insights

Leida Tirado-Lee

2,300,000. That’s how many women across the globe were diagnosed with breast cancer in 2020 alone (1). The number jumps to a staggering 7,800,000 when you include diagnoses that occurred in the past five years (1). Despite being one of the most prevalent cancers in the world, there is still much work to be done to help us understand breast cancer subtypes and how to better treat them. At the Garvan Institute of Medical Research, the laboratory of Associate Professor Alex Swarbrick, PhD, uses cellular genomics to glean new insights into the diverse cellular microenvironments of breast cancers.

We recently spoke with Dr. Swarbrick and Dr. Sunny Wu, his postdoctoral fellow, about their latest publication in Nature Genetics defining the cellular taxonomy of breast cancer samples, spatially mapping cell locations and interactions within tumors, and stratifying breast cancer cohorts into unique ecotypes based on cellular composition and clinical outcomes (2).

Study background and approach

You just published your latest research in a Nature journal, which is every researcher's dream. Can you tell me a little bit about the background of how this project started?

Swarbrick: Well, I guess it's been a long time coming, and it's funny, I actually remember having a conversation very early when single cell sequencing was out. This is before 10x [Genomics] was on my horizon, maybe before it existed, in fact.

People had been doing low-throughput single cell sequencing, FACS sorting individual cells into little wells, and it was excruciatingly difficult and slow. I remember having this conversation with a colleague of mine, “How cool would it be to make, essentially, a cellular atlas of breast cancer?” This was probably eight years ago or something now.

And, the reason we thought it would be cool is that even the simplest look at a pathology slide from a solid cancer tells you there is so much going on that we really didn't understand. There are so many different cells in different shapes, different reactivity by IHC [immunohistochemistry], and so on. It was clear that this was a level of understanding that was really poorly developed, and understanding it better was going to give us better insights into the disease and hopefully, ultimately, how to treat the disease.

Then, we saw single cell coming over the horizon, maybe a little earlier than some people. And so we made two moves. One was starting a biobank of human tissues and biobanking them in a way that is compatible with single cell sequencing (3). Most people dunk tissues into formalin and it's very difficult to work with that. Whereas we do what is called cryopreserving, so storing tissues in a way that keeps them alive and you can dissociate them later.

The second move though was more coincidental. It was getting a 10x [Genomics] Chromium box. Actually, we got it for the linked-read assay, which we never actually ran. But thankfully, the single cell assays instead came about. And so those two things came together naturally.

You’ve talked about single cell, but this study combined single cell with spatial transcriptomics to not only define those cellular phenotypes but then map where they are located. Can you talk a little bit about that approach of combining these two technologies and what it did for this project?

Wu: Yeah, so we're super fortunate to be able to work closely with such great clinical colleagues and get access to all these patient samples and this tissue, so we really want to maximize everything we can learn about it.

So, as Alex touched on, for a long time, we were doing single cell and learning a lot about the heterogeneity within the tissue compartment of different tumor types. But that really only told us so much about the expression profiles of each cell. It gave us pretty limited information into where these cells sit within tissues, what cell niches they form, and how they're communicating with one another. That's where, nicely, the Visium assay came along to solve a few of those issues and really help us address that.

So, [now there] was another level of information we were able to learn from each of these patient tumors, using the Visium assay to help us understand where these cell types that we're finding in single cell [data] sit in tissue, how they're communicating to one another, and what mechanisms could be going on that could help drive tumor progression.

Can you talk about the challenges that you face when working with single cell and spatial datasets, and how do you approach that in order to gain the insights you were looking for?

Wu: Yeah, certainly. In the beginning, when we were just analyzing single cell data, the rest of the world was also analyzing single cell data, and some really great computational labs were leading that charge on how to best analyze cellular heterogeneity, what different regulatory networks [are] occurring within the cells that you're sequencing.

When spatial came along, it certainly wasn't an easy solution to integrate all of those different types of data together. Luckily, this was a huge collaborative study [with] experts in the field [and] our own lab to help us integrate those different modalities.

Just an example of that is integrating the single cell and spatial data with the current resolution of Visium. We know that each particular spot doesn't give single cell resolution but can contain maybe a handful to a dozen cells, depending on the tissue type or sample type. Working with Alma Andersson and Joakim Lundeberg (SciLifeLab, Stockholm), who are experts in this area, we were able to integrate the single cell data to give us approximations or estimates of the abundances, in different tissue areas, of the cell types of interest that we're sequencing by single cell. That was quite challenging, but, using these complex deconvolution methods, we were able to integrate those data types nicely and [this] matched with what our pathologist was also seeing under the microscope.

Swarbrick: I think one of the other big challenges was just the challenge at the sample end. How do you get tissue from pathology or surgery into the single cell instrument in the best condition possible and as quickly as possible? In fact, most of the single cell data from this atlas were generated on the day that the sample was collected. So, these were rushed back to the lab and processed. This might be nine o'clock at night.

Cryopreservation helped solve that problem. Then the other [problem] is how to process samples, how to get them into the best condition. There was a lot of trial and error.

You collaborated with 10x Genomics scientists on this project. What was that collaboration like?

Swarbrick: Yes, Stephen Williams is a computational scientist [from 10x Genomics] we collaborated with. I met him at AGBT in Florida just days before the world went crazy, back in February 2020. I saw that he had this really nice spatial transcriptomic dataset from triple-negative breast cancers, which is an area we're really interested in.

He was really generous with his data and with his ideas, and so we decided to collaborate to try to make our dataset more substantial. Because human tumors are so diverse, the more you have, the more likely you are to start seeing repeating patterns that can make some sense. We also have this ongoing relationship where he gives us really great insights and advice into how to process Visium data.

Wu: The collaboration with 10x [Genomics], in particular, Stephen, has been great. The additional samples were one thing, but it also really helped us learn more about the biology of the tumors that we were studying. We had this really nice story surrounding the stromal microenvironment of tumors, and we were starting to see some pretty interesting relationships between those cell types and, in particular, with T cells and tumors. Within our dataset, we started to see some significant correlations, but, of course, we were slightly limited on sample numbers. And so, those additional samples from Stephen and his insights into the computational analysis made for a really great collaboration with 10x [Genomics] that helped us further the paper significantly.

Findings from the study

Let’s talk a little bit about your results. Was there anything that particularly surprised you when you looked at the data?

Wu: Firstly, one aspect of the data that we explored was tumor heterogeneity, how breast cancer cells can be found in different cell states within just an individual tumor. We worked with Chuck Perou and colleagues at the University of North Carolina to take some of the knowledge that we know about tumor subtyping from the bulk setting and adapt that or make that more suitable for a single cell space.

One thing that certainly surprised us, in some particular tumors, was the multiple breast cancer cell states that could exist and coexist within the same tumor. That has certain implications when you start to think about targeted therapies—which cell types are going to respond, which are not going to respond, and which may contribute to relapse.

Then, following on from that, when we started to look into the spatial data, we could also see that [these different cell states] could be found in different pockets, with regional differences within the tissue itself. We're certainly exploring [this] further in larger tissue cohorts.

What does this really mean in terms of oncology research and translating it to the bedside?

Swarbrick: I guess there are a few things you might pull out. Maybe one that's easier to explain would be our finding of frequent functional heterogeneity within the cancer cells of breast cancers. We've suspected this would be the case. But, we've never really known what the observation of heterogeneity for one or two markers really means for the overall phenotype of the cell.

One of the observations we made is that almost every breast cancer contains cells with a subtype that is divergent from the subtype assigned to the bulk tumor. Breast cancers are classified into four or five groups, for example, luminal and basal, based on a molecular signature. What we found is that certain breast cancers, while they might be predominantly luminal, have subsets of cells that have a basal molecular signature, and this is transcriptome-wide, so it's not one or two markers. These cells look like they genuinely don't belong in that tumor.

I think this has some pretty direct implications because we would anticipate that those patients might not have a complete response to treatments targeting their bulk tumor phenotype. So, if a patient has a luminal breast cancer, you’d give them hormone-based therapies, but those basal cells sitting within that tumor are not going to respond to those therapies.

So, you may predict that those patients will have intrinsic resistance to treatment and relapse early. So, it has predictive importance. It also might help us think about what additional treatment we need to give that patient to try to eliminate those basal cells as well.

Wu: Another interesting implication of our studies is the contribution of all the different cell types within the tumor: not just the cancer cells themselves, but also the surrounding host tissue cells that they recruit to help them grow, as well as the immune system.

One nice thing we did within this study was to start to see how all those different cell types contribute to the overall profile of the tumor, which is often just studied, I guess, in isolation within those different compartments. And typically, only thought of within the tumor cells themselves.

We took our new single cell taxonomy of all of these different cell types that we described within breast cancers and, then, went back to independent bulk datasets where we could show that, beyond just those profiles that are dominated by the bulk signal, all of these individual cell types contribute to the overall profile of the tumor that then goes on to have associations with outcome or prognosis.

That's what we called “ecotyping.” It certainly moves towards this integrated view that all of these different cell types really play an important role as units of cells to define a particular tumor.

The data that you generated is publicly available. How important is having those public databases to progressing research?

Swarbrick: I think it was really core to our approach, to this study, that we could see the potential of this data to not just advance our own specific research questions, but many others because single cell data is so multidimensional. This study had more than 100,000 cells, each one with a whole transcriptome. And, then, obviously the other dimensions of the data, the CITE-seq protein data, and the spatial information. There's no way that any single lab can fully exploit a dataset like that.

It's also an expectation, obviously, of the scientific community, of our funders, of the patients that donated the tissue, that this really precious, expensive dataset that a lot of people gave a lot to generate becomes widely used and disseminated.

Wu: Publicly available data is something we've also benefited from in this study. We were able to draw on other publicly available datasets, for instance, the METABRIC study, which holds thousands of patients and tumors that have been profiled with gene expression studies. We've learned a lot from that, and we hope the research community can benefit in a similar way from our dataset.

The [Broad Single Cell] portal, where our data is hosted, is really great because not all labs have the resources or computational abilities to start to ask some of these questions within the data themselves. It has a nice graphical user interface that anyone could just load up in Google Chrome and start to ask some of those questions in a really simple manner. Our hope is that other labs can do a very similar thing, just start to explore their favorite genes of interest or therapeutic targets that they are profiling in their studies and start to see some human relevance.

Future directions

How will you follow up on this study?

Wu: I think there are so many different directions that this study can take us. One obvious one is just the need to apply this to much larger cohorts of patients where, as Alex alluded to before, we could also start to directly assess how these expression profiles that we're finding in single cell RNA sequencing directly relate to the outcome of a particular patient, their prognosis or response to treatment.

Now, the lab is doing some incredible things, being led by some awesome new students and staff in the lab that are starting to generate single cell datasets from hundreds of patients. And some of those cases, I think, have up to seven years of clinical follow-up to help us specifically address that question.

Swarbrick: Scale is a big one. Scaling up is, I think, one of the biggest keys, to get the numbers where you [can] have enough data to associate cellular and molecular features with clinical features. The clock's ticking on the field's ability to just generate datasets disconnected from clinical information. We can only do that for so long. It's really up to us now to show or test the value of these technologies to oncology. And to do anything else, I think, is wasting time.

The second one, I guess, is where Sunny's work has been going, which is spinning out hypotheses. We don't have to do droplet single cell RNA-seq the rest of our lives. So, some of the work we're doing now is taking some of the things we've learned from that and going back and actually doing experiments. Going back into model systems, for example, to test the hypotheses that emerged from these findings and, then, try to narrow down to individual candidates and drive them into implementation.

This study used cryopreserved tissues. You often hear the statistic that there are over a billion tissues currently preserved in biobanks across the world and most of them are in FFPE. Do you think you'll go and look at some of the preserved tissue samples that have been done in formalin and try to access any information from those tissues?

Swarbrick: Yes, for sure. We're trialing the Visium FFPE solution right now. With the early data, it looks like it's going to be really good. I'd imagine it's going to become the standard pretty soon. And like you say, the beauty of it is it just opens up all these tissue collections.

Because in cancer, and, I guess, many human diseases, you need time. One of the key features you want to know when you're studying human disease is how the person's disease goes on to progress. Did they respond to treatment? Did they survive? That takes time. You've got to be able to go to these archival tissues where you've had the time to accumulate that information. I think that’s going to be where being able to do these kinds of studies with FFPE is going to be really awesome.

Wu: The ability to study FFPE tissue unlocks a whole new world for us because, as Alex mentioned, we've been storing tissues for many, many years. The time allows us to directly measure some features of that particular tissue that might directly relate to that patient's prognosis, outcome, or response to treatment, and understanding some of the mechanisms behind that from the Visium data on FFPE is going to be incredibly powerful.


  1. World Health Organization. (2021, March 26). Breast cancer. World Health Organization. Retrieved October 25, 2021, from
  2. Wu SZ, et al. A single-cell and spatially resolved atlas of human breast cancers. Nat Genet 53: 1334–1347 (2021).
  3. Wu SZ, et al. Cryopreservation of human cancers conserves tumour heterogeneity for single-cell multi-omics analysis. Genome Med 13: 81 (2021).