Owkin and UNIL collaborators discuss the MOSAIC study: A conversation with Dr. Elo Madissoon and Dr. Raphael Gottardo
“At the end of the day, it's all about the data. If you want to have a huge impact, no matter what you work on, whether it's biology or any other field, you need to have high-quality data. And I think having the right instruments, the right technologies, will get you there. That's the main summary of this study.” — Raphael Gottardo, PhD, Professor of Biomedical Data Science, Lausanne University Hospital, University of Lausanne (UNIL)
AI models—such as foundation models, which are machine learning models trained on large datasets to perform a wide variety of general tasks (1)—can be powerful assets for parsing complex biological data. As the models reveal interesting patterns in certain biological systems, researchers can ask better questions of new tissue samples or apply the models in other contexts.
“When I see these tumor microenvironments, what does that mean in terms of predicting response?” This is one of the questions that Dr. Raphael Gottardo, Professor of Biomedical Data Science at the University of Lausanne, hopes to ask of the models that he and his colleagues will generate as part of the MOSAIC study. An initiative between Owkin, an AI biotech company, and a consortium of academic researchers, the goal of MOSAIC is to profile the spatial biology of thousands of FFPE patient tumor samples across a variety of cancer types. This unprecedented study leverages the Visium Spatial platform to analyze archived patient samples.
We spoke with Dr. Gottardo and Dr. Elo Madissoon, Director of Single Cell and Spatial Technologies at Owkin, to understand the goals of the MOSAIC study and how a massive, pan-cancer spatial transcriptomics dataset could be leveraged to enable future therapeutic prediction modeling, identify putative drug targets, and ultimately impact personalized cancer medicine.
Keep reading to hear what they have to say about the importance of data quality and how Visium outperformed another candidate spatial platform in their benchmarking study to identify the right technology to fuel MOSAIC (2).
Can you tell us more about the motivation for the MOSAIC study? And how did that lead to your comparative study of possible spatial methods?
Madissoon: We started the MOSAIC study to perform large-scale, high-resolution characterization of tumor tissues. Two factors were important for us. One was throughput—to be able to profile thousands of patients—and the second was to generate a lot of data points per patient. We were aiming to find out the best method to study archival cancer samples for this purpose.
We focused on archival samples because we wanted access to a large number of available patient samples—and archival material is where you get the fastest access to the largest number of samples.
Gottardo: When Owkin came to us, they had selected the GeoMx DSP platform for spatial imaging. I had worked with 10x for a very long time, and I knew they were the industry leader. GeoMx is what I would call a bulk technology, because you basically capture an area of the tissue and then you measure the average gene expression of all the cells and all the genes that are in that part of the tissue. It’s nice because it's visual, and it's targeted so you can go into regions that are important to you, but at the same time, it is not really a high-resolution platform.
The best-in-class at the time was the Visium platform, and we had a lot of experience with it, having generated data for past published papers. So we told Owkin we thought it was a mistake to use the GeoMx DSP, and they should use Visium. So I suggested we do a pilot.
We did a head-to-head pilot study, generated some data, and compared the two platforms, which I think was super important. The whole consortium was going to be based on that choice of technology, so we wanted to make sure we used the best one.
What were the most significant findings from your comparative study?
Madissoon: The three methods we chose to test were GeoMx, Visium spatial transcriptomics, and Chromium, a single cell method we added to the study. We wanted to test these methods in a real-life setting with biobanked materials, where the RNA quality can typically be really bad.
The main categories we wanted to assess were operational feasibility of obtaining the data and the biological insights gained from the data. Operational feasibility included the experimental setup and material requirements, and then scaling to a high-throughput setting in the wet lab. We also wanted to assess the possibility for automation of data processing and analysis.
Between the two spatial methods, what we appreciated with Visium was the standardized workflow. This was really apparent in the setup and routine lab performance and data processing phases.
Regarding biological findings, we found that GeoMx can get a more specific signal for selected cell types than Visium, but we also really valued Visium's unbiased approach, which helped us get a more comprehensive characterization of the entire tissue and gave us many more data points. So we concluded that Visium was a better suited method for our high-throughput study, whose primary purpose is discovery.
In addition, we demonstrated that integrating Chromium single cell data brings high added value to spatial analysis. In MOSAIC, we are performing single cell analysis for the majority of the patient [samples] and have successfully applied this to even core needle biopsies. However, it does have the highest tissue requirements, which can be a limitation for very small biopsies and precious samples.
Gottardo: One thing we wanted to look at was reproducibility. We repeated the assay, both with GeoMx and Visium, in replicate sections from the same tumor so we could do a head-to-head comparison of technical variability. This was really good for Visium, but it wasn't as good for GeoMx because there was a bit of batch variability with the regions of interest, or ROIs.
Then we wanted to see, in terms of the biology, can we identify patient-to-patient variability? Can we identify heterogeneity within the same patient? If I look at a tissue section, and if I take some part of the tissue at the top and at the bottom, can I identify transcriptional heterogeneity across the tissue section? We could do this very well with Visium, because you get the whole tissue—you get these spots that have a uniform layout, and that's really nice. There's no bias in sampling.
What we found with GeoMx is that, because it's a visual selection of ROIs, there's a human bias. You might say, “Oh, this looks interesting, this looks interesting,” but you might have missed something that's really interesting because you didn't see it visually. We have examples of that in the paper, such as a tertiary lymphoid structure which we could see in the Visium data even when we didn't select for it. But with GeoMx, this region wasn't selected, so we missed it.
Another one of the big goals of the MOSAIC consortium study is to potentially identify new drug targets for specific cancer indications. So we also wanted to see if we could do that in this context as well. Can we identify targets that might be different across patients? Can we even identify heterogeneity within a specific tumor that would warrant a combination therapy?
We showed that for drug target identification, you need the unbiased nature of Visium. I think about unbiased in terms of the number of genes we are able to detect, but also the unbiased spatial nature of sampling the entire tissue section. If you're biasing yourself one way or the other, you might be missing very interesting areas.
The last goal, which was more or less in line with exploring the technical variability of the assays, was how easy the platform is to use. There wasn’t just one core lab that would generate all the data. We had to ship things to multiple centers where each one would generate their own data. If the operation of the instrument was very challenging, or if there was a lot of manual intervention, that could really impact the robustness of the assay.
In talking to people who work on the GeoMx platform, it seemed much more time consuming, and the software is very difficult to use. This was another aspect that was important to us: how practical and robust the platform is. And we found that this wasn't really the case [with GeoMx].
How did this comparative study influence your perspective on the use of archived tissue?
Madissoon: I must say, I was very skeptical knowing that the archival samples have degraded RNA. But to my surprise, all three methods, which are probe-based, performed well in capturing the biological signal and in actually finding relevant things. Sanity checks all performed well in the biobank material, even for very poor quality tissues.
To me, it was a happy surprise that we could get biological insights and valuable data from the archived material and that it also works with single cell sequencing, even with the whole dissociation aspect.
Gottardo: One important aspect when you think about biobanks or the samples that we may have access to, is that they might be limited in scope. Maybe we have a few indications and maybe there's a huge amount of heterogeneity. So if you want to define cohorts that are more homogeneous, maybe based on prior treatment, on age, or whatever inclusion criteria you have—looking at multiple partners or hospitals helps a lot.
We can now build these finer level cohorts that will allow us to really answer specific questions. The ability to use archived tissue samples and do a very fine selection based on very clear clinical or inclusion criteria is what’s really important.
When you think about answering critical questions, you have to think about experimental design. Making sure you have enough samples, that things are balanced, that there are no confounding factors, that your control and treatment groups are matched. There are a lot of things you can do to maximize statistical power and reduce false positives or false discoveries. You want to maximize the chances of finding something that is going to be real.
So having the ability to do these sorts of assays which are unbiased or high dimensional on FFPE tissue has really been a game changer. It's been very, very important for enabling these kinds of studies that would have been impossible otherwise.
If you had to do everything prospectively, collecting samples and things, it would take forever. We would have never been able to do 3,000 patient samples in less than 18 months.
A critical element of the MOSAIC study is to build AI models. What are some of the ways that AI can enhance the interpretation of complex biological data, such as single cell or spatial transcriptomics data? What were the primary objectives of using AI in the context of your study?
Madissoon: The main objective for Owkin in general is to assign the right drug to the right patients.
For this, we want to use the AI algorithms on MOSAIC data to help identify patients and patient subgroups, and identify drug targets specific to these newly identified patient groups.
We can also ask whether there are already existing drugs that would be the most suitable to these specific patient subgroups. Since MOSAIC is a huge dataset—even with Visium only there are going to be 75 billion data points—AI can really considerably speed up the analysis, as well as integrate multiomic modalities like imaging, Visium and Chromium data, and even DNA mutations. It can combine a patient's information on so many different levels.
Additionally, the AI models can be used for multiple individual data analysis steps, such as to identify new cell types and regions on tissue images, and then link them to the spatial gene expression data. Representation learning can identify spatial patterns on Visium data or biomarkers from single cell RNA sequencing and correlate those to disease states, or to the success versus failure of a treatment.
In the future, there could also be AI diagnostic models that use molecular data such as spatial or single cell transcriptomics to predict the patient's response to a given drug.
Overall, we are using AI analysis and MOSAIC data for multimodal integration, so we can take account of the imaging, transcriptomics, genomics, and clinical data all together to accelerate discovery and enable precision medicine for cancer patients.
Gottardo: I like to think about it on different levels. You have data processing and representation of the data. Some examples of that could be if we want to denoise the data and enhance the signal quality—whether we're thinking about deconvolution or increasing the resolution of the image, that's one way we can use AI.
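As a toy illustration of the deconvolution idea mentioned above, the sketch below (entirely synthetic data and plain NumPy, not the consortium's actual pipeline) treats each spot's expression as a weighted mixture of known cell-type signatures and recovers the mixing fractions by least squares:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: each Visium-style spot contains several cells, so its expression
# is approximately a weighted sum of cell-type reference signatures.
n_genes, n_types, n_spots = 500, 4, 50
signatures = rng.gamma(2.0, size=(n_genes, n_types))        # reference profiles
true_fracs = rng.dirichlet(np.ones(n_types), size=n_spots)  # per-spot mixtures
spots = true_fracs @ signatures.T + rng.normal(0, 0.1, size=(n_spots, n_genes))

# Crude deconvolution: ordinary least squares per spot, then clip negative
# estimates and renormalize so each spot's fractions sum to one.
est, *_ = np.linalg.lstsq(signatures, spots.T, rcond=None)
est = np.clip(est.T, 0, None)
est /= est.sum(axis=1, keepdims=True)

print(est.shape)  # (50, 4): one cell-type fraction vector per spot
```

Real deconvolution tools use richer statistical models, but the structure of the problem (spot expression = mixture of signatures) is the same.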
We’re using it for integrating data, like single cell and Visium and H&E—how do you integrate these different modalities? You can use what we call latent space modeling, where you take these high dimensional data, use a lower dimensional representation of each of these modalities, and integrate them.
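The latent space idea described above can be sketched in a few lines: reduce each modality to a low-dimensional representation, then combine the representations. This is a minimal illustration on random stand-in data (a fake expression matrix and a fake H&E embedding), not the models used in MOSAIC:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for two modalities measured on the same 200 spots:
# a 2,000-gene expression matrix and a 512-dimensional H&E image embedding.
expression = rng.poisson(1.0, size=(200, 2000)).astype(float)
image_features = rng.normal(size=(200, 512))

def embed(matrix, n_components=16):
    """Project a centered matrix onto its top principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Embed each modality separately, then concatenate into one joint latent
# space where downstream clustering or prediction can use both signals.
joint_latent = np.hstack([embed(expression), embed(image_features)])
print(joint_latent.shape)  # (200, 32)
```

Production methods learn a shared latent space jointly (for example with variational autoencoders) rather than concatenating separate projections, but the principle is the same.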
Another thing we're looking at is graph-based modeling, where we can model the cell–cell interactions within the tissue. There are also the more traditional AI tools, or prediction modeling, where we want to predict a clinical outcome or identify new drug targets.
All of these things are different elements of AI that can be used, some of which are used to automate a repetitive process and some to generate new insights.
The other thing that is important for MOSAIC is enabling what we call foundation models. These are models that can be very good at performing a wide array of tasks, and they generally require large amounts of data for training.
[The MOSAIC study] is really an asset because it's probably one of the largest datasets where we have not only multi-resolution data—single cell, spatial, H&E—but we also have very rich clinical data. This will allow us to answer a lot of very generic questions, such as, when I see these tumor microenvironments, what does that mean in terms of predicting response? Or [someday], when you see a new patient, for which maybe you do or you don’t have all of these modalities, you [could] be able to potentially predict a response to treatment.
This is really the holy grail of having access to a large amount of data, having these general purpose models that work very well because they've been trained on a large amount of data.
With all the capabilities that AI is enabling, along with single cell and spatial analysis, do you have a moonshot goal? Are there ways that you hope to see the field of personalized medicine impacted by this project?
Madissoon: Cancer is highly heterogeneous, so every patient can almost be unique in this aspect. We believe that with this heterogeneous material, we need a very, very large dataset. That is one thing that MOSAIC is addressing, getting thousands of patient samples to compare.
The second aspect is to get very deep data. To get very high-resolution molecular characterization with single cell, spatial, whole-exome sequencing, histological imaging, and clinical data.
Getting all of this information from so many patients will allow us to discover patterns and patient subpopulations that would be more likely to respond to specific treatments. So we can assign a patient to a specific drug that is more likely to be successful than a random drug.
Gottardo: When we were approached by Owkin to form this MOSAIC consortium, I was very keen on it, because I would say this is a unique opportunity to not only generate data from more patients, but also get access to additional data and collaborate.
We can have the largest amount of data that's ever been generated from cancer patients using these latest technologies, spatial and single cell omics.
Again, one of the critical elements of this dataset is the scale and resolution, but also access to very rich clinical data. This is pretty rare because in the public domain, you can find large single cell datasets, and we're starting to see growing interest in spatial datasets. But anything that's linked to well-defined clinical outcomes and clinical data is extremely rare. The fact that we have multiple indications as well, that we have pan-cancer work, is phenomenal too.
So I hope that we'll be able to identify new targets—things that may be linked to better response or longer survival. And [then someday], when we see a new patient, what can we say? What should be the right treatment? Even if it's not identifying new drug targets, it may be personalized treatment, identifying things that would be very difficult to do if we were just limited to the data that we currently have.
Another critical element we're going to learn from this data is how to extend that knowledge to additional data that are not part of the consortium. From Owkin's point of view, or for anyone working on these kinds of data, especially in the pharma world, you think data, asset, drug.
I think there's probably going to be very important advances in AI that will have huge impact outside of the consortium. Not because the AI models have been applied to these data, but because they've been trained on that data, and then you can apply it in another context, whether it's drug identification, or treatment selection for patients.
What we're seeing a lot in these kinds of AI models is the ability to learn the relationship between different modalities and still be able to make a prediction, even if you don't have all the modalities. This is the generative aspect of AI, where you can learn a lot and then you can generate new knowledge on data that you haven't seen or that is not complete.
So I wouldn't be surprised if the biggest impact of a study like this is the ability to bridge what we know from this to newer studies in the future.
Is there anything else you'd want to share that you haven't been able to say yet or any final comments?
Gottardo: The one thing I would say, and I always say—I've known 10x for a long time, and they've been a great partner to work with. They are a technology provider. But the customer relationship goes beyond that.
Something that I love about working with 10x is that they listen to our needs. I'm speaking as a scientist who's worked with them for a long time. I think they're here to provide solutions for problems that we have and help us solve these problems. I think it goes beyond just a business transaction and into the science.
A study like this one and what we're doing with 10x shows that it's a partnership. I see 10x as a partner in this consortium. I think it's really important to have expertise across the board. We are providing the expertise in terms of access to samples, clinical expertise, computational expertise, and we need a partner that can bring the experimental expertise because at the end of the day, it's all about the data.
These interviews have been edited for length and clarity. We’d like to thank Dr. Raphael Gottardo and Dr. Elo Madissoon for their responses and for their groundbreaking work to advance cancer drug discovery and personalized therapy.
To learn more about Owkin and the MOSAIC study, we invite you to review this webinar that presents the goals and current status of the project. And find the team’s benchmarking study here.
References:
1. https://aws.amazon.com/what-is/foundation-models/
2. Dong Y, et al. Transcriptome analysis of archived tumors by Visium, GeoMx DSP, and Chromium reveals patient heterogeneity. Nat Commun 16:4400 (2025). doi: 10.1038/s41467-025-59005-9