Researcher Spotlight

Learn about notable advances in our single-cell science research with the Boston Children’s Hospital (BCH) community and beyond. We are committed to advancing single-cell science by making it accessible and empowering other scientists to dig deeper into their data, one cell at a time.

Expanding Horizons in Microbiology: The Breakthrough of Bacterial MERFISH

At CDN, we love diving into the science, but just as importantly, we love getting to know the scientists behind the discoveries. In this interview, we spoke with Jeff (PI) and Ari (Graduate Student) about their backgrounds, the journey that led them to their current research, and the recent paper on bacterial MERFISH—a groundbreaking technology expanding our ability to map bacterial RNA organization in single cells.

Meet the Scientists

Ari: A Journey from Switzerland to Boston

Ari grew up in Geneva, Switzerland, where he pursued his studies before making his way to Harvard Medical School for a fellowship during his master’s at EPFL. He joined Jeff’s lab, a lab deeply embedded in bacterial research, after the pandemic. This was about a year into his PhD program, following completion of his master’s project in a different lab.

What drives Ari? The thrill of discovery.

“We can’t discover new continents anymore—that’s been done. Maybe one day, we’ll explore other planets, but for now, biology is still an open frontier,” he says. “Problem-solving and troubleshooting can be frustrating, but they’re also what make this job so fun. When something doesn’t work, I get to ask: Why didn’t it work? How can I test it? How can I measure it?”

Jeff: From Physics to Biology

Jeff’s path took him from physics to biology. He earned his PhD at UC Berkeley, where he developed optical tweezers—biological “tractor beams” that allowed researchers to manipulate single molecules and study enzymes interacting with nucleic acids.

Wanting to explore the complexities of the cell, he transitioned to bacterial research during his postdoc. His work led to an exciting collaboration that helped pioneer spatial biology using a multiplexed RNA imaging method, MERFISH. “Our goal is to develop technologies that let us see things that were simply unobservable before,” Jeff explains.

Bacterial MERFISH: A New Window into Microbial Life

So how did this project come about?

MERFISH had already transformed how scientists study tissue biology, but Jeff wanted to apply it to bacteria. The problem? Bacteria are tiny, and their RNA is packed far more densely than in mammalian cells—about 1,000 times denser. This made it impossible to distinguish individual RNA molecules with traditional microscopy.

Ari and his collaborator, Dr. Yuanyou Wang, took on the challenge. “We had to solve a three-order-of-magnitude RNA density problem before we could even get started,” Jeff says.

Expanding Bacteria—Literally

To overcome this, the team turned to expansion microscopy, a technique that physically enlarges cells using hydrogels. Imagine a balloon: if you write on it and then inflate it, the text spreads out, becoming easier to read. Expansion microscopy works the same way, allowing researchers to visualize bacterial RNA organization at unprecedented resolution.

“Instead of making the microscope better, we make the sample bigger,” Ari explains. “We embed bacteria in a hydrogel, then let it expand. The molecules are pulled away from each other but are not themselves stretched that much, making them resolvable under a standard microscope.”
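A back-of-the-envelope sketch of why expansion helps, assuming the gel expands equally in all three dimensions: a linear expansion factor f dilutes molecular density by f cubed, so closing the roughly 1,000-fold density gap mentioned above would take about a 10-fold linear expansion. The function names and numbers are illustrative assumptions, not values reported by the team.

```python
def density_fold_reduction(linear_expansion: float) -> float:
    """Fold reduction in molecules per unit volume after isotropic expansion."""
    return linear_expansion ** 3


def linear_expansion_needed(density_fold: float) -> float:
    """Linear expansion factor needed to dilute density by `density_fold`."""
    return density_fold ** (1.0 / 3.0)


# A 10x linear expansion spreads the same molecules over 1000x the volume.
print(density_fold_reduction(10.0))
# Conversely, a 1000-fold density gap calls for roughly a 10x linear expansion.
print(round(linear_expansion_needed(1000.0), 2))
```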

Mapping Bacterial RNA Organization

With this approach, Ari and his team were able to do something that had never been done before: map the spatial organization of over half of the E. coli transcriptome.

For years, scientists assumed bacteria had little RNA organization; after all, their small size and rapid molecular diffusion suggested as much. But Ari’s work contributed to a growing appreciation that bacterial transcriptomes are organized within the cell.

“We found a surprising diversity of RNA spatial patterns,” Jeff says. “Some RNAs are evenly distributed, but many exhibit strikingly distinct localizations.”

These findings raise new questions: Why do bacteria organize their RNAs? How does this organization influence cellular function? The research has opened an entirely new avenue for studying bacterial cell biology.

 

Unexpected Insights into Bacterial Metabolism

One of the most surprising findings came when the team looked at how bacteria adapt to different nutrients. Bacteria don’t just switch to metabolizing a new sugar when their preferred one runs out. Instead, they go through an exploratory transcriptional phase, testing for alternative sugars—even ones that aren’t present in the environment.

“It’s like they’re sending out molecular feelers,” Ari explains. “They don’t immediately switch to the second-best sugar. Instead, they express genes for several different metabolic pathways before committing.”

Using their technology, the team was also able to uncover the molecular activities of subpopulations within identical bacterial cultures—some focused on amino acid synthesis, others on their functional machinery.

“This is a fundamental insight into bacterial gene regulation,” Jeff adds. “Even in minimal conditions, bacteria are exhibiting metabolic diversity.”

The Future of Bacterial MERFISH

Beyond understanding bacterial behavior in the lab, this technology has potential real-world applications. For the first time, researchers can apply MERFISH to bacteria inside living hosts, providing unprecedented insight into microbial interactions in their natural environments.

“We don’t just want to study bacteria in test tubes—we want to understand them in their real-world settings,” Ari says. “This technique gives us a way to do that.”

Closing Thoughts

Before wrapping up, we asked Jeff and Ari a few fun questions.

What’s the most impactful scientific advancement of the last decade?

For Jeff, it’s the explosion of spatial transcriptomics—merging genomics with microscopy to study biology at an unprecedented resolution. “The ability to map tens of thousands of RNAs while preserving spatial context is a game-changer,” he says.

Ari, on the other hand, struggles to pick just one. “AI is changing the way we process biological data. Gene therapy is becoming a reality. And cryo-electron microscopy has revolutionized how we visualize molecular structures. There’s too much to choose from!”

If you weren’t a scientist, what would you be doing?

Jeff: “I love discovery, but I also love mentorship and teaching. If I had to drop research, I’d probably still be a teacher.”

Ari: “I love the engineering side of research—building tools, optimizing processes. If I weren’t in science, I’d probably go into industrial design or automation.”

Final Takeaways

Jeff and Ari’s work is pushing the boundaries of bacterial research, making it possible to study microbial gene expression with an unprecedented level of detail.

Their study demonstrates that bacterial RNA isn’t randomly distributed—it’s spatially organized, and understanding this could transform how we think about bacterial physiology. By adapting MERFISH for bacteria, they’ve unlocked new opportunities to study bacterial adaptation, gene regulation, and cellular behavior in ways that were previously impossible.

The future of bacterial spatial transcriptomics is just beginning, and we can’t wait to see what comes next.

 


Gene Therapies to Treat Patients with Sickle Cell Disease

This version of our Researcher Spotlight takes a slightly different, Q&A-style narrative, because Vijay’s tireless work has inspired a massive breakthrough in Sickle Cell Anemia treatment.

Let’s dig in! 

Q: We wanted to talk to you about Casgevy, the gene therapy the FDA approved in December 2023 to treat sickle cell disease, and shortly afterward transfusion-dependent beta thalassemia (TDT). As we understand it, this is the culmination of work you started during your PhD back in 2004, in Stuart Orkin’s lab. Could you please tell us a bit about what piqued your interest in this project back then?

A: Oh, well, maybe I’d answer that in a little bit of a different way, if that’s okay? Because I think it’s a great question! I think that I became very fascinated by the problem we were trying to address, which was: ‘how, during human development, is there this switch from a fetal form of hemoglobin that you express throughout much of gestation to the adult form?’ And we had known from decades of work, including work here at Boston Children’s Hospital, that if you can induce that fetal form of hemoglobin, or if you naturally just have more of the fetal hemoglobin, you actually can do much better if you have sickle cell disease or thalassemia…and that was known for a long, long time. Yet how this process was regulated was really unknown.

 

And I bring that up because, you know, when I entered my PhD I was like, “okay, I really want to understand this problem”, a little bit naively, but there was a lot known about it, and not very much known about the molecular basis of it.
When I started my PhD and wanted to address this problem, I went to my PhD advisor. He said [he was sort of a little bit skeptical because he’d worked on this decades ago], “oh, you know, lots of people have kind of lost their careers doing this…but why don’t we go ahead”. And soon thereafter, he said, “why don’t you help me write a grant?” [And that grant renewal]…it actually got an almost perfect score…at the time it was like 110 or something. It was really well received. But I can tell you retrospectively [laughing] that every one of the aims of that grant proposal actually totally failed.
The [aims] failed in part because there were assumptions made about the system. And it turned out that what really helped us make a lot of progress was advances in genomics and human genetics. Specifically, we were able to identify this association in the BCL11A gene through genome-wide association studies.
So I bring all that up because, as we think about the CDN and the value of emerging genomic approaches that are happening…to me, I see tremendous value because as I look back to what we ‘were able to do’, we had a question in mind – but couldn’t answer it – and it was really through advances in genomics and human genetics, that we could make any headway to understand it. 
I say all of this because I think it’s a broader message that I hope will ring true many more times! That’s what excites me about all these advances in single cell biology…we now have an opportunity, I think in many ways, akin to what happened twenty-ish years ago, that will allow us to really start to make some of those important advances. 

Q: Could you explain a little bit about what sickle cell disease and transfusion-dependent beta thalassemia (TDT) are, how they work, and how prevalent they are?

A: At a basic level, sickle cell disease and beta thalassemia are actually the most common monogenic diseases in the world…because of selection for carriers who have these mutations, because carrying them confers resistance to severe forms of malaria, it turns out. But really both of them are due to mutations in the same gene! The adult beta hemoglobin gene is HBB, and it turns out that in sickle cell disease, there’s a specific point mutation that causes the hemoglobin, once it’s assembled as a protein, to have a tendency in the deoxygenated state to polymerize. So you get these polymers of hemoglobin that form, and they deform the red cells and cause the red cells to then stick in small blood vessels, blocking blood circulation, causing pain and organ damage as a result.
Thalassemia, in contrast, is a condition also due to mutations in the adult beta hemoglobin molecule…but it’s really more so due to reduced production of the adult beta hemoglobin molecule…so there, you have a production issue.
Really, the problems in beta thalassemia…to dive into that a little bit more, are not because of the deficiency of beta hemoglobin, but actually because of the imbalance between the other part of hemoglobin, the alpha hemoglobin, and the beta hemoglobin. If you have too much of the free ‘alphas’, it turns out they actually precipitate and cause the red cell precursors in the bone marrow…to die and not be produced effectively.
The reason I bring up that both of ‘these’ are problems with the adult beta hemoglobin molecule is that, it turns out, if you can get rid of, or turn down the amount of, adult beta hemoglobin, you can help with both of these conditions! And the way you naturally do that is, during gestation you have this fetal form of hemoglobin, and that is a beta-like hemoglobin molecule that substitutes for beta hemoglobin. So nature has sort of devised this ‘way’, and it turns out that children with sickle cell disease or thalassemia typically don’t manifest symptoms until late infancy, because they’re protected while this fetal hemoglobin is ‘on’. So, nature has sort of shown us how to do this! We just didn’t know what the molecular regulation of that process was.

Q: Why do we need these two beta hemoglobin molecules? Why don’t we just keep one the entire time?

A: I think it’s a really great question! Actually, most mammals have done just that. Outside of Old World primates, almost every other mammal has just [one] adult hemoglobin, and no switching process! Probably some of it has to do with the fact that fetal hemoglobin has a higher oxygen affinity, so it facilitates transplacental oxygen transport. But it turns out, for example, that if you measure mouse red cells, mice do a fine job of transferring oxygen transplacentally; there are just other adaptations you can have to enable that. I bring this up because it turns out there are humans naturally with mutations or deletions that cause high levels of fetal hemoglobin, and even if you have 98% fetal hemoglobin, you can give birth just fine…So there are ways that you can adapt and get around this, but of course, evolution doesn’t care about the individual; it cares about thousands of individuals. And so maybe it would help people over time to have this kind of adaptation. But it’s a really great question because ‘where and how it evolved’ is something that we don’t fully understand.

Q: What is the main role of BCL11A?

A: BCL11A really serves in some ways as a rheostat to regulate fetal hemoglobin. So what we know so far is, it acts as a transcription factor – and when we uncovered BCL11A’s role in fetal hemoglobin switching, it turned out, it was really well studied for its role in B lymphocyte development and its role in neural development.
So people had characterized it as an important transcription factor for both of these processes. And its role in red cell production was just not at all appreciated, right? It wasn’t one of the key regulators of red cell production canonically…I think of it as an accident, because as soon as we found out, we said, ‘alright, let’s turn it down, see what happens’, with sort of the antiquated siRNA or shRNA approaches of the time…immediately, we could see this result where, robustly, you induced fetal hemoglobin. I mean, it was remarkable. And the reason I say it was remarkable is because we had been trying all these other factors that we had had hypotheses about and all of them had failed, right? So we knew the system was working as we expected…and yet, we had never hit upon something that had this effect.

 

Q: A little bit later, in 2015, Daniel Bauer published a paper where he found out about the enhancer that promotes the expression of BCL11A. Can you tell us a bit about your experience as it relates?

When did you think about using the CRISPR-CAS9 system as a mode of therapy?

A: Right, yes, that is a very interesting story! 
They had been looking for where the genetic association was. We knew it was in the BCL11A gene, but it was in a non-coding sequence. It turned out that it was mapped to a region that contained this enhancer. The funny part of the story is they identified a large, 10 kb region that included the variation. But, to this day, we still do not understand how these variants act, and I only bring this up because I still think that there’s more biology to learn…a lot to learn about how this variation acts, and things that we’re trying to study related to this. Because they identified an enhancer, they actually started to use CRISPR tiling approaches to map the most active parts of that enhancer. One of these regions, which is what they published in the 2015 paper from Canver and colleagues, was actually the really active region that was sort of the key for allowing expression, and if you edited that, which is essentially what Casgevy does, then you can nicely turn down BCL11A levels.

 

In 2012, 2013, [CRISPR-Cas9] started to be applied to human cells…so I mean, just remarkable to see how quickly things went.
I think one message I would say [if I was, you know, cause I talk to younger folks about this a lot] is I think one of the most exciting things for me is what’s to come! Like, you know, as I look at ‘that’ progression, I think in some ways the CDN efforts are going to lead to hopefully many, many more advances…much akin to this, right!? Because, you know, enabling tools: if it wasn’t for GWAS, we would never have been able to identify BCL11A…the HapMap project, and the ability to understand human genetic variation…If it wasn’t for Cas9 and CRISPR-Cas9, there wouldn’t be an approach to readily disrupt this regulatory element. If it wasn’t for efforts like the ENCODE effort, right, there wouldn’t have been the same understanding of some of these non-coding elements. And yeah, I still think that there’s much more to understand even about this biological process. But I would argue, you know, these types of approaches and technologies really drive the kind of innovations that we’re seeing. One may argue, ‘why would a children’s hospital need a single cell effort?’…I argue that this is exactly why, and what we need! Technologies are what drive the advances in innovations!
To tell an “aside”/good story…so when our 2008 paper was published in Science, describing BCL11A as the regulator [of this switch from fetal to adult hemoglobin]…on the exact same page, just on the next page after our article, literally sharing like the physical paper, because Science used to publish ‘back to back’, and be like overlapping…there was a paper on bacterial immunity from Luciano Marraffini and Erik Sontheimer. That was actually the first description of CRISPR-Cas9 being this sort of bacterial defense system. And [thinking now] that was remarkable, right? I didn’t pay any attention to that back then…I was like “this bacteria stuff, it’s weird [laughing]”. But it just goes to show that this is where convergences of biology, as you were saying, are so important, and so valuable, right? They are the things that come together. I mean, if you bring people who are interested in pediatric diseases and you introduce them to single cell genomics, which is what you’re doing, who knows what can happen!

Q: What do you think are the key steps that have enabled such quick turnaround from the academic setting to now being an FDA approved therapy?

A: Well, I think there’s probably more complexity to it. And the reason I say that is because once it was clear that it was a good target, I think that there were lots of companies who were interested in it even before the enhancer was identified. There were companies already thinking about ‘how do we either use gene therapy approaches to turn it down?’ or ‘how do we edit BCL11A directly?’ It turned out that, because of off-target effects in blood stem cells, that wasn’t a great idea, but there was also some interest, and it’s true that there were companies that were actually starting to work on this probably well before Vertex and CRISPR Therapeutics started to work on this; a few companies were in that pool. I think that it’s also just a testament to what Vertex and CRISPR did in terms of execution, how they ran the clinical trials and how they implemented this. Because while it’s a remarkable success and has gone remarkably well, I think that it’s also testament to the fact that they could kind of come out and say, ‘look, this approach works now in about 80 some odd patients…I think the papers are going to be publishable…’, describing those clinical trials, and that’s really an implementation issue. I think it’s something we have to bear in mind that sometimes it’s like, ‘okay, we make the great discovery and it can just get right to a therapy’…And there’s a huge amount of hurdles to overcome. It is really just a testament to the fact that they figured out: How do you best dose patients with this? How do you build on this? On the other hand, the one thing I’ll just remind you of, which I think is the exciting opportunity in some ways, is that ‘this’ is really built on decades of studies of the blood stem cells. And so were it not for those kinds of advances, we wouldn’t be able to necessarily manipulate the blood stem cells and transplant them back in the way that we can.
So I actually think that this also speaks to ‘why do we need to even just get basic insights into blood stem cells?’ I would argue that’s really what has enabled, and hopefully will continue to enable, further advances in this field!

Q: Can you elaborate a bit on how the therapy works? 

A: Yes! So you actually take out the blood stem cells. You collect them by mobilizing the blood stem cells. It turns out you give this CXCR4 antagonist, plerixafor, and in sickle cell patients you don’t need to give another drug called G-CSF. Now, I think that that’s actually where the limitation is right now. So basically the patients in these clinical trials, they’ve had to go for four separate collections…on average, to be able to get enough blood stem cells. So you need to get enough blood stem cells! I would challenge the people thinking about these problems with: What if you can do that with one collection? Wouldn’t that improve things for patients? Would that improve the product? So that itself is an advancement waiting to happen…an advancement that I think single cell genomics is inevitably going to contribute to, right?
Because you can study blood stem cells in a way that you can’t with any other technology. And using this kind of approach, you can then take and modify these cells. Well, what if we can better understand how to culture stem cells ex vivo and handle them? Right now we can only culture them for a few days before they lose their ability to act as stem cells, and that limits the therapies, right? Because they have to be put back.
There are a huge number of innovations that have yet to happen that will enable that. In some ways I feel like it’s exciting. You can do it…but version 2.0, version 3.0, those are waiting to happen. And I hope that people here and elsewhere will start to help those advances happen.

 

Q: This is something that you probably started working on 15, 20 years ago…now that you have seen it come to fruition, how did you celebrate? 

How did you feel when you saw that this finally was getting actual, actionable clinical results, and then that patients were actually being treated with this? 

A: Well, I will say I’ve been following the clinical trials, so when the approval came about it wasn’t a surprise…we sort of had, at the end, some warning…[laughing] you know, there’s lots of ‘orders’ and ‘people’ who had sort of been ‘talking’…and I think in some ways it was really exciting to see what had happened. In many ways though, seeing the ‘approval’ really felt like it lit an additional fire! The ‘approval’ said [to me], “Gosh, if that worked shouldn’t we redouble our efforts to continue to do what we’re doing, to try to use insights from human genetics, to understand more biology, to help more patients?”
So in some ways I think it served as a real big inspiration! The other thing to me is…it just makes me even more excited about where the future is going! Because, as we think about the real nuances of this, you realize, “well, it’s good, but man, the things that we’re doing in the lab today are even better!” And I say ‘we’ in the broader sense, like, “shouldn’t ‘we’ move this stuff…to the clinic, in a way that we could not do in any other setting?”
A lot of the time people say, “Isn’t clinical care where it needs to be?…shouldn’t a clinical hospital focus on the here-now in implementation?” – And, sure they should, but they should also focus on, as BCH does, innovating and on developing the next generation of therapies. And to me, that’s where we’re going to see huge advances!
You know, you had sort of said early on, “Well, it [from my beginnings and findings in 2008 as a PhD student, to Orkin in 2016, to now, clinical trials for a sickle cell therapy] seems like a fast time scale?” [Smiling] I hope that we look back at this interview and know ‘that’ was a long time scale…we were saying then it was a fast timeline but, now [in the future], ‘that’ was slow…“what were we talking about back in the day [2024]?”
I would be disappointed if, say, 20 years from now, we aren’t saying “wow…[only] four years from discovery to medication!”…This would be a satisfying way to see things [smiling]!

Q: How do you envision the future of gene-editing therapies from here?

A: Well, I think there’s a couple of lessons! I think one thing is, by targeting BCL11A, you’re not at all targeting the primary issue. Like I said, the underlying mutations are in beta hemoglobin, in sickle cell disease, thalassemia etc…So, maybe there are workarounds for other diseases that we need to understand. And I think that to me, that’s why it’s exciting to understand some of the underlying biology, to understand the pathogenesis of the disease, because maybe you don’t want to target a primary lesion, or maybe you cannot target it…But then maybe there’s a workaround?
And I hope that as we better understand biology, we’ll find more of those workarounds, and many of those workarounds might be good targets for therapies.
One lesson I would walk away with is – and as I mentioned before, historically, the field of hemoglobin switching was being worked on for decades [and I’m biased] – Genomic approaches allowed something that no one could have predicted to be identified. And to me, that’s a really important lesson! I think it says even where a lot of times genomic approaches are critiqued, e.g. people say, “what’s the value of this?”, well I think this is a clear cut example where the value proposition is there.
It was easy to dissect this because we had a lot of background on how we think about red cell biology, how we think about globin gene regulation…and think how much biology is waiting to be discovered through these kinds of approaches. So that’s the second lesson I would take away.
A third thing to walk away with is…I think what was done is exciting, but I think what’s to come is even more exciting! I think there’s still more to be done and I think that there’s many more questions to be asked. I hope that those who are addressing those questions will not only lead to interesting scientific advances, but also will lead to really important clinical advances. And I think that that’s why being in a place like Boston Children’s Hospital, for me, is particularly exciting, right? Because I can spend time across the street, seeing patients. And then working with people like you [CDN, other scientists etc], it’s also awesome, right? Because it’s like gosh, this is going to lead the next generation of how we think about biology in a different way…and that’s the fun part of what I do day to day!…It’s the fun part about being in a place like Boston, right? There’s people thinking about all sorts of things across all of the same spectrum and we all kind of get to play in the same sandbox…So it’s fun, you know?

Collecting Metadata In Large-Scale Projects

A case study of the HCA integrated gut cell atlas

 

Introduction

Human experiments can have complex designs and many clinical covariates that affect analysis. Collecting these covariates can be a difficult process because they span multiple experimental levels, from experimental and analytical methods to donor and sampling information. This is especially true in single-cell experiments, where thousands of cells are collected per sample, and sometimes multiple samples are collected per individual.

Here, we use the term metadata to refer to any variable related to the dataset, participant, sample, cell or gene except the core cell x gene expression matrix. It is critical to organize these metadata as complexity increases, and if we want to bring in published data for comparison, we need to create or follow a standard metadata format to be able to make comparisons. When analyzing published data, required metadata fields are sometimes unavailable, so it is necessary to reach out to the authors of published data. When reaching out, providing an online standardized metadata sheet improves accuracy, skips redundant data wrangling steps, and enhances collaboration.

Here I will explain the methods I have used to make metadata collection and organization feasible at atlas scale using Google Sheets. We use this collection process in the HCA integrated gut cell atlas project.

Design of Metadata

The HCA provides a required metadata schema that also includes the metadata fields required by CELLxGENE. The HCA metadata guidelines are highly detailed (https://data.humancellatlas.org/metadata). The HCA follows guidelines similar to those of CELLxGENE, which are described more simply here (https://cellxgene.cziscience.com/docs/032__Contribute%20and%20Publish%20Data) in terms of single-cell data analysis. We adhere to both schemas to make dataset upload to the portals easier. Both portals increase public data accessibility, and each offers its own methods.

We create separate Google Sheets for defining and collecting tier 1 (common and public) and tier 2 (patient-protected and tissue-specific) metadata. The Google Sheets for tier 2 metadata can then be downloaded while preserving formatting so they can be filled in offline in a secure location. Tier 1 metadata is collected directly in Google Sheets to make collection more collaborative.

First, a metadata definitions file is created with detailed descriptions of each of the metadata fields, as well as dropdown menu options for fields which can be restricted to categories. Here is an example sheet. 

https://docs.google.com/spreadsheets/d/1Jz02P7ZnqaigofvXVOxrYJh9bhvHa7nY-4EdJrt4BRA/edit?gid=0#gid=0
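As a rough sketch of what such a definitions file encodes, each field can be thought of as a record with a description, a tier, and an optional list of dropdown choices. Everything below (the `FieldDefinition` class, the field names, and the allowed values) is hypothetical and for illustration only; it is not taken from the linked sheet, the HCA schema, or the MetaManager library.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FieldDefinition:
    name: str
    description: str
    tier: int  # 1 = common and public, 2 = patient-protected and tissue-specific
    allowed_values: Optional[Tuple[str, ...]] = None  # dropdown options, if restricted


# Hypothetical example definitions.
DEFINITIONS = [
    FieldDefinition("sample_id", "Unique identifier for the sample", 1),
    FieldDefinition("sex", "Reported sex of the donor", 1, ("female", "male", "unknown")),
    FieldDefinition("disease_status", "Clinical status at sampling", 2, ("healthy", "diseased")),
]


def fields_for_tier(definitions: List[FieldDefinition], tier: int) -> List[str]:
    """Names of the fields that belong on the sheet for one tier."""
    return [d.name for d in definitions if d.tier == tier]


def dropdown_fields(definitions: List[FieldDefinition]) -> Dict[str, Tuple[str, ...]]:
    """Fields restricted to categories, mapped to their dropdown options."""
    return {d.name: d.allowed_values for d in definitions if d.allowed_values is not None}


print(fields_for_tier(DEFINITIONS, 1))
print(dropdown_fields(DEFINITIONS))
```

Keeping the tier on each field makes it straightforward to build one sheet per tier from a single definitions table.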

We then use a custom Python library (available on our GitHub: https://github.com/CellDiscoveryNetwork/MetaManager) to generate empty Google Sheets with the custom formatting. This library contains functions for generating Google Sheets and for reading whatever is entered into these sheets into Python data frames.

Once the Google Sheets are made, we share them with all of the members of the experiment, because the people who gathered the donor-level information might be different from those who decided on experimental or analytical methods (such as alignment parameters). With the sheets shared, we can start setting up the tracking method.

With the Python library, we can use pattern matching or other validation methods in Python to determine whether the metadata was filled out correctly. We then visualize the results on a heatmap, where each row corresponds to a dataset or experimental protocol and each column corresponds to a metadata field.
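The pattern-matching step can be sketched with Python's standard `re` module. This is a minimal illustration under stated assumptions, not the MetaManager implementation: the field names, regular expressions, and example rows are all hypothetical.

```python
import re

# Hypothetical rules: field name -> pattern that a valid entry must match in full.
RULES = {
    "sample_id": re.compile(r"[A-Za-z0-9_-]+"),
    "donor_age_years": re.compile(r"\d{1,3}"),
    "sex": re.compile(r"female|male|unknown"),
}


def validate_rows(rows, rules):
    """Return {dataset: {field: True/False}} marking which entries pass their rule."""
    result = {}
    for row in rows:
        checks = {}
        for field, pattern in rules.items():
            value = (row.get(field) or "").strip()  # a missing or empty cell fails
            checks[field] = pattern.fullmatch(value) is not None
        result[row["dataset"]] = checks
    return result


# Hypothetical entries, as they might be read from a shared sheet.
rows = [
    {"dataset": "dataset_A", "sample_id": "S_001", "donor_age_years": "34", "sex": "female"},
    {"dataset": "dataset_B", "sample_id": "", "donor_age_years": "thirty", "sex": "male"},
]

matrix = validate_rows(rows, RULES)
for dataset, checks in matrix.items():
    print(dataset, checks)
```

The resulting dataset-by-field table of pass/fail values has the same shape as the heatmap described above: one row per dataset, one column per metadata field.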

One heatmap is made per metadata level, for example:

In the Gut Cell Atlas project, we publish these heatmaps on a public website hosted on Google Cloud (https://hca_gut_cell_atlas.storage.googleapis.com/metadata_correctness.html), which we share with all of our contributors. We have set up a server that automatically downloads the metadata from the Google Sheets and updates the heatmaps every couple of minutes, so that we can give feedback on metadata entry in real time.
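To give a sense of how a pass/fail table can become a web page, the sketch below renders it as a colored HTML table using only the Python standard library. The function name, the colors, and the input layout are assumptions for illustration; this is not the code behind the page linked above.

```python
from html import escape

# Hypothetical colors: green for a valid entry, red for an invalid one.
VALID_COLOR = "#2ca02c"
INVALID_COLOR = "#d62728"


def render_heatmap_html(fields, rows):
    """Build an HTML table with one row per dataset and one column per field.

    `fields` is a list of metadata field names.
    `rows` is a list of (dataset_name, [True/False per field]) tuples.
    """
    header = "".join(f"<th>{escape(field)}</th>" for field in fields)
    body = []
    for name, flags in rows:
        cells = ""
        for ok in flags:
            color = VALID_COLOR if ok else INVALID_COLOR
            cells += f'<td style="background:{color}"></td>'
        body.append(f"<tr><th>{escape(name)}</th>{cells}</tr>")
    return "<table><tr><th></th>" + header + "</tr>" + "".join(body) + "</table>"


# Made-up example: one dataset with one valid and one invalid field.
page = render_heatmap_html(["donor_id", "sex"], [("dataset_A", [True, False])])
print(page)
```

A server job could regenerate this string after each download of the sheets and overwrite the published file, which is one simple way to keep the page current.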

In conclusion, we have created restricted metadata entry sheets that can be filled in collaboratively, and we have a method for checking whether they were filled out correctly. Google Sheets also has built-in version history, so if anything goes awry, there’s a rewind button. Because the sheets are online and available to all collaborators, they help prevent siloing of bioinformaticians, medical staff, and experimentalists, and they cut down on lengthy email exchanges about correcting metadata. For meta-analyses, they also streamline large-scale collection of metadata from collaborators.


Combining GWAS with scRNAseq to better understand asthma

There is no question why we sat down with Sarah for this edition of the CDN Researcher Spotlight: her work, passion, and recent success speak for themselves. She has worked on a vast number of research projects in her professional career and recently published a preprint on the relationship between rhinovirus infections and childhood-onset asthma.

 

The study that became the preprint:

Could you summarize for the general public why you decided to carry out this study and what the main takeaways are?

“We are interested in understanding the molecular mechanisms of immune-mediated diseases by using a combination of genetic and multi-omics approaches.

Asthma is one of the themes we study in the lab, since we know it is a disease influenced by both genetics and the environment; however, we only understand part of its genetic origin. Only 47% of risk loci co-localize with leukocyte T cells, a cell type that is part of the immune system and has been associated with the genetics of asthma. So we decided to tackle this question by thinking about other cell types that are under-studied in the context of asthma and in functional genomics studies. Epithelial cells, for instance, are another main cell type studied in asthma, but not usually from the genetic perspective. We decided to focus on them because epithelial cells are the first line of contact for respiratory viruses and allergens. Moreover, epidemiological studies have shown that viral infections in early life are associated with the development of childhood-onset asthma, and it remains unclear whether rhinovirus is causal in asthma or whether it is a biomarker for children already predisposed to asthma.”

Keeping in mind the general public, could you briefly explain what GWAS are and how you are incorporating them into this study?

“I will start with an introduction to GWAS, which stands for genome-wide association study. A GWAS aims to identify associations between genotypes and phenotypes by testing for differences in the allele frequency of genetic variants between patients and healthy controls. The summary statistics obtained from a genome-wide association study are what we use for our analysis; you can think of millions of variants, each with an associated P-value, from which we can assess which variants are associated with the disease.

In our study, we combine single-cell RNA-seq data and GWAS data by looking at genes that are over-expressed in a specific context (e.g., genes upregulated upon rhinovirus infection or genes upregulated in asthma). We then check whether those genes carry more risk variants than expected by chance. In this way, we identify cell states that are likely mediating genetic risk for a disease; in our case, we found an enrichment of genes in childhood-onset asthma risk loci in epithelial cells infected with rhinovirus. And thanks to single-cell RNA-seq, we found that it is the non-ciliated airway epithelial cells that are likely driving the genetic susceptibility to childhood-onset asthma.”
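For readers who want to see the two ideas in this answer as arithmetic, here is a small Python sketch. It is an illustration under simplifying assumptions, not the analysis used in the preprint: the first function is a plain chi-square test on allele counts (real GWAS typically use regression models with covariates), and the second is a basic hypergeometric over-representation test standing in for "more risk variants than expected by chance". All function names and numbers are made up.

```python
from math import comb, erfc, sqrt


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi2, p_value) for the difference in allele frequency
    between cases and controls at a single variant.
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column of the table needs a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


def enrichment_p_value(n_background, n_risk, n_upregulated, n_overlap):
    """Hypergeometric upper tail: P(overlap >= n_overlap) by chance.

    n_background  -- all genes considered
    n_risk        -- genes located in GWAS risk loci
    n_upregulated -- genes upregulated in the cell state of interest
    n_overlap     -- upregulated genes that are also risk-locus genes
    """
    largest = min(n_risk, n_upregulated)
    tail = sum(
        comb(n_risk, k) * comb(n_background - n_risk, n_upregulated - k)
        for k in range(n_overlap, largest + 1)
    )
    return tail / comb(n_background, n_upregulated)


# Made-up counts for illustration only.
chi2, p_variant = allelic_chi_square(60, 40, 40, 60)
p_enrichment = enrichment_p_value(10, 4, 3, 2)
print(chi2, p_variant, p_enrichment)
```

A small enrichment P-value would suggest that the upregulated genes overlap risk loci more often than a random gene set of the same size would.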

What do you think is the power of using single-cell technologies?

“Single-cell technologies are great for many reasons. Compared to more traditional methods, where we would analyze bulk populations of cells, single-cell methods allow researchers to study individual cells. Bulk methods provide a snapshot of the overall population, whereas single-cell analysis captures the dynamics of individual cells over time. Single-cell technologies allow us to identify cell subsets and states in a more fine-grained manner, which makes it possible to identify rare cell types and to better understand their roles and the mechanisms associated with them. We can also use single-cell technology to better understand cell-to-cell interactions and communication.”

Could you elaborate on how identifying epithelial cells as key cells expressing asthma-associated genes can lead to better treatment or prevention of asthma?

“The identification of airway epithelial cells infected with rhinovirus, and specifically non-ciliated cells, as the ones upregulating asthma-associated genes helps us to better understand the genetic origin of childhood-onset asthma. It has been shown in the literature that drug targets with genetic association evidence are more likely to move forward through clinical trials and be approved.”

If you could carry out further analysis, what would you like to do?

“Right now, our results are preliminary and come from cells taken from a few individuals (10-20). Ideally, we would replicate them in cohorts with larger sample sizes (hundreds to thousands of individuals). That way, we could identify the likely causal genes of the risk variants, which would help characterize the underlying molecular mechanisms more precisely. We would also need functional validation to confirm and better understand our findings; a CRISPR strategy is one example. Finally, if our findings are further validated, we could imagine the development of a rhinovirus vaccine or another protective intervention to prevent childhood-onset asthma.”


Integrating single cell & spatial transcriptomics

Cells are the basic building blocks of all living organisms. As biologists we are in a continuous endeavor to characterize and unravel how they come together to form and maintain complex tissue structures and organisms. Understanding the underlying molecular mechanisms ruling these systems is critical to understanding how cells communicate, interact, respond to stimuli, and evolve in steady-state and disease conditions (Wagner et al., 2016).

Single-cell RNA sequencing (scRNAseq) technologies have revolutionized our ability to understand cells. scRNAseq is an untargeted approach to profile individual cells at the whole-transcriptome level. In 2009, Tang et al. showed how an mRNA sequencing assay of mouse blastomeres increased transcript detection sensitivity and enabled the identification of more than 1,700 previously unknown transcript isoforms. Already, at this nascent stage, scRNAseq was showing its strengths and hinting at the powerful revolution about to come. Currently, there are many different approaches to generating scRNAseq data. All of them broadly follow the same basic steps: 1) single-cell isolation and capture, 2) cell lysis and barcoding, 3) reverse transcription, 4) pre-amplification, and 5) library preparation and sequencing. For more detailed information on the technologies, refer to established literature such as Svensson et al. 2018 and Mereu & Lafzi et al. 2020.

However, to fully understand a cell’s behavior and role in the tissue, it is critical to look beyond the cell in isolation. It is necessary to understand where it is located and how it interacts with its neighbors. Ultimately, maintaining the spatial context is key to understanding tissue architecture in health and disease. However, single-cell analysis methods lose this spatial arrangement in the cell dissociation step. The spatial neighborhood of a cell defines its interaction universe through juxtacrine and paracrine signaling and determines which biological processes that cell will carry out. Bulk RNAseq and scRNAseq deliver the promise of transcriptomics at the tissue and single-cell level, but at the cost of dissociating cells and removing the spatial context. Therefore, technologies aimed at capturing mRNAs while retaining the spatial context have been developed for a more comprehensive analysis. One of the most popular spatial transcriptomics technologies used today is Visium, originally developed by Ståhl et al. in 2016 and commercialized by 10x Genomics in 2020 (Figure 1).

Figure 1. The Visium spatial gene expression slide assay (https://www.10xgenomics.com/)

The Visium technology allows the user to profile tissue sections of 6.5 mm × 6.5 mm using ~5,000 spots. These spots are 55 µm in diameter and therefore do not provide single-cell resolution. For reference, immune cells range from 8 to 20 µm in diameter, while epithelial cells can reach 25 µm. These spots are therefore believed to capture the mRNA from 1-10 cells, and sometimes even more, depending on the cellular density of the tissue.
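A quick back-of-the-envelope check makes these numbers concrete. The sketch below simply divides the area of a 55 µm spot by the area of a single cell, treating both as circles; it is a crude upper bound that assumes a densely packed single layer of cells and ignores gaps between cells and tissue thickness. The function name is made up for this example.

```python
def cell_areas_per_spot(spot_diameter_um, cell_diameter_um):
    """Ratio of spot area to cell area, with both treated as circles.

    The factor pi / 4 cancels, so the ratio is just the squared
    ratio of the two diameters.
    """
    return (spot_diameter_um / cell_diameter_um) ** 2


# Cell diameters quoted in the text: 8-20 um (immune), up to 25 um (epithelial).
for diameter in (8, 20, 25):
    ratio = cell_areas_per_spot(55, diameter)
    print(f"{diameter} um cells: about {ratio:.1f} cell areas per spot")
```

Under these assumptions, cells of 20 to 25 µm give roughly 5 to 8 cells per spot, while tightly packed 8 µm cells could exceed 40, which is consistent with spots sometimes capturing more than 10 cells in dense tissue.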

Therefore, during his Ph.D., Marc set out to leverage the strengths of scRNAseq and spatial transcriptomics by integrating them and inferring the location of cell types and states within complex tissues. To do so, he developed SPOTlight (Elosua-Bayes et al., 2021), an NMF-based regression model that learns gene signatures from the single-cell data and uses them to decompose the cell types found within each spot. He then applied it in different scenarios, for example, to better understand the tumor microenvironment in oropharyngeal carcinoma (Nieto et al., 2021). There he showed how the tumor presented different immune landscapes within and surrounding it. Some of these regions were immune active, with cytotoxic CD8 T cells present, while in others tumor cells had managed to dampen the immune response and showed a higher prevalence of regulatory and exhausted T cells (Figure 2).
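The general idea of decomposing a spot into cell types can be sketched in a few lines of Python. The example below is not the SPOTlight implementation: it assumes the cell-type signatures are already known and estimates non-negative weights for a single spot with multiplicative updates, the same kind of update used in NMF when the signature matrix is held fixed. The signatures, gene counts, and function name are made up for illustration.

```python
def deconvolve_spot(signatures, spot, n_iter=500, eps=1e-12):
    """Estimate cell-type proportions for one spot.

    signatures -- dict {cell_type: [expression per gene]} (columns of W)
    spot       -- [expression per gene] measured in the spot (vector v)

    Finds non-negative weights h with W h close to v using the
    multiplicative update h <- h * (W^T v) / (W^T W h), which never
    produces a negative weight, then rescales h to sum to 1.
    """
    types = list(signatures)
    n_types = len(types)
    n_genes = len(spot)
    # W^T v and W^T W do not change between iterations.
    wtv = [sum(signatures[t][g] * spot[g] for g in range(n_genes)) for t in types]
    wtw = [
        [sum(signatures[s][g] * signatures[t][g] for g in range(n_genes)) for t in types]
        for s in types
    ]
    h = [1.0] * n_types
    for _ in range(n_iter):
        # All weights are updated simultaneously from the previous h.
        h = [
            h[k] * wtv[k] / (sum(wtw[k][j] * h[j] for j in range(n_types)) + eps)
            for k in range(n_types)
        ]
    total = sum(h) or 1.0
    return {t: h[k] / total for k, t in enumerate(types)}


# Made-up signatures over four genes, and a spot built as 3 parts T cell + 1 part tumor.
signatures = {"CD8 T cell": [10, 1, 0, 2], "Tumor": [0, 2, 8, 1]}
spot = [30, 5, 8, 7]
print(deconvolve_spot(signatures, spot))
```

In practice the signatures would be learned from the single-cell reference and the fit repeated for every spot, giving one vector of cell-type proportions per spot.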

Figure 2. Oropharyngeal carcinoma tumor immune microenvironment. Top left – H&E staining of the Visium capture area, bottom left – tissue compartments separating tumor regions from fibrotic and immune rich. Middle – Predicted cell type proportions of Regulatory T cells and CD8 Cytotoxic T cells. Right – Denoised gene expression of the respective populations of interest. Nieto et al., 2021.

These results highlight the heterogeneous nature of tumors and the importance of characterizing the different compartments found within them. Reaching this level of spatial resolution will enable us to better understand tumors and their heterogeneity, identify prognostic markers, and ultimately deliver more effective personalized treatments.

The continuous advance of single-cell and spatial technologies will keep transforming our understanding of biology in health and disease. However, these advances must be translated to the clinic in order to have an impact on the general population. Nowadays, there is an abundance of data that comes from multiple modalities, tissues, and conditions, and much more is yet to come. The ability to integrate this information and draw the line from genomic variants to specific cell types in specific spatial niches to disease onset and treatment will be a grand challenge in the years to come.

 

References

  • Wagner, Allon, Aviv Regev, and Nir Yosef. 2016. “Revealing the Vectors of Cellular Identity with Single-Cell Genomics.” Nature Biotechnology 34 (11): 1145–60. https://doi.org/10.1038/nbt.3711

  • Tang, Fuchou, Catalin Barbacioru, Yangzhou Wang, Ellen Nordman, Clarence Lee, Nanlan Xu, Xiaohui Wang, et al. 2009. “mRNA-Seq Whole-Transcriptome Analysis of a Single Cell.” Nature Methods 6 (5): 377–82. https://doi.org/10.1038/nmeth.1315

  • Svensson, Valentine, Roser Vento-Tormo, and Sarah A. Teichmann. 2018. “Exponential Scaling of Single-Cell RNA-Seq in the Past Decade.” Nature Protocols 13 (4): 599–604. https://doi.org/10.1038/nprot.2017.149

  • Mereu, Elisabetta, Atefeh Lafzi, Catia Moutinho, Christoph Ziegenhain, Davis J. McCarthy, Adrián Álvarez-Varela, Eduard Batlle, et al. 2020. “Benchmarking Single-Cell RNA-Sequencing Protocols for Cell Atlas Projects.” Nature Biotechnology 38 (6): 747–55. https://doi.org/10.1038/s41587-020-0469-4

  • Ståhl, Patrik L., Fredrik Salmén, Sanja Vickovic, Anna Lundmark, José Fernández Navarro, Jens Magnusson, Stefania Giacomello, et al. 2016. “Visualization and Analysis of Gene Expression in Tissue Sections by Spatial Transcriptomics.” Science (New York, N.Y.) 353 (6294): 78–82. https://doi.org/10.1126/science.aaf2403

  • Elosua-Bayes, Marc, Paula Nieto, Elisabetta Mereu, Ivo Gut, and Holger Heyn. 2021. “SPOTlight: Seeded NMF Regression to Deconvolute Spatial Transcriptomics Spots with Single-Cell Transcriptomes.” Nucleic Acids Research 49 (9): e50. https://doi.org/10.1093/nar/gkab043

  • Nieto, Paula, Marc Elosua-Bayes, Juan L. Trincado, Domenica Marchese, Ramon Massoni-Badosa, Maria Salvany, Ana Henriques, et al. 2021. “A Single-Cell Tumor Immune Atlas for Precision Oncology.” Genome Research 31 (10): 1913–26. https://doi.org/10.1101/gr.273300.120


Get involved!