Gero unveils AI model for small molecule design without structure

ProtoBind-Diff generates novel compounds from protein sequence alone – a potential accelerant for early-stage discovery in aging biology.

AI models for drug discovery are becoming more capable, more flexible and, in some cases, more biologically agnostic. One of the more recent entries into this growing field comes from Singapore-based biotech Gero, which has announced ProtoBind-Diff: a generative model for small molecule discovery that works entirely without protein structural data.

Whereas most AI platforms for target-conditioned drug design depend heavily on 3D structures or docking simulations, ProtoBind-Diff is trained solely on protein sequence and ligand information. It learns from over a million active protein–ligand pairs, drawing on pre-trained embeddings to infer chemically meaningful interactions from primary sequence alone. According to the authors of the model’s preprint, this enables ligand generation across the full proteome – including “orphan, flexible, or rapidly emerging targets for which structural data are unavailable or unreliable.”

The implications for geroscience – a field often constrained by limited target tractability – are notable: by enabling molecular design for targets whose sequence is known but whose structure is not, ProtoBind-Diff may offer a more efficient route into the biological gray zones of aging.

Longevity.Technology: Much of the fanfare surrounding AI in drug discovery tends to fixate on optimization – faster docking, better scoring, slicker pipelines. ProtoBind-Diff, by contrast, is aimed squarely at the upstream bottleneck: the ability to open up the vast dark matter of the proteome to therapeutic interrogation. By conditioning molecular generation on protein sequence alone, rather than structural data, it offers a way to pursue targets that are disordered, orphaned or simply too obscure to have been structurally resolved. For aging biology – a field often accused of being target-poor and hypothesis-rich – this is more than just another model benchmark; it is a change in tempo. The less we know about a target, the more interesting it now becomes.

What makes this launch particularly noteworthy is not just the model’s design, but its intent. Gero plans to release ProtoBind-Diff’s weights and interface for others to explore – a refreshingly unguarded move in a space often dominated by proprietary platforms and closed loops. If the model performs as advertised, it could accelerate the early, hypothesis-testing stage of discovery that geroscience so sorely needs – enabling the rapid generation of probes even when structural certainty is absent. This may not render structure obsolete, but it does invite a shift in mindset: from awaiting clarity to acting in ambiguity. And for a field that traffics in complexity, heterogeneity and slow-moving endpoints, that might be just the nudge it needs.

Sequence in, small molecules out

At the core of ProtoBind-Diff is a masked diffusion model that generates SMILES strings – the text-based representations of chemical compounds – conditioned on protein sequence embeddings derived from the pre-trained ESM-2 language model. Unlike structure-based methods, which typically require defined binding pockets or docking poses, ProtoBind-Diff learns to associate sequence context with chemically meaningful ligand features.
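To make the idea of masked diffusion over text concrete, here is a toy sketch of the sampling loop, not Gero's implementation. It starts from a fully masked string and reveals tokens over a few steps, most confident first. The names `toy_denoiser` and `sample`, the tiny alphabet and the random "confidence" values are all illustrative assumptions; a real model would replace the stand-in denoiser with a trained network that attends over ESM-2 embeddings, and this toy makes no attempt to emit valid SMILES.

```python
import random

MASK = "?"  # placeholder for a position that has not been decoded yet
VOCAB = ["C", "N", "O", "c", "1", "(", ")", "="]  # toy SMILES-like alphabet (assumption)


def toy_denoiser(tokens, protein_embedding, rng):
    """Stand-in for a trained network: propose (token, confidence) per masked slot.

    A real model would attend over `protein_embedding`; here the embedding only
    shifts the random proposals so the conditioning input has a visible effect.
    """
    bias = int(sum(protein_embedding) * 1000) % len(VOCAB)
    proposals = {}
    for i, tok in enumerate(tokens):
        if tok == MASK:
            choice = VOCAB[(rng.randrange(len(VOCAB)) + bias) % len(VOCAB)]
            proposals[i] = (choice, rng.random())
    return proposals


def sample(protein_embedding, length=12, steps=4, seed=0):
    """Iteratively unmask a fixed-length token string, most confident slots first."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(steps):
        proposals = toy_denoiser(tokens, protein_embedding, rng)
        if not proposals:
            break
        # Reveal an equal share of the remaining masked positions at each step.
        remaining_steps = steps - step
        n_unmask = -(-len(proposals) // remaining_steps)  # ceiling division
        ranked = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
        for i, (tok, _confidence) in ranked[:n_unmask]:
            tokens[i] = tok
    return "".join(tokens)


generated = sample([0.1, 0.2, 0.3])
```

The point of the sketch is the control flow: generation is not left-to-right, and the protein representation is an input to every denoising step rather than a one-off prompt.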

“Designing small molecules that hit protein targets is one of the hardest problems in drug discovery,” said Peter Fedichev, CEO and Co-founder at Gero. “Classical modeling struggles because the energy scales, polarization effects, and the complexity of protein dynamics make high-resolution predictions nearly impossible. But maybe we’ve been asking the wrong question.”

He continued: “Nature had to solve this puzzle already – evolution optimized a biochemical language that encodes how proteins and molecules interact. With ProtoBind-Diff, we’re tapping into that. It’s a language model that learns from sequences, not structures. It doesn’t simulate physics – it learns the grammar of bioactivity from a million real examples.”

The model leverages pre-trained protein embeddings (ESM-2) and a denoising diffusion framework to generate chemically valid and novel molecules in SMILES format, guided by sequence-level information alone. “ProtoBind-Diff generates chemically valid, novel, and target-specific ligands without requiring structural supervision,” the authors write [1]. Despite never seeing 3D data during training, attention maps from the model align with known binding residues, suggesting that it learns “spatially meaningful interaction priors from sequence alone” [1].
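One simple way to put a number on whether attention "aligns" with binding residues is a top-k overlap, sketched below. This is an illustrative check, not necessarily the analysis used in the preprint: the function name `top_k_overlap`, the per-residue attention values and the set of binding positions are all made up for the example.

```python
def top_k_overlap(attention, binding_residues, k):
    """Fraction of the k most-attended residue positions that are known binders."""
    if k <= 0 or k > len(attention):
        raise ValueError("k must be between 1 and the sequence length")
    # Rank residue indices from most to least attended.
    ranked = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    top = set(ranked[:k])
    return len(top & set(binding_residues)) / k


# Hypothetical per-residue attention for an 8-residue sequence (0-based positions).
attention = [0.01, 0.30, 0.02, 0.25, 0.05, 0.20, 0.02, 0.15]
binding = {1, 3, 6}  # hypothetical annotated binding positions

score = top_k_overlap(attention, binding, k=3)
# Expected overlap if the same number of positions were picked at random.
random_baseline = len(binding) / len(attention)
```

Here two of the three most-attended positions are annotated binders, so the overlap is 2/3 against a random expectation of 3/8. A score well above the baseline across many targets would support the claim; a single protein would not.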

Aging targets in sight

Although ProtoBind-Diff is positioned as a general-purpose small molecule discovery engine, Fedichev told Longevity.Technology that the Gero team is actively applying it to aging-related biology. “ProtoBind-Diff is indeed a general-purpose small molecule discovery engine,” he explained, “designed to identify ligands for aging-related targets that lack structural data.”

Current efforts include the generation of candidate molecules for proteins involved in inflammation, metabolism and epigenetic regulation – areas central to several hallmarks of aging. “In our benchmarks, we included FTO (Fat mass and obesity-associated protein) – an RNA demethylase whose inhibition may help counter metabolic dysfunction and chronic low-grade inflammation associated with aging,” he said. “Other examples include epigenetic erasers and readers such as KDM1A and SPIN1, where inhibitors are being explored for applications in cancer, inflammation, and fibrosis – all relevant to aging biology.”

“Aging remains a target-poor area, and long before translation begins, researchers need rapid ways to generate molecular probes to test biological hypotheses – often in the absence of high-quality structural data,” he added. “This is where ProtoBind-Diff, or its future refinements, may play a transformative role.”

In benchmarking, ProtoBind-Diff matched or exceeded structure-based models such as Pocket2Mol and TargetDiff across a range of “easy” and “hard” targets. On several structure-scarce proteins, it achieved higher Boltz-1-based enrichment scores than models trained on crystallographic data.
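The article does not spell out how the enrichment scores were computed. For readers unfamiliar with the idea, the sketch below shows the enrichment factor commonly used in virtual screening: how much more often true actives appear in the top-scoring slice of a ranked list than they would by chance. The function name, the example scores and the activity labels are assumptions for illustration, and higher scores are assumed to mean stronger predicted binding.

```python
def enrichment_factor(scores, is_active, top_fraction=0.1):
    """Hit rate among the top-scoring fraction divided by the overall hit rate."""
    if not scores or len(scores) != len(is_active):
        raise ValueError("scores and is_active must be non-empty and equal in length")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_actives == 0:
        raise ValueError("at least one active compound is required")
    n_top = max(1, round(n_total * top_fraction))
    # Rank compounds from best to worst predicted score.
    ranked = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    hits = sum(is_active[i] for i in ranked[:n_top])
    return (hits / n_top) / (n_actives / n_total)


# Hypothetical screen: ten compounds, three of them true actives.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
is_active = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

ef = enrichment_factor(scores, is_active, top_fraction=0.2)
```

In this made-up case both of the top two compounds are active, against a background rate of three in ten, giving an enrichment factor of about 3.3. A value of 1 means the ranking is no better than random.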

Lifespan effects are also on the agenda, but time is the limiting factor. “Lifespan studies are ongoing, as they require significant time and validation,” Fedichev told us.

“I believe we are only at the beginning of the journey toward creating an ideal generative model,” said Konstantin Avchaciov, Senior Researcher at Gero and lead scientist on the project. “Yes, in our benchmarks, the ProtoBind-Diff model outperforms some existing 3D structural models. That said, I am confident that as we continue to expand our datasets to include a broader diversity of protein classes, we will achieve significantly better results in the future.”

What comes next may not need structure

Gero has integrated ProtoBind-Diff into its internal drug discovery pipeline and is exploring collaborations in oncology, immunology, infectious disease and gerotherapeutics. A public GitHub repository has already been launched, with broader access to the full model promised soon.

Whether ProtoBind-Diff becomes a staple tool or a generational stepping stone will depend on performance in real-world applications – but for now, it seems to offer something that aging biology has long needed: a faster way to go from sequence to hypothesis, even when structure doesn’t come along for the ride.

[1] https://www.biorxiv.org/content/10.1101/2025.06.16.659955v1
