IBM’s Open Source GT4SD Generates Ideas for Scientists – The New Stack

Global information technology company IBM has released an open source library, the Generative Toolkit for Scientific Discovery (GT4SD), in the hope of accelerating discovery in the field of machine learning.

Designed to make advanced generative models not only easier to use but also more efficient when applied to discovery workflows, GT4SD hosts generative algorithms developed at IBM Research for the design of new materials, along with language models and applications for scientific documents.

In a blog post explaining GT4SD, IBM described the project as “an open-source library for accelerating hypothesis generation in the scientific discovery process that facilitates the adoption of cutting-edge generative models.” Matteo Manica, an IBM researcher and the main creator of GT4SD, gave his thoughts on the toolkit in an interview with The New Stack.

“It was an open exchange between different researchers within IBM Research. We noticed that there was this desire in the community, and in other areas, to simplify access to these AI technologies. We also noticed a gap that needed to be filled in generative models,” he said. “Last year, we started to gather feedback from various researchers working on the subject. There were a lot of different technologies we were building, so it was a big effort to homogenize all these ideas from different labs into one single research project.”

“Compared to the time it usually takes to publish research, we went very fast. From the initial ideation of an algorithm, it can take about a year to release it. By then it’s already old and you have plenty of other ideas,” Manica joked. “Instead, what was beautiful about this initiative was that it was very focused on the rapid development of a technology. We started with algorithms that we had already created at IBM Research, and then we looked for some things that we wanted to make available with our library. It was an effort that lasted ten or eleven months at most.”

Real-world use

Merging AI with hypothesis generation can bring unprecedented benefits to multiple fields of study. In drug discovery, where countless drug-like molecules are already known, finding the right candidate is nearly impossible with the usual workflow of trial and error. With GT4SD, this process can be accelerated dramatically.

“Generative models are really good at looking at what you already know and listing examples, like properties, and then extrapolating to new examples. You can think of this process as connecting points,” Manica said. “Imagine you have a Pollock-like canvas with many points (properties) and that you can draw lines between these points. Along these lines you find many points that did not exist because they have not yet been discovered. Generative models allow you to simulate the discovery of these points. If the properties discovered match the criteria used for the search, they may be optimal candidates for the discovery process.”
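Manica's canvas analogy can be made concrete with a toy sketch. The Python below is only an illustration of the idea and does not use GT4SD or its API: the known points, the two made-up property values per point, and the helper names `interpolate`, `propose_candidates` and `meets_target` are all hypothetical. It draws a straight line between each pair of known points, samples new points along that line, and keeps the ones that meet a search criterion. Real generative models learn far richer representations than straight lines between property vectors.

```python
from itertools import combinations


def interpolate(a, b, steps):
    """Return `steps` evenly spaced points strictly between a and b."""
    points = []
    for i in range(1, steps + 1):
        t = i / (steps + 1)  # fraction of the way from a to b
        points.append(tuple((1 - t) * x + t * y for x, y in zip(a, b)))
    return points


def propose_candidates(known, steps, criterion):
    """Sample points on the line between every pair of known points
    and keep only those that satisfy the search criterion."""
    candidates = []
    for a, b in combinations(known, 2):
        for point in interpolate(a, b, steps):
            if criterion(point):
                candidates.append(point)
    return candidates


# Hypothetical known examples, each described by two made-up property values.
known_points = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]


def meets_target(point):
    """Hypothetical search criterion: both properties must be at least 1.0."""
    return point[0] >= 1.0 and point[1] >= 1.0


candidates = propose_candidates(known_points, steps=3, criterion=meets_target)
print(candidates)  # [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
```

In this toy setup only the line between the second and third known points passes through the target region, so the three surviving candidates all lie on it. None of them were in the original set, which is the sense in which the "undiscovered" points are found.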

While drug discovery is an immediately recognizable use case, Manica says GT4SD can be used for “any molecular science application.”

“For example, you can optimize enzymes to catalyze a specific reaction. That’s pretty cool because enzyme engineering and design is very important for greener chemical processes. You won’t need extreme temperatures, toxic solvents or high energy consumption,” Manica said. “It’s a perfect example where generative modeling can help scientists make a process more sustainable and efficient.”

In its article unveiling GT4SD, IBM detailed many scenarios in which the toolkit would be useful. Here are a few:

  • Materials discovery and drug discovery scientists can use the library to access models capable of generating new molecule designs based on specific properties such as target proteins, target omics profiles, scaffold distances, binding energies, HOMO and LUMO energies, and many others.
  • Scientists and students using generative models are offered a centralized environment to access and try out different models, with model usage simplified via consistent commands for inference or retraining with default settings.
  • AI/ML practitioners creating generative models can benefit from the familiar framework of GT4SD, which makes the models easily accessible to a wider community.

“Replacing manual processes and human bias in the discovery process has significant effects on applications that rely on generative models, leading to accelerated expert knowledge,” IBM writes.

“We focus on material design because that is where we have done most of the research. However, the toolkit is designed to be as generic as possible so that generative models can be used in various applications,” Manica said.

Open Source for Research

GT4SD was developed to be open source from the start, according to Manica. “I’m a big believer in open source projects, and I think it’s the best way to achieve big goals within the scientific community.”

He went on to say, “Our goal in making the toolkit open source was for research to advance faster in the area of generative modeling. For any business, revenue is obviously important, and it can be hard to see how you can generate revenue from an open source project. But the main return we’re looking for here is to create a community of users and contributors who help us build better models and who we can help build better models.”

Open source technology has been making waves in the engineering and coding communities, but Manica believes scientists can benefit just as much. The more people who use the toolkit, the more it will progress.

A look into the future

GT4SD has been released to the public, but it’s not what Manica would call finished. “The good thing is that it’s not a final product. Of course, it’s ready to use, but the library is a kind of open factory for generative modeling. We intend to continue to develop it not just for our own research at IBM, but so that the wider research community can build together. This is just the beginning – we hope contributors and users will improve the library and will also use it in their own projects.”

He continued: “In five years, I would like to see GT4SD grow and become a real community, whether in the biochemical field, polymer physics or any other industry. Ultimately, our primary concern is to achieve the goal of dramatically accelerating discovery over the next ten years.”

The toolkit has been available since last week. Manica ended by urging potential users to “Try GT4SD. Use it in your research, break it, and report any issues. It was developed for the scientific community, and we want everyone involved.”