The Most Common and Useful Neural Nets

Explainer: what are AI architectures and which are the most important ones?
machine learning · science

Author: Rachel Thomas

Published: July 24, 2024

This post can stand alone as a friendly introduction to neural nets, no background required. It is part 2 in my series on using AI to figure out what T cells can bind to. Here are part 1 and part 3.

What are Neural Networks?

Neural networks are a type of machine learning algorithm that are currently in the limelight, powering chatbots like ChatGPT and image generators like Midjourney. Neural networks are just math equations, written in computer code. When people hear the terms “neural networks” or “artificial intelligence,” they may picture humanoid robots, but it is more accurate to imagine a scaled-up version of 10th grade math class.
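To make "just math equations" concrete, here is a single artificial neuron sketched in Python. The inputs, weights, and bias below are made-up numbers purely for illustration; a real network learns its weights from data and stacks millions of such neurons together.

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: multiply inputs by weights,
    add them up, and squash the result to between 0 and 1."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))  # sigmoid squashing function

# Made-up numbers; training is the process of finding good weights.
print(neuron([0.5, 0.2], [1.0, -2.0], 0.1))  # about 0.55
```

That really is 10th grade math: multiplication, addition, and one smooth squashing function.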

When considering an AI system, two major categories of tasks are:

  • Classifying things: These are not predictions in the sense of predicting the future, but rather, guessing the answer to a question. Is this a picture of a chihuahua or a blueberry muffin? Which of these potential drugs would be most likely to target antibiotic-resistant bacteria?
  • Generating things: Create a picture of Kanye West made out of tiny Captain Picard faces. Generate a new molecule that may work as an antibiotic.

Classification: muffin or chihuahua? (pic from Karen Zack, @teenybiscuit, 2016) // Generation: Kanye out of Picard faces (pic from fast.ai intern Brad Kenstler, 2017)

Some problems in medicine and immunology can be framed either as a classification problem OR as a generative problem. For instance, suppose your goal is to produce a new antibiotic to address antimicrobial resistance. Some scientists are trying to classify which compounds, out of hundreds of millions of existing ones, may have the desired properties, whereas others are trying to generate new compounds. Both approaches are useful.

Language Models

You may have heard the term “language model” describing chatbots (such as ChatGPT or Claude) and automated language translation (such as Google Translate and Skype Translator). One approach to developing sophisticated language capabilities is to write a computer program that predicts the next word in a sentence. Predict the next word and then the next, and you can generate strings of text. This is a language model. Both prediction and generation are used in building language models.
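As a toy illustration of "predict the next word, then the next," here is a sketch that simply counts which word tends to follow which in a tiny made-up corpus. This is not how neural language models actually work (they learn from billions of words), but it shows the prediction-then-generation loop:

```python
from collections import Counter, defaultdict

# Tiny invented corpus, purely for illustration.
corpus = "the cat sat on the mat and the cat slept".split()

# Count which word follows which: the simplest possible
# next-word predictor (no neural network involved).
next_words = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_words[current][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word`."""
    return next_words[word].most_common(1)[0][0]

def generate(start, length):
    """Predict the next word, then the next, to generate text."""
    words = [start]
    for _ in range(length):
        words.append(predict_next(words[-1]))
    return " ".join(words)

print(predict_next("the"))   # "cat" follows "the" most often here
print(generate("the", 4))
```

Even this crude counting model captures the core loop: each prediction becomes input for the next one.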

Google Translate relies on language models. The gender bias shown here has since been corrected, although bias remains an issue.

Along the way, the model has learned a number of patterns regarding context, vocabulary, and relationships. Fine-tuning lets us use these capabilities for specific problems. For instance, a language model trained on medical chart notes was used to make predictions about which patients were most likely to be readmitted to the hospital. While the model was trained on a prediction task (predicting the next word in a sentence), it can be used for generation as well when it is given additional instructions and training from humans.

When language models are trained, they are not always just tasked with predicting the next word. Sometimes random words in a sentence are covered up, and the computer must predict what goes in each spot: a fill-in-the-blank assignment.
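The fill-in-the-blank task can be sketched with simple counting too. The function below guesses a masked word by matching its neighbors against a tiny invented corpus; it is a toy stand-in for masked-word training, not the actual neural method:

```python
from collections import Counter

# Tiny invented corpus, purely for illustration.
corpus = "the cat sat on the mat and the dog sat on the rug".split()

def fill_in_blank(sentence_with_blank):
    """Guess the word hidden behind '____' by finding places in the
    corpus where the same left and right neighbors appear."""
    words = sentence_with_blank.split()
    i = words.index("____")
    left, right = words[i - 1], words[i + 1]
    candidates = Counter()
    for j in range(1, len(corpus) - 1):
        if corpus[j - 1] == left and corpus[j + 1] == right:
            candidates[corpus[j]] += 1
    return candidates.most_common(1)[0][0]

print(fill_in_blank("the cat ____ on the mat"))  # "sat"
```

A real model learns richer clues than exact neighbor matches, but the assignment is the same: use the surrounding words to recover the hidden one.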

Through predicting words, the model can learn which other words to focus on. Suppose I told you, “I am so tired that I can barely keep my XXXX open”. You could likely guess that the missing word is “eyes”. To do so, you may have focused on “tired” and “open”. Some of the word positions add little, if anything, to your deduction. In a similar way, attention-based neural networks learn what to pay attention to, where to focus.
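The idea of focusing more on some words than others can be sketched with a softmax over per-word scores. The scores below are made up for illustration; a real attention layer learns how to compute such scores from data:

```python
import math

# Context words for guessing the blank in
# "I am so tired that I can barely keep my ____ open",
# with invented relevance scores (a real network learns these).
words  = ["tired", "that", "keep", "my", "open"]
scores = [3.0, 0.1, 1.0, 0.2, 2.5]

# Softmax turns raw scores into attention weights that sum to 1.
exps = [math.exp(s) for s in scores]
total = sum(exps)
weights = {w: e / total for w, e in zip(words, exps)}

for w, wt in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{w:6s} {wt:.2f}")
# "tired" and "open" get the most weight, matching the intuition above.
```

The model then uses these weights to emphasize the most informative words when making its prediction.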

Like the sequence of words in a novel, a sequence of amino acids is its own language, telling the story of a protein. The same techniques and approaches that were developed for natural languages (such as translating between English and French) can also be applied to the biological language of protein creation.

The Innovation of AlphaFold

Recall from Part 1 of this series that the computer program AlphaFold revolutionized the task of taking a 1D sequence of amino acids (shown on the left), and transforming it into a 3D molecule (shown on the right).

Two different ways of representing the protein insulin: a string of letters, or its 3D structure

The text of a book is 1-dimensional. It could be written on a single long ribbon, stretched out, and read as such. However, making the best sense of amino acid sequences requires an additional dimension. We want to know not just the long ribbon of a single sequence, but also the related sequences it is evolutionarily similar to.

A crucial innovation of AlphaFold is that it expanded attention to 2 dimensions: the sequence of amino acids, and a stack of similar sequences. These similar sequences provide important evolutionary clues about what our protein of interest may be like. Like distant cousins in a family tree, they can give us insights into common features. This set of related sequences is known as a Multiple Sequence Alignment (MSA) and is a key tool in bioinformatics. AlphaFold applies attention to the MSA, learning which parts of related sequences to focus on when trying to decode a protein structure.

AlphaFold represents sequences in 2 ways. The MSA representation requires 2D attention. Image from Jumper et al., 2021
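As a rough sketch of why an MSA is useful, the toy alignment below stacks invented sequences into a 2D grid ("-" marks an alignment gap) and measures how conserved each column is. Positions that rarely change across relatives often hint at structural or functional importance, which is exactly the kind of evolutionary clue AlphaFold's 2D attention can pick up on:

```python
from collections import Counter

# A toy Multiple Sequence Alignment (MSA): the protein of interest
# on top, evolutionarily related sequences stacked below it.
# These sequences are invented purely for illustration.
msa = [
    "MKT-VLA",   # protein of interest
    "MKTAVLA",   # related sequence 1
    "MRT-VLG",   # related sequence 2
    "MKS-ILA",   # related sequence 3
]

# One simple evolutionary clue: how conserved is each column?
for col in range(len(msa[0])):
    letters = [seq[col] for seq in msa]
    letter, count = Counter(letters).most_common(1)[0]
    print(col, letter, f"{count / len(msa):.2f} conserved")
```

Attention over this grid runs in 2 dimensions: along each row (one sequence) and down each column (the same position across relatives).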

AlphaFold combines aspects of two popular architectures: Transformers (in a new form, named the Evoformer, which has 2D attention) and Graph Neural Networks (to represent the spatial relationships between amino acids). Now that we have covered some of the types of neural networks and the structure of AlphaFold, we are ready to combine this with the information about T cells from part 1. Check out part 3 to see how AI is being used to predict T cell binding.

And for understanding the ethical risks of AI systems, you may be interested in reading my previous posts on that topic.

Thank you to Jeremy Howard for feedback on earlier drafts of this post.
