Diffusion Models For Life Science

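Introduction

Diffusion models have shown impressive results in computer vision (CV) and natural language processing (NLP). In protein design, however, their use has been limited by the complexity of protein backbone geometry and of the relationship between sequence and structure.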

Background

Key Tasks in Protein Structure

Protein structure prediction

AlphaFold
RosettaFold

Protein design

ProteinMPNN
RFjoint Inpainting
RFDiffusion

In April 2019, David Baker gave a TED talk titled "5 challenges we could solve by designing new proteins".

Computational Protein Design Workflow

Motifs can have various functions and sources

Evaluating Designed Proteins

State-of-the-art

DALL-E 2: "An astronaut riding a horse in a photorealistic style"

Imagen: "A robot couple fine dining with Eiffel Tower in the background"

What makes this hard?

Post-AlphaFold, protein design is ‘guess’ & ‘check’

Naive guessing? ~20^100 sequences
Native structures? Too sparse
Existing ML tools?
- Low diversity
- High compute cost
- Short sequences are bad
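
The size of the naive search space can be checked directly. A minimal sketch, assuming the 20^100 figure refers to a 100-residue protein in which every position can be any of the 20 standard amino acids:

```python
import math

# Assumption: 100 residues, each one of the 20 standard amino acids.
NUM_AMINO_ACIDS = 20
LENGTH = 100

# Python integers have arbitrary precision, so this count is exact.
num_sequences = NUM_AMINO_ACIDS ** LENGTH
print(f"number of digits: {len(str(num_sequences))}")                    # → number of digits: 131
print(f"log10(count) = {LENGTH * math.log10(NUM_AMINO_ACIDS):.1f}")      # → log10(count) = 130.1
```

That is roughly 10^130 candidate sequences, far more than could ever be enumerated or tested experimentally.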

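Model Details

Generative Models

Physical background (the physicists deserve credit here): non-equilibrium thermodynamics. A diffusion process increases entropy and drives data toward disorder; the model learns to reverse that process and generate order from chaos.

The goal is to model the probability distribution that generates the data.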

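GAN: a generator and a discriminator, trained adversarially.

VAE: models high-dimensional data with latent variables and fits it through a variational approximation.

Flow: starts from a simple prior distribution and maps it to the data distribution through invertible transformations.

Diffusion: linear Gaussian noising steps; the noisy intermediate states act as latent variables.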

Two processes:

- Forward: data → noise
- Reverse: noise → data

DDPM

Forward diffusion process gradually adds noise to input data.

Reverse denoising process generates data by removing noise.

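Drawbacks:

The surge of interest in generative diffusion models began with DDPM (Denoising Diffusion Probabilistic Model), proposed in 2020.
The underlying mathematical framework had already been laid out in 2015 (Sohl-Dickstein et al., 2015).
DDPM was the first to make it work for high-resolution image generation, which triggered the boom that followed (DDPM; Ho et al. 2020).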

The training and sampling algorithms in DDPM (Image source: Ho et al. 2020)

Forward diffusion process

$$
q(\mathbf{x}_t \vert \mathbf{x}_{t-1}) = \mathcal{N}(\mathbf{x}_t; \sqrt{1 - \beta_t}\, \mathbf{x}_{t-1}, \beta_t\mathbf{I}) \quad
q(\mathbf{x}_{1:T} \vert \mathbf{x}_0) = \prod^T_{t=1} q(\mathbf{x}_t \vert \mathbf{x}_{t-1})
$$
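
Because every step is Gaussian, x_t can also be sampled from x_0 in one jump: with alpha_t = 1 - beta_t and alpha_bar_t = prod_{s<=t} alpha_s, q(x_t | x_0) = N(x_t; sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I). A minimal standard-library sketch of both the single step above and the closed-form jump, assuming the linear beta schedule from 1e-4 to 0.02 over T = 1000 steps used by Ho et al. 2020; all function names are hypothetical.

```python
import math
import random

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, linearly spaced (stored at indices 0..T-1)."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]

def cumulative_alpha_bars(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s), for t = 1..T."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out

def forward_step(x_prev, beta, rng):
    """One step of q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I)."""
    return [math.sqrt(1.0 - beta) * x + math.sqrt(beta) * rng.gauss(0.0, 1.0) for x in x_prev]

def forward_jump(x0, t, alpha_bars, rng):
    """Sample x_t directly from q(x_t | x_0); t is 1-indexed. Returns (x_t, eps)."""
    a = alpha_bars[t - 1]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return x_t, eps

rng = random.Random(0)
betas = linear_beta_schedule()
alpha_bars = cumulative_alpha_bars(betas)
x0 = [1.0, -0.5, 2.0]                                   # toy "data" vector
x1 = forward_step(x0, betas[0], rng)                    # one small noising step
xT, _ = forward_jump(x0, len(betas), alpha_bars, rng)   # essentially pure noise
print(f"alpha_bar_1 = {alpha_bars[0]:.4f}, alpha_bar_T = {alpha_bars[-1]:.1e}")
```

Since alpha_bar_T is tiny under this schedule, x_T is almost exactly N(0, I) regardless of x_0, which is what allows sampling to start from pure noise.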

Reverse diffusion process

The reverse process estimates the noise and, over many iterations, gradually restores the corrupted x_t to x_0.
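
Concretely, DDPM sampling (Algorithm 2 in Ho et al. 2020) starts from x_T ~ N(0, I) and repeatedly applies x_{t-1} = (1 / sqrt(alpha_t)) * (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_theta(x_t, t)) + sigma_t * z, with z ~ N(0, I) for t > 1 and z = 0 at the last step. The sketch below assumes sigma_t^2 = beta_t (one of the two choices discussed in the paper) and a linear beta schedule; `predict_noise` is a hypothetical stand-in for the trained network eps_theta, and the dummy predictor exists only to make the loop runnable.

```python
import math
import random

def reverse_step(x_t, t, eps_hat, betas, abar, rng):
    """One DDPM denoising step x_t -> x_{t-1} (t is 1-indexed), using sigma_t^2 = beta_t."""
    beta, a_bar = betas[t - 1], abar[t - 1]
    coef = beta / math.sqrt(1.0 - a_bar)
    mean = [(x - coef * e) / math.sqrt(1.0 - beta) for x, e in zip(x_t, eps_hat)]
    if t == 1:
        return mean  # no noise is added at the final step
    sigma = math.sqrt(beta)
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]

def sample(predict_noise, dim, betas, abar, rng):
    """Start from x_T ~ N(0, I) and denoise step by step down to x_0."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in range(len(betas), 0, -1):
        x = reverse_step(x, t, predict_noise(x, t), betas, abar, rng)
    return x

# Assumed linear noise schedule, beta from 1e-4 to 0.02 over T = 1000 steps.
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
abar, prod = [], 1.0
for b in betas:
    prod *= 1.0 - b
    abar.append(prod)

def dummy_predict_noise(x_t, t):
    """Hypothetical placeholder for eps_theta(x_t, t): always predicts zero noise."""
    return [0.0 for _ in x_t]

rng = random.Random(0)
x0 = sample(dummy_predict_noise, dim=3, betas=betas, abar=abar, rng=rng)
print(x0)
```

With a real trained network in place of the dummy predictor, the same loop turns noise into samples that resemble the training data.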

How to train

How to use
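
Ho et al. 2020 train the noise predictor eps_theta with a simplified objective (Algorithm 1): draw x_0 from the data, t uniformly from {1, ..., T} and eps ~ N(0, I), form x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps, and take a gradient step on ||eps - eps_theta(x_t, t)||^2. The sketch below only computes this loss for one example under an assumed linear beta schedule; `predict_noise` is a hypothetical stand-in for the network, and the gradient step is left to whichever framework is used.

```python
import math
import random

def simple_loss(x0, predict_noise, abar, rng):
    """DDPM 'simple' loss for one example: ||eps - eps_theta(x_t, t)||^2."""
    t = rng.randint(1, len(abar))              # t ~ Uniform{1, ..., T}
    eps = [rng.gauss(0.0, 1.0) for _ in x0]    # eps ~ N(0, I)
    a = abar[t - 1]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    eps_hat = predict_noise(x_t, t)
    return sum((e - h) ** 2 for e, h in zip(eps, eps_hat))

# Assumed linear noise schedule, beta from 1e-4 to 0.02 over T = 1000 steps.
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
abar, prod = [], 1.0
for b in betas:
    prod *= 1.0 - b
    abar.append(prod)

x0 = [1.0, -0.5, 2.0]  # toy "data" vector

def oracle(x_t, t):
    """Cheating predictor that uses the known x0; for sanity checks only."""
    a = abar[t - 1]
    return [(xt - math.sqrt(a) * x) / math.sqrt(1.0 - a) for xt, x in zip(x_t, x0)]

def zero_predictor(x_t, t):
    """Hypothetical untrained model that always predicts zero noise."""
    return [0.0 for _ in x_t]

rng = random.Random(0)
print(simple_loss(x0, oracle, abar, rng))          # essentially zero
print(simple_loss(x0, zero_predictor, abar, rng))  # equals ||eps||^2 for that draw
```

Predicting the noise rather than x_0 or the posterior mean is the parameterization Ho et al. 2020 report working best with this simplified loss.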

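Gaussian distributions run through the whole framework: every forward and reverse step is Gaussian.

KL divergence: the variational training bound is built mainly from KL divergences between these Gaussians.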
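
Because both distributions in each KL term are Gaussian, the KL has a closed form. For diagonal Gaussians it is a sum over dimensions of the univariate formula KL(N(mu1, s1^2) || N(mu2, s2^2)) = log(s2 / s1) + (s1^2 + (mu1 - mu2)^2) / (2 s2^2) - 1/2. A minimal sketch; the function name is hypothetical and variances are passed directly.

```python
import math

def kl_diag_gaussians(mu1, var1, mu2, var2):
    """KL( N(mu1, diag(var1)) || N(mu2, diag(var2)) ), summed over dimensions."""
    total = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        # 0.5 * log(v2 / v1) equals log(sigma2 / sigma1)
        total += 0.5 * math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / (2.0 * v2) - 0.5
    return total

print(kl_diag_gaussians([0.0], [1.0], [0.0], [1.0]))  # → 0.0
print(kl_diag_gaussians([1.0], [1.0], [0.0], [1.0]))  # → 0.5
```

When the two variances are equal, as with DDPM's fixed reverse-process variance, the KL reduces to the squared distance between the means divided by 2 sigma^2, which is why the objective can be rewritten as a regression on means or on predicted noise.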

Applications

Summary

Glossary:

Denoising diffusion probabilistic models (DDPMs): a powerful class of machine learning models recently demonstrated to generate novel photorealistic images in response to text prompts.
