Implementation of FoldMark: Safeguarding Protein Structure Generative Models with Distributional and Evolutionary Watermarking

zaixizhang/FoldMark

FoldMark: Safeguarding Protein Structure Generative Models with Distributional and Evolutionary Watermarking

📰 Media Coverage

🌟 Try Our Demo!

We've created an interactive demo on Hugging Face Spaces where you can:

  • Input protein sequences and get watermarked structure predictions
  • Compare watermarked vs. non-watermarked structures
  • Visualize the differences in 3D
  • Access pretrained checkpoints and inference code

Try the Demo →

🚀 Overview

FoldMark is a first-of-its-kind watermarking strategy designed to provide essential biosecurity safeguards for generative protein models against dual-use risks. It:

  • Balances Performance and Quality: Employs distributional and evolutionary principles to embed watermarks while maintaining high-fidelity protein structures.
  • High Bit Accuracy: Achieves over 95% watermark bit accuracy at 32 bits with minimal impact on structural integrity (maintaining >0.9 scTM scores).
  • Broad Compatibility: Works seamlessly with leading models, including AlphaFold3, ESMFold, RFDiffusion, and RFDiffusionAA.
  • Robust User Tracing: Traces a generated protein back to its source among up to 1 million users.
  • Wet Lab Validated: Tested on redesigned EGFP and CRISPR-Cas13, which retained wildtype-level function (98% fluorescence, 95% editing efficiency) with >90% watermark detection, supporting its practical utility.
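
To make the bit-accuracy and user-tracing figures above concrete, here is a minimal sketch in plain Python. It is an illustration only, not FoldMark's decoder: the real watermark is recovered from a protein structure by a learned model, whereas this sketch starts from an already-decoded bit string. The function names `bit_accuracy` and `trace_user`, the nearest-code lookup by Hamming distance, and the random codebook of 1,000 users are all assumptions made for the example.

```python
import random

N_BITS = 32  # watermark length quoted in the overview


def bit_accuracy(embedded: int, decoded: int, n_bits: int = N_BITS) -> float:
    """Fraction of the n_bits watermark bits that were decoded correctly."""
    mask = (1 << n_bits) - 1
    wrong = bin((embedded ^ decoded) & mask).count("1")
    return 1.0 - wrong / n_bits


def trace_user(decoded: int, user_codes: list) -> tuple:
    """Return (user index, Hamming distance) for the registered code nearest to `decoded`.

    Hypothetical tracing rule: the source does not say how users are matched.
    """
    distances = [bin(code ^ decoded).count("1") for code in user_codes]
    best = min(range(len(user_codes)), key=distances.__getitem__)
    return best, distances[best]


# Toy codebook: one random 32-bit code per registered user (assumed setup).
rng = random.Random(0)
user_codes = [rng.getrandbits(N_BITS) for _ in range(1000)]

# Simulate a decoding of user 42's watermark in which two bits were corrupted.
decoded = user_codes[42] ^ 0b101

print("bit accuracy:", bit_accuracy(user_codes[42], decoded))
print("traced (user, distance):", trace_user(decoded, user_codes))
```

With 32-bit codes, a few decoding errors still leave the true code as the nearest match in most cases, which is why high bit accuracy matters for tracing across a large user base.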

📊 Results

Structure Prediction with Watermarking

De Novo Protein Structure Design with Watermarking

🛠️ Installation

# Create and activate conda environment
conda env create -f foldmark.yml
conda activate fm

# Install torch-scatter
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.0.0+cu117.html

# Install local package
pip install -e .
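
After the steps above, a quick check like the following can confirm that the key dependencies are importable inside the `fm` environment. This is a minimal sketch: the helper name `missing_modules` is hypothetical, and the list of modules is inferred from the install commands (`torch_scatter` is the import name of the `torch-scatter` package), not taken from the repository.

```python
import importlib.util

# Top-level module names assumed from the installation steps.
REQUIRED = ("torch", "torch_scatter")


def missing_modules(names=REQUIRED):
    """Return the names in `names` that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

If `torch_scatter` is reported missing, the wheel URL in the `pip install torch-scatter` step probably does not match the installed PyTorch and CUDA versions.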

📊 Training Pipeline

Data Setup

  1. Download preprocessed SCOPe dataset (~280MB): Download Link
  2. Extract the data:
    tar -xvzf preprocessed_scope.tar.gz
    rm preprocessed_scope.tar.gz

Training Steps

  1. Pretrain the model:
    python -W ignore experiments/pretrain.py
  2. Finetune with watermarking:
    python -W ignore experiments/finetune.py
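
The two training steps can also be chained so that finetuning starts only if pretraining exits cleanly. The sketch below assumes it is run from the repository root and that both scripts need no extra arguments, as in the commands above. The helper names `build_commands` and `run_pipeline` are hypothetical.

```python
import subprocess
import sys

# Script paths taken from the training steps above, in execution order.
STAGES = ("experiments/pretrain.py", "experiments/finetune.py")


def build_commands(python=sys.executable, stages=STAGES):
    """Build one `python -W ignore <script>` command per training stage."""
    return [[python, "-W", "ignore", script] for script in stages]


def run_pipeline(commands):
    """Run the commands in order, stopping at the first failure."""
    for cmd in commands:
        print("running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so finetuning never starts after a failed pretraining run.
        subprocess.run(cmd, check=True)


# From the repository root, with the `fm` environment active:
# run_pipeline(build_commands())
```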

🔬 Wet Lab Verifications on GFP and Cas13 Redesign

📝 Citation

If you find this work helpful, please cite our paper:

@article{zhang2024foldmark,
  title={FoldMark: Protecting Protein Generative Models with Watermarking},
  author={Zhang, Zaixi and Jin, Ruofan and Fu, Kaidi and Cong, Le and Zitnik, Marinka and Wang, Mengdi},
  journal={bioRxiv},
  pages={2024--10},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}

🙏 Acknowledgments

We thank the following open-source projects for their valuable contributions:

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
