# polygon **Repository Path**: pfsuo/polygon ## Basic Information - **Project Name**: polygon - **Description**: polygon from github - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2026-01-03 - **Last Updated**: 2026-01-03 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # POLYpharmacology Generative Optimization Network (POLYGON) a VAE for de novo polypharmacology. This repository contains the POLYGON framework, a de novo molecular generator for polypharmacology. Akin to de novo portait generation, POLYGON attempts to optimize the chemical space for multiple protein target domains. ![alt text](https://github.com/bpmunson/polygon/blob/main/images/Figure_1r.png?raw=true) *** The codebase is primarily adapted from two excellent de novo molecular design frameworks: 1. GuacaMol for reward based reinforcement learning: https://github.com/BenevolentAI/guacamol 2. MOSES for the VAE implementation: https://github.com/molecularsets/moses ## Data Sources A key resource to the POLYGON framework is experimental binding data of small molecule ligands. We use the BindingDB as a source for this information, which can be found here: https://www.bindingdb.org/rwd/bind/chemsearch/marvin/Download.jsp Input molecule training datasets are available from the GuacaMol package: https://github.com/BenevolentAI/guacamol ## Installation of POLYGON: POLYGON has been testing on Python version 3.9.16. Installation of POLYGON with pip will automatically install the necessary dependencies, which are: * pandas>=1.0.3 * numpy>=1.18.1 * rdkit>=2019.09.3 * torch>=1.4.0 * joblib>=0.14.1 * scikit-learn>=0.22.1 ``` conda install -c conda-force rdkit conda install pytorch::pytorch -c pytorch conda install numpy pandas scikit-learn ``` ``` git clone https://github.com/bpmunson/polygon.git cd polygon pip install . ``` optionally install cudatoolkit for gpu acceleration in pytorch for example: ``` conda install cudatoolkit=11.1 -c conda-forge ``` or see https://pytorch.org/ for specific installation instructions. Installation time is on the order of minutes. *** Example Usage: Pretrain VAE to encode chemical embedding: ``` polygon train \ --train_data ../data/guacamol_v1_train.smiles \ --log_file log.txt \ --save_frequency 25 \ --model_save model.pt \ --n_epoch 200 \ --n_batch 1024 \ --debug \ --d_dropout 0.2 \ --device cpu ``` Train Ligand Binding Models for Two Protein Targets ``` polygon train_ligand_binding_model \ --uniprot_id Q02750 --binding_db_path BindingDB_All.csv --output_path Q02750_ligand_binding.pkl ``` ``` polygon train_ligand_binding_model \ --uniprot_id P42345 --binding_db_path BindingDB_All.csv --output_path P42345_ligand_binding.pkl ``` Use the chemical embedding to design polypharmacology compounds ``` polygon generate \ --model_path ../data/pretrained_vae_model.pt \ --scoring_definition scoring_definition.csv \ --max_len 100 \ --n_epochs 200 \ --mols_to_sample 8192 \ --optimize_batch_size 512 \ --optimize_n_epochs 2 \ --keep_top 4096 \ --opti gauss \ --outF molecular_generation \ --device cpu \ --save_payloads \ --n_jobs 4 \ --debug ``` The expected runtime for POLYGON is on the order of hours. POLYGON will output designs as SMILES strings in a text file. For example: ``` $ head GDM_final_molecules.txt Fc1cc(F)cc(CC(Nc2ccc3ncccc3c2)c2cccnc2)c1 N[SH](=O)(O)c1cccc(S(=O)(=O)O)c1 N#Cc1cc(C(N)=NO)ccc1Nc1nccc2ccnn12 CN(CN=C(O)c1ccco1)Nc1nccs1 ```