tools tagged “dataset”
IFM
junxia97/IFM
This repository provides a PyTorch implementation accompanying a study of the limitations of deep learning models for molecular property prediction. It includes code for several machine learning models and datasets for molecular property prediction.
DEELIG
asadahmedtech/DEELIG
DEELIG is a deep learning tool for predicting protein-ligand binding affinity. It provides datasets and supplementary materials to support research in molecular property prediction.
MGDTA
IILab-Resource/MGDTA
MGDTA is a tool designed for predicting drug-target binding affinity using multigranular representations. It includes datasets for training and evaluation, making it a valuable resource for researchers in the field of drug discovery.
tox21_dataset
filipsPL/tox21_dataset
The Tox21 dataset repository contains data used in the Tox21 Data Challenge for evaluating the toxicity of compounds. It includes lists of compounds, their activity, and descriptors, facilitating in silico toxicity prediction and compound prioritization.
Progen
kyegomez/Progen
Progen is a Python implementation of a language model for generating protein sequences, based on the ProGen paper. It utilizes various protein sequence datasets for training and evaluation, making it a valuable tool for protein design and generation.
largeDFTdata
chemsurajit/largeDFTdata
This repository contains data for QM9 molecules and reactions, along with Python scripts for processing and analyzing molecular data. It enables users to download molecular data, create databases of atomization energies, and process reaction information, making it a valuable resource for computational chemistry research.
nmrdata
ur-whitelab/nmrdata
This repository contains data and parsing scripts for a Graph Neural Network (GNN) model that predicts chemical shifts in molecules. It includes functionalities for loading, validating, and processing molecular data, making it a useful tool for molecular property prediction.
BOOM
FLASK-LLNL/BOOM
BOOM is a tool for data-driven molecule discovery that focuses on out-of-distribution (OOD) prediction of molecular properties. It includes benchmarks that evaluate how well various machine learning models generalize to molecules whose property values lie outside the range seen during training.
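To make "out-of-distribution" concrete, one simple way to build such a split is to hold out the molecules with the most extreme property values, so that a model must extrapolate beyond its training range. The sketch below is a hypothetical helper written for illustration only. It assumes a one-sided split on the highest values and is not BOOM's actual splitting procedure, which may differ.

```python
from typing import List, Sequence, Tuple


def property_tail_split(
    values: Sequence[float], test_fraction: float = 0.1
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: put the molecules with the highest property values in the test set.

    After the split, every training value is <= every test value, so the test set
    probes extrapolation beyond the training range (one-sided, upper tail only).
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    # Indices ordered from lowest to highest property value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Hold out at least one molecule.
    n_test = max(1, round(len(values) * test_fraction))
    train_idx = sorted(order[:-n_test])
    test_idx = sorted(order[-n_test:])
    return train_idx, test_idx
```

For example, `property_tail_split([0.5, 3.2, 1.1, 9.8, 2.0], test_fraction=0.2)` returns `([0, 1, 2, 4], [3])`: only the molecule with value 9.8 is held out. A real benchmark would typically also consider the low-value tail and low-density regions of the property distribution.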
Matcha
LigandPro/Matcha
Matcha is a molecular docking tool that utilizes multi-stage flow matching to enhance the accuracy and physical validity of docking predictions. It includes features for benchmarking and supports various datasets for evaluating docking performance.
ECFP-Sort-and-Slice
MarkusFerdinandDablander/ECFP-Sort-and-Slice
ECFP-Sort-and-Slice provides a method for converting RDKit molecule objects into extended-connectivity fingerprint (ECFP) vectors using Sort & Slice, an alternative to hash-based folding of substructure identifiers. It includes datasets for several molecular property prediction tasks and supports feature extraction for machine learning in molecular chemistry.
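The core idea of Sort & Slice can be sketched without RDKit. Instead of hashing substructure identifiers into a fixed number of bits, you rank the identifiers by how many training molecules contain them and keep only the top `length` as the feature vocabulary. The sketch below assumes each molecule has already been reduced to a collection of integer ECFP substructure identifiers (for example, computed with RDKit). The function names and the tie-breaking rule (smaller identifier first) are illustrative assumptions, not the repository's API.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def fit_sort_and_slice(train_ids: Iterable[Sequence[int]], length: int) -> List[int]:
    """Hypothetical helper: return the `length` identifiers present in the most training molecules.

    Each identifier is counted at most once per molecule. Ties are broken by the
    smaller identifier (an assumption made here for determinism).
    """
    counts: Counter = Counter()
    for ids in train_ids:
        counts.update(set(ids))  # presence per molecule, not raw occurrence count
    ranked = sorted(counts, key=lambda i: (-counts[i], i))  # sort by frequency
    return ranked[:length]  # slice to the desired fingerprint length


def featurize(ids: Sequence[int], vocabulary: List[int]) -> List[int]:
    """Binary vector: position k is 1 if vocabulary[k] occurs in the molecule."""
    present = set(ids)
    return [1 if v in present else 0 for v in vocabulary]
```

With training molecules `[[11, 42, 7], [42, 7, 99], [42, 5]]` and `length=2`, the vocabulary is `[42, 7]`, and `featurize([7, 5, 1000], [42, 7])` gives `[0, 1]`. Identifiers outside the vocabulary, such as 5 and 1000 here, are simply dropped rather than colliding in a hashed bit, which is the motivation for the approach.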
chemprop_benchmark_v2
chemprop/chemprop_benchmark_v2
This repository contains benchmarking scripts and data for Chemprop v2, used to evaluate the performance of molecular property prediction models. It includes datasets and benchmarks covering a range of molecular properties, supporting research in computational chemistry and machine learning.
spice-models
openmm/spice-models
SPICE-Models provides machine learning models trained on the SPICE dataset, which contains quantum chemistry energies and forces for drug-like small molecules and peptides. These models may be useful in applications related to molecular design and analysis.
DrugDataResource
kexinhuang12345/DrugDataResource
DrugDataResource is a repository that offers a variety of datasets aimed at facilitating drug discovery and development. It includes datasets for drug-target interactions, ADMET properties, and other molecular characteristics, which are essential for computational chemistry and molecular biology research.
Protify
Gleghorn-Lab/Protify
Protify is an open-source platform that simplifies predicting chemical properties, particularly of proteins, with deep learning models. It offers a low-code way to benchmark models and use various datasets for protein property prediction.
ProteinF3S
phdymz/ProteinF3S
ProteinF3S is an implementation of a model that enhances enzyme function prediction by combining various protein data types. It provides processed datasets and pre-trained weights for inference, making it a useful tool in the field of protein design and bioinformatics.
ADKF-IFT
Wenlin-Chen/ADKF-IFT
ADKF-IFT is a PyTorch implementation of a meta-learning method for predicting molecular properties using adaptive deep kernel Gaussian processes. It includes code for regression tasks on the FS-Mol dataset and provides model checkpoints for classification and regression, making it a useful tool for researchers in molecular property prediction.
TED-Gen
dptech-corp/TED-Gen
TED-Gen is a framework designed for generating and analyzing atomic structures at van der Waals interfaces using a generative model. It utilizes experimental and simulated data to create high-quality training datasets and offers tools for training models to analyze stacking patterns in materials.