tools tagged “dataset”
V2DB
mcsorkun/V2DB
V2DB is a database and tool for generating and predicting properties of novel two-dimensional materials through virtual screening. It utilizes machine learning models to predict various material properties, facilitating the discovery of stable and functional 2D materials.
AI4PFAS
AI4PFAS/AI4PFAS
AI4PFAS is a repository that provides a dataset and code for predicting the toxicity of PFAS compounds using uncertainty-informed deep transfer learning. It includes various models and benchmarks for assessing toxicity, specifically focusing on LD50 values.
AutomatedSeriesClassification
rdkit/AutomatedSeriesClassification
AutomatedSeriesClassification is a tool for automatically classifying chemical series, using datasets such as ChEMBL. It prepares the data for analysis and classifies compounds much as a medicinal chemist would.
augment-atoms
jla-gardner/augment-atoms
`augment-atoms` is a tool designed for augmenting datasets of atomic configurations using a model-driven approach. It generates new molecular structures by applying transformations to existing ones, making it useful for enhancing datasets in molecular machine learning applications.
modelforge
choderalab/modelforge
Modelforge is a package designed to implement and train neural network potentials (NNPs) for molecular simulations. It includes infrastructure for optimizing and storing these models, along with datasets for accurate training and validation.
QSAR-activity-cliff-experiments
MarkusFerdinandDablander/QSAR-activity-cliff-experiments
This repository explores QSAR models for predicting activity cliffs in small-molecule inhibitors, providing datasets and methodologies for molecular property prediction. It includes clean data for various targets and allows for the reproduction of experiments related to binding affinity and activity classification.
GeomGCL
agave233/GeomGCL
GeomGCL is an implementation of a method for predicting molecular properties using geometric graph contrastive learning. It includes preprocessing steps for molecular datasets and provides a framework for training models on these datasets.
SUPERChem_eval
catalystforyou/SUPERChem_eval
SUPERChem is a multimodal reasoning benchmark designed to evaluate the chemical reasoning capabilities of large language models. It includes a dataset of 500 expert-curated problems and provides tools for evaluation and analysis of model performance in chemistry.
compound_target_pairs_dataset
chembl/compound_target_pairs_dataset
This repository contains code for automatically extracting a dataset of interacting compound-target pairs from the ChEMBL database. It enables researchers to analyze drug-target interactions and supports future analyses in drug discovery.
ichor
popelier-group/ichor
Ichor is a Python package designed to simplify data management from computational chemistry programs and support machine learning force field development. It provides interfaces for various computational chemistry software, flexible data structures for managing large datasets, and tools for benchmarking molecular dynamics simulations.
polymer-chemprop-data
coleygroup/polymer-chemprop-data
The repository contains data and tools for predicting molecular properties of polymers using a graph representation. It includes datasets of computed electron affinities and ionization potentials, as well as instructions for reproducing the associated calculations.
MBP
jiaxianyan/MBP
MBP is a PyTorch implementation of multi-task bioassay pre-training for predicting protein-ligand binding affinities. It trains its models on the ChEMBL-Dock dataset, which contains extensive protein-ligand binding data.
BC-Design
gersteinlab/BC-Design
BC-Design is a framework for high-precision inverse protein folding that integrates structural and biochemical features to improve protein design accuracy. It uses a dual-encoder architecture to generate amino acid sequences that correspond to specific 3D protein structures, making it valuable for protein engineering and drug development.
qm9pack
raghurama123/qm9pack
QM9PACK is a Python package for data-mining the QM9 dataset, which contains computed quantum-chemical structures and properties for roughly 134,000 small organic molecules. It facilitates the extraction and analysis of molecular properties, making it a useful tool in computational chemistry.
GS-Meta
HICAI-ZJU/GS-Meta
GS-Meta is an implementation of a meta-learning framework designed for predicting molecular properties using graph sampling techniques. It includes datasets and models specifically tailored for molecular property prediction tasks.
LMetalSite
biomed-AI/LMetalSite
LMetalSite predicts metal ion-binding sites from protein sequences using a pretrained language model and multi-task learning. It provides the datasets and trained models needed to reproduce its results.
DL_protein_ligand_affinity
meyresearch/DL_protein_ligand_affinity
This repository provides code and data for predicting protein-ligand binding affinity using deep learning techniques. It includes various encodings for proteins and ligands, and offers datasets for training and testing models in the context of drug discovery.
BigSMILES_homopolymer
CDAL-SChoi/BigSMILES_homopolymer
This repository offers an automated workflow for converting SMILES representations of homopolymers to BigSMILES format and vice versa. It includes a dataset of BigSMILES representations, facilitating research in molecular representation and potential applications in deep learning.
STCRpy
oxpig/STCRpy
STCRpy is a software suite designed for analyzing and processing T-cell receptor (TCR) structures. It provides tools for interaction profiling, geometry calculations, and generating datasets compatible with machine learning frameworks, making it useful for researchers in molecular biology and immunology.
prospr
okkevaneck/prospr
Prospr is a toolbox for protein structure prediction using the hydrophobic-polar (HP) model. It includes various prediction algorithms, a protein data structure for simulating folding, and datasets for research in protein folding and structure analysis.
GPCNDTA
LiZhang30/GPCNDTA
GPCNDTA is a tool designed for predicting drug-target binding affinity using cross-attention networks enhanced with graph features and pharmacophores. It includes benchmark datasets for training and evaluation, making it suitable for drug discovery applications.
Uni-Dock-Benchmarks
dptech-corp/Uni-Dock-Benchmarks
Uni-Dock-Benchmarks is a curated collection of datasets and benchmarking tests for assessing the performance and accuracy of the Uni-Dock docking system. It includes prepared structures and input files for both molecular docking and virtual screening, making it a valuable resource for researchers in computational chemistry.
SAAINT
tommyhuangthu/SAAINT
SAAINT is a structural antibody parser and database that facilitates the extraction and annotation of antibody structures and their interactions with antigens from the Protein Data Bank. It provides tools for building and analyzing a comprehensive antibody database, making it useful for antibody modeling and design.
VenusX
ai4protein/VenusX
VenusX is a benchmark for fine-grained functional annotation of proteins, focusing on tasks such as residue-level and fragment-level classification. It includes a dataset of over 878,000 samples for evaluating protein models and their functional understanding.