tools tagged “dataset”
DeeplyTough
BenevolentAI/DeeplyTough
DeeplyTough is a tool that uses deep learning to learn structural comparisons of protein binding sites. It provides built-in support for several benchmark datasets and allows users to evaluate custom datasets, making it a valuable resource for drug discovery and protein design.
intro_pharma_ai
kochgroup/intro_pharma_ai
This repository provides a collection of Jupyter Notebooks aimed at teaching life science students the fundamentals of deep learning, with a focus on applications in cheminformatics. It includes notebooks on generative models for molecular design and datasets relevant to molecular machine learning.
ChatDrug
chao1224/ChatDrug
ChatDrug is a tool for conversational drug editing that utilizes retrieval and domain feedback to assist in the design and modification of small molecules, peptides, and proteins. It includes datasets for training and evaluation, making it a comprehensive resource for drug discovery.
OntoProtein
zjunlp/OntoProtein
OntoProtein is a knowledge-enhanced protein language model that integrates Gene Ontology for improved protein function and structure prediction. It provides a large-scale dataset, ProteinKG25, for pretraining and fine-tuning on various protein-related tasks.
FLAb
Graylab/FLAb
FLAb is a dataset designed for training and benchmarking AI models in therapeutic antibody design, offering extensive data on properties such as binding affinity and thermostability. It serves as a centralized resource for researchers in protein design, facilitating the development of optimized antibody candidates.
pinder
pinder-org/pinder
PINDER is a comprehensive dataset and evaluation resource for protein-protein interactions, aimed at enhancing the training and evaluation of docking algorithms. It includes a large collection of protein structures and associated metadata, making it a valuable resource for researchers in molecular biology and computational chemistry.
bamboo
bytedance/bamboo
BAMBOO is a machine learning force field designed for precise and efficient simulations of lithium battery electrolytes. It provides tools for training models and running molecular dynamics simulations, along with datasets for training and validation.
AMPL
ATOMScience-org/AMPL
The ATOM Modeling PipeLine (AMPL) is an open-source software pipeline that facilitates data curation, model building, and molecular property prediction to enhance in silico drug discovery efforts. It supports various machine learning techniques and has been benchmarked on extensive pharmaceutical datasets.
chemCPA
theislab/chemCPA
chemCPA is a tool designed to predict cellular responses to novel drug perturbations at single-cell resolution. It includes code for model training and data processing, and it uses various molecular embedding models to enhance drug discovery efforts.
Uni-3DAR
dptech-corp/Uni-3DAR
Uni-3DAR is an autoregressive model designed for unified 3D generation and understanding of molecular structures, proteins, and crystals. It supports diverse tasks including molecular property prediction and generation, utilizing pretrained models and datasets for training and inference.
DrugAssist
blazerye/DrugAssist
DrugAssist is a large language model aimed at optimizing molecules, making it a valuable tool in drug discovery. It includes a dataset for training and facilitates the generation and optimization of molecular structures.
Meta-MGNN
zhichunguo/Meta-MGNN
Meta-MGNN is a tool designed for predicting molecular properties using few-shot graph learning techniques. It includes datasets for training and code for evaluating performance on various molecular properties, making it relevant for computational chemistry applications.
GLN
Hanjun-Dai/GLN
GLN is a tool for predicting retrosynthesis pathways using a Conditional Graph Logic Network. It includes datasets for training and testing models, making it useful for molecular design and generation tasks.
MolRep
biomed-AI/MolRep
MolRep is a Python library designed for deep representation learning aimed at predicting molecular properties. It includes a comprehensive evaluation of state-of-the-art models across multiple benchmark datasets, facilitating advancements in molecular property prediction.
MGSSL
zaixizhang/MGSSL
MGSSL is the official implementation of motif-based graph self-supervised learning, a method aimed at predicting molecular properties. It includes pretraining and finetuning on the MoleculeNet dataset, making it a useful tool for molecular property prediction tasks.
FlowDock
BioinfoMachineLearning/FlowDock
FlowDock is a geometric flow matching model designed for generative protein-ligand docking and affinity prediction. It provides tools for predicting molecular interactions and includes datasets for training and evaluation, making it a valuable resource in computational chemistry and molecular biology.
kGCN
clinfo/kGCN
kGCN is a graph-based deep learning framework that focuses on the classification and prediction of molecular properties using graph convolutional networks. It supports the generation of molecular data and provides tools for dataset preparation and model training in cheminformatics.
geometric-gnns
AlexDuvalinho/geometric-gnns
The 'geometric-gnns' repository provides a curated list of Geometric Graph Neural Networks designed for 3D atomic systems. It includes various models, their characteristics, and a collection of datasets, facilitating research in molecular property prediction and simulations.
Molecules_Dataset_Collection
GLambard/Molecules_Dataset_Collection
Molecules_Dataset_Collection is a curated collection of datasets containing molecular structures and their associated physicochemical properties. It aims to facilitate the validation of machine learning models for predicting molecular properties, making it a valuable resource for researchers in computational chemistry and machine learning.
Alchemy
tencent-alchemy/Alchemy
Alchemy is a repository that offers tools for predicting molecular properties using graph neural networks. It includes a dataset for benchmarking AI models in quantum chemistry and provides implementations for various models like SchNet and MGCN.
PDGrapher
mims-harvard/PDGrapher
PDGrapher is a tool for combinatorial prediction of therapeutic perturbations using causally-inspired neural networks. It leverages chemical datasets to train models that can predict the effects of drug combinations, contributing to the field of drug discovery.
FLIP
J-SNACKKB/FLIP
FLIP is a collection of tasks designed to evaluate how effectively protein sequence representations model aspects of protein design. It includes datasets and benchmarks for assessing machine learning models in the context of protein engineering.
clamp
ml-jku/clamp
CLAMP is a tool designed to enhance activity prediction models in drug discovery by leveraging natural language processing. It predicts relevant molecules based on textual descriptions of bioassays, enabling few-shot and zero-shot learning in the context of molecular properties.
PepGLAD
THUNLP-MT/PepGLAD
PepGLAD is a tool for full-atom peptide design that utilizes geometric latent diffusion models to co-design peptide sequences and structures. It supports binding conformation generation and provides datasets for benchmarking the models.