tools tagged “dataset”
awesome-molecular-docking
Thinklab-SJTU/awesome-molecular-docking
Awesome-Molecular-Docking is a curated list of resources for molecular docking and related tasks. It covers docking software, datasets, and references on molecular dynamics simulations, and is aimed at researchers in drug discovery and molecular biology.
moleculenet
deepchem/moleculenet
MoleculeNet is a collection of datasets and benchmarks designed for evaluating machine learning models in the context of molecular property prediction. It includes various datasets relevant to physical chemistry, biophysics, and materials science, facilitating research in drug discovery and molecular modeling.
PPIRef
anton-bushuiev/PPIRef
PPIRef is a Python package for working with 3D structures of protein-protein interactions. It includes functionality for extracting, visualizing, and analyzing protein-protein interfaces, and it provides a dataset for machine learning applications in this domain.
molecularGNN_3Dstructure
masashitsubaki/molecularGNN_3Dstructure
This repository provides a graph neural network implementation for predicting molecular properties based on 3D structures. It allows users to preprocess datasets and train models to predict various molecular properties using a subset of the QM9 dataset.
Computational-ADME
molecularinformatics/Computational-ADME
Computational-ADME is a repository that provides code and data for building and validating machine learning models for predicting ADME (Absorption, Distribution, Metabolism, and Excretion) properties of drug candidates. It includes datasets of diverse compounds and various machine learning algorithms to enhance the accuracy of predictions in drug discovery.
ANI1_dataset
isayev/ANI1_dataset
The ANI-1 dataset repository contains scripts for accessing a large dataset of 20 million calculated off-equilibrium conformations for organic molecules. This dataset is useful for training machine learning models in molecular property prediction and molecular simulations.
USearchMolecules
ashvardanian/USearchMolecules
USearchMolecules is a comprehensive dataset containing over 7 billion small molecules, designed for efficient searching and clustering of molecular structures. It utilizes various molecular fingerprints and is aimed at facilitating drug discovery and cheminformatics research.
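Fingerprint-based search of the kind described above usually ranks molecules by Tanimoto similarity between bit vectors. The following is a minimal sketch of that idea only; it does not use the USearchMolecules data or API, and the 8-bit fingerprints and the names `tanimoto`, `top_k`, and `library` are illustrative assumptions.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto (Jaccard) similarity of two bit fingerprints stored as ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # two empty fingerprints: define similarity as 0
    return bin(a & b).count("1") / union


def top_k(query: int, library: dict, k: int = 2) -> list:
    """Return the k library entries most similar to the query fingerprint."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 8-bit fingerprints (hypothetical; real fingerprints have hundreds of bits).
library = {
    "mol_a": 0b10110010,
    "mol_b": 0b10110000,
    "mol_c": 0b01001101,
}
query = 0b10110011

hits = top_k(query, library)
print(hits)
```

A brute-force scan like this is only workable for small libraries; at the scale of billions of molecules an approximate nearest-neighbor index is typically used instead.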
PINNACLE
mims-harvard/PINNACLE
PINNACLE is a geometric deep learning model designed to generate contextualized representations of proteins based on their interactions across different cell types and tissues. It aims to improve the understanding of protein functions and therapeutic potentials by incorporating biological context into its modeling approach.
GenAI4Drug
gersteinlab/GenAI4Drug
GenAI4Drug is a survey repository that explores the use of generative AI techniques for de novo drug design, emphasizing the generation of molecules and proteins. It includes discussions on various models, datasets, and metrics relevant to molecular design and property prediction.
Uni-pKa
dptech-corp/Uni-pKa
Uni-pKa is a machine learning-based tool designed for accurate pKa prediction of small molecules. It includes a microstate enumerator for building protonation ensembles and a model that integrates thermodynamic principles to enhance prediction accuracy.
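The thermodynamics that links microstates to a measured pKa can be illustrated with two textbook relations: the Henderson-Hasselbalch equation, and the fact that microscopic acid constants add when one protonated microstate can lose its proton to several deprotonated microstates. This is a hedged sketch of those relations, not Uni-pKa's model or API; the function names are hypothetical.

```python
import math


def fraction_deprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of a monoprotic acid in its deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def macro_pka_from_micro(micro_pkas: list) -> float:
    """Macroscopic pKa when ONE protonated microstate can deprotonate into
    several distinct microstates: Ka(macro) is the sum of the micro Ka values."""
    ka_macro = sum(10.0 ** (-pka) for pka in micro_pkas)
    return -math.log10(ka_macro)


# At pH equal to the pKa, half of the population is deprotonated.
half = fraction_deprotonated(pka=4.76, ph=4.76)

# Two equivalent sites with micro pKa 9.0 give a macro pKa of 9.0 - log10(2).
macro = macro_pka_from_micro([9.0, 9.0])
print(half, round(macro, 3))
```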
molencoder
cxhernandez/molencoder
MolEncoder is a molecular autoencoder implemented in PyTorch for tasks such as molecular representation learning and generation. It includes functionality for downloading datasets and training models on them.
dayhoff
microsoft/dayhoff
Dayhoff is a resource that combines extensive protein sequence data with generative language models to predict mutation effects and generate novel protein sequences. It includes datasets and models for protein design and analysis.
SumGNN
yueyu1030/SumGNN
SumGNN is a tool designed for predicting multi-typed drug interactions by utilizing knowledge graph summarization techniques. It provides datasets and models for drug-drug interaction prediction, making it a valuable resource in the field of bioinformatics and drug discovery.
typedb-bio
typedb-osi/typedb-bio
TypeDB Bio is an open-source biomedical knowledge graph designed to facilitate research in drug discovery, precision medicine, and drug repurposing. It allows researchers to query interconnected biomedical data, helping to identify potential drug targets and understand biological interactions.
reaction_utils
MolecularAI/reaction_utils
The 'reaction_utils' repository offers utilities for handling datasets of chemical reactions, including template extraction and data manipulation. It is designed to facilitate the analysis and processing of chemical reaction data, making it a useful tool in the field of cheminformatics.
chembl-downloader
cthoyt/chembl-downloader
ChEMBL Downloader is a tool designed to reproducibly download, open, parse, and query the ChEMBL database. It simplifies the process of accessing a wealth of molecular data, which can be utilized for property prediction and other cheminformatics tasks.
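Once a ChEMBL SQLite dump is available locally, querying it is ordinary SQL. The sketch below does not call chembl-downloader; it builds a tiny in-memory stand-in that mirrors two ChEMBL tables (`molecule_dictionary` and `compound_structures`) with made-up rows, so the identifiers `TOY1` and `TOY2` are assumptions for illustration only.

```python
import sqlite3

# In-memory stand-in for a downloaded ChEMBL SQLite file (toy rows, not real ChEMBL data).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE molecule_dictionary (
        molregno INTEGER PRIMARY KEY,
        chembl_id TEXT,
        pref_name TEXT
    );
    CREATE TABLE compound_structures (
        molregno INTEGER PRIMARY KEY,
        canonical_smiles TEXT
    );
    INSERT INTO molecule_dictionary VALUES (1, 'TOY1', 'ETHANOL'), (2, 'TOY2', 'BENZENE');
    INSERT INTO compound_structures VALUES (1, 'CCO'), (2, 'c1ccccc1');
    """
)

# Join identifiers to structures and filter by preferred name.
sql = """
    SELECT md.chembl_id, cs.canonical_smiles
    FROM molecule_dictionary AS md
    JOIN compound_structures AS cs ON md.molregno = cs.molregno
    WHERE md.pref_name = ?
"""
rows = conn.execute(sql, ("BENZENE",)).fetchall()
print(rows)
conn.close()
```

Against a real dump, only the connection target would change; the join itself stays the same.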
PaRoutes
MolecularAI/PaRoutes
PaRoutes is a framework that benchmarks multi-step retrosynthesis methods, providing curated datasets for building retrosynthesis models. It includes scripts for analyzing route quality and diversity, making it a valuable resource for researchers in molecular design.
public_binding_free_energy_benchmark
schrodinger/public_binding_free_energy_benchmark
This repository provides a benchmark dataset of proteins and ligands for validating binding free energy prediction methods. It includes input structures for calculations, experimental binding data for reproducibility studies, and analysis scripts for generating results.
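Validating a binding free energy method against experimental data typically means converting measured dissociation constants to free energies with ΔG = RT ln(Kd) and then comparing them with predictions. The following is a minimal sketch under that assumption; it is not the repository's analysis script, and the function names and example values are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_to_dg(kd_molar: float, temperature: float = 298.15) -> float:
    """Binding free energy in kcal/mol from a dissociation constant in mol/L
    (standard state 1 M): dG = R * T * ln(Kd)."""
    return R_KCAL * temperature * math.log(kd_molar)


def rmse(predicted: list, experimental: list) -> float:
    """Root-mean-square error between predicted and experimental values."""
    if len(predicted) != len(experimental):
        raise ValueError("inputs must have the same length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


# A 1 nM binder corresponds to roughly -12.3 kcal/mol at 298.15 K.
dg_1nm = kd_to_dg(1e-9)

# Toy comparison of predicted and experimental free energies (kcal/mol).
error = rmse([-10.0, -8.0], [-9.0, -9.0])
print(round(dg_1nm, 2), error)
```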
GraphLoG
DeepGraphLearning/GraphLoG
GraphLoG is a tool for self-supervised graph-level representation learning, with applications in the chemistry domain. It provides pre-training and fine-tuning capabilities for models that can be used to analyze molecular data.
mlcgmd
kyonofx/mlcgmd
This tool provides a framework for simulating time-integrated coarse-grained molecular dynamics using multi-scale graph neural networks. It includes datasets for training and evaluation, making it suitable for molecular dynamics applications.
Dyna-1
WaymentSteeleLab/Dyna-1
Dyna-1 is a model designed to predict microsecond-to-millisecond motions in proteins from their sequence and structure. It includes curated datasets for evaluation and is aimed at advancing the understanding of protein dynamics through deep learning.
Physics-aware-Multiplex-GNN
XieResearchGroup/Physics-aware-Multiplex-GNN
PAMNet (Physics-aware Multiplex Graph Neural Network) is a universal framework for accurate and efficient geometric deep learning of molecular systems. It uses graph neural networks for tasks such as predicting molecular properties, binding affinities, and RNA 3D structures.
meta-flow-matching
lazaratan/meta-flow-matching
Meta Flow Matching is a tool designed for integrating vector fields on the Wasserstein manifold, particularly useful in predicting individual treatment responses in personalized medicine. It includes datasets and models for training on biological experiments, specifically targeting drug-screen datasets.
PatentChem
learningmatter-mit/PatentChem
PatentChem is a tool that downloads USPTO patents and extracts molecules related to specified keyword queries. It processes patent claims to find and output SMILES strings of relevant molecules, making it useful for cheminformatics and molecular data extraction.