tools tagged “dataset”
BALM
meyresearch/BALM
BALM is a deep learning framework designed to predict binding affinities between proteins and ligands by fine-tuning pretrained language models. It utilizes the BindingDB dataset and proposes improved evaluation strategies for assessing model performance, making it a practical tool for early-stage drug discovery screening.
generative-quantum-states
PennyLaneAI/generative-quantum-states
This repository contains code for predicting properties of quantum systems using conditional generative models. It includes tools for generating datasets, training models, and simulating quantum systems, making it relevant for molecular property prediction and simulation tasks.
Uni-MOF
dptech-corp/Uni-MOF
Uni-MOF is a transformer-based framework designed for high-accuracy predictions of gas adsorption in metal-organic frameworks (MOFs). It utilizes a large dataset of MOF structures to learn representations and predict various properties, making it a valuable tool in computational chemistry and materials science.
pQSAR
Novartis/pQSAR
Profile-QSAR is a project that develops multitask machine learning models to predict the activity of compounds across numerous biological assays. It includes scripts for data retrieval, model building, and making predictions, utilizing ChEMBL data for training and validation.
meta-learning-qsar
GSK-AI/meta-learning-qsar
This repository provides code for meta-learning initializations aimed at improving molecular property prediction in low-resource settings. It includes methods for training models on molecular data, specifically utilizing graph representations of molecules.
multi-fidelity-gnns-for-drug-discovery-and-quantum-mechanics
davidbuterez/multi-fidelity-gnns-for-drug-discovery-and-quantum-mechanics
This repository contains source code for applying graph neural networks to improve molecular property prediction by leveraging both high-fidelity and low-fidelity data. It includes methods for transfer learning and provides access to multi-fidelity datasets for drug discovery and quantum mechanics.
CheTo
rdkit/CheTo
CheTo is a tool for Chemical Topic Modeling that utilizes topic modeling techniques from text mining to analyze chemical data. It provides Jupyter notebooks and datasets for exploring molecular datasets, making it useful for researchers in cheminformatics.
CoPRA
hanrthu/CoPRA
CoPRA is a tool designed for predicting protein-RNA binding affinity using pretrained sequence models. It includes datasets for training and evaluation, as well as model weights for users to implement and fine-tune their predictions.
PDBench
wells-wood-research/PDBench
PDBench is a dataset and software package that evaluates fixed-backbone sequence design algorithms for proteins. It includes a benchmark set of protein structures and provides metrics for assessing the performance of various design models.
QM9nano4USTC
bigdata-ustc/QM9nano4USTC
QM9nano4USTC is a repository built around the QM9 dataset, which contains information on 130,462 organic molecules and their properties. It includes preprocessed features for molecular property prediction, making it useful for data-driven experiments in computational chemistry.
cime
jku-vds-lab/cime
ChemInformatics Model Explorer (CIME) is a web application that enables users to explore chemical compounds through interactive visualizations. It supports the analysis of molecular properties and allows users to upload and visualize datasets, making it a valuable tool for cheminformatics.
Drug3D-Net
anny0316/Drug3D-Net
Drug3D-Net is a tool that implements a spatial-temporal gated attention module for predicting molecular properties based on molecular geometry. It includes datasets for various molecular properties, making it useful for researchers in computational chemistry and drug discovery.
GSCDB
JiashuLiang/GSCDB
GSCDB is a comprehensive benchmark database containing 137 datasets with detailed molecular properties such as reaction energies and barrier heights. It serves as a platform for validating density functional approximations and supports the development of machine-learned functionals in computational chemistry.
chemprop_benchmark
chemprop/chemprop_benchmark
This repository contains benchmarking scripts and data for evaluating Chemprop, a message passing neural network for predicting molecular properties. It includes various benchmarks and datasets for assessing molecular property prediction models.
paccmann_datasets
PaccMann/paccmann_datasets
pytoda (PaccMann PyTorch Dataset Classes), the Python package in this repository, simplifies the handling of biochemical data for deep learning applications using PyTorch. It is particularly useful for researchers working on molecular design and related tasks in computational chemistry.
mmCIF2BioLiP
kad-ecoli/mmCIF2BioLiP
The mmCIF2BioLiP repository provides a web interface and scripts for curating the BioLiP database, which contains biologically relevant ligand-protein interactions. It facilitates the download and organization of data from the PDB, including binding affinities and other molecular information, making it a valuable resource for researchers in molecular biology and drug discovery.
FraGAT
ZiqiaoZhang/FraGAT
FraGAT is a fragment-oriented multi-scale graph attention model aimed at predicting molecular properties. It includes a dataset used for experiments and provides Python files to build and utilize the FraGAT model for molecular property predictions.
MDeePred
cansyl/MDeePred
MDeePred is a tool designed for predicting the binding affinity between bioactive small molecules and target proteins using a novel protein featurization approach. It employs deep learning techniques to enhance the accuracy of predictions, making it useful for drug discovery and repositioning efforts.
confidence-bootstrapping
LDeng0205/confidence-bootstrapping
This tool implements the Confidence Bootstrapping procedure for enhancing protein-ligand docking predictions. It includes pretrained models and datasets for benchmarking, making it useful for researchers in molecular docking and drug discovery.
Predicting-Adverse-Drug-Reactions-with-Machine-Learning
ricardoamferreira/Predicting-Adverse-Drug-Reactions-with-Machine-Learning
This repository contains machine learning methods for predicting adverse drug reactions (ADRs) using databases such as SIDER and OFFSIDES. It provides tools and models that can be used in the drug discovery process to support safety and efficacy assessments.
SES-Adapter
tyang816/SES-Adapter
SES-Adapter is a framework designed to improve the representation learning of protein language models by using structure-aware adapters. It improves performance on tasks such as localization, function, and solubility prediction, and also provides datasets and configuration examples for training.
AlphaSeq_Antibody_Dataset
mit-ll/AlphaSeq_Antibody_Dataset
AlphaSeq_Antibody_Dataset contains two datasets with quantitative binding scores of scFv-format antibodies against a SARS-CoV-2 target peptide. It is designed to support protein representation learning and includes data for machine learning optimization of antibody candidates.
nmrformd
simongravelle/nmrformd
NMRforMD is a Python script designed to compute dipolar NMR relaxation times (T1 and T2) from molecular dynamics trajectory files. It works with various simulation packages and includes datasets for practical use in NMR analysis.
Affinity2Vec
MahaThafar/Affinity2Vec
Affinity2Vec is a tool designed for predicting drug-target binding affinities through the use of representation learning and graph mining techniques. It includes datasets and models for training and evaluating predictions related to molecular interactions.