Tools tagged “dataset”
PAR-NeurIPS21
tata1661/PAR-NeurIPS21
The PAR-NeurIPS21 repository provides a PyTorch implementation of Property-Aware Relation Networks (PAR) for few-shot molecular property prediction. It includes datasets such as Tox21 and SIDER and is aimed at drug discovery applications.
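In a few-shot setting, each property is usually evaluated on small episodes: a support set with a handful of labeled molecules per class, plus a disjoint query set. The sketch below illustrates only that sampling step in plain Python. It is not taken from the PAR-NeurIPS21 code; the function name `sample_episode` and the toy labels are hypothetical.

```python
import random


def sample_episode(labels, k_shot, n_query, seed=None):
    """Return (support, query) index lists for one binary property task.

    labels: list of 0/1 labels, one per molecule (toy input, hypothetical).
    The support set holds k_shot indices per class; the query set holds
    n_query indices drawn from the molecules not used for support.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if len(pos) < k_shot or len(neg) < k_shot:
        raise ValueError("not enough molecules per class for k_shot")

    # k_shot positives followed by k_shot negatives.
    support = rng.sample(pos, k_shot) + rng.sample(neg, k_shot)

    # Query molecules must not overlap with the support set.
    used = set(support)
    rest = [i for i in range(len(labels)) if i not in used]
    if len(rest) < n_query:
        raise ValueError("not enough remaining molecules for n_query")
    query = rng.sample(rest, n_query)
    return support, query


# Made-up labels for ten molecules, for illustration only.
toy_labels = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
support, query = sample_episode(toy_labels, k_shot=2, n_query=4, seed=0)
```

Here `support` contains two active and two inactive molecules, and `query` contains four other molecules to score against them.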
Molecular-graph-BERT
zhang-xuan1314/Molecular-graph-BERT
Molecular-graph-BERT is a tool for semi-supervised learning of molecular properties. It supports pre-training, fine-tuning on specific tasks, and building datasets for molecular property prediction.
Uni-FEP-Benchmarks
dptech-corp/Uni-FEP-Benchmarks
Uni-FEP-Benchmarks is a benchmark dataset aimed at systematically evaluating the Uni-FEP method for binding free energy calculations. It compiles diverse protein-ligand systems and chemical transformations to facilitate the validation and optimization of the Uni-FEP methodology, contributing to advancements in drug discovery.
load-atoms
jla-gardner/load-atoms
The `load-atoms` package loads open-access datasets for atomistic materials science. It handles downloading and manipulating these datasets, which makes it useful for researchers in computational chemistry and materials science.
PropMolFlow
Liu-Group-UF/PropMolFlow
PropMolFlow is a tool for property-guided molecular generation using SE(3) equivariant flow matching. It includes datasets for training, benchmarks for molecular properties, and supports the generation of molecules conditioned on specific properties.
CASP-and-dataset-performance
reymond-group/CASP-and-dataset-performance
This repository contains source code and documentation for a computer-assisted synthesis planning tool used to analyze reaction datasets in organic chemistry. It facilitates the extraction of templates and the training of policies for synthetic route generation.
GREA
liugangcode/GREA
GREA provides source code for a method that improves graph neural networks for molecular property prediction, particularly for polymers. It includes implementations for several molecular datasets and tools for running experiments on them.
DiffPhore
VicFisher/DiffPhore
DiffPhore is a tool that implements a knowledge-guided diffusion model for mapping ligands to pharmacophores in 3D space. It enhances virtual screening capabilities and provides datasets for pharmacophore-ligand pairs, making it useful for drug discovery applications.
runs-n-poses
plinder-org/runs-n-poses
Runs N' Poses is a benchmark dataset designed for evaluating protein-ligand co-folding prediction methods. It includes various metrics and data formats to facilitate machine learning applications in molecular biology, particularly for assessing the generalization of prediction models.
overlapping_assays
rinikerlab/overlapping_assays
This repository provides code and datasets for analyzing IC50 and Ki values from various sources, highlighting the noise in these measurements. It includes curated datasets from ChEMBL32 and tools for generating results related to molecular property prediction.
SPROF-GO
biomed-AI/SPROF-GO
SPROF-GO is a tool for predicting protein functions from sequences using a pretrained language model and homology-based label diffusion. It offers fast and accurate predictions and includes datasets and models for users interested in reproducing the results.
PlatonicTransformers
niazoys/PlatonicTransformers
Platonic Transformers is a framework that integrates geometric group theory into transformer architectures to enhance molecular property prediction. It supports various molecular datasets, including QM9 for quantum chemistry properties and OMol for molecular learning tasks.
paccmann_proteomics
PaccMann/paccmann_proteomics
PaccMann Proteomics provides a framework for protein language modeling using transformer architectures to predict protein classification and binding interactions. It utilizes self-supervised learning techniques to handle unlabeled protein sequences and offers pre-trained models and datasets for various protein-related tasks.
Medea
mims-harvard/Medea
Medea is an AI agent that accelerates therapeutic discovery by integrating diverse data modalities and computational resources to identify therapeutic targets and predict drug responses. It includes modules for research planning, analysis, and literature reasoning, making it a comprehensive tool for molecular biology applications.
SPECTRA
mims-harvard/SPECTRA
SPECTRA is a Python toolkit that provides a spectral framework for evaluating the generalizability of biomedical AI models, particularly in the context of molecular datasets. It allows users to define spectral properties and generate train-test splits to assess model performance across varying degrees of data overlap.
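The idea of splitting by degree of overlap can be illustrated with a toy example. The sketch below is not SPECTRA's API. It assumes a hypothetical `overlap_split` helper and a user-supplied `similar` predicate standing in for a spectral property, with a `strictness` value between 0 and 1 that controls how aggressively test samples resembling training samples are dropped.

```python
import random


def overlap_split(samples, similar, strictness, n_test, seed=0):
    """Split samples into (train, test) with tunable train-test overlap.

    similar(a, b) -> bool says whether two samples share the property.
    strictness = 0.0 keeps a plain random split (maximum overlap);
    strictness = 1.0 drops every test sample similar to a training sample.
    All names here are hypothetical and for illustration only.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    candidates, train = shuffled[:n_test], shuffled[n_test:]

    test = []
    for s in candidates:
        overlaps = any(similar(s, t) for t in train)
        # Drop an overlapping candidate with probability `strictness`.
        if overlaps and rng.random() < strictness:
            continue
        test.append(s)
    return train, test


# Toy SMILES-like strings; "similar" means sharing the first character.
toy_samples = ["CCO", "CCN", "CCC", "OCC", "OC", "NCC", "NC", "N#N", "ClCCl", "ClC"]


def same_first_char(a, b):
    return a[0] == b[0]


train_loose, test_loose = overlap_split(toy_samples, same_first_char, 0.0, n_test=4)
train_strict, test_strict = overlap_split(toy_samples, same_first_char, 1.0, n_test=4)
```

Evaluating a model on the test sets produced at several `strictness` values shows how performance changes as train-test overlap shrinks. Dropped samples are discarded rather than moved to the training set, so stricter splits yield smaller test sets.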
MOF_ChemUnity
AI4ChemS/MOF_ChemUnity
MOF-ChemUnity is a knowledge graph that unifies computational and experimental data for over 15,000 metal-organic frameworks. It allows users to query and retrieve information about MOFs, including their properties and applications, enhancing the understanding and research of these materials.
GLAM
yvquanli/GLAM
GLAM is a software tool that implements an adaptive graph learning method to automate the prediction of molecular interactions and properties. It aims to improve drug discovery efficiency by providing robust and interpretable predictions across various datasets.
biomed-multi-view
BiomedSciAI/biomed-multi-view
The biomed-multi-view repository features the Multi-view Molecular Embedding with Late Fusion (MMELON) architecture, which aggregates molecular representations from images, graphs, and text to enhance predictions of molecular properties. It is applicable to various tasks including ligand-protein binding and molecular solubility, utilizing a large dataset of molecules for training and evaluation.
COMP6
isayev/COMP6
The COMP6 repository contains a benchmark suite for assessing the performance of machine-learning molecular potentials. It includes results and methodologies for evaluating the accuracy of these models in predicting molecular properties.
InfoAlign
liugangcode/InfoAlign
InfoAlign is a tool that learns molecular representations by integrating molecular structures, cell morphology, and gene expressions. It provides functionalities for fine-tuning models on various molecular datasets, making it useful for tasks related to molecular property prediction.
Protein-Language-Models
ISYSLAB-HUST/Protein-Language-Models
The Protein-Language-Models repository provides a systematic review of protein language models, covering their architectures, evaluation metrics, and relevant datasets. It also introduces tools for ongoing research in the field of protein modeling and analysis.
NeuralMD
chao1224/NeuralMD
NeuralMD is a tool designed for simulating protein-ligand binding dynamics using a multi-grained symmetric differential equation model. It includes datasets for training and evaluating models, making it relevant for molecular simulations and property predictions in drug discovery.
MolRL-MGPT
HXYfighter/MolRL-MGPT
MolRL-MGPT is the code repository for a NeurIPS 2023 paper on de novo drug design using multiple GPT agents in a reinforcement learning framework. It incorporates large molecular datasets and benchmarks for evaluating the generated molecules.
qca-dataset-submission
openforcefield/qca-dataset-submission
The qca-dataset-submission repository contains scripts for generating and submitting datasets to the QCArchive, facilitating the lifecycle of molecular data submissions. It supports the creation of datasets that can be used for various molecular property predictions and computational chemistry tasks.