tools tagged “dataset”
awesome-ai4s
hyperai/awesome-ai4s
The 'Awesome AI for Science' repository is a curated collection of resources related to AI applications in various scientific fields, particularly focusing on drug discovery and molecular design. It includes models, datasets, and frameworks that facilitate the prediction and generation of molecular properties and structures.
Biomni
snap-stanford/Biomni
Biomni is a general-purpose biomedical AI agent that enhances research productivity by integrating large language model reasoning with planning and execution. It can predict molecular properties, generate hypotheses, and evaluate biological reasoning tasks, making it a versatile tool in the biomedical field.
papers_for_protein_design_using_DL
Peldom/papers_for_protein_design_using_DL
This repository is a curated list of papers that explore the use of deep learning techniques in protein design. It includes resources on benchmarks and datasets relevant to the field, making it a valuable tool for researchers in computational biology.
alphagenome
google-deepmind/alphagenome
AlphaGenome is an API that offers access to a model for predicting various functional outputs from DNA sequences, including gene expression and variant effects. It is designed for analyzing genomic data and provides tools for visualization and variant scoring.
torchdrug
DeepGraphLearning/torchdrug
TorchDrug is a PyTorch-based machine learning toolbox tailored for drug discovery, enabling users to predict molecular properties and work with graph-structured data. It provides a range of datasets and models for various tasks in molecular machine learning.
TDC
mims-harvard/TDC
The Therapeutics Data Commons (TDC) is an open-source initiative that facilitates the development and evaluation of AI methods for drug discovery. It offers ready-to-use datasets, benchmarks for model comparison, and tools for predicting molecular properties and generating new biomedical entities.
DeepPurpose
kexinhuang12345/DeepPurpose
DeepPurpose is a deep learning library that facilitates the prediction of drug-target interactions, drug properties, and protein functions. It supports various molecular encoding tasks and provides tools for drug repurposing and virtual screening.
materials_discovery
google-deepmind/materials_discovery
The Materials Discovery: GNoME repository provides a dataset of over 520,000 novel stable materials and includes models for discovering new materials using graph networks. It aims to facilitate research in materials science by offering tools for exploring chemical systems and computing material properties.
coronavirus
FoldingAtHome/coronavirus
This repository contains input files and datasets for the Folding@home efforts to understand and target the SARS-CoV-2 virus with small molecule and antibody therapeutics. It supports molecular dynamics simulations and provides resources for researchers working on COVID-19 related molecular studies.
moses
molecularsets/moses
MOSES is a benchmarking platform for molecular generation models that facilitates research in drug discovery by providing datasets and metrics to evaluate the quality and diversity of generated molecules. It implements various generative models and standardizes the evaluation process for molecular generation.
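Two of the diversity-style metrics commonly reported for molecular generators, uniqueness and novelty, can be illustrated without any chemistry toolkit. The sketch below is not the MOSES API. It assumes the SMILES strings are already valid and canonicalized, so that string equality means molecule equality, and the function names and toy data are hypothetical.

```python
def uniqueness(generated):
    """Fraction of generated molecules that are distinct."""
    if not generated:
        return 0.0
    return len(set(generated)) / len(generated)


def novelty(generated, training):
    """Fraction of distinct generated molecules that are absent from the training set."""
    unique = set(generated)
    if not unique:
        return 0.0
    return len(unique - set(training)) / len(unique)


# Hypothetical toy data: canonical SMILES strings (assumed valid).
train_smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
generated_smiles = ["CCO", "CCN", "CCN", "CCCC"]

print(uniqueness(generated_smiles))             # 3 distinct molecules out of 4
print(novelty(generated_smiles, train_smiles))  # 2 of the 3 distinct molecules are new
```

In practice a benchmarking platform would first filter out invalid molecules and canonicalize the rest before computing these ratios.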
ToolUniverse
mims-harvard/ToolUniverse
ToolUniverse is a platform for building AI scientists that integrates a wide range of machine learning models, datasets, and scientific tools. It supports molecular property prediction, molecular design, and scientific workflows in computational chemistry and molecular biology.
papers-for-molecular-design-using-DL
AspirinCode/papers-for-molecular-design-using-DL
This repository provides a comprehensive list of papers and resources related to molecular and material design using generative AI and deep learning techniques. It covers methodologies for drug design and molecular optimization, and includes datasets and benchmarks relevant to the field.
proteinnet
aqlaboratory/proteinnet
ProteinNet is a standardized dataset designed for machine learning of protein structures, offering sequences, structures, and multiple sequence alignments. It aims to facilitate research in protein structure prediction by providing a consistent framework for training and validation across various methods.
dynamicPDB
fudan-generative-vision/dynamicPDB
Dynamic PDB is a large-scale dataset that integrates dynamic behaviors and physical properties in protein structures, enhancing existing protein databases. It includes data from all-atom molecular dynamics simulations, capturing conformational changes and various physical attributes of proteins.
chemicalx
AstraZeneca/chemicalx
ChemicalX is a deep learning library designed for predicting drug-drug interactions and polypharmacy effects. It provides integrated benchmark datasets and state-of-the-art models for drug pair scoring, making it a valuable tool in the field of computational chemistry and drug discovery.
AIRS
divelab/AIRS
AIRS is an open-source collection of software tools and datasets focused on artificial intelligence applications in quantum, atomistic, and continuum systems. It includes resources for predicting molecular properties, designing molecules, and conducting simulations, making it highly relevant to the fields of computational chemistry and molecular biology.
AI2BMD
microsoft/AI2BMD
AI2BMD is a program designed for efficient ab initio biomolecular dynamics simulations of proteins. It can simulate molecular dynamics and preprocess protein structures, and it provides datasets for training molecular models.
chembl_webresource_client
chembl/chembl_webresource_client
The ChEMBL webresource client is an official Python library that allows users to access ChEMBL's extensive database of bioactive molecules and their properties. It simplifies the process of retrieving molecular data through a user-friendly interface, making it a valuable resource for cheminformatics and molecular property analysis.
GeoDiff
MinkaiXu/GeoDiff
GeoDiff implements a geometric diffusion model for generating molecular conformations, the 3D arrangements of a molecule's atoms. It also provides tools for property prediction and includes datasets for training and evaluation.
atomworks
RosettaCommons/atomworks
AtomWorks is an open-source platform that accelerates biomolecular modeling tasks by providing a toolkit for parsing, cleaning, and manipulating biological data. It includes advanced features for dataset featurization and sampling, making it suitable for deep learning applications in molecular biology.
ProteinGym
OATML-Markslab/ProteinGym
ProteinGym is a comprehensive repository of benchmarks for Deep Mutational Scanning (DMS) assays, allowing for the evaluation of various mutation effect predictors. It includes extensive datasets of clinical variants and experimental measurements, facilitating research in protein design and mutation effect prediction.
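A rank correlation such as Spearman's is a common way to score a mutation effect predictor against DMS measurements, since it only asks whether the predictor orders the variants correctly. The sketch below is a plain-Python illustration, not ProteinGym's own evaluation code. The function names and the toy scores are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: one measured fitness and one model score per variant.
measured = [0.1, 0.5, 0.9, 1.3]
predicted = [-2.0, -1.0, -1.5, 0.2]

rho = spearman(measured, predicted)
print(round(rho, 3))
```

Because only ranks matter, the predictor's scores do not need to be on the same scale as the assay readout.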
scikit-fingerprints
scikit-fingerprints/scikit-fingerprints
Scikit-fingerprints is a Python library designed for efficient computation of molecular fingerprints, which are crucial in drug discovery and chemical analysis. It provides a scikit-learn compatible interface, allowing users to integrate molecular fingerprints into machine learning workflows and access popular benchmark datasets.
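One common use of molecular fingerprints is similarity search, where two molecules are compared by the Tanimoto (Jaccard) coefficient of their fingerprint bits. The sketch below is not the scikit-fingerprints API. It represents each fingerprint as a set of on-bit indices, and the function name and bit values are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Assumption: two empty fingerprints are treated as having zero similarity.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical fingerprints, given as the indices of bits set to 1.
fp_query = {1, 5, 9, 42}
fp_candidate = {1, 5, 7, 42, 100}

print(tanimoto(fp_query, fp_candidate))  # 3 shared bits out of 6 distinct bits
```

A fingerprint library would normally return dense or sparse bit vectors; converting those to index sets, or using vectorized operations, gives the same ratio.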
MolCLR
yuyangw/MolCLR
MolCLR is an implementation of a contrastive learning framework for molecular representation learning using graph neural networks. It enhances the performance of models on various molecular property prediction tasks and provides datasets for pre-training and fine-tuning.
torch-molecule
liugangcode/torch-molecule
torch-molecule is a deep learning package designed for molecular discovery, featuring an sklearn-style interface for property prediction, inverse design, and representation learning. It supports various molecular tasks and includes datasets for training models on molecular properties.