tools tagged “dataset”
GraSeq
zhichunguo/GraSeq
GraSeq predicts molecular properties through graph and sequence fusion learning. It provides implementations for both single-task and multi-task classification on several molecular datasets.
NAG2G
dptech-corp/NAG2G
NAG2G is a neural network model for predicting retrosynthesis pathways in molecular chemistry. It supports enhanced stereochemistry features and provides datasets and pretrained weights for validating and using the model.
S-CGIB
NSLab-CUK/S-CGIB
S-CGIB is a pre-training architecture for Graph Neural Networks aimed at predicting molecular properties without human annotations. It utilizes self-supervised learning to generate graph-level representations and has been tested on various molecular datasets.
rxn-reaction-preprocessing
rxn4chemistry/rxn-reaction-preprocessing
The RXN reaction preprocessing repository provides tools for preprocessing datasets of chemical reactions, including standardization, filtering, and data augmentation. These steps can be combined into flexible data pipelines for chemical reaction data.
AttentionDTA_TCBB
zhaoqichang/AttentionDTA_TCBB
AttentionDTA_TCBB is a tool designed for predicting drug-target binding affinities using a sequence-based deep learning model with an attention mechanism. It includes datasets for training and testing the model, making it relevant for molecular property prediction in drug discovery.
UniSim
yaledeus/UniSim
UniSim is a unified simulator designed for time-coarsened dynamics of biomolecules, utilizing generative models to simulate molecular interactions. It provides tools for training and evaluating models on various datasets related to small molecules and proteins.
exahustive_search_mol2mol
MolecularAI/exahustive_search_mol2mol
This repository provides tools for exhaustive exploration of chemical space using a transformer model. It includes functionality for generating molecular structures, preprocessing datasets, and computing molecular fingerprints for molecular design and cheminformatics.
ps4-dataset
omarperacha/ps4-dataset
The PS4 Dataset is the largest open-source dataset for single-sequence protein secondary structure prediction. It includes methods for validating and evaluating secondary structure prediction models.
ibenchmark
ci-lab-cz/ibenchmark
iBenchmark is a collection of datasets and performance metrics for evaluating the structural interpretation of QSAR models. It includes synthetic datasets designed for regression and classification tasks, focusing on the contributions of atoms in molecular structures.
dftio
deepmodeling/dftio
dftio is a tool for the machine learning community that transcribes density functional theory (DFT) output into formats suitable for machine learning models. It supports several DFT software packages and can parse a range of molecular properties from their output data.
Molecule-Generator
DaoyuanLi2816/Molecule-Generator
Molecule-Generator is a Variational Autoencoder-based tool that generates synthetic SMILES strings for molecules composed of specific repeat units. It allows for the creation of a large dataset of molecular representations and facilitates the generation of new molecular structures through perturbation in the latent space.
Denoise-Pretrain-ML-Potential
yuyangw/Denoise-Pretrain-ML-Potential
Denoise-Pretrain-ML-Potential provides an implementation for denoise pretraining on non-equilibrium molecular conformations to enhance the accuracy and transferability of neural potentials. It utilizes graph neural networks and includes datasets for training and fine-tuning models for molecular potential predictions.
SGNN-EBM
chao1224/SGNN-EBM
SGNN-EBM is a tool for structured multi-task learning aimed at predicting molecular properties. It includes a novel dataset for drug discovery and proposes a State Graph Neural Network-Energy Based Model (SGNN-EBM) for modeling the relationships between tasks.
AutoGraph
BorgwardtLab/AutoGraph
AutoGraph is a scalable autoregressive model that generates molecular graphs by flattening them into sequences. Its authors report state-of-the-art performance on several molecular benchmarks, and it supports both unconditional and substructure-conditioned generation.
qchem
icanswim/qchem
qchem is a framework for molecular modeling that combines machine learning and quantum chemistry to explore molecular properties and datasets. It provides tools for implementing models and datasets in a modular and extensible manner, supporting research in molecular simulation and property prediction.
molecular_synthesis_and_reconstruction
leonardopicchiami/molecular_synthesis_and_reconstruction
This repository contains a deep learning project aimed at reconstructing and generating molecules from low-dimensional representations. It uses Variational Autoencoders (VAEs) trained and evaluated on the ZINC250K dataset.
qm9star_query
gentle1999/qm9star_query
The qm9star_query repository facilitates access to the QM9star database, which includes two million DFT-computed molecular structures. It also provides tools for querying the database and training neural network models using the dataset, making it useful for molecular property prediction and research.
catnip
gomesgroup/catnip
CATNIP is a tool designed to facilitate the prediction of enzyme compatibility with small molecules in biocatalysis. It utilizes machine learning models and a curated dataset to navigate between chemical and protein sequence spaces, aiming to streamline biocatalytic synthetic strategies.
DefiNet
Shen-Group/DefiNet
DefiNet is a tool designed for processing datasets related to high-density and low-density defects in materials. It includes functionalities for data preprocessing, model training, and predicting relaxed structures, making it relevant for molecular simulations and materials science.
dualbind
NVIDIA-Digital-Bio/dualbind
DualBind is a deep learning model designed for accurate and fast prediction of protein-ligand binding affinities. It includes a benchmark dataset, ToxBench, which provides a large-scale collection of protein-ligand complexes and their binding free energies.
ReLMole
Meteor-han/ReLMole
ReLMole is a tool for molecular representation learning that utilizes two-level graph similarities. It includes datasets for training and fine-tuning models for predicting molecular properties and drug-drug interactions.
ActiveLearning_BindingAffinity
meyresearch/ActiveLearning_BindingAffinity
This repository benchmarks active learning protocols for predicting ligand binding affinities, using datasets that correspond to different protein targets. It evaluates how well machine learning models identify the top binders in each dataset.
EquiHGNN
HySonLab/EquiHGNN
EquiHGNN is a framework for scalable rotationally equivariant hypergraph neural networks aimed at improving molecular modeling. It integrates symmetry-aware representations to enhance predictions of molecular properties using various datasets, including QM9 and PCQM4Mv2.
CB513_dataset
taneishi/CB513_dataset
The CB513_dataset repository contains datasets for predicting protein secondary structures, prepared for use with deep learning models. It includes filtered datasets for training and evaluating such models.