Tools tagged “dataset”
PrismNet
kuixu/PrismNet
PrismNet is a deep learning framework that predicts dynamic cellular protein-RNA interactions from in vivo RNA structure data. It includes scripts for training models, evaluating performance, and preparing datasets for molecular biology research.
SkipGNN
kexinhuang12345/SkipGNN
SkipGNN is a tool designed to predict molecular interactions by leveraging skip-graph networks, which consider both direct and second-order interactions in molecular networks. It provides datasets for drug-target and drug-drug interactions, making it useful for applications in drug discovery and molecular property prediction.
GNPS_Workflows
CCMS-UCSD/GNPS_Workflows
GNPS_Workflows provides a collection of public workflows for analyzing mass spectrometry data, with a focus on molecular networking and metabolomics. The workflows help researchers explore and interpret complex molecular data.
Revisiting-PLMs
elttaes/Revisiting-PLMs
This repository explores evolution-aware protein language models for predicting protein function. It provides datasets on metal ion binding and antibiotic resistance for use in molecular biology and protein analysis.
CADD_Vault
DrugBud-Suite/CADD_Vault
The CADD Vault is an open-source collection of resources and tools for computer-aided drug design. It covers virtual screening, molecular dynamics simulations, and machine learning applications.
rna3db
marcellszi/rna3db
RNA3DB is a dataset of non-redundant RNA structures from the PDB, designed for training and benchmarking deep learning models focused on RNA structure prediction. It includes various RNA chains labeled with non-coding RNA families and provides tools for customizing and building the dataset.
LucaVirus
LucaOne/LucaVirus
LucaVirus is a tool designed to model the evolutionary and functional landscape of viruses using a unified genome-protein language model. It provides datasets of viral sequences and supports various tasks such as predicting protein properties and classifying viral sequences.
otter-knowledge
IBM/otter-knowledge
The Otter Knowledge repository enhances protein sequence and SMILES drug databases with a multimodal knowledge graph, improving predictions on drug-target binding affinity benchmarks. It provides pre-trained models and datasets for representation learning in drug discovery.
LP-PDBBind
THGLab/LP-PDBBind
LP-PDBBind is a repository that develops scoring functions using the PDBBind dataset, providing tools for dataset creation and model retraining for predicting molecular properties such as binding affinities. It includes compiled datasets and scripts for preparing and analyzing protein-ligand complexes.
3D-MIL-QSAR
cimm-kzn/3D-MIL-QSAR
3D-MIL-QSAR implements a multi-instance learning approach that predicts the bioactivity of molecules from their conformers. It includes datasets extracted from ChEMBL and provides a modeling pipeline for molecular machine learning.
CodonFM
NVIDIA-Digital-Bio/CodonFM
CodonFM is an open-source suite of foundation models trained on codon sequences to learn contextual representations for various downstream tasks, including mutation prediction and evaluation of translation efficiency. It provides pre-trained models and tools for working with protein-coding sequences, making it a valuable resource in molecular biology.
metl
gitter-lab/metl
The METL framework provides tools for pretraining and finetuning biophysics-informed protein language models, enabling users to train models on mutational data and generate predictions. It includes datasets for training and examples for running inference, making it a valuable resource for protein engineering and design.
EGNO
MinkaiXu/EGNO
The EGNO repository implements an Equivariant Graph Neural Operator designed for modeling 3D dynamics, particularly in the context of molecular dynamics simulations. It includes data preprocessing for various datasets, including those relevant to protein molecular dynamics, making it a useful tool for researchers in computational chemistry.
grappa
graeter-group/grappa
Grappa is a machine-learned molecular mechanics force field that uses graph neural networks to predict bonded parameters for molecular simulations. It integrates with GROMACS and OpenMM, allowing users to parametrize systems and train custom models on various molecular datasets.
OpenQDC
valence-labs/OpenQDC
OpenQDC is an open-source hub that consolidates over 40 quantum mechanics datasets, making them readily available for machine learning applications in molecular property prediction. It supports the download of a vast array of quantum data, facilitating research in computational chemistry.
lohi_splitter
SteshinSS/lohi_splitter
Lo-Hi Splitter is a tool for partitioning molecular datasets for drug discovery tasks such as lead optimization and hit identification. It uses splitting methods that keep the training and test sets distinct, so that model evaluation better reflects these tasks.
mdCATH
compsciencelab/mdCATH
The mdCATH repository contains scripts and notebooks for generating and analyzing the mdCATH dataset of molecular dynamics trajectories. It provides tools for visualizing and working with the dataset, making it useful for research in computational biophysics.
data-repo_plm-finetune-eval
RSchmirler/data-repo_plm-finetune-eval
This repository provides data and notebooks for fine-tuning protein language models to enhance predictions across diverse tasks. It includes training datasets and examples for generating embeddings and training models, making it a useful resource for molecular machine learning in protein-related applications.
GrASP
tiwarylab/GrASP
GrASP is a tool that utilizes graph neural networks to predict druggable binding sites in proteins. It provides datasets and a framework for evaluating binding site predictions, making it useful for drug discovery applications.
Samsung-AI-Challenge-for-Scientific-Discovery
affjljoo3581/Samsung-AI-Challenge-for-Scientific-Discovery
This repository contains the implementation of MoT, a transformer-based model for predicting molecular properties from 3D molecular structures. It was developed as part of the Samsung AI Challenge for Scientific Discovery and utilizes large-scale datasets from PubChem for training and evaluation.
SAMPL6
samplchallenges/SAMPL6
The SAMPL6 repository contains challenge inputs and results for predicting molecular properties, specifically focusing on pKa and logP values of small molecules. It serves as a benchmark for evaluating computational methods in predicting these properties, providing datasets and performance evaluations for participants.
flexdock
vsomnath/flexdock
FlexDock is a tool designed for flexible docking and relaxation of molecular complexes, aiming to improve the accuracy of docking predictions. It includes functionalities for preparing input data, running models, and training on relevant datasets, making it useful for researchers in computational chemistry and drug discovery.
hignn
idrugLab/hignn
HiGNN is a hierarchical graph neural network framework that predicts molecular properties from molecular graphs and BRICS fragments. It includes datasets for training, and its effectiveness is demonstrated on various drug discovery tasks.
RamaNet
sarisabban/RamaNet
RamaNet is a tool that uses machine learning and PyRosetta to autonomously generate novel helical protein structures. It performs de novo design by creating a topology and then optimizing the sequence to fit that topology, and it also provides datasets for training the neural network.