MotifXplorer – Genomic Peak Analysis Web Tool
Overview
MotifXplorer is a genomic peak analysis web tool built to help researchers and clinicians
analyze ChIP-seq peaks without having to write code or understand the details of machine learning.
How it works:
- Upload your ChIP-seq peak BED file
- Let the platform generate negatives, extract sequences, and train an XGBoost model
- Get back interpretable results:
  - Top k-mer–based DNA signatures
  - Feature importance metrics
  - Optional motif analysis and decision tree visualization
MotifXplorer is being developed in collaboration with Dr. Motoki Takaku’s lab and Mounir Ouadi.
The back end and web front end are fully functional; we are currently finalizing deployment on a public server so external labs can start using it.
Who is it for?
MotifXplorer is designed for:
- Wet-lab biologists and clinicians who work with ChIP-seq peaks and want to explore regulatory patterns
- Genomics researchers who don’t have time to build full ML pipelines from scratch
- Collaborative teams where some members are non-technical but still need to interpret ML results
The main goal is to lower the barrier to exploring DNA sequence patterns and regulatory elements associated with ChIP-seq peaks.
Key Capabilities
Genome Selection
- Choose from multiple reference genomes:
  - e.g., hg19, hg38, mm9, and others
- The matching reference genome FASTA is selected behind the scenes, so users don’t have to prepare genome files themselves.
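To make the "behind the scenes" step more concrete, here is a minimal Python sketch of FASTA selection and peak sequence extraction. The genome-to-path registry, the file paths, and all function names (`fasta_for`, `parse_fasta`, `parse_bed`, `extract_sequences`) are hypothetical, and this does not describe MotifXplorer's actual implementation. The BED format itself does fix one detail: coordinates are 0-based and half-open, so they map directly onto Python slicing.

```python
from typing import Dict, List, Tuple

# Hypothetical registry mapping genome builds to server-side FASTA paths.
# The builds come from the examples above; the paths are placeholders.
GENOME_FASTA: Dict[str, str] = {
    "hg19": "genomes/hg19.fa",
    "hg38": "genomes/hg38.fa",
    "mm9": "genomes/mm9.fa",
}


def fasta_for(genome: str) -> str:
    """Return the FASTA path for a genome build, or fail loudly if unsupported."""
    try:
        return GENOME_FASTA[genome]
    except KeyError:
        raise ValueError(f"unsupported genome build: {genome}") from None


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {chromosome: sequence}, uppercasing all bases."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the ID before any description
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line.upper())
    return {chrom: "".join(parts) for chrom, parts in chunks.items()}


def parse_bed(text: str) -> List[Tuple[str, int, int]]:
    """Read chrom/start/end from each BED record, skipping header and comment lines."""
    intervals = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def extract_sequences(genome: Dict[str, str], intervals: List[Tuple[str, int, int]]) -> List[str]:
    """BED coordinates are 0-based and half-open, so a Python slice matches them exactly."""
    return [genome[chrom][start:end] for chrom, start, end in intervals]
```

Loading a whole genome into a dict only makes sense for toy inputs. For real hg38-sized FASTA files, an indexed reader such as pyfaidx or `samtools faidx` would be the usual choice.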
Positive & Negative Examples
- Positive set:
  - Upload a BED file containing your ChIP-seq peaks.
- Negative set:
  - Either upload your own negative BED file, or
  - Let MotifXplorer automatically generate negatives by sampling genomic regions matched to your positives.
This makes it easy to set up a basic supervised classification task without curating a negative set by hand.
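Automatic negative generation can work in several ways, and this page does not say which matching criteria MotifXplorer uses. The sketch below assumes the simplest version: for each peak, draw one random region on the same chromosome with the same length, and reject it if it overlaps any peak or an earlier negative. `sample_negatives`, `overlaps`, and the `chrom_sizes` input are illustrative names. Matching on GC content or other sequence properties would need extra logic.

```python
import random
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open like BED


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def sample_negatives(
    positives: List[Interval],
    chrom_sizes: Dict[str, int],
    seed: int = 0,
    max_tries: int = 1000,
) -> List[Interval]:
    """Draw one random region per positive, on the same chromosome and of the same
    length, that overlaps neither a positive peak nor a previously drawn negative."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    negatives: List[Interval] = []
    for chrom, start, end in positives:
        length = end - start
        size = chrom_sizes[chrom]
        if length > size:
            continue  # a region this long cannot fit on this chromosome
        for _ in range(max_tries):
            s = rng.randint(0, size - length)  # randint is inclusive, so s + length <= size
            candidate = (chrom, s, s + length)
            taken = positives + negatives
            if not any(overlaps(candidate, other) for other in taken):
                negatives.append(candidate)
                break
    return negatives
```

The linear overlap scan is fine for a sketch. For genome-wide peak sets, an interval index or a tool such as `bedtools shuffle` (with `-chrom` and `-excl`) would scale better.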
XGBoost-Based Analysis
MotifXplorer uses an XGBoost classifier on k-mer–encoded sequences to learn patterns associated with your peaks.
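K-mer encoding turns each sequence into a fixed-length count vector that a tree model can consume. Here is a minimal sketch that assumes overlapping k-mer counts over the four-letter ACGT alphabet. The page does not say whether MotifXplorer uses counts, frequencies, or presence/absence, and `kmer_vocabulary` and `kmer_counts` are hypothetical helpers.

```python
from itertools import product
from typing import Dict, List


def kmer_vocabulary(k: int) -> List[str]:
    """All 4**k DNA words of length k, in lexicographic order (AA..., AC..., ...)."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def kmer_counts(seq: str, k: int, index: Dict[str, int]) -> List[int]:
    """Count overlapping k-mers in seq as a dense vector aligned with `index`."""
    vec = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i : i + k])
        if j is not None:  # k-mers containing N or other non-ACGT symbols are skipped
            vec[j] += 1
    return vec
```

The vocabulary has 4^k entries, so k = 10 already means 1,048,576 columns. A production pipeline would probably store sparse features, keep only k-mers that actually occur, or collapse reverse complements. This page does not say which of these MotifXplorer does.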
The web tool:
- Converts sequences into k-mer features (e.g., k = 4–10)
- Trains an XGBoost model
- Evaluates performance and exposes:
  - Accuracy, precision, recall, F1-score
  - ROC curve
  - Feature importances across different metrics
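The reported metrics follow their standard definitions. As a dependency-free sketch, the helpers below compute them from binary labels (1 = peak, 0 = negative) and predicted scores. `classification_metrics` and `roc_auc` are illustrative names. In practice, scikit-learn's `precision_recall_fscore_support`, `roc_curve`, and `roc_auc_score` would usually do this job. Here ROC AUC is computed through its rank interpretation rather than by integrating the curve, and the two approaches give the same value.

```python
from typing import Dict, List


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = peak, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """Area under the ROC curve, computed as the probability that a random positive
    scores higher than a random negative (ties count as half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, labels `[1, 1, 0, 0]` with predictions `[1, 0, 1, 0]` give 0.5 for all four classification metrics, and scores `[0.9, 0.4, 0.6, 0.1]` give a ROC AUC of 0.75.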
Top Signature DNA Sequences
After training, MotifXplorer extracts and displays:
- The top 10 signature DNA sequences (k-mers)
- Their importance in classification
- How often they contribute to decision splits in the ensemble
This helps highlight candidate regulatory sequence patterns without requiring the user to inspect raw model internals.
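A ranking of top signature k-mers can be built on top of XGBoost's per-feature importance scores. This sketch assumes the scores come from `Booster.get_score(importance_type="gain")` and `Booster.get_score(importance_type="weight")`, where "weight" counts how many splits use a feature. It also assumes the feature names were set to the k-mer strings; otherwise XGBoost reports keys such as `f0`, `f1`. Ranking by gain is itself an assumption, and `top_signatures` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def top_signatures(
    gain: Dict[str, float], weight: Dict[str, float], n: int = 10
) -> List[Tuple[str, float, int]]:
    """Rank k-mers by gain and report (k-mer, gain, number of splits) for the top n.

    Features never used in a split are absent from XGBoost's importance dicts,
    so a missing split count is reported as 0.
    """
    ranked = sorted(gain.items(), key=lambda kv: (-kv[1], kv[0]))[:n]  # ties broken alphabetically
    return [(kmer, score, int(weight.get(kmer, 0))) for kmer, score in ranked]
```

Reporting both numbers side by side separates k-mers that give large gains in a few splits from k-mers the ensemble uses often for smaller gains.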
Web Workflow
From a user’s perspective, a typical run looks like this:
- Select genome
  - e.g., human (hg19 / hg38) or mouse (mm9).
- Upload BED file(s)
  - Positive ChIP-seq peak BED (required)
  - Optional negative BED (or let the tool generate one)
- Choose k-mer size
  - Any value in the supported range, e.g., 4–10
- Run analysis
  - The XGBoost model is trained on the fly
  - Performance metrics and plots are generated
- Interpret results
  - View top k-mers and importance metrics
  - Optionally run motif analysis
  - Inspect the decision tree visualization to see how the model separates positive from negative regions
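The inputs collected in this workflow can be modelled as a small, validated configuration object. In this sketch, the supported genomes and the k range come from the examples above and are assumptions about the deployed tool. `RunConfig` and its fields are hypothetical names, not MotifXplorer's API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical limits taken from the examples in this README; the deployed tool may differ.
SUPPORTED_GENOMES = {"hg19", "hg38", "mm9"}
K_RANGE = range(4, 11)  # k = 4..10 inclusive


@dataclass(frozen=True)
class RunConfig:
    """Parameters collected by the web form for one analysis run."""

    genome: str
    positive_bed: str
    k: int
    negative_bed: Optional[str] = None  # None means: generate negatives automatically

    def __post_init__(self) -> None:
        if self.genome not in SUPPORTED_GENOMES:
            raise ValueError(f"unsupported genome: {self.genome}")
        if not self.positive_bed:
            raise ValueError("a positive peak BED file is required")
        if self.k not in K_RANGE:
            raise ValueError(f"k must be between {K_RANGE.start} and {K_RANGE.stop - 1}")

    @property
    def generate_negatives(self) -> bool:
        """True when the user did not supply a negative BED file."""
        return self.negative_bed is None
```

With this sketch, `RunConfig(genome="hg38", positive_bed="peaks.bed", k=6)` passes validation and requests automatically generated negatives, while `k=12` raises `ValueError`.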
Looking for Collaborators & Early Testers
We are interested in connecting with:
- Labs working on ChIP-seq or ATAC-seq
- Researchers studying transcription factor binding and regulatory regions
- Groups looking to interpret ML models on genomic sequence data in a more intuitive way
If you:
- Have ChIP-seq peak datasets and would like to try MotifXplorer, or
- Want to collaborate on extending the platform (e.g., new genomes, new ML models, integration with existing pipelines),
feel free to reach out; my contact information is at the bottom of this page.