MotifXplorer – Genomic Peak Analysis Web Tool

Overview

MotifXplorer is now online at motifxplorer.med.und.edu

MotifXplorer is a genomic peak analysis web tool built to help researchers and clinicians analyze transcription factor binding sites (TFBS) from ChIP-seq peak data without having to write code or understand the details of machine learning.

How it works:

Upload your ChIP-seq peak BED file
Let the platform generate negatives, extract sequences, and train an XGBoost model
Get back interpretable results:
- Top k-mer–based DNA signatures
- Feature importance metrics
- Optional motif analysis and decision tree visualization

MotifXplorer was developed in collaboration with Dr. Motoki Takaku’s lab and Mounir Ouadi.

Who is it for?

MotifXplorer is designed for:

Wet-lab biologists and clinicians who work with ChIP-seq peaks and want to explore regulatory patterns
Genomics researchers who don’t have time to build full ML pipelines from scratch
Collaborative teams where some members are non-technical but still need to interpret ML results

The main goal is to lower the barrier to exploring DNA sequence patterns and regulatory elements associated with ChIP-seq peaks.

Demo – Preview of MotifXplorer

Below is a short demo of MotifXplorer running through a typical analysis workflow:

Key Capabilities

Genome Selection

Choose from multiple reference genomes:
- e.g., hg19, hg38, mm9, and others
The appropriate FASTA is selected behind the scenes so users don’t have to worry about file preparation.
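
As a rough illustration of what selecting a FASTA and extracting peak sequences involves, the sketch below parses FASTA text into a dictionary and slices out a BED-style interval. The names GENOME_FASTA, read_fasta, and extract are hypothetical and the file paths are placeholders; this is not MotifXplorer's actual code. BED coordinates are 0-based and end-exclusive, so they map directly onto Python slices.

```python
import io

# Hypothetical mapping from genome build to FASTA path (placeholder paths).
GENOME_FASTA = {
    "hg19": "genomes/hg19.fa",
    "hg38": "genomes/hg38.fa",
    "mm9": "genomes/mm9.fa",
}

def read_fasta(handle):
    """Parse FASTA text from a file-like object into {chrom: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs

def extract(seqs, chrom, start, end):
    """Return the sequence for a BED interval (0-based, end-exclusive)."""
    return seqs[chrom][start:end]

# Tiny in-memory example standing in for a real genome file.
toy = io.StringIO(">chr1 toy\nACGTAC\nGTTTGA\n>chr2\nGGGCCC\n")
genome = read_fasta(toy)
peak_seq = extract(genome, "chr1", 4, 9)
print(peak_seq)
```

Loading a whole genome into memory like this is only reasonable for toy inputs; an indexed FASTA reader would be the usual choice at genome scale.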

Positive & Negative Examples

Positive set:
- Upload a BED file containing your ChIP-seq peaks.
Negative set:
- Either upload your own negative BED file, or
- Let MotifXplorer automatically generate negatives by sampling genomic regions matched to your positives.

This makes it easy to set up basic supervised classification without manual negative set curation.
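
One simple way to picture automatic negative generation is length-matched random sampling: for each positive peak, draw a region of the same length on the same chromosome that does not overlap any positive. The page does not say how MotifXplorer matches negatives, so the matching criterion, the function names (parse_bed, sample_negatives), and the chrom_sizes input are assumptions made for illustration.

```python
import random

def parse_bed(text):
    """Parse BED text into (chrom, start, end) tuples, skipping header lines."""
    peaks = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        peaks.append((chrom, int(start), int(end)))
    return peaks

def overlaps(a_start, a_end, b_start, b_end):
    """True if two end-exclusive intervals share at least one base."""
    return a_start < b_end and b_start < a_end

def sample_negatives(peaks, chrom_sizes, seed=0, max_tries=1000):
    """Draw one same-length, positive-free region per positive peak.

    Assumption: 'matched' means same chromosome and same length only.
    """
    rng = random.Random(seed)
    negatives = []
    for chrom, start, end in peaks:
        length = end - start
        same_chrom = [(s, e) for c, s, e in peaks if c == chrom]
        for _ in range(max_tries):
            neg_start = rng.randint(0, chrom_sizes[chrom] - length)
            neg_end = neg_start + length
            if not any(overlaps(neg_start, neg_end, s, e) for s, e in same_chrom):
                negatives.append((chrom, neg_start, neg_end))
                break
    return negatives

# Toy input: three peaks and made-up chromosome sizes.
bed_text = "chr1\t100\t300\tpeak1\nchr1\t1000\t1150\tpeak2\nchr2\t50\t250\tpeak3\n"
positives = parse_bed(bed_text)
negatives = sample_negatives(positives, {"chr1": 5000, "chr2": 3000})
print(negatives)
```

This sketch lets negatives overlap one another and ignores properties such as GC content or assembly gaps, which a more careful matching scheme would likely take into account.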

XGBoost-Based Analysis

MotifXplorer uses an XGBoost classifier on k-mer–encoded sequences to learn patterns associated with your peaks.
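
To make the k-mer encoding concrete, here is a minimal sketch that turns sequences into k-mer count vectors over a shared vocabulary. Whether MotifXplorer uses raw counts, presence/absence, or merges reverse complements is not stated here; the raw-count encoding and the names kmer_counts and build_matrix are assumptions.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count overlapping k-mers in one sequence, skipping windows with non-ACGT bases."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

def build_matrix(seqs, k):
    """Return (sorted k-mer vocabulary, one row of counts per sequence)."""
    per_seq = [kmer_counts(s, k) for s in seqs]
    vocab = sorted(set().union(*per_seq))
    matrix = [[c[kmer] for kmer in vocab] for c in per_seq]
    return vocab, matrix

# Two toy sequences; the second contains an ambiguous base (N).
vocab, X = build_matrix(["ACGTAC", "ACNTAC"], k=2)
print(vocab)
print(X)
```

Each column corresponds to one k-mer and each row to one sequence. A matrix of this shape, converted to a NumPy array and paired with 0/1 labels for negatives and positives, is the kind of input a gradient-boosted classifier is trained on.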

The web tool:

Converts sequences into k-mer features (e.g., k = 4–10)
Trains an XGBoost model
Evaluates performance and exposes:
- Accuracy, precision, recall, F1-score
- ROC curve
- Feature importances across different metrics
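
For readers who want the definitions behind the reported scores, the sketch below computes accuracy, precision, recall, and F1 from binary labels and predictions. It uses only the standard definitions; the function name classification_metrics and the toy labels are illustrative, and the tool itself may well rely on a library implementation.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = peak)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # peaks called peaks
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # negatives called negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # negatives called peaks
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # peaks that were missed

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up labels for eight held-out regions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case there are three true positives, three true negatives, one false positive, and one false negative, so all four scores come out to 0.75.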

Top Signature DNA Sequences

After training, MotifXplorer extracts and displays:

The top 10 signature DNA sequences (k-mers)
Their importance in classification
How often they contribute to decision splits in the ensemble

This helps highlight candidate regulatory sequence patterns without requiring the user to inspect raw model internals.
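
As an illustration of how such a ranking could be assembled, the sketch below sorts k-mers by an importance score and reports next to it how many splits each k-mer appears in. In XGBoost's Python API, per-feature dictionaries of this shape can be obtained from Booster.get_score with importance_type set to "gain" or "weight" (split count). The choice of gain as the ranking key, the function name top_signatures, and all numbers are assumptions, not MotifXplorer output.

```python
def top_signatures(gain, split_count, n=10):
    """Rank k-mers by gain (ties broken alphabetically) and attach split counts."""
    ranked = sorted(gain, key=lambda kmer: (-gain[kmer], kmer))
    return [(kmer, gain[kmer], split_count.get(kmer, 0)) for kmer in ranked[:n]]

# Made-up importance values for four k-mers.
toy_gain = {"GATAAG": 12.5, "TGACTCA": 9.1, "CCAAT": 3.4, "AGATAA": 12.5}
toy_splits = {"GATAAG": 7, "TGACTCA": 11, "CCAAT": 2, "AGATAA": 5}

top = top_signatures(toy_gain, toy_splits, n=3)
for kmer, g, s in top:
    print(f"{kmer}\tgain={g}\tsplits={s}")
```

Note that the two views can disagree: in the toy data TGACTCA is used in the most splits but ranks third by gain, which is one reason to look at more than one importance metric.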

Web Workflow

From a user’s perspective, a typical run looks like this:

Select genome
- e.g., human (hg19 / hg38) or mouse (mm9).
Upload BED file(s)
- Positive ChIP-seq peaks required
- Optional negative BED (or let the tool generate it)
Choose k-mer size
- Any value in the supported range (e.g., 4–10)
Run analysis
- XGBoost model trains on-the-fly
- Performance metrics and plots are generated
Interpret results
- View top k-mers and importance metrics
- Optionally run motif analysis
- Inspect the decision tree visualization to better understand how the model separates positives from negatives

Looking for Collaborators & Early Testers

We are interested in connecting with:

Labs working on ChIP-seq or ATAC-seq
Researchers studying transcription factor binding and regulatory regions
Groups looking to interpret ML models on genomic sequence data in a more intuitive way

If you:

Have ChIP-seq peak datasets and would like to try MotifXplorer, or
Want to collaborate on extending the platform (e.g., new genomes, new ML models, integration with existing pipelines),

feel free to reach out. My contact information is at the bottom of this page.