This project explores graph simplification through motif clustering, using the Wasserstein distance to group structurally similar subgraphs (motifs). Users can interactively visualize large networks by replacing repeated substructures with supernodes while preserving the graph's overall structure.
Large network graphs often contain repeated motifs (e.g., triangles, stars). Visualizing them as-is leads to “hairball” graphs that are hard to interpret.
- Identify recurring motifs in a graph.
- Use Wasserstein distance to measure structural similarity.
- Cluster motifs agglomeratively, merging those whose distance falls below a user-set compression threshold.
- Interactively visualize these changes using D3.js.
| Component | Stack Used |
|---|---|
| Backend | Python, Flask |
| Visualization | D3.js |
| Data Processing | NetworkX, pandas, scikit-learn |
| Math Core | SciPy (Wasserstein distance) |
| Frontend UI | HTML, CSS (with Inter font), JavaScript |
| Plotting (Dev) | matplotlib |

    git clone https://github.com/yourusername/graph-motif-compression.git
    cd graph-motif-compression
    pip install -r requirements.txt
    python server.py

Then open index.html in your browser (Google Chrome recommended).
- Set compression threshold with a slider.
- Auto-play button to gradually increase compression.
- Summary panel showing live compression stats.
- Cluster boxes show all unique compressed motif shapes.
- Each box displays how many times that motif occurred.
- Hovering over a box highlights the corresponding nodes in the main graph.
- Clicking a compressed node expands its subgraph.
- Number of original vs compressed nodes.
- Compression ratio (%).
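
The summary statistics can be illustrated with a few lines of Python. This is a minimal sketch, not the project's code: the function name `compression_stats` is hypothetical, and it assumes the compression ratio is the percentage of nodes removed by compression, which this README does not define.

```python
def compression_stats(original_nodes, compressed_nodes):
    """Summarize node counts before and after compression.

    Assumption: the ratio is the percentage reduction in node count,
    i.e. 100 * (1 - compressed / original).
    """
    if original_nodes <= 0:
        raise ValueError("original_nodes must be positive")
    ratio = 100.0 * (1.0 - compressed_nodes / original_nodes)
    return {
        "original_nodes": original_nodes,
        "compressed_nodes": compressed_nodes,
        "compression_ratio_pct": round(ratio, 1),
    }


# Example: a 200-node graph compressed down to 50 nodes.
print(compression_stats(200, 50))
```

If the project defines the ratio differently (for example, compressed over original), only the single `ratio` line would change.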
Video: Watch a complex Facebook network get compressed into interpretable structures using similarity threshold tuning.
- Motif Extraction: Local neighborhoods are extracted around each node.
- Similarity Calculation: Edge weight distributions of motifs are compared using Wasserstein distance.
- Agglomerative Clustering: Motifs are merged based on similarity threshold.
- Supernode Formation: Similar motifs are replaced with a compressed node.
- D3 Visualization: Main graph + motif gallery + summary panel are dynamically updated.
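
The first three steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the project's actual implementation: motifs are taken to be radius-1 ego networks, edges without a `weight` attribute count as 1.0, and the clustering uses SciPy's hierarchical clustering with average linkage (the project may use scikit-learn instead). All function names are hypothetical.

```python
import networkx as nx
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import wasserstein_distance


def extract_motifs(graph, radius=1):
    """Motif extraction: one local neighborhood (ego network) per node."""
    return {node: nx.ego_graph(graph, node, radius=radius) for node in graph.nodes}


def edge_weights(motif):
    """Edge weights of a motif; missing weights default to 1.0 (assumption)."""
    weights = [data.get("weight", 1.0) for _, _, data in motif.edges(data=True)]
    return weights if weights else [0.0]  # isolated node: no edges


def motif_distance_matrix(motifs):
    """Pairwise Wasserstein distances between edge-weight distributions."""
    keys = list(motifs)
    samples = [edge_weights(motifs[k]) for k in keys]
    dist = np.zeros((len(keys), len(keys)))
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            dist[i, j] = dist[j, i] = wasserstein_distance(samples[i], samples[j])
    return keys, dist


def cluster_motifs(graph, threshold, radius=1):
    """Agglomerative clustering: cut the merge tree at the given distance.

    Requires a graph with at least two nodes.
    """
    keys, dist = motif_distance_matrix(extract_motifs(graph, radius))
    tree = linkage(squareform(dist), method="average")
    labels = fcluster(tree, t=threshold, criterion="distance")
    return dict(zip(keys, labels.tolist()))


# Toy graph: two small stars whose edge weights differ.
toy = nx.Graph()
toy.add_weighted_edges_from(
    [("a", "b", 1.0), ("a", "c", 1.0), ("x", "y", 5.0), ("x", "z", 5.0)]
)
clusters = cluster_motifs(toy, threshold=1.0)
print(clusters)  # maps each node to the cluster label of its motif
```

Raising `threshold` merges more motifs into the same cluster, which is the effect the compression slider would have in this sketch. Each resulting cluster is a candidate for replacement by a supernode.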
- Cybersecurity: Detect recurring attack structures (fan-outs, lateral movement).
- Social Networks: Spot repeated interaction patterns across user groups.
- Biology: Visualize repeated biochemical or protein-interaction motifs.
Project developed at the University of Utah, Scientific Computing and Imaging (SCI) Institute, under the guidance of Dr. Paul Rosen.