HPC Resources

Graph formats, converters, datasets and tools for high-performance parallel and distributed graph algorithms.



Graph Formats

Standard representations used across our tools and benchmarks. Full specifications are available on the Graph Formats documentation site.

| Format | Type | Node IDs | Description |
|---|---|---|---|
| BGR | Binary (CSR) | 0-indexed | Binary Graph Representation: compact CSR with adaptive uint32/uint64, parallel I/O, weighted/unweighted support |
| MTX | Text (Matrix Market) | 1-indexed | Widely used text format for sparse matrices: human-readable edge list with header metadata |
| BVGraph | Compressed binary | 0-indexed | WebGraph's compressed format: gamma/delta/zeta coded successor lists, used by LAW datasets |
| ECL | Binary (CSR) | 0-indexed | ECL graph format: simple binary CSR used by ECL-MIS, ECL-CC, ECL-BCC and related GPU algorithms |
| WGBin | Binary | 0-indexed | WebGraph binary: intermediate raw format for BVGraph decoding pipelines |
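
Several of the formats above are CSR-based, so a small illustration of the layout may help. The sketch below builds the two CSR arrays (offsets and targets) from a 0-indexed edge list. It is not the on-disk layout of BGR or ECL; the function names `build_csr` and `index_width`, and the "fits in 32 bits" rule for choosing between uint32 and uint64, are assumptions made for illustration only.

```python
def build_csr(num_nodes, edges):
    """Build CSR arrays (offsets, targets) from 0-indexed (src, dst) pairs."""
    # Count the out-degree of every node.
    degree = [0] * num_nodes
    for src, _ in edges:
        degree[src] += 1

    # Prefix-sum the degrees: targets[offsets[v]:offsets[v + 1]] holds v's neighbours.
    offsets = [0] * (num_nodes + 1)
    for v in range(num_nodes):
        offsets[v + 1] = offsets[v] + degree[v]

    # Scatter each destination into the next free slot of its source's slice.
    cursor = offsets[:-1].copy()
    targets = [0] * len(edges)
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def index_width(max_value):
    """Hypothetical width rule: 4 bytes if the value fits in uint32, else 8."""
    return 4 if max_value < 2**32 else 8


edges = [(0, 1), (0, 2), (1, 2), (3, 0)]
offsets, targets = build_csr(4, edges)
print(offsets)  # [0, 2, 3, 3, 4]
print(targets)  # [1, 2, 2, 0]
```

Node 2 has no outgoing edges, so its slice `targets[3:3]` is empty; this is why the offsets array needs one more entry than there are nodes.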

Graph Format Converters

The graph-format-converters repo converts WebGraph BVGraph compressed graphs (.graph + .properties) from the LAW dataset collection into standard formats.
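
A BVGraph `.properties` file is a Java-style properties file whose `nodes` and `arcs` keys give the graph size. As a minimal sketch of reading those two values before a conversion, assuming plain `key=value` lines and `#` comments only (the helper name `graph_size` and the sample values are made up for illustration, and the converters themselves may parse the file differently):

```python
def read_properties(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def graph_size(text):
    """Return (nodes, arcs) as integers from a .properties file's contents."""
    props = read_properties(text)
    return int(props["nodes"]), int(props["arcs"])


# Made-up sample contents, not taken from a real dataset.
sample = """\
# BVGraph properties
nodes=4
arcs=5
"""
print(graph_size(sample))  # (4, 5)
```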

Tools

| Tool | Description |
|---|---|
| bvgraph_gen_offsets | Generate .offsets from .graph + .properties (pure C++, no Java) |
| bvgraph_to_bgr | Convert BVGraph → BGR (binary CSR) |
| bvgraph_to_mtx | Convert BVGraph → MTX (Matrix Market text) |
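
Because BVGraph node IDs are 0-indexed and MTX is 1-indexed, any conversion between them has to shift every ID by one. The sketch below shows that shift in both directions for a Matrix Market coordinate file. It is not the code of `bvgraph_to_mtx`; the `pattern general` header is an assumption about how an unweighted directed graph would be written, and the function names are hypothetical.

```python
import io


def write_mtx(num_nodes, edges, out):
    """Write 0-indexed (src, dst) pairs as a 1-indexed Matrix Market file."""
    out.write("%%MatrixMarket matrix coordinate pattern general\n")
    # Size line: rows, columns, number of stored entries.
    out.write(f"{num_nodes} {num_nodes} {len(edges)}\n")
    for src, dst in edges:
        out.write(f"{src + 1} {dst + 1}\n")


def read_mtx_edges(text):
    """Read entries back as 0-indexed pairs, ignoring '%' comment lines."""
    rows = [ln for ln in text.splitlines() if ln and not ln.startswith("%")]
    # rows[0] is the size line; the remaining rows are entries.
    return [(int(a) - 1, int(b) - 1) for a, b, *_ in (r.split() for r in rows[1:])]


buf = io.StringIO()
write_mtx(3, [(0, 1), (2, 0)], buf)
print(buf.getvalue())
print(read_mtx_edges(buf.getvalue()))  # [(0, 1), (2, 0)]
```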

Quick Start

# Build C++ tools (no Java required)
make cpp

# Download a graph (example: eu-2005, 862K nodes, 19M edges)
mkdir -p data
wget -P data http://data.law.di.unimi.it/webdata/eu-2005/eu-2005.graph
wget -P data http://data.law.di.unimi.it/webdata/eu-2005/eu-2005.properties

# Generate offsets, then convert to BGR
./bvgraph_gen_offsets data/eu-2005
./bvgraph_to_bgr data/eu-2005 data/eu-2005.bgr 8

Performance

With 4 threads, the C++ converter produces BGR output at roughly 150M edges/s.

| Pipeline | MTX time | BGR time |
|---|---|---|
| C++ | 0.68s | 0.26s |
| Java | 1.76s | 0.52s |
| Speedup | 2.6× | 2.0× |

Benchmarked on eu-2005 (862K nodes, 19M edges), single thread.
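
The speedup row and the implied single-thread throughput can be rechecked from the timings above. This is only arithmetic on the rounded figures quoted in this section (19M edges and the four times), so the results are approximate.

```python
EDGES = 19_000_000  # eu-2005 edge count as rounded in the text

# Single-thread conversion times in seconds, copied from the table.
times = {
    "C++": {"MTX": 0.68, "BGR": 0.26},
    "Java": {"MTX": 1.76, "BGR": 0.52},
}


def speedup(fmt):
    """How many times faster the C++ pipeline is than the Java one."""
    return times["Java"][fmt] / times["C++"][fmt]


def throughput(pipeline, fmt):
    """Edges converted per second."""
    return EDGES / times[pipeline][fmt]


print(round(speedup("MTX"), 1))  # 2.6
print(round(speedup("BGR"), 1))  # 2.0
print(round(throughput("C++", "BGR") / 1e6))  # 73 (million edges/s)
```

At about 73M edges/s on one thread, the ~150M edges/s quoted for 4 threads corresponds to roughly a 2× gain from multi-threading.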


Graph Datasets

Browse our curated collection of real-world and synthetic graphs:

  • Large Graphs — Graphs with 500M+ edges (web crawls, social networks, genomics, up to 2.5 trillion edges)
  • Sources — Where to find graph datasets (SNAP, LAW, SuiteSparse, KONECT, BioGraphs, and more)

| Repository | Description |
|---|---|
| Graph-Formats | Documentation site for graph format specifications (BGR, MTX, BVGraph, ECL, WGBin) |
| graph-format-converters | C++ and Java tools to convert WebGraph BVGraph → BGR / MTX |
| SCC Analysis | Benchmark of 4 parallel SCC algorithms on graphs up to 92 billion edges |
| BCC-Finding | Biconnected components algorithms |
| PHEM | Parallel heterogeneous graph algorithms |