BGR (Binary Graph Representation)
Compact binary CSR format with adaptive integer sizes
Overview
BGR is a compact binary format that stores graphs in Compressed Sparse Row (CSR) layout with automatic integer width selection. It is designed for high-performance graph analytics on shared-memory systems.
Key characteristics:
- Binary CSR (Compressed Sparse Row) format — no text parsing overhead
- Supports both weighted and unweighted graphs
- Adaptive integer sizes: uses uint32 when possible, uint64 for large graphs (>2³² nodes or edges)
- 1-byte header with bit flags for format detection
- Node IDs are 0-indexed (first node is 0)
- Designed for parallel I/O with
pread()/pwrite() - Used by SCC Analysis and other HPC graph algorithm benchmarks
File Layout
A BGR file consists of a fixed header followed by three (or four) contiguous arrays:
Offset 0: [1 byte] Header flags
Offset 1: [N bytes] num_nodes (uint32 or uint64)
Offset 1+N: [E bytes] num_edges (uint32 or uint64)
Offset 1+N+E: [(num_nodes+1) × E] row_ptr array
Offset ...: [num_edges × N] col_idx array
(if weighted): [num_edges × W] edge_weights array
Where:
| Symbol | Meaning |
|---|---|
| N | Integer size for node counts/IDs — 4 bytes (uint32) or 8 bytes (uint64) |
| E | Integer size for edge counts/indices — 4 bytes (uint32) or 8 bytes (uint64) |
| W | Weight size (format-dependent, typically 4 bytes float or 8 bytes double) |
All multi-byte integers are stored in little-endian byte order (native x86/x86-64).
Header Byte
The first byte of the file encodes format flags as individual bits:
Bit: 7 6 5 4 3 2 1 0
── ── ── ── ── ── ── ──
reserved │ │ │
│ │ └─ weighted (0 = no, 1 = yes)
│ └──── node ID size (0 = uint32, 1 = uint64)
└─────── edge count size (0 = uint32, 1 = uint64)
| Bit | Mask | Name | Meaning |
|---|---|---|---|
| 0 | 0x01 | weighted | 0 = unweighted graph, 1 = weighted graph |
| 1 | 0x02 | large_ids | 0 = uint32 node IDs, 1 = uint64 node IDs |
| 2 | 0x04 | large_edges | 0 = uint32 edge counts, 1 = uint64 edge counts |
| 3–7 | reserved | Must be zero |
Common header values
| Header | Nodes | Edges | Weighted | Description |
|---|---|---|---|---|
0x00 | uint32 | uint32 | No | Small unweighted graph (<4B nodes, <4B edges) |
0x02 | uint64 | uint32 | No | Large node count, few edges per node |
0x04 | uint32 | uint64 | No | Small node count, many edges (>4B) |
0x06 | uint64 | uint64 | No | Large graph (>4B nodes and edges) |
0x01 | uint32 | uint32 | Yes | Small weighted graph |
0x03 | uint64 | uint32 | Yes | Large weighted, few edges per node |
0x07 | uint64 | uint64 | Yes | Large weighted graph |
Integer sizes are selected automatically based on magnitude: uint64 is used when num_nodes > 0xFFFFFFFF or num_edges > 0xFFFFFFFF.
CSR Layout
BGR uses Compressed Sparse Row (CSR) representation, which stores the adjacency structure of a graph in two arrays:
row_ptr (row pointer array)
- Length: num_nodes + 1 entries
- Each entry is an edge index (uint32 or uint64 based on the
large_edgesflag) row_ptr[i]= index intocol_idxwhere nodei’s neighbors beginrow_ptr[i+1] - row_ptr[i]= out-degree of nodeirow_ptr[0]is always0row_ptr[num_nodes]equalsnum_edges
col_idx (column index array)
- Length: num_edges entries
- Each entry is a node ID (uint32 or uint64 based on the
large_idsflag) col_idx[row_ptr[i] .. row_ptr[i+1]-1]= sorted neighbor list of nodei
edge_weights (optional)
- Present only when the
weightedflag is set (bit 0 of header) - Length: num_edges entries
- Each entry is a weight value (size depends on implementation)
edge_weights[k]corresponds to the edgecol_idx[k]
Diagram
row_ptr: [ 0 | 2 | 3 | 5 | 6 ] (num_nodes + 1 = 5 entries)
col_idx: [ 1 | 2 | 0 | 2 | 3 | 1 ] (num_edges = 6 entries)
Node 0: neighbors at col_idx[0..1] → {1, 2}
Node 1: neighbors at col_idx[2..2] → {0}
Node 2: neighbors at col_idx[3..4] → {2, 3}
Node 3: neighbors at col_idx[5..5] → {1}
Example
Consider a small directed graph with 4 nodes and 6 edges:
0 → 1, 2
1 → 0
2 → 2, 3
3 → 1
CSR arrays (0-indexed)
row_ptr = [0, 2, 3, 5, 6]
col_idx = [1, 2, 0, 2, 3, 1]
BGR binary (hex dump, annotated)
Since num_nodes=4 and num_edges=6 (both fit in uint32), the header is 0x00:
Offset Hex Meaning
------ ----------------------------------- ---------------------------
0x00 00 Header: uint32/uint32, unweighted
0x01 04 00 00 00 num_nodes = 4
0x05 06 00 00 00 num_edges = 6
0x09 00 00 00 00 row_ptr[0] = 0
0x0D 02 00 00 00 row_ptr[1] = 2
0x11 03 00 00 00 row_ptr[2] = 3
0x15 05 00 00 00 row_ptr[3] = 5
0x19 06 00 00 00 row_ptr[4] = 6
0x1D 01 00 00 00 col_idx[0] = 1 (node 0 → 1)
0x21 02 00 00 00 col_idx[1] = 2 (node 0 → 2)
0x25 00 00 00 00 col_idx[2] = 0 (node 1 → 0)
0x29 02 00 00 00 col_idx[3] = 2 (node 2 → 2)
0x2D 03 00 00 00 col_idx[4] = 3 (node 2 → 3)
0x31 01 00 00 00 col_idx[5] = 1 (node 3 → 1)
Total file size: 1 + 4 + 4 + (5 × 4) + (6 × 4) = 53 bytes
Reading BGR Files (C++)
#include <cstdio>
#include <cstdint>
#include <vector>
struct BGRGraph {
uint64_t num_nodes;
uint64_t num_edges;
bool weighted;
std::vector<uint64_t> row_ptr;
std::vector<uint64_t> col_idx;
};
BGRGraph read_bgr(const char* filename) {
BGRGraph g;
FILE* fp = fopen(filename, "rb");
// Read header
uint8_t header;
fread(&header, 1, 1, fp);
g.weighted = header & 0x01;
bool large_ids = header & 0x02;
bool large_edges = header & 0x04;
// Read num_nodes
if (large_ids) {
fread(&g.num_nodes, 8, 1, fp);
} else {
uint32_t n; fread(&n, 4, 1, fp);
g.num_nodes = n;
}
// Read num_edges
if (large_edges) {
fread(&g.num_edges, 8, 1, fp);
} else {
uint32_t m; fread(&m, 4, 1, fp);
g.num_edges = m;
}
// Read row_ptr: (num_nodes + 1) entries
g.row_ptr.resize(g.num_nodes + 1);
if (large_edges) {
fread(g.row_ptr.data(), 8, g.num_nodes + 1, fp);
} else {
std::vector<uint32_t> buf(g.num_nodes + 1);
fread(buf.data(), 4, g.num_nodes + 1, fp);
for (uint64_t i = 0; i <= g.num_nodes; i++)
g.row_ptr[i] = buf[i];
}
// Read col_idx: num_edges entries
g.col_idx.resize(g.num_edges);
if (large_ids) {
fread(g.col_idx.data(), 8, g.num_edges, fp);
} else {
std::vector<uint32_t> buf(g.num_edges);
fread(buf.data(), 4, g.num_edges, fp);
for (uint64_t i = 0; i < g.num_edges; i++)
g.col_idx[i] = buf[i];
}
fclose(fp);
return g;
}
For large graphs, consider memory-mapping the file with mmap() and using pread() for parallel access instead of sequential fread().
Writing BGR Files
Pseudocode for writing a graph in CSR form to BGR:
function write_bgr(filename, num_nodes, num_edges, row_ptr, col_idx):
large_ids = (num_nodes > 0xFFFFFFFF)
large_edges = (num_edges > 0xFFFFFFFF)
header = 0x00
if weighted: header |= 0x01
if large_ids: header |= 0x02
if large_edges: header |= 0x04
open file for binary writing
write header (1 byte)
write num_nodes as uint32 or uint64 (4 or 8 bytes)
write num_edges as uint32 or uint64 (4 or 8 bytes)
write row_ptr[] as uint32[] or uint64[] ((num_nodes+1) entries)
write col_idx[] as uint32[] or uint64[] (num_edges entries)
if weighted:
write edge_weights[] (num_edges entries)
close file
For best write performance on large graphs, pre-allocate the file with ftruncate() and use parallel pwrite() calls to fill different sections concurrently.
Performance Notes
- No parsing overhead: Binary format eliminates the text-to-integer conversion cost of MTX
- Memory-mapped I/O: The contiguous array layout enables
mmap()for zero-copy access - Parallel I/O: Known array offsets allow concurrent
pread()/pwrite()across threads - Compact: A uint32 BGR file uses 4 bytes per edge vs. ~8–12 bytes per edge in text MTX
- Typical throughput: 75–120 M edges/s for conversion from BVGraph (multi-threaded C++)
File size comparison (indochina-2004: 7.4M nodes, 194M edges)
| Format | File size | Notes |
|---|---|---|
| BVGraph (.graph) | ~110 MB | Compressed bitstream |
| BGR (.bgr) | ~785 MB | Binary CSR (uint32) |
| MTX (.mtx) | ~2.4 GB | Text edge list |
Tools
Tools for converting to and from BGR format:
| Tool | Description | Language |
|---|---|---|
bvgraph_to_bgr | Convert BVGraph to BGR (multi-threaded) | C++ |
WG2BGR | Convert BVGraph to BGR (WebGraph library) | Java |
bgr2scc | Convert BGR to algorithm-specific formats (for SCC benchmarking) | C++ |
All tools are available at: github.com/HPC-Heterogeneous-Graph-Algorithms/graph-format-converters