V0 Labs (vzerolabs.ai)

DBSCAN Clustering

Density-based spatial clustering: finds clusters of arbitrary shape and identifies noise

Unlike K-Means, which forces every point into a cluster, DBSCAN identifies dense regions as clusters and labels sparse points as noise. This is useful for funding data because startups don't form neat spherical groups: there are dense cores of typical companies and scattered outliers.

Parameters: eps=0.8 (neighborhood radius) and min_samples=10 (minimum number of points within that radius for a point to count as a dense "core" point that can seed a cluster).
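To make the two parameters concrete, here is a minimal pure-Python sketch of the DBSCAN idea. It is not the code behind this page: the function name `dbscan`, the `NOISE = -1` label, and the toy points are assumptions chosen for illustration, and the brute-force neighbor search would be far too slow for 34,492 companies.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster (illustrative choice)

def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    n = len(points)
    # A point's neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # i is a core point: grow a new cluster outward from it.
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                frontier.extend(neighbors[j])  # j is also core: keep expanding
        cluster += 1
    return labels

# Toy data: two dense 4x4 grids and one isolated point.
blob_a = [(0.2 * i, 0.2 * j) for i in range(4) for j in range(4)]
blob_b = [(10 + x, 10 + y) for x, y in blob_a]
stray = [(5.0, 5.0)]
labels = dbscan(blob_a + blob_b + stray, eps=0.8, min_samples=10)
print(labels)
```

With these toy inputs the two grids come out as two clusters and the isolated point is labeled noise, which is the behavior the description above relies on.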

Clusters Found: 11
Noise Points: 279 (0.8% of data)
Total Companies: 34,492
PCA Variance: 83.3% (2 components)

Noise (279)
Cluster 0 (18,964)
Cluster 1 (7,756)
Cluster 2 (1,852)
Cluster 3 (531)
Cluster 4 (3,660)
Cluster 5 (139)
Cluster 6 (948)
Cluster 7 (31)
Cluster 8 (68)
Cluster 9 (239)
Cluster 10 (25)
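The counts in this legend can be cross-checked against the headline figures. The snippet below is only an arithmetic check on the numbers shown on this page; the variable names are illustrative.

```python
# Cluster sizes and noise count as listed in the legend above.
cluster_sizes = {
    0: 18964, 1: 7756, 2: 1852, 3: 531, 4: 3660, 5: 139,
    6: 948, 7: 31, 8: 68, 9: 239, 10: 25,
}
noise = 279

total = noise + sum(cluster_sizes.values())
noise_share = noise / total
largest_share = cluster_sizes[0] / total

print(f"total companies: {total:,}")
print(f"noise share: {noise_share:.1%}")
print(f"largest cluster share: {largest_share:.1%}")
```

The total matches the 34,492 companies reported, the noise share rounds to 0.8%, and Cluster 0 alone holds roughly 55% of all companies.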

Outlier Detection

Isolation Forest: anomalous companies that don't fit normal patterns

Isolation Forest detects outliers by randomly partitioning data. Anomalies are isolated in fewer splits than normal points. Contamination is set to 5%: the expected proportion of outliers.
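The "fewer splits" idea can be sketched in a few dozen lines of standard-library Python. This is a simplified illustration, not the implementation used for this page: the function names, the default of 100 trees and 256 samples per tree, and the toy data are all assumptions. It also uses a different score convention from the table further down: here scores fall in (0, 1] and higher means more anomalous, whereas the table lists its most anomalous companies with the most negative scores.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        # Simplification: stop if the chosen feature is constant here.
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return ("node", feature, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(tree, x, depth=0):
    """Number of splits needed to reach x's leaf, plus a correction for leaf size."""
    if tree[0] == "leaf":
        return depth + c(tree[1])
    _, feature, split, left, right = tree
    return path_length(left if x[feature] < split else right, x, depth + 1)

def anomaly_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Score each row in (0, 1]; values near 1 mean 'isolated quickly'."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    scores = []
    for x in rows:
        mean_path = sum(path_length(t, x) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_path / c(sample_size)))
    return scores

# Toy data: 63 points near the origin and one far-away point.
data_rng = random.Random(1)
rows = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(63)]
rows.append((50.0, 50.0))
scores = anomaly_scores(rows)
print("most anomalous index:", scores.index(max(scores)))
```

The far-away point is usually cut off by the very first random split, so its average path is short and its score is the highest in the toy set.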

Outliers Found: 2,024
Outlier Avg Funding: $162.3M
Normal Avg Funding: $6.8M
Ratio: 23.7× (outlier vs normal)

Outliers by Status

operating 1,760
acquired 221
closed 43

Top Markets with Outliers

Biotechnology 277
Software 161
Clean Technology 119
Health Care 118
Mobile 75
Enterprise Software 70
E-Commerce 65
Advertising 57

| Market | Country | Funding | Rounds | Status | Anomaly Score |
|---|---|---|---|---|---|
| Analytics | USA | $950.0M | 12 | operating | -0.208 |
| Automotive | USA | $823.0M | 11 | operating | -0.204 |
| File Sharing | USA | $564.1M | 12 | operating | -0.204 |
| E-Commerce | USA | $516.9M | 11 | operating | -0.203 |
| Communities | USA | $2.4B | 11 | operating | -0.202 |
| Online Shopping | IND | $2.4B | 11 | operating | -0.202 |
| Construction | USA | $1.0B | 13 | operating | -0.198 |
| Automotive | USA | $1.5B | 9 | acquired | -0.198 |
| E-Commerce | USA | $934.7M | 10 | operating | -0.196 |
| Enterprise Software | NLD | $1.4B | 9 | operating | -0.194 |
| Consumer Electronics | USA | $518.8M | 11 | operating | -0.192 |
| Peer-to-Peer | USA | $566.2M | 10 | operating | -0.191 |
| Manufacturing | USA | $1.6B | 8 | closed | -0.190 |
| Information Technology | USA | $384.4M | 9 | acquired | -0.188 |
| Solar | USA | $845.0M | 9 | operating | -0.188 |
| Clean Technology | USA | $307.6M | 11 | operating | -0.187 |
| Health Care | USA | $291.5M | 9 | operating | -0.187 |
| Technology | USA | $866.6M | 9 | operating | -0.186 |
| Software | USA | $290.0M | 10 | operating | -0.186 |
| Analytics | USA | $1.2B | 8 | operating | -0.186 |

DBSCAN vs K-Means

K-Means (used on the Clusters page) assigns every point to exactly one of K clusters. It assumes clusters are spherical and roughly equal-sized. Good for market segmentation where you want clean groups.

DBSCAN doesn't require a pre-set number of clusters. It finds clusters of any shape and explicitly labels noise. Good for anomaly detection: the noise points are the interesting ones.

Isolation Forest is a dedicated anomaly detection method. It builds random trees and measures how quickly each point gets isolated. Faster isolation = more anomalous. Unlike DBSCAN's noise label, it provides a continuous anomaly score.
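A continuous score still has to be turned into a yes/no flag at some point, and that is the role a contamination setting plays. The sketch below shows one simple way to do it, assuming the convention seen in the table above where lower scores are more anomalous. The function name `flag_outliers` and the example scores are hypothetical, not values from this page.

```python
def flag_outliers(scores, contamination=0.05):
    """Flag the `contamination` fraction of points with the lowest scores."""
    k = round(len(scores) * contamination)
    if k == 0:
        return [False] * len(scores)
    cutoff = sorted(scores)[k - 1]  # score of the k-th most anomalous point
    # Ties at the cutoff are all flagged, so slightly more than k is possible.
    return [s <= cutoff for s in scores]

# Hypothetical scores: 40 evenly spaced values from -0.02 to 0.37.
example_scores = [i / 100 for i in range(-2, 38)]
flags = flag_outliers(example_scores, contamination=0.05)
print(sum(flags), "of", len(flags), "flagged")
```

DBSCAN offers no equivalent knob: a point is either noise or not, so there is no way to ask for "the 5% least typical" without changing eps or min_samples and re-clustering.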