Startup Success Prediction
Will your startup get acquired or close? Seven classification models trained on 5,500 startup outcomes.
Select a market, country, funding range and number of rounds below. The model returns the predicted probability of acquisition based on historical patterns from the Crunchbase dataset. This is not financial advice: it's a statistical lookup based on what happened to similar companies.
Your Startup Profile
Example: a Software startup in the United States with $1M–10M in funding across 2 rounds returns a 55.0% predicted probability of acquisition.
Model Performance Comparison
5-fold cross-validated metrics: seven models from linear to ensemble to instance-based
| Model | Accuracy | Precision | Recall | F1 | ROC-AUC |
|---|---|---|---|---|---|
| Random Forest (best by F1) | 74.9% | 74.5% | 75.7% | 75.1% | 81.8% |
| XGBoost | 74.7% | 74.4% | 75.4% | 74.9% | 82.4% |
| Decision Tree | 69.1% | 69.2% | 69.0% | 69.1% | 72.7% |
| SVM (RBF) | 68.2% | 69.4% | 65.1% | 67.2% | 74.9% |
| Logistic Regression | 67.7% | 68.6% | 65.4% | 66.9% | 73.5% |
| K-Nearest Neighbors | 68.6% | 71.1% | 62.9% | 66.7% | 74.8% |
| Gaussian Naive Bayes | 59.3% | 76.3% | 27.3% | 40.0% | 71.2% |
Feature Importances
What matters most in predicting startup success
How This Works
The prediction is a pre-computed lookup. We trained seven classifiers on ~5,500 startups that had a definitive outcome (acquired or closed), then selected the best performer by F1 score to generate predictions for every valid combination of market, country, funding bucket and round count.
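The lookup stage can be sketched as follows. This is a minimal illustration with a made-up four-feature integer encoding and random toy labels, not the real Crunchbase schema or the actual trained model:

```python
# Illustrative sketch of a pre-computed prediction lookup.
# Features (market, country, funding bucket, rounds) are toy integer codes.
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy training data: 500 startups, 4 categorical features, binary outcome
# (1 = acquired, 0 = closed).
X = rng.integers(0, 4, size=(500, 4))
y = rng.integers(0, 2, size=500)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Pre-compute P(acquired) for every valid feature combination, so the
# page can serve predictions without running the model at request time.
combos = list(product(range(4), repeat=4))
probs = model.predict_proba(np.array(combos))[:, 1]
lookup = dict(zip(combos, probs))
```

At serving time the page just indexes `lookup[(market, country, bucket, rounds)]`, which is why the prediction is instant and deterministic.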
Why these seven models? They cover every major paradigm in machine learning:

- Logistic Regression: the linear baseline, with interpretable coefficients and calibrated probabilities.
- Decision Tree: a single tree; pure interpretability, shows which splits matter.
- Random Forest: averages many trees to reduce variance.
- XGBoost: builds trees sequentially, each correcting the errors of the last (the current state of the art for tabular data).
- SVM (RBF): finds a non-linear boundary in kernel space, a different inductive bias from trees.
- K-Nearest Neighbors: instance-based; no model at all, just a similarity lookup.
- Gaussian Naive Bayes: probabilistic; fast, assumes feature independence, useful as a fast probabilistic baseline.
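The comparison above can be reproduced in outline with scikit-learn's 5-fold cross-validation. This sketch uses synthetic data rather than the startup dataset, and substitutes `GradientBoostingClassifier` for XGBoost (same boosting paradigm, no extra dependency):

```python
# 5-fold cross-validated F1 for seven model paradigms on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=4, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),  # XGBoost stand-in
    "SVM (RBF)": SVC(kernel="rbf"),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Gaussian Naive Bayes": GaussianNB(),
}

# Mean F1 across 5 folds; the best scorer would be selected for the lookup.
scores = {name: cross_val_score(m, X, y, cv=5, scoring="f1").mean()
          for name, m in models.items()}
best = max(scores, key=scores.get)
```

The real evaluation also records accuracy, precision, recall, and ROC-AUC per fold; only the scoring string changes.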
Accuracy is the overall correct prediction rate. Precision measures how many predicted acquisitions were actually acquired. Recall measures how many actual acquisitions the model caught. F1 is the harmonic mean of precision and recall. ROC-AUC measures the model's ability to distinguish between outcomes across all thresholds.
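These four threshold metrics all derive from the same confusion-matrix counts. A worked example with illustrative counts (not the actual evaluation results):

```python
# Confusion-matrix counts: tp = correctly predicted acquisitions,
# fp = closed startups predicted as acquired, fn = missed acquisitions,
# tn = correctly predicted closures. Counts are illustrative.
tp, fp, fn, tn = 75, 26, 24, 75

accuracy  = (tp + tn) / (tp + fp + fn + tn)   # overall correct rate
precision = tp / (tp + fp)                    # predicted acquisitions that were real
recall    = tp / (tp + fn)                    # real acquisitions the model caught
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean
```

Note how Gaussian Naive Bayes in the table trades recall for precision: it predicts "acquired" rarely (catching only 27.3% of acquisitions), but when it does, it is right 76.3% of the time. F1 punishes that imbalance, which is why it was the model-selection criterion.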