Ethics & Privacy
Responsible use of funding intelligence data
Data Source & Consent
This platform uses the Crunchbase Startup Investments dataset, a publicly available collection of startup funding records. All data points (company names, funding amounts, geographic locations) were voluntarily disclosed by the companies through Crunchbase's public API or direct submissions. No private or scraped data is used.
However, "publicly available" does not mean "free of ethical obligations." Aggregated funding data can reveal competitive intelligence about markets and regions. We present this data for educational and analytical purposes, not for targeted competitive action against specific companies.
Bias & Representation
The Crunchbase dataset has significant biases that affect every analysis on this platform:
Geographic bias
53% of companies are US-based. Funding ecosystems in Africa, Southeast Asia and South America are heavily underrepresented. Our geographic maps and country comparisons reflect Crunchbase's coverage, not global reality.
Survivorship bias
Companies that never raised venture funding or closed before being tracked aren't in the dataset. The ~41K 'operating' companies are disproportionately VC-backed survivors.
Temporal bias
The dataset covers companies founded through ~2014. Post-2014 funding dynamics (crypto, AI boom, COVID effects) are not represented.
Size bias
Smaller companies and bootstrapped startups are underrepresented. Crunchbase skews toward VC-funded companies, creating a distorted picture of entrepreneurship.
Label bias
The 'acquired' = success framing ignores acqui-hires and fire-sale acquisitions, and 'operating' doesn't distinguish between thriving and zombie companies.
Predictive Model Ethics
The prediction tool estimates acquisition probability based on historical patterns. This raises several concerns:
Self-fulfilling prophecies
If investors use a tool like this, startups with low predicted probabilities may receive less funding, making the prediction come true for the wrong reasons. We explicitly state that predictions are statistical correlations, not causal guarantees.
Feature encoding bias
The model encodes market and country as features. This means it can learn and perpetuate geographic and sector biases from historical data, e.g., penalizing companies from underrepresented regions.
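To make that mechanism concrete, here is a minimal sketch with synthetic records and hypothetical country codes (not the platform's real data or model): a model that encodes country as a feature effectively learns the historical acquisition rate per country, so regions with sparse, mostly-negative history get penalized by construction.

```python
from collections import Counter

# Synthetic historical records: (country, acquired?) pairs.
# The imbalance mirrors the dataset's geographic skew, not real-world rates.
records = [
    ("USA", True), ("USA", True), ("USA", False), ("USA", True),
    ("USA", False), ("USA", True), ("GBR", True), ("GBR", False),
    ("NGA", False), ("NGA", False),  # underrepresented region, no positives tracked
]

def acquisition_rate_by_country(rows):
    """P(acquired | country) -- the prior a naive model absorbs from labels."""
    totals, hits = Counter(), Counter()
    for country, acquired in rows:
        totals[country] += 1
        hits[country] += acquired  # True counts as 1
    return {c: hits[c] / totals[c] for c in totals}

rates = acquisition_rate_by_country(records)
# A model trained on these labels scores an otherwise identical startup
# lower purely because of the country column: NGA -> 0.0, USA -> ~0.67.
print(rates)
```

The same arithmetic applies to the market feature: any category that was rare or unsuccessful in the historical window inherits a low prior, regardless of the individual company.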
Transparency
We show feature importances, model metrics and confusion matrices. Users can see exactly what the model weighs and where it fails. No black-box predictions.
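As an illustration of the transparency reporting described above, a binary confusion matrix and its derived metrics can be computed directly from labels and predictions. The hold-out results below are made up for the sketch, not the platform's actual numbers:

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 = acquired."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

# Hypothetical hold-out labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
precision = tp / (tp + fp)  # of predicted acquisitions, how many were real
recall = tp / (tp + fn)     # of real acquisitions, how many were caught
print(tp, fp, fn, tn, precision, recall)
```

Publishing exactly these four counts is what lets a user see not just accuracy but *which way* the model fails: a high false-positive count means inflated optimism, a high false-negative count means missed winners.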
Not financial advice
This tool is educational. It should not be used as the sole basis for investment, hiring, or business decisions.
AI Chatbot Ethics
The AI Analyst chatbot uses Google's Gemini model to answer questions about the dataset. It operates under strict constraints: it can only reference data from this platform, it cannot access external information, and it does not store conversation history. Queries are sent directly from your browser to the Gemini API; no other server receives or stores your data.
The chatbot may generate plausible-sounding but incorrect analysis. Its responses should be verified against the actual charts and tables on the platform. The system prompt constrains it to cite specific numbers, but hallucination remains a risk with any LLM.
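One way to reduce (not eliminate) that risk is to check every figure in a response against the data context supplied in the prompt. The sketch below is a hypothetical guardrail, not the platform's actual implementation; the context string, function name, and regex are all assumptions:

```python
import re

# Illustrative data context the chatbot is allowed to cite.
CONTEXT = "Companies tracked: 41000. US share: 53%. Coverage ends: 2014."

def uncited_numbers(response, context):
    """Return numbers in the response that never appear in the context."""
    allowed = set(re.findall(r"\d+(?:\.\d+)?", context))
    return [n for n in re.findall(r"\d+(?:\.\d+)?", response) if n not in allowed]

grounded = uncited_numbers("About 53% of the 41000 companies are US-based.", CONTEXT)
suspect = uncited_numbers("Roughly 70% raised funding after 2016.", CONTEXT)
print(grounded)  # [] -- every figure appears in the supplied context
print(suspect)   # ['70', '2016'] -- flag these for manual verification
```

A check like this catches fabricated numbers but not fabricated reasoning, which is why verifying responses against the platform's charts and tables remains necessary.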
Privacy
This platform collects no user data. There are no cookies, no analytics, no tracking scripts. The site is deployed as a static export: every interaction happens in your browser. The only external request is the chatbot calling the Gemini API, which processes queries under Google's standard API terms.
Social Impact of BI Tools
Business intelligence platforms like this one democratize access to data that was historically available only to venture capital firms with expensive Crunchbase Pro subscriptions. This is broadly positive: students, researchers and bootstrapped founders can now analyze funding patterns.
But democratization also means more people can act on the same signals, potentially increasing herd behavior in already-concentrated markets. The concentration of startup funding in a handful of sectors and geographies is a real problem, and tools that surface these patterns could reinforce rather than correct them.