Model Explainability¶

Understanding how Chain Sentinel's AI models make predictions.

Overview¶

Chain Sentinel uses explainable AI (XAI) techniques to show you why a token was classified as SCAM or LEGIT, not just the prediction itself.

Why Explainability Matters¶

Trust & Transparency¶

See which features influenced the decision
Understand the model's reasoning
Verify predictions make sense
Build confidence in AI decisions

Better Decision Making¶

Identify key risk factors
Understand token weaknesses
Learn what makes tokens legitimate
Make informed investment choices

Model Improvement¶

Detect model biases
Identify missing features
Validate model logic
Improve accuracy over time

SHAP Explanations¶

What is SHAP?¶

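SHAP (SHapley Additive exPlanations) is a game-theoretic approach to explaining machine learning predictions. It assigns each feature a value that measures how far that feature moved a given prediction away from the model's average output.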

Key Concepts:

- Feature Contribution: How much each feature pushed the prediction toward SCAM or LEGIT
- Positive Values: Push toward LEGIT (blue bars)
- Negative Values: Push toward SCAM (red bars)
- Magnitude: Larger bars = stronger influence
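To make the game-theoretic idea concrete, the sketch below computes exact Shapley values for a tiny scoring function by averaging each feature's marginal contribution over every feature ordering. The `toy_score` function, its weights, and the baseline token are illustrative assumptions, not Chain Sentinel's model; real SHAP implementations use much faster tree-specific algorithms.

```python
from itertools import permutations

# Hypothetical toy model (NOT Chain Sentinel's). Higher score = more LEGIT.
def toy_score(x):
    few_holders = x["holder_count"] < 100
    concentrated = x["top_10_concentration"] > 80
    score = 0.5
    score += -0.2 if few_holders else 0.2
    score += -0.3 if concentrated else 0.1
    if few_holders and concentrated:
        score -= 0.1  # interaction: both red flags together are worse
    return score

def shapley_values(model, instance, baseline):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    features = list(instance)
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)      # start with every feature at its baseline
        previous = model(current)
        for f in order:
            current[f] = instance[f]  # reveal the real value of feature f
            updated = model(current)
            totals[f] += updated - previous
            previous = updated
    return {f: totals[f] / len(orderings) for f in features}

# Assumed "typical" token used as the reference point.
baseline = {"holder_count": 1000, "top_10_concentration": 30}
token = {"holder_count": 40, "top_10_concentration": 95}

phi = shapley_values(toy_score, token, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.2f}")
```

The contributions always sum to the difference between the token's score and the baseline score, which is the property that lets SHAP values be stacked into a waterfall.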

Reading SHAP Values¶

Example SHAP Explanation:

Feature                    SHAP Value    Direction
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
holder_count               +0.15         → LEGIT
liquidity_usd              +0.12         → LEGIT
transaction_count          +0.08         → LEGIT
top_10_concentration       -0.23         → SCAM
creator_risk_score         -0.18         → SCAM
age_days                   +0.05         → LEGIT

Interpretation:

- top_10_concentration (-0.23): High concentration of tokens in top 10 holders strongly suggests SCAM
- creator_risk_score (-0.18): Creator has history of scams, pushes toward SCAM
- holder_count (+0.15): Many holders is a positive sign, pushes toward LEGIT
- liquidity_usd (+0.12): Good liquidity is positive, pushes toward LEGIT
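The reading order used in the interpretation (strongest influence first, sign gives direction) can be sketched in a few lines. The values are copied from the example table; the `direction` helper is a hypothetical name introduced for illustration.

```python
# SHAP values from the example table in this section.
shap_values = {
    "holder_count": 0.15,
    "liquidity_usd": 0.12,
    "transaction_count": 0.08,
    "top_10_concentration": -0.23,
    "creator_risk_score": -0.18,
    "age_days": 0.05,
}

def direction(value):
    """Positive pushes toward LEGIT, negative toward SCAM."""
    if value > 0:
        return "LEGIT"
    if value < 0:
        return "SCAM"
    return "NEUTRAL"

# Rank by magnitude so the strongest influences come first.
ranked = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)

for name, value in ranked:
    print(f"{name:<24} {value:+.2f}  -> {direction(value)}")
```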

SHAP Waterfall Plot¶

A waterfall plot shows how the individual feature contributions add up, starting from the base value, to reach the final prediction:

Base Value (50%)
    ↓ +15% (holder_count)
    ↓ +12% (liquidity_usd)
    ↓ +8% (transaction_count)
    ↓ +5% (age_days)
    ↓ -23% (top_10_concentration)
    ↓ -18% (creator_risk_score)
    ↓
Final Prediction: 49% LEGIT probability → classified SCAM
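The arithmetic behind the waterfall is a running sum. The sketch below reproduces the example in percentage points; reading the score as a LEGIT probability and using 50 as the decision threshold are assumptions made for illustration.

```python
BASE_VALUE = 50  # percentage points, from the example above

# Contributions in the order shown in the plot.
contributions = [
    ("holder_count", 15),
    ("liquidity_usd", 12),
    ("transaction_count", 8),
    ("age_days", 5),
    ("top_10_concentration", -23),
    ("creator_risk_score", -18),
]

def waterfall(base, steps):
    """Return the running total after each contribution is applied."""
    running = base
    trail = [("base", running)]
    for name, delta in steps:
        running += delta
        trail.append((name, running))
    return trail

trail = waterfall(BASE_VALUE, contributions)
final = trail[-1][1]
label = "LEGIT" if final >= 50 else "SCAM"  # assumed threshold

for name, total in trail:
    print(f"{name:<24} {total}%")
print(f"Final: {final}% -> {label}")
```

Positive contributions total +40 and negative contributions total -41, so the score ends one point below the base value.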

Feature Importance¶

Global Feature Importance¶

Which features matter most across all predictions:

| Rank | Feature | Importance | Description |
|------|---------|------------|-------------|
| 1 | top_10_concentration | 18.5% | Token distribution |
| 2 | creator_risk_score | 15.2% | Creator reputation |
| 3 | liquidity_usd | 12.8% | Available liquidity |
| 4 | holder_count | 11.3% | Number of holders |
| 5 | transaction_count | 9.7% | Trading activity |
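One common way to get a global ranking like this is to average the absolute SHAP value of each feature over many predictions and normalize to percentages. Whether Chain Sentinel computes its ranking this way is an assumption, and the three rows below are made-up values for illustration only.

```python
def global_importance(shap_rows):
    """Mean |SHAP| per feature across predictions, as a share of the total (%)."""
    features = list(shap_rows[0])
    mean_abs = {
        f: sum(abs(row[f]) for row in shap_rows) / len(shap_rows)
        for f in features
    }
    total = sum(mean_abs.values())
    ranked = sorted(mean_abs.items(), key=lambda kv: kv[1], reverse=True)
    return {f: 100 * v / total for f, v in ranked}

# Hypothetical per-token SHAP values (one dict per prediction).
rows = [
    {"top_10_concentration": -0.35, "creator_risk_score": -0.28, "liquidity_usd": -0.15},
    {"top_10_concentration": 0.12, "creator_risk_score": 0.05, "liquidity_usd": 0.18},
    {"top_10_concentration": -0.10, "creator_risk_score": -0.12, "liquidity_usd": 0.06},
]

importance = global_importance(rows)
for rank, (name, pct) in enumerate(importance.items(), start=1):
    print(f"{rank}  {name:<24} {pct:.1f}%")
```

Taking absolute values matters here: a feature that pushes strongly toward SCAM for some tokens and strongly toward LEGIT for others is still important, even though its signed average may be near zero.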

Local Feature Importance¶

For a specific token, which features mattered most:

Example: SCAM Token

Feature                    Impact
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
top_10_concentration       ████████████ 45%
creator_risk_score         ████████ 30%
liquidity_usd              ████ 15%
holder_count               ██ 10%
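For a single token, the same idea applies to one set of SHAP values instead of an average. The sketch below turns each feature's absolute SHAP value into a share of the total and draws a text bar. The SHAP values are hypothetical, chosen so the shares match the chart above; the actual values behind that chart are not given.

```python
def local_impact(shap_for_token):
    """Share of total |SHAP| per feature for one prediction, in whole percent."""
    total = sum(abs(v) for v in shap_for_token.values())
    ranked = sorted(shap_for_token.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return {name: round(100 * abs(value) / total) for name, value in ranked}

def bar(pct, width=20):
    """Text bar whose length is proportional to pct."""
    return "█" * round(pct / 100 * width)

# Hypothetical SHAP values for one SCAM token.
token_shap = {
    "top_10_concentration": -0.45,
    "creator_risk_score": -0.30,
    "liquidity_usd": -0.15,
    "holder_count": -0.10,
}

impact = local_impact(token_shap)
for name, pct in impact.items():
    print(f"{name:<24} {bar(pct)} {pct}%")
```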

Model Comparison¶

XGBoost vs GNN v2¶

| Aspect | XGBoost | GNN v2 |
|--------|---------|--------|
| Accuracy | 95.59% | 96.06% |
| Explainability | ✅ SHAP values | ❌ Limited |
| Features | 20+ metrics | Graph structure |
| Speed | Fast (< 1s) | Slower (2-3s) |
| Best For | Individual tokens | Network analysis |

When to Trust Each Model¶

Trust XGBoost when:

- You need to understand WHY
- Token has clear metrics (holders, liquidity, etc.)
- You want feature-level insights
- Making investment decisions

Trust GNN v2 when:

- Accuracy is critical
- Token is part of complex network
- Creator has multiple related tokens
- Detecting scam rings
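The guidance above could be encoded as a simple routing rule. This is only a sketch: the function name `choose_model`, its parameters, and the `min_cluster` threshold are hypothetical and do not describe how Chain Sentinel selects a model.

```python
def choose_model(need_feature_explanation, creator_tokens_count, cluster_size,
                 min_cluster=10):
    """Pick which model's output to lean on, following the guidance above.

    min_cluster is an assumed cut-off for "part of a complex network".
    """
    if need_feature_explanation:
        return "XGBoost"  # only XGBoost predictions come with SHAP values
    if creator_tokens_count > 1 or cluster_size >= min_cluster:
        return "GNN v2"   # related tokens or a large cluster: network analysis
    return "XGBoost"

print(choose_model(True, 5, 50))
print(choose_model(False, 3, 2))
print(choose_model(False, 1, 1))
```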

Feature Descriptions¶

Token Metrics¶

holder_count

- Number of unique wallet addresses holding the token
- Higher = more distributed = more legitimate
- Scams often have few holders (< 100)

transaction_count

- Total number of transactions involving the token
- Higher = more activity = more legitimate
- Scams often have low activity after initial pump

liquidity_usd

- Total liquidity available in DEX pools (USD)
- Higher = easier to buy/sell = more legitimate
- Scams often have low liquidity (< $10K)

top_10_concentration

- Percentage of supply held by top 10 holders
- Lower = more distributed = more legitimate
- Scams often have high concentration (> 80%)

age_days

- Days since token was created
- Older = more established = more legitimate
- Scams often rug within 48 hours

Creator Metrics¶

creator_risk_score

- Risk score of wallet that deployed the token (0-100)
- Lower = safer creator = more legitimate
- Based on creator's history of scams

creator_tokens_count

- Number of tokens created by this wallet
- Many tokens can be good (successful dev) or bad (serial rugger)
- Context matters: check creator_scam_rate

creator_scam_rate

- Percentage of creator's tokens that were scams
- Lower = trustworthy creator = more legitimate
- > 50% = high risk creator

Network Metrics¶

cluster_size

- Number of wallets in same cluster as creator
- Large clusters can indicate scam rings
- Legitimate projects may also have large teams

cluster_scam_rate

- Percentage of tokens from this cluster that were scams
- High rate = dangerous cluster
- Used by GNN v2 model
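The rule-of-thumb thresholds in the feature descriptions can be collected into a quick screening check. This sketch is a manual heuristic built from those thresholds, not the model itself; the `TokenFeatures` class and `red_flags` function are hypothetical names, and the holder count and age used in the examples are assumed values.

```python
from dataclasses import dataclass

@dataclass
class TokenFeatures:
    holder_count: int
    liquidity_usd: float
    top_10_concentration: float  # percent of supply
    age_days: float
    creator_scam_rate: float     # percent of creator's tokens that were scams

def red_flags(t):
    """List the rule-of-thumb warnings from the feature descriptions."""
    flags = []
    if t.holder_count < 100:
        flags.append("few holders (< 100)")
    if t.liquidity_usd < 10_000:
        flags.append("low liquidity (< $10K)")
    if t.top_10_concentration > 80:
        flags.append("high top-10 concentration (> 80%)")
    if t.age_days < 2:
        flags.append("very new (< 48 hours)")
    if t.creator_scam_rate > 50:
        flags.append("high-risk creator (scam rate > 50%)")
    return flags

# A creator with 13 scams out of 15 tokens has a scam rate of about 87%.
suspicious = TokenFeatures(
    holder_count=40,              # assumed for illustration
    liquidity_usd=500,
    top_10_concentration=95,
    age_days=0.5,
    creator_scam_rate=100 * 13 / 15,
)
established = TokenFeatures(
    holder_count=125_000,
    liquidity_usd=2_500_000,
    top_10_concentration=28,
    age_days=400,                 # assumed for illustration
    creator_scam_rate=0,
)

print(red_flags(suspicious))
print(red_flags(established))
```

A token that trips none of these checks is not thereby legitimate; the thresholds describe what scams often look like, not a complete test.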

Practical Examples¶

Example 1: Clear SCAM¶

Token: SCAM (SCAM)

Prediction: SCAM (98% confidence)
Model: XGBoost

Top Contributing Features:
1. top_10_concentration: 95% (-0.35) → SCAM
   "Top 10 holders own 95% of supply"

2. creator_risk_score: 85 (-0.28) → SCAM
   "Creator has 87% scam rate (13/15 tokens)"

3. liquidity_usd: $500 (-0.15) → SCAM
   "Very low liquidity, hard to sell"

4. age_days: 0.5 (-0.08) → SCAM
   "Token created 12 hours ago"

Recommendation: AVOID - Multiple critical red flags

Example 2: Clear LEGIT¶

Token: BONK (BONK)

Prediction: LEGIT (92% confidence)
Model: XGBoost

Top Contributing Features:
1. holder_count: 125,000 (+0.25) → LEGIT
   "Large, distributed holder base"

2. liquidity_usd: $2.5M (+0.18) → LEGIT
   "Excellent liquidity"

3. transaction_count: 5M (+0.15) → LEGIT
   "High trading activity"

4. top_10_concentration: 28% (+0.12) → LEGIT
   "Fair distribution"

Recommendation: SAFE - Strong fundamentals

Example 3: Uncertain Case¶

Token: MOON (MOON)

Prediction: SCAM (62% confidence)
Model: XGBoost

Top Contributing Features:
1. creator_risk_score: 45 (-0.12) → SCAM
   "Creator has 1 previous scam (1/3 tokens)"

2. top_10_concentration: 55% (-0.10) → SCAM
   "Moderate concentration"

3. holder_count: 500 (+0.08) → LEGIT
   "Decent holder base"

4. liquidity_usd: $50K (+0.06) → LEGIT
   "Adequate liquidity"

Recommendation: CAUTION - Mixed signals, do more research
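The three examples suggest a pattern: a confident SCAM prediction maps to AVOID, a confident LEGIT prediction maps to SAFE, and anything less certain maps to CAUTION. A minimal sketch of that mapping follows; the 0.90 cut-off is an assumption chosen to fit the three examples, not a documented threshold.

```python
def recommend(label, confidence, high=0.90):
    """Map a prediction to a recommendation. `high` is an assumed cut-off."""
    if confidence < high:
        return "CAUTION"  # mixed signals, do more research
    return "AVOID" if label == "SCAM" else "SAFE"

examples = [
    ("SCAM", "SCAM", 0.98),
    ("BONK", "LEGIT", 0.92),
    ("MOON", "SCAM", 0.62),
]

for token, label, confidence in examples:
    print(f"{token}: {label} ({confidence:.0%}) -> {recommend(label, confidence)}")
```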

Limitations¶

SHAP Limitations¶

Important Considerations

Only for XGBoost: GNN v2 predictions don't have SHAP values
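Correlation ≠ Causation: A large SHAP value shows that the model associates a feature with an outcome; it does not prove the feature causes that outcome
Feature Interactions: SHAP reports one value per feature, so effects that depend on combinations of features are folded into those values rather than shown separately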
Data Quality: Explanations are only as good as the data

Model Limitations¶

XGBoost:

- Doesn't capture network relationships
- May miss coordinated scam rings
- Relies on feature engineering

GNN v2:

- Less explainable (black box)
- Requires more data
- Slower inference

Best Practices¶

Using Explanations¶

Do's

✅ Read SHAP values to understand predictions
✅ Look for multiple red flags, not just one
✅ Consider feature magnitudes, not just direction
✅ Cross-reference with network graph
✅ Use explanations to learn about scam patterns

Don'ts

❌ Don't rely on a single feature
❌ Don't ignore low-confidence predictions
❌ Don't assume the model is always right
❌ Don't skip manual verification
❌ Don't invest based solely on AI

Support¶

Questions about model explainability?

📧 Email: support@chainsentinel.net
💬 Telegram: @chainsentinel_net
📖 FAQ: Frequently Asked Questions