
Model Explainability

Understanding how Chain Sentinel's AI models make predictions.

Overview

Chain Sentinel uses explainable AI (XAI) techniques to show you why a token was classified as SCAM or LEGIT, not just the prediction itself.

Why Explainability Matters

Trust & Transparency

  • See which features influenced the decision
  • Understand the model's reasoning
  • Verify predictions make sense
  • Build confidence in AI decisions

Better Decision Making

  • Identify key risk factors
  • Understand token weaknesses
  • Learn what makes tokens legitimate
  • Make informed investment choices

Model Improvement

  • Detect model biases
  • Identify missing features
  • Validate model logic
  • Improve accuracy over time

SHAP Explanations

What is SHAP?

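SHAP (SHapley Additive exPlanations) is a game-theoretic approach to explaining machine learning predictions. It assigns each feature a contribution to a single prediction, and those contributions add up to the difference between the prediction and the model's base value.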

Key Concepts:

  • Feature Contribution: How much each feature pushed the prediction toward SCAM or LEGIT
  • Positive Values: Push toward LEGIT (blue bars)
  • Negative Values: Push toward SCAM (red bars)
  • Magnitude: Larger bars = stronger influence
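
To make the game-theoretic idea concrete, the sketch below computes exact Shapley values for a toy two-feature scoring function by averaging each feature's marginal contribution over every ordering of the features. The `toy_score` function and its numbers are hypothetical and are not Chain Sentinel's model. Production SHAP tooling uses much faster algorithms than this brute-force enumeration.

```python
from itertools import permutations

FEATURES = ["holder_count", "top_10_concentration"]

def toy_score(known):
    """Hypothetical score toward LEGIT given the set of features that are 'known'."""
    score = 0.50  # base value when nothing is known
    if "holder_count" in known:
        score += 0.10
    if "top_10_concentration" in known:
        score -= 0.20
    if "holder_count" in known and "top_10_concentration" in known:
        score -= 0.04  # interaction between the two features
    return score

def shapley_values(features, value_fn):
    """Average each feature's marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value_fn(frozenset(present))
            present.add(f)
            after = value_fn(frozenset(present))
            totals[f] += after - before
    return {f: t / len(orderings) for f, t in totals.items()}

values = shapley_values(FEATURES, toy_score)
for name, value in values.items():
    direction = "LEGIT" if value > 0 else "SCAM"
    print(f"{name:<22} {value:+.2f} -> {direction}")
```

The contributions sum to the difference between the full score and the base value, and the interaction term is shared between the two features rather than reported separately.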

Reading SHAP Values

Example SHAP Explanation:

Feature                    SHAP Value    Direction
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
holder_count               +0.15         → LEGIT
liquidity_usd              +0.12         → LEGIT
transaction_count          +0.08         → LEGIT
top_10_concentration       -0.23         → SCAM
creator_risk_score         -0.18         → SCAM
age_days                   +0.05         → LEGIT

Interpretation:

  • top_10_concentration (-0.23): High concentration of tokens in the top 10 holders strongly suggests SCAM
  • creator_risk_score (-0.18): Creator has a history of scams, pushes toward SCAM
  • holder_count (+0.15): Many holders is a positive sign, pushes toward LEGIT
  • liquidity_usd (+0.12): Good liquidity is positive, pushes toward LEGIT

SHAP Waterfall Plot

A visual representation of how feature contributions combine, starting from the base value, to reach the final prediction:

Base Value (50%)
    ↓ +15% (holder_count)
    ↓ +12% (liquidity_usd)
    ↓ +8% (transaction_count)
    ↓ +5% (age_days)
    ↓ -23% (top_10_concentration)
    ↓ -18% (creator_risk_score)
Final Prediction: 49% toward LEGIT → classified as SCAM
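
The arithmetic behind the waterfall can be checked with a short sketch. The numbers are copied from the example above. The rule that a final score below 0.50 is labelled SCAM is an assumption inferred from that example, not a documented threshold.

```python
BASE_VALUE = 0.50  # base value from the waterfall example

# Contributions copied from the example, in plot order.
contributions = [
    ("holder_count", +0.15),
    ("liquidity_usd", +0.12),
    ("transaction_count", +0.08),
    ("age_days", +0.05),
    ("top_10_concentration", -0.23),
    ("creator_risk_score", -0.18),
]

def waterfall(base, steps):
    """Return the running total after each step, starting from the base value."""
    running = base
    trail = []
    for name, delta in steps:
        running += delta
        trail.append((name, round(running, 4)))
    return trail

trail = waterfall(BASE_VALUE, contributions)
final_score = trail[-1][1]

# Assumption: scores below 0.50 are labelled SCAM, as in the example.
label = "LEGIT" if final_score >= 0.50 else "SCAM"

for name, total in trail:
    print(f"{name:<22} running total = {total:.2f}")
print(f"final = {final_score:.2f} -> {label}")
```

The four positive contributions lift the score to 0.90 before the two negative ones pull it down to 0.49.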

Feature Importance

Global Feature Importance

Which features matter most across all predictions:

Rank  Feature                 Importance  Description
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1     top_10_concentration    18.5%       Token distribution
2     creator_risk_score      15.2%       Creator reputation
3     liquidity_usd           12.8%       Available liquidity
4     holder_count            11.3%       Number of holders
5     transaction_count       9.7%        Trading activity
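
This page does not state how the importance percentages are computed. One common convention is to average the absolute SHAP value of each feature over many predictions and normalise so the shares sum to 100%. The sketch below follows that convention with made-up SHAP rows. Both the data and the choice of method are assumptions.

```python
def global_importance(shap_rows):
    """Mean absolute SHAP value per feature, normalised to shares that sum to 1."""
    features = list(shap_rows[0].keys())
    mean_abs = {
        f: sum(abs(row[f]) for row in shap_rows) / len(shap_rows)
        for f in features
    }
    total = sum(mean_abs.values())
    return {f: v / total for f, v in mean_abs.items()}

# Hypothetical per-token SHAP values (illustrative only).
shap_rows = [
    {"top_10_concentration": -0.30, "creator_risk_score": -0.20, "liquidity_usd": 0.10},
    {"top_10_concentration": 0.10, "creator_risk_score": 0.10, "liquidity_usd": 0.20},
]

importance = global_importance(shap_rows)
ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
for rank, (feature, share) in enumerate(ranked, start=1):
    print(f"{rank} {feature:<22} {share:.1%}")
```

Taking absolute values first matters: a feature that pushes strongly toward SCAM for some tokens and toward LEGIT for others would otherwise average out to near zero.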

Local Feature Importance

For a specific token, which features mattered most:

Example: SCAM Token

Feature                    Impact
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
top_10_concentration       ████████████ 45%
creator_risk_score         ████████ 30%
liquidity_usd              ████ 15%
holder_count               ██ 10%
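
A per-token impact chart like the one above can be derived from that token's SHAP values. The sketch below assumes impact is each feature's share of the total absolute SHAP value, which this page does not confirm. It uses the values from the "Reading SHAP Values" example.

```python
# SHAP values copied from the "Reading SHAP Values" example.
shap_values = {
    "holder_count": 0.15,
    "liquidity_usd": 0.12,
    "transaction_count": 0.08,
    "top_10_concentration": -0.23,
    "creator_risk_score": -0.18,
    "age_days": 0.05,
}

def local_impact(values):
    """Share of total absolute SHAP value per feature, largest first."""
    total = sum(abs(v) for v in values.values())
    shares = {f: abs(v) / total for f, v in values.items()}
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)

impact = local_impact(shap_values)
for feature, share in impact:
    bar = "█" * round(share * 20)  # 20 blocks would mean 100%
    print(f"{feature:<26} {bar} {share:.0%}")
```

Because the shares use absolute values, the chart shows how much each feature mattered but not its direction. Read it alongside the signed SHAP values.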

Model Comparison

XGBoost vs GNN v2

Aspect           XGBoost              GNN v2
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Accuracy         95.59%               96.06%
Explainability   ✅ SHAP values       ❌ Limited
Features         20+ metrics          Graph structure
Speed            Fast (< 1s)          Slower (2-3s)
Best For         Individual tokens    Network analysis

When to Trust Each Model

Trust XGBoost when:

  • You need to understand WHY
  • Token has clear metrics (holders, liquidity, etc.)
  • You want feature-level insights
  • Making investment decisions

Trust GNN v2 when:

  • Accuracy is critical
  • Token is part of a complex network
  • Creator has multiple related tokens
  • Detecting scam rings

Feature Descriptions

Token Metrics

holder_count

  • Number of unique wallet addresses holding the token
  • Higher = more distributed = more legitimate
  • Scams often have few holders (< 100)

transaction_count

  • Total number of transactions involving the token
  • Higher = more activity = more legitimate
  • Scams often have low activity after the initial pump

liquidity_usd

  • Total liquidity available in DEX pools (USD)
  • Higher = easier to buy/sell = more legitimate
  • Scams often have low liquidity (< $10K)

top_10_concentration

  • Percentage of supply held by the top 10 holders
  • Lower = more distributed = more legitimate
  • Scams often have high concentration (> 80%)

age_days

  • Days since the token was created
  • Older = more established = more legitimate
  • Scams often rug within 48 hours
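
As an illustration of how the distribution metrics above relate to raw holder data, the sketch below computes `holder_count` and `top_10_concentration` from a list of balances. The input format and the sample balances are hypothetical. This page does not describe how Chain Sentinel collects or computes these values.

```python
def top_10_concentration(balances):
    """Percentage of total supply held by the 10 largest holders."""
    total = sum(balances)
    if total == 0:
        return 0.0
    top = sorted(balances, reverse=True)[:10]
    return 100.0 * sum(top) / total

# Hypothetical token: 10 wallets hold 90 tokens each, 100 wallets hold 1 each.
balances = [90] * 10 + [1] * 100

holder_count = len(balances)
concentration = top_10_concentration(balances)

print(f"holder_count = {holder_count}")
print(f"top_10_concentration = {concentration:.1f}%")
```

This sample token clears the 100-holder mark yet still exceeds the 80% concentration threshold, which is why the two metrics are worth reading together.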

Creator Metrics

creator_risk_score

  • Risk score of the wallet that deployed the token (0-100)
  • Lower = safer creator = more legitimate
  • Based on the creator's history of scams

creator_tokens_count

  • Number of tokens created by this wallet
  • Many tokens can be good (successful dev) or bad (serial rugger)
  • Context matters: check creator_scam_rate

creator_scam_rate

  • Percentage of the creator's tokens that were scams
  • Lower = trustworthy creator = more legitimate
  • > 50% = high-risk creator
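
The scam rate described above is a simple ratio. A minimal sketch follows, assuming each of a creator's past tokens has already been labelled as scam or not. The function name and input format are hypothetical.

```python
def creator_scam_rate(token_outcomes):
    """Percentage of a creator's tokens flagged as scams, or None if there are none."""
    if not token_outcomes:
        return None
    scams = sum(1 for is_scam in token_outcomes if is_scam)
    return 100.0 * scams / len(token_outcomes)

# A creator with 15 tokens, 13 of which were scams.
outcomes = [True] * 13 + [False] * 2
rate = creator_scam_rate(outcomes)

# Threshold from the description above: more than 50% means a high-risk creator.
high_risk = rate is not None and rate > 50

print(f"creator_tokens_count = {len(outcomes)}")
print(f"creator_scam_rate = {rate:.0f}% (high risk: {high_risk})")
```

Returning `None` for a wallet with no prior tokens keeps "no history" distinct from "clean history", which a 0% rate would otherwise blur.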

Network Metrics

cluster_size

  • Number of wallets in the same cluster as the creator
  • Large clusters can indicate scam rings
  • Legitimate projects may also have large teams

cluster_scam_rate

  • Percentage of tokens from this cluster that were scams
  • High rate = dangerous cluster
  • Used by the GNN v2 model
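
The rule-of-thumb thresholds in the feature descriptions can be collected into a simple red-flag counter for manual checks. This is a hand-written heuristic sketch, not how either model makes predictions. The `TokenMetrics` class, the sample values, and the reading of "within 48 hours" as `age_days < 2` are assumptions.

```python
from dataclasses import dataclass

@dataclass
class TokenMetrics:
    holder_count: int
    liquidity_usd: float
    top_10_concentration: float  # percent of supply
    age_days: float
    creator_scam_rate: float  # percent of creator's tokens

# Thresholds taken from the feature descriptions; age < 2 days is an assumption.
RULES = [
    ("few holders (< 100)", lambda m: m.holder_count < 100),
    ("low liquidity (< $10K)", lambda m: m.liquidity_usd < 10_000),
    ("high concentration (> 80%)", lambda m: m.top_10_concentration > 80),
    ("very new (< 2 days)", lambda m: m.age_days < 2),
    ("high-risk creator (> 50% scam rate)", lambda m: m.creator_scam_rate > 50),
]

def red_flags(metrics):
    """Return the labels of every rule the token trips."""
    return [label for label, check in RULES if check(metrics)]

# Hypothetical tokens for illustration.
scam_like = TokenMetrics(40, 500, 95, 0.5, 87)
mixed = TokenMetrics(500, 50_000, 55, 30, 33.3)

print(len(red_flags(scam_like)), red_flags(scam_like))
print(len(red_flags(mixed)), red_flags(mixed))
```

A token can trip none of these rules and still be risky, so the count is a prompt for further research rather than a verdict.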

Practical Examples

Example 1: Clear SCAM

Token: SCAM (SCAM)

Prediction: SCAM (98% confidence)
Model: XGBoost

Top Contributing Features:
1. top_10_concentration: 95% (-0.35) → SCAM
   "Top 10 holders own 95% of supply"

2. creator_risk_score: 85 (-0.28) → SCAM
   "Creator has 87% scam rate (13/15 tokens)"

3. liquidity_usd: $500 (-0.15) → SCAM
   "Very low liquidity, hard to sell"

4. age_days: 0.5 (-0.08) → SCAM
   "Token created 12 hours ago"

Recommendation: AVOID - Multiple critical red flags

Example 2: Clear LEGIT

Token: BONK (BONK)

Prediction: LEGIT (92% confidence)
Model: XGBoost

Top Contributing Features:
1. holder_count: 125,000 (+0.25) → LEGIT
   "Large, distributed holder base"

2. liquidity_usd: $2.5M (+0.18) → LEGIT
   "Excellent liquidity"

3. transaction_count: 5M (+0.15) → LEGIT
   "High trading activity"

4. top_10_concentration: 28% (+0.12) → LEGIT
   "Fair distribution"

Recommendation: SAFE - Strong fundamentals

Example 3: Uncertain Case

Token: MOON (MOON)

Prediction: SCAM (62% confidence)
Model: XGBoost

Top Contributing Features:
1. creator_risk_score: 45 (-0.12) → SCAM
   "Creator has 1 previous scam (1/3 tokens)"

2. top_10_concentration: 55% (-0.10) → SCAM
   "Moderate concentration"

3. holder_count: 500 (+0.08) → LEGIT
   "Decent holder base"

4. liquidity_usd: $50K (+0.06) → LEGIT
   "Adequate liquidity"

Recommendation: CAUTION - Mixed signals, do more research

Limitations

SHAP Limitations

Important Considerations

  • Only for XGBoost: GNN v2 predictions don't have SHAP values
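  • Correlation ≠ Causation: SHAP values show how the model used a feature, not whether that feature causes a token to be a scam
  • Feature Interactions: SHAP attributes the prediction to individual features, so effects that come from combinations of features are split among them rather than shown separately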
  • Data Quality: Explanations are only as good as the data

Model Limitations

XGBoost:

  • Doesn't capture network relationships
  • May miss coordinated scam rings
  • Relies on feature engineering

GNN v2:

  • Less explainable (black box)
  • Requires more data
  • Slower inference

Best Practices

Using Explanations

Do's

  • ✅ Read SHAP values to understand predictions
  • ✅ Look for multiple red flags, not just one
  • ✅ Consider feature magnitudes, not just direction
  • ✅ Cross-reference with network graph
  • ✅ Use explanations to learn about scam patterns

Don'ts

  • ❌ Don't rely on a single feature
  • ❌ Don't ignore low-confidence predictions
  • ❌ Don't assume the model is always right
  • ❌ Don't skip manual verification
  • ❌ Don't invest based solely on AI

Support

Questions about model explainability?