
New Relic Full Stack Cost Optimization

Reduce New Relic costs from $1,200/month to $600-650/month through log filtering, APM right-sizing, metrics cardinality fixes, and user seat optimization, with a migration alternative if that is still too expensive.

Input

What you provide to the skill

We’re spending $1200/month on New Relic (APM, logs, infrastructure). 15 hosts, 8-person team. Can we reduce this without losing critical visibility?

Output

What the skill generates for you

New Relic Cost Optimization ($1200/month, 15 hosts, 8-person team)

Current Situation Analysis

  • Monthly cost: $1,200/month ($14,400/year)
  • Team size: 8 people
  • Infrastructure: 15 hosts
  • Services: APM, logs, infrastructure monitoring
  • Cost per host: ~$80/host/month (well above typical infrastructure monitoring costs)

Based on New Relic’s usage-based pricing model ($0.30/GB for data ingest + user seat costs), your estimated breakdown is likely:

| Component | Est. Monthly Cost | % of Total | Est. Volume/Users |
|---|---|---|---|
| Log ingestion | $400-500 | 33-42% | ~1,500 GB/month at $0.30/GB |
| APM (traces/spans) | $400-500 | 33-42% | 15 hosts with full APM enabled |
| Infrastructure monitoring | $150-200 | 13-17% | 15 hosts + custom metrics |
| User seats | $100-200 | 8-17% | 2-4 full platform users at $99-549/user |
| Total | $1,200 | 100% | - |
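
The log ingestion row can be checked with a short calculation. This is a minimal sketch that assumes the $0.30/GB rate quoted above and ignores any free ingest allowance your plan may include; the function names are illustrative, not part of any New Relic tooling.

```python
# Assumed per-GB ingest price from the breakdown above; check your own contract.
INGEST_PRICE_PER_GB = 0.30


def ingest_cost(gb_per_month: float, price_per_gb: float = INGEST_PRICE_PER_GB) -> float:
    """Monthly data-ingest cost in dollars for a given volume."""
    return round(gb_per_month * price_per_gb, 2)


def share_of_total(component_cost: float, total_cost: float) -> float:
    """Component cost as a percentage of the total bill."""
    return round(100 * component_cost / total_cost, 1)


if __name__ == "__main__":
    log_cost = ingest_cost(1500)
    print(log_cost, share_of_total(log_cost, 1200))
```

Swap in the actual ingest volume from your Usage dashboard to see where your bill sits within the estimated ranges.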

Phase 1: Immediate Quick Wins (Week 1, 3-4 hours effort)

Tactic 1: Log Filtering and Sampling (40-60% log cost reduction)

Add drop filters in New Relic to eliminate low-value logs:

# In New Relic: Logs → Data management → Drop filters
# Filter 1: Drop debug/trace logs (save ~30-40% log volume)
WHERE level IN ('DEBUG', 'TRACE')
ACTION: Drop
# Filter 2: Drop health check noise (save ~10-15% log volume)
WHERE request.uri IN ('/health', '/healthz', '/ping', '/metrics', '/ready', '/live')
ACTION: Drop
# Filter 3: Sample successful requests (keep 5%, save ~20-25% log volume)
WHERE http.statusCode >= 200 AND http.statusCode < 300 AND duration < 1000
ACTION: Sample at 5%
# Filter 4: Drop verbose cloud provider SDK logs (save ~5-10% log volume)
WHERE logger.name LIKE '%boto3%' OR logger.name LIKE '%aws-sdk%' OR logger.name LIKE '%azure-sdk%'
ACTION: Drop or Sample at 10%

Expected log reduction: 1,500 GB → 600-700 GB (53-60% reduction)
Estimated savings: $240-270/month
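
Before enabling the filters, it may help to estimate their effect on a sample of exported log records. The sketch below mirrors the four rules in plain Python. It assumes each record is a dict using the same attribute names as the filters above; `keep_log` and `kept_fraction` are hypothetical helpers, not New Relic APIs.

```python
import random

# Paths and logger-name fragments taken from the filters above.
NOISY_PATHS = {"/health", "/healthz", "/ping", "/metrics", "/ready", "/live"}
SDK_LOGGERS = ("boto3", "aws-sdk", "azure-sdk")


def keep_log(record: dict, rng: random.Random, sample_rate: float = 0.05) -> bool:
    """Return True if the record would survive all four filters."""
    # Filter 1: drop debug/trace logs.
    if record.get("level") in ("DEBUG", "TRACE"):
        return False
    # Filter 2: drop health check noise.
    if record.get("request.uri") in NOISY_PATHS:
        return False
    # Filter 4: drop verbose cloud SDK logs.
    logger_name = record.get("logger.name", "")
    if any(fragment in logger_name for fragment in SDK_LOGGERS):
        return False
    # Filter 3: sample fast, successful requests.
    status = record.get("http.statusCode")
    duration = record.get("duration")
    if status is not None and duration is not None:
        if 200 <= status < 300 and duration < 1000:
            return rng.random() < sample_rate
    return True


def kept_fraction(records: list, seed: int = 0) -> float:
    """Fraction of records that would still be ingested."""
    rng = random.Random(seed)
    kept = sum(keep_log(record, rng) for record in records)
    return kept / len(records)
```

Running `kept_fraction` over a day of exported logs gives a rough count-based estimate. Billing is by bytes, so weight by record size if your log lines vary widely in length.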

Tactic 2: Right-Size APM Coverage (30-50% APM cost reduction)

Audit which hosts actually need full APM:

# In New Relic: APM & Services → Service Map
# Identify and disable APM on:
1. Development environments (save ~$50-80/month)
2. Staging/test environments (save ~$50-80/month)
3. Internal tools/admin services (save ~$30-50/month)
4. Database replicas (monitor primary only, save ~$20-40/month)
# Keep full APM only on:
- Production application servers
- Critical API services
- Customer-facing web servers

Implementation:

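# newrelic-infra is the infrastructure agent: stopping it ends host monitoring, not APM.
# On each dev/staging host where host metrics are not needed:
sudo systemctl stop newrelic-infra
sudo systemctl disable newrelic-infra
# APM agents run inside the application process. To turn APM off, set this in the
# service environment and restart it (the variable name varies by language agent):
NEW_RELIC_ENABLED=false # For dev/staging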

Expected APM reduction: 15 hosts → 7-9 production hosts
Estimated savings: $150-200/month

Tactic 3: Reduce High-Cardinality Metrics (15-30% metrics cost reduction)

Identify and fix expensive custom metrics:

# In New Relic: Metrics explorer → Sort by cardinality
# Common high-cardinality culprits:
BAD: http.requests{user_id:*, session_id:*, request_id:*}
GOOD: http.requests{endpoint:/api/users, method:GET, status:200}
BAD: cache.operations{key:*} # Millions of unique keys
GOOD: cache.operations{operation:get, cache_name:redis-main}
BAD: background.job{job_id:*}
GOOD: background.job{job_type:email_worker, queue:default}

Code fix example (Python):

import newrelic.agent

# Before: one metric name per user, so millions of unique metric names
newrelic.agent.record_custom_metric(
    f'Custom/User/{user_id}/requests', 1
)

# After: aggregate by user tier, so only a handful of metric names
newrelic.agent.record_custom_metric(
    f'Custom/UserTier/{user.tier}/requests', 1
)

Expected savings: $50-80/month
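
To see why the BAD patterns are expensive, note that the worst-case number of time series is the product of the distinct values of each tag. The sketch below illustrates this with made-up tag counts (50,000 users, 20 endpoints, and so on). These numbers are assumptions for illustration, not measurements from your account.

```python
from math import prod


def series_count(distinct_values_per_tag: dict) -> int:
    """Upper bound on time series: product of distinct values across all tags."""
    return prod(distinct_values_per_tag.values())


if __name__ == "__main__":
    # Hypothetical tag cardinalities for the same metric, tagged two ways.
    bad = series_count({"user_id": 50_000, "endpoint": 20, "method": 4})
    good = series_count({"endpoint": 20, "method": 4, "status": 6})
    print(bad, good)
```

Any tag with unbounded values (user, session, request, or job IDs) dominates the product, which is why replacing it with a bounded attribute such as tier or job type has such a large effect.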

Phase 2: Application-Level Changes (Week 2, 4-6 hours effort)

Tactic 4: Reduce Log Verbosity at Source (20-40% additional log reduction)

Update application logging configuration:

Environment variables (fastest approach):

# Production
LOG_LEVEL=WARN # Instead of INFO or DEBUG
LOG_SAMPLE_RATE=0.05 # Sample INFO logs at 5% (applies only when LOG_LEVEL is INFO or lower)
NEW_RELIC_LOG_LEVEL=info # Reduce agent verbosity
# Staging
LOG_LEVEL=INFO
LOG_SAMPLE_RATE=0.2
# Development
LOG_LEVEL=DEBUG
LOG_SAMPLE_RATE=1.0
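
LOG_LEVEL and LOG_SAMPLE_RATE are not standard variables, so the application has to read them itself. One way a Python service might do that with the standard `logging` module is sketched below. The `InfoSampler` class, the `configure_logging` function, and the logger name `app` are all hypothetical.

```python
import logging
import os
import random
from typing import Optional


class InfoSampler(logging.Filter):
    """Pass WARNING and above unchanged; pass lower levels with probability `rate`."""

    def __init__(self, rate: float, rng: Optional[random.Random] = None):
        super().__init__()
        self.rate = rate
        self.rng = rng or random.Random()

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True
        return self.rng.random() < self.rate


def configure_logging() -> logging.Logger:
    """Build the app logger from LOG_LEVEL and LOG_SAMPLE_RATE (call once at startup)."""
    level = os.environ.get("LOG_LEVEL", "INFO").upper()
    rate = float(os.environ.get("LOG_SAMPLE_RATE", "1.0"))
    logger = logging.getLogger("app")
    logger.setLevel(level)  # the logging module accepts "WARN" as an alias for "WARNING"
    handler = logging.StreamHandler()
    handler.addFilter(InfoSampler(rate))
    logger.addHandler(handler)
    return logger
```

With LOG_LEVEL=WARN the logger discards INFO records before the filter sees them, so the sample rate only matters in environments where the level is INFO or DEBUG.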

Tactic 5: Optimize User Seat Allocation

Review your user assignments in New Relic:

# In New Relic: Account → User management
# Audit current allocation:
- Full Platform Users: 2-4 users at $99-549/user
- Core Users: Typically $49/user
- Basic Users: Free (view-only)
# Optimization:
- Keep only 1-2 Full Platform Users (senior engineers who configure monitoring)
- Downgrade to Core Users for most developers (can view APM, create basic queries)
- Use Basic Users for PMs, support staff, managers (dashboard viewing only)

Expected savings: $50-200/month (depending on current allocation)

Cost Projection After Optimization

| Optimization Phase | Est. Monthly Cost | Savings | Cumulative Savings |
|---|---|---|---|
| Current state | $1,200 | - | - |
| After Phase 1 (Week 1) | $750-800 | $400-450 | 33-38% |
| After Phase 2 (Week 2) | $600-650 | $150-200 | 46-50% |

Optimized annual cost: $7,200-7,800 (down from $14,400)
Total annual savings: $6,600-7,200 (46-50% reduction)
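
The annual figures follow directly from the monthly estimates. This small sketch reproduces them so you can substitute your own numbers; the function name is illustrative.

```python
CURRENT_MONTHLY = 1200  # current New Relic bill in dollars


def projection(optimized_monthly: int, current_monthly: int = CURRENT_MONTHLY):
    """Return (annual cost, annual savings, percent reduction) for a monthly target."""
    annual_cost = optimized_monthly * 12
    annual_savings = (current_monthly - optimized_monthly) * 12
    percent = round(100 * (current_monthly - optimized_monthly) / current_monthly)
    return annual_cost, annual_savings, percent


if __name__ == "__main__":
    for monthly in (600, 650):
        print(monthly, projection(monthly))
```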

Implementation Checklist

Week 1: Quick Wins (3-4 hours)

  • Add 4 log drop filters in New Relic Data management
  • Audit APM hosts, disable on dev/staging (save $150-200)
  • Identify top 5 high-cardinality metrics
  • Verify changes via New Relic Usage dashboard
  • Expected result: $1,200 → $750-800/month

Week 2: Deeper Changes (4-6 hours)

  • Update LOG_LEVEL=WARN in production
  • Deploy metric cardinality fixes
  • Optimize user seat allocation
  • Monitor for 5-7 days
  • Expected result: $750-800 → $600-650/month

Week 3: Validation (1 hour)

  • Check key dashboards for missing data
  • Verify alerts still firing correctly
  • Review New Relic bill estimate
  • Document changes for team

Rollback Procedures

If you lose critical visibility:

  1. Re-enable logs:

    • Disable drop filter in Data management
    • Restore LOG_LEVEL=INFO temporarily
  2. Re-enable APM:

    • Re-enable the agent on the host (start and enable newrelic-infra, or remove NEW_RELIC_ENABLED=false and restart the service)
    • Add service to APM configuration
  3. Restore metrics:

    • Revert metric tag changes
    • Redeploy application

What You Gain

  • $6,600-7,200/year savings (46-50% cost reduction)
  • Maintained visibility for critical production issues
  • Faster log queries (less indexed data)
  • Cleaner metrics (lower cardinality = better query performance)

What You Lose

  • Full dev/staging APM (use production APM for troubleshooting)
  • Verbose debug logs (use Live Tail when needed)
  • Per-user metrics (aggregate by cohort/feature instead)

Alternative: Migrate to Grafana Cloud

If $600-650/month still feels expensive for an 8-person team:

Grafana Cloud Free Tier includes:

  • 10K metrics series (vs. limited free tier)
  • 50GB logs/month
  • 50GB traces/month
  • 3 users
  • 14-day retention

For your scale:

  • Possibly $0/month on the free tier, but only if your log volume fits within the 50GB/month limit and 3 users are enough
  • Even paid tier: ~$50-100/month (vs. $600-650 optimized New Relic)

Migration effort: 1-2 days
Annual savings: $6,000-7,800 vs. optimized New Relic ($0-100/month instead of $600-650/month)