All examples for Observability Cost Optimizer

Datadog Log Cost Reduction

Optimize Datadog log costs from $400/month to $140/month through exclusion filters, agent-level filtering, and retention optimization with copy-paste configurations.

Input

What you provide to the skill

Our Datadog logs cost $400/month on 200GB. How can we cut this in half?

Output

What the skill generates for you

Log Cost Reduction Strategy for Datadog (200GB, $400/month)

Current Situation Analysis

  • Log ingestion: 200GB/month
  • Cost: $400/month (~$2.00/GB blended rate)
  • Target: $200/month (100GB or less)
  • Required reduction: 50%

Based on Datadog’s 2026 pricing model:

  • Ingest cost: $0.10 per GB (collect, process, archive)
  • Index cost: $1.70 per million log events (15-day retention)
  • Your $2.00/GB blended rate indicates you’re paying both ingestion AND indexing costs
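These rates can be sanity-checked against the blended figure with a quick back-of-envelope sketch. The ~1KB average event size (so 1GB ≈ 1 million events) is an assumption, not a Datadog figure:

```python
# Back-of-envelope check of the ~$2.00/GB blended rate.
# Assumption (not from Datadog): average event size ~1KB, so 1GB ~= 1M events.

INGEST_PER_GB = 0.10       # $ per GB ingested (collect, process, archive)
INDEX_PER_M_EVENTS = 1.70  # $ per million indexed events (15-day retention)
EVENTS_PER_GB_M = 1.0      # millions of events per GB at ~1KB/event

def blended_rate(indexed_fraction):
    """Approximate $/GB when `indexed_fraction` of ingested logs is also indexed."""
    return INGEST_PER_GB + INDEX_PER_M_EVENTS * EVENTS_PER_GB_M * indexed_fraction

print(blended_rate(1.0))   # ~1.80 $/GB when everything is indexed,
                           # in the ballpark of the observed $2.00/GB
```

Indexing everything lands near the observed blended rate, which is why the tactics below focus on indexing less, not just ingesting less.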

Cost Audit Findings

Based on typical Datadog usage patterns, your 200GB likely breaks down as:

| Log Source | Est. Volume | % of Total | Usually Needed? |
|---|---|---|---|
| Application logs (INFO level) | 60GB | 30% | Partially |
| Web server access logs (200 OK) | 70GB | 35% | Only errors |
| Debug/trace logs | 30GB | 15% | Rarely |
| Health check requests | 20GB | 10% | No |
| Infrastructure logs | 20GB | 10% | Yes |

Optimization Strategy (Prioritized by Impact)

Tactic 1: Exclusion Filters (40-50% reduction, 30 min setup)

Add these filters in Datadog → Logs → Configuration → Indexes:

Filter 1: Drop debug/trace logs

Query: @level:debug OR @level:trace
Action: Exclusion filter with 100% sampling
Estimated reduction: 30GB (15%)

Filter 2: Drop health check spam

Query: @http.url_details.path:/health* OR @http.url_details.path:/ping OR @http.url_details.path:/metrics
Action: Exclusion filter with 100% sampling
Estimated reduction: 20GB (10%)

Filter 3: Sample successful requests (keep 10%)

Query: @http.status_code:[200 TO 299] NOT @duration:>1000
Action: Exclusion filter with 90% sampling (keeps 10%)
Estimated reduction: 63GB (32%)

Filter 4: Drop verbose third-party library logs

Query: @logger_name:boto3 OR @logger_name:aws-sdk OR @logger_name:urllib3
Action: Exclusion filter with 95% sampling
Estimated reduction: 10GB (5%)

Total reduction from filters: 123GB → 77GB remaining (61% reduction)
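The arithmetic behind that total, using the audit estimates above, can be sketched as:

```python
# Estimated monthly volume removed by each exclusion filter (GB).
filter_reduction_gb = {
    "drop debug/trace": 30,
    "drop health checks": 20,
    "sample 2xx requests (keep 10%)": 63,   # 90% of 70GB of access logs
    "drop verbose third-party libs": 10,    # 95% of ~10.5GB
}

total_gb = 200
removed_gb = sum(filter_reduction_gb.values())
remaining_gb = total_gb - removed_gb
print(removed_gb, remaining_gb)  # 123 GB removed, 77 GB remaining (~61%)
```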

Tactic 2: Agent-Level Filtering (Additional 10-15% reduction, 1 hour)

Filter logs before they reach Datadog to avoid ingestion costs entirely. Configure your Datadog Agent with log_processing_rules:

Example configuration (datadog.yaml or service config):

logs:
  - type: file
    path: /var/log/application/*.log
    service: your-service
    source: python
    log_processing_rules:
      # Exclude health checks completely
      - type: exclude_at_match
        name: exclude_healthchecks
        pattern: /health|/ping|/metrics
      # Exclude debug and trace logs
      - type: exclude_at_match
        name: exclude_debug
        pattern: '"level":"debug"|"level":"trace"'
      # Exclude successful requests with fast response times (<1000ms)
      - type: exclude_at_match
        name: exclude_fast_success
        pattern: '"status":200.*"duration":[0-9]{1,3}\b'

Expected reduction: 10-15GB additional savings (77GB → 62-67GB)
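Since exclude_at_match rules are plain regexes, they can be smoke-tested locally before shipping to the Agent. A sketch with made-up sample log lines, using patterns equivalent to those in the config above:

```python
import re

# Regexes equivalent to the log_processing_rules patterns above.
patterns = {
    "exclude_healthchecks": r"/health|/ping|/metrics",
    "exclude_debug": r'"level":"debug"|"level":"trace"',
    "exclude_fast_success": r'"status":200.*"duration":[0-9]{1,3}\b',
}

def is_excluded(line):
    """True if any exclusion rule would drop this log line."""
    return any(re.search(p, line) for p in patterns.values())

samples = [
    '{"status":200,"path":"/health","duration":2}',         # dropped: health check
    '{"level":"debug","msg":"cache miss"}',                 # dropped: debug
    '{"status":200,"path":"/api/orders","duration":85}',    # dropped: fast 2xx
    '{"status":200,"path":"/api/orders","duration":1450}',  # kept: slow request
    '{"status":500,"path":"/api/orders","duration":12}',    # kept: error
]

for line in samples:
    print(is_excluded(line), line)
```

Note that the fast-success pattern only matches 1-3 digit durations, so slow requests (1000ms and up) are kept, which preserves the performance signal.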

Tactic 3: Use Archives + Rehydration (Cost optimization without data loss)

Keep all logs but only index what you need for active searching:

  1. Enable archiving to S3/GCS/Azure (included with $0.10/GB ingestion)
  2. Index only critical logs (errors, warnings, slow requests)
  3. Rehydrate on-demand when investigating issues ($0.10/GB scan)
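To gauge what on-demand rehydration would actually cost during an incident, a rough sketch (assumes the $0.10/GB scan rate above and evenly spread daily volume):

```python
# Rough rehydration cost for investigating one day of logs.
REHYDRATE_PER_GB = 0.10  # $ per GB scanned on rehydration
monthly_gb = 200
daily_gb = monthly_gb / 30

one_day_cost = round(daily_gb * REHYDRATE_PER_GB, 2)
print(one_day_cost)  # ~$0.67 to rehydrate a full day of logs
```

At well under a dollar per incident-day, occasional rehydration is far cheaper than indexing everything up front.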

Configuration:

  • Go to Logs → Configuration → Archives
  • Set up archive to your cloud storage (S3/GCS/Azure)
  • Use exclusion filters to prevent indexing low-value logs
  • Logs remain queryable via Live Tail and can be rehydrated when needed

Cost impact: Save on indexing costs ($1.70 per million events) while keeping full archive

Cost Projection After Optimization

| Optimization Phase | Log Volume | Monthly Cost | Savings |
|---|---|---|---|
| Current state | 200GB | $400 | - |
| After Tactic 1 (exclusion filters) | 77GB | $154 | $246 (62%) |
| After Tactic 2 (agent filtering) | 65GB | $130 | $270 (68%) |
| After Tactic 3 (index optimization) | 200GB ingested, 50GB indexed | $95* | $305 (76%) |

*Tactic 3 calculation: (200GB × $0.10/GB ingest = $20) + (50GB indexed ≈ 44 million events at ~1.1KB/event, × $1.70/million ≈ $75) = ~$95/month
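The projection can be reproduced with the same assumptions ($0.10/GB ingest, $1.70 per million indexed events, ~44M events in 50GB at ~1.1KB/event):

```python
INGEST_PER_GB = 0.10       # $ per GB ingested
INDEX_PER_M_EVENTS = 1.70  # $ per million indexed events
FLAT_RATE_PER_GB = 2.00    # current blended ingest+index rate

def flat_cost(gb):
    """Monthly cost while everything ingested is also indexed."""
    return gb * FLAT_RATE_PER_GB

def split_cost(ingested_gb, indexed_events_m):
    """Monthly cost once ingestion and indexing are billed separately."""
    return ingested_gb * INGEST_PER_GB + indexed_events_m * INDEX_PER_M_EVENTS

print(flat_cost(200))       # current state: $400
print(flat_cost(77))        # after Tactic 1: $154
print(flat_cost(65))        # after Tactic 2: $130
print(split_cost(200, 44))  # after Tactic 3: ~$95 (44M events ~= 50GB indexed)
```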

Recommended approach: Combine Tactic 1 + Tactic 2 + Tactic 3

  • Result: $95-130/month (67-76% reduction)
  • Annual savings: $3,240-3,660

Implementation Plan

Week 1: Add Exclusion Filters (30 minutes)

  1. Go to Datadog → Logs → Configuration → Indexes
  2. For each index, click Add exclusion filter
  3. Add the 4 exclusion filters from Tactic 1 above
  4. Monitor daily log volume: Logs → Usage
  5. Verify critical logs still arriving: Check key dashboards and monitors
  6. Expected result: 200GB → 77GB

Week 2: Configure Agent-Level Filtering (1 hour)

  1. Identify your Datadog Agent configuration location
    • Docker: Update container environment or datadog.yaml
    • Kubernetes: Update ConfigMap or pod annotations
    • Host-based: Edit /etc/datadog-agent/datadog.yaml
  2. Add log_processing_rules from Tactic 2 above
  3. Restart Datadog Agent
  4. Monitor for 3-5 days to ensure no critical logs are dropped
  5. Expected result: 77GB → 65GB

Week 3: Enable Archiving + Optimize Indexing (1 hour)

  1. Set up Logs → Configuration → Archives
    • Choose your cloud storage (S3/GCS/Azure)
    • Configure bucket and credentials
  2. Review exclusion filters to maximize non-indexed but archived logs
  3. Test rehydration with a sample query
  4. Expected result: Full archive, reduced indexing costs

Monitoring Your Progress

Create a Datadog dashboard to track optimization:

Metric 1: Daily log ingestion volume

sum:datadog.estimated_usage.logs.ingested_bytes{*} by {service}

Metric 2: Estimated monthly cost

sum:datadog.estimated_usage.logs.ingested_bytes{*}.rollup(sum, 2592000) / 1e9 * 2
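Metric 2's formula is just bytes → GB → dollars; the same conversion in code (the $2/GB blended rate is this account's observed rate, not a universal constant, and should be lowered as the tactics take effect):

```python
def est_monthly_cost(ingested_bytes_30d, blended_rate_per_gb=2.0):
    """Mirror of the dashboard formula: bytes over a 30-day (2,592,000s)
    rollup, divided by 1e9 to get GB, times the blended $/GB rate."""
    return ingested_bytes_30d / 1e9 * blended_rate_per_gb

print(est_monthly_cost(200e9))  # 200GB of ingestion -> $400
```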

Metric 3: Top log sources (identify volume hogs)

Top list: datadog.estimated_usage.logs.ingested_bytes by service

Metric 4: Indexed vs ingested logs

sum:datadog.estimated_usage.logs.ingested_events{*}
sum:datadog.estimated_usage.logs.indexed_events{*}

Rollback Procedures

If you lose critical visibility:

Rollback exclusion filters:

  1. Go to Logs → Configuration → Indexes
  2. Toggle filter to disabled (don’t delete immediately)
  3. Wait 5 minutes for logs to flow
  4. Re-evaluate filter query to be more specific

Rollback agent-level filtering:

# Remove or comment out log_processing_rules
# Restart Datadog Agent
sudo systemctl restart datadog-agent

What NOT to filter out:

  • Error logs (4xx, 5xx responses)
  • Security events (auth failures, suspicious activity)
  • Business-critical events (purchases, signups, payments)
  • Slow performance indicators (requests >1s duration)
  • Exceptions and stack traces

What You Gain

  • $3,240-3,660/year savings (67-76% cost reduction)
  • Faster log queries (less indexed data = faster search)
  • Same critical visibility (errors, security, business events preserved)
  • Full log history (via archives for compliance/forensics)
  • Cleaner dashboards (less noise, easier to spot real issues)

What You Lose

  • Full request tracing (sampled 10% instead of 100% for successful requests)
  • Debug logs in production (use Live Tail when needed, or rehydrate from archives)
  • Immediate searchability of all logs (non-indexed logs require rehydration)

Alternative: Migrate to Grafana Cloud

If Datadog remains expensive after optimization, consider Grafana Cloud:

| Feature | Datadog (Optimized) | Grafana Cloud Free | Grafana Cloud Pro |
|---|---|---|---|
| Logs (70GB/month) | $140 | $0 (50GB free) | (70GB - 50GB included) × $0.50/GB = $10 |
| Metrics | Included | $0 (free tier) | $0 (included) |
| APM traces | Separate cost | $0 (50GB free) | $0.50/GB |
| Retention | 15 days | 14 days | 30 days |
| Total monthly | $140 | $0-10 | $10-35 |

Migration effort: 1-2 days for basic setup
Annual savings: $1,560-1,680 vs optimized Datadog
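The savings figures follow directly from the comparison table. A sketch of the arithmetic (the 50GB Pro allowance and $0.50/GB overage are assumptions worth verifying against current Grafana Cloud pricing):

```python
# Annual log-cost comparison, figures from the table above.
datadog_optimized_monthly = 140.0  # Tactics 1+2, ~70GB at ~$2/GB
grafana_free_monthly = 0.0         # within the assumed 50GB free tier
# Assumed Pro allowance: 50GB included, $0.50/GB overage on 70GB/month.
grafana_pro_monthly = max(0, 70 - 50) * 0.50

savings_vs_pro = (datadog_optimized_monthly - grafana_pro_monthly) * 12
savings_vs_free = (datadog_optimized_monthly - grafana_free_monthly) * 12
print(savings_vs_pro, savings_vs_free)  # $1,560 and $1,680 per year
```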