From Azure Sentinel Log Analytics Workspace to Data Lake - Why Now Is the Right Time
2026.04.04
If you run Microsoft Sentinel in production, you know the math. Firewall logs, DNS data, proxy traffic, network flows - they all push ingestion volume up, and costs follow. Everything lands in the Log Analytics Workspace whether you need it daily or once a quarter during a forensic investigation. With the Sentinel Data Lake Tier, Microsoft has fundamentally changed that equation. This is the architectural shift, the concrete migration path, and where the traps are.
One Tier, One Price, One Problem
Log Analytics Workspaces run on Azure Data Explorer (Kusto). Fast, powerful, excellent for real-time hunting and incident response. But every gigabyte costs the same - whether it’s critical identity telemetry or millions of DNS lookups that are 99.9% benign.
The typical cost distribution in a mid-sized SOC:
- 20-30% of the volume is security-critical data (endpoint, identity, threat intelligence)
- 70-80% of the volume is high-volume, low-value logs (firewall, proxy, DNS, netflow)
At pay-as-you-go pricing of roughly EUR 5.22/GB, that adds up to five- or six-figure annual bills fast - for data that rarely gets queried.
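To see how quickly that compounds, a back-of-the-envelope calculation (the daily volumes here are illustrative assumptions, not figures from a specific deployment):

```python
# Hedged sketch: annual Analytics Tier cost at the PAYG rate quoted above.
PAYG_EUR_PER_GB = 5.22

def annual_cost_eur(gb_per_day: float) -> float:
    """Annual ingestion cost at pay-as-you-go pricing."""
    return gb_per_day * 365 * PAYG_EUR_PER_GB

# Illustrative volumes for a mid-sized SOC
for gb_day in (50, 100, 250):
    print(f"{gb_day:>4} GB/day -> EUR {annual_cost_eur(gb_day):,.0f}/year")
```

Even 50 GB/day lands near EUR 95,000 a year; 250 GB/day is already close to half a million.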
The Dual-Tier Model
Since late 2025, Microsoft offers two tiers:
Analytics Tier (business as usual)
- Kusto-backed, full real-time performance
- Analytics rules, alerting, automated incident creation
- Entity behavior analytics, entity pages
- 90 days interactive retention included
- Price: ~EUR 5.22/GB (PAYG) or up to 52% discount with commitment tiers
Data Lake Tier (the new option)
- Microsoft Fabric / ADLS Gen2 as backend
- Up to 12 years retention
- 6:1 automatic compression (billing based on compressed size)
- KQL queries (interactive, async, jobs)
- Notebooks and Security Graph
- Price: ~EUR 0.05/GB ingestion + ~EUR 0.10/GB processing
Cost savings for data-lake-only ingestion sit north of 95% compared to the Analytics Tier.
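That figure follows directly from the list prices above. A quick sanity check (whether both Data Lake price components benefit from compressed-size billing is an assumption on my part):

```python
# Hedged sketch of the per-GB savings claim, using the article's list prices.
ANALYTICS_EUR_PER_GB = 5.22      # PAYG Analytics Tier
LAKE_EUR_PER_GB = 0.05 + 0.10    # Data Lake ingestion + processing
COMPRESSION = 6                  # 6:1, billed on compressed size per the article

savings_plain = 1 - LAKE_EUR_PER_GB / ANALYTICS_EUR_PER_GB
savings_compressed = 1 - (LAKE_EUR_PER_GB / COMPRESSION) / ANALYTICS_EUR_PER_GB

print(f"without compression: {savings_plain:.1%}")            # ~97.1%
print(f"with 6:1 compressed billing: {savings_compressed:.1%}")  # ~99.5%
```

Either way, the "north of 95%" claim holds on raw list prices alone.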
How the Pieces Fit Together
Data Sources
```
                      |
           +----------+-----------+
           |                      |
   High-Value Data         High-Volume Data
(Identity, Endpoint,      (Firewall, Proxy,
    Threat Intel)            DNS, Netflow)
           |                      |
           v                      v
+-------------------+   +-------------------+
|  Analytics Tier   |   |  Data Lake Tier   |
|  (Log Analytics)  |   |  (Fabric/ADLS)    |
|                   |   |                   |
|  - Real-time      |   |  - Long-term      |
|    detections     |   |    storage        |
|  - Alerting       |   |  - KQL jobs       |
|  - Hunting        |   |  - Notebooks      |
|  - Incidents      |   |  - Forensics      |
+---------+---------+   +-------------------+
          |
          v
 Auto-Mirroring (free)
          |
          v
+-------------------+
|     Data Lake     |
|  (long-term copy) |
+-------------------+
```
One detail that matters: data in the Analytics Tier gets automatically and freely mirrored to the Data Lake. You can set Analytics retention to 90 days and use the Data Lake for long-term storage - no double ingestion costs.
Federation: Querying Data You Don’t Own
As of April 2026, Data Federation is GA. It lets you query data sitting in external Microsoft Fabric, ADLS Gen2, or Azure Databricks directly from Sentinel via KQL - without copying it into Sentinel.
You only pay for query compute time. No ingestion, no storage.
This is particularly relevant for:
- Compliance data that must stay in specific storage accounts for regulatory reasons
- Historical data already sitting in an existing data lake
- Cross-team analytics where IT operations data needs to be correlated with security data
The Migration, Phase by Phase
Phase 1: Inventory (1-2 weeks)
Figure out which tables drive your costs. The Usage table in the workspace gives you the answer:
```kql
Usage
| where TimeGenerated > ago(30d)
| summarize TotalGB = sum(Quantity) / 1000 by DataType
| sort by TotalGB desc
| take 20
```
Classify your tables:
- Keep in Analytics Tier: tables with active analytics rules, entity correlations, or daily hunting use
- Move to Data Lake Tier: high-volume tables used primarily for compliance, forensics, or infrequent queries
Check your agent infrastructure: MMA/CLv1-based custom tables don’t get mirrored to the Data Lake. These need to be migrated to AMA/DCR-based collection first.
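The Phase 1 classification above is simple enough to script once the Usage numbers are exported. A minimal sketch - table names, volumes, and flags are made-up examples, not recommendations:

```python
# Illustrative triage helper: apply the Phase 1 classification criteria.
from dataclasses import dataclass

@dataclass
class TableInfo:
    name: str
    gb_per_month: float
    has_analytics_rules: bool
    daily_hunting_use: bool

def classify(t: TableInfo) -> str:
    # Anything feeding detections or daily hunting stays in the Analytics Tier;
    # high-volume, rarely queried tables go to the Data Lake Tier.
    if t.has_analytics_rules or t.daily_hunting_use:
        return "Analytics Tier"
    return "Data Lake Tier"

tables = [
    TableInfo("SigninLogs", 120, True, True),
    TableInfo("CommonSecurityLog", 4800, False, False),
    TableInfo("DnsEvents", 2100, False, False),
]
for t in tables:
    print(f"{t.name}: {classify(t)}")
```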
Phase 2: Data Lake Onboarding (1 week)
- Prerequisite: Sentinel must be running in the Defender Portal (automatic for new workspaces since July 2025)
- Activate the Data Lake in the Sentinel Settings blade
- Define retention policies: Analytics Tier at 90 days, Data Lake up to 12 years depending on compliance requirements
Phase 3: Reroute Data (2-4 weeks)
For each identified high-volume table:
- Adjust the DCR (Data Collection Rule): switch routing target to Data Lake Tier
- Validate existing KQL queries - not all KQL operators work in the Data Lake. Specifically:
  - `ingestion_time()` is not supported
  - User-defined functions are not available
  - External URL calls don't work
  - Interactive queries are capped at 500,000 rows
- Run large historical queries as KQL jobs, not interactive queries
- Set up monitoring for the transition period
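The query-validation step can be partially automated. This is a purely lexical sketch, not a real KQL parser - and treating the externaldata operator as the "external URL call" case is my assumption:

```python
# Rough pre-flight check: flag saved KQL queries that use features the
# Data Lake Tier doesn't support. Lexical matching only; user-defined
# workspace functions can't be detected this way and need a manual review.
import re

UNSUPPORTED = {
    r"\bingestion_time\s*\(": "ingestion_time() is not supported",
    r"\bexternaldata\b": "external URL calls are not available",  # assumption
}

def flag_issues(query: str) -> list[str]:
    return [msg for pattern, msg in UNSUPPORTED.items() if re.search(pattern, query)]

q = "CommonSecurityLog | where ingestion_time() > ago(1d)"
print(flag_issues(q))  # ['ingestion_time() is not supported']
```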
Phase 4: Optimization (ongoing)
- Set up federation for external data sources
- Evaluate commitment tiers: for remaining Analytics Tier data, commitment tiers (100 GB/day to 50,000 GB/day) can cut costs by up to 52%
- Optimize query patterns: in the Data Lake, keep time windows tight, use `project` for column selection, and run large queries as jobs
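The commitment-tier evaluation boils down to a break-even check: you pay for the full tier whether you fill it or not, so a tier pays off once actual volume exceeds its discounted size. A sketch with placeholder discounts - the article only states the maximum of 52%, so look up the current pricing page for real per-tier figures:

```python
# Break-even sketch for commitment tiers. Discount values are placeholders,
# NOT actual Microsoft pricing.
PAYG = 5.22  # EUR/GB

def breakeven_gb_per_day(tier_gb: float, discount: float) -> float:
    # The commitment beats PAYG once daily volume exceeds the tier size
    # scaled by the discount (you pay tier_gb * PAYG * (1 - discount) per day).
    return tier_gb * (1 - discount)

for tier, disc in [(100, 0.15), (500, 0.30), (5000, 0.52)]:
    print(f"{tier} GB/day tier at {disc:.0%} off: worthwhile above "
          f"~{breakeven_gb_per_day(tier, disc):.0f} GB/day")
```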
What Can’t Move
Not everything belongs in the Data Lake. These features require the Analytics Tier:
| Feature | Reason |
|---|---|
| Analytics Rules (Scheduled + NRT) | Detections only fire on Analytics Tier data |
| Automated Incident Creation | Depends on analytics rules |
| Entity Behavior Analytics (UEBA) | Requires real-time data in the workspace |
| Entity Pages | Access Analytics Tier tables |
| Real-time Hunting | Performance requirement |
The rule of thumb: anything that needs real-time detection, correlation, or alerting stays in the Analytics Tier. Anything primarily serving long-term retention, compliance, or retrospective analysis goes to the Data Lake.
KQL Across Both Tiers
Analytics Tier - standard KQL
All familiar KQL queries work without restrictions. A typical hunting query for suspicious sign-in activity:
```kql
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType != "0"
| summarize
    FailedAttempts = count(),
    DistinctIPs = dcount(IPAddress),
    IPList = make_set(IPAddress, 10)
    by UserPrincipalName
| where FailedAttempts > 20 and DistinctIPs > 3
| sort by FailedAttempts desc
```
Data Lake Tier - KQL with constraints
The key rules: always set explicit and narrow time windows, use project for column selection, and fall back to KQL jobs for large datasets.
Forensic analysis of firewall logs over a longer period:
```kql
// Data Lake Tier - firewall forensics
CommonSecurityLog
| where TimeGenerated between (datetime(2025-12-01) .. datetime(2026-01-01))
| where DeviceVendor == "Palo Alto Networks"
| where DestinationIP == "203.0.113.42"
| project TimeGenerated, SourceIP, DestinationPort, Activity, DeviceAction
| sort by TimeGenerated asc
```
KQL Jobs for large queries
Interactive queries in the Data Lake are capped at 500,000 rows. For bigger analyses, KQL search jobs run asynchronously in the background and write results to a new table:
```kql
// Search job - DNS analysis over 90 days
.create async search DnsAnalysis90d <|
    DnsEvents
    | where TimeGenerated > ago(90d)
    | where Name has_any ("malware", "c2", "exfil")
    | summarize QueryCount = count() by Name, ClientIP, bin(TimeGenerated, 1d)
```
The results can then be queried from the generated table - with full Analytics Tier performance.
Running the Numbers
Assuming an organization ingests 500 GB/day into Sentinel:
| Scenario | Configuration | Monthly Cost (approx.) |
|---|---|---|
| Current state | 500 GB/day Analytics Tier (PAYG) | ~EUR 78,300 |
| With commitment | 500 GB/day Analytics Tier (commitment) | ~EUR 37,600 |
| Hybrid optimized | 150 GB/day Analytics + 350 GB/day Data Lake | ~EUR 25,200 |
The hybrid model saves over 67% compared to PAYG and still 33% compared to pure commitment tiers.
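The table's figures can be reproduced with a small model. Assumptions on my part: a 30-day month, the hybrid row's Analytics share billed at PAYG, and Data Lake billing on uncompressed volume - with those, the numbers land within rounding of the table:

```python
# Reconstructing the scenario table from the article's list prices.
DAYS = 30
PAYG = 5.22                 # EUR/GB, Analytics Tier pay-as-you-go
COMMIT_DISCOUNT = 0.52      # best-case commitment discount from the article
LAKE = 0.05 + 0.10          # EUR/GB, Data Lake ingestion + processing

current = 500 * DAYS * PAYG                      # ~78,300
committed = current * (1 - COMMIT_DISCOUNT)      # ~37,584
hybrid = 150 * DAYS * PAYG + 350 * DAYS * LAKE   # ~25,065

print(f"current:   EUR {current:,.0f}")
print(f"committed: EUR {committed:,.0f}")
print(f"hybrid:    EUR {hybrid:,.0f} "
      f"(-{1 - hybrid / current:.0%} vs PAYG, "
      f"-{1 - hybrid / committed:.0%} vs commitment)")
```

Notably, the hybrid row is dominated by the remaining Analytics share - the 350 GB/day in the Data Lake contributes only about EUR 1,600 of the monthly total, so combining the hybrid split with a commitment tier for the Analytics remainder would cut costs further still.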
The Fine Print
Customer-Managed Keys (CMK)
If the workspace uses CMK, Data Lake features are currently not available. That’s a showstopper for regulated environments that mandate CMK. You’ll have to wait for Microsoft’s roadmap here.
Basic Logs
Basic Logs can’t be routed directly to the Data Lake. They must be converted to Analytics Tier first. If you’re currently using Basic Logs for cheap ingestion, plan the migration carefully.
Portal Transition
Microsoft has set March 31, 2027 as the sunset date for Sentinel management in the Azure Portal. After that, everything moves to the Defender Portal. If you’re planning the Data Lake migration, factor in the portal switch at the same time.
API Deprecation
Older API versions for Sentinel Repositories will be deprecated on June 15, 2026. Automation, infrastructure-as-code, and CI/CD pipelines need to be updated ahead of time.
What I’d Do Monday Morning
The Sentinel Data Lake Tier is the biggest change to Sentinel’s cost model since launch. For most organizations, this isn’t a question of if but when.
Start with the inventory and table classification now. Plan a hybrid model as the target architecture - not everything needs to move. Evaluate federation for existing data lake infrastructure. Model the budget impact and use the savings as the business case for the migration itself.
The technology is mature enough for production use. Acting now doesn’t just save costs - it builds the foundation for a SIEM architecture that grows with Microsoft Fabric, Security Copilot, and the Unified Security Operations Platform.