From Azure Sentinel Log Analytics Workspace to Data Lake - Why Now Is the Right Time
2026.04.04
If you run Microsoft Sentinel in production, you know the math. Firewall logs, DNS data, proxy traffic, network flows - they all push ingestion volume up, and costs follow. Everything lands in the Log Analytics Workspace whether you need it daily or once a quarter during a forensic investigation. With the Sentinel Data Lake Tier, Microsoft has fundamentally changed that equation. This is the architectural shift, the concrete migration path, and where the traps are.
One Tier, One Price, One Problem
Log Analytics Workspaces run on Azure Data Explorer (Kusto). Fast, powerful, excellent for real-time hunting and incident response. But every gigabyte costs the same - whether it’s critical identity telemetry or millions of DNS lookups that are 99.9% benign.
The typical cost distribution in a mid-sized SOC:
- 20-30% of the volume is security-critical data (endpoint, identity, threat intelligence)
- 70-80% of the volume is high-volume, low-value logs (firewall, proxy, DNS, netflow)
At pay-as-you-go pricing of roughly EUR 5.22/GB, that adds up to five- or six-figure annual bills fast - for data that rarely gets queried.
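To see how quickly that compounds, a back-of-the-envelope calculation (the daily volumes here are illustrative assumptions, not figures from a specific deployment):

```python
# Hedged sketch: annual Analytics Tier cost at the PAYG rate quoted above.
PAYG_EUR_PER_GB = 5.22

def annual_cost_eur(gb_per_day: float) -> float:
    """Annual ingestion cost at pay-as-you-go pricing."""
    return gb_per_day * 365 * PAYG_EUR_PER_GB

# Illustrative volumes for a mid-sized SOC
for gb_day in (50, 100, 250):
    print(f"{gb_day:>4} GB/day -> EUR {annual_cost_eur(gb_day):,.0f}/year")
```

Even 50 GB/day lands near EUR 95,000 a year; 250 GB/day is already close to half a million.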
The Dual-Tier Model
Since late 2025, Microsoft offers two tiers:
Analytics Tier (business as usual)
- Kusto-backed, full real-time performance
- Analytics rules, alerting, automated incident creation
- Entity behavior analytics, entity pages
- 90 days interactive retention included
- Price: ~EUR 5.22/GB (PAYG) or up to 52% discount with commitment tiers
Data Lake Tier (the new option)
- Microsoft Fabric / ADLS Gen2 as backend
- Up to 12 years retention
- 6:1 automatic compression (billing based on compressed size)
- KQL queries (interactive, async, jobs)
- Notebooks and Security Graph
- Price: ~EUR 0.05/GB ingestion + ~EUR 0.10/GB processing
Cost savings for data-lake-only ingestion sit north of 95% compared to the Analytics Tier.
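That figure follows directly from the list prices above. A quick sanity check (whether both Data Lake price components benefit from compressed-size billing is an assumption on my part):

```python
# Hedged sketch of the per-GB savings claim, using the article's list prices.
ANALYTICS_EUR_PER_GB = 5.22      # PAYG Analytics Tier
LAKE_EUR_PER_GB = 0.05 + 0.10    # Data Lake ingestion + processing
COMPRESSION = 6                  # 6:1, billed on compressed size per the article

savings_plain = 1 - LAKE_EUR_PER_GB / ANALYTICS_EUR_PER_GB
savings_compressed = 1 - (LAKE_EUR_PER_GB / COMPRESSION) / ANALYTICS_EUR_PER_GB

print(f"without compression: {savings_plain:.1%}")            # ~97.1%
print(f"with 6:1 compressed billing: {savings_compressed:.1%}")  # ~99.5%
```

Either way, the "north of 95%" claim holds on raw list prices alone.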
How the Pieces Fit Together
Data Sources
```
                      |
           +----------+-----------+
           |                      |
   High-Value Data         High-Volume Data
(Identity, Endpoint,      (Firewall, Proxy,
    Threat Intel)            DNS, Netflow)
           |                      |
           v                      v
+-------------------+   +-------------------+
|  Analytics Tier   |   |  Data Lake Tier   |
|  (Log Analytics)  |   |  (Fabric/ADLS)    |
|                   |   |                   |
|  - Real-time      |   |  - Long-term      |
|    detections     |   |    storage        |
|  - Alerting       |   |  - KQL jobs       |
|  - Hunting        |   |  - Notebooks      |
|  - Incidents      |   |  - Forensics      |
+---------+---------+   +-------------------+
          |
          v
 Auto-Mirroring (free)
          |
          v
+-------------------+
|     Data Lake     |
|  (long-term copy) |
+-------------------+
```
One detail that matters: data in the Analytics Tier gets automatically and freely mirrored to the Data Lake. You can set Analytics retention to 90 days and use the Data Lake for long-term storage - no double ingestion costs.
Federation: Querying Data You Don’t Own
As of April 2026, Data Federation is GA. It lets you query data sitting in external Microsoft Fabric, ADLS Gen2, or Azure Databricks directly from Sentinel via KQL - without copying it into Sentinel.
You only pay for query compute time. No ingestion, no storage.
This is particularly relevant for:
- Compliance data that must stay in specific storage accounts for regulatory reasons
- Historical data already sitting in an existing data lake
- Cross-team analytics where IT operations data needs to be correlated with security data
The Migration, Phase by Phase
Phase 1: Inventory (1-2 weeks)
Figure out which tables drive your costs. The Usage table in the workspace gives you the answer:
```kql
Usage
| where TimeGenerated > ago(30d)
| summarize TotalGB = sum(Quantity) / 1000 by DataType
| sort by TotalGB desc
| take 20
```
Classify your tables:
- Keep in Analytics Tier: tables with active analytics rules, entity correlations, or daily hunting use
- Move to Data Lake Tier: high-volume tables used primarily for compliance, forensics, or infrequent queries
Check your agent infrastructure: MMA/CLv1-based custom tables don’t get mirrored to the Data Lake. These need to be migrated to AMA/DCR-based collection first.
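The Phase 1 classification above is simple enough to script once the Usage numbers are exported. A minimal sketch - table names, volumes, and flags are made-up examples, not recommendations:

```python
# Illustrative triage helper: apply the Phase 1 classification criteria.
from dataclasses import dataclass

@dataclass
class TableInfo:
    name: str
    gb_per_month: float
    has_analytics_rules: bool
    daily_hunting_use: bool

def classify(t: TableInfo) -> str:
    # Anything feeding detections or daily hunting stays in the Analytics Tier;
    # high-volume, rarely queried tables go to the Data Lake Tier.
    if t.has_analytics_rules or t.daily_hunting_use:
        return "Analytics Tier"
    return "Data Lake Tier"

tables = [
    TableInfo("SigninLogs", 120, True, True),
    TableInfo("CommonSecurityLog", 4800, False, False),
    TableInfo("DnsEvents", 2100, False, False),
]
for t in tables:
    print(f"{t.name}: {classify(t)}")
```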
Phase 2: Data Lake Onboarding (1 week)
- Prerequisite: Sentinel must be running in the Defender Portal (automatic for new workspaces since July 2025)
- Activate the Data Lake in the Sentinel Settings blade
- Define retention policies: Analytics Tier at 90 days, Data Lake up to 12 years depending on compliance requirements
Phase 3: Reroute Data (2-4 weeks)
For each identified high-volume table:
- Adjust the DCR (Data Collection Rule): switch routing target to Data Lake Tier
- Validate existing KQL queries - not all KQL operators work in the Data Lake. Specifically:
  - `ingestion_time()` is not supported
  - User-defined functions are not available
  - External URL calls don't work
  - Interactive queries are capped at 500,000 rows
- Run large historical queries as KQL jobs, not interactive queries
- Set up monitoring for the transition period
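The query-validation step can be partially automated. This is a purely lexical sketch, not a real KQL parser - and treating the externaldata operator as the "external URL call" case is my assumption:

```python
# Rough pre-flight check: flag saved KQL queries that use features the
# Data Lake Tier doesn't support. Lexical matching only; user-defined
# workspace functions can't be detected this way and need a manual review.
import re

UNSUPPORTED = {
    r"\bingestion_time\s*\(": "ingestion_time() is not supported",
    r"\bexternaldata\b": "external URL calls are not available",  # assumption
}

def flag_issues(query: str) -> list[str]:
    return [msg for pattern, msg in UNSUPPORTED.items() if re.search(pattern, query)]

q = "CommonSecurityLog | where ingestion_time() > ago(1d)"
print(flag_issues(q))  # ['ingestion_time() is not supported']
```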
Phase 4: Optimization (ongoing)
- Set up federation for external data sources
- Evaluate commitment tiers: for remaining Analytics Tier data, commitment tiers (100 GB/day to 50,000 GB/day) can cut costs by up to 52%
- Optimize query patterns: in the Data Lake, keep time windows tight, use `project` for column selection, and run large queries as jobs
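The commitment-tier evaluation boils down to a break-even check: you pay for the full tier whether you fill it or not, so a tier pays off once actual volume exceeds its discounted size. A sketch with placeholder discounts - the article only states the maximum of 52%, so look up the current pricing page for real per-tier figures:

```python
# Break-even sketch for commitment tiers. Discount values are placeholders,
# NOT actual Microsoft pricing.
PAYG = 5.22  # EUR/GB

def breakeven_gb_per_day(tier_gb: float, discount: float) -> float:
    # The commitment beats PAYG once daily volume exceeds the tier size
    # scaled by the discount (you pay tier_gb * PAYG * (1 - discount) per day).
    return tier_gb * (1 - discount)

for tier, disc in [(100, 0.15), (500, 0.30), (5000, 0.52)]:
    print(f"{tier} GB/day tier at {disc:.0%} off: worthwhile above "
          f"~{breakeven_gb_per_day(tier, disc):.0f} GB/day")
```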
What Can’t Move
Not everything belongs in the Data Lake. These features require the Analytics Tier:
| Feature | Reason |
|---|---|
| Analytics Rules (Scheduled + NRT) | Detections only fire on Analytics Tier data |
| Automated Incident Creation | Depends on analytics rules |
| Entity Behavior Analytics (UEBA) | Requires real-time data in the workspace |
| Entity Pages | Access Analytics Tier tables |
| Real-time Hunting | Performance requirement |
The rule of thumb: anything that needs real-time detection, correlation, or alerting stays in the Analytics Tier. Anything primarily serving long-term retention, compliance, or retrospective analysis goes to the Data Lake.
KQL Across Both Tiers
Analytics Tier - standard KQL
All familiar KQL queries work without restrictions. A typical hunting query for suspicious sign-in activity:
```kql
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType != "0"
| summarize
    FailedAttempts = count(),
    DistinctIPs = dcount(IPAddress),
    IPList = make_set(IPAddress, 10)
    by UserPrincipalName
| where FailedAttempts > 20 and DistinctIPs > 3
| sort by FailedAttempts desc
```
Data Lake Tier - KQL with constraints
The key rules: always set explicit and narrow time windows, use project for column selection, and fall back to KQL jobs for large datasets.
Forensic analysis of firewall logs over a longer period:
```kql
// Data Lake Tier - firewall forensics
CommonSecurityLog
| where TimeGenerated between (datetime(2025-12-01) .. datetime(2026-01-01))
| where DeviceVendor == "Palo Alto Networks"
| where DestinationIP == "203.0.113.42"
| project TimeGenerated, SourceIP, DestinationPort, Activity, DeviceAction
| sort by TimeGenerated asc
```
KQL Jobs for large queries
Interactive queries in the Data Lake are capped at 500,000 rows. For bigger analyses, KQL search jobs run asynchronously in the background and write results to a new table:
```kql
// Search job - DNS analysis over 90 days
.create async search DnsAnalysis90d <|
    DnsEvents
    | where TimeGenerated > ago(90d)
    | where Name has_any ("malware", "c2", "exfil")
    | summarize QueryCount = count() by Name, ClientIP, bin(TimeGenerated, 1d)
```
The results can then be queried from the generated table - with full Analytics Tier performance.
Running the Numbers
Assuming an organization ingests 500 GB/day into Sentinel:
| Scenario | Configuration | Monthly Cost (approx.) |
|---|---|---|
| Current state | 500 GB/day Analytics Tier (PAYG) | ~EUR 78,300 |
| With commitment | 500 GB/day Analytics Tier (commitment) | ~EUR 37,600 |
| Hybrid optimized | 150 GB/day Analytics + 350 GB/day Data Lake | ~EUR 25,200 |
The hybrid model saves over 67% compared to PAYG and still 33% compared to pure commitment tiers.
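The table's figures can be reproduced with a small model. Assumptions on my part: a 30-day month, the hybrid row's Analytics share billed at PAYG, and Data Lake billing on uncompressed volume - with those, the numbers land within rounding of the table:

```python
# Reconstructing the scenario table from the article's list prices.
DAYS = 30
PAYG = 5.22                 # EUR/GB, Analytics Tier pay-as-you-go
COMMIT_DISCOUNT = 0.52      # best-case commitment discount from the article
LAKE = 0.05 + 0.10          # EUR/GB, Data Lake ingestion + processing

current = 500 * DAYS * PAYG                      # ~78,300
committed = current * (1 - COMMIT_DISCOUNT)      # ~37,584
hybrid = 150 * DAYS * PAYG + 350 * DAYS * LAKE   # ~25,065

print(f"current:   EUR {current:,.0f}")
print(f"committed: EUR {committed:,.0f}")
print(f"hybrid:    EUR {hybrid:,.0f} "
      f"(-{1 - hybrid / current:.0%} vs PAYG, "
      f"-{1 - hybrid / committed:.0%} vs commitment)")
```

Notably, the hybrid row is dominated by the remaining Analytics share - the 350 GB/day in the Data Lake contributes only about EUR 1,600 of the monthly total, so combining the hybrid split with a commitment tier for the Analytics remainder would cut costs further still.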
The Fine Print
Customer-Managed Keys (CMK)
If the workspace uses CMK, Data Lake features are currently not available. That’s a showstopper for regulated environments that mandate CMK. You’ll have to wait for Microsoft’s roadmap here.
Basic Logs
Basic Logs can’t be routed directly to the Data Lake. They must be converted to Analytics Tier first. If you’re currently using Basic Logs for cheap ingestion, plan the migration carefully.
Portal Transition
Microsoft has set March 31, 2027 as the sunset date for Sentinel management in the Azure Portal. After that, everything moves to the Defender Portal. If you’re planning the Data Lake migration, factor in the portal switch at the same time.
API Deprecation
Older API versions for Sentinel Repositories will be deprecated on June 15, 2026. Automation, infrastructure-as-code, and CI/CD pipelines need to be updated ahead of time.
What I’d Do Monday Morning
The Sentinel Data Lake Tier is the biggest change to Sentinel’s cost model since launch. For most organizations, this isn’t a question of if but when.
Start with the inventory and table classification now. Plan a hybrid model as the target architecture - not everything needs to move. Evaluate federation for existing data lake infrastructure. Model the budget impact and use the savings as the business case for the migration itself.
The technology is mature enough for production use. Acting now doesn’t just save costs - it builds the foundation for a SIEM architecture that grows with Microsoft Fabric, Security Copilot, and the Unified Security Operations Platform.