Moving to Sentinel Data Lake When You Already Have Long-Term Retention
You’re about to onboard a Sentinel workspace to the new data lake. The workspace runs the usual setup: 90 days of analytics retention and several years of total retention configured as archive. Microsoft’s FAQ tells you most customers see lower long-term retention costs after the change, and that your existing data won’t move. Both statements are accurate. They also miss the decisions that determine whether the onboarding is a win for your workspace.
Microsoft has the answers in its documentation, but they're scattered, in different words, across several Learn pages, an FAQ and an Ignite session. Here's the version you need before someone schedules the change ticket.
What changes for existing data when you onboard
Onboarding is a configuration change. There is no data movement, no re-ingestion, and no backfill of historical data into the lake. The Sentinel data lake FAQ states it as bluntly as you can ask for:
“If a customer has already configured tables for Archive retention, existing retention settings will not change and will be automatically inherited by the Sentinel data lake. All data, including existing data in archive retention will be billed using the data lake storage meter, benefiting from 6x data compression. However, the data itself will not move.”
Read that last sentence literally. Your existing archive blobs stay in the storage they already sit in. Onboarding does not copy them to a new location; it changes the meter that bills them and adds a forward-mirror path for data ingested from that point on.
Learn-Connectors reinforces the architectural point for forward-looking data:
“When you enable Microsoft Sentinel data lake, the mirroring is automatically enabled for all the tables from onboarding forward… Preexisting data in the tables isn’t mirrored.”
The temporal split is this:
- From onboarding forward. Analytics-tier ingestion is mirrored to the lake as a single copy, no duplicate billing. That mirrored data is reachable via the new lake-native experiences: KQL Queries in Defender, KQL Jobs, and Notebooks.
- Everything already in archive at the moment you onboard. Stays exactly where it is. Not mirrored. Not exposed to lake-native tooling. Still reachable, but only via the Search and Restore experience.
The FAQ gives a worked example:
“If a customer has 12 months of total retention enabled on a table, 2 months after enabling ingestion into the Sentinel data lake, the customer will still have access to 10 months of archived data (through Sentinel search and restore experiences), but access to only 2 months of data in the data lake (since the data lake was enabled).”
For a customer with totalRetentionInDays=2556, the entire ~7 years of archive remains accessible only through Search and Restore. Lake exposure starts the day you onboard and grows from there.
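The split in the FAQ's worked example is simple date arithmetic, and it's worth computing for your own tables before the change ticket. A minimal sketch (the field name totalRetentionInDays comes from the Log Analytics table settings; the function and dates are illustrative):

```python
from datetime import date, timedelta

def access_windows(onboard_date: date, today: date, total_retention_days: int):
    """Split a table's retention window into lake-visible days vs
    Search-and-Restore-only days, per the forward-mirror rule."""
    retention_start = today - timedelta(days=total_retention_days)
    # Lake-native experiences (KQL Queries, Jobs, Notebooks) only see data
    # mirrored from onboarding forward.
    lake_days = (today - onboard_date).days
    # Everything older than the onboarding date but still inside total
    # retention remains reachable only through Search and Restore.
    archive_only_days = max((onboard_date - retention_start).days, 0)
    return lake_days, archive_only_days

# Reproduce the FAQ example: 12 months of total retention, queried 2 months
# after onboarding (30-day months, purely for illustration).
onboard = date(2025, 1, 1)
today = onboard + timedelta(days=60)
print(access_windows(onboard, today, 360))  # (60, 300)
```

Sixty days of lake exposure, three hundred days reachable only via Search and Restore: the same 2-versus-10-month split the FAQ describes.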
The pricing meter flip
Two Microsoft statements need to sit side by side.
The FAQ, on pricing:
“For most customers, this change results in lower long-term retention costs. However, customers who previously had discounted archive retention pricing will not automatically receive the same discounts on the new data lake storage meters. In these cases, customers should engage their Microsoft account team to review pricing implications before enabling the Sentinel data lake.”
Learn-Onboarding, broader:
“If your organization currently uses Microsoft Sentinel Security Information and Event Management (SIEM), the billing and pricing for features like search jobs and queries, auxiliary logs, and long-term retention (also known as ‘archive’) switch to Microsoft Sentinel data lake-based billing meters, potentially increasing your costs.”
The FAQ describes the empirical outcome (most pay less). Learn-Onboarding describes the mechanism: every Sentinel SIEM customer’s meters flip, and the flip can go either direction. Learn-Sentinel-Billing pins down what the new meter does:
“Data lake storage charges are applied per GB per month for any data that remains in the data lake tier after the analytic tier retention period ends. Charges are based on a simple and uniform data compression rate of 6:1. For example, if you retain 600 GB of raw data, it’s billed as 100 GB of compressed data.”
The 6:1 is a billing factor, not a guarantee about Microsoft’s actual on-disk compression. Your storage line is calculated as raw GB divided by six, regardless of how compressible your logs really are.
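Because the factor is uniform, the storage line is pure arithmetic. A minimal sketch (the per-GB rate is a placeholder, not a published price):

```python
def lake_storage_bill(raw_gb: float, rate_per_gb_month: float) -> float:
    """Data lake storage is billed on a uniform 6:1 compression factor."""
    billed_gb = raw_gb / 6  # billing factor, not actual on-disk compression
    return billed_gb * rate_per_gb_month

# Learn's example: 600 GB of raw data is billed as 100 GB.
print(lake_storage_bill(600, 1.0))  # 100.0 at a placeholder rate of 1/GB-month
```

The same division applies whether your logs compress at 10:1 or 2:1 in reality; the meter doesn't look.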
Whether you save or pay more depends on three variables: ingestion volume, prior archive pricing, and how often you query that archive. Three findings carry the most commercial weight:
- Any pre-negotiated archive discount does not carry over. Microsoft explicitly directs these customers to their account team before onboarding. That single line is the most consequential commercial item in the entire conversation.
- The broader Learn-Onboarding caveat applies even with no discount in play. Model the post-onboarding bill against the new meters during planning, not after the change window closes.
- Cheaper storage does not automatically mean a cheaper bill. KQL Jobs, lake-native scans, and any other compute that runs against older mirrored data introduce new meter line items that the legacy “archive plus Search” model did not bill the same way. A bill where storage drops and compute rises is a real outcome, especially in environments that lean heavily on long-window analytics.
CMK is a hard stop
If your organisation encrypts Sentinel data with Customer-Managed Keys, the conversation may end before it starts. Learn-Onboarding leads its Prerequisites with an Important callout:
“If your organization uses Customer-Managed Keys (CMK) for data encryption, be aware that CMK isn’t supported for data stored in the Microsoft Sentinel data lake. Sentinel workspaces applying CMK aren’t accessible via data lake experiences. Any data ingested into the data lake, such as custom tables or transformed data is encrypted using Microsoft-managed keys. Onboarding to the Microsoft Sentinel data lake may not fully align with your organization’s encryption policies or data protection standards.”
For a regulated customer carrying 6 to 12 years of archived security logs under a CMK policy, treat this as a stop-the-onboarding decision. The encryption review precedes any feature or pricing analysis.
“Just Restore the archive and let the mirror catch it” doesn’t work
The idea sounds reasonable. Restore a slab of archive, let the post-onboarding analytics-to-lake mirror catch the restored data, and you’ve backfilled the lake. It fails for three independent reasons.
Search and Restore results land in suffixed tables, not in the source. Search Jobs write to _SRCH tables. Restores write to _RST tables. The two are not interchangeable: _RST preserves the original event timestamps in TimeGenerated, while _SRCH rewrites them to the search-job execution time (detailed below). Even if those tables are themselves mirrored to the lake under the forward-mirror rule (they are new analytics-tier tables created post-onboarding), they are mirrored under their suffixed names. Any KQL query, analytic rule, hunting query, workbook or downstream consumer that references the original table name will not see the restored data. Per Microsoft Learn:
“The name of the destination table must end with _RST.”
The cost shape is wrong. Search Jobs are billed per GB of data scanned. Restores are billed per GB per day for as long as the restore is kept, with a 2 TB minimum and a 12-hour minimum duration. These are per-investigation operations, not bulk-migration operations. Restoring multi-year archive volumes through them typically costs orders of magnitude more than any storage saving the mirror could deliver.
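The restore minimums alone make the floor visible. A minimal sketch of the billed volume (the per-GB-day rate lives on the price sheet and is omitted; 2 TB is treated here as 2,048 GB):

```python
def restore_billed_gb_days(restored_gb: float, hours_active: float) -> float:
    """Restore billing floor per the Azure Monitor restore docs:
    a 2 TB minimum volume and a 12-hour minimum duration, prorated per day.
    The per-GB-day rate itself is not modeled here."""
    billed_gb = max(restored_gb, 2048)         # 2 TB minimum (as 2,048 GB)
    billed_days = max(hours_active, 12) / 24   # 12-hour minimum, prorated
    return billed_gb * billed_days

# Even a tiny 10 GB restore deleted after one hour bills at the floor:
print(restore_billed_gb_days(10, 1))  # 1024.0 GB-days (2,048 GB x 0.5 day)
```

Scale that to multi-year archive slabs held long enough for a mirror to catch them and the bulk-migration idea collapses on cost alone.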
_SRCH tables rewrite temporal semantics. On a Search Job output table, TimeGenerated holds the search-job execution time, not the original event time. The job stamps the same single timestamp onto every row it returned. The original event timestamp survives in _OriginalTimeGenerated, alongside a parallel set of _Original* columns for other source metadata.
In practice:
- `TEMPSEARCH2026 | where TimeGenerated > ago(30d)` returns every row the job retrieved, no matter how old, because every row inherits the job's run timestamp.
- Binning by TimeGenerated gives you one bin containing every row.
- Joining on TimeGenerated against a real source table produces meaningless cross-products.
- Every analytic rule, hunting query or dashboard that filters or joins on TimeGenerated is broken on a _SRCH table until rewritten to use _OriginalTimeGenerated.
Microsoft’s Restore documentation states that _RST tables preserve original timestamps in TimeGenerated, so the temporal-semantics problem only affects Search Jobs. The first two problems affect both.
Pre-existing archive reaches the lake only through these suffixed retrieval artefacts, never under its original schema. No first-party path exists today that ports historical archive content into the lake under its original table name, schema and temporal semantics.
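The timestamp rewrite is easy to demonstrate outside KQL. A toy Python simulation of the documented _SRCH semantics (illustrative row shapes, not the real table schema):

```python
from datetime import datetime, timedelta

now = datetime(2026, 1, 15)

# Three source events: one day, one year and six years old.
source = [{"TimeGenerated": now - timedelta(days=d)} for d in (1, 365, 2190)]

# Mimic Search Job output: the job's run time is stamped into TimeGenerated
# on every row; the real event time survives in _OriginalTimeGenerated.
srch = [{"TimeGenerated": now, "_OriginalTimeGenerated": r["TimeGenerated"]}
        for r in source]

# A "last 30 days" filter on the _SRCH table matches everything...
cutoff = now - timedelta(days=30)
print(len([r for r in srch if r["TimeGenerated"] > cutoff]))           # 3

# ...until it is rewritten against _OriginalTimeGenerated:
print(len([r for r in srch if r["_OriginalTimeGenerated"] > cutoff]))  # 1
```

The six-year-old event sails through the first filter; only the rewritten filter recovers the intended window.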
The other things that can derail an onboarding
These come from Learn-Onboarding and Learn-Connectors. Ordered by likelihood:
Workspace attachment is region-locked and all-or-nothing. Every Defender-connected workspace in the same region as your primary Sentinel workspace onboards automatically. The portal offers no per-workspace opt-out, and offboarding requires a Microsoft support ticket. If you run multiple Sentinel workspaces in the same region (per-BU workspaces, a production workspace plus a long-term-archive workspace), onboarding is a regional event, not a per-workspace one.
The subscription role must be direct. Onboarding accepts Subscription Owner or Contributor, but the role must be assigned directly on the subscription. Management-group-inherited Owner is not sufficient. Plan the PIM activation and change control before the window.
Legacy Log Analytics agent (MMA) custom tables aren't mirrored. Tables created via the older agent stay in analytics-only mode after onboarding. Tables created via AMA, DCR or the Logs Ingestion API are mirrored. If part of your multi-year retention strategy sits on MMA-era custom tables, migrating those tables to AMA or DCR is a prerequisite to bringing them into the lake's KQL Jobs and Notebooks experiences.
Azure Policy can block deployment. Restrictive policy assignments may prevent the onboarding resources from being created. The resource type to exempt is Microsoft.SentinelPlatformServices/sentinelplatformservices, scoped to the resource group. Pre-stage the exemption to avoid an aborted run.
M365 data residency consent is implicit in onboarding. If your Microsoft 365 data is not in the same region as your data lake, the act of onboarding consents to ingestion into the lake’s region. For regulated workloads, this should be a deliberate decision made by data-protection stakeholders, not a downstream surprise.
Auxiliary log tables disappear from Defender Advanced Hunting. Post-onboarding, auxiliary logs are accessed via Data Lake Exploration KQL queries from the Defender portal. Anything currently reaching auxiliary tables from Advanced Hunting (analyst queries, custom detections, dashboards) breaks on the day you flip the switch.
Offboarding is a support ticket. No self-service path exists for undoing onboarding. Treat the decision as one-way for change-management purposes.
A short pre-onboarding checklist
Before you raise the change request:
- Confirm whether the workspace uses CMK. If yes, the conversation pauses here while data-protection reviews whether Microsoft-managed keys are acceptable for the lake.
- Pull current archive cost data and any pre-negotiated discounts. Engage your Microsoft account team before onboarding if any discount is in play.
- Inventory every workspace in the same Azure region as your primary Sentinel workspace. They onboard together.
- Audit custom tables. Any MMA or CLV1-sourced tables that you want in lake-native experiences need re-platforming onto AMA or DCR first.
- Check Azure Policy assignments for anything that would block Microsoft.SentinelPlatformServices/sentinelplatformservices.
- Confirm the subscription Owner or Contributor role is assigned directly, not inherited from a management group.
- Identify every analyst workflow, custom detection or dashboard that touches auxiliary log tables in Defender Advanced Hunting. Plan the migration to Data Lake Exploration KQL queries.
- Set realistic expectations. Existing archive stays accessible via Search and Restore. The new lake-native experiences cover data ingested from onboarding forward. No first-party migration path closes that gap today.
Microsoft’s “free upgrade” framing covers the architecture honestly. It just stops at the boundary where your workspace’s history starts mattering. Treat the meter flip as architectural, model the bill against the new compression-based meter, and weigh the gain in lake-native tooling on forward-looking data against the friction of a one-way change for a workspace that already carries years of archived logs.
Primary sources
- Microsoft Sentinel data lake FAQ (Tech Community)
- Onboarding to Microsoft Sentinel data lake and graph (Microsoft Learn)
- Set up connectors for the Microsoft Sentinel data lake (Microsoft Learn)
- Plan costs and understand pricing and billing for Microsoft Sentinel (Microsoft Learn)
- Restore logs in Azure Monitor (Microsoft Learn)
- Restore archived logs from search in Microsoft Sentinel (Microsoft Learn)
- Microsoft Sentinel data lake overview (Microsoft Learn)