Sentinel filter and split transformations, and when to keep writing custom DCRs

For a long time, the answer to “how do I drop noisy events at ingestion in Sentinel” was the same: write a custom Data Collection Rule (DCR) transformation, author the KQL, deploy the DCR, and maintain it as a JSON artefact. That is still the right answer for parsing, masking, schema shaping, and enrichment. It is no longer the best route for simple “keep this, discard that, route the rest” decisions, because Microsoft has shipped two productised features that handle the common cases from the Defender portal’s Sentinel table UI: filter transformations and split transformations.

Both features are DCR transformations under the hood. They are worth knowing as separate features because Microsoft has built explicit cost and behavioural guarantees around them that do not apply to a hand-rolled DCR.

What filter and split do

Both features live on the table itself in the Defender portal under Microsoft Sentinel > Configuration > Tables, and both require a table that supports DCRs. Split is stricter: the table must also support Analytics-only ingestion and data-lake-only ingestion.

Filter discards data during ingestion. You author a KQL expression that evaluates to true for the data you want to throw away. From Microsoft’s how-to:

“Filters filter data out. Data matching the filter condition is discarded and isn’t ingested to either Analytics or Data lake tiers.”

Multiple filter conditions on the same table combine with logical OR. Any record matching any condition is dropped.
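The OR semantics can be modelled in a few lines of Python. This is only an illustration of the logic, not how the service evaluates rules: real conditions are KQL expressions authored in the portal, and the field values and predicates below are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Condition = Callable[[Record], bool]

# Hypothetical filter conditions; each returns True for records to discard.
drop_conditions: List[Condition] = [
    lambda r: r.get("SeverityLevel") == "debug",
    lambda r: r.get("ProcessName") == "healthcheckd",
]


def is_dropped(record: Record, conditions: List[Condition]) -> bool:
    """A record is discarded if ANY condition matches (logical OR)."""
    return any(cond(record) for cond in conditions)


def apply_filters(records: List[Record], conditions: List[Condition]) -> List[Record]:
    """Return only the records that match none of the filter conditions."""
    return [r for r in records if not is_dropped(r, conditions)]


sample = [
    {"SeverityLevel": "debug", "ProcessName": "sshd"},
    {"SeverityLevel": "info", "ProcessName": "healthcheckd"},
    {"SeverityLevel": "warning", "ProcessName": "sshd"},
]
kept = apply_filters(sample, drop_conditions)
```

Only the third sample record survives: the first matches the severity condition and the second matches the process condition, and either match alone is enough to drop a record.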

Split routes records between Analytics and the data lake. You author a KQL expression that defines which records land in Analytics. Anything that does not match goes to the data lake tier only. Data sent to Analytics is also mirrored to the lake, so split is not destructive: the only thing it changes for non-matching records is which tier they live in.

The lake-only portion of the split lands in a separate table:

“The split data ingested into the Data lake tier goes into a separate table with the same name as the original table but with an ‘_SPLT’ suffix. For example, if you apply a split rule to the ‘FirewallLogs’ table, the data routed to the Data lake tier is ingested into a separate ‘FirewallLogs_SPLT’ table.”

Every analytic rule, workbook, hunting query, or downstream consumer that references the original table name keeps seeing only the Analytics portion of the split. The lake-only portion is reachable, but only via the _SPLT table name. Audit your saved queries before you turn on a split rule on a table that anything downstream depends on.
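The routing behaviour described above can be sketched as a small Python model. It assumes nothing beyond the documented naming rule (original name plus the `_SPLT` suffix); the `route` helper, the `Action` field, and the sample predicate are hypothetical, and a real split condition is a KQL expression.

```python
from collections import defaultdict
from typing import Callable, Dict, List

SPLIT_SUFFIX = "_SPLT"  # suffix quoted from Microsoft's documentation

Record = Dict[str, str]


def route(table: str, records: List[Record],
          to_analytics: Callable[[Record], bool]) -> Dict[str, List[Record]]:
    """Group records by the table name a query would need to reach them."""
    destinations: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        if to_analytics(record):
            # Matching records land in Analytics under the original name.
            destinations[table].append(record)
        else:
            # Non-matching records are lake-only, under the suffixed name.
            destinations[table + SPLIT_SUFFIX].append(record)
    return dict(destinations)


sample = [{"Action": "deny"}, {"Action": "allow"}, {"Action": "allow"}]
routed = route("FirewallLogs", sample, lambda r: r["Action"] == "deny")
```

A saved query that references only `FirewallLogs` would see one of the three sample records here; the other two are reachable only through `FirewallLogs_SPLT`.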

Transformation cost on a Sentinel workspace

The generic Azure Monitor transformations doc states a 50% threshold rule:

“If a transformation reduces the ingested data by more than 50%, you’re charged for the amount of filtered data above 50%.”

The worked example uses 20 GB of incoming data. Drop 12 GB, get billed for 2 GB of processing (the volume dropped above the halfway point). Drop 8 GB, get billed for 0 GB.
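The arithmetic behind that example is small enough to write down. A minimal sketch, assuming the rule works exactly as the quoted sentence describes; the function name `billable_processing_gb` is made up for illustration and is not part of any Azure SDK.

```python
def billable_processing_gb(incoming_gb: float, dropped_gb: float) -> float:
    """Volume billed for processing under the 50% threshold rule.

    Only the portion of dropped data above half of the incoming
    volume is charged; anything at or below the halfway point is free.
    """
    if incoming_gb < 0 or not 0 <= dropped_gb <= incoming_gb:
        raise ValueError("dropped_gb must be between 0 and incoming_gb")
    free_allowance_gb = 0.5 * incoming_gb
    return max(0.0, dropped_gb - free_allowance_gb)


heavy_filter = billable_processing_gb(20, 12)  # 12 GB dropped, 10 GB free
light_filter = billable_processing_gb(20, 8)   # 8 GB dropped, under the threshold
```

With 20 GB incoming, `heavy_filter` comes out at 2.0 and `light_filter` at 0.0, matching the two cases in the worked example.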

That rule applies to Auxiliary, Analytics, and Basic Logs in standalone Azure Monitor. It does not apply to Sentinel-enabled workspaces. Microsoft’s note on that same page is explicit:

“If Microsoft Sentinel is enabled for the Log Analytics workspace, there’s no cost for transformation to Analytics tables regardless of how much data the transformation filters.”

For a Sentinel workspace, an aggressive filter rule that throws away 95% of inbound data on the Analytics path costs zero in transformation charges. You pay for what lands in Analytics. The “what if we get billed for the raw inbound volume after filtering it out” worry is a real concern for plain Azure Monitor customers and a misread for Sentinel customers.

The cost concern has shifted to the data lake side. Microsoft’s billing doc describes two meters that fire when data lands in a table set to data lake tier only:

“Data lake ingestion is charged per GB for all data ingested into tables with retention set to data lake tier only. Data lake ingestion charges don’t apply when data is ingested into tables with retention set to include both analytic and data lake tiers.”

“Data processing is charged per GB for data ingested into tables with retention set to data lake tier only. It supports transformations like redaction, splitting, filtering, and normalization.”

The _SPLT table produced by a split rule is the lake-only side of the split, so both meters apply to records that land there. Records on the matching side of the split, which land in Analytics and are mirrored to the lake, attract neither meter. The transformation work itself is covered by the data processing meter on the lake-only path.
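As a rough model of which data lake meters fire for each side of a split, the sketch below encodes only what the quoted billing text says. The `Destination` enum and both functions are hypothetical names, and no real prices are included: the per-GB rates are parameters the caller must supply.

```python
from enum import Enum
from typing import Set


class Destination(Enum):
    ANALYTICS_AND_LAKE = "analytics+lake"  # matching side of a split
    LAKE_ONLY = "lake-only"                # the _SPLT side of a split


def lake_meters(destination: Destination) -> Set[str]:
    """Data lake meters that apply, per the billing text quoted above."""
    if destination is Destination.LAKE_ONLY:
        return {"data lake ingestion", "data processing"}
    return set()


def lake_side_charge(gb: float, ingestion_rate: float,
                     processing_rate: float) -> float:
    """Charge for lake-only data; both per-GB rates come from the caller."""
    return gb * (ingestion_rate + processing_rate)


split_side = lake_meters(Destination.LAKE_ONLY)
matching_side = lake_meters(Destination.ANALYTICS_AND_LAKE)
```

The model covers the two data lake meters only. Records on the matching side are still billed for Analytics ingestion in the usual way, which this sketch deliberately leaves out.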

When to use each

Use a filter transformation when you know the data has no security value now and will have none later: nothing in detection, hunting, compliance, or investigation will ever need it. A short KQL where clause inside the Defender portal is easier to manage than a custom DCR for that pattern, and the multiple-conditions-OR semantics make it natural for “drop anything that looks like A, B, or C.” Watch for ambiguous text matches, especially in syslog, that may catch security events you wanted to keep.
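The ambiguous-match risk is easy to demonstrate. The Python sketch below compares a loose substring test with a tighter anchored pattern; the two syslog messages and both predicates are invented for illustration, and a real rule would express the same idea in KQL.

```python
import re
from typing import Callable, List

# Hypothetical syslog messages: one routine, one security-relevant.
messages: List[str] = [
    "vpxd: session heartbeat ok",
    "sshd: Failed password for root from 10.0.0.5: session closed",
]


def loose(message: str) -> bool:
    """Drop anything mentioning 'session' anywhere in the text."""
    return "session" in message


def tight(message: str) -> bool:
    """Drop only the specific heartbeat line from the vpxd process."""
    return re.match(r"^vpxd: session heartbeat\b", message) is not None


def dropped_by(predicate: Callable[[str], bool]) -> List[str]:
    """List the messages a given drop predicate would discard."""
    return [m for m in messages if predicate(m)]


dropped_loose = dropped_by(loose)
dropped_tight = dropped_by(tight)
```

The loose predicate discards both messages, including the failed root login, while the anchored one discards only the heartbeat. Running candidate conditions against a sample of real events before saving the rule catches this kind of overreach.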

Use a split transformation when the source mixes high-value and low-value events and you want both. Security-relevant events stay hot in Analytics and detections run against them as normal. Operational chatter and verbose health data flow to lake-only retention where the storage meter is cheap and the data is still searchable via Data Lake Exploration KQL queries in Defender. The classic example is VMware ESXi and vCenter syslog: authentication events, privilege changes, and configuration changes go to Analytics, routine daemon noise goes to _SPLT.

Use a custom DCR transformation when the operation is anything more than a yes/no routing decision. That covers parsing raw syslog into typed fields, normalising vendor-specific schemas to ASIM, projecting away sensitive fields, masking values before they hit storage, converting types, and applying conditional logic that does not fit one KQL where clause. Filter and split are not designed for any of that, and forcing them into that shape is harder than just writing the DCR.

Use upstream filtering when the source can do the work before Sentinel sees the data. Examples include rsyslog or syslog-ng filters for syslog, WEF subscription filters, Event Hub processing, Logstash, and agent-side filters. Data discarded before it reaches Sentinel is the only category that incurs no meter at all, and it is the easiest to reason about.

Conflicts with existing custom DCRs

Microsoft’s how-to flags this in an Important callout for tables that already have a custom DCR applied:

“Transformations you create in Microsoft Sentinel may conflict with transformations created in Azure Monitor by using DCRs. For example, if a DCR is already applied to a table where all but a certain region is filtered in and a filter is applied that filters out only that region, no data is ingested.”

The two layers compose. They do not override each other. Drop Region A in one layer and keep only Region A in the other and the table goes silent. There is no portal warning at the time you save the rule, and the propagation delay described below means it can take up to an hour before the absence shows up in your log tables. Document the DCR transformation logic, document the proposed Sentinel rule, and run sample events through both before relying on the result.
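The Region A conflict can be reproduced with two stacked filters. This is a toy Python model, not the ingestion pipeline: the `Region` field and both functions are hypothetical, and the order in which the real layers run is an assumption here. Because both layers only remove records, the result is the same in either order.

```python
from typing import Dict, List

Record = Dict[str, str]


def dcr_transform(records: List[Record]) -> List[Record]:
    """Existing custom DCR: keep only Region A."""
    return [r for r in records if r["Region"] == "A"]


def sentinel_filter(records: List[Record]) -> List[Record]:
    """New Sentinel filter rule: discard Region A."""
    return [r for r in records if r["Region"] != "A"]


sample = [{"Region": "A"}, {"Region": "B"}, {"Region": "C"}]

# Each layer is reasonable alone, but nothing survives both.
ingested = sentinel_filter(dcr_transform(sample))
ingested_reversed = dcr_transform(sentinel_filter(sample))
```

Both `ingested` and `ingested_reversed` are empty lists. Pushing a handful of representative events through a model like this, built from the documented logic of each layer, is a cheap way to spot a silent table before the rule is saved.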

Three more gotchas

Up to one hour propagation. Microsoft documents that filter and split rules can take up to an hour to take effect. If you save a rule and immediately query the table for changed behaviour, the table has not caught up yet. Wait the hour before concluding the rule did not work.

XDR tables go quiet in Advanced Hunting for 30 days. This one is unusual and easy to misread as a bug. From the limitations:

“Split and filter transformations applied to XDR tables don’t appear in Advanced Hunting for the first 30 days of data. The transformations are applied, and once data ages beyond the first 30 days, it behaves normally in Advanced Hunting. Data queried from Log Analytics or Microsoft Sentinel reflects the cost savings immediately.”

The cost savings hit straight away. The visibility in Advanced Hunting only catches up after the data ages beyond the first 30 days. If you transform an XDR table and an analyst tells you the day after that the rule is broken because Advanced Hunting still shows the dropped data, this is why.

Threshold enforcement on the data lake side is not real time. If you are using the new Cost management experience in Defender to set policies on data lake meters, enforcement is supported for Data Lake Query and Advanced Data Insights, and Microsoft notes that it “can take up to 4 hours for the enforced threshold to take effect” after a limit is exceeded. That makes the policy a useful brake against month-on-month overspend and a poor circuit breaker against a runaway notebook job in the next five minutes.

What to reach for first

For a Sentinel workspace built today, the answer to “how do I drop noisy events at ingestion” is filter. The answer to “how do I keep some events hot and send the rest to cheap retention” is split. The Analytics-tier transformation cost worry from the generic Azure Monitor doc does not apply on Sentinel workspaces, and both features are configured in the Sentinel table UI rather than as a custom DCR you have to version and maintain.

Custom DCRs remain the right tool for parsing, normalisation, masking, schema work, and anything else that does not fit one KQL expression.

Primary sources

Transform data using filter and split in Microsoft Sentinel (Microsoft Learn)
Custom data ingestion and transformation in Microsoft Sentinel (Microsoft Learn)
Transformations in Azure Monitor (Microsoft Learn)
Plan costs and understand Microsoft Sentinel pricing and billing (Microsoft Learn)
Manage and monitor costs for Microsoft Sentinel (Microsoft Learn)