Cost-Effective Long-Term Storage of Security Logs with Microsoft Sentinel Data Lake

Recently, Microsoft announced the public preview of Microsoft Sentinel Data Lake[1]. This data lake makes it easier and more cost-effective to store data for the long term. In this article, we will delve into this data lake feature.

Note: This article is based on information available as of August 13, 2025. This feature is currently in public preview. The author is not responsible for any configuration errors or data loss in your environment. It is recommended to start with an evaluation in a small-scale test environment.

The Security Log Management Dilemma

It is often said that the global volume of data is exploding year after year. Security logs are no exception: they are growing rapidly, forcing many security teams into difficult choices. Storing all logs in a perpetually analyzable state (the Analytics Tier in Sentinel) incurs enormous costs. As a result, many companies likely cope by either narrowing the scope of log collection (creating blind spots) or shortening the retention period (sacrificing traceability and auditability).

Alternatively, some companies store logs separately in Syslog servers or Blob storage without ingesting them into a SIEM. However, this "you get what you pay for" approach is inefficient, as it requires ingesting the data into an analysis platform whenever an investigation is needed, and it's unsuitable for routine monitoring and alert detection. While large enterprises with deep pockets might solve this by storing all logs in the analytics tier, this is not sustainable and certainly not cost-effective.

Microsoft Sentinel Data Lake

Amidst this situation, the newly introduced Microsoft Sentinel Data Lake allows for cheaper and easier long-term data storage. In addition, although it has limitations compared to Sentinel's standard Analytics Tier, it supports KQL queries, making it an ideal platform for those moments during an incident when you need to retrieve "something" from long-term stored logs.

Comparison of Analytics Tier and Data Lake Tier

Microsoft's own table summarizing the Analytics Tier, the Data Lake Tier, and their key characteristics is very easy to follow, so I have quoted it here.[2] This is not me being lazy.

| Feature | Analytics Tier | Data Lake Tier |
| --- | --- | --- |
| Key characteristics | High-performance querying and indexing of logs (also known as hot or interactive retention) | Cost-effective long-term retention of large data volumes (also known as cold storage) |
| Best for | Real-time analytics rules, alerts, hunting, workbooks, and all Microsoft Sentinel features | Compliance and regulatory logging, historical trend analysis and forensics, less-frequently touched data that doesn't require real-time alerts |
| Ingestion cost | Standard | Minimal |
| Query pricing | Included ✅ | Billed separately ❌ |
| Optimized query performance | ✅ | ❌ Slower queries; suitable for audits, but not optimized for real-time analysis |
| Query capabilities | Full query capabilities in Microsoft Defender and Azure portal, and via API | Full KQL on a single table (can be enriched with data in analytics tables using lookups); run scheduled KQL or Spark jobs; use notebooks |
| Real-time analytics capabilities | Full set ✅ | Limited ❌ Restrictions on some features like analytics rules, hunting queries, parsers, watchlists, workbooks, and playbooks |
| Search jobs | ✅ | ✅ |
| Summary rules | ✅ | ❌ |
| KQL | Full functionality | Limited to a single table |
| Restore | ✅ | ❌ |
| Data export | ✅ | ❌ |
| Retention period | 90 days for Microsoft Sentinel, 30 days for Microsoft Defender XDR; can be extended up to 2 years with a prorated monthly long-term retention fee | Same as analytics retention by default; can be extended up to 12 years |

Note: I'm still investigating what "Restore" exactly refers to. If it means bringing data into the analytics tier, that can be done from the Data Lake Tier. Perhaps it refers to whether it can be used as a destination for data restoration?

By the way, while a direct cost comparison isn't straightforward, ingesting data into the Analytics Tier on Sentinel's pay-as-you-go plan costs $4.30 USD per GB, whereas the Data Lake costs $0.05 per GB for ingestion plus $0.026 per GB per month for storage.[3]

Note: Please note that the capabilities of the Analytics Tier and the Data Lake are different, so it's not an apples-to-apples comparison. For example, running KQL queries against the Data Lake incurs additional costs. Microsoft's official information mentions it's less than 10% of the traditional cost, which can be a useful guideline. However, please estimate the actual costs based on your own use cases.
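To get a rough feel for the numbers, here is a minimal sketch of an estimate you could run against the standard `Usage` table in a Log Analytics workspace. The unit prices are the preview list prices quoted above, hard-coded purely for illustration; substitute your own contracted rates.

```kql
// Estimate the last 30 days of billable ingestion per table, then project
// what that volume would cost in each tier (illustrative list prices).
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| summarize IngestedGB = sum(Quantity) / 1024.0 by DataType  // Quantity is in MB
| extend AnalyticsIngestUSD     = IngestedGB * 4.30,   // analytics tier ingestion
         LakeIngestUSD          = IngestedGB * 0.05,   // data lake ingestion
         LakeStorageUSDPerMonth = IngestedGB * 0.026   // data lake storage per month
| order by IngestedGB desc
```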

Trying Out Microsoft Sentinel Data Lake

Prerequisites

To onboard to the Microsoft Sentinel Data Lake public preview, you must meet the following prerequisites:

  • Microsoft Defender and Microsoft Sentinel must be integrated and available in Defender XDR.
  • You need an existing Azure subscription and resource group for data lake billing, and you must have owner permissions on the subscription.
  • The Microsoft Sentinel primary workspace must be connected to the Microsoft Defender portal.
  • You need read permissions to the primary and other workspaces that you want to attach to the data lake.
  • The Microsoft Sentinel primary workspace and other workspaces must be in the same region as your tenant's home region (a public preview constraint).

Note: As a public preview constraint, the Sentinel primary workspace must be in the same geographical region as the tenant (Entra ID). The "Entra ID" part is key. Many Japanese users might have their Defender logs stored in the US, but their Entra ID is likely in Japan (how confusing...). Therefore, please configure and test your Sentinel workspace in the Japan East region. Also, please be aware that not all regions are supported during the public preview.

Onboarding (Initial Setup)[4]

The steps to onboard your tenant to the Microsoft Sentinel Data Lake are simple. As a prerequisite, connect Sentinel to the Defender portal via the SIEM workspace feature. I wrote a blog about this at my previous job, which you can refer to here:

https://blog.cloudnative.co.jp/24112/

Next, navigate to the data lake settings page in the Defender XDR portal ( https://security.microsoft.com ) under [System] > [Settings] > [Microsoft Sentinel] > [Data lake]. Once all prerequisites are met, a setup button appears; click "Start setup" to launch the configuration screen. After entering all the information, click "Set up data lake." It can take up to 60 minutes for the data lake to be fully created and linked to your Defender tenant.

Setup start screen for Microsoft Sentinel Data Lake in the Defender XDR portal.

While setup is running, you will see the message "Lake setup in progress." After a while, the data lake setup completes.

Data lake setup in progress message, followed by the completion screen.

By the way, once the setup is complete, a new data lake exploration view appears in Defender XDR. At the same time, a workspace named "default" is created: this is the "Default" workspace that shows up in the workspace selector for KQL queries, created by Microsoft Sentinel Data Lake during onboarding.

Long-term Log Storage in the Data Lake Tier

Data connectors that ingest logs into Microsoft Sentinel are configured by default to send data to both the analytics tier and the long-term storage data lake tier. Once a Sentinel data connector is enabled, data is pushed to the analytics tier and automatically mirrored to the data lake tier. Mirroring data to the data lake with the same retention period as the analytics tier does not incur additional billing charges.[5]

After setting up the data lake, additional storage costs are incurred only if you extend the data lake retention beyond the analytics retention, as shown in the image below. Settings can be configured per table under [Defender XDR portal] > [Microsoft Sentinel] > [Configuration] > [Tables].

Table configuration screen in Defender XDR portal showing retention period settings for both Analytics and Data Lake tiers.

Ingesting Data Directly into the Data Lake Tier

Alternatively, it is possible to ingest data only into the data lake tier. Raw logs from firewalls or Entra ID's `AADNonInteractiveUserSignInLogs` can be very costly if ingested directly into the analytics tier. With the new data lake, you can stream such data straight to the data lake, skipping Sentinel's analytics tier entirely: ingestion into the analytics tier stops, and the data is stored only in the data lake.

A diagram showing data flowing directly to the Data Lake tier, bypassing the Analytics tier.
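Even when a table is routed only to the lake tier, it remains queryable from the data lake exploration view (covered next). As a minimal sketch, assuming `AADNonInteractiveUserSignInLogs` is flowing to the lake only (the user and time window are illustrative):

```kql
// Trace one user's non-interactive sign-ins across a six-month window,
// directly in the data lake tier, without rehydrating anything.
AADNonInteractiveUserSignInLogs
| where TimeGenerated between (datetime(2025-02-01) .. datetime(2025-07-31))
| where UserPrincipalName == "user@contoso.example"
| summarize SignIns = count() by AppDisplayName, bin(TimeGenerated, 1d)
```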

KQL Queries

A key feature of the data lake tier is that you're not just storing data; you can also investigate logs by running KQL queries. This may sound simple, but it's actually very important. Previously, when storing logs in cheap storage (like Blob storage), analysis required the cumbersome step of ingesting them into an analysis platform first. In contrast, the data lake keeps costs low while allowing you to quickly run queries when an investigation is needed, which is a fantastic benefit.

KQL query being run against the Data Lake tier in the Defender XDR portal's advanced hunting view.

Note: Please be aware that running KQL against the data lake tier is a paid feature. Also, note that queries are limited to a single table; you cannot join two tables in a search.
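For instance, a long-window, single-table hunt like the following works fine, while a join across two tables would not. (A sketch only: the table, IP address, and look-back window are illustrative.)

```kql
// A year's worth of firewall logs, filtered to one suspicious destination.
// Note this stays within a single table, per the data lake query limits.
CommonSecurityLog
| where TimeGenerated > ago(365d)
| where DestinationIP == "203.0.113.10"
| summarize Connections = count() by SourceIP, bin(TimeGenerated, 1d)
| order by TimeGenerated asc
```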

Search & Restore

As mentioned above, KQL investigations in the data lake tier have constraints. However, if you need to perform a thorough investigation without these limitations, you can restore the data to the analytics tier. On the [Search & Restore] tab, you can select a table and a time period to restore the data, allowing you to investigate it in Advanced Hunting.

Search and Restore tab in the Defender XDR portal to restore data from the Data Lake to the Analytics tier.

However, be aware that the restored data appears in a different format (note the table name) than data ingested directly into Advanced Hunting, as shown below.

Restored data shown in Advanced Hunting with a different table name format.
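Assuming the restore follows the Log Analytics convention of surfacing data under a suffixed table name (the `_RST` name below is hypothetical; use whatever name the portal reports once the restore completes), querying it might look like this:

```kql
// Hypothetical restored table name; confirm the actual name after the restore finishes.
SigninLogs_RST
| where TimeGenerated between (datetime(2025-01-01) .. datetime(2025-03-31))
| summarize FailedSignIns = countif(ResultType != "0") by UserPrincipalName
| top 10 by FailedSignIns
```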

Jobs

In addition, you can use Jobs to move small amounts of data directly from the data lake to the analytics tier. A job is a feature that runs a KQL query against data in the data lake tier and promotes the results to the analytics tier. You can run these as one-off or scheduled tasks.

While storage in the analytics tier incurs a higher billing rate than the data lake tier, using KQL allows you to reduce and filter the data, saving costs while promoting it to the analytics tier. This enables you to send all your data to the data lake tier, while specific logs meeting certain criteria are sent to the analytics tier for further hunting.

When moving data with Jobs, a dedicated table for the job is used (or created). Therefore, be aware that query changes may be necessary later on due to the change in table names.
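As an illustration, a job query typically distills a noisy lake-only table down to just the rows worth promoting. A minimal sketch, assuming `AADNonInteractiveUserSignInLogs` is being kept in the lake tier (the filter and column selection are illustrative):

```kql
// Promote only the failed non-interactive sign-ins from the last day,
// trimmed to the columns needed for hunting in the analytics tier.
AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(1d)
| where ResultType != "0"   // "0" means a successful sign-in
| project TimeGenerated, UserPrincipalName, AppDisplayName, IPAddress, ResultType
```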

Jobs tab in the Defender XDR portal for moving data from Data Lake to Analytics.

Select the Sentinel workspace and write your query.

Creating a new job by selecting a Sentinel workspace and writing a KQL query.

In the schedule settings, you can choose a one-time, daily, weekly, or monthly execution interval.

Job scheduling options: one-time, daily, weekly, or monthly.

Once the job runs, you can view the completed jobs in a list.

List of completed jobs.

As mentioned before, when migrating data, a separate table is created (or selected), so you will view the data in Advanced Hunting as a Custom Log.

Viewing the job's output data in a custom log table in Advanced Hunting.
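Since the job's output lands in a custom table (carrying the usual `_CL` suffix), you query it like any other custom log. A sketch, with a hypothetical destination table name:

```kql
// Hypothetical table created as the job's destination.
FailedNonInteractiveSignins_CL
| summarize Attempts = count() by UserPrincipalName
| top 20 by Attempts
```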

Summary

Previously, while it was possible to store data long-term cheaply using services like Blob storage, it wasn't truly cost-effective once setup and operational overhead were factored in. With the arrival of the Data Lake, the cost-performance has improved significantly. What's also interesting is that Microsoft positions this data lake not just as a cheap storage option, but as something that "accelerates the adoption of agentic AI." For now, its value in breaking down data silos is clear, but it's also evident that Microsoft has a vision for its future use in AI, so I'll be keeping a close eye on how it develops.
