Log Analytics and KQL Queries

IntermediateApplication Insights & Monitoring2026-03-14

What Is Log Analytics?

Log Analytics is a tool in Azure Monitor used to query and analyse log data collected from Azure resources. It uses the Kusto Query Language (KQL) — a powerful, read-only query language designed for big data analytics.

Microsoft Reference: Log Analytics overview

KQL Fundamentals

Basic Query Structure

TableName
| where TimeGenerated > ago(24h)
| where ColumnName == "value"
| summarize Count = count() by GroupColumn
| order by Count desc
| take 10

Key Operators

Operator	Description	Example
`where`	Filter rows	`where status == "Failed"`
`project`	Select columns	`project TimeGenerated, Name, Status`
`summarize`	Aggregate data	`summarize count() by Status`
`order by`	Sort results	`order by TimeGenerated desc`
`take` / `limit`	Limit rows	`take 100`
`extend`	Add computed columns	`extend DurationSec = Duration / 1000`
`join`	Join tables	`join kind=inner (Table2) on Id`
`union`	Combine tables	`union Table1, Table2`
`render`	Visualise results	`render timechart`
`parse`	Extract from strings	`parse Message with "Order " OrderId " processed"`
`mv-expand`	Expand arrays	`mv-expand items`
`make-series`	Create time series	`make-series count() on TimeGenerated`

String Operations

// Contains (case-insensitive)
| where Message contains "error"

// Contains (case-sensitive)
| where Message contains_cs "Error"

// Starts with
| where Name startswith "la-"

// Matches regex
| where Message matches regex @"ORD-\d{6}"

// String extraction
| extend OrderId = extract(@"ORD-(\d{6})", 1, Message)

Time Operations

// Relative time
| where TimeGenerated > ago(1h)
| where TimeGenerated > ago(7d)

// Absolute time range
| where TimeGenerated between (datetime(2026-03-01) .. datetime(2026-03-14))

// Time binning
| summarize count() by bin(TimeGenerated, 15m)
| summarize count() by bin(TimeGenerated, 1h)

// Floor and ceiling
| extend HourBucket = floor(TimeGenerated, 1h)

Aggregation Functions

| summarize
    Total = count(),
    Average = avg(Duration),
    Maximum = max(Duration),
    Minimum = min(Duration),
    P50 = percentile(Duration, 50),
    P95 = percentile(Duration, 95),
    P99 = percentile(Duration, 99),
    Distinct = dcount(UserId),
    StdDev = stdev(Duration)
  by bin(TimeGenerated, 1h)

Microsoft Reference: KQL reference

Logic Apps Queries

Workflow Runs Overview

AzureDiagnostics
| where ResourceType == "WORKFLOWS"
| where Category == "WorkflowRuntime"
| where OperationName == "Microsoft.Logic/workflows/workflowRunCompleted"
| where TimeGenerated > ago(24h)
| summarize
    TotalRuns = count(),
    Succeeded = countif(status_s == "Succeeded"),
    Failed = countif(status_s == "Failed"),
    Cancelled = countif(status_s == "Cancelled")
  by Resource
| extend SuccessRate = round(100.0 * Succeeded / TotalRuns, 2)
| order by Failed desc

Failed Workflow Analysis

AzureDiagnostics
| where ResourceType == "WORKFLOWS"
| where OperationName == "Microsoft.Logic/workflows/workflowRunCompleted"
| where status_s == "Failed"
| where TimeGenerated > ago(24h)
| project TimeGenerated, Resource, workflowId_s, error_code_s, error_message_s
| order by TimeGenerated desc

Workflow Duration Trends

AzureDiagnostics
| where ResourceType == "WORKFLOWS"
| where OperationName == "Microsoft.Logic/workflows/workflowRunCompleted"
| where TimeGenerated > ago(7d)
| extend DurationMs = datetime_diff('millisecond', endTime_t, startTime_t)
| summarize
    AvgDuration = avg(DurationMs),
    P95Duration = percentile(DurationMs, 95),
    RunCount = count()
  by bin(TimeGenerated, 1h), Resource
| render timechart

Action-Level Errors

AzureDiagnostics
| where ResourceType == "WORKFLOWS"
| where OperationName == "Microsoft.Logic/workflows/workflowActionCompleted"
| where status_s == "Failed"
| where TimeGenerated > ago(24h)
| summarize ErrorCount = count() by Resource, resource_actionName_s, error_code_s
| order by ErrorCount desc
| take 20

Tracked Properties

AzureDiagnostics
| where ResourceType == "WORKFLOWS"
| where isnotempty(trackedProperties_orderId_s)
| project TimeGenerated,
          Resource,
          OrderId = trackedProperties_orderId_s,
          CustomerId = trackedProperties_customerId_s,
          Status = status_s,
          ErrorCode = trackedProperties_errorCode_s
| order by TimeGenerated desc

API Management Queries

Request Volume and Latency

ApiManagementGatewayLogs
| where TimeGenerated > ago(24h)
| summarize
    TotalRequests = count(),
    AvgDuration = avg(TotalTime),
    P95Duration = percentile(TotalTime, 95),
    ErrorRate = round(100.0 * countif(ResponseCode >= 400) / count(), 2)
  by bin(TimeGenerated, 15m)
| render timechart

Top APIs by Usage

ApiManagementGatewayLogs
| where TimeGenerated > ago(7d)
| summarize
    Calls = count(),
    AvgLatency = round(avg(TotalTime), 0),
    ErrorRate = round(100.0 * countif(ResponseCode >= 400) / count(), 2)
  by ApiId
| order by Calls desc

Error Breakdown

ApiManagementGatewayLogs
| where TimeGenerated > ago(24h)
| where ResponseCode >= 400
| summarize Count = count() by ResponseCode, ApiId, OperationId
| order by Count desc
| take 20

// Error trend over time
ApiManagementGatewayLogs
| where TimeGenerated > ago(24h)
| summarize
    Total = count(),
    Client4xx = countif(ResponseCode >= 400 and ResponseCode < 500),
    Server5xx = countif(ResponseCode >= 500)
  by bin(TimeGenerated, 15m)
| render timechart

Backend Performance

ApiManagementGatewayLogs
| where TimeGenerated > ago(1h)
| where isnotempty(BackendUrl)
| extend BackendHost = parse_url(BackendUrl).Host
| summarize
    AvgBackend = round(avg(BackendTime), 0),
    P95Backend = round(percentile(BackendTime, 95), 0),
    ErrorRate = round(100.0 * countif(ResponseCode >= 500) / count(), 2),
    Calls = count()
  by BackendHost
| order by AvgBackend desc

Consumer Analytics

ApiManagementGatewayLogs
| where TimeGenerated > ago(30d)
| summarize
    TotalCalls = count(),
    UniqueDays = dcount(bin(TimeGenerated, 1d)),
    AvgCallsPerDay = count() / dcount(bin(TimeGenerated, 1d)),
    ErrorRate = round(100.0 * countif(ResponseCode >= 400) / count(), 2)
  by SubscriptionId, ProductId
| order by TotalCalls desc

Application Insights Queries

Request Performance

requests
| where timestamp > ago(24h)
| summarize
    Total = count(),
    Failed = countif(success == false),
    AvgDuration = avg(duration),
    P95Duration = percentile(duration, 95)
  by bin(timestamp, 15m)
| render timechart

Dependency Failures

dependencies
| where timestamp > ago(24h)
| where success == false
| summarize FailCount = count() by target, type, resultCode
| order by FailCount desc

Exception Analysis

exceptions
| where timestamp > ago(24h)
| summarize Count = count() by type, outerMessage
| order by Count desc
| take 20

// Exception details with stack trace
exceptions
| where timestamp > ago(1h)
| where type == "System.TimeoutException"
| project timestamp, type, outerMessage, details[0].rawStack
| order by timestamp desc

End-to-End Transaction Search

// Find all telemetry for a specific operation
union requests, dependencies, exceptions, traces
| where operation_Id == "abc123xyz"
| project timestamp,
          itemType,
          name,
          duration,
          success,
          resultCode,
          message
| order by timestamp asc

Application Map Data

// Service dependencies and health
dependencies
| where timestamp > ago(1h)
| summarize
    Calls = count(),
    FailRate = round(100.0 * countif(success == false) / count(), 2),
    AvgDuration = round(avg(duration), 0)
  by target, type
| order by Calls desc

Cross-Resource Queries

Query across multiple Application Insights and Log Analytics workspaces:

// Query another workspace
workspace('log-integration-prod').AzureDiagnostics
| where ResourceType == "WORKFLOWS"
| where status_s == "Failed"
| take 10

// Query another App Insights resource
app('appi-api-platform-prod').requests
| where timestamp > ago(1h)
| where success == false

// Union across workspaces
union
  workspace('log-integration-prod').AzureDiagnostics,
  workspace('log-api-prod').AzureDiagnostics
| where TimeGenerated > ago(1h)
| where status_s == "Failed"

Microsoft Reference: Cross-resource queries

Saved Queries and Functions

Create Reusable Functions

// Save as function: LogicAppsFailures
// Parameters: timeRange:timespan = 24h
let LogicAppsFailures = (timeRange:timespan = 24h) {
  AzureDiagnostics
  | where ResourceType == "WORKFLOWS"
  | where TimeGenerated > ago(timeRange)
  | where OperationName == "Microsoft.Logic/workflows/workflowRunCompleted"
  | where status_s == "Failed"
  | project TimeGenerated, Resource, error_code_s, error_message_s
};
LogicAppsFailures(1h)

Alert Query Examples

// Logic App failure rate exceeds 5%
AzureDiagnostics
| where ResourceType == "WORKFLOWS"
| where OperationName == "Microsoft.Logic/workflows/workflowRunCompleted"
| where TimeGenerated > ago(15m)
| summarize Total = count(), Failed = countif(status_s == "Failed") by Resource
| where Failed > 0
| extend FailureRate = round(100.0 * Failed / Total, 2)
| where FailureRate > 5

Visualisation

Render Types

// Time chart
| render timechart

// Bar chart
| render barchart

// Pie chart
| render piechart

// Scatter plot
| render scatterchart

// Area chart
| render areachart

// Stacked area chart
| render stackedareachart

Dashboard Pinning

Run your query in Log Analytics
Click Pin to dashboard
Choose an existing or new Azure dashboard
Set the refresh interval

Best Practices

Start with | where TimeGenerated > ago(...) to limit data scanned
Use project early to reduce column data processed
Use summarize with bin() for time-based aggregations
Save frequently used queries as functions for reuse
Use let statements for complex queries to improve readability
Avoid search * in production — it scans all tables and is expensive
Set query time ranges in the UI as well as the query for consistency
Use dcount() instead of count(distinct ...) for approximate distinct counts
Test queries on small time ranges before running on large datasets
Pin critical queries to Azure dashboards for operational visibility