Log Analytics and KQL Queries
What Is Log Analytics?
Log Analytics is a tool in Azure Monitor used to query and analyse log data collected from Azure resources. It uses the Kusto Query Language (KQL) — a powerful, read-only query language designed for big data analytics.
Microsoft Reference: Log Analytics overview
KQL Fundamentals
Basic Query Structure
TableName
| where TimeGenerated > ago(24h)
| where ColumnName == "value"
| summarize Count = count() by GroupColumn
| order by Count desc
| take 10
Key Operators
| Operator | Description | Example |
|---|---|---|
where |
Filter rows | where status == "Failed" |
project |
Select columns | project TimeGenerated, Name, Status |
summarize |
Aggregate data | summarize count() by Status |
order by |
Sort results | order by TimeGenerated desc |
take / limit |
Limit rows | take 100 |
extend |
Add computed columns | extend DurationSec = Duration / 1000 |
join |
Join tables | join kind=inner (Table2) on Id |
union |
Combine tables | union Table1, Table2 |
render |
Visualise results | render timechart |
parse |
Extract from strings | parse Message with "Order " OrderId " processed" |
mv-expand |
Expand arrays | mv-expand items |
make-series |
Create time series | make-series count() on TimeGenerated |
String Operations
// Contains (case-insensitive)
| where Message contains "error"
// Contains (case-sensitive)
| where Message contains_cs "Error"
// Starts with
| where Name startswith "la-"
// Matches regex
| where Message matches regex @"ORD-\d{6}"
// String extraction
| extend OrderId = extract(@"ORD-(\d{6})", 1, Message)
Time Operations
// Relative time
| where TimeGenerated > ago(1h)
| where TimeGenerated > ago(7d)
// Absolute time range
| where TimeGenerated between (datetime(2026-03-01) .. datetime(2026-03-14))
// Time binning
| summarize count() by bin(TimeGenerated, 15m)
| summarize count() by bin(TimeGenerated, 1h)
// Floor and ceiling
| extend HourBucket = floor(TimeGenerated, 1h)
Aggregation Functions
| summarize
Total = count(),
Average = avg(Duration),
Maximum = max(Duration),
Minimum = min(Duration),
P50 = percentile(Duration, 50),
P95 = percentile(Duration, 95),
P99 = percentile(Duration, 99),
Distinct = dcount(UserId),
StdDev = stdev(Duration)
by bin(TimeGenerated, 1h)
Microsoft Reference: KQL reference
Logic Apps Queries
Workflow Runs Overview
AzureDiagnostics
| where ResourceType == "WORKFLOWS"
| where Category == "WorkflowRuntime"
| where OperationName == "Microsoft.Logic/workflows/workflowRunCompleted"
| where TimeGenerated > ago(24h)
| summarize
TotalRuns = count(),
Succeeded = countif(status_s == "Succeeded"),
Failed = countif(status_s == "Failed"),
Cancelled = countif(status_s == "Cancelled")
by Resource
| extend SuccessRate = round(100.0 * Succeeded / TotalRuns, 2)
| order by Failed desc
Failed Workflow Analysis
AzureDiagnostics
| where ResourceType == "WORKFLOWS"
| where OperationName == "Microsoft.Logic/workflows/workflowRunCompleted"
| where status_s == "Failed"
| where TimeGenerated > ago(24h)
| project TimeGenerated, Resource, workflowId_s, error_code_s, error_message_s
| order by TimeGenerated desc
Workflow Duration Trends
AzureDiagnostics
| where ResourceType == "WORKFLOWS"
| where OperationName == "Microsoft.Logic/workflows/workflowRunCompleted"
| where TimeGenerated > ago(7d)
| extend DurationMs = datetime_diff('millisecond', endTime_t, startTime_t)
| summarize
AvgDuration = avg(DurationMs),
P95Duration = percentile(DurationMs, 95),
RunCount = count()
by bin(TimeGenerated, 1h), Resource
| render timechart
Action-Level Errors
AzureDiagnostics
| where ResourceType == "WORKFLOWS"
| where OperationName == "Microsoft.Logic/workflows/workflowActionCompleted"
| where status_s == "Failed"
| where TimeGenerated > ago(24h)
| summarize ErrorCount = count() by Resource, resource_actionName_s, error_code_s
| order by ErrorCount desc
| take 20
Tracked Properties
AzureDiagnostics
| where ResourceType == "WORKFLOWS"
| where isnotempty(trackedProperties_orderId_s)
| project TimeGenerated,
Resource,
OrderId = trackedProperties_orderId_s,
CustomerId = trackedProperties_customerId_s,
Status = status_s,
ErrorCode = trackedProperties_errorCode_s
| order by TimeGenerated desc
API Management Queries
Request Volume and Latency
ApiManagementGatewayLogs
| where TimeGenerated > ago(24h)
| summarize
TotalRequests = count(),
AvgDuration = avg(TotalTime),
P95Duration = percentile(TotalTime, 95),
ErrorRate = round(100.0 * countif(ResponseCode >= 400) / count(), 2)
by bin(TimeGenerated, 15m)
| render timechart
Top APIs by Usage
ApiManagementGatewayLogs
| where TimeGenerated > ago(7d)
| summarize
Calls = count(),
AvgLatency = round(avg(TotalTime), 0),
ErrorRate = round(100.0 * countif(ResponseCode >= 400) / count(), 2)
by ApiId
| order by Calls desc
Error Breakdown
ApiManagementGatewayLogs
| where TimeGenerated > ago(24h)
| where ResponseCode >= 400
| summarize Count = count() by ResponseCode, ApiId, OperationId
| order by Count desc
| take 20
// Error trend over time
ApiManagementGatewayLogs
| where TimeGenerated > ago(24h)
| summarize
Total = count(),
Client4xx = countif(ResponseCode >= 400 and ResponseCode < 500),
Server5xx = countif(ResponseCode >= 500)
by bin(TimeGenerated, 15m)
| render timechart
Backend Performance
ApiManagementGatewayLogs
| where TimeGenerated > ago(1h)
| where isnotempty(BackendUrl)
| extend BackendHost = parse_url(BackendUrl).Host
| summarize
AvgBackend = round(avg(BackendTime), 0),
P95Backend = round(percentile(BackendTime, 95), 0),
ErrorRate = round(100.0 * countif(ResponseCode >= 500) / count(), 2),
Calls = count()
by BackendHost
| order by AvgBackend desc
Consumer Analytics
ApiManagementGatewayLogs
| where TimeGenerated > ago(30d)
| summarize
TotalCalls = count(),
UniqueDays = dcount(bin(TimeGenerated, 1d)),
AvgCallsPerDay = count() / dcount(bin(TimeGenerated, 1d)),
ErrorRate = round(100.0 * countif(ResponseCode >= 400) / count(), 2)
by SubscriptionId, ProductId
| order by TotalCalls desc
Application Insights Queries
Request Performance
requests
| where timestamp > ago(24h)
| summarize
Total = count(),
Failed = countif(success == false),
AvgDuration = avg(duration),
P95Duration = percentile(duration, 95)
by bin(timestamp, 15m)
| render timechart
Dependency Failures
dependencies
| where timestamp > ago(24h)
| where success == false
| summarize FailCount = count() by target, type, resultCode
| order by FailCount desc
Exception Analysis
exceptions
| where timestamp > ago(24h)
| summarize Count = count() by type, outerMessage
| order by Count desc
| take 20
// Exception details with stack trace
exceptions
| where timestamp > ago(1h)
| where type == "System.TimeoutException"
| project timestamp, type, outerMessage, details[0].rawStack
| order by timestamp desc
End-to-End Transaction Search
// Find all telemetry for a specific operation
union requests, dependencies, exceptions, traces
| where operation_Id == "abc123xyz"
| project timestamp,
itemType,
name,
duration,
success,
resultCode,
message
| order by timestamp asc
Application Map Data
// Service dependencies and health
dependencies
| where timestamp > ago(1h)
| summarize
Calls = count(),
FailRate = round(100.0 * countif(success == false) / count(), 2),
AvgDuration = round(avg(duration), 0)
by target, type
| order by Calls desc
Cross-Resource Queries
Query across multiple Application Insights and Log Analytics workspaces:
// Query another workspace
workspace('log-integration-prod').AzureDiagnostics
| where ResourceType == "WORKFLOWS"
| where status_s == "Failed"
| take 10
// Query another App Insights resource
app('appi-api-platform-prod').requests
| where timestamp > ago(1h)
| where success == false
// Union across workspaces
union
workspace('log-integration-prod').AzureDiagnostics,
workspace('log-api-prod').AzureDiagnostics
| where TimeGenerated > ago(1h)
| where status_s == "Failed"
Microsoft Reference: Cross-resource queries
Saved Queries and Functions
Create Reusable Functions
// Save as function: LogicAppsFailures
// Parameters: timeRange:timespan = 24h
let LogicAppsFailures = (timeRange:timespan = 24h) {
AzureDiagnostics
| where ResourceType == "WORKFLOWS"
| where TimeGenerated > ago(timeRange)
| where OperationName == "Microsoft.Logic/workflows/workflowRunCompleted"
| where status_s == "Failed"
| project TimeGenerated, Resource, error_code_s, error_message_s
};
LogicAppsFailures(1h)
Alert Query Examples
// Logic App failure rate exceeds 5%
AzureDiagnostics
| where ResourceType == "WORKFLOWS"
| where OperationName == "Microsoft.Logic/workflows/workflowRunCompleted"
| where TimeGenerated > ago(15m)
| summarize Total = count(), Failed = countif(status_s == "Failed") by Resource
| where Failed > 0
| extend FailureRate = round(100.0 * Failed / Total, 2)
| where FailureRate > 5
Visualisation
Render Types
// Time chart
| render timechart
// Bar chart
| render barchart
// Pie chart
| render piechart
// Scatter plot
| render scatterchart
// Area chart
| render areachart
// Stacked area chart
| render stackedareachart
Dashboard Pinning
- Run your query in Log Analytics
- Click Pin to dashboard
- Choose an existing or new Azure dashboard
- Set the refresh interval
Best Practices
- Start with
| where TimeGenerated > ago(...)to limit data scanned - Use
projectearly to reduce column data processed - Use
summarizewithbin()for time-based aggregations - Save frequently used queries as functions for reuse
- Use
letstatements for complex queries to improve readability - Avoid
search *in production — it scans all tables and is expensive - Set query time ranges in the UI as well as the query for consistency
- Use
dcount()instead ofcount(distinct ...)for approximate distinct counts - Test queries on small time ranges before running on large datasets
- Pin critical queries to Azure dashboards for operational visibility