Monitoring APIM with Application Insights

Advanced | API Management | 2026-03-14

Monitoring Overview

Comprehensive monitoring of Azure API Management is essential for understanding API performance, identifying issues, and ensuring SLA compliance.

Microsoft Reference: Monitor API Management

Application Insights Integration

Setting Up the Logger

resource appInsights 'Microsoft.Insights/components@2020-02-02' = {
  name: 'appi-apim-${environment}'
  location: location
  kind: 'web'
  properties: {
    Application_Type: 'web'
    WorkspaceResourceId: logAnalyticsWorkspace.id
    RetentionInDays: environment == 'prod' ? 90 : 30
  }
}

resource apimLogger 'Microsoft.ApiManagement/service/loggers@2023-05-01-preview' = {
  parent: apim
  name: 'appinsights-logger'
  properties: {
    loggerType: 'applicationInsights'
    resourceId: appInsights.id
    credentials: {
      connectionString: appInsights.properties.ConnectionString
    }
  }
}

API-Level Diagnostics

Enable detailed logging per API:

resource apiDiagnostic 'Microsoft.ApiManagement/service/apis/diagnostics@2023-05-01-preview' = {
  parent: ordersApi
  name: 'applicationinsights'
  properties: {
    loggerId: apimLogger.id
    alwaysLog: 'allErrors'
    sampling: {
      samplingType: 'fixed'
      percentage: environment == 'prod' ? 25 : 100
    }
    verbosity: environment == 'prod' ? 'information' : 'verbose'
    httpCorrelationProtocol: 'W3C'
    logClientIp: true
    frontend: {
      request: {
        headers: ['X-Correlation-Id', 'X-Forwarded-For']
        body: { bytes: 1024 }
      }
      response: {
        headers: ['Content-Type', 'X-Request-Id']
        body: { bytes: 1024 }
      }
    }
    backend: {
      request: {
        headers: ['Host']
        body: { bytes: 1024 }
      }
      response: {
        headers: ['Content-Type']
        body: { bytes: 1024 }
      }
    }
  }
}

Global Diagnostics

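Apply a baseline diagnostic configuration to all APIs. Settings defined on an individual API take precedence over these global settings: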

resource globalDiagnostic 'Microsoft.ApiManagement/service/diagnostics@2023-05-01-preview' = {
  parent: apim
  name: 'applicationinsights'
  properties: {
    loggerId: apimLogger.id
    alwaysLog: 'allErrors'
    sampling: {
      samplingType: 'fixed'
      percentage: 10
    }
    httpCorrelationProtocol: 'W3C'
  }
}

Sampling Strategies

Environment	Sampling Rate	Rationale
Development	100%	Full visibility for debugging
Testing	100%	Capture all test scenarios
Staging	50%	Balance between visibility and cost
Production	5–25%	Cost-effective monitoring at scale
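
To get a feel for what a sampling rate means for telemetry volume, the following Python sketch estimates how many requests end up logged under fixed-rate sampling when alwaysLog is set to 'allErrors', as in the diagnostics above. The function name and the traffic figures are illustrative assumptions, and real volumes will vary because sampling is probabilistic.

```python
def estimated_logged_requests(total_requests: int, error_requests: int,
                              sampling_percentage: float) -> float:
    """Estimate logged requests under fixed-rate sampling.

    Assumes errors are always logged (alwaysLog: 'allErrors') and that
    the sampling percentage applies to the remaining, non-error requests.
    """
    if not 0 <= sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be between 0 and 100")
    if not 0 <= error_requests <= total_requests:
        raise ValueError("error_requests must be between 0 and total_requests")
    non_error = total_requests - error_requests
    return error_requests + non_error * sampling_percentage / 100


# Hypothetical day of traffic: 1,000,000 requests, 2% of them errors.
for rate in (100, 50, 25, 5):
    logged = estimated_logged_requests(1_000_000, 20_000, rate)
    print(f"{rate:>3}% sampling: about {logged:,.0f} requests logged")
```

Under these assumed figures, dropping from 100% to 5% sampling cuts the logged volume from 1,000,000 to roughly 69,000 requests while still keeping every error.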

Microsoft Reference: Application Insights integration

Azure Monitor Metrics

Key APIM Metrics

Metric	Description	Alert Threshold
Requests	Total API call count	Baseline deviation
Failed Requests	4xx and 5xx responses	> 5% of total
Unauthorised Requests	401 responses	> 10 per minute
Overall Duration of Gateway Requests	End-to-end latency	p95 > 2 seconds
Duration of Backend Requests	Backend response time	p95 > 1 second
Capacity	Gateway utilisation (%)	> 80%
Event Hub Events (Dropped)	Lost diagnostic events	> 0
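
The percentage and percentile thresholds in the table can be checked against raw request data. The Python sketch below shows one way to compute a failure rate and a nearest-rank p95 and compare them with the 5% and 2-second thresholds. The helper names and sample numbers are hypothetical, and monitoring tools may calculate percentiles with interpolation, so results can differ slightly.

```python
import math


def failure_rate_percent(total: int, failed: int) -> float:
    """Failed requests as a percentage of all requests (0 when idle)."""
    return failed * 100.0 / total if total else 0.0


def percentile_nearest_rank(values: list, pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical five-minute window; durations are in milliseconds.
durations_ms = [120, 180, 240, 310, 450, 600, 900, 1500, 2600, 3100]
total_requests, failed_requests = 400, 28

rate = failure_rate_percent(total_requests, failed_requests)
p95 = percentile_nearest_rank(durations_ms, 95)

print(f"failure rate {rate:.1f}% -> breach: {rate > 5}")
print(f"p95 {p95} ms -> breach: {p95 > 2000}")
```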

Capacity Monitoring

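Capacity is the primary metric for APIM scaling decisions. It indicates how heavily the gateway's resources are loaded, so a sustained high average is the signal to add units or move to a higher tier: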

resource capacityAlert 'Microsoft.Insights/metricAlerts@2018-03-01' = {
  name: 'alert-apim-capacity'
  location: 'global'
  properties: {
    severity: 2
    enabled: true
    scopes: [apim.id]
    evaluationFrequency: 'PT5M'
    windowSize: 'PT15M'
    criteria: {
      'odata.type': 'Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria'
      allOf: [
        {
          name: 'HighCapacity'
          criterionType: 'StaticThresholdCriterion'
          metricName: 'Capacity'
          operator: 'GreaterThan'
          threshold: 80
          timeAggregation: 'Average'
        }
      ]
    }
    actions: [
      { actionGroupId: opsActionGroup.id }
    ]
  }
}
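
The interplay of evaluationFrequency ('PT5M') and windowSize ('PT15M') in the alert above can be illustrated with a simplified simulation: every five minutes the rule averages the last fifteen minutes of readings and compares the result with the threshold. The Python sketch below is a rough model under that assumption, not a reproduction of how Azure Monitor aggregates metrics internally, and the readings are made up.

```python
from statistics import mean


def evaluate_alert(samples, window, frequency, threshold):
    """Evaluate an average-over-window rule at fixed intervals.

    samples holds one capacity reading per minute; window and frequency
    are in minutes. Returns (minute, window average, fired) per evaluation.
    """
    results = []
    for end in range(window, len(samples) + 1, frequency):
        avg = mean(samples[end - window:end])
        results.append((end, avg, avg > threshold))
    return results


# Hypothetical readings: 15 minutes at 70%, then 15 minutes at 90%.
readings = [70.0] * 15 + [90.0] * 15
for minute, avg, fired in evaluate_alert(readings, window=15, frequency=5, threshold=80):
    print(f"minute {minute}: average {avg:.1f}% -> fired: {fired}")
```

In this model the rule does not fire until the elevated load has lasted long enough to pull the 15-minute average above 80%, which is why short spikes are filtered out.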

Microsoft Reference: APIM capacity

Log Analytics Queries (KQL)

Request Volume and Latency

// API request volume over time (15-minute intervals)
ApiManagementGatewayLogs
| where TimeGenerated > ago(24h)
| summarize RequestCount = count(),
            AvgDuration = avg(TotalTime),
            P95Duration = percentile(TotalTime, 95),
            P99Duration = percentile(TotalTime, 99)
  by bin(TimeGenerated, 15m)
| render timechart

Error Analysis

// Top errors by API and operation
ApiManagementGatewayLogs
| where TimeGenerated > ago(24h)
| where ResponseCode >= 400
| summarize ErrorCount = count() by ApiId, OperationId, ResponseCode
| order by ErrorCount desc
| take 20

Slow Requests

// Requests slower than 2 seconds
ApiManagementGatewayLogs
| where TimeGenerated > ago(1h)
| where TotalTime > 2000
| project TimeGenerated, ApiId, OperationId, TotalTime,
          BackendTime, ClientTime,
          ResponseCode, CallerIpAddress
| order by TotalTime desc

Consumer Analytics

// Top API consumers by subscription
ApiManagementGatewayLogs
| where TimeGenerated > ago(7d)
| summarize CallCount = count(),
            AvgLatency = avg(TotalTime),
            ErrorRate = countif(ResponseCode >= 400) * 100.0 / count()
  by ApimSubscriptionId
| order by CallCount desc
| take 20

Backend Health

// Backend response time by backend URL
ApiManagementGatewayLogs
| where TimeGenerated > ago(1h)
| where isnotempty(BackendUrl)
| summarize AvgBackendTime = avg(BackendTime),
            P95BackendTime = percentile(BackendTime, 95),
            ErrorRate = countif(ResponseCode >= 500) * 100.0 / count(),
            TotalRequests = count()
  by BackendUrl
| order by AvgBackendTime desc

Policy Execution Analysis

// Track custom trace messages emitted by trace policies
ApiManagementGatewayLogs
| where TimeGenerated > ago(1h)
| where array_length(TraceRecords) > 0
| mv-expand PolicyTrace = TraceRecords
| project TimeGenerated, ApiId, PolicyTrace, TotalTime

Diagnostic Settings

Enable Diagnostic Logging

resource apimDiagnostics 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  name: 'apim-diagnostics'
  scope: apim
  properties: {
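    workspaceId: logAnalyticsWorkspace.id
    // Resource-specific mode, so gateway logs land in the ApiManagementGatewayLogs table used by the queries above
    logAnalyticsDestinationType: 'Dedicated'
    // retentionPolicy is honoured only for storage account destinations; for Log Analytics, set retention on the workspace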
    logs: [
      {
        category: 'GatewayLogs'
        enabled: true
        retentionPolicy: { enabled: true, days: 90 }
      }
      {
        category: 'WebSocketConnectionLogs'
        enabled: true
        retentionPolicy: { enabled: true, days: 30 }
      }
    ]
    metrics: [
      {
        category: 'AllMetrics'
        enabled: true
        retentionPolicy: { enabled: true, days: 30 }
      }
    ]
  }
}

Log Categories

Category	Description	Retention
GatewayLogs	API request/response details	90 days
WebSocketConnectionLogs	WebSocket connection events	30 days
AllMetrics	Performance and usage metrics	30 days

Alerts

Failed Request Alert

resource failedRequestAlert 'Microsoft.Insights/metricAlerts@2018-03-01' = {
  name: 'alert-apim-failed-requests'
  location: 'global'
  properties: {
    severity: 1
    enabled: true
    scopes: [apim.id]
    evaluationFrequency: 'PT5M'
    windowSize: 'PT5M'
    criteria: {
      'odata.type': 'Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria'
      allOf: [
        {
          name: 'HighErrorRate'
          criterionType: 'StaticThresholdCriterion'
          metricName: 'FailedRequests'
          operator: 'GreaterThan'
          threshold: 50
          timeAggregation: 'Total'
        }
      ]
    }
    actions: [
      { actionGroupId: opsActionGroup.id }
    ]
  }
}
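
The alert above fires on an absolute count (more than 50 failed requests in a five-minute window), whereas the metrics table expresses the target as a percentage of traffic. One way to reconcile the two is to derive the count from an expected request volume, as in this Python sketch. The function name and the baseline figures are assumptions for illustration; the source does not state how the value 50 was chosen.

```python
import math


def count_threshold(expected_requests_per_window: int,
                    max_failure_rate_percent: float) -> int:
    """Translate a failure-rate target into a static count threshold."""
    if expected_requests_per_window < 0:
        raise ValueError("expected_requests_per_window must not be negative")
    if not 0 <= max_failure_rate_percent <= 100:
        raise ValueError("max_failure_rate_percent must be between 0 and 100")
    return math.ceil(expected_requests_per_window * max_failure_rate_percent / 100)


# Hypothetical baselines per five-minute window, all with a 5% target.
for baseline in (250, 1_000, 12_000):
    print(f"{baseline:>6} requests/window -> threshold {count_threshold(baseline, 5)}")
```

A static count only approximates a rate while traffic stays near the assumed baseline, so the threshold is worth revisiting when volumes change.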

Latency Alert

resource latencyAlert 'Microsoft.Insights/metricAlerts@2018-03-01' = {
  name: 'alert-apim-latency'
  location: 'global'
  properties: {
    severity: 2
    enabled: true
    scopes: [apim.id]
    evaluationFrequency: 'PT5M'
    windowSize: 'PT15M'
    criteria: {
      'odata.type': 'Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria'
      allOf: [
        {
          name: 'HighLatency'
          criterionType: 'StaticThresholdCriterion'
          metricName: 'Duration'
          operator: 'GreaterThan'
          threshold: 3000
          timeAggregation: 'Average'
        }
      ]
    }
    actions: [
      { actionGroupId: opsActionGroup.id }
    ]
  }
}

Action Groups

resource opsActionGroup 'Microsoft.Insights/actionGroups@2023-01-01' = {
  name: 'ag-ops-team'
  location: 'global'
  properties: {
    groupShortName: 'OpsTeam'
    enabled: true
    emailReceivers: [
      {
        name: 'OpsTeamEmail'
        emailAddress: 'ops-team@enterprise.com'
        useCommonAlertSchema: true
      }
    ]
    azureAppPushReceivers: [
      {
        name: 'OnCallEngineer'
        emailAddress: 'oncall@enterprise.com'
      }
    ]
    webhookReceivers: [
      {
        name: 'PagerDuty'
        serviceUri: 'https://events.pagerduty.com/integration/{key}/enqueue'
        useCommonAlertSchema: true
      }
    ]
  }
}

Distributed Tracing

W3C Trace Context

APIM supports W3C Trace Context for end-to-end distributed tracing:

Client → APIM (traceparent header) → Backend → Database
   ↓         ↓                         ↓
Application Insights (correlated traces across all services)

Correlation in Policies

<inbound>
    <!-- Set correlation ID from incoming header or generate new -->
    <set-header name="X-Correlation-Id" exists-action="skip">
        <value>@(context.RequestId.ToString())</value>
    </set-header>
    <!-- Pass to backend -->
    <set-header name="traceparent" exists-action="skip">
        <value>@{
            var traceId = context.RequestId.ToString("N");
            var spanId = Guid.NewGuid().ToString("N").Substring(0, 16);
            return $"00-{traceId}-{spanId}-01";
        }</value>
    </set-header>
</inbound>
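
The traceparent value built by the policy above follows the W3C layout of version, 32-hex trace ID, 16-hex parent ID, and 2-hex flags. The Python sketch below mirrors that construction and adds a validator, which can be handy when checking on the backend side what a gateway sent. The function names are hypothetical, and the sketch covers only version 00 of the header.

```python
import re
import secrets
import uuid
from typing import Dict, Optional

# version "00", 32-hex trace ID, 16-hex parent (span) ID, 2-hex flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def build_traceparent(request_id: uuid.UUID, sampled: bool = True) -> str:
    """Build a version-00 traceparent, reusing a request GUID as trace ID."""
    trace_id = request_id.hex        # 32 lowercase hex characters
    span_id = secrets.token_hex(8)   # 16 lowercase hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> Optional[Dict[str, str]]:
    """Return the header's fields, or None if it is malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero trace IDs and parent IDs are invalid in W3C Trace Context.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return {"trace_id": trace_id, "span_id": span_id, "flags": flags}


header = build_traceparent(uuid.uuid4())
print(header)
print(parse_traceparent(header))
```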

Viewing Traces

In Application Insights:

1. Open Transaction Search
2. Filter by time range and operation
3. Click a request to see the end-to-end trace
4. View the Application Map for service dependencies

Microsoft Reference: Distributed tracing with APIM

Azure Monitor Workbooks

Create custom dashboards for API monitoring:

API Health Dashboard

Key sections to include:

Request Volume — Time series of total requests
Error Rate — Percentage of 4xx/5xx responses
Latency Distribution — p50, p95, p99 response times
Top Errors — Most common error responses
Capacity — Gateway utilisation
Top Consumers — Most active subscriptions
Backend Health — Backend response times and error rates
Geographic Distribution — Request origins

Microsoft Reference: Azure Monitor workbooks

Policy-Based Logging

Custom Trace Messages

<trace source="custom-trace" severity="information">
    <message>@{
        return String.Format("Processing order {0} for customer {1}",
            context.Request.MatchedParameters["orderId"],
            context.Request.Headers.GetValueOrDefault("X-Customer-Id", "unknown"));
    }</message>
    <metadata name="orderId" value="@(context.Request.MatchedParameters["orderId"])" />
    <metadata name="customerRegion" value="@(context.Request.Headers.GetValueOrDefault("X-Region", "unknown"))" />
</trace>

Emit Custom Metrics

<inbound>
    <emit-metric name="custom-api-calls" value="1" namespace="apim-custom-metrics">
        <dimension name="API" value="@(context.Api.Name)" />
        <dimension name="Operation" value="@(context.Operation.Name)" />
        <dimension name="ProductName" value="@(context.Product?.Name ?? "none")" />
    </emit-metric>
</inbound>

Microsoft Reference: Emit custom metrics policy

Best Practices

Enable Application Insights on all APIM instances for comprehensive monitoring
Use sampling in production (5–25%) to control costs while maintaining visibility
Set up alerts for capacity, error rates, and latency thresholds
Log request/response bodies only in non-production or at low sampling rates
Use W3C Trace Context for end-to-end distributed tracing
Create dashboards for different audiences (operations, development, business)
Monitor capacity proactively — scale before hitting limits
Retain logs for compliance (90 days minimum for production)
Use custom metrics for business-specific monitoring
Review analytics regularly to identify trends and optimisation opportunities