Hello,
To set up alerts for a VM, we first enable diagnostics (metrics) for it and then create the desired alerts.
I am having trouble getting an alert on guest CPU usage to trigger when diagnostics are enabled with this JSON:
{"name": "agent","properties": {"name": "agent","publicConfiguration": {"odata.type": "Microsoft.Azure.Management.Insights.Models.PublicMonitoringConfiguration","diagnosticMonitorConfiguration": {"overallQuotaInMB": 4096,"metrics": {"resourceId": "/subscriptions/xxxxxxxxxx/resourceGroups/xxxxx/providers/Microsoft.Compute/virtualMachines/xxx","aggregations": [ {"scheduledTransferPeriod": "PT1H" }, {"scheduledTransferPeriod": "PT5M" } ] },"diagnosticInfrastructureLogs": {"scheduledTransferLogLevelFilter": "Warning","scheduledTransferPeriod": "PT5M" },"directories": {"iisLogs": "vhds","scheduledTransferPeriod": "PT5M" },"performanceCounters": {"counters": [ {"annotations": [ {"value": "CPU percentage guest OS","locale": "en-us" } ],"counterSpecifier": "\\Processor\\PercentProcessorTime","sampleRate": "PT15S","unit": "Percent" } ],"scheduledTransferPeriod": "PT0S" }, },"storageAccount": "xxxx" } } }
The alert is created like this:
{"location": "xxxx","tags": { },"properties": {"name": "CPU Idle","description": "CPU Idle Time","isEnabled": true,"condition": {"odata.type": "Microsoft.Azure.Management.Insights.Models.ThresholdRuleCondition","dataSource": {"odata.type": "Microsoft.Azure.Management.Insights.Models.RuleMetricDataSource","resourceUri": "/subscriptions/xxxxxxxxx/resourceGroups/xxx/providers/Microsoft.Compute/virtualMachines/xxx","metricName": "\\Processor\\PercentProcessorTime" },"threshold": 10,"windowSize": "PT8H","operator": "LessThan","timeAggregation": "Maximum" },"action": {"$type": "Microsoft.WindowsAzure.Management.Monitoring.Alerts.Models.RuleEmailAction, Microsoft.WindowsAzure.Management.Mon.Client","odata.type": "Microsoft.Azure.Management.Insights.Models.RuleEmailAction","sendToServiceOwners": false },"actions": [ {"$type": "Microsoft.WindowsAzure.Management.Monitoring.Alerts.Models.RuleWebhookAction, Microsoft.WindowsAzure.Management.Mon.Client","odata.type": "Microsoft.Azure.Management.Insights.Models.RuleWebhookAction","serviceUri": "xxxxxxxxx","properties": {} } ] } }
The problem comes from the scheduledTransferPeriod setting shown below (bold in my original post). When I enable diagnostics with it, the portal reports "No available data" and the alert never fires. The alert is also not triggered for any other scheduledTransferPeriod value of 2 minutes or more.
"scheduledTransferPeriod": "PT5M"
If I use a 1-minute sample rate instead, everything works:
"scheduledTransferPeriod": "PT1M"
Why do I need a lower sample rate? Because I want to increase my alert window size to 24 hours.
With a 1-minute sample rate for the metrics, a 24-hour alert window exceeds the maximum number of data points (1000).
With a 1-minute sample rate, the largest alert window I can use is 16 hours: 60 (minutes) / 1 (minute sample rate) * 16 = 960 < 1000, which is OK.
For a 24-hour alert window and a 1-minute sample rate I get 60 / 1 * 24 = 1440 > 1000, which produces an error.
So I need a sample rate of at least 2 minutes. With that rate I get 60 / 2 * 24 = 720 < 1000, which is fine.
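To make the arithmetic above easy to re-check for other window sizes, here is a minimal Python sketch. The function names are hypothetical and not part of any Azure SDK. The 1000-point limit is the one quoted above, and treating exactly 1000 points as still allowed is an assumption.

```python
import math

# Limit quoted in the post. Whether exactly 1000 points is accepted
# is an assumption; the post only shows 960 passing and 1440 failing.
MAX_DATA_POINTS = 1000


def data_points(window_hours: int, period_minutes: int) -> int:
    """Number of samples in an alert window at a given sample period.

    Assumes the period divides the window evenly, so integer division is exact.
    """
    return window_hours * 60 // period_minutes


def min_period_minutes(window_hours: int, max_points: int = MAX_DATA_POINTS) -> int:
    """Smallest whole-minute period that keeps the window within max_points."""
    return math.ceil(window_hours * 60 / max_points)


def iso_minutes(minutes: int) -> str:
    """Format a whole number of minutes in the style used in the JSON, e.g. 'PT5M'."""
    return f"PT{minutes}M"


if __name__ == "__main__":
    print(data_points(16, 1))                   # 960
    print(data_points(24, 1))                   # 1440
    print(data_points(24, 2))                   # 720
    print(iso_minutes(min_period_minutes(24)))  # PT2M
```

This only reproduces the data-point calculation. It does not explain why the portal shows "No available data" for periods of 2 minutes or more.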
Please advise.
Thanks!