The following checklist helps to assess the implementation of an Azure application (with a focus on PaaS architectures) at any point during the development or rollout phase.

Scope | Check | Notes | Result |
---|---|---|---|
Defender for Cloud | Plan configured | At least the free plan should be activated; activate the standard plan (with Azure Defender) depending on the architecture and the components used. Highly recommended if the architecture includes IaaS resources (VMs, VNets) | |
Defender for Cloud | Security Contact and email notifications configured | ||
Defender for Cloud | Continuous export of Alerts, Assessments and/or Scores to a Log Analytics Workspace configured | The Log Analytics Workspace must be deployed in the same subscription; see the snippet for a complete deployment of the resources and the continuous export configuration | |
Partner Information | Partner reference configured on the subscription | This is particularly relevant if the partner link has not already been established at the tenant level | |
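
You can verify the same-subscription requirement for continuous export by comparing the subscription segment of two resource IDs. The following is a minimal sketch. It assumes you already have the export scope and workspace resource IDs, for example from an ARM template or the portal. The function names and IDs are hypothetical.

```python
"""Sketch: verify that a continuous export scope and its target
Log Analytics Workspace share one subscription (IDs are hypothetical)."""


def subscription_of(resource_id: str) -> str:
    """Return the lower-cased subscription ID from an Azure resource ID.

    Azure resource IDs start with /subscriptions/{subscriptionId}/...;
    segment names and GUIDs are compared case-insensitively here.
    """
    parts = [p for p in resource_id.split("/") if p]
    if len(parts) < 2 or parts[0].lower() != "subscriptions":
        raise ValueError(f"not a subscription-scoped resource ID: {resource_id!r}")
    return parts[1].lower()


def same_subscription(export_scope_id: str, workspace_id: str) -> bool:
    """True if the export scope and the workspace live in the same subscription."""
    return subscription_of(export_scope_id) == subscription_of(workspace_id)


if __name__ == "__main__":
    workspace = (
        "/subscriptions/00000000-0000-0000-0000-000000000001"
        "/resourceGroups/rg-security-prod/providers"
        "/Microsoft.OperationalInsights/workspaces/log-security-prod"
    )
    print(same_subscription("/subscriptions/00000000-0000-0000-0000-000000000001", workspace))  # True
    print(same_subscription("/subscriptions/00000000-0000-0000-0000-000000000002", workspace))  # False
```
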

Scope | Check | Notes | Result |
---|---|---|---|
Naming & Tagging | All resources are named and tagged according to the convention defined by the customer (e.g. in the Cloud Operating Model) or, if none exists, according to Microsoft recommendations | ||
Structuring | Environment-specific resources (e.g. per deployment stage) are separated and isolated. Resource Groups neither contain resources from different environments nor mix environment-specific resources with common/shared resources | ||
Structuring | All resources in a Resource Group originate from a single deployment source | ||
Structuring | The Resource Group structure supports foreseeable future extensions (e.g. additional languages) and scaling measures | Typical shortcoming: an app can only be moved to another App Service Plan in the same Resource Group and region | |
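
You can partly automate the naming, tagging and environment-separation checks against a resource inventory. The following is a minimal sketch. It assumes the inventory is a list of dictionaries with `name`, `resourceGroup` and `tags` keys, similar to the JSON returned by `az resource list`. The naming pattern and the required tag set are hypothetical stand-ins for the customer's actual convention.

```python
import re
from collections import defaultdict

# Hypothetical convention modelled on the Cloud Adoption Framework pattern
# <type>-<workload>-<environment>-<region>-<instance>, e.g. app-shop-prod-weu-001.
# Some resource types (e.g. Storage Accounts) do not allow hyphens and need their own rule.
NAME_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)-(?P<workload>[a-z0-9]+)-(?P<env>dev|test|prod|shared)"
    r"-(?P<region>[a-z0-9]+)-(?P<instance>\d{3})$"
)
REQUIRED_TAGS = {"environment", "owner", "costCenter"}  # hypothetical tag set


def naming_violations(resources):
    """Names that do not match NAME_PATTERN."""
    return [r["name"] for r in resources if not NAME_PATTERN.match(r["name"])]


def missing_tags(resources):
    """Map resource name -> sorted list of required tags it lacks."""
    result = {}
    for r in resources:
        missing = REQUIRED_TAGS - set(r.get("tags") or {})
        if missing:
            result[r["name"]] = sorted(missing)
    return result


def mixed_environment_groups(resources):
    """Resource Groups whose resources carry more than one 'environment' tag value.

    'shared' counts as its own value, so shared resources mixed with
    environment-specific ones are reported as well.
    """
    envs = defaultdict(set)
    for r in resources:
        envs[r["resourceGroup"]].add((r.get("tags") or {}).get("environment", "<untagged>"))
    return sorted(rg for rg, values in envs.items() if len(values) > 1)


if __name__ == "__main__":
    inventory = [
        {"name": "app-shop-prod-weu-001", "resourceGroup": "rg-shop-prod-weu-001",
         "tags": {"environment": "prod", "owner": "team-a", "costCenter": "4711"}},
        {"name": "kv-shop-shared-weu-001", "resourceGroup": "rg-shop-prod-weu-001",
         "tags": {"environment": "shared", "owner": "team-a"}},
        {"name": "MyTestApp", "resourceGroup": "rg-shop-test-weu-001",
         "tags": {"environment": "test", "owner": "team-a", "costCenter": "4711"}},
    ]
    print(naming_violations(inventory))         # ['MyTestApp']
    print(missing_tags(inventory))              # {'kv-shop-shared-weu-001': ['costCenter']}
    print(mixed_environment_groups(inventory))  # ['rg-shop-prod-weu-001']
```

The sketch treats any Resource Group with more than one `environment` value as a violation. It therefore also flags the mixing of shared and environment-specific resources described in the Structuring row.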

Scope | Check | Notes | Result |
---|---|---|---|
Infrastructure as Code (IaC) | All resource deployments and configurations are managed by script(s) | One-time administrative tasks (e.g. user or access configuration) may be excluded if these manual steps are clearly documented | |
Continuous Integration & Deployment (CI/CD) | Deployment to both new and pre-existing environments is fully automated. The target environment (tenant, subscription) is only a matter of configuration | One-time administrative tasks (e.g. user or access configuration) may be excluded if these manual steps are clearly documented | |
Configurations | All sensitive configuration values are managed exclusively in Key Vault and deployed using a safe deployment process | ||
Configurations | Sensitive configuration values are never stored in Git repositories but are injected during the deployment process (e.g. using Azure DevOps Variable Groups) | ||
Configurations | Sensitive parameters use the appropriate type (e.g. securestring in ARM templates) and are never logged or published (e.g. as outputs in ARM templates) | ||
Configurations | Key Vault references in the app settings of App Service resources are either versionless, or the correct deployment order is ensured (using dependsOn in ARM templates) to prevent failures caused by race conditions | ||
Consistency | The structure and naming of the source code (Git repository) correspond to the deployed resources (e.g. Function App projects) | ||
Causality | Resources are not defined or supplied from multiple sources | This mainly concerns Function Apps (all contained functions come from the same Visual Studio project and share the same deployment process) and API Management. Rule: in a disaster recovery situation, a resource must be recoverable with a single process | |
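
A static scan of the ARM templates can catch two of the configuration checks before deployment: sensitive parameters without a secure type or exposed as outputs, and version-pinned Key Vault references. The following is a heuristic sketch that assumes the template has already been loaded as a dictionary. The keyword list, the function names and the sample values are hypothetical. The two Key Vault reference forms handled are `@Microsoft.KeyVault(SecretUri=...)` and `@Microsoft.KeyVault(VaultName=...;SecretName=...)` with an optional `SecretVersion`.

```python
import re
from urllib.parse import urlparse

# Heuristic: parameter names that suggest secrets (may yield false positives, e.g. "keyVaultName").
SENSITIVE_NAME = re.compile(r"password|secret|key|token|connectionstring", re.IGNORECASE)
SECURE_TYPES = {"securestring", "secureobject"}  # ARM type names are case-insensitive
PARAM_REF = re.compile(r"parameters\(\s*'([^']+)'\s*\)", re.IGNORECASE)


def insecure_parameters(template: dict) -> list:
    """Sensitive-looking parameters that are not typed securestring/secureObject."""
    return sorted(
        name
        for name, spec in template.get("parameters", {}).items()
        if SENSITIVE_NAME.search(name)
        and spec.get("type", "").lower() not in SECURE_TYPES
    )


def leaking_outputs(template: dict) -> list:
    """Outputs whose value expression references a secure parameter."""
    secure = {
        name.lower()
        for name, spec in template.get("parameters", {}).items()
        if spec.get("type", "").lower() in SECURE_TYPES
    }
    leaks = []
    for out_name, out in template.get("outputs", {}).items():
        refs = {ref.lower() for ref in PARAM_REF.findall(str(out.get("value", "")))}
        if refs & secure:
            leaks.append(out_name)
    return sorted(leaks)


def is_version_pinned(kv_reference: str) -> bool:
    """True if an App Service Key Vault reference pins a specific secret version."""
    if re.search(r"SecretVersion\s*=", kv_reference, re.IGNORECASE):
        return True
    match = re.search(r"SecretUri=([^);\s]+)", kv_reference, re.IGNORECASE)
    if not match:
        return False
    segments = [s for s in urlparse(match.group(1)).path.split("/") if s]
    return len(segments) >= 3  # secrets/<name>/<version>


if __name__ == "__main__":
    template = {
        "parameters": {
            "sqlAdminPassword": {"type": "secureString"},
            "storageKey": {"type": "string"},
            "location": {"type": "string"},
        },
        "outputs": {
            "adminPassword": {"type": "string", "value": "[parameters('sqlAdminPassword')]"},
            "region": {"type": "string", "value": "[parameters('location')]"},
        },
    }
    print(insecure_parameters(template))  # ['storageKey']
    print(leaking_outputs(template))      # ['adminPassword']
```

A versionless reference always resolves to the latest secret version. A pinned reference depends on that exact version existing when the app settings are deployed, which is why the deployment order matters in that case.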

Scope | Check | Notes | Result |
---|---|---|---|
Identity | A Managed Identity is assigned to all (supported) resources that connect to each other, and it is used for authorisation | This mainly includes components executing business logic, such as App Service based resources, Logic Apps or Data Factory Pipelines | |
Keys and Certificates | Managed with Key Vault where possible and deployed using a safe deployment process. Authorisation keys for Function Apps (host or function level) are used only if connectivity is not protected in another way, and a regular key rotation process (stable, automated) is in place | ||
TLS | TLS (HTTPS) is enforced wherever possible | This mainly concerns the Storage Account configuration and App Service based resources | |
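
The identity and TLS checks can be approximated by inspecting the resources section of an ARM template. The following is a minimal sketch using the ARM properties `identity.type`, `properties.httpsOnly` (App Service) and `properties.supportsHttpsTrafficOnly` (Storage Account). The selection of resource types that should carry a managed identity is a hypothetical example, and so is the sample input.

```python
# Sketch: scan ARM template resources for missing managed identities and HTTPS enforcement.
# The set of types below is a hypothetical selection of "business logic" resources.
IDENTITY_TYPES = {
    "microsoft.web/sites",              # App Service apps and Function Apps
    "microsoft.logic/workflows",        # Logic Apps (Consumption)
    "microsoft.datafactory/factories",  # Data Factory
}


def findings(resources: list) -> list:
    """Return human-readable findings for the identity and TLS checks."""
    issues = []
    for res in resources:
        rtype = res.get("type", "").lower()
        props = res.get("properties") or {}
        name = res.get("name", "<unnamed>")
        if rtype in IDENTITY_TYPES:
            identity = (res.get("identity") or {}).get("type", "None")
            if identity.lower() == "none":
                issues.append(f"{name}: no managed identity")
        if rtype == "microsoft.web/sites" and props.get("httpsOnly") is not True:
            issues.append(f"{name}: httpsOnly not enabled")
        # Newer Storage API versions default supportsHttpsTrafficOnly to true,
        # so only an explicit false is reported here.
        if rtype == "microsoft.storage/storageaccounts" and props.get("supportsHttpsTrafficOnly") is False:
            issues.append(f"{name}: HTTP traffic allowed")
    return issues


if __name__ == "__main__":
    resources = [
        {"type": "Microsoft.Web/sites", "name": "func-shop-prod-weu-001",
         "identity": {"type": "SystemAssigned"}, "properties": {"httpsOnly": True}},
        {"type": "Microsoft.Logic/workflows", "name": "logic-import-prod-weu-001", "properties": {}},
        {"type": "Microsoft.Storage/storageAccounts", "name": "stshopprodweu001",
         "properties": {"supportsHttpsTrafficOnly": False}},
    ]
    for issue in findings(resources):
        print(issue)
```
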

Scope | Check | Notes | Result |
---|---|---|---|
Logging | Diagnostic settings for all (supported) resources send all event types relevant for analysis to one Log Analytics Workspace (per application and environment) | Application Insights resources are workspace-based as well | |
Logging | The ingested data volume of the Application Insights resource(s) has been examined to confirm that production usage does not generate unexpected or unreasonable costs | Consider reducing log levels or event sources, or applying sampling | |
Alerting | For non-user-driven processes (e.g. synchronisation or import/export jobs), an error handling strategy is in place | ||
Alerting | Defined SLA aspects are measured with metrics (SLIs), and alerts are raised when they become critical | See Best Practices for Monitoring | |
Error Handling | Persistent runtime errors are escalated, and a corresponding process is in place | Typical checks are Logic Apps (with a number of automatic resubmits) and Service Bus dead-letter queue handling | |
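
One common way to decide when an SLI "becomes critical" is to track how much of the error budget (1 minus the SLO) a time window has consumed. The following is a minimal sketch of that arithmetic for an availability SLI. The 99.9 % SLO, the 80 % alert threshold and the request counts are hypothetical. In Azure Monitor, this logic would typically live in a log search alert rule rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request counts observed in one evaluation window (e.g. the last hour)."""
    total_requests: int
    failed_requests: int


def availability_sli(window: Window) -> float:
    """Share of successful requests; a window without traffic counts as fully available."""
    if window.total_requests == 0:
        return 1.0
    return 1 - window.failed_requests / window.total_requests


def error_budget_consumed(window: Window, slo: float) -> float:
    """Fraction of the error budget (1 - SLO) used up in the window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be between 0 and 1 (exclusive)")
    return (1 - availability_sli(window)) / (1 - slo)


def should_alert(window: Window, slo: float, threshold: float = 0.8) -> bool:
    """Alert once the (hypothetical) threshold share of the error budget is consumed."""
    return error_budget_consumed(window, slo) >= threshold


if __name__ == "__main__":
    slo = 0.999  # hypothetical 99.9 % availability target
    print(should_alert(Window(100_000, 50), slo))  # False (about 50 % of the budget used)
    print(should_alert(Window(100_000, 90), slo))  # True (about 90 % of the budget used)
```
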

Scope | Check | Notes | Result |
---|---|---|---|
Cost Optimisation | Capacity reservations (mainly for the production environment), including their sizing, have been examined and proposed to the owner | Besides reserved instances for VMs, capacity reservations are mostly available for data storage services (Log Analytics Workspace / Sentinel, Synapse, Cosmos DB, SQL Database and others) | |
Cost Optimisation | The implementation does not lead to unexpected cost growth from stored data | Blobs can be moved to cooler tiers or deleted with lifecycle management rules; data in Cosmos DB or queue systems can use a Time to Live (TTL) definition; otherwise, other clean-up processes should be foreseen | |
Scaling | The resources adapt to the expected workloads (amount of use) without being oversized for the regular load. The concept is well thought out and documented | For validation, take every exposed endpoint (e.g. user interface, API or event sources such as Event Grid) and follow the lineage of dependent resources | |
Scaling | On resources with autoscaling enabled, corresponding scale-down rules are configured appropriately | ||
Disaster Recovery | Data is classified, and data storage resources have appropriate backup procedures. A recovery process is in place, tested and known to those responsible. RPO and RTO targets are well thought out and documented. Recovery is unlikely to cause problems due to data inconsistencies with other resources | As Table Storage (Storage Account) has no automatic backup functionality, either do not use it for critical data (use Cosmos DB instead) or provide a corresponding backup process (e.g. using Data Factory Pipelines) | |
Disaster Recovery | Cognitive Search (now Azure AI Search) indexes can be recovered or rebuilt | ||
Cold Start Behaviour | APIs provided by Function Apps on the Consumption plan do not cause unexpected cold start issues. App Service based resources have the AlwaysOn setting configured appropriately | Use a Premium or Dedicated plan for such Functions | |
Timeout Behaviour | Function Apps are unlikely to run into the execution timeout, which is configured to an appropriate value | Use a Premium or Dedicated plan for such Functions, which supports longer timeouts, or implement the logic as a Durable Function. If in doubt, create an alert that notifies you of long durations before timeouts occur | |
Latency | Resources run in an appropriate Azure region close to the users and, as far as possible and reasonable, all in the same region, especially resources that exchange high data volumes | ||
Geographic Availability | Availability is planned according to the customer’s needs and specifications | This applies in particular to data replication | |
Redundancy | Replication (e.g. for Storage Accounts) is configured appropriately | Consider separating data (especially business and application data) into multiple Storage Accounts, each with an appropriate configuration | |
Soft Delete | Activated on Key Vault instances (generally recommended) and examined for Blob Storage resources | ||
Archiving | Data retention is ensured according to requirements | This may include lifecycle management rules for Blob Storage (with retention policies) | |
Resource Protection | Critical production resources are protected with resource locks, unless this risk is already mitigated by RBAC | ||
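
The lifecycle management rules mentioned under Cost Optimisation and Archiving are defined as a JSON policy on the Storage Account. The following is a minimal sketch that builds such a policy in Python. It assumes a single age-based rule for block blobs. The rule name, the container prefix and the day thresholds are hypothetical and must be derived from the actual retention requirements.

```python
import json


def lifecycle_policy(rule_name, prefix, cool_after_days, delete_after_days, archive_after_days=None):
    """Build a Blob Storage lifecycle management policy with one age-based rule.

    prefix must start with the container name, e.g. "logs/import/".
    """
    steps = [cool_after_days]
    if archive_after_days is not None:
        steps.append(archive_after_days)
    steps.append(delete_after_days)
    # Each tiering step must happen strictly after the previous one.
    if any(earlier >= later for earlier, later in zip(steps, steps[1:])):
        raise ValueError("days must increase: cool < archive < delete")

    base_blob = {"tierToCool": {"daysAfterModificationGreaterThan": cool_after_days}}
    if archive_after_days is not None:
        base_blob["tierToArchive"] = {"daysAfterModificationGreaterThan": archive_after_days}
    base_blob["delete"] = {"daysAfterModificationGreaterThan": delete_after_days}

    return {
        "rules": [
            {
                "enabled": True,
                "name": rule_name,
                "type": "Lifecycle",
                "definition": {
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                    "actions": {"baseBlob": base_blob},
                },
            }
        ]
    }


if __name__ == "__main__":
    policy = lifecycle_policy("ageOutImportLogs", "logs/import/", 30, 365, archive_after_days=90)
    print(json.dumps(policy, indent=2))
```

You could save the printed JSON as `policy.json` and apply it with `az storage account management-policy create --account-name <account> --resource-group <rg> --policy @policy.json`. Generating the policy from code keeps the retention values in the same IaC source as the rest of the deployment.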

Scope | Check | Notes | Result |
---|---|---|---|
Advisor | Azure Advisor recommendations have been reviewed and reasonable proposals implemented | ||
Defender for Cloud | The Secure Score and recommendations have been reviewed and reasonable proposals implemented | ||