
November 7, 2023

Configuration Drift: How It Happens, Top Sources + How to Stop It for Good



Bad news: Configuration drift is going to happen no matter what you do. It’s very easy to miss, caused by some innocuous mistake or patch and buried in paper trails that don’t exist. By the time you’ve found which configurations have drifted, it usually means something’s already going wrong. But here’s the good news: You and your precious configurations don’t have to take it lying down.

In fact, there are a lot of ways to find and manage config drift, no matter where you’re deployed – but not all drift remediation tools and strategies are created equal. Take a few minutes to read this blog and review where configuration drift might be coming from in your IT, what it can do to your system reliability, and what you should be doing about it right now.

What is Configuration Drift?

Configuration drift is when configurations in an IT system gradually change over time. Drift is often unintentional and happens when undocumented or unapproved changes are made to software, hardware, and operating systems. It can have an impact on system performance and security.

The risk of configuration drift is that it causes inconsistencies in system configurations. Operating system configs that differ between development, testing, and production environments, for example, can create performance issues, security gaps, and compliance errors.

Those problems are especially risky in large IT environments, multi-cloud deployments, and hybrid IT split between the data center and the cloud.

Configuration Drift Examples: Consequences + Risks of Just Letting It Happen

Specific side effects of configuration drift include app failure, downtime, prolonged development lifecycles, increased IT tickets, security vulnerabilities, compliance failures, audit fines, and more.

The side effects of config drift range from ‘mildly inconvenient’ to ‘five-alarm fire’. All of them impact your organization in some way or another, and the only difference between levels of severity is how proactive you are about finding and managing drift.

At best, configuration drift causes:

Toil
Lost productivity
Rework
Reduced efficiency
Downtime of nonessential services
Difficult audits

And at worst, drifted configurations can be responsible for:

Security vulnerabilities including cyberattacks and breaches
Compliance violations with fines and reputational damage
Downtime of essential services, all the way to the app level
Data loss from misconfigured backup and storage settings
Unpredictable behavior in deployment due to inconsistent dev, testing, and production environments

What’s more, configuration drift is one of those problems that gets worse fast when you don’t treat it. Each misconfigured resource can be a liability – for example, a single instance of configuration drift can lead to sensitive data being exposed, security breaches, and compliance fines.

Now imagine those individual misconfigurations piling up over time without you noticing. They add up fast, and config drift goes from ‘easy fix’ to a major problem if you don’t address it swiftly.

The Top Causes of Configuration Drift: How Does Config Drift Happen, Anyway?

The top causes of configuration drift are:

Manual changes by devs or admins
Software updates that conflict with existing configurations
Patches that alter system configurations
Human error like typos, mis-set tunables, and other incorrect attributes
Debugging and fixes that don’t get documented

There are a lot of reasons why configuration drift happens, but they share a root cause: changes get through without being monitored or approved (even ransomware counts as an unapproved change). Sometimes, it’s individual engineers making changes to their environment without telling the production team. Sometimes, it’s a change that slides in with a vital patch or with the latest version of a software update.

The worst part is that when configuration drift causes a problem, developers are sometimes encouraged to drop everything to go find and fix it. And guess what happens while they’re trying to fix a problem. That’s right: More code changes get pushed without proper review, monitoring, tracking or reporting, leading to more inconsistency between the desired configurations and actual state. The vicious config drift cycle continues, with drift adding to drift adding to drift.

If you’re thinking this drift recursion cycle makes it tough to stay ready for audits, you’d be right. Read our blog to find out how you can get ahead of drift by automatically managing compliance >>

Configuration Drift Management: Definition, Tools + Strategies to Reduce Drift

Configuration drift management automatically compares system configurations against baselines to identify configuration drift that can lead to inefficiency, performance issues, and compliance errors.
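
To make the baseline comparison concrete, here’s a minimal Python sketch. It assumes configurations can be read as simple key/value settings; the `detect_drift` function, the `MISSING` marker, and the SSH-style sample settings are illustrative and not taken from any particular tool.

```python
from typing import Any

MISSING = "<missing>"  # hypothetical marker for a setting absent on one side


def detect_drift(baseline: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {setting: (expected, found)} for every setting that differs."""
    drift = {}
    # Walk the union of keys so unexpected extras are reported as well.
    for key in sorted(baseline.keys() | actual.keys()):
        expected = baseline.get(key, MISSING)
        found = actual.get(key, MISSING)
        if expected != found:
            drift[key] = (expected, found)
    return drift


# Illustrative data: the approved baseline vs. what is on the server now.
baseline = {"PermitRootLogin": "no", "PasswordAuthentication": "no", "Port": "22"}
actual = {
    "PermitRootLogin": "yes",        # changed by hand during debugging
    "PasswordAuthentication": "no",
    "Port": "22",
    "X11Forwarding": "yes",          # setting that is not in the baseline
}

for setting, (expected, found) in detect_drift(baseline, actual).items():
    print(f"{setting}: expected {expected}, found {found}")
```

Comparing the union of keys matters: drift is not only a changed value, it can also be a setting that was never part of the approved baseline.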

How to Fix Configuration Drift: Two Approaches

There are two basic ways to fix configuration drift:

Track down configurations that have drifted and manually reconfigure them back to your desired state.
Use a configuration drift monitoring and remediation tool to automatically find and revert drifted configurations.

Obviously, manual remediation is a big, time-consuming, laborious task. It also doesn’t work that well, especially at scale (remember the “drift adding to drift” problem we mentioned above?). Most DevOps teams just don’t have the time or headcount to fix configuration drift as it pops up. Bottom line: If you’re managing more than a few configurations, the hands-on method is only going to lead to wasted time and more problems.

That’s why the best way to manage configuration drift is to use a configuration management tool like Puppet, Ansible, or Chef, along with tools for version control, continuous integration/continuous delivery (CI/CD), and documentation.

Puppet's agent-based automation and continuous compliance make Puppet Enterprise the configuration management tool of choice. Compare Puppet vs. Ansible here >>

Configuration Drift Management Tools

Config drift management tools do what the name suggests: They periodically scan your systems to identify configurations that have drifted from their defined, hardened state. Some tools also automatically remediate drifted configurations.
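
One simple way to picture a periodic scan is file fingerprinting: record a hash of each configuration file while it is in its known-good state, then re-hash later and compare. The sketch below is a generic illustration in Python, not a description of how any product listed here works internally; `fingerprint`, `changed_files`, and the `app.conf` file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(paths):
    """Map each path to the SHA-256 of its contents (None if the file is missing)."""
    result = {}
    for path in paths:
        p = Path(path)
        result[str(p)] = hashlib.sha256(p.read_bytes()).hexdigest() if p.is_file() else None
    return result


def changed_files(baseline, current):
    """Return the paths whose fingerprint no longer matches the baseline."""
    return sorted(path for path in baseline if baseline[path] != current.get(path))


# Demo in a temporary directory so the example does not touch real config files.
with tempfile.TemporaryDirectory() as tmp:
    conf = Path(tmp) / "app.conf"
    conf.write_text("max_connections = 100\n")
    baseline_hashes = fingerprint([conf])        # snapshot of the hardened state

    conf.write_text("max_connections = 500\n")   # simulated undocumented change
    drifted = changed_files(baseline_hashes, fingerprint([conf]))
    print([Path(p).name for p in drifted])
```

A hash only tells you that a file changed, not which setting changed or why, which is one reason reporting and remediation features vary so much between tools.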

Different configuration management tools feature different capabilities and strengths. Some are designed for deep monitoring and reporting, while others can do all that and then remediate configurations automatically. Examples of configuration drift management tools include:

Configuration Drift Management Tool	Description	Uses/Capabilities
The CIS-CAT Pro Assessor	Created by the Center for Internet Security (CIS), this tool scans systems to track system hardening efforts aligned with the widely used CIS Benchmarks framework.	Configuration monitoring and alerting across hybrid infrastructure, aligned to customizable CIS Benchmarks.
Datadog	One of the most popular configuration drift monitoring tools out there, known for its easy onboarding and observability – and, depending on your usage, its hefty price tag.	Observability to enhance configuration drift detection, troubleshooting, and change validation.
Tripwire Enterprise	A security and compliance tool for protecting critical systems and sensitive data.	Monitoring file changes, notifying sysadmins, and automating enforcement of configurations.
AWS Config	AWS Config comes built into the platform to enable change tracking, drift detection, and remediation of AWS cloud configurations.	Configuration management for infrastructure hosted on AWS.
Chef	An automation and configuration management platform.	Configuration management to maintain desired state and prevent drift – but it’s known to struggle at scale.

Configuration management, the easy way: A configuration management system includes tools for monitoring, testing, reporting, logging, and fixing configuration drift >>

How to Prevent Configuration Drift from Becoming a Problem in the First Place

As long as you have people using the resources you’ve configured, configuration drift is going to happen one way or another. What matters is monitoring for drift continuously, catching it early, remediating it quickly, and documenting it properly.

Catching up to configuration drift is a great way to make sure you’re never out of desired state for long. But like with any healthy system or process, prevention is the best medicine. Creating and enforcing rules for configurations as code enables more proactive configuration drift management:

Policy as code (PaC) lets you establish rules for configurations that can be repeated as code across complex infrastructure (data center, public cloud, private cloud, and hybrid).
Compliance as code is the use of policy as code to align system configurations with your desired compliance policy (including standards and frameworks for regulated industries).
Infrastructure as code (IaC) executes on PaC policies to provision, build, and maintain infrastructure configurations at scale.
IT automation and configuration management tools take your policy as code and automate enforcement across on-premises, cloud, and hybrid IT.
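
To show what “rules as code” can look like in practice, here’s a small Python sketch. It assumes each node’s configuration is available as a dictionary; the `Policy` class, the two rule names, and the node data are made up for illustration and do not come from any specific benchmark or product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Policy:
    """A named rule plus a check that returns True when a config complies."""
    name: str
    check: Callable[[dict], bool]


# Hypothetical policies, written once and reusable on every node.
POLICIES = [
    Policy("ssh-root-login-disabled", lambda cfg: cfg.get("PermitRootLogin") == "no"),
    Policy("password-min-length-14", lambda cfg: int(cfg.get("min_password_length", 0)) >= 14),
]


def violations(config: dict, policies=POLICIES) -> list[str]:
    """Return the names of the policies this configuration fails."""
    return [policy.name for policy in policies if not policy.check(config)]


# Illustrative inventory: one compliant node and one that has drifted.
nodes = {
    "web-01": {"PermitRootLogin": "no", "min_password_length": 14},
    "db-01": {"PermitRootLogin": "yes", "min_password_length": 8},
}

report = {node: violations(cfg) for node, cfg in nodes.items()}
print(report)
```

Because the rules live in code, they can be version controlled, reviewed, and run the same way against the data center and the cloud.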

Puppet Enterprise is an automation and configuration management solution that helps reduce drift, especially across large and complex infrastructure. Write your policies as code using Puppet’s simple configuration syntax (or leverage CIS-compliant modules), and Puppet Enterprise’s robust agent-based automation and configuration management capabilities will enforce them across your whole infrastructure. Automated runs detect drift from the configurations you wrote as part of your PaC, and Puppet Enterprise can provide recommendations and even automatically repair drifted configurations.

Agent-based automation means that Puppet enforces your written policies even if there’s a network disruption to your server. It doesn’t matter if the server is located in the data center or in the cloud – the Puppet agent will run every 30 minutes by default, enforcing your coded policies and maintaining system state. No agentless technology can match that capability at scale.
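
The idea behind a periodic enforcement run can be sketched in a few lines of Python. This is a conceptual illustration only, not Puppet’s implementation: the `enforce` function, the settings, and the in-memory `node_state` dictionary are assumptions, and a real agent would repeat the run on a schedule rather than calling it twice by hand.

```python
RUN_INTERVAL_SECONDS = 30 * 60  # mirrors the 30-minute default mentioned above; not used in this demo


def enforce(desired: dict, node_state: dict) -> list[str]:
    """Bring node_state in line with desired and return the settings that were corrected."""
    corrected = []
    for key, value in desired.items():
        if node_state.get(key) != value:
            node_state[key] = value  # remediate the drifted setting in place
            corrected.append(key)
    return corrected


# Hypothetical desired state and a node where someone disabled the firewall.
desired = {"ntp_server": "time.example.internal", "firewall": "enabled"}
node_state = {"ntp_server": "time.example.internal", "firewall": "disabled"}

first_run = enforce(desired, node_state)   # detects and repairs the drift
second_run = enforce(desired, node_state)  # nothing left to fix

print(first_run)
print(second_run)
```

The second run changes nothing because the node already matches the desired state. That idempotence is what makes it safe to repeat enforcement on every interval.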

Compare agent-based vs. agentless automation for security >>

Read a configuration drift case study showing how Finastra used Puppet Enterprise to audit and correct drift, applying their policy as code to maintain compliance and optimize resources. Then request a demo of Puppet Enterprise or try it in some of your own infrastructure today.
