Disaster recovery

Disaster recovery creates a replica of your primary server.

You can have only one replica at a time, and you can add disaster recovery to an installation with or without compilers.

There are two main advantages to enabling disaster recovery:

If your primary server fails, the replica takes over the handling of Puppet Server and PuppetDB traffic, allowing existing agents to remain operational and Puppet runs to continue without interruption. By configuring nodes to automatically fail over to the replica when the primary is unreachable, you can ensure that they still receive catalogs and enforce your desired state.
If your primary server can’t be repaired, you can promote the replica to primary server. Promotion establishes the replica as the new, permanent primary server.

Disaster recovery architecture

The replica is not an exact copy of the primary server. Rather, the replica duplicates specific infrastructure components and services. By default, Hiera data and other custom configurations are not replicated. However, if you store Hiera data in the control repository, as recommended, the data is replicated through Code Manager.

Replication can be read-write, meaning that data can be written to the service or component on either the primary server or the replica, and the data is synced to both nodes. Alternatively, replication can be read-only, where data is written only to the primary server and synced to the replica. Some components and services, like Puppet Server and the console service UI, are not replicated because they contain no native data.

Some components and services are activated immediately when you enable a replica; others aren't active until you promote a replica.


Component or service	Type of replication	Activated when replica is...
Puppet Server	none	enabled
File sync client	read-only	enabled
PuppetDB	read-write	enabled
Certificate authority	read-only	promoted
RBAC service	read-only	enabled
Node classifier service	read-only	enabled
Activity service	read-only	enabled
Orchestration service	read-only	promoted
Console service UI	none	promoted
Agentless Catalog Executor (ACE) service	none	promoted
Bolt service	none	promoted
Host Action Collector service	read-only	promoted

The following services performed by the primary server are unavailable on a replica until the replica is promoted:

Certificate authority: The replica cannot provision new agents.
Orchestration: Tasks, plans, and Puppet runs cannot be initiated from the replica. This includes running operations via the Agentless Catalog Executor.
Console: The console is not available on the replica, and classification changes cannot be made from the replica.
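The activation rules in the table above can be expressed as a small lookup. The following is a minimal sketch, not part of PE: the dictionary is transcribed from the table, and the function names `active_on_replica` and `unavailable_until_promoted` are hypothetical helpers introduced only for illustration.

```python
# Replication type and activation point for each component, transcribed from
# the table above. "enabled" means the component is active as soon as the
# replica is enabled; "promoted" means it stays inactive until promotion.
COMPONENTS = {
    "Puppet Server": ("none", "enabled"),
    "File sync client": ("read-only", "enabled"),
    "PuppetDB": ("read-write", "enabled"),
    "Certificate authority": ("read-only", "promoted"),
    "RBAC service": ("read-only", "enabled"),
    "Node classifier service": ("read-only", "enabled"),
    "Activity service": ("read-only", "enabled"),
    "Orchestration service": ("read-only", "promoted"),
    "Console service UI": ("none", "promoted"),
    "Agentless Catalog Executor (ACE) service": ("none", "promoted"),
    "Bolt service": ("none", "promoted"),
    "Host Action Collector service": ("read-only", "promoted"),
}


def active_on_replica(promoted: bool) -> list[str]:
    """Return the components that are active on the replica.

    Hypothetical helper: before promotion only the "enabled" rows are
    active; after promotion every row is active.
    """
    return [
        name
        for name, (_replication, activated_when) in COMPONENTS.items()
        if promoted or activated_when == "enabled"
    ]


def unavailable_until_promoted() -> list[str]:
    """Return the components that become active only on promotion."""
    return [
        name
        for name, (_replication, activated_when) in COMPONENTS.items()
        if activated_when == "promoted"
    ]


if __name__ == "__main__":
    print("Active on an enabled replica:", active_on_replica(promoted=False))
    print("Waiting for promotion:", unavailable_until_promoted())
```

Keeping the table as data makes it easy to check, for example, that the certificate authority and orchestration service are among the components that wait for promotion.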

In a standard installation, when a Puppet run fails over, agents communicate with the replica instead of the primary server. In a large or extra-large installation with compilers, agents communicate with load balancers or compilers, which communicate with the primary server or replica.

What happens during failovers

Failover occurs when the replica takes over services usually performed by the primary server.

Failover is automatic: you don’t have to take action to activate the replica. With disaster recovery enabled, Puppet runs are directed first to the primary server. If the primary server is either fully or partially unreachable, runs are directed to the replica.

In partial failovers, Puppet runs can use the server, node classifier, or PuppetDB on the replica if those services aren’t reachable on the primary server. For example, if the primary server’s node classifier fails, but its Puppet Server is still running, agent runs use the Puppet Server on the primary server but fail over to the replica’s node classifier.
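The partial failover described above amounts to choosing, for each service independently, the first reachable host in an ordered list. The sketch below illustrates that idea only; it is not how PE implements failover. The host names, the service keys, and the `is_reachable` callback are assumptions made for the example.

```python
from typing import Callable, Optional

# Hypothetical certnames, listed in the order they are tried:
# primary server first, then the replica.
PRIMARY = "primary.example.com"
REPLICA = "replica.example.com"

SERVICES = {
    "puppet_server": [PRIMARY, REPLICA],
    "node_classifier": [PRIMARY, REPLICA],
    "puppetdb": [PRIMARY, REPLICA],
}


def pick_host(
    candidates: list[str], is_reachable: Callable[[str], bool]
) -> Optional[str]:
    """Return the first reachable host in order, or None if none respond."""
    for host in candidates:
        if is_reachable(host):
            return host
    return None


def route_run(
    services: dict[str, list[str]],
    is_reachable: Callable[[str, str], bool],
) -> dict[str, Optional[str]]:
    """Choose a host for each service independently of the others."""
    return {
        service: pick_host(hosts, lambda h, s=service: is_reachable(s, h))
        for service, hosts in services.items()
    }


# Simulate the example from the text: only the primary's node classifier
# is down, so only that service fails over to the replica.
down = {("node_classifier", PRIMARY)}
routes = route_run(SERVICES, lambda service, host: (service, host) not in down)

if __name__ == "__main__":
    for service, host in routes.items():
        print(f"{service}: {host}")
```

In this simulation the run keeps using Puppet Server and PuppetDB on the primary server and uses the node classifier on the replica.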

What works during failovers:

Scheduled Puppet runs
Catalog compilation
Viewing classification data using the node classifier API
Reporting and queries based on PuppetDB data

What doesn’t work during failovers:

Deploying new Puppet code
Editing node classifier data
Using the console
Certificate functionality, including provisioning new agents, revoking certificates, or running the puppet certificate command
Most CLI tools
Running Puppet tasks or plans through the orchestrator

System and software requirements for disaster recovery

Your Puppet infrastructure must meet specific requirements in order to configure disaster recovery.


Component	Requirement
Operating system	All supported PE primary server platforms.
Software	You must use Code Manager so that code is deployed to both the primary server and the replica after you enable a replica. Code Manager also replicates the certificate authority state, as well as PE configuration files. Even if you have an alternate method for syncing your code across nodes, Code Manager must still be enabled. You must use the default PE node classifier so that disaster recovery classification can be applied to nodes. Orchestrator must be enabled so that it can perform PE maintenance and upgrade actions.
Replica	Must be an agent node that doesn’t already have a specific function. You can decommission a node, uninstall all Puppet packages, and recommission the node to be a replica. However, a node cannot perform two functions at once; for example, a compiler cannot also act as a replica. Must have the same hardware specifications and capabilities as your primary server. Must use the same operating system type and version as your primary server. Must have the same agent version as your primary server.
Firewall	Your replica must comply with the same port requirements as your primary server to ensure that the replica can operate as the primary server during failover. For details, see the firewall configuration requirements for your installation type.
Node names	You must use resolvable domain names when specifying node names for the primary server and replica.
RBAC tokens	You must have an admin RBAC token when running some `puppet infrastructure` commands, including `provision`, `enable`, and `forget`. You can generate a token using the `puppet-access` command. However, an RBAC token isn't required to promote a replica or to run the `enable_ha_failover` command.

Classification changes in disaster recovery installations

When you provision and enable a replica, the system makes a number of classification changes in order to manage disaster recovery.

Two infrastructure node groups are added in installations with disaster recovery. The PE HA Master node group includes your primary server and inherits from the PE Master node group. The PE HA Replica node group includes your replica and inherits from the PE Infrastructure node group.

Additional disaster recovery configuration is managed with these parameters:

Note: Apart from the parameters in the PE Agent and PE Infrastructure Agent node groups (manage_puppet_conf, server_list, pcp_broker_list, and primary_uris), all of these are system parameters that should not be manually modified. The PE Agent and PE Infrastructure Agent parameters are automatically updated based on the values you specify when you provision and enable a replica.

classifier_client_certname
Purpose: Specifies the name on the certificate used by the classifier.
Node group: PE Master
Class: puppet_enterprise::profile::master
DR-only parameter: No
Example with enabled replica: ["<PRIMARY_CERTNAME>","<REPLICA_CERTNAME>"]
Notes: Replica values are appended to the end of the parameter when a replica is enabled.

classifier_host
Purpose: Specifies the certname of the node running the classifier service.
Node group: PE Master
Class: puppet_enterprise::profile::master
DR-only parameter: No
Example with enabled replica: ["<PRIMARY_CERTNAME>","<REPLICA_CERTNAME>"]
Notes: Replica values are appended to the end of the parameter when a replica is enabled.

classifier_port
Purpose: Specifies the port used for communicating with the classifier service. Always 4433.
Node group: PE Master
Class: puppet_enterprise::profile::master
DR-only parameter: No
Example with enabled replica: [4433,4433]
Notes: Replica values are appended to the end of the parameter when a replica is enabled.

ha_enabled_replicas
Purpose: Tracks replica nodes that are failover ready.
Node group: PE Infrastructure
Class: puppet_enterprise
DR-only parameter: Yes
Example with enabled replica: ["<REPLICA_CERTNAME>"]
Notes: Updated when you enable a replica.

manage_puppet_conf
Purpose: When true, specifies that the server_list setting is managed in puppet.conf.
Node group: PE Agent, PE Infrastructure Agent
Class: puppet_enterprise::profile::agent
DR-only parameter: No
Example with enabled replica: true

pcp_broker_list

Purpose

Specifies the list of Puppet Communications Protocol brokers that Puppet Execution Protocol agents contact, in order.

Node group

PE Agent, PE Infrastructure Agent

Class

puppet_enterprise::profile::agent

DR-only parameter

Example with enabled replica

PE Agent: ["<PRIMARY_CERTNAME>:8142","<REPLICA_CERTNAME>:8142"] or, in a large installation, ["<LOAD_BALANCER>:8142"]

PE Infrastructure Agent: ["<PRIMARY_CERTNAME>:8142","<REPLICA_CERTNAME>:8142"]

Notes

Infrastructure nodes must be configured to communicate directly with the primary in the PE Infrastructure Agent node group, or in a DR configuration, the primary and then the replica. In large installations with compilers, agents must be configured to communicate with the load balancers or compilers in the PE Agent node group.
When a replica is enabled, the replica is appended to the end of the list in the PE Infrastructure Agent group, and when not using a load balancer, it's appended to the list in PE Agent.
Some puppet infrastructure commands refer to this parameter as agent-server-urls, but those commands nonetheless manage the server_list parameter.

Important: Setting agents to communicate directly with the replica in order to use the replica as a compiler is not supported.

primary_uris

Purpose

Specifies the list of Puppet Server nodes hosting task files for download that Puppet Execution Protocol agents contact, in order.

Node group

PE Agent, PE Infrastructure Agent

Class

puppet_enterprise::profile::agent

DR-only parameter

Example with enabled replica

PE Agent: ["<PRIMARY_CERTNAME>:8140","<REPLICA_CERTNAME>:8140"] or, in a large installation, ["<LOAD_BALANCER>:8140"]

PE Infrastructure Agent: ["<PRIMARY_CERTNAME>:8140","<REPLICA_CERTNAME>:8140"]

Notes

Infrastructure nodes must be configured to communicate directly with the primary in the PE Infrastructure Agent node group, or in a DR configuration, the primary and then the replica. In large installations with compilers, agents must be configured to communicate with the load balancers or compilers in the PE Agent node group.
When a replica is enabled, the replica is appended to the end of the list in the PE Infrastructure Agent group, and when not using a load balancer, it's appended to the list in PE Agent.
Some puppet infrastructure commands refer to this parameter as agent-server-urls, but those commands nonetheless manage the server_list parameter.

Important: Setting agents to communicate directly with the replica in order to use the replica as a compiler is not supported.

provisioned_replicas
Purpose: Specifies the certname of the replica that is given access to the ca-data file sync repo.
Node group: PE HA Master
Class: puppet_enterprise::profile::master
DR-only parameter: Yes
Example with enabled replica: ["<REPLICA_CERTNAME>"]

puppetdb_host
Purpose: Specifies the certname of the node running the PuppetDB service.
Node group: PE Master
Class: puppet_enterprise::profile::master
DR-only parameter: No
Example with enabled replica: ["<PRIMARY_CERTNAME>","<REPLICA_CERTNAME>"]
Notes: Replica values are appended to the end of the parameter when a replica is enabled.

puppetdb_port
Purpose: Specifies the port used for communicating with the PuppetDB service. Always 8081.
Node group: PE Master
Class: puppet_enterprise::profile::master
DR-only parameter: No
Example with enabled replica: [8081,8081]
Notes: Replica values are appended to the end of the parameter when a replica is enabled.
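The append-on-enable behavior noted for the classifier and PuppetDB parameters in the PE Master node group can be sketched as follows. This is an illustration under assumptions, not PE code: the function name `enable_replica` is hypothetical, and the single-element starting values are assumed to represent an installation before a replica is enabled. The ports 4433 and 8081 are the fixed values stated in the parameter descriptions.

```python
import copy

# Assumed parameter values before a replica is enabled.
BEFORE = {
    "classifier_client_certname": ["<PRIMARY_CERTNAME>"],
    "classifier_host": ["<PRIMARY_CERTNAME>"],
    "classifier_port": [4433],
    "puppetdb_host": ["<PRIMARY_CERTNAME>"],
    "puppetdb_port": [8081],
}


def enable_replica(params: dict, replica_certname: str) -> dict:
    """Return a copy of params with the replica's values appended.

    Hypothetical helper: each list gains one entry at the end, so the
    primary server's values always stay first.
    """
    updated = copy.deepcopy(params)
    updated["classifier_client_certname"].append(replica_certname)
    updated["classifier_host"].append(replica_certname)
    updated["classifier_port"].append(4433)  # classifier port is always 4433
    updated["puppetdb_host"].append(replica_certname)
    updated["puppetdb_port"].append(8081)  # PuppetDB port is always 8081
    return updated


AFTER = enable_replica(BEFORE, "<REPLICA_CERTNAME>")

if __name__ == "__main__":
    for name, value in AFTER.items():
        print(f"{name}: {value}")
```

Because values are appended rather than inserted, the host and port lists stay aligned by position: the first entry of each list refers to the primary server and the second to the replica.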

replica_hostnames
Purpose: Specifies the certname of the replica to set up pglogical replication for non-PuppetDB databases.
Node group: PE HA Master
Class: puppet_enterprise::profile::database
DR-only parameter: Yes
Example with enabled replica: ["<REPLICA_CERTNAME>"]

replicating
Purpose: Specifies whether databases other than PuppetDB replicate data.
Node group: PE Infrastructure
Class: puppet_enterprise
DR-only parameter: Yes
Example with enabled replica: true
Notes: Used when provisioning a new replica.

replication_mode

Purpose

Sets replication type and direction on primary servers and replicas.

Node group

PE Master (none), PE HA Master (source)

Class

puppet_enterprise::profile::master

puppet_enterprise::profile::database

puppet_enterprise::profile::console

DR-only parameter

Yes (although "none" by default)

Example with enabled replica

PE Master: "none" (Present only in master profile.)

PE HA Master: "source" (Set automatically in the replica profile; no setting in the classifier in PE HA Replica.)

server_list

Purpose

Specifies the list of servers that agents contact, in order.

Node group

PE Agent, PE Infrastructure Agent

Class

puppet_enterprise::profile::agent

DR-only parameter

Example with enabled replica

PE Agent: ["<PRIMARY_CERTNAME>:8140","<REPLICA_CERTNAME>:8140"] or, in a large installation, ["<LOAD_BALANCER>:8140"]

PE Infrastructure Agent: ["<PRIMARY_CERTNAME>:8140","<REPLICA_CERTNAME>:8140"]

Notes

Infrastructure nodes must be configured to communicate directly with the primary in the PE Infrastructure Agent node group, or in a DR configuration, the primary and then the replica. In large installations with compilers, agents must be configured to communicate with the load balancers or compilers in the PE Agent node group.
When a replica is enabled, the replica is appended to the end of the list in the PE Infrastructure Agent group, and when not using a load balancer, it's appended to the list in PE Agent.
Some puppet infrastructure commands refer to this parameter as agent-server-urls, but those commands nonetheless manage the server_list parameter.

Important: Setting agents to communicate directly with the replica in order to use the replica as a compiler is not supported.
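The example values for server_list, pcp_broker_list, and primary_uris follow one pattern, which the sketch below reproduces. It is an illustration only: the function name `agent_lists` and the host names in the usage example are hypothetical, and the ports are taken from the examples in this section.

```python
from typing import Optional

# Port used in the example values for each parameter.
PORTS = {"server_list": 8140, "pcp_broker_list": 8142, "primary_uris": 8140}


def agent_lists(
    primary: str, replica: str, load_balancer: Optional[str] = None
) -> dict[str, dict[str, list[str]]]:
    """Build the example values for both agent node groups.

    PE Infrastructure Agent always lists the primary and then the replica.
    PE Agent lists the same pair unless a load balancer is in use, in which
    case it lists only the load balancer and the replica is not appended.
    """
    result = {}
    for param, port in PORTS.items():
        infra = [f"{primary}:{port}", f"{replica}:{port}"]
        agent = [f"{load_balancer}:{port}"] if load_balancer else list(infra)
        result[param] = {"PE Agent": agent, "PE Infrastructure Agent": infra}
    return result


if __name__ == "__main__":
    # Hypothetical host names for a large installation with a load balancer.
    lists = agent_lists(
        "primary.example.com", "replica.example.com", "lb.example.com"
    )
    for param, groups in lists.items():
        for group, value in groups.items():
            print(f"{param} / {group}: {value}")
```

The order of each list matters because agents contact the entries in order, so the primary server stays first and the replica is reached only when the primary is not.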

sync_allowlist
Purpose: Specifies a list of nodes that the primary PuppetDB syncs with.
Node group: PE HA Master
Class: puppet_enterprise::profile::puppetdb
DR-only parameter: Yes
Example with enabled replica: ["<REPLICA_CERTNAME>"]
During an upgrade, when the primary has been upgraded but the replica has not, the value is [] to prevent syncing until the upgrade is complete.

sync_peers
Purpose: Specifies a list of hashes that contain configuration data for syncing with a remote PuppetDB node. Includes the host, port, and sync interval.
Node group: PE HA Master
Class: puppet_enterprise::profile::puppetdb
DR-only parameter: Yes
Example with enabled replica: [{"host":"<REPLICA_CERTNAME>","port":8081,"sync_interval_minutes":<X>}]
During an upgrade, when the primary has been upgraded but the replica has not, the value is [] to prevent syncing until the upgrade is complete.
Notes: Updated when you enable a replica.
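The two PuppetDB sync parameters move together: both name the replica when it is enabled, and both are emptied while an upgrade is in progress. The following sketch shows that relationship. The function name `puppetdb_sync_params` and the `upgrade_in_progress` flag are hypothetical, and the sync interval is left to the caller because this page does not state a value for it.

```python
import json


def puppetdb_sync_params(
    replica_certname: str,
    sync_interval_minutes: int,
    upgrade_in_progress: bool = False,
) -> dict:
    """Return illustrative values for sync_allowlist and sync_peers.

    While the primary is upgraded and the replica is not, both parameters
    are empty lists so that no syncing happens until the upgrade completes.
    """
    if upgrade_in_progress:
        return {"sync_allowlist": [], "sync_peers": []}
    return {
        "sync_allowlist": [replica_certname],
        "sync_peers": [
            {
                "host": replica_certname,
                "port": 8081,  # PuppetDB port, as stated for puppetdb_port
                "sync_interval_minutes": sync_interval_minutes,
            }
        ],
    }


if __name__ == "__main__":
    # The interval of 2 is an arbitrary placeholder, not a documented default.
    print(json.dumps(puppetdb_sync_params("<REPLICA_CERTNAME>", 2), indent=2))
    print(json.dumps(puppetdb_sync_params("<REPLICA_CERTNAME>", 2, True)))
```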

Load balancer timeout in disaster recovery installations

Disaster recovery configuration uses timeouts to determine when to fail over to the replica. If the load balancer timeout is shorter than the server and agent timeout, connections from agents might be terminated during failover.

To avoid timeouts, set the timeout option for load balancers to four minutes or longer. This duration allows compilers enough time for required queries to PuppetDB and the node classifier service. You can set the load balancer timeout option using parameters in the haproxy or f5 modules.
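A configured timeout can be checked against the four-minute minimum with a short script. This is a sketch under assumptions: the helper names are hypothetical, and the duration format (an integer with an optional `ms`, `s`, `m`, or `h` suffix, defaulting to milliseconds) is assumed to follow HAProxy's convention for timeout values. Adjust the parsing for other load balancers.

```python
import re

# Four minutes, the minimum recommended above, in milliseconds.
MIN_TIMEOUT_MS = 4 * 60 * 1000

# Multipliers for the unit suffixes accepted by this sketch.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}


def to_milliseconds(value: str) -> int:
    """Convert a duration such as "240s" or "5m" to milliseconds.

    A bare number is treated as milliseconds (assumed HAProxy convention).
    """
    match = re.fullmatch(r"(\d+)(ms|s|m|h)?", value.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_MS[unit or "ms"]


def timeout_is_safe(value: str) -> bool:
    """Return True if the timeout is four minutes or longer."""
    return to_milliseconds(value) >= MIN_TIMEOUT_MS


if __name__ == "__main__":
    for candidate in ["30s", "4m", "300s", "240000"]:
        status = "ok" if timeout_is_safe(candidate) else "too short"
        print(f"{candidate}: {status}")
```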
