March 25, 2022

Introduction to Puppet Content Usage Telemetry

Let's rip off the bandaid and get the bad news out there first: we've rolled out telemetry for Puppet content. Read on to find out why I think that's actually good news for you, how you can see exactly what data it collects, and how to make sure it never runs if your corporate policy doesn't allow it. This post has also been updated to include the latest news on Puppet telemetry.

Why Puppet Added Telemetry

I'll start with the problems we are trying to solve. You've probably seen me in our Slack bugging people about which modules they use and how they choose between modules on the Forge. My hardest job right now is deciding which modules to focus development efforts on, and let me tell you, currently it feels a lot like reading tea leaves! We have support tickets filed by customers, Jira tickets and pull requests filed by users, and ongoing efforts from opinionated people on Slack. This means that our feedback mechanisms are heavily weighted towards the people willing to make the most noise, so to speak. You shouldn't have to do that for your needs to be heard.

That sounds like a me problem, and it totally is. But if I can't effectively marshal development time for the modules you actually use, then it becomes a you problem too.

The other problem we are addressing is how users like yourself choose among modules on the Forge. Right now the most relevant ranking factor is download count. But that number is heavily skewed by mirroring robots, CI pipelines, and even simply the age of the module itself. In the end, new modules, even high-quality ones, end up being drowned out by the existing heavy hitters, and that makes our ecosystem appear to stagnate. Three modules on the first page of the Puppet Forge "most relevant modules" list haven't had a release in more than five years! Ultimately, we understand that it's harder for you to find high-quality modules than it should be.

We're addressing these issues with a telemetry client bundled in with Puppetserver as of version 7.5.0. If you opt in, then once a week it will send us some information about the public Forge content that you're using in your infrastructure.

What Data Will We Collect With Telemetry? 

I'm sure the biggest question on your mind right now is: what data are we collecting, and how identifiable is it? That's perfectly reasonable; we've seen story after story in the news about companies being quite unscrupulous with their data collection and retention.

I want to assure you of three things:

  • You can see all the data collected before you even choose to enable the system.
  • You are in full control of which data your Puppet Servers do or do not submit.
  • The data we collect is not associated with individuals and is fully aggregated before anyone can access it.

As a side benefit of the design, you can even use the client to gather useful information about the content usage of your infrastructure, even if you choose not to share it with us.

How Content Usage Telemetry Works

When the client is gathering information, it cross-references against content published on the Forge and will ignore any modules you've developed internally. The most it looks at your own code is to see which public Forge modules are used by your profiles or internal modules. But don't take our word for it; if you feel so inclined then you can see exactly what is collected by reading through the source for each metric it runs.

If you trust the tool to tell you what it will collect, you can actually ask it directly. Running the command puppetserver dropsonde list in the terminal will describe all the loaded metrics plugins and what they do. Note that depending on your $PATH, you may have to use the full path, just like any other time you run puppetserver commands. That would make the full command /opt/puppetlabs/bin/puppetserver dropsonde list and other commands in this post will follow that pattern.
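As a rough sketch of the $PATH note above, a small shell helper can pick whichever binary is available. The function name resolve_puppetserver is made up for illustration; the fallback path is the one given above.

```shell
# Hypothetical helper: prefer puppetserver from $PATH,
# otherwise fall back to the default install location.
resolve_puppetserver() {
  if command -v puppetserver >/dev/null 2>&1; then
    command -v puppetserver
  else
    echo "/opt/puppetlabs/bin/puppetserver"
  fi
}

PUPPETSERVER="$(resolve_puppetserver)"

# Print the command instead of running it, so this is safe to try anywhere.
echo "Would run: $PUPPETSERVER dropsonde list"
```

On a primary server, replacing the final echo with "$PUPPETSERVER" dropsonde list would run the real command.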

Then, to see exactly what data is collected, run puppetserver dropsonde preview. The report is rendered in human-readable form, but every bit of data that makes it to our telemetry pipeline is represented there. (Hint: use --format=json if you want to use this data for your own tooling.)

$ puppetserver dropsonde preview
Updating module cache...
...
                      Puppet Telemetry Report Preview
                      ===============================

Dropsonde::Metrics::Modules
-------------------------------
This group of metrics exports name & version information about the public modules installed in all environments, ignoring private modules.
- modules: List of modules in all environments.
    {:name=>"concat", :slug=>"puppetlabs-concat", :version=>"7.1.1"}
    {:name=>"tomcat", :slug=>"puppetlabs-tomcat", :version=>"1.2.0"}
    {:name=>"archive", :slug=>"puppet-archive", :version=>"6.0.1"}
    {:name=>"stdlib", :slug=>"puppetlabs-stdlib", :version=>"8.1.0"}
    {:name=>"postgresql", :slug=>"puppetlabs-postgresql", :version=>"6.10.1"}
    {:name=>"concat", :slug=>"puppetlabs-concat", :version=>"6.4.0"}
[...]
Dropsonde::Metrics::Environments
-------------------------------
This group of metrics gathers information about environments.
- environment_count: The number of environments
    3

Site ID:
257da32bc8d2441fa0dfa25ba6e3c17c65922fea46d3c074db7758dc1831dbfb09c41d032803e2a518c4970dbcb1f3d7db8d451e22ebf9920d4ab7c67eebc150
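To illustrate using the report for your own purposes, here is a minimal sketch that scans module lines like the ones in the sample above and lists modules installed at more than one version across environments. It works on the human-readable lines only because that is the format shown here; the JSON output is likely a better fit for real tooling, but its schema isn't shown in this post. The helper names module_versions and mixed_version_modules are hypothetical.

```shell
# Sample lines copied from the preview output shown above.
preview_sample='    {:name=>"concat", :slug=>"puppetlabs-concat", :version=>"7.1.1"}
    {:name=>"tomcat", :slug=>"puppetlabs-tomcat", :version=>"1.2.0"}
    {:name=>"archive", :slug=>"puppet-archive", :version=>"6.0.1"}
    {:name=>"stdlib", :slug=>"puppetlabs-stdlib", :version=>"8.1.0"}
    {:name=>"postgresql", :slug=>"puppetlabs-postgresql", :version=>"6.10.1"}
    {:name=>"concat", :slug=>"puppetlabs-concat", :version=>"6.4.0"}'

# Hypothetical helper: print unique "slug version" pairs, one per line.
module_versions() {
  sed -n 's/.*:slug=>"\([^"]*\)", :version=>"\([^"]*\)".*/\1 \2/p' | LC_ALL=C sort -u
}

# Hypothetical helper: print slugs that appear with more than one version.
mixed_version_modules() {
  module_versions \
    | awk '{ seen[$1]++ } END { for (m in seen) if (seen[m] > 1) print m }' \
    | LC_ALL=C sort
}

printf '%s\n' "$preview_sample" | mixed_version_modules
```

With the sample data this prints puppetlabs-concat, which appears at both 7.1.1 and 6.4.0. On a real server the same filter could be fed from puppetserver dropsonde preview, assuming the module lines keep the shape shown above.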

If you'd like to omit any of the metrics listed in the report, you can use the Puppet module to add them to the disable list.

7.16.0 Content Usage Telemetry Update (Opt-Out)

We are serious about understanding and meeting your content needs, and we want to make sure we are investing in the content that is in highest demand. In our December blog post, we announced that we would be switching to informed opt-out in Puppet 8. However, with the release of Puppet 8 being pushed out further than expected, we have decided to move to opt-out in our 7.16.0 release in April so that we can improve your module ecosystem with data-driven development that much sooner.

If you have an existing Puppetserver installation, upgrading will not unexpectedly switch on telemetry underneath you. If you'd like to help improve the data-driven development efforts of Puppet, Vox Pupuli, and anyone else who uses the database, then you will need to opt in intentionally.

If you opt in by 11 April, we’ll send you a Puppet beanie. We’d love to see opt-ins come through before we switch over to opt-out. Your privacy concerns are important to us, so we’ve made sure that you can see the data collected before you enable the system and you can control which data your Puppet Servers submit. The data is not associated with individuals and is fully aggregated before anyone can access it.

New: Content Usage Dashboard

Now on to our exciting announcement! This epic dashboard is now available on Vox Pupuli. Here you can see the aggregate usage rates, which will help us help you by focusing our efforts on developing the content you use most. As you can see, there's not yet a statistically significant amount of data in the database, but as reports come in over time, this board will become more and more useful.

A screenshot of content usage dashboard in Puppet.
Another visual dashboard of content usage telemetry in Puppet.

 

How to Submit Usage Data

Start by double-checking the data the client will send: run puppetserver dropsonde preview in your terminal. If you're okay with the data it collects, enable reporting, either by installing the module and classifying your primary server, or manually by editing the /etc/puppetlabs/puppetserver/conf.d/puppetserver.conf file on your primary server and adding or updating the following clause. Don't forget to restart puppetserver afterwards.


 
dropsonde: {
    enabled: true
}

Finally, validate your work by submitting a report with puppetserver dropsonde submit. If it prints out a URL, then go claim your beanie!
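The manual route can be sketched as a short dry-run script. This is only an outline under a few assumptions: the run wrapper and the DRY_RUN variable are made up for illustration, the restart line assumes a systemd-managed puppetserver service, and the config edit itself is left as a manual step.

```shell
# Hypothetical dry-run wrapper: prints each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Review what would be sent.
run puppetserver dropsonde preview

# 2. Manual step, not scripted here: set "enabled: true" in the dropsonde
#    clause of /etc/puppetlabs/puppetserver/conf.d/puppetserver.conf.

# 3. Restart the service (assumes systemd).
run systemctl restart puppetserver

# 4. Submit a report to validate the setup.
run puppetserver dropsonde submit
```

By default this only prints the commands. Running it with DRY_RUN=0 on the primary server would execute them for real.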

This blog was originally published in two parts on December 21, 2021 and March 25, 2022 and has since been consolidated.