Splunk Case Study

Industry

Technology - Data Analytics

Customer Environment

Splunk Cloud, hosted on AWS

Challenges

To scale their cloud environment at the rate customers’ businesses demand, the Splunk® Cloud operations team needed to move to a deterministic, configuration-driven automation model that removed inconsistency and drift. This ensured a more consistent operating model and also improved the end-user customer experience.

Results

  • More efficient management of customer environments with greater consistency
  • Ability to quickly scale and deploy changes across the entire fleet regardless of size
  • Provisioning of new customer environments in minutes instead of days

Re-thinking the Industry-old Provisioning Approach to Cloud Environments

Splunk Inc., the company that helps organizations ask questions, get answers, take actions and achieve business outcomes from their data, began rapidly growing its cloud environment and welcoming new customers to Splunk Cloud, a cloud service that enables customers to quickly leverage Splunk capabilities without having to manage the underlying infrastructure or software. With a growing customer base, Splunk wanted a more deterministic way of managing configurations, one that didn’t let environments drift outside of the norms or golden configurations.

Splunk started with agentless provisioning tools, which have been the industry standard. However, the company realized it was critical to move to an automated process to better manage the company’s growing customer environments.

“To serve our customers better, we needed to have environments under constant management, rather than being managed on an ad hoc basis by provisioning tools. Most of the work happens after provisioning, and once that is solved, the provisioning problem becomes much smaller,” said Chris Vervais, Director, Site Reliability Engineering at Splunk.

Scaling Automation

With the move to Puppet Enterprise, the Splunk team was able to efficiently manage more customer environments, as well as cut down provisioning time from days to minutes. In fact, Splunk rolled out one global change to all customers in this environment in a couple of hours, whereas before it would have taken several weeks of planning and a large share of company resources.

With this shift, Splunk quickly scaled to thousands of nodes, delivering positive outcomes for its customers, such as smaller maintenance windows and faster scaling. By automating its customer cloud environments and reviewing changes through a two-person pull request model, Splunk cut down on the error rates associated with rote, repetitive tasks.

“Puppet helped us achieve automation at scale by managing the environment, preventing drift, and making changes easy to deploy across the entire fleet regardless of size. It keeps the environment consistent and in the intended state,” said Chris.

“We want to make sure we scale efficiently as we scale our customer base. With Puppet we can do that. We can focus on the larger picture, and drive more scalability. We’re not mired down in toil work doing rote, repetitive, error-prone tasks. We’re focused on both the high-value problems we solve at Splunk and we solve for our customers.”

Chris Vervais, Director, Site Reliability Engineering at Splunk

Creating Deterministic, Idempotent Change

Beyond the initial goal of scalability, Splunk also wanted to provide additional value to its customers through deterministic change. With Puppet Enterprise, Splunk has been able to deliver change to environments effectively, which decreases mean time to repair (MTTR) on a fleet-wide basis. In fact, it now takes minutes to roll out a configuration change that previously could have taken hours.

“When you have mission critical use cases you just can’t let hours go by. Automating for common error scenarios really reinforces how this service can be used to address mission-critical customer issues,” said Chris.

With Puppet Enterprise, environments are more consistent and predictable. For instance, when Splunk introduces a change across its cloud environments, it no longer has to test for a dozen different scenarios. Splunk has since reallocated its resources to focus less on updating its customer environments manually and has instead used this time to work on higher-value issues, such as transparency and compliance for regulatory controls.

“We want to make sure we scale efficiently as we scale our customer base,” said Chris. “With Puppet we can do that. We can focus on the larger picture, drive more scalability. We’re not mired down doing rote, repetitive, error-prone work. We’re focused on both the high-value problems we solve for Splunk and for our customers.”
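The idea of deterministic, idempotent change described in this section can be illustrated with a minimal sketch. This is not Puppet code and not Splunk's implementation; the `Node` class, the `DESIRED` settings, and the function names are all hypothetical, chosen only to show the pattern: compare each node against a golden configuration, correct only what has drifted, and get no further changes when the same run is repeated.

```python
from dataclasses import dataclass, field

# Hypothetical "golden configuration": setting name -> intended value.
DESIRED = {"ntp_server": "time.example.com", "max_open_files": "65536"}


@dataclass
class Node:
    """Hypothetical stand-in for one managed customer environment."""
    name: str
    settings: dict = field(default_factory=dict)


def detect_drift(node, desired):
    """Return {setting: (actual, intended)} for every setting that differs."""
    return {
        key: (node.settings.get(key), intended)
        for key, intended in desired.items()
        if node.settings.get(key) != intended
    }


def converge(node, desired):
    """Correct only the drifted settings; return how many were changed."""
    drift = detect_drift(node, desired)
    for key, (_actual, intended) in drift.items():
        node.settings[key] = intended
    return len(drift)


def converge_fleet(fleet, desired):
    """Apply the same desired state to every node, whatever the fleet size."""
    return {node.name: converge(node, desired) for node in fleet}


fleet = [
    Node("stack-a", {"ntp_server": "time.example.com", "max_open_files": "1024"}),
    Node("stack-b"),  # freshly provisioned, nothing configured yet
]

first = converge_fleet(fleet, DESIRED)   # corrects drift
second = converge_fleet(fleet, DESIRED)  # idempotent: nothing left to change
print(first)   # {'stack-a': 1, 'stack-b': 2}
print(second)  # {'stack-a': 0, 'stack-b': 0}
```

Because the second run reports zero changes, the same procedure can be applied continuously to every environment without side effects, which is the property that makes fleet-wide change predictable.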


Top outcomes from using Puppet

  • Scaled faster to thousands of nodes while ensuring consistency across environments
  • Rolled out a global change to all customers in a couple of hours, as opposed to several weeks
  • Cut down on mean time to repair (MTTR)

       
