Podcast

"If I Told You, I'd Have to Kill You:" Puppet in Federal Environments

In this podcast, Eric talks with Bryan Belanger from Fervid about working with Puppet in highly regulated compute environments. As a consultant for US government agencies, Bryan has been working to obtain "Authority to Operate" for government cloud services, to speed up delivery and reduce compliance risk. The Puppet modules for STIG hardening, especially for Windows servers, let them build a baseline from existing systems and enforce it across the environment.

We talk about public sector automation, about the Forge, about the cloud, and about... Canada!


00:00:38 Eric: Hey everyone! Welcome to another edition of the Puppet podcast. My name is Eric Sorenson. I'm a product manager here at Puppet, and I have here in the studio with me today Bryan Belanger. Hi, Bryan.

00:00:30 Bryan: How's it going, Eric?

00:00:32 Eric: Good. Did I say your name correctly?

00:00:33 Bryan: You did say it correctly. Belanger is the Americanized, bastardized version of it. I tell people it's like Smith in Canada; my ancestors are from the great white north. If you go to Canada, all the Canadian listeners will know it as Belanger, so a shout-out to them, but it has been Belanger since coming to the US.

00:01:01 Eric: Excellent. I wouldn't be surprised if they actually call it Belanger-eh in Canada.

00:01:12 Bryan: Where's our drum roll? That's the best I could do.

00:01:13 Eric: No Foley here today. Well, intros and terrible attempts at comedy aside, we're here today to talk about some work you've been doing with Puppet and some related tools. I'm really excited to talk about what's been going on, particularly in the federal and security space. Can you talk a little bit about what your company does and what you've been working on lately?

00:01:39 Bryan: Sure. Our company's name actually changed recently. Our customer had a need to move to the cloud. In this case it was in the federal space, and there are obviously a lot of concerns: whenever you move something as enormous as the federal government anywhere, it takes a lot of time and effort, and security is a big thing. It's almost a knee-jerk reaction. From the outside, the cloud is the big solution to a lot of problems compared with what we're spending internally; even though it's internal dollars, it's millions. We needed to save money, we needed to make things faster, and we needed visibility, and it's frustrating. There are customers (and here I mean internal customers within the federal government) who want to deliver and just can't. The cloud is going to open up a lot of that; saving money was a byproduct. It's hard to get customer visibility into a lot of these things, but one of the big questions is how we get to the cloud. The big thing in the federal government is Authority to Operate (ATO), and security says: that sounds great, but we need a way to make things auditable and trackable. There's a general assumption that if you're in what's called an ATO environment, everything is locked down to an acceptable extent, and going to the great big cloud is kind of scary. So we were trying to address a problem that had been generally accepted internally for a while but was really the elephant in the room. In this case, it's Windows security. Windows has Group Policy, and it does what it's supposed to do.
But the big issue security has is audit tracking. If you create a Group Policy rule to enforce a particular security setting, it's going to do it, but it does it all behind the scenes. You just assume it's working. What we needed was a way to say: this event happened, this drift happened, and now you have a report; now it's auditable, and you can even launch an investigation. We didn't have that before. There was also a second byproduct, which I'll get to in a second. So in the Windows space, we needed a way to enforce the DoD STIG. The DoD is kind of the Cadillac of security standards; everybody wants to follow the DoD and then some.
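Bryan's contrast is between enforcement that happens silently and enforcement that leaves an auditable record of each drift event. A minimal Python sketch of that idea, assuming the desired baseline and an observed snapshot of a machine are both available as dictionaries: every mismatch becomes a timestamped event that a report could be built from. The setting names and values are hypothetical, and this is not how Puppet or the Fervid modules implement reporting.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DriftEvent:
    """One audit record: which setting drifted, from what, to what, and when."""
    setting: str
    expected: object
    actual: object  # None when the setting is missing entirely
    detected_at: str  # ISO 8601 timestamp in UTC


def detect_drift(baseline: dict, observed: dict, now: datetime | None = None) -> list[DriftEvent]:
    """Compare the desired baseline with what is actually on the machine."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    events = []
    for setting, expected in sorted(baseline.items()):
        actual = observed.get(setting)
        if actual != expected:
            events.append(DriftEvent(setting, expected, actual, stamp))
    return events


# Hypothetical baseline and snapshot, for illustration only.
baseline = {"MinimumPasswordLength": 14, "LockoutBadCount": 3}
observed = {"MinimumPasswordLength": 8, "LockoutBadCount": 3}
for event in detect_drift(baseline, observed):
    print(event)
```

The point of recording `expected`, `actual`, and a timestamp together is that the event answers the auditor's questions on its own: what changed, from what, and when.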

00:04:30 Eric: Is that the STIG? Not the Stig from Top Gear, but the STIG.

00:04:37 Bryan: Yes. Our company developed a series of modules that enforce around 90 percent of it, and that's where it began. So now, if you're in the federal space, you want to move to the cloud, and you likely have some IIS in house, you have everything you need for your ATO submission, whatever you want to call it; you now have your Authority to Operate. Another byproduct happened secondhand: almost every organization has its own flavor of security. The DoD standard is nice, but they have their own customized version. Everything we did was written as Puppet resources, and (I'm trying to find the word) they can be reverse engineered. If you install Puppet on your Windows machine, you can say "give me back all of this particular rule" and reverse engineer what you have. So I can go into your environment, install Puppet and these modules, and reverse engineer exactly what you have into a nice Puppet manifest. Then I can pick it up, move it to the cloud, and stand up a brand-new domain controller in my own VPC. Those are the two big benefits that came out of it.

00:06:12 Eric: That does sound like a core Puppet use case. It hits on a lot of the strengths of Puppet's architecture, such as the ability to discover what's out there and present those resources as infrastructure as code. If you have types and providers that implement the instances method, you can run puppet resource against a running system and get back a text description of the available resources and how they're configured. As you said, you can take that and apply it to a new system with a high degree of assurance that you're introducing repeatability and correcting drift as it happens. Sounds pretty awesome. Did you have to write custom types for those, or were you able to do most of that with the types built into Puppet?
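To make the "discover, then replay" loop concrete, here is a hedged Python sketch of its last step: turning discovered resource state into Puppet manifest text. In practice puppet resource already prints Puppet code, so this only illustrates the shape of that output, not how Puppet generates it. The discovered data below is a hand-written, hypothetical stand-in for what a source machine might report.

```python
from __future__ import annotations


def quote(value: str) -> str:
    """Render a Puppet single-quoted string, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def render_resource(type_name: str, title: str, attrs: dict[str, str]) -> str:
    """Render one discovered resource as Puppet code, attributes sorted and arrows aligned."""
    width = max((len(name) for name in attrs), default=0)
    lines = [f"{type_name} {{ {quote(title)}:"]
    for name in sorted(attrs):
        lines.append(f"  {name.ljust(width)} => {quote(attrs[name])},")
    lines.append("}")
    return "\n".join(lines)


def render_manifest(resources: list[tuple[str, str, dict[str, str]]]) -> str:
    """Join several resources into one manifest body, separated by blank lines."""
    return "\n\n".join(render_resource(t, title, a) for t, title, a in resources)


# Hypothetical discovered state, standing in for puppet resource output on a source machine.
discovered = [
    ("service", "W32Time", {"ensure": "running", "enable": "true"}),
    ("file", "C:\\Temp\\baseline.txt", {"ensure": "file"}),
]
print(render_manifest(discovered))
```

Sorting attributes keeps the generated manifest stable between runs, which makes diffs in version control meaningful when the same machine is captured twice.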

00:07:07 Bryan: Well, we had to write a lot of custom types; there's no ifs, ands, or buts about it. We were able to use Puppet's built-in types in places, but most of our work was custom, and we got really good at Ruby. We're proud of the work, we're seeing adaptability, and we hope more people pick it up. We've run across at least a couple of service delivery providers that are planning on adopting it in their own federal organizations. I'd like to see it commoditized; I'd like people to just pick it up and use it. There's no reason not to. It may break a few things, so please do testing, but go ahead and use it. It's one thing that closes off layers of the onion right away: you've immediately made your environment, whatever it is, much safer.

00:08:07 Eric: So were you able to put those modules up on the Puppet Forge?

00:08:10 Bryan: Yep, they've been out there for, I want to say, about a year and a half to two years.

00:08:14 Eric: I'm sure we'll put a link in the show notes, but can you tell listeners what name to search for on the Forge so they can find them easily?

00:08:22 Bryan: They're under Fervid. There's going to be one called "Windows hardening" that isn't out there yet; it'll be out in the next two days.

00:08:31 Eric: Excellent. That's really cool. The Forge is a great resource, and hopefully we can get broader usage of that kind of code. Its reusability is a huge advantage, as is the crowdsourced QA, testing, and bug fixing; getting an active community around code is one of the things I think Puppet does really well in our ecosystem. I know there's sometimes sensitivity around government work, but can you talk about who the customer was, like which department, and what their main use case was for the servers they're running? And if we have to redact this part and it ends up being 30 minutes of silence, you'll know why.

00:09:21 Bryan: I would love to share the customer, but unfortunately I can't, for various reasons.

00:09:29 Eric: Yeah, no problem. I just figured I'd ask. Were you using the modules in a standalone mode, or did you have them integrated into a larger Puppet Enterprise infrastructure? You mentioned the reporting and audit logs that came out of it. Can you talk a little about that, outside of the modules themselves, and how it fit into a larger Puppet architecture?

00:09:54 Bryan: Yeah. One of the biggest things our government customer, or anyone in a regulated space, needs is audit reports. The best way we saw to do that is through Puppet Enterprise; we always had Enterprise in mind, so you can get your reports in a GUI and keep things tracked. There were grandiose plans for integrating with other third-party tools. Hopefully that'll happen someday, but it hasn't yet. At least right now you can say something happened, it happened at this time, and who was on there doing what, when. That was a need that had to be filled. The modules are completely open, with a completely open license, and they're intended to be used with Puppet Enterprise.

00:10:43 Eric: Awesome. For this customer or others, have you been able to integrate any of the capabilities in the newer Puppet Enterprise releases, such as Bolt, tasks, and task plans? Have you gotten into that realm at all?

00:10:58 Bryan: Yeah, same customer; I talked a little about the story earlier. They had a kind of half-pregnant situation. Typically you'd have a cloud provider the way it's marketed, but here the rule was that the cloud provider would manage the OS and below, and we would manage everything from the OS up. It wasn't a good setup, and it caused a lot of issues. Security is a big thing there. This customer runs Red Hat, so you need to run your basic yum updates to keep everything patched at the OS level, and that was on us. One morning we walk in and can't do anything: yum is broken. So we say, "Hey provider, how about you fix our stuff?" I don't know why, but their answer was, "No, what you can do is run these three commands on every node," and we had somewhere north of 1,000 nodes, maybe 1,200. We did the back-of-the-envelope math earlier: if everybody gets on a computer and logs in to every single box one at a time, they can fix it over three or four days. That was their answer.

00:12:28 Eric: Wow. Seems like there ought to be a better way.

00:12:30 Bryan: You'd think. At the very least, you'd think they would log in and fix it themselves. But no, that's not how things were done in this case; customer satisfaction, I guess, was not their thing. So the question for operations was what to do, and we thought we had a solution. We took those three commands, created a Bolt task for them, and the job went down to roughly ten minutes: from four days to ten minutes. It was a nice win back when Bolt was early on, an early success with immediate impact. We've done similar things since then, but that one really saved a lot of headache and heartache.
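What turns days into minutes here is fan-out: the same ordered steps run against many targets concurrently instead of one login at a time. The Python sketch below illustrates only that pattern, not Bolt's implementation. `runner` is a hypothetical caller-supplied function (in real use it might open a remote session), the three commands are placeholders because the conversation doesn't name them, and the default concurrency of 100 is an arbitrary choice.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_on_nodes(
    nodes: list[str],
    commands: list[str],
    runner: Callable[[str, str], bool],
    concurrency: int = 100,
) -> dict[str, bool]:
    """Run the same ordered commands on every node, up to `concurrency` nodes at once.

    `runner(node, command)` returns True on success. A node stops at its
    first failed command, and its overall result is then False.
    """
    def fix_node(node: str) -> bool:
        # all() short-circuits, so later commands are skipped after a failure.
        return all(runner(node, command) for command in commands)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return dict(zip(nodes, pool.map(fix_node, nodes)))


def dry_run(node: str, command: str) -> bool:
    """Hypothetical runner that only reports what it would do."""
    print(f"[{node}] would run: {command}")
    return True


results = run_on_nodes(["node-1", "node-2"], ["step-1", "step-2", "step-3"], dry_run, concurrency=2)
failed = [node for node, ok in results.items() if not ok]
```

Returning a per-node result map, rather than stopping at the first broken node, matches the operational need: fix the fleet, then chase the handful of stragglers.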

00:13:14 Eric: That's a pretty substantial time reduction. If you can do it in 10 minutes and then spend the rest of the three days playing solitaire, you're in good shape. I haven't had much experience working in government agencies, but I assume that's how that works.

00:13:29 Bryan: I'm going to shrug my shoulders.

00:13:30 Eric: Well, cool. I'm a huge fan of Bolt. We did a podcast a month or so ago that was all about Bolt, and it came out really well. I was involved in the project when we were just starting out, and I still keep a finger in it, but it's not my primary day job anymore. Occasionally it shocks me when I go away for a month or two to work on something else, come back, and find that Bolt has gained all of these amazing capabilities in the intervening couple of months. The project is moving really fast, and it's expanding both in its ecosystem, the kinds of tasks people have written for it, and in the core capabilities of the tool. It's really been a runaway success, and it warms my heart to see it out in the world actually solving people's problems.

00:14:23 Bryan: Yeah, I'm not sure what the impetus for it was, but it was something that was needed without me even realizing it, and you saw the impact immediately. Puppet was the best in the space, but there were certain things it didn't do well or couldn't do. I remember the old days when I and other providers would spend hours writing Ruby code to do one-time things, or run all these execs. Execs are like fingernails on a chalkboard for me. So it was nice to have this, and in the organizations I've been in, I've seen two big advantages. The first one I saw right away is that I'm able to create things that are parameterized. It's a semi-formal language. I know you can write PowerShell scripts or Linux shell scripts, et cetera, but this gives you a formal, parameterized way of doing things, with strong checking if you're using plans. I can hand this to some of the less technically astute operations team members and they can start sysadmin-ing; they don't have to learn the Linux or Windows internals, and they're able to operate right away. The second is what I mentioned earlier: with Puppet Enterprise, we now have auditability. So I have real pieces of code that act like real code, which we're able to hand off to auditors, again particularly in the regulatory space. It solved a big problem that I hadn't seen solved before.

00:16:10 Eric: That's awesome. I think that type system is so powerful. Bolt is independent from Puppet in a lot of ways, but it's also closely related, so if there are lessons you've learned, techniques, or parts of the language syntax you're used to from Puppet, you can translate those over into Bolt. The type system is one of those examples: you can opt into it gradually, with very lightweight type checking, or you can get pretty intense with it. It's also a great example of being able to hand something off to a NOC or help desk team and constrain each of the parameters that feed into a task or a plan. If it's a service, there are only a few things you can do to it: get its status, stop it, start it, or reload it. Having those actions well defined in the type system means that if somebody tries to do something crazy to a service, it errors out before it even runs; Bolt won't execute the command, because it fails type checking first. That provides the guardrails that let those tasks be reused and shared more broadly than just me running things out of my home directory, which I think is where people started out solving that problem.
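Eric's guardrail is "validate parameters, then execute." In Bolt this is declared in a task's metadata using Puppet data types, roughly `"action": {"type": "Enum[status, start, stop, reload]"}`. The Python sketch below mirrors the same idea outside Bolt; the class, the exception, and the `execute` callback are hypothetical names for illustration. A bad action raises before `execute` is ever called.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# The only actions a service supports, mirroring an Enum-typed task parameter.
SERVICE_ACTIONS = frozenset({"status", "start", "stop", "reload"})


class ParameterError(ValueError):
    """Raised when task parameters fail validation; nothing has executed yet."""


@dataclass(frozen=True)
class ServiceTaskParams:
    name: str
    action: str

    def __post_init__(self) -> None:
        if not isinstance(self.name, str) or not self.name:
            raise ParameterError("name must be a non-empty string")
        if not isinstance(self.action, str) or self.action not in SERVICE_ACTIONS:
            raise ParameterError(
                f"action {self.action!r} is not one of {sorted(SERVICE_ACTIONS)}"
            )


def run_service_task(raw: dict, execute: Callable[[str, str], str]) -> str:
    """Validate first, execute second: a bad action never reaches `execute`."""
    params = ServiceTaskParams(**raw)  # unknown keys raise TypeError here, too
    return execute(params.name, params.action)
```

Putting validation in the parameter type, rather than in each command, is what makes the task safe to hand to a wider team: the allowed vocabulary is defined once and checked every time.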

00:17:41 Bryan: I think it's also going to open things up and help us see more of this. There's a lot of really good enterprise software out there that, until Bolt came out, couldn't be installed and managed the old way of doing things. To me, this is going to open up the ability to automate these more complex platforms and more complex installs. I think we're going to see bigger visibility across infrastructure as a whole with plans. I've seen some of the proof-of-concept work out there: pass a few parameters, and now you manage the network and everything else. Puppet tried to address this, but I think Bolt is going to knock that door down. So I think it's really just beginning.

00:18:28 Eric: Yeah, I'm pretty excited about it too; there's lots of good stuff going on. Let's shift gears for a minute. We were talking about the cloud migration, and I'm wondering whether you're also responsible for managing cloud or containerized infrastructure. We mentioned taking system descriptions for on-prem systems and applying them to new VPCs in the cloud. Are you also managing the cloud resources themselves with any of the Puppet toolchain?

00:19:02 Bryan: No, the short answer is no. I expect it's going to happen in future engagements, but in past engagements it really hasn't. We did a lot of proof-of-concept work, but unfortunately I don't have any great stories there, other than getting some legacy applications to the cloud. That's what we've been focused on.

00:19:28 Eric: I see. So it's more about migrating the workloads over than managing the cloud infrastructure itself.

00:19:33 Bryan: That's correct. Yep.

00:19:36 Eric: That's perfectly all right. It's actually a pretty common story that we've heard. We're working on some new projects, which you may or may not have heard of, that attempt to address some of those problems by bringing lessons from classical configuration management into cloud-native arenas for managing Kubernetes applications and cloud-native infrastructure. The project I've been working on for the past few months, called Lyra, is pretty interesting in that regard. It brings over some of the concepts from Puppet: again, we bring over the type system, and there's a Go implementation of Hiera, so if you have an existing investment in a Hiera hierarchy and YAML files, the Go Hiera implementation can use them compatibly with the Ruby one. But the goal is really to orchestrate the different parts of a cloud-native application deployment rather than do ongoing configuration management on systems. One of the paradigm shifts in cloud native is that running an agent on a system and making changes to it live isn't really applicable anymore. It's a different mindset: we're running immutable infrastructure, and if I want to make a change, I build a new one upstream and let my continuous delivery pipeline roll it out into production.

00:21:01 Bryan: Yeah, that's what we're seeing with customers. They don't want agents running; more and more, we're becoming comfortable with the pets-versus-cattle model, which I'm assuming people listening to this are familiar with.

00:21:12 Eric: I think referencing cattle, not pets, is a square on the Puppet bingo card.

00:21:19 Bryan: So it's not like a drinking game where we take a shot when...

00:21:23 Eric: If you're playing along at home, you probably can. Here in Portland it's 2:30 in the afternoon, so it's a little early to get into that.

00:21:32 Bryan: OK. I was asked some things about Lyra when it was being talked about. Everyone's familiar with the most popular tool in the space, and I'm wondering, if this can be talked about, what Lyra's approach is. I know what I like and don't like about the other tool, and I'm hoping Lyra will address all sorts of things, but I was wondering where Lyra is trying to fit into the space.

00:22:01 Eric: Yeah. One of the main goals is to enable reuse of existing content ecosystems. There's a lot of great work people have done on application deployment to Kubernetes with Helm. It's the tool everybody has converged on, and there's a rich library of Helm charts, kind of like the Puppet Forge, that describe how to put applications onto Kubernetes in terms of resource provisioning and initial setup. I think Terraform is a great tool, too. There's a lot of rich content people have built, either for their site specifically or as modules shared on the Terraform Registry. We want to enable reuse of those things without forcing people to rewrite what they've already done in a different syntax just to use Lyra. So the idea with Lyra is that you'd be able to string those pieces together in a workflow, rather than take over the role of Terraform or Helm. If your application requires creating some infrastructure, deploying apps onto it, and then sending some notifications, like stopping a PagerDuty maintenance window or sending a Slack or ChatOps notification once the deployment's done, we see Lyra as the thing that links those pieces together. It lets you introduce some control flow and logic and bind these different technologies together in a common language, rather than trying to take the place of any one tool that's already out there.
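Lyra's own workflow syntax isn't shown in this conversation, so the following is not Lyra. It's a generic Python sketch of the underlying idea Eric describes: steps that wrap existing tools, run in an order derived from their dependencies, with each step able to read earlier outputs. It uses the standard library's `graphlib` (Python 3.9+); the step names are hypothetical stand-ins for a Terraform apply, a Helm install, and two notifications.

```python
from __future__ import annotations

from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable


def run_workflow(
    steps: dict[str, Callable[[dict], object]],
    depends_on: dict[str, set[str]],
) -> dict[str, object]:
    """Run each step after the steps it depends on; each step receives earlier outputs.

    Returns step outputs keyed by name, in execution order. A dependency cycle
    raises graphlib.CycleError before any step runs.
    """
    graph = {name: depends_on.get(name, set()) for name in steps}
    outputs: dict[str, object] = {}
    for name in TopologicalSorter(graph).static_order():
        outputs[name] = steps[name](outputs)
    return outputs


# Hypothetical stand-ins for provisioning, deployment, and notifications.
demo = {
    "provision": lambda out: {"cluster": "demo-cluster"},
    "deploy": lambda out: f"app deployed to {out['provision']['cluster']}",
    "end_maintenance": lambda out: "maintenance window closed",
    "notify": lambda out: f"chat message: {out['deploy']}",
}
deps = {"deploy": {"provision"}, "end_maintenance": {"deploy"}, "notify": {"deploy"}}
print(run_workflow(demo, deps)["notify"])  # chat message: app deployed to demo-cluster
```

Declaring dependencies instead of a fixed sequence is what lets independent steps (here, closing the maintenance window and sending the chat message) be reordered or run in parallel by a smarter engine.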

00:23:33 Bryan: It also sounds like Terraform will be a possible integration point there.

00:23:37 Eric: Absolutely.

00:23:37 Bryan: Okay, interesting.

00:23:39 Eric: It's still really early and we're still working on it, which is why I was asking the question. I'm always trying to find people with problems we can maybe help solve, because, as I said, it's only been out in a usable form for about a month, and I'm kind of rattling a cup around trying to find people to try it out and tell me whether they like it.

00:24:07 Bryan: Well, everyone hates the Terraform state file.

00:25:07 Bryan: Yeah. The last three to six months have been fairly busy. I wish I could say there's a shiny new thing out there that I want to grab. I haven't been watching Puppet's new products very closely, though I know they're trying out new things. One thing I had been excited about for a while was the Insights product, and it looks like it's going to go through some changes now. What I've seen over and over again, in places I've worked or been part of, is that there's typically a strategic offering. Most of what I've seen from Puppet has been tactical, and I understand why, but I'm hoping this product moves forward; I see it as a possible strategic play. It's nice to be able to go in and transform companies. Yes, you can add some automation here and there, and they'd be better off for it. One reason we share with the community is that I want people to adopt it; we're all better off for it, and I don't want to reinvent the wheel. But I like the idea of being able to measure DevOps, and I'm hoping that allows real transformation to take place. So that's one of the things I've been watching closely, and I've been talking to people about it today. I don't know if you have any other thoughts or insight about that.

00:26:30 Eric: Yeah, sure. One of the things we're really interested in is just what you said: making visible what's going on across the organization. Puppet started out with, and our heart is still with, the person at the keyboard making changes, and our tools are generally focused squarely on making their lives easier. Yvonne, our CEO, has a phrase about eliminating soul-crushing work, and that's where our heart is. But we know from the last ten years of working on this that just making one person more powerful and eliminating the soul-crushing parts of their job isn't really sufficient. You've got to spread that love to the team the person is working on, and ultimately across the organization. The broader you go, the higher the level at which you have to work, providing tools and visualizations that serve not just the person making the change but the team and the larger organization. One of the things I'm really excited about in our products recently is the CD for PE tool, which provides that kind of visualization of what's going to happen as a consequence of a change. I think it's really powerful, and it lets people feel safer about making changes because they can more easily predict what's going to happen. The visualizations and audit log, plus the predictive capabilities that say "when this change goes to production, these are the nodes that are going to be affected, and here's what's going to happen to them," up-level the conversation a little and let people outside the core team that's in the guts of Puppet every day make changes with confidence.
And we want to continue that, broadening out that focus and moving upward in the organization and up the stack, so that, like you said, you have a more comprehensive idea of how your organization is doing overall, not just how a particular server or your Puppet infrastructure is doing. That's definitely an area we're really interested in over the next months and years.
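The question impact analysis answers is "which nodes does this change touch?" The Python sketch below is a deliberately simplified, hypothetical model of that question, not how CD for PE computes impact: it only checks which nodes include a changed class, using made-up classification data, and says nothing about what would change on each node.

```python
from __future__ import annotations


def affected_nodes(node_classes: dict[str, set[str]], changed: set[str]) -> dict[str, set[str]]:
    """Return, for each node that uses a changed class, which changed classes it uses."""
    impact: dict[str, set[str]] = {}
    for node, classes in node_classes.items():
        hits = classes & changed
        if hits:
            impact[node] = hits
    return impact


# Hypothetical classification data.
classification = {
    "web01": {"profile::base", "profile::nginx"},
    "db01": {"profile::base", "profile::postgres"},
    "dc01": {"profile::windows_hardening"},
}
print(sorted(affected_nodes(classification, {"profile::base"})))  # ['db01', 'web01']
```

Even this crude version shows why the feature builds confidence: a change to a shared base class visibly fans out to more of the fleet than a change to a single role.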

00:28:56 Bryan: Yeah, it'll be nice to have. [inaudible] There's a lack of visibility, even just in the data center, and I know where everyone is going: most people have multiple places and multiple providers. They need a way to glue it together, and they need to do something more than just deliver applications in a CD manner. So yes, I've been able to look at it, and I believe my next engagement will touch on it, so I'm excited to see what it's about.

00:29:31 Eric: So CD for PE is a tool for creating pipelines to get code out into production, but the actual tests you perform in each stage are left up to you. There are some templates and other tools out there. I'm curious what kind of tests you run across your modules. Are you using any of the tools in the community ecosystem, like rspec, Beaker, Serverspec, or our new Litmus? I'm not sure if you've had a chance to look at it, but Litmus is our new open source testing tool. Can you talk a little about your testing methodology or philosophy, and what tools you use for testing Puppet code?

00:30:15 Bryan: My background is actually on the development side, which I think puts me in the minority here; most people seem to come from the administration side. To me it's a natural fit. I already had years of leading teams, and on the development side you're expected to test; it's just the way it is. So when I saw testing become generally available for Puppet, it was a big deal. We had an issue that let me bring it in, which I hate talking about: one of the developers made a mistake, and if we'd had rspec tests, it would have saved us a big problem. It was one of the modules on the Forge (I forget exactly which one) that handled SSH. We had a rule that basically said "purge all the rules unless they're explicitly defined," and the explicit definitions weren't brought over correctly. If we'd had rspec testing, that would have been a five-minute test, and we wouldn't have literally locked ourselves out of the servers. It's one of those stories. Fortunately everything was OK, since there was a back door, but it was scary for a long while. So rspec testing is the waterline. I want basically every resource tested, at least whether it's ensured present or absent, plus anything critical. Maybe I'm going into a monologue about how to test, but we did that with our STIG modules: I believe every resource is tested, and I'm hoping that as they evolve in the user space, people add their own rspec tests to them. The other thing, as a general rule in this space, is that everyone asks what the right level is. Do I test every parameter? Do I test every particular thing we can and can't do with rspec? Of course, the answer is no.
My recommendation, when I work with juniors, is to make sure everything that should exist exists, and everything that shouldn't doesn't, and to test what you think is critical. To me, testing is an evolutionary thing. What I say is: I don't want you making the same mistake twice. Take our SSH incident. It's never going to happen again because, in this case, there's onceover on the control repo. Onceover says make sure these two parameters and these values are in place, or you stop right now. It's going to save you. It's an investment: you put in a little more effort to make sure everything runs smoothly. So how does it evolve? You're not going to do it perfectly; no one does. I hear people get upset about bugs, and over time I think that's just the wrong mindset. What I don't want to see is a bug happen twice. That's how your test suite grows. When a bug happens, first you create a test that describes the correct state. You run it, and it should fail; if it doesn't, there's something wrong with your test. Then you fix the code. And what I want to see from a professional software engineer, and would like to see in the Puppet space, is getting to the point where you're comfortable writing your tests first and then writing the code. You have a double check, and you're going to see things run a lot smoother going forward.

00:34:03 Eric: I think that's one of the key things. We mentioned infrastructure as code earlier, and testability is one of the key characteristics of a good system for representing infrastructure as code. It's not just whether it's text or whether it can be checked into version control, but whether it can be tested: can you make assertions about what the code is going to do without actually running it, bringing test-driven development into infrastructure code? That's something I think Puppet led the way on a long time ago, and getting a high degree of confidence that the code will do the right thing before it goes out into the world is still a core part of the experience of writing it and running it in production.

00:34:46 Bryan: So rspec was the minimum. We also did acceptance testing. I've done a fair amount of Beaker on Linux, but I had trouble doing it on Windows. I'm pretty excited about Litmus, and my hope is that it irons out some of the difficulties users have had with Beaker, so that solid acceptance testing covers the full scope and lets us deliver and test cleanly. That's what I'm excited about: the promise of Litmus, and I hope it's realized.

00:36:31 Bryan: I'm just glad to be part of the Puppet sphere. Thank you for having me on. I appreciate it, and I'm excited for Puppet.

00:36:39 Eric: Right on. Thanks for being on. Safe travels back home.

00:36:41 Bryan: Thank you.