Open source has always moved fast. Today, it moves faster than ever, driven by both community demand and corporate interest. On this episode, Perforce’s Javier Perez and OSI’s Stefano Maffulli discuss the impact of recent license changes and the historical push-and-pull between consumers and providers in the world of open source.
Javier Perez [0:19] Hello and welcome to today's episode of Pulling the Strings podcast, as always powered by Puppet. My name is Javier Perez. I'm chief open source evangelist and senior director of product management here at Perforce, part of the OpenLogic brand, and of course working very closely with the Puppet team. It was my turn to host this podcast from my friend Ben Ford. I know he does a great job, and I've been a guest before here on the podcast, so I'm happy to be part of that.
So before I introduce my guest here, let me give you a quick introduction. Today's open source has obviously changed. It's not the same as it was 20 years ago or 10 years ago, or even a couple of years ago. Recently, there's been a ton of discussion about the future of the various Linux distributions. We'll talk about that, the whole CentOS end of life and the Red Hat announcements around the availability of their source code. There was also the big news about HashiCorp changing their licenses, and of course, what better guest than the executive director of the Open Source Initiative, Stefano Maffulli, whom we have here. Stef, how are you doing?
Stefano Maffulli [1:39] I’m good, thank you. Glad to be back. It's always a pleasure.
Javier Perez [1:43] It’s always great to have conversations with you. And as I was saying, we wanted to talk about some of that news, but really, instead of just talking about specific vendors, which we don't really want to do, we want to talk about open source. So, how about we have a good... Expand the conversation, step back, and talk about open source over the years, and especially what's happening now and the future of open source. And as I said before, what better guest to have here than Stef, who runs a very important organization: the OSI, the Open Source Initiative. For people not that close to open source, it's well respected in everything that has to do with open source licenses. OSI is definitely the steward of the Open Source Definition, right, Stef?
Stefano Maffulli [2:37] That’s exactly how we like to present ourselves. We are the stewards of the Open Source Definition. It's a very interesting term for me, not being a native English speaker, when I first... It took me a while to understand exactly what that means, and the way we interpret it at the organization is that we are the authority that defines open source, but not because we have some sort of magic wand. Our authority comes from the organizations that stand behind the definition that we maintain for the community. So it's a set of principles that were established 25 years ago, and they keep on being maintained by us for the community, by the community. We have a set of committees, and we are a fairly open organization. Our board gets renewed every year with public elections. And individual members of the organization, but also affiliate organizations, which are other nonprofits that decide to join the OSI, those are the ones who elect the board. And that's where our authority for defining open source comes from, from the people who support us.
Javier Perez [3:54] Excellent. And obviously there are some very interesting conversations now around AI. We'll talk about that in just a minute. But OSI has been a big part of those conversations. And you just mentioned 25 years. So OSI is celebrating 25 years of existence, right?
Stefano Maffulli [4:11] Indeed. Yeah. It's a big year this year.
Javier Perez [4:14] Big year. And you already mentioned some of the work that you do at OSI, but would you mind just sharing a little bit more of the role of OSI over these 25 years? And if you don't mind just also telling us a little bit of the differences. Obviously things have changed in these 25 years.
Stefano Maffulli [4:34] Of course. Of course. Yeah, things have changed. 25 years ago, the Open Source Initiative was established as a volunteer-run organization, and the mandate has always been from the beginning to maintain this list of ten principles, and check licenses, so legal documents that come with the software, to make sure that the licenses match and obey those ten principles. We've been doing this for many years. As I joined the organization as its first executive director, we started adding staff and professionalizing the organization. So we're not only volunteer-run anymore, we have professional staff. Now we have established three main working programs. One is on the legal front, we call it the legal program, where we keep on working on our bread and butter, maintaining that list of licenses, making sure that it's coherent, making sure that it's complete. We've done quite a lot of research and projects this year just to make sure that what we publish on the website is accurate. And we've also taken over and started widening the scope of another project called ClearlyDefined, and it's in this program. ClearlyDefined is a distributed, crowdsourced database of licensing information attached to packages, like packages in PyPI or npm that often don't have very clear information about the licensing. The licensing information is either... Often while building these software bills of materials or during the procurement processes, the legal departments find inconsistencies. So large corporations especially, like Microsoft, Bloomberg, SAP, they're collectively sharing their knowledge and dropping this corrected information into a shared database. And that's ClearlyDefined. It's part of the legal program. The other program that we have established and widened this year is on the policy front, because a lot of new laws and regulations are coming from the United States, from Europe, and from other parts of the world.
We are aware of the increased importance of open source in everyday life, and politicians have noticed that too. So they're writing laws, they're writing other documents, legal documents, and we want to make sure that the policymakers don't write policies that prevent open source from being part of the ecosystem. So we are educating the public on that front. And a third program is called outreach and advocacy. On this front, we are educating the public in general. So we have blogs, we're launching a new website next week, definitely before meeting in Bilbao, and we're running webinars and conferences. One of these is the research on AI that we're doing, where we're trying to identify what the definition of open source AI is.
Javier Perez [7:48] A lot of things to talk about that, and you mentioned so many other things that I was like, "All right, we can talk..." Those also could be completely different episodes here, different sessions, just... I guess you didn't mention it, but it’s obviously the reference there to the Cyber Resilience Act, right?
Stefano Maffulli [8:04] Correct.
Javier Perez [8:04] And all the concerns that we have around exceptions or no exceptions to open source, and... Well, that's a topic for another session, another podcast. But let's talk about a couple of cases. And as I said, this is not about talking companies or decisions, but it's more about talking the open source. But let me just for everyone give a brief summary on the whole situation on the Linux front. I've written a couple of blog posts, articles. There's one article also about to be published in the next few days. Had a chance to do a webinar with the representative of Rocky Linux and AlmaLinux, so I've been very much following this step by step. So, here is a quick summary for everyone. CentOS has been around for a long time, CentOS Linux, basically more than ten years, actually a lot more. The big announcement on December 8th 2020 by the community and by Red Hat, which was pretty much driving that CentOS community, said, "We're going to change what we're doing here. We're going to focus on a new Linux distribution that is called CentOS Stream. CentOS will become end of life. CentOS version 8, which at the time was the latest, is going to become end of life in a year, and we're not going to move the end of life for our long-term support," which just for everyone, long-term support meaning still updating for critical bugs or high severity vulnerabilities, there are still updates. "We're not going to touch CentOS 7 end of life, but CentOS 8, it's going to be next year." That basically reduced what all the users, all the consumers of this Linux distribution, that they were used to long-term support for ten years, it changed from ten to two years for CentOS 8. And that created a lot of, obviously, comments and controversy. And one of the issues or one of the situations there, Stef, is every community is different. They can make their own decisions, they can change the lifecycle, their support on their releases. That's normal. 
That actually happens in almost every community. It's very rare for the release lifecycle to always stay the same with no changes. This happens a lot. This happened to get a lot more notice because it's Linux and because there were thousands of deployments, or there are still thousands of deployments out there. But I think the first thing that I want to comment on is that situation. These things change. Probably what also makes it different here is when a community, when an open source community, is very much driven by a single vendor. And I think that's also part of the situation here, right?
Stefano Maffulli [10:55] Well, I'm sure that plays a factor, the fact that there is one single vendor changing its mind. I tend to think that this is not very different from what would happen with any product, with any brand, where conditions change. I cannot think of immediate examples, but I would imagine if, let's say, BMW changed the way they were selling cars... And in fact actually, wasn't it Mercedes or BMW that started selling subscriptions to activate the heated seats in their cars, instead of... Trying to imitate somewhat the model by Tesla, by which every Tesla is built with the same hardware components, but some features are activated based on monthly fees that are paid. Or actually, maybe an even better fitting example is when Adobe changed the way that they sold their packages. Instead of selling the licenses for the package that you download and you use that version that you bought, they started selling monthly subscriptions to access their software. I think we're seeing the same reaction, and the reaction is to changing the social contract, basically, that was agreed upon. I became a customer of company X, and I thought that the conditions under which I was operating were going to be always the same, and now they've changed. Of course there is pushback.
Javier Perez [12:36] And frankly, it's software. It's not an issue about open source. It's all software, commercial, proprietary, or open source. And then the next piece here, is when something like this happens, then in open source we have forks, right?
Stefano Maffulli [12:52] Of course. That's the power... Yeah, exactly. That's the extra power that Adobe doesn't have.
Javier Perez [12:57] That’s the power of open source. And anyone, including organizations, large enterprises, can go behind a fork, and it has happened many times. One recent successful one is OpenSearch, which is a fork of Elasticsearch, and it's happening now with the situation with HashiCorp. I'll touch on that in just a second, but it's happening. And going back to CentOS, all of a sudden we have all the distributions. We have Rocky Linux, we have AlmaLinux, we have a few others. And the priority here is that they were not really taking the source, it was more about taking it from RHEL, from the commercial version, from Red Hat Enterprise Linux. So just recently Red Hat said, "If you want the source code, go to CentOS Stream. We no longer make it publicly available to everyone. To comply with GPL, the source code is available to our customers." And I don't want to get into a legal discussion here, Stef, you probably don't want to either, but I think everyone agreed, and I've been in enough conversations and forums, that there's no issue here in terms of licensing. It was just a decision by Red Hat. They're not in the business of making it easier for others to build a business. But technically, or at least in terms of open source licenses, nothing changed. And by the way, we're talking about GPL, one of the licenses that has been around for a long, long time, well established, well understood. Or would that be really well understood?
Stefano Maffulli [14:37] That would be my big question, I think. But if there are lawyers who understand the GPL, I'm assuming that they work for Red Hat. So I know that there are some... Again, I don't want to get into the legal argument, because honestly, I'm not a lawyer, and I think there are better qualified people who can give the answer, and they have commented on whether they think that the GPL is being violated or not. But on top of that, I think that the more important issue is that reaction to a change in a social contract. There was an agreement, a group of developers had enjoyed the... They would never become customers of Red Hat, and they were enjoying having a testing environment, for example, available through CentOS, that would imitate the Red Hat Enterprise…
Javier Perez [15:29] Compatible, right? The same apps working.
Stefano Maffulli [15:31] Exactly. So for testing purposes, and there were some convenience that other companies were also enjoying, some flexibility in licensing terms. The market will need to adapt, I'm guessing, and probably there will be... Maybe someone will throw a lawsuit and see what will happen. What I think is, fundamentally it's an issue with business practices more than a legal argument, for me. From the very intellectual point of view, it's an interesting and challenging proposition for me, or an investigation, to do some research and see what the impact will be. I would really love to see a Harvard Business Review or another Sloan analyzing this five years from now, and see what the impact has been for the business of Red Hat and IBM.
Javier Perez [16:24] Yeah. That will be very interesting. Just for everyone, for some in the audience who are not familiar when we talk about GPL, let me just mention briefly, GPL has been key to the success of Linux, and why Linux is so popular. Linus Torvalds, the creator of Linux, basically said that, that the GPL has been one of the defining factors in the success of Linux, because it really enforces giving back, keeping everything open source. So the restriction, or the copyleft there, it really means that... When you take source code with a GPL license, you have to keep that under the GPL license, keeping it open source. And when you are commercializing software, and there are tools to identify all these open source licenses, when there's a GPL license detected, that's a risk, because you can't necessarily commercialize that. You have to remain on the open source license. That's just at a high level, for some of the audience that might not be familiar with it. There are so many licenses, and if you go to the OSI website, you get the definitions there, right? I mean, you can also look into open source public resources.
Stefano Maffulli [17:46] There are more than 100.
Javier Perez [17:47] More than 100? Hundreds, probably. What's the difference there, Stef, for the ones that OSI has reviewed? Is that an official approval process there, or just a recommendation?
Stefano Maffulli [18:02] It’s an official approval, and it's very thorough. And we have recently also made it clearer what the process entails. Basically in a nutshell, the person who wants to have a license reviewed sends it to a mailing list, a community of people. Anybody can really join that mailing list, and can give comments. The most meaningful ones are from lawyers. And it's a very thorough process where every single line of that license gets checked against the ten points of the Open Source Definition. And after that review process, within a couple of months usually, there is a recommendation from the license committee chair to the Open Source Initiative's board, and the board votes, based on those recommendations, on whether that license is approved or not.
Javier Perez [18:55] OSI has such a well earned reputation here that it's always important, if you want to define your own open source license. By the way, anyone can define their own open source license. I always make the joke, "Look in there, and then you'll find some very colorful licenses.”
Stefano Maffulli [19:15] Oh yeah, there is variety. Right, sure. There is variety, but the ones that have been approved by us, and are the only ones called open source approved licenses, those ones all respect the ten points of the Open Source Definition. And you can group them... Before you were talking about the GPL, you can basically group them into two large chunks. There are copyleft licenses and non-copyleft licenses. The copyleft licenses are the ones that you were describing before, where the permissions that all of the open source licenses give, all of those permissions are persistent and need to stay with the software as it evolves. You pass it down as... And it's a very clever hack that was established 40 years ago. At this point, GNU, the originator, is going to celebrate 40 years in a month. And that project started this concept. It wanted to twist the concept of copyright from being an exclusive... The author of the software, to being a patrimony of humanity, if you want. So using the same mechanism of copyright, the license cleverly says, "I am the author. I give you all the rights that I have as an author. I give them to you on the condition that you keep doing the same, forever, in the future." And that stickiness, that persistence gave the Linux kernel such popularity, because not only did a community of developers band together, sharing the same principles, and agree to have that innovation continuing under the same conditions, but they also legally signed a binding agreement basically with that license, with the GPL license, to keep on doing the same thing forever, which is something that doesn't happen with licenses that the industry keeps on calling permissive, which is a misnomer. They don't have this persistency. So in other projects, like what happened in...
Since we're talking a little bit about the history, at one point when commercialization of open source software became more prominent and more popular, a lot of groups from the Apache Software Foundation and other groups, they started thinking, "We don't need to be persistent anyway. We don't have to have this persistency mechanism embedded into our legal agreements. We want to leave the possibility for corporations to appropriate the software and not distribute back necessarily." We know that they will, at least for the pieces that matter, and so it's going to be enough if we have licenses that are more lax in their prescriptions, like the Apache license, the MIT, BSD, and these started to become a little bit more normal.
Javier Perez [22:23] MIT as well. Many…
Stefano Maffulli [22:25] Exactly.
Javier Perez [22:26] And this is great, and this is a good segue, because when I talk about the open source licenses you just mentioned, the copyleft ones, and then the more permissive, you can do whatever you want, and including create proprietary source code from that. And that also has been a key for the success of open source. There are millions of libraries, packages out there that are open source. And by the way, that reminds me also, and it's one of the initiatives with OSI, is to review those packages and make sure that they have a proper license and license file. I always like to tell people, if you find it in npm or in GitHub and it doesn't have a license file, it's not open source. It's not officially open source. That means that copyrights are still by the author of that source code. But things have evolved. And one thing that you can talk a lot about, Stef, is in this day and age, it is more about the corporations, not just technology companies, but we also see financial services, banks, and telecommunications and other industries very involved in open source, because they see the benefits of the innovation, and keep innovating and having more and latest technologies, it's all open source. But we have all these large enterprises being part of this, and they also build businesses behind that. So it's in their best interest to make sure that these open source projects, some of them very, very large, with contributions from hundreds or even thousands of people, that they have a roadmap, that they're going in a direction that also doesn't affect their own businesses. And where I'm going here is, we just recently had the case a couple of weeks ago of one more company, not the first one, but one more company, HashiCorp, that they basically decided to change their licenses to a non-open source license. I don't know if they even got involved with OSI stuff, but definitely that... What they call the BSL license, it's not an open source license.
Stefano Maffulli [24:36] The BUSL, I need to correct. I know everyone calls it the Business Source License, the BSL. The BSL on the open source license list is the Boost Software License, which is an approved license. But the BUSL is the Business Source License, according to SPDX, the standard for every…
Javier Perez [25:00] Oh, very important correction. Yes.
Stefano Maffulli [25:03] But HashiCorp is the latest in a series of corporations who... I think they all come from the same root issue. And the root issue, in my mind, I haven't... This is my opinion, is these license changes come from the same... Ten years ago, a bunch of new corporations were established, and my impression is that they were all choosing the same licenses, very open source licenses, with a set of permissions that did not require that reciprocity from others. And as with many startups, the venture capitalists' suggestion is to go for growth and think about business models later. Now, ten years after, many of these startups have grown, they have used... The popularity of their software comes from having a large chunk of its components being available for free, with permissive licenses and with licenses that are open source and that permit proprietarization. And now the time comes, either close to an IPO or right after an IPO, when they need to remunerate the investors. And that challenges their approach, like, "Wait a second, we're giving away our crown jewels and permitting our competitors to appropriate that software, without even contributing back. Wait a second, this is a problem." This conversation about what happens to businesses when... You remember before, I was saying at one point, Apache Software Foundation and other foundations, the Linux Foundation, the OpenStack Foundation, they were formed around software that mandated the use of the Apache Software License, because their members, the corporations contributing the code, they didn't want to prevent proprietarization of the code. And these groups, these organizations, nonprofits, they had governance that would give corporations incentives to keep contributing back, because of the governance systems. That works in foundations. That model works in foundations.
When one single corporation, from MariaDB, Elastic, or HashiCorp, CockroachDB, any of these corporations, one single company putting a lot of resources and developers' time, giving away their software, without having a form of mandated reciprocity like the copyleft principles, and without even thinking about how I'm going to make money down the road, once my project, my product, become popular, then this conversation just was waiting to happen, in my mind. It was just waiting to happen, and now it exploded.
Javier Perez [28:10] Yeah. And part of the criticism, just to mention all the points of view, as you said, Stef, from the business perspective, is like, "Well, I need to make more money on this, and I have to protect my business as well. I'm investing so much on this, and then a cloud provider is using it and actually making a lot of money with all the work that I just did." So that creates changes on their licenses.
Stefano Maffulli [28:36] Absolutely. I mean, if there is one thing that I... So for a lot of these conversations, the bad choices of licenses, I blame the corporations themselves and their VCs, because they haven't given... I don't think they have thought this through. But at the same time, I have to admit that the open source communities did not really have very good answers for the issue that you just mentioned, that loophole that allows, for example, Google to use the Linux kernel. And in theory... I mean, it's the wrong example because Google is a big contributor to the Linux kernel, but I could name other organizations that make tremendous modifications to other packages that are copylefted, or even not copylefted, and they don't contribute back, like the issue that Elastic has mentioned in the past many times. Or Mongo also. Amazon could take our code and run it and don't give back. I think that we don't have a good answer as open source community to say, "You could use this other license that works exactly for the use case." The copyleft has not evolved, in other words. The concept of copyleft is not evolved enough to cover squarely the software as a service space or the cloud space in general.
Javier Perez [30:03] Look, part of the criticism that these companies get is because they change the open source license to a non-open source license, and then the open source community criticizes that. And the consequence is that you stop getting users. And you might say, "Well, it's not the paying users, it's just the free users that I'm losing." But that also affects reputation, affects the size of the community or how active the community is. So there are other repercussions that I'm sure these companies looked into before they made those decisions, but we have the power of many here, and if people decide just to fork and go in a different direction, that could be the end of some of those projects. And it has happened, but at the same time, it's technology, and very few very successful open source projects have lasted that long. Obviously the Linux kernel, Linux is one of them. I can think of Apache HTTP, another one that has been around for a long time. But for the most part, there's always the evolution and there's the next big thing. So it's interesting. And then obviously the other criticism that is worth mentioning is, the community says, "Okay, you want to protect your investment, but you've been using all this other open source software all this time. So where are the other guys protecting their investment?" And what we talk about now, we talk a lot about security and the dependencies, and we're not talking about one or two open source projects, we're talking about hundreds, thousands of dependencies on open source. So some will say, "Yeah, you want to protect your business, but at the same time, you're using all this other free stuff." So I think they're valid points. It's a situation that is just happening now. I guess my advice to consumers, to users, is do your homework.
I mean, everything can change at any time, but make sure that you go with projects that have a diverse community, that have multiple sponsors, that are probably part of a foundation, an open source foundation, and that follow the OSI open source license recommendations and the definition of open source. There's also a lot to talk about... When we talk about new technologies, things change. And I know you've been one of the leaders here in having the conversation around AI. And obviously with the popularity of all these generative AI tools out there, ChatGPT and others, there's confusion, or people misusing the concept of open source. It's great that they make their models available, but there's a little bit more to that. So can you comment on that, Stef? I can't think of a better person to provide guidance on this.
Stefano Maffulli [32:54] Definitely I can provide a little bit of... I can share the way we were thinking, and how we're approaching finding a solution for... Looking for what is an open source AI. We started two years ago when I joined the Open Source Initiative immediately, as our first big take-on was trying to understand the space. Because as you were saying, generative AI starts to include and introduce a lot of new challenges, like what copyright is the code that is spit out by Copilot, for example? As you're typing, you get a new something that comes out from that code. Whose copyright is that? Whose responsibility is it to make sure that, if there are bugs or issues with that code... It's like the problem with self-driving cars. Who's responsible if they cause accidents? It's an open question for the legal community, and we wanted to understand that. But also we want to understand better what's happening with the data themselves, after the training for machine learning systems has happened. What happens to the models? What kind of legal frameworks apply to those models in different legislations like in Europe, United States, or China? We need to understand the space a little bit better. So we started last year, we're continuing this year. This year, we are looking into finding the shared values for what an open source machine learning AI system looks like. So we started with a meeting in San Francisco in June. We're continuing this month. We're going to start a series of webinars from a bunch of people who have submitted their ideas about how to look into this issue of “open” in this space. And at All Things Open on October 17th, we're going to have a meeting of the community to discuss a very first draft of the Open Source Definition for AI machine learning.
Javier Perez [35:02] Excellent. Excellent.
Stefano Maffulli [35:03] Yeah. It's going to be the start of a wider conversation we're going to have in 2024, but definitely it's a process that is ongoing.
Javier Perez [35:10] Definitely OSI has been, or it continues to be an important voice on all things around licensing.
Stefano Maffulli [35:17] Yeah, it's not just us. This is a global conversation, multi-stakeholder, we have invited to join a drafting group made of corporations or organizations from the open space, like Mozilla, Creative Commons. We have invited venture capitals, we have invited businesses of different size and startups, we have invited researchers from academia, and also groups from the civil liberties type of place, with like the EFF and ACLU. So to make sure that we cover as many bases as possible, have all the possible stakeholders involved into identifying the principles of open, and also starting to highlight where the challenges are for society as a whole. Because down the line, the Open Source Definition is used by not only corporations to influence and inform their business decisions and their business models, but it's also used by governments and organizations like the UNICEF and United Nations to decide, for example, on their development principles, how to invest and how to stimulate local economies. They use the Open Source Definition. They are looking, these groups, they're looking for similar guidance to drive investments in AI and drive policies in AI. And we want to provide the base.
Javier Perez [36:41] Very interesting. Very, very interesting. I wrote an article about it. I think you're going to agree with me, Stef. For me, the whole point here for me is, we're figuring out a few things here, and it's the licensing, but it's also the ethics on AI, and the collaboration. One thing that it's a fact is, we're evolving our tooling. And when you have better tools, you do a better job, or maybe a faster job. I'm using ChatGPT constantly nowadays, and it could be just to help me with an email, to do a little bit of quick research on something. People are getting better tools, and that's a good thing. And by the way, talking about open source, that means also that it's going to be... It's helpful for many people to now write code, get some help on writing the code, writing better code as well. And let's stop ChatGPT or the other generative AI tools for a second. What will developers... When I was a developer, what were we using? Well, we were kind of referencing to other code, referencing to other open source code out there, some recommendations from Stack Overflow. So there's not a lot different there, right? You always look for help. Now it's just a lot easier with better tools. But we need to figure out, exactly... Security, obviously, the other important point. Licensing, ethics, and security. There are articles, there are publications where people talk, "Well, now it's also a lot easier to create some malicious code." Well, it is, just like it's easier to prevent that. And there are tools there now for teachers to see if the essay that they assigned to their students, if it was AI generated or not. So there's always the technology.
Stefano Maffulli [38:37] Lots of tooling, lots of tooling, lots of changes in the tooling front. I find that conversation... I mean, the tooling part is changing so much and so quickly that it's interesting. And I use a lot of these tools myself. So many people forget that both you and I are non-native English speakers, and having a system that helps you even drafting something, put bullet points, is really helpful in many circumstances.
Javier Perez [39:07] Absolutely. Hey, just a quick one. Llama is not open source, right?
Stefano Maffulli [39:14] No. No, but wait a second. Wait a second. Let me complete that sentence. Llama is not open source, and neither are most of the models that are distributed out there. They always come with some limitation or not. The reason why I wanted to have a conversation, a deep conversation with the whole community about open source AI is because there are so many moving parts. There are so many pieces that make an AI system, and we don't know which ones... There is no shared agreement. It's not "we." There is no shared agreement about which of these components and pieces are important and relevant, in order to exercise the basic rights that the Open Source Definition grants. Like if I want to study, modify, run, and copy, redistribute, monetize an AI system, what do I need as a very, very basic set of components? What's the minimum set that I need to exercise those rights? There is no shared agreement. And therefore Llama is not open source, because definitely there are pieces in the license, but other ones... We don't know, if at a glance you cannot tell.
Javier Perez [40:26] Yeah. I mean, it's cool that it's free. You can use some of the models, but if you don't even know where that's coming from, what data was used to train…
Stefano Maffulli [40:36] That’s one of the questions. Provenance and transparency for example. Yeah.
Javier Perez [40:39] Then yeah, it's free, but it's not really fully open source, and you are limited by what it's giving you from the API only, or you're limited by the specific training that you don't know if they have bias or not. So those are the type of situations that I think we're definitely working towards that.
Stefano Maffulli [40:57] Yeah. In one of the webinars that we're going to have next month, they're going to start next month, is research from a group in the Netherlands. And they have evaluated all of the models that are freely available against a dozen or more indicators for their openness. And none of them comes out with a full score. Just to give an example. And they were looking at things like, is there a scientific paper describing the whole process, everything, how they built it? Is the data sufficiently described? And then the code itself, the training code and inference code and models themselves. So it's a quite interesting talk.
Javier Perez [41:38] Always great to have a conversation with you, Stef. We can talk for hours about open source. And as a closing remark, I think the future, it's great. There's more open source software out there. There are more tools. There's evolution and everything, and we just discussed just a few things there around licensing and around what's happening in AI. It continues to grow. I have a slide that I use on some of my presentations where I just keep the track of how many packages on npm and on NuGet and on Maven. It keeps growing. I keep adding it. Every time I go and update that slide, there are more and more and more. So, great future around open source software. Good to see that also in universities, the university students are now becoming part of some of the communities, contributing, open sourcing their code. So great to see this fascinating open source space moving along. And with their challenges, with their criticism and sometimes good news, sometimes bad news. We were going to say a few more things, and we didn't even touch about the topic on security, which is obviously another long conversation, but I think we're doing really well. I mean, I think... I keep telling everyone, most vulnerabilities, they disclose when people already provided the fix. And really, the challenge is just to keep up with the updates, not that there are not fixes for vulnerabilities. And there are so many more vulnerabilities now. Well, that's a good thing that people are disclosing, as opposed to not letting anyone know about the vulnerabilities. So, many other topics of conversation, but really appreciate that you join us, Stef, and once again, thanks. And everyone, thanks for listening. Thanks for being here on the Pulling the Strings podcast, powered by Puppet. Thank you.
Stefano Maffulli [43:30] Thank you.