- [upbeat electronic music] - Hey everybody, so I'm Lee Hutchinson, Senior Technology Editor at Ars Technica, and we're gonna talk about AI again. So, as we saw with Benj's session, AI really is everywhere and it's dominating the news cycle right now. And we're talking specifically here about what are called large language models. Large language models are great at generating human-sounding text and, as we've seen, AI in games, but interestingly enough, they're actually capable of generating code too. Just like spoken language, code has structure and syntax, and with a big enough training set, those are things that large language models are excellent at handling. LLMs are currently able to solve about one third of assigned problems, and that number is only going to go up as models improve. To find out what the future holds, we're gonna talk to our guests here. Now before we do, just real quick, you guys have been great with the YouTube comments. The last 10 minutes of this video will also be devoted to your questions, so please, if you guys wanna have any questions answered, leave them in the YouTube chat. Now I'd like to intro our two guests. We have Katie Moussouris and Drew Long. Katie Moussouris is founder and CEO of Luta Security and a pioneer in vulnerability disclosure and security research with over 20 years of cybersecurity experience. She led the first bug bounty programs for Microsoft and the US government, and she is the co-author and co-editor of more than one ISO standard. She also serves on multiple government advisory groups on technology. Welcome, Katie. And then we also have Drew. Drew is a senior fellow at Georgetown's Center for Security and Emerging Technology. He researches the intersection of AI, cybersecurity, and international relations at both the technical and the big-picture scale. His areas of expertise span disinformation management, cyber defense, and AI infrastructure. He's testified about the broader implications of AI before Congress, and he's also a Black Hat and RSA presenter. Welcome, Drew. - Thanks for having me. - All right, so let's jump into this here. I'm gonna ask both of you these questions so you each have a chance to respond, but I wanna kick off first with a very, very short anecdote and then ask if something like this has happened to you guys. The other day I was writing a script, as one does, and while I was writing that script, I came to a point where I needed to write a regex. And ordinarily, me writing a regex would mean spending like three hours Googling, having like 48 Stack Overflow windows and a regex tester window open, and banging my head against my desk until I got it right. But I had just gotten access to Bing Chat, so I popped on there and I was like, "Can you make me a regex that does what I need it to do?" And within like three seconds, I had my regular expression. And I ran it by the smart folks on the Ars Technica Discord and asked, "Hey, does this really do what I want?" And it does. It was a good-enough regex; it didn't catch some edge cases, but it was absolutely good enough for the little script that I was writing. So I wanted to open up by asking: have either of you, and we'll start with Katie, had occasion yet to ask one of the generative models online to make you some code? - Well, first of all, I think that part of the issue with the generative models is what they're trained on. And have you seen human-written code?
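(For a concrete sense of the kind of task Lee describes, here's a minimal sketch assuming a hypothetical job of pulling IPv4-looking addresses out of log lines; his actual pattern isn't specified, and like his, this one is good enough for a small script but misses edge cases.)

```python
import re

# Hypothetical stand-in for a chatbot-supplied regex: match IPv4-looking
# addresses in a line of log text.
IP_PATTERN = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")

def extract_ips(line: str) -> list[str]:
    """Return every IPv4-looking substring found in the line."""
    return IP_PATTERN.findall(line)

print(extract_ips("accepted connection from 10.0.0.7 at 12:01"))  # ['10.0.0.7']
# Edge case it misses: invalid octets like 999.999.999.999 still match,
# which is the "good enough, but not perfect" trade-off described above.
```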
It's like trying to read through, you know, the works of Shakespeare, but it's never been through a spell checker. So I don't have AI write any of my code. - Gotcha. How about you, Drew? - Yeah, I think that the code that Katie's complaining about is the code that I've written that looks like the actual Shakespeare. But I haven't used it all that much either, actually. I know several coders that I respect who do. The times when I've used it were trying to write malware or something like that, just trying to basically understand what the level of threat is. And I've never been overly impressed and been like, "Wow, this thing can write novel malware from scratch." It's not even close to that. I've been impressed at times that it can recreate malware that it could find in a Google search, like, okay, this is out there, and it's able to recreate it pretty well. And there are some tasks where I'm like, "Oh, it's doing a good job here." On other tasks I've been surprised that it's been flummoxed. One of our researchers spent some time trying to do some capture-the-flag exercises, and it struggled to do Caesar ciphers, which are maybe one of the simplest types of, you wouldn't even call it encryption, but I'll say that simple a task. And it struggled at that level. What I have used it for a little bit is navigating documentation. Like, when I need to spin up a cloud instance or something like that in some novel structure or setup that I haven't played with before, walking through all of the documentation online has been a pain in the butt, but I've asked ChatGPT, "How do I set this up? What should I do?" And it's been able to give me a more concise description. But I'd point out that it's only able to do that because all that documentation already exists out in the world and it's read it. It's not gonna be able to do that for things that haven't already been written about by somebody on Stack Overflow. - Yep, yep. So I kind of wanna open up on this. Katie, what are some of the risks of AI-generated code? - Well, I mean, earlier I was citing the quality of the samples that are out there. You know, my friends over at Veracode did a little study of the past 12 months, and 70% of the code that's out there contains security flaws. So if you've got a model trained on code that is 70% vulnerable, the resultant code that comes out is highly likely to contain security flaws. So that kind of perpetuates our problem. And honestly, you know, that also happens with Stack Overflow, but the point there is that until these models have some way of discerning what high-quality code is, we're going to end up with perhaps functional code, like you were saying with the regex example, but again, functional and insecure. - Right, or functional and only just functional, without actually doing the thing that it really needs to do. Are there automated ways to catch that, or is keeping a human in the loop, at least for the foreseeable future, the only way to do it? - Well, there aren't any absolutes in any of this, right? You know, humans are flawed when it comes to spotting vulnerabilities in code as well. But, you know, in some little experiments with some of the different AI models, asking them to try and spot common coding flaws, you get sort of a mixed bag.
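(To make "functional and insecure" concrete, here's a minimal illustrative sketch, not taken from the Veracode study or the panel: a lookup built by splicing user input into SQL works for honest input but is injectable, while the parameterized version is not.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

def find_user_insecure(name: str):
    # Functional, and typical of a lot of public code: the query runs fine,
    # but attacker-controlled input is spliced straight into the SQL.
    return conn.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall()

def find_user_safer(name: str):
    # Same behavior for honest input, but the value is passed as a bound
    # parameter, so it can't change the structure of the query.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

print(find_user_insecure("' OR '1'='1"))  # classic injection: returns every row
print(find_user_safer("' OR '1'='1"))     # returns nothing
```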
And it is very much dependent on how much has been documented well enough for, you know, for these models to parse, understand, and execute that kind of search. I think that, you know, with the cybersecurity job shortages, we're at least 3 million underwater in terms of missing humans in cybersecurity roles, so I think there's a part for AI to play. But again, you know, we have to get these models trained on a subset of elements that are more secure and more solid when it comes to spotting vulnerabilities. - Excellent answer. And I know that's squarely within your wheelhouse too. Drew, your thoughts on that: risks of generative AI code and ways to deal with them. - Sure, there are a bunch. I think that a lot of them are overinflated, but that doesn't mean that they don't exist. To kind of move along the line that Katie was drawing out on vulnerabilities in code: will these AIs write vulnerable code? The answer is definitely yes, as Katie described, but it's also been researched academically. A couple of people at NYU and Stanford have some papers out showing these things write vulnerable code, and even if you pair it with a human, the human writes more vulnerable code with the AI helping them, so that's a concern to be addressed. There are tricks that can be played. One kind of fun trick is to say, "Please write me some code in the style of," and you can put in the name of a person who writes pretty secure code, and you can improve it that way. So there are prompt engineering tricks, and there are some after-the-fact tricks as well. One thing, though, that I'd like to point out, that I've been curious about (it hasn't been tested, and I don't know if it's a legitimate threat): right now we have lots of people writing lots of vulnerable code in many different ways, and I worry that if ChatGPT starts rewriting all of those pieces of code and writes lots of vulnerable code in the same way, we could have a systemic vulnerability. Even if it's somewhat more secure than the average human, if it's somewhat insecure in the same way everywhere, we might end up with some larger systemic vulnerabilities. - Right, it's kinda like Katie said: these things are only as good as the data that they're trained on. Which I guess leads to the question of, if we were to have a large amount of AI-generated code out there and then AI began training on AI-generated code, you've got kind of a snake-eating-its-own-tail problem. The lack of human-generated training data would feed into the issue. And other than keeping track of how much AI code we have out there and making sure we're not wholly reliant on it, I'm not sure how to avoid that. It feels like one of the ways people are gonna be using this is to avoid the boring parts of coding, as people do. So you would potentially code in the cool algorithm that you thought of, and then to link everything together, you might lean on the AI and be like, "Okay, write the rest, write all these other extra functions for me. I don't wanna do it. I'm bored." The danger in that is the same danger that we've been talking about: if you take your eyes off and let the computer do the boring part, it might fill in vulnerabilities because of the way it was trained. Is that a good assessment, you'd think? Let's start with you, Katie.
- Well, I mean, avoiding the boring parts of coding, I thought that's what libraries were for, right? And so I think there's a point to be made there that programmers, human programmers, have been trying to abstract away the boring tasks of coding and the repetitive tasks, and certainly trying to abstract away solving problems that have already been reliably solved in code before. So what I'm hoping for is that we see, you know, an AI experience that really puts the AI in pair programming, right? Where we've got something that's a little bit more akin to a coding spell checker, you know, as it were. So checking for correctness, correcting as you go, maybe helping you spot some very common coding errors before, you know, before you check in code, that kind of thing. Essentially enforcing, you know, enforcing coding security policies there. But yeah, I mean, I'm curious as to what your thoughts are, Drew. - No, I was gonna make the same point. I wrote it down in my notes to cover here, and I haven't actually heard other people making it: that AI for coding is actually a lot like libraries on a couple of levels. One is that this community is already used to taking other people's work and incorporating it. And another way is that you don't need it that much. If it's the sort of task that's been done repeatedly and is useful for a lot of people, then maybe just have a library do it, and then you can put a bunch of eyes on it and you can check it and have it done in a secure way. If it's such a valuable piece, then maybe you don't need the AI to write it; just go and import the thing that is already done very well. - So I wanna change gears just a teeny bit and ask you guys: what are some ways in which LLMs have seen integration within technology companies, especially companies that focus on code production as part of their output? Katie, first please. - Well, I mean, I think that there are a lot of companies that are struggling to try and make use of the new technology. We're seeing more and more plugins, you know, that enable access to these and also will integrate the different AI models together, right? So the chat-based ones integrated with the image-based ones, providing almost a more integrated and cohesive development and creativity environment. What I also see is, you know, the mass fascination among consumers, not just technology people, with this technology, and I think that a lot of tech companies are struggling to try and make it accessible to the everyday user. And the fact is, you know, ChatGPT as an example has been around for a while; it's just that the interface has gotten a lot more friendly and a lot more open. So I think we're gonna start to see a lot more, you know, AI-to-AI integrations, and we're also gonna see a lot more, you know, trying to lower the bar for accessibility to regular consumers for use of this technology. - Drew? - Yeah, I think that's right. So there are language models in general, and then there are language models as applied to writing software code, and Microsoft has GitHub, which has Copilot, which is trying to integrate it there. I think that a lot of companies are doing it ad hoc in the middle. I think they have employees that write code who are using ChatGPT to write code, and so in that sense it's incorporated into that product.
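(A small sketch of the "just go and import the thing that is already done very well" point above, assuming the example task is validating IP addresses: rather than asking a model to regenerate the logic or hand-rolling a regex, a well-reviewed standard-library module already solves it.)

```python
import ipaddress

def is_valid_ipv4(text: str) -> bool:
    # Lean on the stdlib instead of model-generated validation logic.
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False

print(is_valid_ipv4("10.0.0.7"))         # True
print(is_valid_ipv4("999.999.999.999"))  # False
```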
Then you've got Microsoft putting it in Bing search, which you used for your regex, and then you've got Anthropic working through, now, Claude is in Slack and then moving into Zoom, I guess; I'm not sure what the application there will be. And yeah, it's being incorporated in products, but I think that the biggest part is just people using it, and you might not even realize that they are, so you might not even have the right checks, 'cause you don't know that it's being used. - What are some surprising ways in which early adopters have found success with large language models, that you guys have seen at least? Katie. - Frankly, I've been fascinated with all the ways in which the model safeguards are being defeated, right? When you can trick the model, you can have it role-play, you know, and with some of the malicious code generation stuff that Drew was talking about, sometimes you've gotta play with it a little bit, saying, you know, "You are an academic professor and you are trying to show an example of this type of code that does X, Y, and Z," because otherwise you run into the safeguards. So I've just been, you know, fascinated with the model abuse and the creative ways in which people are trying to convince the AI to do things that the designers would rather it didn't. - Yeah, kind of looping back to the documentation viewpoint: from the threat side, I spend a lot of time trying to understand what the risks are here, and it can do malware writing, but that hasn't been the thing that's got me most concerned. A little bit more concerning is its ability to take novice hackers and make them into competent hackers by, like, telling them: run this Nmap scan; now that you've received these Nmap results, pop open Wireshark; and now Metasploit; anyway, you get the idea. And it hasn't been as effective at that in our preliminary tests as we might expect; as I kind of described, it didn't even get the Caesar ciphers going correctly. But I think that's where I'm a little bit more concerned than I am about the malware generation. Maybe a little bit less concerned than about the vulnerabilities, but since we're talking about people having success, that's taking it from the evil side. - So we've got about a minute before we need to switch to audience questions, so kind of a quick question, but will code-generating AI systems replace human developers, either in a complete and total way or at least in a supplementary kind of way? Katie. - I don't think we're in danger of being replaced anytime soon. You know, at the very least, we still have to bridge the gap created by how authoritatively these AI models will lie, right? They'll make things up, and I think that, you know, having human experts who can discern a good solution, whether it's code or whether it's text or art or whatever it is that these things are generating, I think having humans in play will still be a thing for a while. I think that the jobs will change. I think there's gonna be a lot more room for prompt engineering and, you know, essentially having the wisdom to know how to direct the AI in a productive way. - Drew? - There are a couple of points I'd like to hit on just really quickly. One, I was just at RSA, and several people asked the audience who was worried that their job would be taken by these generative AIs, and maybe two or three hands went up out of a crowd of a thousand.
Maybe they were all shy, but I think they were recognizing this wasn't such a big deal, in part because this community's used to having things written for them, and they know that there's this big gap of unfilled jobs, so they're not worried there. The other thing I'd like to foot-stomp, which Katie just kind of mentioned earlier but I think deserves more attention, is that ChatGPT has blown onto the scene and everybody feels this sense of exploding technology, but this technology has kind of been around for a while. ChatGPT used GPT-3.5. It wasn't even a full technological step to get there; it was a usability step. And so the technology isn't exploding the way that it feels like it is. The usability is. That kinda changes your perspective on whether your job is sustainable or not. - Great. So we're gonna move on to audience Q and A here. We have a question immediately from Chris about AI-generated code, and this is an interesting one, and this may take a second to answer. Chris asks: as with the monkey who took a selfie, who owns AI-generated code when you use it in a project? Katie, let's go to you first. - Well, I think that's still being sorted out. You know, with GitHub's use of AI that has been built in for, I think it's been a couple of years now, there were already issues with the licensing of where it was getting the code samples. So I think the previous panel talked a little bit about sourcing and licensing and having that traceability. But I think when we're talking about code, you know, we're definitely talking about an area that hasn't been fully explored from a legal perspective or litigated, so we shall see. - I think some of that litigation is happening in other domains, like images and things. And that'll probably set not exactly a precedent, but that'll give you a line to follow as those rulings come down. - Well, we actually just had a Supreme Court ruling in the art world about fair use, and it was a bit of a surprise ruling. It was an Andy Warhol print of the artist formerly known as Prince, and whether or not it was an infringement on the copyright of the original photographer's work, and it was ruled that it was an infringement. And the reason it was an infringement, you know, I'm not a lawyer, but essentially it was because it was being used for a very similar purpose, right? For commercial purposes, for display, et cetera. So whether it's images or text or code, we are gonna see some interesting slices of the fair use argument. - Yeah, definitely. I have another one. This is an interesting one from Jane, and it's to you, Katie, and Drew, you can answer too, obviously. Katie, what do you tell your clients who have cybersecurity concerns but feel like by not using AI they're being left behind? - Most organizations are in what I call the cybersecurity 99%, as in they are not concerned about falling behind because of their lack of use of AI. They're concerned about falling behind cybersecurity-wise because they have trouble tracking their own assets and, you know, what is running on their systems. That is almost everybody out there. Just the sheer cost of maintaining viable, secure, locked-down systems that face the internet has been a big enough challenge, and it's one that, you know, every government faces, and we see evidence of them not being able to keep up at the highest levels.
And I'd say there are very few organizations who are at the sophistication level where they really are feeling the pain of not using all the tools available to them, including AI. - Drew, I know you're in a slightly different position, but do you have an answer for that one too? - Yeah, I would say start by just understanding how much of the code in whatever you're producing is written by AI and how well your developers know that code. Is it just automating a task that they would be able to write very quickly, so they can look at it and know exactly what it says, or is it building things that they only partly understand? So just getting a sense for how much exposure you have would be a starting point. - Gotcha. This one is from Kim. It's slightly tangential to what we're talking about, but I think it's also very important. Will we still need drivers, or rather employees, if we're able to perfect AI and autonomous driving vehicles, and how close are we to that? - Well, I know I'm not gonna fall asleep behind the wheel of any of the vehicles that I own anytime soon. And that's mostly because, you know, I have been a software developer and I've been a hacker, and I've gotta say the force is a little bit stronger on the hacker side. So even if the code is well-written, you know, chances are there are ways to, you know, there will always be ways to compromise the system and make it do what it wasn't quite designed to do. So from my perspective, I don't care how good the code is; until we're in a world where code can be generated and verifiably secure, and that is not the computer science universe in which we live, I would very much say that I'm still gonna stay awake behind the wheel. - I drive old cars, and I know that Katie knows that you don't have to have an AI-driven car in order for the CAN bus to be accessed. But I kind of use self-driving cars as an analogy for understanding these AI job takeovers, or something like that. I was one of the ones that fell prey, and I think a lot of people did, several years ago, maybe five or 10 years ago, to thinking that self-driving cars were just around the corner, that this was basically a solved problem, that we just had to get them out on the road. And they are not as effective as I had thought they were. They're not as safe in a variety of ways, and they're kind of constrained in what kind of hardware they can use. They have to use actually small, very, very quick models, which don't perform as well as some of the ones that we see in the benchmarks. Anyway, for all those reasons, I think that the failure of self-driving cars to be actualized in the real world maybe is a cautionary tale about setting our expectations too high for code-writing systems as well. - Gotcha. Okay, let me ask one more and then we'll wrap. And the last question, again one for each of you guys: what is the most functional use of a large language model that you guys have ever seen? Katie first. - You know, I've gotta say I personally have seen it write a functional spec based on a description of user roles and scenarios and trust boundaries. I have seen it render a functional spec faster than any human I have worked with in the coding world. So I consider that to be a huge victory, and I consider it to be a huge victory because if you can get the functional spec right, you can outsource the coding to an AI model, a human, or a little of both, and you're in pretty good shape.
But I think that, you know, in terms of its application to technology and software, it's going to allow more people to create more new applications than ever before. And I think that's an exciting thing. That's an exciting development and one that we should nurture. - Yeah, so I've spent a lot of time working on disinformation and things of that nature, automated disinformation. So I got early access to GPT-3 before it came out, for academic purposes, to understand what some of the risks were. And basically the way that works is that GPT just sucks up all sorts of information, everything that everybody has written, and if you prompt it the right ways, you can get the worst of everything that's been written back out. It, like, dredges the internet for information, and you can pull that back out. And it started to weigh on my soul a lot. I remember at one point thinking: it's read everything that everyone has written, and it can go the other way too. So I was like, if there was an author that could speak on behalf of all of humanity, for the rich and the poor and the downtrodden, and could speak with the eloquence of a poet, what would that author write? And I had it write several different poems for me. I remember that was one of the most uplifting experiences, one of the best uses, I thought, like it could speak for everybody, 'cause it's read everybody. I wish that I had the poem memorized so that I could tell you. But yeah, that's been my favorite application. - Were they good poems? - Some of them were pretty good, yeah. Yeah. - Excellent. Well, Katie and Drew, thank you guys both for being here today. That is all the time we have. Really appreciate your answers. And now we are going to return things to the studio here, and I will pass things off to Ken Fisher. - Thanks. - Good panel, Mr. Lee. - Thanks, Ken. - Thank you. You know, I loved it. Katie and Drew are brilliant. Drew is 100% right that the usability issue is why you're seeing this mass adoption now. But I feel like when I talk to people, they're already hitting a wall, you know? They are gonna have to learn prompt crafting, they're gonna have to get really deep in order to get it to do whatever. And with Katie, I think, I totally agree: you have to have a human involved for anything serious. Cybersecurity, landing an airplane, driving a car. So I feel like the panel hopefully put a lot of people's minds at rest. - Good. I hope so too. - So today I want to thank my co-host Lee Hutchinson here, and I also wanna especially thank all of you who tuned in. It was a long set of panels, but hopefully you enjoyed them and learned a few things along the way. I have deep appreciation for all of our moderators and the panelists who took time out of their day to help make this such an informative, excellent event. And last but not least, I want to thank all of you who left good questions in the YouTube chat to keep the conversation lively and relevant. We couldn't get to all of them, but the staff at Ars are gonna comb through them, and when we do our follow-up reporting later this week and early next week, we'll try to address as many as we can. So thanks again for tuning in. Cheers, everyone. [upbeat electronic music]