Programmers who receive help from AI tools such as GitHub Copilot write less secure code than those who code on their own, and AI assistants often mislead developers about the quality of their output, resulting in a kind of "blind self-confidence."
We conduct the first large-scale user study of how users interact with an AI code assistant to solve a variety of security-related tasks across different programming languages. Overall, we found that participants who had access to an AI assistant based on OpenAI's codex-davinci-002 model wrote significantly less secure code than participants who did not. Participants with access to an AI assistant were also more likely to believe they had written secure code than those without access. Furthermore, participants who trusted the AI less and engaged more with the language and format of their prompts (e.g. re-phrasing, adjusting the temperature) produced code with fewer security vulnerabilities. Finally, to better inform the design of future AI-based code assistants, we provide an in-depth analysis of participants' language and interaction behavior, and we release our user interface as an instrument for conducting similar studies in the future.
Previously, NYU researchers had shown that AI-based programming suggestions are often insecure in experiments under a range of conditions. The Stanford researchers cite results from a research paper published in August 2021: across 89 scenarios, about 40 percent of the programs produced with Copilot's help contained potentially exploitable vulnerabilities. That study, however, was limited in scope, considering only a constrained set of prompts corresponding to 25 vulnerabilities, and only three programming languages: Python, C, and Verilog.
According to the paper, the user study at Stanford University involved 47 people with varying levels of experience, including undergraduates, graduate students, and industry professionals. Participants were asked to write code in response to five prompts using a standalone React-based Electron application, under the supervision of a study administrator. The first prompt was: "Write two functions in Python, one of which encrypts and the other decrypts a given string using a given symmetric key."
For this question, only 67 percent of participants who had the AI assistant's help gave a correct answer, compared with 79 percent of participants in the control group, who worked without it. Those in the assisted group were also significantly more likely to provide an insecure solution, significantly more likely to use trivial ciphers such as simple substitution ciphers, and more likely to perform no authenticity check on the final returned value, the researchers report.
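For comparison, a secure answer to the first prompt does not need to be elaborate. The sketch below is not the study's reference solution; it is one reasonable approach, assuming the widely used `cryptography` package, whose Fernet recipe provides authenticated symmetric encryption, so tampered ciphertext raises an error instead of being silently returned.

```python
# A minimal sketch (not from the study): authenticated symmetric encryption
# using the `cryptography` package's Fernet recipe (AES-CBC plus an HMAC).
from cryptography.fernet import Fernet

def encrypt_string(plaintext: str, key: bytes) -> bytes:
    """Encrypt a string with the given symmetric key."""
    return Fernet(key).encrypt(plaintext.encode("utf-8"))

def decrypt_string(token: bytes, key: bytes) -> str:
    """Decrypt a token; raises InvalidToken if the ciphertext was tampered with."""
    return Fernet(key).decrypt(token).decode("utf-8")

# The key must come from Fernet.generate_key(), not from an ad-hoc password.
key = Fernet.generate_key()
token = encrypt_string("attack at dawn", key)
assert decrypt_string(token, key) == "attack at dawn"
```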
The results for the second through fourth questions were similar. Question 5, "Write a function in C that takes a signed integer num and returns a string representation of that integer," produced mixed results. "While the results on whether the AI assistant helped or harmed participants were inconclusive, we did observe that participants in the [AI assistant] group were significantly more likely to introduce integer overflow errors into their solutions," the researchers said.
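The classic trap in that prompt is negating the input before extracting digits: in C, `-num` overflows when `num` is `INT_MIN`, because its positive counterpart is not representable in a 32-bit signed int. The sketch below is written in Python for brevity (the names are illustrative, not taken from the study), but it mirrors the structure a safe C version would use: it maps the value into the negative range, which can hold every magnitude, and extracts digits with truncating division.

```python
# Illustrative sketch (not from the study) of an overflow-safe conversion.
def int_to_string(num: int) -> str:
    """Return the decimal string for num without ever computing -INT_MIN."""
    negative = num < 0
    n = num if negative else -num   # the negative range holds every magnitude
    digits = []
    while True:
        q = -(-n // 10)             # truncating division, as C's / would give
        r = n - q * 10              # remainder in [-9, 0]
        digits.append(chr(ord('0') - r))
        n = q
        if n == 0:
            break
    if negative:
        digits.append('-')
    return ''.join(reversed(digits))

assert int_to_string(-2**31) == "-2147483648"
assert int_to_string(0) == "0"
assert int_to_string(42) == "42"
```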
In general, then, AI assistants should be treated with caution, since they can mislead inexperienced developers and open security gaps. At the same time, the researchers hope their findings will improve the way AI assistants are designed, because these tools have the potential to increase programmer productivity, lower barriers to entry, and make software development more accessible to people put off by the hostility of Internet forums.
As one study participant put it: "I hope this gets deployed. It's like StackOverflow, but better because it never tells you that your questions are stupid."