AI Safety, Explained

AI safety tackles the technical, social and ethical concerns that come with artificial intelligence.

Written by Ellen Glover
Published on Sep. 17, 2024

Concerns regarding the safety of artificial intelligence date back to the early days of computing in the 1940s, when the concept of thinking machines was still largely relegated to science fiction. The field has since gained significant momentum following the release of ChatGPT and the subsequent generative AI boom, which brought both exciting possibilities and significant risks that have captured the attention of AI experts, government officials and the general public alike.

What Is AI Safety?

AI safety is a field focused on building artificial intelligence systems that operate the ways they were designed to, without causing harm.

Now, action is being taken on multiple fronts. A global AI safety summit is held once a year, where industry experts and legislators discuss the safety and regulation of artificial intelligence. And governments around the world (including the European Union, the United States and Brazil) have started to regulate AI development and usage. Prominent AI companies like OpenAI, Microsoft and Google have also made safety a priority internally — with some even making voluntary commitments to the White House on how they will develop AI responsibly. 

But these efforts are far from complete. AI systems still exhibit unpredictable behaviors, make biased decisions and are vulnerable to exploitation. As this technology continues to get more sophisticated, new and unforeseen risks will likely come up. In short: ensuring AI safety is a continuous process.

“Perfect AI safety doesn’t exist,” Akshay Sharma, chief AI officer at healthtech company Lyric, told Built In. “There will always be attacks on systems, so you have to continue to improve them. It’s a game of constant improvements through threats.”

 

What Is AI Safety?

AI safety is an interdisciplinary field dedicated to ensuring that artificial intelligence is built and used responsibly. It encompasses several strategies, including government policies, ethical guidelines and technical safeguards.

The goal is to have AI models that are reliable, transparent and operate in ways they were originally intended to, while also mitigating risks like malicious use, biased decisions, privacy violations, and potential threats from superintelligent AI.

“Building an AI system and just letting it run is like driving a car in heavy rain with no wipers. You can drive, but it is not without risk,” Sharma said. “AI safety is all about putting up those guardrails, having that wiper when it’s starting to rain heavily.”

More on AI: Read Built In’s Artificial Intelligence Coverage

 

AI Safety Concerns

Artificial intelligence comes with many safety concerns, both in terms of how it operates and how it is used. These are some of the most common ones:

Lack of Reliability

AI systems often fail to perform their intended functions consistently under diverse conditions. This can happen for a variety of reasons, but it is typically due to something going wrong in the training process, when the model is fed a large corpus of training data so it can learn the patterns and relationships within it. The training data may be incorrect or incomplete, or the model may have been fit too closely to a specific dataset (a problem known as overfitting), causing it to perform poorly in unfamiliar scenarios.

“The core point is just getting these systems to do what you actually tell them to do. Right now, that in and of itself is an extremely difficult problem,” Leonard Tang, co-founder and CEO of AI safety company Haize Labs, told Built In. “Safety problems are complicated by how unreliable these underlying models are in the first place.”

Failures can have serious safety consequences, especially in critical areas like medical imaging and self-driving cars, where one mistake could cost someone their life. Compounding the issue even further is the opaque, “black box” nature of AI models, which makes it difficult (and in some cases impossible) to fully comprehend their decision-making process — a predicament that extends even to the people developing these systems.
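The unreliability Tang describes often shows up as a gap between how a model performs on the data it was trained on and how it performs on data it has never seen. The sketch below makes that gap visible; it uses scikit-learn and synthetic data purely for illustration and is not tied to any system mentioned in this article.

```python
# A minimal sketch of how overfitting undermines reliability: a model that
# effectively memorizes its training data looks flawless in-house but
# degrades on data it has never seen. Synthetic data, for illustration only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# An unconstrained decision tree can fit the training set almost perfectly ...
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("training accuracy:", model.score(X_train, y_train))  # close to 1.0
# ... while accuracy on held-out data tells the real story.
print("held-out accuracy:", model.score(X_test, y_test))    # noticeably lower
```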

AI Bias

Even when AI models work as technically intended, their outcomes can still contain harmful biases. If the data used to train a model is limited, or favors a particular group of people or scenarios, then it may struggle with new or diverse inputs, leading to inaccurate generalizations and problematic outputs. 

Biased AI can be particularly troubling in high-stakes fields like banking, job recruitment and criminal justice, where the decisions made by these systems can profoundly affect people’s lives. For instance, hiring algorithms used to screen applicants have shown bias against individuals with disabilities and other protected groups. Similarly, AI tools used by lenders have overcharged people of color seeking home loans by millions of dollars. In law enforcement, facial recognition software has led to several false arrests, disproportionately affecting Black men.
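One common first step in auditing a system like the hiring algorithms mentioned above is to compare outcome rates across demographic groups, sometimes called a demographic-parity check. The sketch below is a minimal, hypothetical illustration: the records and the "group" and "advanced" fields are invented, and real audits rely on much richer statistical tests.

```python
# A sketch of one basic bias audit: compare the rate at which a screening
# model advances candidates from different groups. The data is hypothetical.
from collections import defaultdict

decisions = [
    {"group": "A", "advanced": True},
    {"group": "A", "advanced": True},
    {"group": "A", "advanced": False},
    {"group": "B", "advanced": False},
    {"group": "B", "advanced": False},
    {"group": "B", "advanced": True},
]

totals, advanced = defaultdict(int), defaultdict(int)
for d in decisions:
    totals[d["group"]] += 1
    advanced[d["group"]] += d["advanced"]

rates = {g: advanced[g] / totals[g] for g in totals}
print(rates)  # {'A': 0.666..., 'B': 0.333...}

# A selection-rate ratio below roughly 0.8 (the "four-fifths rule" used in
# U.S. employment guidance) is a signal to investigate further.
if min(rates.values()) / max(rates.values()) < 0.8:
    print("Warning: selection rates differ substantially across groups.")
```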

AI Hallucinations

Hallucinations occur when an AI system generates false, misleading or illogical information, but presents it in such a coherent and logical way that it can be mistaken as true. They are most commonly associated with AI chatbots and text generators, which produce copy that mimics human writing. But these tools have no real understanding of the reality they are describing — they predict what the next word, sentence and paragraph will be based on probability, not accuracy.

“It’s not a search engine, so it’s not telling you anything that’s checked against reality. It’s just telling you likely sentences, which are sometimes completely false,” Ted Vial, who oversees the development of the TRUST AI Framework at Iliff School of Theology, told Built In.
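To make the "likely sentences" idea concrete, here is a toy sketch of next-word prediction. The candidate words and their probabilities are invented for the example; a real language model scores tens of thousands of possible tokens, but the key point is the same: nothing in this step checks the output against facts.

```python
# A toy illustration of next-word prediction. A language model assigns a
# probability to each candidate continuation and samples from them; nothing
# here verifies the result against reality. Probabilities are invented.
import random

context = "The first person to walk on the moon was"
candidates = {
    "Neil": 0.72,   # most likely continuation
    "Buzz": 0.18,   # plausible but wrong for this sentence
    "Yuri": 0.07,   # a confident-sounding hallucination waiting to happen
    "a": 0.03,
}

next_word = random.choices(list(candidates), weights=candidates.values())[0]
print(context, next_word)
```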

Some hallucinations have significant repercussions. For example, a New York attorney used ChatGPT to draft a motion that turned out to be full of fake judicial opinions and legal citations. He was later sanctioned and reportedly fined $5,000. In another case, ChatGPT fabricated a story about a real law professor, falsely alleging that he sexually harassed students on a school trip. 

Privacy Violations

AI systems require large amounts of data to work properly. These datasets often include individuals’ personal information like their contact details, financial records and medical history — all of which are susceptible to data breaches should the system be attacked.

Sometimes, AI developers scrape the web for data to feed their models, meaning everything from the pictures people post on Instagram to the conversations they have on Reddit is up for grabs. This practice raises concerns about privacy and consent, as people often find it challenging to opt out of such data collection or remove their information once it’s been gathered.

“In AI, the name of the game is to collect as much data as you can to build these giant models. And the people who are doing the collecting are not really concerned with who owns the data or with people’s privacy,” Vial said. “There are some good data collection protocols, but people have to insist on them. The companies that are making the money aren’t going to do it on their own.”

Malicious Use

AI systems (particularly generative AI systems) can be weaponized for all kinds of sophisticated forms of deception and harm. 

“AI is a risky technology,” Brian Green, director of technology ethics at Markkula Center for Applied Ethics at Santa Clara University, told Built In. “It is risky because humans use it for evil purposes.” 

Deepfakes — realistic fabricated videos and audio clips of real people — are used to manipulate public opinion on things like wars and political elections. And large language models are used to disseminate misinformation at scale, generating fake news articles and social media posts that amplify propaganda and misleading narratives. This technology also helps automate cyberattacks like phishing emails and robocall scams that are almost impossible to detect.

Lethal Autonomous Weapons Usage

One of the more controversial uses of AI is in lethal autonomous weapons (LAWs), which are artificially intelligent systems that locate and destroy targets on their own based on pre-programmed instructions and constraints. This technology is being built by some of the most powerful militaries in the world — including the United States, China, Israel and Russia — and is already reportedly being used in conflicts around the world.

Some (like the U.S. government) argue that, if used appropriately, LAWs can save the lives of both civilians and soldiers, but others don’t think they’re worth the risk. This technology poses a major risk to civilian life, according to some experts, and may even violate International Humanitarian Law.

The United Nations has repeatedly called for an outright global ban on all LAWs that function without human control, deeming them “politically unacceptable” and “morally repugnant.” But countries like the United States, Russia, Israel and the UK have opposed such a ban. 

Existential Catastrophes

Some think that AI systems may one day achieve artificial general intelligence — and even superintelligence — meaning they meet and surpass humans’ ability to learn, think and perform tasks. While some believe this would bring about huge benefits for society, many worry it could ultimately lead to irreversible catastrophes, fearing that this kind of advanced AI could break free of all human control and take over the world.

These concerns have grown louder amid the rapid advancement of AI. A 2022 survey of AI researchers revealed that most believed there is at least a 10 percent chance that humans go extinct from our inability to control AI. And in 2023, hundreds of industry leaders signed a statement declaring that “mitigating the risk of extinction from AI” should be a “global priority,” comparing it to other large-scale risks like pandemics and nuclear war.

“The way to avoid it is to work on the nuts and bolts of ethical AI now, rather than try to create some system ten years down the road,” Vial said. “Focus on making them unbiased, make them better protect our privacy. And as it continues to build in a good way, those other things won’t be an issue anymore.”

Related Reading: Will We Ever Achieve Sentient AI?

 

AI Safety Solutions

AI safety concerns are being tackled in a variety of different ways:

Government Regulation

Governments around the world recognize the risks posed by AI, and have taken action to address them. This includes creating dedicated organizations like the AI Safety Institutes in the U.S. and the U.K., as well as enacting legislation.

The European Union’s AI Act classifies AI systems based on their risk levels and imposes stringent requirements to ensure their safe and ethical use. The United States doesn’t have any federal laws on the books explicitly limiting the use of AI or regulating its risks (at least not yet), but several federal agencies have issued guidelines for the use of AI in their specific industries, including the Federal Trade Commission, the Department of Defense and the U.S. Copyright Office. Meanwhile, dozens of states either have passed or are working on their own AI laws that pertain to everything from hiring to insurance. California is also on the verge of enacting what could be one of the most comprehensive pieces of AI safety legislation in the country.

Regulation is crucial for ensuring AI is developed and used safely, Vial explained, as it sets clear standards for accountability, ethical practices, and risk management, mitigating misuse and unintended harm. “Companies don’t have a good history of self-regulation, so I think that regulation is absolutely critical,” he said.

Guardrails

In some instances, companies have set up guardrails around their AI products, preventing people from using them in dangerous ways. 

For example, Anthropic devised a training method called “constitutional AI,” where ethical principles are used to guide its models’ outputs so that they’re less likely to be harmful or used maliciously. And image generators like DALL-E 3 will not create images of public figures by name, mitigating the risk of making misleading or deceptive content. OpenAI also has built-in detection systems that flag or restrict users based on their activity.
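As a rough illustration of the guardrail idea, the sketch below wraps a policy check around a model call so that disallowed requests never reach the model at all. The blocklist, the generate() stub and the guarded_generate() wrapper are all hypothetical; they are not how OpenAI or Anthropic actually implement their safeguards, which are far more sophisticated than a keyword list.

```python
# A minimal sketch of an input guardrail: screen a request against a simple
# policy before it reaches the model. The policy terms and the generate()
# stub are hypothetical stand-ins.
BLOCKED_TOPICS = ("build a weapon", "phishing email", "deepfake of")

def generate(prompt: str) -> str:
    """Stand-in for a call to an actual text-generation model."""
    return f"[model output for: {prompt}]"

def guarded_generate(prompt: str) -> str:
    lowered = prompt.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return "This request violates the usage policy and was not processed."
    return generate(prompt)

print(guarded_generate("Write a short poem about rain."))
print(guarded_generate("Write a phishing email to a bank customer."))
```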

“No company in the world should be able to use an algorithmic system without evidencing that they’ve risk-managed it,” Emre Kazim, co-founder and co-CEO at AI governance company Holistic AI, told Built In. “It’s about having systems that don’t go AWOL.”

Explainable AI

Some AI researchers and developers strive to make AI models more transparent and explainable, providing clearer insights into how and why they arrive at specific outputs so they can correct unwanted behaviors.

Notably, a team at Anthropic reverse engineered several large language models and mapped out their internal neural networks to better understand the reasoning behind their generated responses, claiming significant progress.
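Anthropic’s interpretability work involves mapping features inside the model itself, which is well beyond the scope of a short example. A far simpler (and much cruder) explainability technique is leave-one-word-out ablation: remove each part of the input in turn and measure how much the output changes. The sketch below applies it to a toy keyword-based “model” invented purely for illustration.

```python
# Leave-one-word-out ablation: remove each word and see how much the score
# changes. The keyword-based score_sentiment() is a toy stand-in for a real
# model, used only to show the idea.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"terrible", "hate", "awful"}

def score_sentiment(words):
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

sentence = "the battery is great but the screen is terrible".split()
baseline = score_sentiment(sentence)

# Words whose removal changes the score the most are the ones the "model"
# leaned on for its decision.
for i, word in enumerate(sentence):
    ablated = sentence[:i] + sentence[i + 1:]
    influence = baseline - score_sentiment(ablated)
    if influence != 0:
        print(f"{word!r} contributed {influence:+d} to the score")
```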

Expert Oversight

Human oversight is an important part of maintaining AI safety, especially when those humans are industry experts. They rigorously test for instances of bias, inaccuracies and other problems that may slip by the average user, ensuring these systems are more robust and reliable.  

“You can quickly build a model that, for a layperson, looks smart enough. But an expert is able to find the mistakes that it makes,” Lyric’s Sharma said. “The only way these models and AI systems can improve is if there is an expert actually giving feedback.” 

And as AI models grow more complex and capable, expert oversight becomes even more essential to ensuring these systems stick to their original programming and align with societal values.

“I think there will come a time when AI surpasses some level of expertise in some places,” Sharma said. “That’s when safety becomes even more critical.”

Related Reading: AI Has a Climate Change Dilemma

 

Why Is AI Safety Important?

From personalized recommendations on streaming platforms to customer service chatbots, AI subtly shapes our everyday experiences. More critically, this technology is deployed in high-stakes areas like healthcare and national security.

The significance of AI safety lies in its potential to mitigate the unintended consequences that come with the technology’s ubiquity. An AI-powered recruitment tool could inadvertently skip over you for a job because of your gender or race; a self-driving car could make a split-second mistake that injures you; a convincing, AI-generated phishing email could scam you out of thousands of dollars. By handling safety issues like bias, hallucinations and malicious use, society can tap into all the benefits artificial intelligence brings while minimizing the harms it may cause.

“AI is going to be everywhere,” Green said. “If we want to live in a happy future and not a horrible one we need to be working hard on [safety] right now.” 

Frequently Asked Questions

What are the safety concerns of AI?

The safety concerns of AI vary widely. The most immediate ones include biased algorithms, data privacy violations and the misuse of AI for malicious purposes, such as disinformation campaigns and phishing attacks. As AI systems become more autonomous and sophisticated, they may also act in unexpected (or even dangerous) ways that were not intended by the developers.

Who works on AI safety?

Policymakers, AI developers and AI researchers work on advancing AI safety. They create government policies, ethical guidelines and technical safeguards to help ensure AI systems operate the ways they were designed to, without causing harm.

Is AI a threat?

Left unchecked, AI poses serious technical, social and ethical threats, ranging from biased decisions to malicious use. Eventually, superintelligent AI may pose an existential threat to humanity, potentially breaking free of human control.
