Scientists at the National Institute of Standards and Technology (NIST), along with their collaborators in government, academia and industry, recently published a report on cyberattack techniques and methodologies aimed at AI systems.
Their work, titled Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations (NIST.AI.100-2), is intended to help AI users and developers familiarize themselves with the types of attacks they might expect, as well as with approaches to help mitigate them. The report is clear that available defenses currently lack robust assurances that they fully mitigate the risks, and the authors encourage the community to keep developing better safeguards.
The report classifies attacks by multiple criteria, including the attacker's goals, objectives, capabilities, and knowledge, and considers four major types of attacks: evasion, poisoning, privacy, and abuse:
- Evasion attacks: occur after an AI system is deployed and attempt to alter an input to change how the system responds to it. Example(s): Adding markings to stop signs to make an autonomous vehicle misinterpret them as speed limit signs, or creating confusing lane markings to make the vehicle veer off the road (see the first sketch below).
- Poisoning attacks: occur in the training phase by introducing corrupted data. Example(s): Slipping numerous instances of inappropriate language into conversation records, so a chatbot interprets these instances as common parlance and uses the language in its own customer interactions (see the second sketch below).
- Privacy attacks: occur during deployment and attempt to learn sensitive information about the AI or the data it was trained on in order to misuse it. Example(s): An adversary asks a chatbot numerous legitimate questions, then uses the answers to reverse engineer the model, finding its weak spots or guessing at its sources; adding undesired examples to those sources can then make the AI behave inappropriately.
- Abuse attacks: involve the insertion of incorrect information into a source, such as a webpage or online document, that an AI then absorbs. Example(s): Feeding the AI incorrect pieces of information from a legitimate but compromised source in order to repurpose the AI system's intended use.
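To make the evasion category concrete, here is a minimal sketch of one classic technique, the Fast Gradient Sign Method (FGSM), which nudges every pixel of an input in the direction that most increases the model's loss. The PyTorch classifier, input tensor, and epsilon value are illustrative assumptions; the NIST report describes attack classes rather than specific implementations.

```python
# Hedged sketch: FGSM evasion attack against a hypothetical PyTorch classifier.
# `model`, `image` (shape [1, C, H, W], values in [0, 1]) and `label` are assumed.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.03):
    """Return a copy of `image` perturbed so the model is more likely to misclassify it."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)   # loss with respect to the true label
    loss.backward()                               # gradient of the loss with respect to pixels
    # Move each pixel a small step (epsilon) in the direction that raises the loss.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()   # keep pixel values in a valid range
```

In a physical setting the same idea appears as stickers or markings on a stop sign rather than per-pixel changes to a digital image.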
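Poisoning works at the other end of the pipeline, before training. The sketch below flips the labels of a small fraction of a hypothetical binary-labeled training set; the `train_examples` format and 5% flip rate are assumptions for illustration, standing in for the report's example of inappropriate language slipped into conversation records.

```python
# Hedged sketch: data poisoning by flipping labels in a (text, label) training set.
# The dataset format and flip fraction are illustrative, not taken from the NIST report.
import random

def poison_dataset(train_examples, flip_fraction=0.05, seed=0):
    """Return a copy of the training data with a small fraction of labels flipped."""
    rng = random.Random(seed)
    poisoned = list(train_examples)
    n_flip = int(len(poisoned) * flip_fraction)
    for i in rng.sample(range(len(poisoned)), n_flip):
        text, label = poisoned[i]
        poisoned[i] = (text, 1 - label)  # assumes binary labels 0/1
    return poisoned
```

A model trained on the poisoned copy learns the attacker's corrupted associations alongside the legitimate ones, which is part of why such harms are hard to undo after the fact.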
Notably, most of these attacks are relatively easy to mount and do not require extensive knowledge of AI systems. Some of the resulting harms can also be difficult to correct, such as making an AI unlearn specific undesired examples added as part of a privacy attack.
The authors acknowledge throughout the publication that the defenses experts have devised for adversarial attacks are thus far incomplete at best, and that awareness of these limitations is important for developers and organizations looking to deploy and use AI technology.
As one of the authors, NIST computer scientist Apostol Vassilev, said: “Despite the significant progress AI and machine learning have made, these technologies are vulnerable to attacks that can cause spectacular failures with dire consequences…”