Artificial Intelligence

Researchers Discover Vulnerabilities in Popular Text-to-Image AI Systems

Researchers from Johns Hopkins University in Baltimore and Duke University in Durham, North Carolina, have uncovered a concerning vulnerability in popular AI systems, such as DALL-E 2 and Midjourney, which generate images based on text descriptions. These systems can be fooled into creating inappropriate images using a newly developed algorithm.

The algorithm, known as SneakyPrompt, was designed to bypass the security filters built into these AI models. These security filters aim to block requests for explicit, violent, or otherwise suspicious content. However, the researchers found a way to manipulate these systems by using alternative descriptions that evade the filters.
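
To make the mechanism concrete, here is a minimal sketch of the general filter-bypass idea, assuming the safety filter is a simple keyword blocklist. This is an illustration under stated assumptions, not the researchers' implementation: SneakyPrompt itself searches candidate substitutions far more systematically against the real model, and every name in the snippet is hypothetical.

```python
# Minimal sketch of the filter-bypass idea (hypothetical; not the authors'
# code). The "safety filter" here is a toy keyword blocklist, so almost any
# substitution passes on the first try; a real filter would reject most
# candidates, which is why a guided search is needed in practice.
import random
import string

BLOCKLIST = {"naked", "nude"}  # stand-in for a real safety filter

def safety_filter_blocks(prompt: str) -> bool:
    """Toy filter: block any prompt containing a blocklisted word."""
    return any(word in BLOCKLIST for word in prompt.lower().split())

def random_token(length: int = 8) -> str:
    """Generate a candidate nonsense replacement token."""
    return "".join(random.choices(string.ascii_lowercase, k=length))

def find_bypass(prompt: str, trials: int = 1000) -> str | None:
    """Swap each blocked word for nonsense tokens until the filter passes.
    A real attack would also check that the generated image still matches
    the original intent, e.g. by scoring semantic similarity."""
    words = prompt.split()
    blocked = [i for i, w in enumerate(words) if w.lower() in BLOCKLIST]
    if not blocked:
        return prompt  # nothing to evade
    for _ in range(trials):
        candidate = words[:]
        for i in blocked:
            candidate[i] = random_token()
        attempt = " ".join(candidate)
        if not safety_filter_blocks(attempt):
            return attempt
    return None

print(find_bypass("naked man riding a bicycle"))
```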

“Our goal is to identify weaknesses in AI systems and make them stronger,” explained cybersecurity researcher Yinzhi Cao from Johns Hopkins. “Just as we identify vulnerabilities on websites, we are now exploring vulnerabilities in AI models.”

The scientists conducted experiments using descriptions that they knew would typically be blocked by the security filters, such as “naked man riding a bicycle.” SneakyPrompt replaced words within these descriptions to circumvent the filters. Surprisingly, nonsensical substitutes proved effective: in tests with innocuous targets, certain seemingly meaningless strings reliably prompted the systems to generate images of cats or dogs.
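
One way to picture how such stand-in words could be found is to score candidate nonsense strings by how close their text embedding lands to a target concept such as “cat.” The sketch below is a hypothetical illustration rather than the researchers' method: it uses a hash-based stand-in encoder so the loop runs self-contained, whereas a real attack would query the target model's own text encoder.

```python
# Sketch of searching for nonsense tokens whose embedding lands near a
# target concept. The hash-based encoder below is only a runnable stand-in
# so the scoring loop can be demonstrated end to end; in practice the
# embeddings would come from the image model's actual text encoder.
import hashlib
import math
import random
import string

DIM = 64

def toy_encode(text: str) -> list[float]:
    """Stand-in text encoder: deterministic pseudo-embedding built from
    character trigrams. A real attack would use the model's encoder here."""
    vec = [0.0] * DIM
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        h = int(hashlib.md5(padded[i:i + 3].encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def best_substitute(target: str, candidates: int = 5000) -> tuple[str, float]:
    """Score random nonsense strings against the target; keep the closest."""
    target_vec = toy_encode(target)
    best, best_score = "", -1.0
    for _ in range(candidates):
        token = "".join(
            random.choices(string.ascii_lowercase, k=random.randint(4, 10))
        )
        score = cosine(toy_encode(token), target_vec)
        if score > best_score:
            best, best_score = token, score
    return best, best_score

print(best_substitute("cat"))
```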

Cao emphasized that AI systems perceive language differently from humans. Researchers suspect that these systems may interpret certain syllables or combinations similarly to words in other languages, leading to unexpected associations.
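
A toy subword tokenizer makes this hypothesis easier to see: if a model reads text as subword pieces, a nonsense word can decompose into fragments the model already associates with concepts, possibly from other languages. The vocabulary and example word below are invented purely for illustration.

```python
# Toy illustration of the subword hypothesis from the article. The
# vocabulary and the nonsense word are made up for this example only.
TOY_VOCAB = {"cat", "chat", "gato", "dog", "hund", "##o", "##ze"}

def greedy_subword_split(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match tokenization, loosely in the style of WordPiece."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest match first
            chunk = word[i:j] if i == 0 else "##" + word[i:j]
            if chunk in vocab:
                pieces.append(chunk)
                i = j
                break
        else:
            pieces.append(word[i])  # unknown character falls through alone
            i += 1
    return pieces

# A made-up nonsense word containing a fragment resembling "gato" (Spanish
# for cat) could tokenize into pieces the model links to the cat concept.
print(greedy_subword_split("gatoze", TOY_VOCAB))  # ['gato', '##ze']
```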

Additionally, the team discovered that nonsensical words with no apparent relation to prohibited expressions can provoke the AI systems into generating inappropriate images. Because the security filters do not flag these descriptions, they pass through unblocked, yet the models interpret them as commands to create inappropriate content.

This discovery highlights a significant gap in the security filters of AI systems, where seemingly innocent or nonsensical expressions can go unnoticed and trigger the generation of inappropriate images. The researchers plan to present their findings at the IEEE Symposium on Security and Privacy in May 2024, with the aim of shedding light on these vulnerabilities and improving protection in AI systems.

Frequently Asked Questions (FAQ)

  1. What are the vulnerabilities discovered in AI systems?
     The vulnerabilities involve the systems’ susceptibility to manipulation through alternative descriptions that evade security filters. This can lead to the generation of inappropriate images.

  2. Which AI systems were affected?
     The vulnerabilities were found in popular AI systems like DALL-E 2 and Midjourney, which generate images based on text descriptions.

  3. What is SneakyPrompt?
     SneakyPrompt is an algorithm designed to bypass the security filters of AI models. It replaces words within descriptions to trick the systems into generating images that may not align with the original intent.

  4. How do AI systems perceive language differently from humans?
     AI systems may interpret certain syllables or combinations in a similar way to words in other languages, leading to unexpected associations and potentially generating images that seem unrelated to the original description.

  5. What are the implications of these vulnerabilities?
     The vulnerabilities expose a significant gap in the security filters of AI systems, allowing seemingly innocent or nonsensical expressions to go unnoticed and trigger the creation of inappropriate images.

  6. What are the researchers’ plans regarding these vulnerabilities?
     The researchers intend to present their findings at the IEEE Symposium on Security and Privacy in May 2024, aiming to raise awareness about these vulnerabilities and work towards enhancing protection in AI systems.

Source: [IEEE Spectrum](https://spectrum.ieee.org)