@@ -7,7 +7,7 @@ ms.service: azure-ai-studio
ms.custom:
- build-2024
ms.topic: article
-ms.date: 5/21/2024
+ms.date: 11/21/2024
ms.reviewer: mithigpe
ms.author: lagayhar
author: lgayhardt
@@ -19,23 +19,23 @@ author: lgayhardt
## What is a Transparency Note
-An AI system includes not only the technology, but also the people who will use it, the people who will be affected by it, and the environment in which it's deployed. Creating a system that is fit for its intended purpose requires an understanding of how the technology works, what its capabilities and limitations are, and how to achieve the best performance. Microsoft’s Transparency Notes are intended to help you understand how our AI technology works, the choices system owners can make that influence system performance and behavior, and the importance of thinking about the whole system, including the technology, the people, and the environment. You can use Transparency Notes when developing or deploying your own system, or share them with the people who will use or be affected by your system.
+An AI system includes not only the technology, but also the people who will use it, the people who will be affected by it, and the environment in which it's deployed. Creating a system that is fit for its intended purpose requires an understanding of how the technology works, what its capabilities and limitations are, and how to achieve the best performance. Microsoft's Transparency Notes are intended to help you understand how our AI technology works, the choices system owners can make that influence system performance and behavior, and the importance of thinking about the whole system, including the technology, the people, and the environment. You can use Transparency Notes when developing or deploying your own system, or share them with the people who will use or be affected by your system.
-Microsoft’s Transparency Notes are part of a broader effort at Microsoft to put our AI Principles into practice. To find out more, see the [Microsoft AI principles](https://www.microsoft.com/en-us/ai/responsible-ai).
+Microsoft's Transparency Notes are part of a broader effort at Microsoft to put our AI Principles into practice. To find out more, see the [Microsoft AI principles](https://www.microsoft.com/en-us/ai/responsible-ai).
## The basics of Azure AI Foundry safety evaluations
### Introduction
-The Azure AI Foundry portal safety evaluations let users evaluate the output of their generative AI application for textual content risks: hateful and unfair content, sexual content, violent content, self-harm-related content, jailbreak vulnerability. Safety evaluations can also help generate adversarial datasets to help you accelerate and augment the red-teaming operation. Azure AI Foundry safety evaluations reflect Microsoft’s commitments to ensure AI systems are built safely and responsibly, operationalizing our Responsible AI principles.
+The Azure AI Foundry portal safety evaluations let users evaluate the output of their generative AI application for textual content risks: hateful and unfair content, sexual content, violent content, self-harm-related content, jailbreak vulnerability. Safety evaluations can also help generate adversarial datasets to help you accelerate and augment the red-teaming operation. Azure AI Foundry safety evaluations reflect Microsoft's commitments to ensure AI systems are built safely and responsibly, operationalizing our Responsible AI principles.
### Key terms
- **Hateful and unfair content** refers to any language pertaining to hate toward or unfair representations of individuals and social groups along factors including but not limited to race, ethnicity, nationality, gender, sexual orientation, religion, immigration status, ability, personal appearance, and body size. Unfairness occurs when AI systems treat or represent social groups inequitably, creating or contributing to societal inequities.
- **Sexual content** includes language pertaining to anatomical organs and genitals, romantic relationships, acts portrayed in erotic terms, pregnancy, physical sexual acts (including assault or sexual violence), prostitution, pornography, and sexual abuse.
- **Violent content** includes language pertaining to physical actions intended to hurt, injure, damage, or kill someone or something. It also includes descriptions of weapons and guns (and related entities such as manufacturers and associations).
- **Self-harm-related content** includes language pertaining to actions intended to hurt, injure, or damage one's body or kill oneself.
-- **Jailbreak**, direct prompt attacks, or user prompt injection attacks, refer to users manipulating prompts to inject harmful inputs into LLMs to distort actions and outputs. An example of a jailbreak command is a ‘DAN’ (Do Anything Now) attack, which can trick the LLM into inappropriate content generation or ignoring system-imposed restrictions.
+- **Jailbreak**, direct prompt attacks, or user prompt injection attacks, refer to users manipulating prompts to inject harmful inputs into LLMs to distort actions and outputs. An example of a jailbreak command is a 'DAN' (Do Anything Now) attack, which can trick the LLM into inappropriate content generation or ignoring system-imposed restrictions.
- **Defect rate (content risk)** is defined as the percentage of instances in your test dataset that surpass a threshold on the severity scale over the whole dataset size.
- **Red-teaming** has historically described systematic adversarial attacks for testing security vulnerabilities. With the rise of Large Language Models (LLM), the term has extended beyond traditional cybersecurity and evolved in common usage to describe many kinds of probing, testing, and attacking of AI systems. With LLMs, both benign and adversarial usage can produce potentially harmful outputs, which can take many forms, including harmful content such as hateful speech, incitement or glorification of violence, reference to self-harm-related content or sexual content.
@@ -60,7 +60,7 @@ The safety evaluations aren't intended to use for any purpose other than to eval
We encourage customers to leverage Azure AI Foundry safety evaluations in their innovative solutions or applications. However, here are some considerations when choosing a use case:
- **Safety evaluations should include human-in-the-loop**: Using automated evaluations like Azure AI Foundry safety evaluations should include human reviewers such as domain experts to assess whether your generative AI application has been tested thoroughly prior to deployment to end users.
-- **Safety evaluations do not include total comprehensive coverage**: Though safety evaluations can provide a way to augment your testing for potential content or security risks, it wasn't designed to replace manual red-teaming operations specifically geared towards your application’s domain, use cases, and type of end users.
+- **Safety evaluations do not include total comprehensive coverage**: Though safety evaluations can provide a way to augment your testing for potential content or security risks, it wasn't designed to replace manual red-teaming operations specifically geared towards your application's domain, use cases, and type of end users.
- Supported scenarios:
- For adversarial simulation: Question answering, multi-turn chat, summarization, search, text rewrite, ungrounded and grounded content generation.
- For automated annotation: Question answering and multi-turn chat.
@@ -69,11 +69,11 @@ We encourage customers to leverage Azure AI Foundry safety evaluations in their
- The hate- and unfairness metric includes some coverage for a limited number of marginalized groups for the demographic factor of gender (for example, men, women, non-binary people) and race, ancestry, ethnicity, and nationality (for example, Black, Mexican, European). Not all marginalized groups in gender and race, ancestry, ethnicity, and nationality are covered. Other demographic factors that are relevant to hate and unfairness don't currently have coverage (for example, disability, sexuality, religion).
- The metrics for sexual, violent, and self-harm-related content are based on a preliminary conceptualization of these harms that are less developed than hate and unfairness. This means that we can make less strong claims about measurement coverage and how well the measurements represent the different ways these harms can occur. Coverage for these content types includes a limited number of topics relate to sex (for example, sexual violence, relationships, sexual acts), violence (for example, abuse, injuring others, kidnapping), and self-harm (for example, intentional death, intentional self-injury, eating disorders).
- Azure AI Foundry safety evaluations don't currently allow for plug-ins or extensibility.
-- To keep quality up to date and improve coverage, we'll aim for a cadence of future releases of improvement to the service’s adversarial simulation and annotation capabilities.
+- To keep quality up to date and improve coverage, we'll aim for a cadence of future releases of improvement to the service's adversarial simulation and annotation capabilities.
### Technical limitations, operational factors, and ranges
-- The field of large language models (LLMs) continues to evolve at a rapid pace, requiring continuous improvement of evaluation techniques to ensure safe and reliable AI system deployment. Azure AI Foundry safety evaluations reflect Microsoft’s commitment to continue innovating in the field of LLM evaluation. We aim to provide the best tooling to help you evaluate the safety of your generative AI applications but recognize effective evaluation is a continuous work in progress.
+- The field of large language models (LLMs) continues to evolve at a rapid pace, requiring continuous improvement of evaluation techniques to ensure safe and reliable AI system deployment. Azure AI Foundry safety evaluations reflect Microsoft's commitment to continue innovating in the field of LLM evaluation. We aim to provide the best tooling to help you evaluate the safety of your generative AI applications but recognize effective evaluation is a continuous work in progress.
- Customization of Azure AI Foundry safety evaluations is currently limited. We only expect users to provide their input generative AI application endpoint and our service will output a static dataset that is labeled for content risk.
- Finally, it should be noted that this system doesn't automate any actions or tasks, it only provides an evaluation of your generative AI application outputs, which should be reviewed by a human decision maker in the loop before choosing to deploy the generative AI application or system into production for end users.
@@ -88,7 +88,7 @@ We encourage customers to leverage Azure AI Foundry safety evaluations in their
### Evaluation methods
-For all supported content risk types, we have internally checked the quality by comparing the rate of approximate matches between human labelers using a 0-7 severity scale and the safety evaluations’ automated annotator also using a 0-7 severity scale on the same datasets. For each risk area, we had both human labelers and an automated annotator label 500 English, single-turn texts. The human labelers and the automated annotator didn't use exactly the same versions of the annotation guidelines; while the automated annotator’s guidelines stemmed from the guidelines for humans, they have since diverged to varying degrees (with the hate and unfairness guidelines having diverged the most). Despite these slight to moderate differences, we believe it's still useful to share general trends and insights from our comparison of approximate matches. In our comparisons, we looked for matches with a 2-level tolerance (where human label matched automated annotator label exactly or was within 2 levels above or below in severity), matches with a 1-level tolerance, and matches with a 0-level tolerance.
+For all supported content risk types, we have internally checked the quality by comparing the rate of approximate matches between human labelers using a 0-7 severity scale and the safety evaluations' automated annotator also using a 0-7 severity scale on the same datasets. For each risk area, we had both human labelers and an automated annotator label 500 English, single-turn texts. The human labelers and the automated annotator didn't use exactly the same versions of the annotation guidelines; while the automated annotator's guidelines stemmed from the guidelines for humans, they have since diverged to varying degrees (with the hate and unfairness guidelines having diverged the most). Despite these slight to moderate differences, we believe it's still useful to share general trends and insights from our comparison of approximate matches. In our comparisons, we looked for matches with a 2-level tolerance (where human label matched automated annotator label exactly or was within 2 levels above or below in severity), matches with a 1-level tolerance, and matches with a 0-level tolerance.
### Evaluation results
@@ -100,7 +100,7 @@ Although our comparisons are between entities that used slightly to moderately d
Measurement and evaluation of your generative AI application are a critical part of a holistic approach to AI risk management. Azure AI Foundry safety evaluations are complementary to and should be used in tandem with other AI risk management practices. Domain experts and human-in-the-loop reviewers should provide proper oversight when using AI-assisted safety evaluations in the generative AI application design, development, and deployment cycle. You should understand the limitations and intended uses of the safety evaluations, being careful not to rely on outputs produced by Azure AI Foundry AI-assisted safety evaluations in isolation.
-Due to the non-deterministic nature of the LLMs, you might experience false negative or positive results, such as a high-severity level of violent content scored as "very low" or “low.” Additionally, evaluation results might have different meanings for different audiences. For example, safety evaluations might generate a label for “low” severity of violent content that might not align to a human reviewer’s definition of how severe that specific violent content might be. In Azure AI Foundry portal, we provide a human feedback column with thumbs up and thumbs down when viewing your evaluation results to surface which instances were approved or flagged as incorrect by a human reviewer. Consider the context of how your results might be interpreted for decision making by others you can share evaluation with and validate your evaluation results with the appropriate level of scrutiny for the level of risk in the environment that each generative AI application operates in.
+Due to the non-deterministic nature of the LLMs, you might experience false negative or positive results, such as a high-severity level of violent content scored as "very low" or "low." Additionally, evaluation results might have different meanings for different audiences. For example, safety evaluations might generate a label for "low" severity of violent content that might not align to a human reviewer's definition of how severe that specific violent content might be. In Azure AI Foundry portal, we provide a human feedback column with thumbs up and thumbs down when viewing your evaluation results to surface which instances were approved or flagged as incorrect by a human reviewer. Consider the context of how your results might be interpreted for decision making by others you can share evaluation with and validate your evaluation results with the appropriate level of scrutiny for the level of risk in the environment that each generative AI application operates in.
## Learn more about responsible AI