Based on the intended use and audience for your product, a content policy defines what content is allowable and may specify safety limitations on producing illegal, violent, or harmful content. These limits should be evaluated in light of the product domain, as specific sectors and regions may have different laws or standards.
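One way to make such a policy operational is to encode it as a machine-readable configuration that downstream safeguards can consult. The sketch below is a minimal illustration only; the category names, actions, and regional overrides are invented assumptions, not a prescribed taxonomy:

```python
# Hypothetical content-policy configuration: category names, actions,
# and regional overrides are illustrative assumptions, not a standard.
CONTENT_POLICY = {
    "violence_and_hate": {"action": "block"},
    "illegal_activity": {"action": "block"},
    "medical_advice": {"action": "flag_for_review"},  # stricter in some sectors
    "general_chat": {"action": "allow"},
}

# Region- or sector-specific overrides, since laws and standards differ.
REGIONAL_OVERRIDES = {
    "EU": {"medical_advice": {"action": "block"}},
}

def policy_action(category: str, region: str = "default") -> str:
    """Resolve the action for a content category, applying regional overrides."""
    override = REGIONAL_OVERRIDES.get(region, {}).get(category)
    rule = override or CONTENT_POLICY.get(category, {"action": "block"})
    return rule["action"]

print(policy_action("medical_advice"))        # flag_for_review
print(policy_action("medical_advice", "EU"))  # block
```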
If you are new to value considerations in the development and deployment of AI, refer to the principles and guidance on risk management released by academic institutions and governing organizations, such as:
Preparing data includes annotating a dataset. Below are helpful tips on what to include in a guideline document for annotators that provides instructions on how to complete the task:
Consider what volume of data is required for your task.
Specify an appropriate interface for annotators.
Engage with your privacy or legal partner to ensure data processing is in accordance with relevant privacy regulations.
Considerations might include whether first-party data needs to be passed to the annotators and whether additional privacy mitigations, such as image blurring, might be required.
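As one hedged illustration of such a mitigation, the sketch below blurs a rectangular region of an image before it is shared with annotators, using the Pillow library; the file paths and box coordinates are placeholders, and in practice the regions to redact would come from a PII or face detector or a manual review step:

```python
from PIL import Image, ImageFilter  # Pillow

def blur_region(path_in: str, path_out: str,
                box: tuple[int, int, int, int], radius: int = 12) -> None:
    """Blur one rectangular region (e.g., a face or license plate)
    before the image is passed to annotators."""
    img = Image.open(path_in)
    region = img.crop(box)  # box is (left, upper, right, lower)
    img.paste(region.filter(ImageFilter.GaussianBlur(radius)), box)
    img.save(path_out)

# Placeholder path and coordinates for illustration only.
blur_region("raw/photo_001.jpg", "redacted/photo_001.jpg", box=(40, 60, 180, 200))
```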
Pay attention to how human feedback and annotation of data may further polarize a fine-tuned model with respect to subjective opinions. Take steps to prevent injecting bias into annotation guidelines and to mitigate the effect of annotators’ bias. Resources on this topic include:
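One concrete check, complementary to those resources, is to measure inter-annotator agreement on overlapping items; persistently low agreement can signal ambiguous guidelines or systematic annotator skew. A minimal sketch using scikit-learn's cohen_kappa_score, with invented labels for illustration:

```python
from sklearn.metrics import cohen_kappa_score

# Labels two annotators assigned to the same eight items (illustrative data).
annotator_a = ["safe", "unsafe", "safe", "safe", "unsafe", "safe", "unsafe", "safe"]
annotator_b = ["safe", "unsafe", "unsafe", "safe", "unsafe", "safe", "safe", "safe"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # low values often warrant revisiting guidelines
```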
In the meantime, we recommend using public benchmarking platforms and datasets:
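As a hedged sketch of what using such a dataset can look like, the snippet below loads a public evaluation set from the Hugging Face Hub; truthful_qa is an illustrative placeholder rather than a recommendation, and model_generate() is a hypothetical stand-in for your own model call:

```python
from datasets import load_dataset  # Hugging Face `datasets` library

# `truthful_qa` is a placeholder; substitute the public benchmark that
# matches your product's domain and risk profile.
eval_set = load_dataset("truthful_qa", "generation", split="validation")

for example in eval_set.select(range(3)):
    prompt = example["question"]
    # response = model_generate(prompt)  # hypothetical model call
    print(prompt)
```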
As outlined in our Responsible Use Guide, you should deploy appropriate system-level safeguards to mitigate the safety and security risks of your system. As part of our responsible release approach, we provide open source solutions including:
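Whichever solution you adopt, the system-level pattern is the same: screen the user input, generate, then screen the model output before it reaches the user. The sketch below is a generic illustration of that pattern; moderate() and model_generate() are hypothetical stand-ins for a real safeguard classifier and model call:

```python
REFUSAL = "Sorry, I can't help with that request."

def moderate(text: str) -> bool:
    """Hypothetical stand-in for a safety classifier; returns True when
    the text is assessed as safe. Toy heuristic for illustration only."""
    return "how to build a weapon" not in text.lower()

def model_generate(prompt: str) -> str:
    """Hypothetical stand-in for the underlying LLM call."""
    return f"Model response to: {prompt}"

def guarded_generate(prompt: str) -> str:
    """System-level pattern: screen the input, generate, screen the output."""
    if not moderate(prompt):
        return REFUSAL
    response = model_generate(prompt)
    if not moderate(response):
        return REFUSAL
    return response

print(guarded_generate("Summarize today's weather report."))
```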