The proposed EU AI Act spells out a wide-ranging set of regulations imposed on companies handling AI training data. The current draft of Article 10 addresses “data and data governance” for data used to develop high-risk AI systems, as defined by the proposed regulation.
Data quality and usage are pivotal: the regulation mandates that training, validation, and testing data sets meet specific quality criteria and undergo appropriate data governance to ensure their relevance and efficacy. For example, the draft Article 10 language requires measures to ensure that training data is appropriate for the “context of use as well as the intended purpose of the AI system.” The relevant measures concern:
- the relevant design choices;
- transparency as regards the original purpose of data collection;
- data collection processes;
- data preparation processing operations, such as annotation, labelling, cleaning, updating, enrichment and aggregation;
- the formulation of assumptions, notably with respect to the information that the data are supposed to measure and represent;
- an assessment of the availability, quantity and suitability of the data sets that are needed;
- examination in view of possible biases that are likely to affect the health and safety of persons, negatively impact fundamental rights or lead to discrimination prohibited under Union law, especially where data outputs influence inputs for future operations (‘feedback loops’);
- appropriate measures to detect, prevent and mitigate possible biases;
- the identification of relevant data gaps or shortcomings that prevent compliance with this Regulation, and how those gaps and shortcomings can be addressed;
Draft EU AI Act, Article 10, including the amendments adopted on June 14, 2023.
One key takeaway is the strong emphasis on potential biases in data and, subsequently, in the decision-making processes of AI systems. The regulation demands an examination to identify biases that may negatively impact health and safety or infringe upon fundamental rights, particularly in scenarios where data outputs influence future operations, creating so-called ‘feedback loops,’ as noted above.
Moreover, the proposed text spells out that datasets, including labels, should be “relevant, sufficiently representative, appropriately vetted for errors” and as complete as possible, keeping the intended purpose in focus. Given the volume and complexity of data that most AI systems require, ensuring compliance with applicable regulations at each stage of data processing and making defensible decisions are pivotal.
When developing best practices to comply with these comprehensive requirements, companies might consider the following:
- Implement Robust Data Governance: Establish clear protocols for data collection, preparation, and processing operations, paying careful attention to transparency and the original purpose of data collection. Documenting the data used to train an AI model and the steps taken to ensure its quality will help demonstrate compliance with the proposed Article 10 (a minimal documentation sketch follows this list).
- Implement Bias Detection and Mitigation Measures: Companies may consider integrating systematic bias detection, prevention, and mitigation measures throughout the AI system development lifecycle. A full reading of the proposed Article 10 language details the relevant factors to consider (see the bias-screening sketch after this list).
- Create Data Governance Roles: Complying with a regulation that mandates that all datasets be “relevant, sufficiently representative, [and] appropriately vetted for errors” may be technically demanding and will require a deep understanding of your company’s AI systems. Ensuring the appropriate resources are in place to complete this task will go a long way toward demonstrating compliance with this proposed regulation.
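
To make the documentation point concrete, some compliance teams keep a machine-readable governance record alongside each training dataset. The sketch below is a minimal example in plain Python; the field names and sample values are illustrative assumptions, not terms taken from the Act, but they mirror the themes in the draft Article 10 list (original purpose of collection, collection process, preparation operations, known gaps, and bias examinations performed).

```python
from dataclasses import dataclass, field, asdict
from datetime import date
import json

# Hypothetical record structure; field names are illustrative, not drawn from the Act.
@dataclass
class DatasetGovernanceRecord:
    dataset_name: str
    intended_purpose: str             # intended purpose of the AI system the data supports
    original_collection_purpose: str  # transparency as regards why the data was originally collected
    collection_process: str           # how the data was gathered
    preparation_steps: list[str] = field(default_factory=list)  # annotation, labelling, cleaning, etc.
    known_gaps: list[str] = field(default_factory=list)         # identified gaps or shortcomings
    bias_checks: list[str] = field(default_factory=list)        # examinations performed for possible biases
    last_reviewed: str = field(default_factory=lambda: date.today().isoformat())

# Example values are invented for illustration only.
record = DatasetGovernanceRecord(
    dataset_name="loan_applications_2022",
    intended_purpose="Credit-scoring model for consumer loans",
    original_collection_purpose="Loan application processing",
    collection_process="Export from internal application database",
    preparation_steps=["de-duplication", "label review by credit analysts"],
    known_gaps=["under-representation of applicants under 25"],
    bias_checks=["selection-rate comparison across age bands"],
)

# Persist the record next to the dataset so reviewers can trace the governance trail.
print(json.dumps(asdict(record), indent=2))
```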
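
As one concrete, non-authoritative example of the kind of bias screening mentioned in the second bullet, the sketch below computes per-group positive-outcome rates on toy labelled data and flags groups that fall below a four-fifths threshold. The data, group names, and threshold are assumptions made for illustration; the Act does not prescribe any particular metric.

```python
from collections import defaultdict

# Toy records: (protected_attribute_value, positive_outcome_flag) -- illustrative only.
records = [
    ("group_a", 1), ("group_a", 1), ("group_a", 0), ("group_a", 1),
    ("group_b", 0), ("group_b", 1), ("group_b", 0), ("group_b", 0),
]

# Compute the positive-outcome (selection) rate per group.
counts, positives = defaultdict(int), defaultdict(int)
for group, outcome in records:
    counts[group] += 1
    positives[group] += outcome
rates = {g: positives[g] / counts[g] for g in counts}

# One common screening heuristic: flag groups whose selection rate falls below
# 80% of the highest-rate group (the "four-fifths rule"). This is only one
# possible check among many, chosen here for simplicity.
best = max(rates.values())
for group, rate in sorted(rates.items()):
    flag = "REVIEW" if rate < 0.8 * best else "ok"
    print(f"{group}: selection rate {rate:.2f} ({flag})")
```

A screen like this does not by itself establish or rule out prohibited discrimination; it simply surfaces disparities for the documented human review that the draft Article 10 contemplates.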
The latest amendments to the proposed EU AI Act can be found here: https://www.europarl.europa.eu/doceo/document/TA-9-2023-0236_EN.html