
STEP 4: MODEL PROTOTYPING, TUNING, ASSESSMENT
What are the accuracy/success metrics and thresholds, and who determines these?
Metrics and thresholds must be determined before a potential solution is implemented, and involving a diverse set of stakeholders is key to developing fair and accurate measures. Methods for handling false positives and false negatives must be considered, along with accessible ways for the target group to appeal decisions. This section presents tools and resources that can help engage different stakeholders in defining accurate metrics; when doing so, one must always keep in mind the expertise differential between stakeholder groups.
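To make the false positive/negative trade-off concrete, the short sketch below shows how moving a decision threshold shifts errors between the two types. The scores, labels, and candidate thresholds are hypothetical placeholders, and scikit-learn is assumed to be available:

```python
# Sketch: how a decision threshold trades false positives against false negatives.
# The scores and labels below are hypothetical; in practice they would come from
# a validation set scored by the candidate model.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 1, 1, 1, 0, 1]                      # ground-truth labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.7, 0.2]   # model scores

for threshold in (0.3, 0.5, 0.7):                      # candidate thresholds
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

Which of these error counts is acceptable, and for whom, is precisely the kind of question the stakeholder groups listed in this section should help answer.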
Please find below a legend of what can be found within the framework:
📄 Resources - e.g. reports, articles, and case studies
🛠 Tools - e.g. guidelines, frameworks, and scorecards
🔗 Links - e.g. online platforms, videos, hubs, and databases
⚠️ Gap analysis - tools or resources that are currently missing
👥 List of stakeholders who should be included in the specific decision point

-
👥 End users, local population, technical experts
🛠 Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets - Iterative process to significantly change model behaviour by crafting, and fine-tuning on, a dataset that reflects a predetermined set of target values
📄 Mitigating bias in AI - Handbook with guidelines to help startups navigate bias issues
📄🛠 AI Accountability Gap: end-to-end framework for internal algorithmic auditing - Document templates for teams conducting an internal algorithmic audit, a targeted approach to assessing a system for potential biases
🛠 AI Fairness Checklist - Checklist developed by AI practitioners that can be used to support the development of fairer AI products and services
🛠🔗 What-If Tool - Tool to visually probe the behaviour of trained machine learning models, with minimal coding. Using WIT, you can test performance in hypothetical situations, analyse the importance of different data features, and visualise model behaviour across multiple models, subsets of input data, and ML fairness metrics
🛠 Fairlearn: Improve Fairness of AI systems - Fairlearn is an open-source, community-driven project to help data scientists improve the fairness of AI systems. It contains guides and use cases, as well as a Python toolkit that can be used to assess and mitigate fairness issues (a minimal usage sketch follows this list)
See also Step 2, Decision 1 (How is data selected and reviewed, and who does it?) and Decision 3 (Have all local languages been considered? Is there a language selection decision to be made?)
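As noted in the Fairlearn entry above, a minimal usage sketch follows. The labels, predictions, and grouping variable are hypothetical placeholders; MetricFrame is Fairlearn's documented entry point for disaggregated assessment:

```python
# Sketch: disaggregating a success metric by group with Fairlearn's MetricFrame.
# y_true, y_pred, and the sensitive feature below are hypothetical placeholders.
from fairlearn.metrics import MetricFrame, selection_rate
from sklearn.metrics import accuracy_score

y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]
group  = ["a", "a", "a", "b", "b", "b", "b", "a"]  # e.g. a demographic attribute

mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=group,
)
print(mf.by_group)      # per-group metric values
print(mf.difference())  # largest between-group gap per metric
```

Disaggregated views like by_group make it easier to show each stakeholder group how a candidate metric behaves for them.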
-
👥 End users, local population, technical experts
📄 Metrics to evaluate ML algorithms - Article covering the different types of AI evaluation metrics available (a short metrics sketch follows this list)
📄 AI Platform Prediction: Viewing Evaluation Metrics - Interactive guide describing different types of evaluation metrics and how to view them
📄 Datasheets for datasets - Datasheets for datasets can increase transparency and accountability within the machine learning community, mitigate unwanted societal biases in machine learning models, facilitate greater reproducibility of results, and help researchers and practitioners select more appropriate datasets for their chosen tasks
🛠 Google Model Cards - Model Cards provide a structured framework for reporting on ML model provenance, usage, and ethics-informed evaluation, and give a detailed overview of a model's suggested uses and limitations (a model card sketch follows this list)
🛠🔗 Know Your Data - Know Your Data helps researchers, engineers, product teams, and decision makers understand datasets, with the goal of improving data quality and helping to mitigate fairness and bias issues
🛠🔗 Interpret ML models - A toolkit to help understand models and enable responsible machine learning (a glass-box model sketch follows this list)
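As a companion to the two metrics guides above, the sketch below computes several of the standard classification metrics they describe. The labels, predictions, and scores are hypothetical placeholders, and scikit-learn is assumed:

```python
# Sketch: computing standard classification metrics with scikit-learn.
# Labels, predictions, and scores are hypothetical placeholders.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

y_true  = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred  = [0, 1, 0, 0, 1, 1, 1, 1]
y_score = [0.2, 0.9, 0.4, 0.3, 0.8, 0.6, 0.7, 0.85]  # scores for AUC

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))  # sensitive to false positives
print("recall   :", recall_score(y_true, y_pred))     # sensitive to false negatives
print("F1       :", f1_score(y_true, y_pred))
print("ROC AUC  :", roc_auc_score(y_true, y_score))   # threshold-independent
```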
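For the Google Model Cards entry, a minimal sketch using the open-source model-card-toolkit is shown below. It follows the toolkit's documented scaffold/update/export flow, but method details vary between toolkit versions, and all field values are hypothetical:

```python
# Sketch: generating a model card with Google's model-card-toolkit.
# The scaffold/update/export flow follows the toolkit's documentation;
# method details may differ between versions, and all values are hypothetical.
import model_card_toolkit as mctlib

toolkit = mctlib.ModelCardToolkit(output_dir="model_card_output")
model_card = toolkit.scaffold_assets()  # create an empty model card

model_card.model_details.name = "Example classifier"  # hypothetical name
model_card.model_details.overview = (
    "What the model does, who it is for, and its known limitations."
)

toolkit.update_model_card(model_card)  # persist the populated card
html = toolkit.export_format()         # render the card as HTML
```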
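Finally, for the InterpretML toolkit, the sketch below trains one of its glass-box models, an Explainable Boosting Machine, and prints its global feature importances. The dataset is a synthetic placeholder generated with scikit-learn:

```python
# Sketch: a glass-box model with InterpretML's Explainable Boosting Machine (EBM).
# The dataset is synthetic; substitute real training data in practice.
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

ebm = ExplainableBoostingClassifier(random_state=0)
ebm.fit(X, y)

# Global explanation: overall contribution of each feature to the model.
explanation = ebm.explain_global()
print(explanation.data())  # per-feature importance scores
```

Glass-box models of this kind let non-technical stakeholders inspect how a model reaches its decisions, which supports the appeal processes discussed at the start of this section.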