STEP 4: MODEL PROTOTYPING, TUNING, ASSESSMENT

What are the accuracy/success metrics and thresholds, and who determines them?

Metrics and thresholds must be determined before a potential solution is implemented. Involving a diverse set of stakeholders is key to developing fair and accurate measures. Methods for handling false positives and false negatives must be considered, along with accessible ways to appeal decisions that work for the target group. This section presents tools and resources that can help engage different stakeholders in the creation of accurate metrics. When doing so, one must always keep in mind the expertise differential that characterises each stakeholder group.
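
To make the threshold discussion concrete, below is a minimal sketch assuming a binary classifier that outputs scores; the arrays, the 0.5 threshold, and the metric choices are illustrative placeholders for values that stakeholders would agree on, not part of the framework itself.

```python
# Minimal sketch: how a decision threshold turns scores into predictions,
# and how false positives/negatives and headline metrics fall out of it.
# All values below are illustrative placeholders.
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                      # ground-truth labels
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.55, 0.9, 0.2])  # model scores

threshold = 0.5                               # to be agreed with stakeholders, not defaulted
y_pred = (y_scores >= threshold).astype(int)

# The confusion matrix makes the false-positive / false-negative trade-off explicit.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"false positives: {fp}, false negatives: {fn}")
print(f"precision: {precision_score(y_true, y_pred):.2f}, "
      f"recall: {recall_score(y_true, y_pred):.2f}")
```

Raising the threshold trades false positives for false negatives; which direction is acceptable is exactly the question the stakeholder groups below should help answer.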

Below is a legend of the types of entries found within the framework:

📚 Resources - e.g. reports, articles, and case studies

🛠 Tools - e.g. guidelines, frameworks, and scorecards

🔗 Links - e.g. online platforms, videos, hubs, and databases

❌ Gap analysis - tools or resources that are currently missing

👥 List of stakeholders who should be included in the specific decision point

  • 👥 End users, local population, technical experts

    📚 Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets - Iterative process to significantly change model behaviour by crafting, and fine-tuning on, a dataset that reflects a predetermined set of target values

    📚 Mitigating bias in AI - Handbook with guidelines to help startups navigate bias issues

    📚🛠 'Closing the AI Accountability Gap': an end-to-end framework for internal algorithmic auditing - Document templates for teams conducting internal algorithmic audits, a targeted approach focused on assessing a system for potential biases

    🛠 AI Fairness Checklist - Checklist developed by AI practitioners to support the development of fairer AI products and services

    🛠🔗 What-If Tool - Tool to visually probe the behaviour of trained machine learning models with minimal coding. Using WIT, you can test performance in hypothetical situations, analyse the importance of different data features, and visualise model behaviour across multiple models, subsets of input data, and different ML fairness metrics

    🔗 Fairlearn: Improve Fairness of AI systems - Fairlearn is an open-source, community-driven project to help data scientists improve the fairness of AI systems. It contains guides and use cases, as well as a Python toolkit for assessing and mitigating fairness issues (a minimal usage sketch follows this list)

    • See also Step 2, Decision 1: How is data selected and reviewed, and who does it? and Decision 3: Have all local languages been considered? Is there a language selection decision to be made?
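
As a concrete illustration of the Fairlearn toolkit listed above, the sketch below uses MetricFrame to disaggregate accuracy and error rates by a sensitive feature. The arrays and group labels are invented for illustration; a real project would substitute its own predictions and demographic attributes.

```python
# Minimal Fairlearn sketch: disaggregating metrics by a sensitive feature.
# All arrays below are made up for illustration.
import numpy as np
from fairlearn.metrics import MetricFrame, false_positive_rate, false_negative_rate
from sklearn.metrics import accuracy_score

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 0])
group = np.array(["a", "a", "a", "b", "b", "b", "b", "a"])  # e.g. a demographic attribute

mf = MetricFrame(
    metrics={
        "accuracy": accuracy_score,
        "false positive rate": false_positive_rate,
        "false negative rate": false_negative_rate,
    },
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=group,
)
print(mf.overall)       # metrics over the whole population
print(mf.by_group)      # the same metrics disaggregated per group
print(mf.difference())  # largest gap between groups, a simple fairness signal
```

Disaggregated error rates like these give stakeholder groups something concrete to set thresholds against, rather than a single population-wide accuracy figure.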

  • 👥 End users, local population, technical experts

    📚 Metrics to evaluate ML Algorithm - Article covering the different types of AI evaluation metrics available

    📚 AI Platform Prediction: Viewing Evaluation Metrics - Interactive guide describing different types of evaluation metrics and how to view them

    📚 Datasheets for datasets - Datasheets for datasets can increase transparency and accountability within the machine learning community, mitigate unwanted societal biases in machine learning models, facilitate greater reproducibility of machine learning results, and help researchers and practitioners select more appropriate datasets for their chosen tasks

    🛠 Google Model Cards - Model Cards provide a structured framework for reporting on ML model provenance, usage, and ethics-informed evaluation, and give a detailed overview of a model's suggested uses and limitations (an illustrative field-level sketch appears after this list)

    🔗🛠 Know Your Data - Know Your Data helps researchers, engineers, product teams, and decision makers understand datasets, with the goal of improving data quality and helping to mitigate fairness and bias issues

    🔗🛠 Interpret ML models - A toolkit to help understand models and enable responsible machine learning (a minimal sketch appears at the end of this section)
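
To make the model-card entry above more concrete, here is an illustrative field-level structure in plain Python. The field names echo the headings Model Cards propose (model details, intended use, evaluation, limitations) but are not the exact schema of Google's toolkit, and every value is hypothetical.

```python
# Illustrative model-card structure only: field names follow the spirit of the
# Model Cards framework, not the exact schema of any toolkit; values are hypothetical.
model_card = {
    "model_details": {
        "name": "crop-disease-classifier",   # hypothetical model name
        "version": "0.3.0",
        "owners": ["ml-team@example.org"],
    },
    "intended_use": {
        "primary_uses": ["triaging leaf photos for extension workers"],
        "out_of_scope_uses": ["fully automated treatment decisions"],
    },
    "evaluation": {
        "metrics": {"accuracy": 0.91, "recall": 0.87},
        "disaggregated_by": ["region", "device type"],
    },
    "limitations": ["trained mostly on images from one growing season"],
}
```

Writing the evaluation section of such a card forces the metric and threshold decisions discussed above to be made explicit and reviewable.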
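
Finally, as a minimal sketch of the interpretability toolkit listed last, the example below trains InterpretML's glass-box Explainable Boosting Machine on synthetic data and renders its global explanation. show() is aimed at notebook use; outside a notebook, the explanation object can be inspected directly.

```python
# Minimal InterpretML sketch: a glass-box Explainable Boosting Machine
# trained on synthetic data, with its global explanation rendered.
from interpret.glassbox import ExplainableBoostingClassifier
from interpret import show
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=6, random_state=0)

ebm = ExplainableBoostingClassifier(random_state=0)
ebm.fit(X, y)

# Global explanation: per-feature contribution curves and importances,
# which stakeholders can inspect when assessing a prototype's behaviour.
show(ebm.explain_global())
```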