Verantwortung der Informatik – Accountability In AI (AIAI 2021)

Prof. Dr. Andreas Polze
Kordian Gontarska, Marta Lemanczyk, Weronika Wrazen, Felix Grzelka
Tuesday, 13:30-15:00, A-1.1

Abstract

In this seminar, we will talk about the accountability of computer science in the area of artificial intelligence. Each week, a student will give a presentation introducing different perspectives on accountability, ethics, fairness, transparency, auditability, explainability, interpretability, and regulation. After each presentation, a group discussion about the presented topic will take place. The presentation should be based on literature and statements from recognized domain experts, but it should also include the presenter's own assessment of the arguments and opinion. Each 45-minute presentation should be prepared individually by one participant using primary and secondary literature. In preparation, each participant will schedule a consultation with the supervisors and email a draft of the slides one week before the date of the presentation.

Prerequisites

Students are expected to have basic knowledge in the areas of statistics, machine learning, and deep learning.

Grading

To earn 3 ECTS points, students must hand in their slides after the presentation and a written report at the end of the semester. An additional 3 ECTS points can be earned by working on a project. Depending on the topic, this can include implementing a prototype, applying the presented concept to an existing project, or performing a data analysis.

Grading will be based on the quality of the presentation and the report. Active participation in the weekly discussions is highly encouraged.

Schedule

The following schedule is preliminary and subject to change during the semester. All updates will also be announced on the course mailing list.

Topics

We suggest the following list of topics to choose from. However, students are free to suggest and choose their own topic.

Federated Learning promises, among other things, inherent data privacy compared to centrally trained ML models. This is supposedly achieved by training local models in a distributed fashion and periodically updating a global model. The data used for training stays local and is hence “safe”, or is it? Can we reconstruct the training data from the local model’s weights?
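The training loop described above can be sketched in a few lines. This is a minimal illustration that assumes a one-parameter linear model and plain weighted averaging of the client weights (in the style of federated averaging). The names `local_update` and `federated_averaging` and the toy data are hypothetical and not taken from any particular framework.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave the client


def local_update(w: float, data: Dataset, lr: float = 0.05, epochs: int = 5) -> float:
    """Run a few gradient-descent steps on one client's private data (model: y = w * x)."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_averaging(clients: List[Dataset], rounds: int = 20) -> float:
    """Server loop: broadcast the global weight, collect local weights, average them."""
    w_global = 0.0
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        # Each client trains locally; only the resulting weight is sent to the server.
        local_weights = [local_update(w_global, data) for data in clients]
        # The server averages the weights, weighted by the local dataset sizes.
        w_global = sum(w * len(d) for w, d in zip(local_weights, clients)) / total
    return w_global


# Hypothetical toy data: three clients, all samples follow y = 2 * x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
]
w = federated_averaging(clients)
print(f"global weight after training: {w:.4f}")
```

Note that the server only ever sees the local weights, never the raw samples. Whether those weights still reveal something about the underlying data is exactly the question this topic asks.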

Literature:

Biases in data can lead to models behaving differently across ethnic and/or socio-demographic subpopulations. Especially for applications in medicine and healthcare, this can have critical effects on underrepresented groups. Which metrics or methods can be used to detect biases in data collection, pre-processing and/or model performance? What can be done to remove biases? What consequences can biases have?

Literature:
Project Ideas:
  • Compare how Covid-19 genomes are sampled across different countries and what consequences this could have for public health
  • Application of fairness metrics to real-world data
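As a starting point for the question of which metrics can detect bias in model performance, the following sketch computes two commonly used group fairness metrics: the demographic parity difference and the equal opportunity difference. It assumes binary labels and predictions and a single sensitive attribute. The function names and the toy data are hypothetical.

```python
from typing import Hashable, Sequence


def selection_rate(y_pred: Sequence[int], groups: Sequence[Hashable], group: Hashable) -> float:
    """Fraction of positive predictions within one group."""
    preds = [p for p, g in zip(y_pred, groups) if g == group]
    return sum(preds) / len(preds)


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int],
                       groups: Sequence[Hashable], group: Hashable) -> float:
    """Fraction of actual positives in one group that the model predicts as positive."""
    hits = [p for t, p, g in zip(y_true, y_pred, groups) if g == group and t == 1]
    return sum(hits) / len(hits)


def demographic_parity_difference(y_pred, groups) -> float:
    """Largest gap in selection rate between any two groups (0 means parity)."""
    rates = [selection_rate(y_pred, groups, g) for g in set(groups)]
    return max(rates) - min(rates)


def equal_opportunity_difference(y_true, y_pred, groups) -> float:
    """Largest gap in true positive rate between any two groups (0 means parity)."""
    rates = [true_positive_rate(y_true, y_pred, groups, g) for g in set(groups)]
    return max(rates) - min(rates)


# Hypothetical toy data: two groups "a" and "b" with four samples each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

dp_diff = demographic_parity_difference(y_pred, groups)
eo_diff = equal_opportunity_difference(y_true, y_pred, groups)
print(dp_diff, eo_diff)
```

In this toy example group "a" receives positive predictions more often than group "b", and its actual positives are also detected more reliably. Which of these gaps matters more depends on the application.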

To ensure the quality of a product, so-called audits are carried out to check whether the product meets certain standards and regulations. For AI for health, the requirements for safety and efficacy are particularly strict and difficult to evaluate. These evaluations include the analysis of model performance, privacy, robustness, fairness and bias, and interpretability, among other things. Which processes could the evaluation part of an audit include? Which regulations or guidelines are available? How does AI for health differ from other applications regarding the evaluations?

Literature:
Project Ideas:
  • Investigate audit guidelines for a use case (e.g., antibiotic resistance prediction) or an existing project
  • Analyze which issues arise when auditing AI for medicine/health, with a special focus on your use case
  • (Optional) Propose methods/tools/metrics for analyzing evaluation and/or life cycle steps

Most popular (social media) websites and apps use recommender systems to individually filter content and provide users with suggestions of movies to watch, news articles to read, music to listen to, etc. Suggestions are based on previous user interactions and optimized to match the users' interests, maximizing user engagement. By providing a constant feed of interesting content, recommender systems can lead to excessive use (addiction) of internet applications. In addition, recommender systems create echo chambers that reinforce users' opinions by only showing content that agrees with a user's preexisting opinion. These virtual echo chambers or "bubbles" make critical discourse much harder, because a ground truth that both parties can agree on no longer exists. How can recommender systems be built to be less addictive while still providing relevant content? How can the "bubbles" be popped?

Literature:
Project Ideas:
  • First, use a toy dataset to implement a conventional recommender system as a control condition. Run a small user study to measure user engagement and satisfaction. Then implement an improved version that is less addictive or less prone to sustaining bubbles. Test your hypothesis by running a study with your improvements.

Big, state-of-the-art neural networks need big datasets for training. How does this fit with the principles of the European data protection laws (GDPR)? How can we ensure data protection and progress in AI at the same time? What restrictions are already imposed on AI research and products by the GDPR? How can requirements such as the principle of data economy (Grundsatz der Datensparsamkeit) and the right to be forgotten be implemented? How can we ensure the privacy of uninvolved third parties such as relatives, who share parts of their genome?

Literature:
Project Ideas:
  • Choose one or more requirements of the GDPR and implement two or more ideas to fulfill these requirements

With the development of artificial intelligence (AI), vehicles are becoming more and more autonomous. This creates the need for legal and regulatory frameworks. Questions that need to be answered include: how should driverless vehicles be developed and tested, who takes responsibility when an accident happens, and what are the manufacturer’s obligations? The presentation should discuss these issues and present the approaches taken in different countries.

Literature:

Machine learning (ML) techniques are gaining more and more popularity because of their performance. They allow us to model complex phenomena with high accuracy. One limitation of many ML models is that they estimate only the mean value of the output, so we do not know how confident the model is about its result. Consequently, we, the end users, cannot be sure whether the model is trustworthy and safe. To overcome this problem, the uncertainty of the prediction can be estimated. Uncertainty tells us, for a given probability, how far from the mean value the real value is expected to be. Uncertainty estimation allows users to decide whether they can trust the model and when an additional human decision is needed.

Literature:
Project Ideas:
  • Implement algorithms for different uncertainty estimation methods and compare the results for one of the following types of input data: tabular data, images, or text.
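One possible way to obtain such an uncertainty estimate is a bootstrap ensemble: several models are trained on resampled versions of the data, and the spread of their predictions serves as the uncertainty. The sketch below assumes a simple linear model and toy data. All names (`fit_line`, `bootstrap_ensemble`, `predict_with_uncertainty`, `needs_human_review`) and the threshold value are hypothetical, and the ensemble spread only reflects the uncertainty of the model itself, not the noise in the data.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Line = Tuple[float, float]  # (intercept, slope)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Line:
    """Least-squares fit of y = a + b * x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample (all x identical): fall back to a constant model
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def bootstrap_ensemble(xs, ys, n_models: int = 200, seed: int = 0) -> List[Line]:
    """Fit one model per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_with_uncertainty(models: List[Line], x: float) -> Tuple[float, float]:
    """Return the ensemble mean and the standard deviation of the predictions at x."""
    preds = [a + b * x for a, b in models]
    return statistics.mean(preds), statistics.stdev(preds)


def needs_human_review(std: float, threshold: float = 0.3) -> bool:
    """Hypothetical decision rule: defer to a human when the ensemble disagrees too much."""
    return std > threshold


# Hypothetical toy data: roughly y = x with some noise, observed for x in [0, 7].
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.2, 5.8, 7.1]

models = bootstrap_ensemble(xs, ys)
mean_near, std_near = predict_with_uncertainty(models, 3.5)   # inside the data range
mean_far, std_far = predict_with_uncertainty(models, 20.0)    # far outside the data range
print(f"x=3.5:  {mean_near:.2f} +/- {std_near:.2f}")
print(f"x=20.0: {mean_far:.2f} +/- {std_far:.2f}")
```

The ensemble is more uncertain for inputs far away from the training data than for inputs inside the observed range. A rule such as `needs_human_review` could use this to decide when a prediction should not be trusted without a human check.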

Artificial Intelligence (AI) is attracting the attention of more and more specialists: not only computer scientists and statisticians, but also medical personnel, engineers, and economists. AI enables the modeling of complex phenomena with high accuracy. One limitation is that AI models are “black boxes”: it is difficult to explain the relationship between the input features and the output. Consequently, we, the end users, cannot be sure whether the model is trustworthy and safe. To overcome this problem, different algorithms and approaches have been developed to explain predictions. They also allow estimating which features have the highest impact on the result. As a result, users can assess the outcome against state-of-the-art knowledge and discover new patterns in the analyzed phenomenon.

Literature:
Project Ideas:
  • Implement algorithms for different explainable AI methods and compare the results for one of the following types of input data: tabular data, images, or text.
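One simple, model-agnostic way to estimate which features have the highest impact on the result is permutation feature importance: a feature's values are shuffled across the samples, and the resulting increase in prediction error is taken as that feature's importance. The sketch below is a minimal illustration under stated assumptions. The "black box" is a hypothetical stand-in function, the data are synthetic, and all names are chosen for illustration only.

```python
import random
from typing import Callable, List

Row = List[float]


def mse(model: Callable[[Row], float], X: List[Row], y: List[float]) -> float:
    """Mean squared error of the model on (X, y)."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model: Callable[[Row], float], X: List[Row], y: List[float],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Average increase in MSE when one feature column is shuffled, for each feature."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def black_box(row: Row) -> float:
    """Hypothetical stand-in for a trained model: relies strongly on feature 0,
    weakly on feature 2, and ignores feature 1."""
    return 3.0 * row[0] + 0.5 * row[2]


# Synthetic data: three random features; targets are produced by the model itself.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(50)]
y = [black_box(row) for row in X]

importances = permutation_importance(black_box, X, y)
for j, imp in enumerate(importances):
    print(f"feature {j}: importance {imp:.4f}")
```

Shuffling the ignored feature does not change the error at all, while shuffling the strongly used feature increases it the most. The method only needs the model's predictions, so it can be applied to any black box.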

"Dual use goods are products and technologies normally used for civilian purposes but which may have military applications." [1] Like many technologies, AI has the potential to be used in military applications. Be it object detection used in autonomous killer drones to localize targets or deep fakes used to spread misinformation online. How can military use be avoided? Which areas should not be researched? Do we need a ban on "weaponizable" AI such as facial recognition?

Literature:
Project Ideas:
  • Implement a possible defense for a dual-use AI technology (see: Dodging Attack Using Carefully Crafted Natural Makeup)

Training deep neural networks is energy-intensive. It is estimated that training GPT-3 required about 190 000 kWh. [2] Are advances in AI worth this environmental cost? Which experiments are worth running? How can we reduce the energy consumption or the number of experiments that need to be run?

Literature:
Project Ideas:
  • Compare the energy consumption of different ML algorithms on the same problem
  • Implement ideas to reduce energy consumption and benchmark them

Presentation

Your presentation should contain the following parts:

  • What is the topic?
  • How is it defined?
  • Are there multiple, different definitions?
  • Why is it important?
  • Present a method/paper/tool which addresses the problem
  • Check topic description/literature section for some suggestions
  • Explain the main idea
  • Highlight benefits and potential shortcomings
  • Provide 2-3 points/questions to start the interactive discussion

Report

For some tips, see here.

Literature