Moral Machine

Moral Machine is an online platform, developed by Iyad Rahwan's Scalable Cooperation group at the Massachusetts Institute of Technology, that generates moral dilemmas and collects information on the decisions that people make between two destructive outcomes. The platform was conceived by Iyad Rahwan and social psychologists Azim Shariff and Jean-François Bonnefon ahead of the publication of their article about the ethics of self-driving cars. The key contributors to building the platform were MIT Media Lab graduate students Edmond Awad and Sohan Dsouza.

The presented scenarios are often variations of the trolley problem, and the information collected would be used for further research regarding the decisions that machine intelligence must make in the future. For example, as artificial intelligence plays an increasingly significant role in autonomous driving technology, research projects like Moral Machine help to find solutions for challenging life-and-death decisions that will face self-driving vehicles.

Moral Machine was active from January 2016 to July 2020 and remains available on its website for people to experience.

The experiment
The Moral Machine was an ambitious project; it was the first attempt at using such an experimental design to test a large number of participants in over 200 countries worldwide. The study was approved by the Institutional Review Board (IRB) at the Massachusetts Institute of Technology (MIT).

The setup of the experiment asks the viewer to make a decision on a single scenario in which a self-driving car is about to hit pedestrians. The user can decide to have the car either swerve to avoid hitting the pedestrians or keep going straight to preserve the lives it is transporting.

Participants can complete as many scenarios as they want; however, the scenarios are generated in groups of thirteen. Within each group of thirteen, one scenario is entirely random, while the other twelve are drawn from a database of 26 million different possibilities. These twelve comprise two dilemmas for each of six dimensions of moral preference: character gender, character age, character physical fitness, character social status, character species, and character number.
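The composition of one group of thirteen can be illustrated with a minimal sketch. The identifiers, the dictionary layout, and the shuffled presentation order are assumptions made for illustration; they are not taken from the platform's actual code.

```python
import random

# The six dimensions named in the text; the identifiers are illustrative.
DIMENSIONS = ["gender", "age", "fitness", "social_status", "species", "number"]


def generate_session(rng):
    """Return 13 scenario descriptors: two per dimension plus one random."""
    scenarios = [
        {"kind": "dimension", "dimension": dimension}
        for dimension in DIMENSIONS
        for _ in range(2)
    ]
    scenarios.append({"kind": "random", "dimension": None})
    # Assumption: the presentation order is shuffled.
    rng.shuffle(scenarios)
    return scenarios


session = generate_session(random.Random(0))
```

Each descriptor here only records which dimension a dilemma targets; choosing the concrete characters for a dilemma is outside the scope of the sketch.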

The experiment setup remains the same throughout multiple scenarios, but each scenario tests a different set of factors. Most notably, the characters involved differ from one scenario to the next. Characters may include: stroller, girl, boy, pregnant woman, male doctor, female doctor, female athlete, female executive, male athlete, male executive, large woman, large man, homeless person, old man, old woman, dog, criminal, and cat.

Through these different characters researchers were able to understand how a wide variety of people will judge scenarios based on those involved.

Analysis
Analysis of the data collected through Moral Machine showed broad differences in relative preferences among different countries, and correlations between these preferences and various national metrics.

The data was synthesized by a conjoint analysis to compute the average marginal component effect (AMCE) of each attribute that the Moral Machine tested. Nine attributes were tested: sparing humans (versus pets), staying on course (versus swerving), sparing passengers (versus pedestrians), sparing more lives (versus fewer lives), sparing men (versus women), sparing the young (versus the elderly), sparing pedestrians who cross legally (versus jaywalking), sparing the fit (versus the less fit), and sparing those with higher social status (versus lower social status). Some characters possessed other attributes (such as being pregnant, a doctor, or a criminal) that did not fall into these tested factors.
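When attribute levels are assigned at random, the AMCE of a two-level attribute can be estimated in its simplest form as a difference in means: the share of times a group was spared at one level minus the share at the baseline level. The sketch below shows only that simplified estimator; the observations are invented placeholders, the function and field names are hypothetical, and the published analysis may apply further adjustments that are not reproduced here.

```python
from statistics import mean

# Invented placeholder observations, not data from the study.
# Each row is one character group shown in a dilemma;
# "spared" is 1 if the respondent chose to spare that group, else 0.
observations = [
    {"species": "human", "spared": 1},
    {"species": "human", "spared": 1},
    {"species": "human", "spared": 0},
    {"species": "human", "spared": 1},
    {"species": "pet", "spared": 0},
    {"species": "pet", "spared": 1},
    {"species": "pet", "spared": 0},
    {"species": "pet", "spared": 0},
]


def amce_difference_in_means(rows, attribute, level, baseline):
    """Estimate the effect of `level` versus `baseline` on being spared."""
    treated = [row["spared"] for row in rows if row[attribute] == level]
    control = [row["spared"] for row in rows if row[attribute] == baseline]
    return mean(treated) - mean(control)


effect = amce_difference_in_means(observations, "species", "human", "pet")
```

A positive value indicates that groups at the first level were spared more often than groups at the baseline level.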

Globally, participants favored human lives over the lives of animals such as dogs and cats. They preferred to spare more lives rather than fewer and younger lives rather than older ones. Babies were spared most often and cats least often. In terms of gender variations, male doctors and old men were spared more often than female doctors and old women, while female athletes and large women were spared more often than male athletes and large men. All three cultural clusters described below shared the preference to spare pedestrians over passengers and law-abiders over criminals.

Cultural clusters
Because the experiment was run on a global scale, researchers were able to further break down the data to see what separate cultures and regions value. To conduct this detailed analysis, researchers looked at 130 countries with at least 100 people who gave data to the Moral Machine.

Researchers were able to group countries with similar findings into several regional groups, which they termed ‘cultural clusters’.

The first cluster, which researchers dubbed the Western cluster, contains North America and European countries of Protestant, Catholic, and Orthodox Christian cultural groups. The second cluster, termed the Eastern cluster, contains eastern countries such as Japan and Taiwan as well as Islamic countries such as Indonesia, Pakistan, and Saudi Arabia. The third cluster, termed the Southern cluster, consists of the Latin American countries of Central and South America as well as some countries with French influence, such as French overseas territories and countries that were at some point under French rule.

The existence of these cultural clusters suggests that there are region- and culture-specific moral patterns that may allow groups of territories to have a shared standard of ethics when it comes to machines.

Clusters By Region
Researchers found that cultural clusters varied in a few ways. The Eastern cluster, for example, did not have as much preference to spare younger humans compared to the other two clusters and had a higher preference for sparing law-abiding humans. The Western cluster had a higher preference for inaction on the part of the driver and thus had less of a preference for sparing pedestrians as compared to other clusters. The Southern cluster had a higher preference for sparing females, the young, the fit, and those of higher status, but a lower preference for sparing humans over pets or other animals.

Individualistic vs Collectivist Cultures
Participants from individualistic cultures had a stronger preference for sparing the greater number of people. This may be due to an individualistic society’s emphasis on the value of each individual. On the other hand, respondents from more collectivist cultures had a weaker preference for sparing younger lives over older ones. This is likely explained by collectivism’s priority on group well-being over individual value, as well as the collectivist tradition of valuing and respecting the elderly. For instance, China ranked far below the world average in its preference for sparing the young over the elderly, as well as for sparing more lives over fewer. By contrast, the average respondent from the US exhibited a much higher tendency to save younger lives and larger groups.

Developed vs Undeveloped countries
Participants from countries that are less wealthy and have weaker institutions showed a greater tendency to spare pedestrians who crossed illegally than those from wealthier, more developed countries. This is most likely due to their experience living in a society where individuals are more likely to deviate from rules because laws are enforced less stringently.

Economic Inequality
The extent of economic inequality in a country predicts whether its respondents are more likely to prefer sparing those of high status over those of low status. Respondents from countries with a higher Gini coefficient (used by the World Bank to measure economic inequality in a country) are more likely to spare higher-status individuals. In other words, an individual from a country with higher economic inequality would be more likely to spare an executive over a homeless person. The same relationship holds for the preference for sparing wealthier lives over less wealthy ones: respondents from countries with higher economic inequality show a stronger preference for saving richer lives over poorer ones.
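A relationship of this kind can be checked with a correlation coefficient between a country-level inequality measure and a country-level preference score. The sketch below computes a Pearson correlation by hand; the numbers are invented placeholders rather than values from the study, the variable names are hypothetical, and the choice of Pearson correlation is an assumption, not necessarily the statistic the researchers used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Invented placeholder values, one entry per hypothetical country.
gini = [0.25, 0.30, 0.35, 0.40, 0.45]
status_preference = [0.02, 0.05, 0.04, 0.08, 0.09]

r = pearson(gini, status_preference)
```

A value of r close to 1 would indicate that countries with higher inequality also show a stronger preference for sparing higher-status characters; a value near 0 would indicate no linear relationship.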

Source data and code to reproduce the results of the analysis can be found on the Moral Machine site. Other researchers can use the data to conduct their own analyses and draw different conclusions, subject to the licensing terms stated at the source.

Applications of the data
The findings from the Moral Machine can help decision makers when designing self-driving automotive systems. Designers must make sure that these vehicles are able to solve problems on the road in ways that align with the moral values of the humans around them.

This is a challenge because humans are complex and may all make different decisions based on their personal values. However, by collecting a large number of decisions from people all over the world, researchers can begin to understand patterns in the context of a particular culture, community, and people.

Other features
The Moral Machine was deployed in June 2016. In October 2016, a feature was added that offered users the option to fill out a survey about their demographics, political views, and religious beliefs. Between November 2016 and March 2017, the website was progressively translated into nine languages in addition to English (Arabic, Chinese, French, German, Japanese, Korean, Portuguese, Russian, and Spanish).

Overall, the Moral Machine offers four different modes (see Supplementary Information), with the focus being on the data-gathering feature of the website, called the Judge mode.

In addition to providing its own scenarios for users to judge, the Moral Machine invites users to create their own scenarios, which are submitted for approval so that other people may judge them as well. The data is also openly available for anyone to explore via an interactive map featured on the Moral Machine website.