Why aligning AI to our values may be harder than we think
Can we stop a rogue AI by teaching it ethics? That might be easier said than done.
- One way to prevent an AI from going rogue is to teach it ethics so that it doesn't cause problems in the first place.
- The question of what we should, or even can, teach computers remains open.
- How we choose the values an artificial intelligence follows may be the most important decision of all.
Plenty of scientists, philosophers, and science fiction writers have wondered how to keep a potential super-human AI from destroying us all. While the obvious answer of "unplug it if it tries to kill you" has many supporters (and it worked on the HAL 9000), it isn't too difficult to imagine that a sufficiently advanced machine would be able to prevent you from doing that. Alternatively, a very powerful AI might be able to make decisions too rapidly for humans to review for ethical correctness or to correct for the damage they cause.
The issue of keeping a potentially super-human AI from going rogue and hurting people is called the "control problem," and there are many potential solutions to it. One of the more frequently discussed is "alignment" and involves syncing AI to human values, goals, and ethical standards. The idea is that an artificial intelligence designed with the proper moral system wouldn't act in a way that is detrimental to human beings in the first place.
However, with this solution, the devil is in the details. What kind of ethics should we teach the machine, what kind of ethics can we make a machine follow, and who gets to answer those questions?
Iason Gabriel considers these questions in his new essay, "Artificial Intelligence, Values, and Alignment." He addresses those problems while pointing out that answering them definitively is more complicated than it seems.
What effect does how we build the machine have on what ethics the machine can follow?
Humans are quite good at explaining ethical problems and discussing potential solutions. Some of us are very good at teaching entire systems of ethics to other people. However, we tend to do this through language rather than code, and we teach people, whose learning capabilities resemble our own, rather than machines with very different abilities. Shifting from people to machines may introduce some limitations.
Many different methods of machine learning could be applied to ethical theory. The trouble is, they may prove to be very capable of absorbing one moral stance and utterly incapable of handling another.
Reinforcement learning (RL) is a way to teach a machine to do something by having it maximize a reward signal. Through trial and error, the machine is eventually able to learn how to get as much reward as possible efficiently. With its built-in tendency to maximize what is defined as good, this system clearly lends itself to utilitarianism, with its goal of maximizing the total happiness, and other consequentialist ethical systems. How to use it to effectively teach a different ethical system remains unknown.
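To make the trial-and-error loop concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the three actions, the reward function, and the epsilon-greedy learning rule are just one simple way to show an agent discovering whichever action its reward signal defines as "good."

```python
import random

random.seed(0)  # for reproducibility

# Hypothetical environment: three possible actions, and whatever we have
# defined as "good" pays out most for action 2. The agent knows nothing
# about this and must discover it by trial and error.
def reward(action):
    return 1.0 if action == 2 else 0.1

def train(n_actions=3, episodes=1000, epsilon=0.1):
    values = [0.0] * n_actions   # running estimate of each action's reward
    counts = [0] * n_actions
    for _ in range(episodes):
        if random.random() < epsilon:
            a = random.randrange(n_actions)                     # explore
        else:
            a = max(range(n_actions), key=lambda i: values[i])  # exploit
        r = reward(a)
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental average
    return values

estimates = train()
best = max(range(3), key=lambda i: estimates[i])
print(best)  # the agent converges on the action its reward signal favors
```

Note that the agent optimizes only what the reward function encodes; anything the designer leaves out of `reward` is invisible to it, which is why this setup maps so naturally onto consequentialist ethics and so poorly onto others.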
Alternatively, apprenticeship or imitation learning allows a programmer to give a computer a long list of data or an exemplar to observe and allow the machine to infer values and preferences from it. Thinkers concerned with the alignment problem often argue that this could teach a machine our preferences and values through action rather than idealized language. It would just require us to show the machine a moral exemplar and tell it to copy what they do. The idea has more than a few similarities to virtue ethics.
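A toy illustration of the idea, under heavily simplified assumptions (the situations, actions, and demonstration log are all invented): behavior cloning, the simplest form of imitation learning, just copies the exemplar's most frequent choice in each situation, with no reward signal and no explicit rules.

```python
from collections import Counter, defaultdict

# Invented demonstration log: (situation, action) pairs recorded from a
# hypothetical moral exemplar.
demonstrations = [
    ("stranger_drops_wallet", "return_it"),
    ("stranger_drops_wallet", "return_it"),
    ("friend_asks_for_help", "help"),
    ("friend_asks_for_help", "help"),
    ("friend_asks_for_help", "decline"),
]

def fit(demos):
    by_situation = defaultdict(Counter)
    for situation, action in demos:
        by_situation[situation][action] += 1
    # The learned "policy" imitates the exemplar's most frequent action
    # in each situation it has seen.
    return {s: counts.most_common(1)[0][0] for s, counts in by_situation.items()}

policy = fit(demonstrations)
print(policy["stranger_drops_wallet"])  # return_it
```

The limits the text raises show up immediately: the policy is only as good as the exemplar it copies, and it has nothing to say about situations absent from the demonstrations.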
The problem of who is a moral exemplar for other people remains unsolved, and who, if anybody, we should have computers try to emulate is equally up for debate.
At the same time, there are some moral theories that we don't know how to teach to machines. Deontological theories, known for creating universal rules to stick to all the time, typically rely on a moral agent to apply reason to the situation they find themselves in along particular lines. No machine in existence is currently able to do that. Even the more limited idea of rights, and the concept that they should not be violated no matter what any optimization tendency says, might prove challenging to code into a machine, given how specific and clearly defined you'd have to make these rights.
After discussing these problems, Gabriel notes that:
"In the light of these considerations, it seems possible that the methods we use to build artificial agents may influence the kind of values or principles we are able to encode."
This is a very real problem. After all, if you have a super AI, wouldn't you want to teach it ethics with the learning technique best suited for how you built it? What do you do if that technique can't teach it anything besides utilitarianism very well but you've decided virtue ethics is the right way to go?
If philosophers can't agree on how people should act, how are we going to figure out how a hyper-intelligent computer should function?
The important thing might not be to program a machine with the one true ethical theory, but rather to make sure that it is aligned with values and behaviors that everybody can agree to. Gabriel puts forth several ideas on how to decide what values AI should follow.
A set of values could be found through consensus, he argues. There is a fair amount of overlap in human rights theory among a cross-section of African, Western, Islamic, and Chinese philosophy. A scheme of values, with notions like "all humans have the right to not be harmed, no matter how much economic gain might result from harming them," could be devised and endorsed by large numbers of people from all cultures.
Alternatively, philosophers might find values for an AI by using the "Veil of Ignorance," a thought experiment in which people are asked to choose principles of justice they would support without knowing what their self-interest or social status would be in a world governed by those principles. The values they select would, presumably, be ones that would protect everyone from any mischief the AI could cause and would assure its benefits would reach everyone.
Lastly, we could vote on the values. Instead of figuring out what people would endorse under certain circumstances or based on the philosophies they already subscribe to, people could just vote on a set of values they want any super AI to be bound to.
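As a toy sketch of what such a vote might look like (the value names and ballots are invented, and a real election over values would be far harder to run legitimately), an approval vote simply binds the AI to whichever proposed value gains the widest support:

```python
from collections import Counter

# Invented ballots: each voter approves any subset of proposed values.
ballots = [
    {"no_harm", "privacy"},
    {"no_harm", "fairness"},
    {"privacy"},
    {"no_harm"},
]

def winner(ballots):
    tally = Counter()
    for approved in ballots:
        tally.update(approved)          # one point per approval
    return tally.most_common(1)[0][0]   # most widely endorsed value

print(winner(ballots))  # no_harm
```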
All of these ideas are also burdened by the present lack of a super AI. There is no consensus on AI ethics yet, and the current debate hasn't been as cosmopolitan as it would need to be. The thinkers behind the Veil of Ignorance would need to know the features of the AI they are planning for when coming up with a scheme of values, since they would be unlikely to choose a value set the AI wasn't designed to process effectively. And a democratic system would face tremendous difficulty in ensuring that a just and legitimate "election" over values was carried out correctly.
Despite these limitations, we will need an answer to this question sooner rather than later; coming up with what values we should tie an AI to is something you want to do before you have a supercomputer that could cause tremendous harm if it doesn't have some variation of a moral compass to guide it.
While an artificial intelligence powerful enough to operate outside of human control is still a long way off, the problem of how to keep such systems in line when they do arrive remains an important one. Aligning machines with human values and interests through ethics is one possible way of doing so, but the problems of what those values should be, how to teach them to a machine, and who gets to decide the answers remain unsolved.
What is human dignity? Here's a primer, told through 200 years of great essays, lectures, and novels.
- Human dignity means that each of our lives has an unimpeachable value simply because we are human, and therefore we are deserving of a baseline level of respect.
- That baseline requires more than the absence of violence, discrimination, and authoritarianism. It means giving individuals the freedom to pursue their own happiness and purpose.
- We look at remarkable writings from the last 200 years that illustrate the push for human dignity with regard to slavery, equality, communism, free speech, and education.
The inherent worth of all human beings

Human dignity is the inherent worth of each individual human being. Recognizing human dignity means respecting human beings' special value—value that sets us apart from other animals; value that is intrinsic and cannot be lost.

Liberalism—the broad political philosophy that organizes society around liberty, justice, and equality—is rooted in the idea of human dignity. Liberalism assumes each of our lives, plans, and preferences has some unimpeachable value, not because of any objective evaluation or contribution to a greater good, but simply because they belong to a human being. We are human, and therefore deserving of a baseline level of respect.

Because so many of us take human dignity for granted—just a fact of our humanness—it's usually only when someone's dignity is ignored or violated that we feel compelled to talk about it.

But human dignity means more than the absence of violence, discrimination, and authoritarianism. It means giving individuals the freedom to pursue their own happiness and purpose—a freedom that can be hampered by restrictive social institutions or the tyranny of the majority. The liberal ideal of the good society is not just peaceful but also pluralistic: It is a society in which we respect others' right to think and live differently than we do.
From the 19th century to today

With Google Books Ngram Viewer (https://books.google.com/ngrams/graph?year_start=1800&year_end=2019&content=human+dignity&corpus=26&smoothing=3&direct_url=t1%3B%2Chuman%20dignity%3B%2Cc0), we can chart mentions of human dignity from 1800-2019.

[Chart: frequency of "human dignity" in Google Books, 1800-2019]
American novelist, writer, playwright, poet, essayist and civil rights activist James Baldwin at his home in Saint-Paul-de-Vence, southern France, on November 6, 1979.
Credit: Ralph Gatti/AFP via Getty Images
The future of dignity

Around the world, people are still working toward the full and equal recognition of human dignity. Every year, new speeches and writings help us understand what dignity is—not only what it looks like when dignity is violated but also what it looks like when dignity is honored. In his posthumous essay, Congressman Lewis wrote, "When historians pick up their pens to write the story of the 21st century, let them say that it was your generation who laid down the heavy burdens of hate at last and that peace finally triumphed over violence, aggression and war."

The more we talk about human dignity, the better we understand it. And the sooner we can make progress toward a shared vision of peace, freedom, and mutual respect for all.
With just a few strategic tweaks, the Nazis could have won one of World War II's most decisive battles.
- The Battle of Britain is widely recognized as one of the most significant battles that occurred during World War II. It marked the first major victory of the Allied forces and shifted the tide of the war.
- Historians, however, have long debated the deciding factor in the British victory and German defeat.
- A new mathematical model took into account numerous alternative tactics the Germans could have employed and found that just two tweaks stood between them and victory over Britain.
Two strategic blunders

Now, historians and mathematicians from York St. John University have collaborated to produce a statistical model (http://www-users.york.ac.uk/~nm15/bootstrapBoB%20AAMS.docx) capable of calculating what the likely outcomes of the Battle of Britain would have been had the circumstances been different.

Would the German war effort have fared better had they not bombed Britain at all? What if Hitler had begun his bombing campaign earlier, even by just a few weeks? What if the Luftwaffe had focused its attacks on RAF airfields for the entire course of the battle? Using a statistical technique called weighted bootstrapping, the researchers studied these and other alternatives.

"The weighted bootstrap technique allowed us to model alternative campaigns in which the Luftwaffe prolongs or contracts the different phases of the battle and varies its targets," said co-author Dr. Jaime Wood in a statement (https://www.york.ac.uk/news-and-events/news/2020/research/mathematicians-battle-britain-what-if-scenarios/). Based on the different strategic decisions the German forces could have made, the researchers' model enabled them to predict the likelihood that the events of a given day of fighting would or would not occur.

"The Luftwaffe would only have been able to make the necessary bases in France available to launch an air attack on Britain in June at the earliest, so our alternative campaign brings forward the air campaign by three weeks," continued Wood. "We tested the impact of this and the other counterfactuals by varying the probabilities with which we choose individual days."

Ultimately, two strategic tweaks shifted the odds significantly in the Germans' favor. Had the German forces started their campaign earlier in the year and had they consistently targeted RAF airfields, an Allied victory would have been extremely unlikely.

Say the odds of a British victory in the real-world Battle of Britain stood at 50-50 (there is no real way of knowing the actual odds, so we have to pick an arbitrary figure). If that were the case, moving up the start date of the campaign and focusing only on airfields would have reduced Britain's chances of victory to just 10 percent. Even if a British victory had stood at 98 percent, these changes would have cut it down to just 34 percent.
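The researchers' actual model is far more sophisticated, but the core idea of weighted bootstrapping can be sketched in a few lines: resample the days of a campaign with replacement, weight more heavily the days a given counterfactual strategy would emphasize, and count how often each side comes out ahead. All numbers below are invented purely for illustration.

```python
import random

random.seed(42)  # for reproducibility

# Invented per-day outcomes for a ten-day toy "campaign": positive numbers
# favor the attacker, negative numbers favor the defender.
days = [-3, -1, 2, 4, -2, 5, 1, -4, 3, 2]

def win_probability(days, weights, n_campaigns=10_000):
    wins = 0
    for _ in range(n_campaigns):
        # Resample a full-length alternative campaign, day by day.
        resampled = random.choices(days, weights=weights, k=len(days))
        if sum(resampled) > 0:  # attacker comes out ahead overall
            wins += 1
    return wins / n_campaigns

uniform = [1] * len(days)
# A counterfactual strategy that emphasizes the attacker's good days
# (here, arbitrarily, the days with positive outcomes).
emphasis = [3 if d > 0 else 1 for d in days]

print(win_probability(days, uniform))   # baseline
print(win_probability(days, emphasis))  # reweighting shifts the odds
```

The appeal of the method is that it never invents new events: every simulated campaign is built only from days that actually occurred, rearranged and reweighted.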
A tool for understanding history

This technique, said co-author Niall Mackay, "demonstrates just how finely-balanced the outcomes of some of the biggest moments of history were. Even when we use the actual days' events of the battle, make a small change of timing or emphasis to the arrangement of those days and things might have turned out very differently."

The researchers also claimed that their technique could be applied to other uncertain historical events. "Weighted bootstrapping can provide a natural and intuitive tool for historians to investigate unrealized possibilities, informing historical controversies and debates," said Mackay.

Using this technique, researchers can evaluate other what-ifs and gain insight into how differently influential events could have turned out if only the slightest things had changed. For now, at least, we can all be thankful that Hitler underestimated Britain's grit.
We’ve mapped a million previously undiscovered galaxies beyond the Milky Way. Take the virtual tour here.
See the most detailed survey of the southern sky ever carried out using radio waves.
Astronomers have mapped about a million previously undiscovered galaxies beyond the Milky Way, in the most detailed survey of the southern sky ever carried out using radio waves.
A new study shows our planet is much closer to the supermassive black hole at the galaxy's center than previously estimated.
Arrows on this map show position and velocity data for the 224 objects utilized to model the Milky Way Galaxy. The solid black lines point to the positions of the spiral arms of the Galaxy. Colors reflect groups of objects that are part of the same arm, while the background is a simulation image.
Apple sold its first iPod in 2001, and six years later it introduced the iPhone, which ushered in a new era of personal technology.