Is “ethical AI” an oxymoron?

It depends on who you ask. But the current wave of generative AI has unpleasant side effects that are hard to ignore: large-scale copyright infringements, environmental impact and bias.

(In this post, I'll use “AI” as shorthand for generating or modifying content, like text, code, images and videos, and for asking questions, using LLM-based systems.)

Our industry seems divided on generative AI. Some are loudly excited, others use it begrudgingly. Some hear from management that they must use more AI. We probably all use LLMs secondhand, as they are under the hood of tools like translation services and auto captioning. In this post, I'm specifically talking about chatbots that generate responses to prompts.

The tools seem to save people time. At the same time, many people avoid them altogether: their output is often underwhelming, as they hallucinate and produce cringeworthy results. And then there are the aforementioned ethical side effects.

As someone who likes AI in principle (I did some in university), the promise of “ethical AI” sounds good to me. And necessary. But “ethical” is not a very precise qualification. It's a promise of sorts.

How can we tell if the promise is fulfilled? If that isn't clear, the phrase is a free-for-all that marketing departments will like more than the rest of us.

What's ethics?

Ethics, and I'm going to simplify it a bit here, studies how to act and what to do. It is practiced as a field in academia, but everyone can practice it, really. You survey the facts about a given situation and weigh them against your course of action.

You can personally weigh them by asking yourself: would I like to be on the receiving end of these specifics? Do I think the world at large benefits more than it is harmed? Am I sure?

There's no standard ethical answer. What people are OK with varies (see also: politics).

Corporations, including AI vendors, do this too, but they don't always match their statements with their actions. The ethics departments are usually different from the sales or product departments.

With that in mind, let’s survey the facts about generative AI. A lot needs to happen to make it, but I'll focus on three things: training models, sourcing electricity and crafting a system that can respond to prompts with even-handed and reasonable-looking answers.

Training language models

What's involved?

To make generative AI work, engineers automatically show the system enormous amounts of examples, such that the machine is at some point able to start recognising patterns in them, without being told what's what. That way, the machine ‘learns’ to represent the material, plausibly. For instance, to reproduce text, the system needs to be fed very large amounts of text. The amount is key here: it generally won't work without a lot of so-called “training data”.
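
To make the “lots of data” point concrete, here is a deliberately tiny toy sketch of my own (not how vendors actually build these systems, which use neural networks at a vastly larger scale): it counts which word tends to follow which in some example text, then “generates” text from those counts.

    import random
    from collections import defaultdict

    # Toy "language model": learn which word follows which in the training text.
    training_text = "the cat sat on the mat and the cat slept on the sofa"
    words = training_text.split()

    follows = defaultdict(list)
    for current, following in zip(words, words[1:]):
        follows[current].append(following)

    # "Generate" a sentence by repeatedly picking an observed next word.
    word = "the"
    output = [word]
    for _ in range(8):
        if word not in follows:
            break
        word = random.choice(follows[word])
        output.append(word)

    print(" ".join(output))  # e.g. "the cat slept on the mat and the cat"

With two sentences of training text, this produces near-gibberish; the plausibility of real systems comes from replacing the counting with billions of learned parameters and feeding in enormous amounts of text.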

Major LLM providers train their tools on content they have taken from the web, including copyrighted material. Meta, for instance, trained their Llama 3 model on Library Genesis data, which includes millions of pirated academic papers and novels, by authors from Sally Rooney to Min Jin Lee. OpenAI, the company that built ChatGPT, told a UK House of Lords committee that they cannot “train their AI models without using copyrighted materials”.

The amount of copyright-free data is not large enough to build language models at the size they want, and the AI vendors can't afford or arrange to pay all the copyright holders whose content they want. OpenAI operates ChatGPT Pro at a loss. So the companies preemptively take what's not theirs, while attempting to have governments change the rules.

There is a lot to unpack regarding these practices, but maybe my biggest concern is with the perceived value of all that original content. Art in particular. Novels, paintings, records and cinema, unlike LLMs, are the bones of our society. They can move us, make us think, take a stance and capture complex human experiences in ways that resonate with us, humans. There’s something ironic about taking that work, in order to make systems that don’t do these things (creativity cannot be computed).

OpenAI CEO Sam Altman, who's part of the subset of tech folk that is happily still on Twitter, is worried too. He said they will introduce limits. On March 27, he posted this, one of his first posts since he tweeted that the new president “will be incredible for the country”:

it's super fun seeing people love images in chatgpt.

but our GPUs are melting.

we are going to temporarily introduce some rate limits while we work on making it more efficient. hopefully won't be long!

He shows concern about server costs, and remains silent about stealing art.

Vendors then ‘finetune’ the models. This work is often outsourced to low-wage workers in the Global South, who are asked to review and label harmful content; it has traumatised some of them.

Making it more ethical

Training models, as it happens today, is not ethical in three ways:

  1. large amounts of content are taken without permission.
  2. the work of writers, musicians, photographers, illustrators and other artists is treated as merely input, rather than treasured as valuable in itself. That's bad for the artists, their audiences and the world at large.
  3. some workers are traumatised and underpaid during finetuning.

For ethical training, it seems to me that vendors should at least train models without stealing, and finetune models without traumatising and underpaying workers.

Regarding valuing art: the tools could simply refuse requests to imitate original works. To the prompt “Make this in the style of Van Gogh”, an ethical model could return “No, not happening, go make art of your own.”

They do this, to some extent, and the “refusal” is actually mentioned in OpenAI's system card for GPT-4o native image generation (2.4.2). But when all my social media feeds are full of images that do resemble original art, and even Sam Altman's profile picture is in Studio Ghibli style, that's just a promise. One that is clearly and obviously not fulfilled.

Using electricity

What's involved?

Generative AI also won't work without a lot of electricity. Most of it is used during the ‘learning’ phase, and some more during the usage stage. The International Energy Agency projects that electricity demand from AI-optimised data centres will more than quadruple by 2030.

Aside from the electricity used for the actual training, the collection of data for the training sets also adds to electricity usage. See Xe Iaso's account of how Amazon's AI crawler makes their server unstable, Dennis Schubert, who says 70% of the work his server does is for AI crawlers, Drew DeVault of SourceHut, who spends 20-100% of his time in a given week mitigating hyper-aggressive crawlers, and Wikimedia, who saw a 50% increase in traffic to their images due to crawlers. More traffic = more emissions.

The increase in emissions associated with AI contributes to faster global warming, because not all energy is green all the time. Period. Sometimes, in some locations, there is 100% green energy, which has very low emissions, and there are data centres that take advantage of this. Amazon, Microsoft, Alphabet and Meta are also the four largest corporate purchasers of renewable energy; the latter three even buy 100% of their energy as renewables. Some of these big tech companies are also using AI to increase their own efficiency, which is nice.

But as this paper by Alexandra Sasha Luccioni, Emma Strubell and Kate Crawford, which compares the “AI bad for climate” and “AI actually good for climate” perspectives, concludes:

We cannot simply hope for the best outcome.

The consequences of global warming are too big to ignore.

Making it more ethical

The electricity usage of AI is not ethical in these ways:

  • data centre electricity usage has exponentially (!) increased, due to AI model training and usage, with insufficient consideration for utility (use cases beyond “it's super fun”).
  • AI crawlers that gather content to train on are increasing the energy consumption of individual websites.

To make it more ethical:

  • We should all use AI only if we are sure that it is strictly necessary and more helpful than less energy-intensive technologies. Not “just because we can”, which seems to be the current mode.
  • Vendors should develop their models more efficiently and reduce emissions at scale (they are doing both, which is great!).
  • AI crawlers should respect websites' preferences around being crawled (many websites and services don't want to be crawled), to avoid increasing their electricity usage; a minimal example of how sites express this today follows this list. The IETF is looking at standardising building blocks for such consent.
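
One way sites already express that preference is a robots.txt file. Below is an illustrative sketch of my own (not part of the IETF work mentioned above); the user-agent tokens are the ones the vendors currently document, they change over time, and not every crawler honours the file.

    # Ask AI training crawlers not to fetch anything on this site.

    # OpenAI's training crawler:
    User-agent: GPTBot
    Disallow: /

    # Google's opt-out token for AI training:
    User-agent: Google-Extended
    Disallow: /

    # Common Crawl's crawler; its dataset is widely used for AI training:
    User-agent: CCBot
    Disallow: /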

Representing the world accurately and fairly

The last of these three things generative AI needs is for models to be trained specifically in ways that lead them to represent the world accurately and fairly. Today's large language models struggle with that. Without exception, they have a bias problem.

Bias disproportionately affects minorities: it further marginalises the marginalised. Practical examples include front-end code generators that output inaccessible code, automated resume checkers that statistically prefer men, medical diagnosis systems that perform worse on Black people and chatbots that assume female doctors don't exist. Some extremist LLM vendors, like Elon Musk's xAI, specifically train their models to include more bias, in an attempt to make them ‘anti-woke’ (whatever that means, if not a lexical tool for hatred). Even AI vendors that try hard to reduce bias struggle to keep it out, as it is ingrained in large parts of the training data.

There can also be bias in the people employing AI. For instance, when a product manager who is not disabled or familiar with the disabled experience decides AI-generated captions will do. Sidenote: I also hear from disabled friends and colleagues that various AI-based tools remove barriers; that is great.

More ethical implications

There are more aspects to generative AI that pose ethical questions than I can cover here.

Other surveys of the ethics of generative AI include the Ethical Principles for Web Machine Learning by the W3C's Web Machine Learning Working Group, and UNESCO's Recommendation on the Ethics of Artificial Intelligence.

Summing up

If you're not sure if or how you want to use generative AI, I hope this post provides a useful overview of possible considerations. Everyone will need to decide for themselves how much AI usage they find OK for them. I do hope we'll stop pretending not to know how the sausage is made.

For me personally, the current ethical issues mean that I default to not using generative AI. To me, “ethical AI” is currently an oxymoron. Still, “defaulting to” is the keyword here. I don't think all use of AI is problematic; it is specifically the throwing of AI at things “just because we can” that is at odds with the ethical considerations I laid out in this post.

Your mileage may vary. And in case it was not clear from my post: I recognise everyone's needs are different. People regularly share use cases that I hadn't thought of. Sometimes utility could outweigh the downsides. I'm curious to hear yours.

We're not currently weighing the ethics of AI against its utility; not everyone got the memo. If you are able to, please help by talking about AI ethics with your colleagues, using whichever part of this post makes sense to you, or any of the sources listed in it or elsewhere. Cheers!

