
NN/g UX Podcast

The Nielsen Norman Group (NN/g) UX Podcast is a podcast on user experience research, design, strategy, and professions, hosted by Senior User Experience Specialist Therese Fessenden. Join us every month as she interviews industry experts, covering common questions, hot takes on pressing UX topics, and tips for building truly great user experiences. For free UX resources, references, and information on UX Certification opportunities, go to: www.nngroup.com



This is the Nielsen Norman Group UX Podcast.
I'm Therese Fessenden.
We're finally back with a new episode to kick off the first Friday of the new year.
But before we get into the details about today's episode, I wanted to share some exciting news.
We're hiring.
We're hiring user experience specialists, which means we're looking for designers and
researchers to join our ranks, both entry level and experienced professionals.
So whether you're graduating grad school and just getting into the UX field, or if you've
worked in UX for the past few years, we encourage you to apply.
The deadline to submit applications is Monday, January 30th.
To learn more and apply, check out the announcements on our website, www.nngroup.com.
Now onto today's topic, artificial intelligence.
The term artificial intelligence, AI, is having a bit of a boom with the explosion in popularity
of tools like ChatGPT, Lensa, DALL-E, and many others.
This is naturally an exciting time with prospects of increased productivity, creativity, and
more interestingly, automation of tasks that in the past would have been considered drudgery,
tedious, or to put it bluntly, boring.
This praise has been met equally by skepticism and criticism, with cautionary tales about
AI misinformation, plagiarism, and other risks.
To sort through the mixed feedback, I spoke with Susan Farrell.
Some of you may recognize this name because Susan is what one might call an NN/g alum.
But to be honest, that would really sell short her 18 years of research and consulting
work with Nielsen Norman Group.
She's authored many articles and reports with us over those years.
But since 2017, she's been principal UX researcher at All Turtles, an early-stage product studio
where she's been spending much of her time researching various products powered by AI.
Susan and I discuss what AI is, what it isn't, and the benefits and risks that come with
these new systems.
Susan, welcome to the podcast.
So to start, could you tell us a bit about yourself and how your journey in the field
of user experience got you where you are now, researching AI products?
Sure.
Well, I started computing in 1983 when I went to college and needed a word processor.
But it wasn't until 1991 that I started working in multimedia on computers, first at Georgia
Tech, then at SGI and Sun Microsystems.
I became fascinated by user interface design and taught myself by reading books and magazines
and by going to conferences like CHI.
When I found Jakob's newsletter, I got interested in web usability testing, and I started working
with Jakob in 1999.
After 18 years of interesting client consulting, in 2017 I moved to a product studio called
All Turtles.
At the time, they had several chatbot-based products in the studio, which gave me a chance
to test some of those with users.
It turns out that people like talking to machines in some situations, but it's complicated to
make a chatbot that meets people's needs and expectations.
At least in my testing experience, people especially don't like computers pretending
to be people or people pretending to be computers.
Yeah, I can imagine it can be hard to overcome the uncanny valley.
So for those of you who don't know the uncanny valley: it's that span of product fidelity,
for lack of a better word, whether that's in images or text, where the fidelity is high enough
to seem human-like but not high enough to be convincingly human.
So it can feel uncanny, or kind of creepy, and I guess more importantly,
that creep factor undermines trustworthiness because it seems like an impersonation of
some kind, either an impersonation of a computer or an impersonation of a human.
Well, AI is an interesting term in that we've used it for lots of things for a century or
so, and what it actually is keeps kind of receding into the future.
So whatever we have now is not AI, but it's coming soon, coming soon, coming soon, kind
of the same way that VR has kind of been doing that.
But various technologies have emerged on the way.
For starters, it's important to understand that it's actually not intelligent, at least not yet,
and not in the foreseeable future.
Machines don't so much create as they remix and find patterns and relationships between
things.
What we call AI today is a mixed bag that includes machine learning, machine vision,
language models, image recognition, game playing systems, and other types of automation and
augmentation.
So we're not talking about Star Trek here, or intelligent robots, we're still talking
about machines that do clever things, some of which can learn to do things in a more
clever way.
You mentioned that they're not intelligent.
Could you elaborate a little bit more about what intelligent means in this context?
Well, that's a hot topic, and I don't know if I want to wade into it.
The main problem is that we don't know how to define human intelligence.
So it's very difficult to establish some kind of baseline and then apply that to machine
systems.
But there's a sense that in order for a machine system to be intelligent, it would have to
have its own motivations, and it would have to have a sense of itself.
I see.
So it seems like the key distinction is artificial intelligence, as we currently have it, doesn't
necessarily have its own motives.
So the AI machines, as you're saying, are capable of doing clever things and maybe doing
them quickly, but that's about it.
Is that a fair assessment?
Yes, they're augmenting things that humans do or want to do, and automating that, hopefully
to take the drudge work away from humans.
Yeah, I bet that's really top of mind for a lot of people.
I mean, obviously there are a lot more current examples like ChatGPT or Lenza and many other
organizations working on tech like this to remove the drudgery of some common tasks.
I'm wondering though, while there's a lot of promise in maximizing productivity among
other things, there are probably some trade-offs as well.
As you may know, we teach a course on design trade-offs, where we talk about how every
design decision comes with costs in some way.
So what do you think are the biggest trade-offs when using AI in a design?
Well, the biggest risk is probably the usual, over-promising and under-delivering.
What people often want and expect from these systems is beyond the technologies that we
have today.
Just as you alluded to, science fiction has kind of set us up for failure already.
With chatbots in particular, you have to choose between making a system with a narrow purpose
and controllable output that takes lots of scripting, or using a large language model
that learns but tends to absorb things you might not want in your product, such as racism
or X-rated language.
I see.
So it seems like there is a lot of potential for unintended consequences in how a machine
learns, basically.
The input seems to be a pretty important factor.
That's right.
One of the first rules I learned about computers is garbage in, garbage out.
Because computers can act only on the data and instructions you give them, it's very
important to have the data and instructions be of very high quality.
We see a lot of quality shortcuts today in how large language systems are being trained.
You can't have a system just ingest convenient parts of the internet and then expect it to
output something great.
We also see a lot of mediocre output from these systems being deployed on the web.
So it's important not to be a naive consumer of that.
And in a continuation of garbage in, garbage out, we're in a tricky time in terms of bad
information outputs becoming the next wave of bad information inputs.
And it's not just quality of data input that you have to worry about, it's the consent
of the people who made that data that's often lacking.
The rush to train these systems has left a lot of human beings as well as companies in
legal and ethical dilemmas.
Yeah, I bet.
And we've seen some awful things come out of these systems too, such as racism, sexism,
violent imagery, and other amplifications of big problems that human societies have.
I don't know how to mitigate some of these risks, but it seems worth spending the time
to do threat and risk assessments before designing.
For example, what could possibly go wrong?
Yeah, often the famous last words if not taken seriously.
Yeah.
What are the likely consequences of that?
Who could be harmed?
How might we prevent that?
For large language models, organizations tend to train them to a certain point and then
clone them and then deploy one and see what happens when it's in the wild.
And if it fails, they fall back on a clone and try to learn from their mistakes in training
the next one.
And in that way, they mitigate the enormous cost in developing these models because they
often fail and then have to be redone.
And you don't want to redo them from scratch.
Right.
It seems like when teams start building these systems, there's a lot of learning that happens.
And in a way, I mean, it sounds a lot like when you're training or like educating a person.
Like you don't necessarily want to throw out all of the potty training and elementary school
along with whatever advanced topics you're teaching now.
I imagine that's got to require a lot of work to retrain some of those more basic pieces
of knowledge that help it function.
So is version control sort of what is helping to keep this technology moving forward while
still keeping unwanted behaviors, for lack of a better term, in check?
I don't work on these big systems myself; only a few companies are able to develop them.
But from what I read, they are extremely expensive to develop.
They use a lot of energy and a lot of people's time.
And it just takes time, you know, time elapsed to train them and get them ready to go.
So that's why companies try to mitigate their risk by cloning them and having base models,
I guess I would call them, ready to go so that they can afford to try and try again.
But it's really important to realize that you can't just inspect one of these big talking
language models to find problems to fix like you can with debugging code.
You pretty much have to throw them away and start over.
That's because the complexity is ridiculous.
They have zillions of connections among zillions of data points and nobody's able to do surgery
on that.
So preventing problems of input and looking for problems in output is what UX folks have
to do.
We should try to look to the academic researchers and organizations that are publishing on ethics
and AI for guidance, for example.
Yeah, it definitely seems like a risk or threat assessment would be especially important now.
Granted, they've always been important for designs in general, but especially with what
you mentioned earlier, the fact that AI tends to be automating the drudgery of certain tasks,
but in doing so, that automation ultimately means that a design is being implemented or
that actions are happening at a scale much faster and much greater than any individual
person would ever do.
So it seems like there are a lot of risks that a team would really need to assess because
the damage is really magnified when operating at this scale.
Yeah, and nobody wants to talk about that when you're developing products.
So it often falls on UX people or privacy professionals, or, if you happen to have some
in your company (which is kind of rare), the ethicists, so that people can really look
at the risks properly. Security people are often quite helpful to team up with in this regard,
because they're used to doing threat analysis of various kinds, and they're used
to thinking about bad people doing bad things with computers.
And that's kind of the mindset you need to be in, because it's not just what could go
wrong in development. You have to figure that not all the people who encounter
the system are going to have good intentions, and with some of these systems, that interaction
with people is how they learn, so you may need to be careful how you allow people
to interact with them.
So we see a lot of systems today where the access is pretty tightly controlled and where
there are constraints in place about what you can talk to the system about or what kind
of images it will accept, and so on, because people have seen abuses in the past and they
want to prevent them in the future.
I can definitely see that as being a crucial element, whether that's a chatbot or any other
form of AI to get it up to the level it would need to be to operate effectively.
Now these risks, they do seem pretty significant, but what are your thoughts on the benefits
of AI?
We mentioned automation being a big one, but what other benefits can we see from AI?
Well, in a way, these are just better tools.
So not only are they automating some of the boring stuff for us, but they're also extending
our reach in terms of what we can do.
But today we still have what we call humans in the loop, and these are often sweatshops
where people do image labeling, word recognition, content moderation, and so on to correct and
augment the machine output so that products seem smarter than they are.
So we're a long way from robots taking our jobs in most cases, but we need to look out
for those exploited workers.
But it's great to have machines help with medical issues like cancer detection, drug
discovery, and precision surgery.
And today we also enjoy first-pass machine translation, captioning, and transcription services.
So machines can help with writing tasks, but you don't know what you don't know.
So it's hard to be sure that the output's correct and that references, quotes, or history
haven't just been invented.
So a lot of these systems just make stuff up.
And so you have to be kind of skeptical of these tools.
They're very shiny but flawed.
That's really interesting.
This makes me think of this term.
Actually, we tend to use it in user testing a bit, but it's a method you might know of
called Wizard of Oz testing, where there's a prototype you might be showing to someone,
but there's an actual person working some magic, quote unquote, behind the scenes,
taking the inputs a user gives and preparing a screen or output in real time
that looks convincingly real based on what the user did.
So even though it's not necessarily a functional prototype, it seems to work in a convincing
way, even though it's a human being doing it and not the system.
Now, with humans in the loop, is this the same thing?
Is this human in the loop?
Is this happening during the testing phase or is this happening live?
Well, there are several different ways that humans in the loop can work with systems.
One is fake it till you make it, and that may be testing prototypes, for example, like
we've talked about with Wizard of Oz method, or it could be that as the system's developed,
it needs some help, and so humans jump in.
For example, when the post office in the United States started using optical character recognition
to recognize addresses and route mail automatically, they touted this as a wonderful breakthrough.
But it turned out that if your handwriting was messy like mine, it would kick your mail
out of the system and send it to a sorting center where people would transcribe the address
into the machine. They kept this secret for a couple of years because they didn't
want anybody to know that their system wasn't smart enough to just do the job, and the
people in those sorting centers were under a lot of pressure to move as fast as
the machines move. So there's a fine line between helping the machine and having the
machine make your life miserable, I suppose.
Yeah, I could definitely see that being a really difficult position to be in, basically
needing to keep up with or even compete with the machine, which can be incredibly unrealistic
for a human being working behind the scenes.
Okay, so humans behind the scenes, these humans in the loop, they're working, as you mentioned,
often in these tough conditions, but then there's the other side of it, which is making
sure that the people who interact with AI don't abuse the system and that the quality
of the data going in doesn't result in, as you say, garbage in, garbage out.
Yeah, and there's also always a temptation to make the system appear
smarter than it is so that you can get funding, so a lot of these startups will blur the line
between the system and the human augmentation of the system in order to sell the dream of
the system to funders.
We saw some really interesting food delivery robots in the Bay Area that looked like autonomous
little robots tooling along on the sidewalk.
They looked like, I don't know, robot coolers or something like that, right, these little
square things with wheels and googly eyes or something similar on the front.
They were cute and everything, and people wanted to interact with them, but they were
uncannily good at navigating people and sidewalks and stuff.
It turned out that they were actually being driven by people in a different country who
were looking through front cameras. So there are a lot of situations like that, where people
are trying to get their startup going and the technology isn't quite ready when
they need to launch.
I see.
So once we do get to realize the benefits of AI-driven tools, they certainly seem like
they're going to be a great tool for us, but for now it also seems like there's quite a
bit of labor in getting AI to the point where it's considered smart enough to work on its
own.
Now, I don't know if there's a straightforward answer to this next question, but do you think
the benefits that AI promises that we might see in the future, do you think that these
benefits outweigh the risks that you've mentioned?
Well, I think some of the benefits outweigh some of the risks, but we have a long, hard
road ahead in preventing awful machine-written news websites, art theft, deepfake
impersonation, and many other kinds of fraudulent abuses of the technology.
Privacy violations need a lot of scrutiny and probably new legal protections as well.
So should you buy your child a talking bear?
Maybe not yet.
Some systems and their organizations just can't be trusted because your private data
is too valuable.
So we have kind of perverse incentives going on in this realm, where some companies and
organizations are taking advantage of people and their data or their creative output, people's
artwork and people's music. Because they can scoop it up, they do scoop it up, and there are
a lot of legal problems around that, and cultural problems, and privacy problems
around capturing people's voices and faces and so on, and we really haven't dealt with
that effectively in law or society.
So right now we have a lot of very unhappy people who are experiencing their creative
output being scooped up by machines and then resold as products.
It really sounds, and honestly I hate using this word, unprecedented.
I hate that word because I feel like we've really exhausted it over the last few years,
but it really is an unprecedented frontier legally speaking as well as in the realm of
intellectual property.
It sounds like it's honestly only going to get more interesting I guess and probably
a bit more of an involved process when designing a solution that uses AI output.
Yeah, it's unstoppable I think.
The question is who's going to get to steer it and where are we going to draw some lines.
So yeah, on the topic of drawing lines, do you have advice on how teams can figure out
where those lines are?
What I'm thinking is teams are probably trying to push the envelope, right?
They're trying to maximize those benefits that we mentioned earlier, whether that's
creating the illusion of benefits or whatnot, that's up to the teams obviously.
But do you have advice on how teams can push that envelope while still keeping these ethical
boundaries in mind and reducing harm?
Well, I think UX people should try hard to prevent problems with AI products and when
that's not possible, we need to stand up and call it what it is.
I think there's a lot to be said for listening for outside critique of your products too.
From what we've seen so far, it seems difficult to get traction on ethical issues from inside
the organization that pays you.
So I think as UX people, we need to test and interview and do the research that
we do, bring that back to the company, be careful what we measure, and go in with
a keen eye for what can go wrong.
You don't want to assume that revenue is the only thing you should be measuring, for example,
or that engagement is the end-all be-all because a lot of times engagement means people are
mad at you.
So you have to have some nuance in your measurement, otherwise you won't understand what's actually
going on with your product.
And doing social listening is part of that too, because people are always going to be
talking about your products and I think that a lot of social listening is hard.
And so a lot of companies don't do enough of it.
But somewhere on the internet, people are talking about your product and if you don't
know what they're saying, then you're really missing some important information.
And you don't want your product to fail big in public and then wonder if that's because
you didn't ask the hard questions.
So I think that we're tasked with something big here in our jobs, something that before
was probably only in the weaponry or medical fields where you have terrific harms that
you have to think about and guard against.
Yeah, absolutely.
So I think you had a lot of really insightful points here, one being measurement, right?
Obviously measurement is a tool we use in UX quite extensively, or at least teams that
are really aware of their progress tend to use metrics extensively.
And if you're not, it's certainly something you should be doing.
But the key is that you're not just measuring one thing, but rather having a nuanced approach
and maybe picking a couple different criteria, right?
Not just revenue, not just engagement, but maybe other criteria like how pleased are
people with the service?
Or are we combining metrics with customer satisfaction or net promoter score?
Are we thinking of both the positive and negative experiences in that evaluation?
Right.
One of the things that people don't measure well enough, I find, is the people that didn't
use the product, right?
The people that looked at it and they went, nope, and they're out of here.
We really need to know more about that.
It's one thing to measure the engagement of your customers or how many customers you have
as opposed to visitors or people trying things out, but it's the people that went away that
can tell you the most about the money you're leaving on the table or the functionality
that you didn't offer or the need you didn't fill.
Also competitor analysis.
Companies often don't want to pay to do competitor analysis, but some of the most important stuff
you might learn is from understanding who your competitors are and why they're competitive
with you.
Working with brand new products, as I do, we tend to look more at those because new
products fit into the gaps that existing solutions don't fill.
It's important to understand what will make somebody switch from what they're using today
to something new.
What are their pain points and what can you offer that will solve problems that they know
that they have?
I think these things don't change when we're talking about AI systems.
It's just that we need to look at different kinds of competitors because they might not
all be AI systems.
Just like any new product, your competitor might be the old way that everybody's always
done it or whatever, right?
There's a combination of tools or workarounds or human labor that go into solving whatever
problem that is today, and the new system might disrupt many of those things or might
augment some of those things or might replace some of those things, and so kind of getting
a grip on that is important.
I really appreciate what you said there that your competitor isn't always going to be,
like you said, another AI system.
It might be something like the current way or the old way of doing things, and I often
encounter it in the courses I teach.
I'm often teaching the Emerging Patterns in Interface Design course, and it's a common
question.
What if I'm working on something that's so new or doesn't actually have competitors?
And I don't think there's such a thing as not having competitors, because there's always
another way of doing an activity, right?
There's always an alternative to whatever it is we're designing, and I think you're
right.
It absolutely is important to spot the gaps in those existing ways as well as the gaps
in our current system.
So a good example of this is travel sales.
So back in the bad old days, you had to know a travel agent, and you would call them up
on the phone and tell them where you wanted to go, and then they would do a bunch of research
and call you back in a few days with some kind of a plan, and it cost quite a bit on
top of the trip to pay this person to do that work.
And they were a privileged group who knew the airport codes and really knew the ins
and outs of many systems, and they had special access to them.
So then along comes the web, and the airlines want to sell you tickets directly, and now
they have voice chat systems that sound just like people and do a pretty good job.
So I've been pretty impressed by these voice assistants and their ability to sell me airplane
tickets and to answer my questions.
But I was looking the other day at what it would take for me to visit one of my co-workers,
and he lives on an island off of the southern coast of Brazil, and it was a 44-hour flight,
according to Google Flights, because they only look at one airline end-to-end.
And I thought, wow, a travel agent would have done some research and figured out how to
make that fewer than four flights and maybe combine different airlines and different layovers
and airports in order to shorten that journey for me, but the computer's not doing a great
job at that because they have other incentives for making that travel plan.
And so I think when we're looking at systems like that, disrupting a system like that,
we have to ask ourselves, can we do a better job than the human can do?
And if so, what are the parameters of that, and how can we also work with the existing
infrastructure?
So today we still have travel agents, and they still do those same kind of jobs, but
they also use these new web systems to do that with.
And so their jobs have become easier in some ways because of the systems that have been
put in place, so that's the automation piece, but the traveler still has to try to figure
out when is a good time to get a human involved.
And so do systems, right?
I'm not going to buy that 44-hour flight plan ever if there's some possibility that I could
get there as the crow flies piecing together something else.
Right.
And I think it also speaks to the value of those human-to-human interactions as well.
It's often easy nowadays to default to a solution that is technological in nature,
when maybe the best solution to incorporate in the moment
is one that involves human intervention in some way.
So you're absolutely right that it does require us to be a bit more reflective of what solutions
we're coming up with, whether that's creating a better version than reality or an augmented
sort of version with the current infrastructure.
And back with travel, there's a lot of human things involved, like how are you going to
get your luggage from there to there and do it before the next one takes off?
Do you have children and what does that mean about eating or sleeping on a long journey?
What does that mean about time between flights?
So I think that we can't just look at time on task.
Yeah.
Time on task is probably not going to be the most helpful.
I mean, it can certainly be insightful, but it's probably not going to be the end-all, be-all.
Anyway, long story short, it sounds like humans aren't going to be replaced anytime soon,
but it does seem like there is some promise in our future, maybe an augmented one, as
long as we're critical of the work that we're putting together with our teams and cognizant
of the potential harms that could come with systems like these.
So it seems like as long as we keep these things in mind, then we're off to a decent
start.
Yes.
That's what I think too.
I think that we have to be careful of these technologies just like we do with any new
technology, but especially keep in mind that we can do harm at scale with some of these
big systems and that we need to really understand that before developing things to the point
where they can't be steered or mitigated.
Absolutely.
Well, Susan, I've learned a lot in this conversation and I think others will find this incredibly
valuable as well.
So thank you for joining us today.
I really appreciate you taking the time to let me get to know a little bit more about
you and the work that you do.
It's been a real pleasure.
Thank you.
I really appreciate you asking.
That was Susan Farrell, Principal UX Researcher.
She's more recently started working at mmhmm. Yes, that's the name of the app; it's spelled m-m-h-m-m.
It's a meeting app looking to make meetings and online education more engaging.
You can learn more about that at mmhmm.app, and you can also find her on LinkedIn.
So if you're on the Fediverse, she also happens to be on Mastodon at @joytrek@hci.social.
And lastly, she's graciously shared some links to recommended readings.
So if you're interested in learning more about AI, machine learning, and large language models
(LLMs), the systems powering the likes of ChatGPT, you can find all of
those reading recommendations and more in the show notes.
On the next First Friday on February 3rd, we'll be sharing an interview that we had
with NN/g UX Specialist Evan Sunwall.
In that episode, we talk all about expert reviews and heuristic evaluations.
So if you want to get notified when that or other new episodes get released, the best
way to do that is to subscribe on your podcast platform of choice.
And while you're on our show page, please leave us a rating.
It really helps us expand our reach and spread the knowledge of UX all over the world.
And of course, don't forget that you can find thousands of free articles and videos on UX
by going to our website at www.nngroup.com.
That's n-n-g-r-o-u-p.com.
This show was hosted and produced by me, Therese Fessenden.
All editing and post-production is by Jonas Zellner.
Music is by TINY Music and Moments.
That's it for today's show.
Until next time, remember, keep it simple.