Is Design Metrically Opposed?
with Jared Spool
Jared Spool: ...Let's make this happen. In 2010, an Australian designer named Luke Stevens decided that he was passionate about using data to make design decisions. So passionate that he was going to do two things.
The first thing he was going to do is write a book about it and the second thing he was going to do, which is always the first thing you do after you decide to write a book, is he was going to redesign his website.
He decided, since his thing was about designing with data, he would make his website a laboratory for designing with data. He decided he was going to do a sweet little A/B test. Variation one would be a design that would tell you all about the book he was writing, give you all sorts of information about it and have a little box where you could type in your email address and let it rip.
Variation number two would be a page that just said, "Are you a designer? Then you're going to be interested in this book. Put in your email address." No real description about the book itself.
Now, the hypothesis he was testing was that the variation with the book description would be more interesting to people than the variation without it. But then he collected some data.
The variation with the description came in with 33 email addresses. The variation without the description accumulated 77 email addresses. This was surprising because he expected it to be the other way. He expected to get more people interested by describing the book than by leaving it blank.
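Whether a 33-to-77 split is more than noise depends on how much traffic each variation received, a number the talk never gives. With hypothetical, equal traffic of 1,000 visitors per variation, a standard two-proportion check might look like this:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-statistic for the difference between two conversion rates.
    Values beyond about +/-1.96 are significant at the usual 5% level."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# 33 vs. 77 sign-ups; the 1,000-visitor figures are assumptions,
# not numbers from the talk.
z = two_proportion_z(33, 1000, 77, 1000)
```

Significance only tells you the difference is real, not that more email addresses was the right thing to measure in the first place.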
He wrote a blog post about this talking about how he was surprised by the results, but completely pleased because it proved that the data often surprises us. That the data can tell us something that we can't just tell as an experienced designer, but there were some interesting things about this.
First, the assumption that more email addresses is the right way to measure. Second, that all email addresses are equal. Let's take that apart for a minute.
The idea that email addresses are the success metric for those two designs is an interesting one. And the assumption that it doesn't matter whose email address it is -- that as long as you get it, it's got to be a good one -- that is an interesting statement, too.
We can actually label these two things. The actual collection of email addresses, that's what we would call an observation. Those assumptions about what the data is trying to tell us, we call that an inference.
These are really important things to understand because observations lead to inferences. We draw the inferences from the observations.
It is from those inferences that we go off and make design decisions. We decide what we should do better. We decide which variation we're going to make live.
We decide all these different things. We have this path from observation, to inference, to design decision.
That path in itself is really interesting. This idea that since the second variant collected more email addresses that, in fact, more email addresses are better, and, because of that, we're going to use that second variant from now on.
This is interesting because we don't know for a fact that more email addresses are better. In fact, what is he really measuring? Is he measuring collecting email addresses or is he measuring selling the book?
When he would subsequently take those email addresses and send them promotions to actually buy the book once it was done, which group would be more likely to buy it? The people who had signed up after reading the description, or the people who had signed up without any clue what the book would contain?
Which group would produce more sales? But sales was not his measure of success. Collecting email addresses was.
It makes me wonder, what would have happened? Unfortunately we'll never know. He never wrote the book.
Probably because he spent all his time redesigning the website.
What we have here, basically, is this idea of observation, inferences, and design decisions. What did we see, why do we think it happened, and what will we do differently in the design?
Back in the mid-2000s, Wells Fargo came out with a brand new version of their website. This was a remarkable website design, because it arrived right around the transition when everything went from the grey backgrounds of the early sites to things that were actually touched by designers.
They put this thing up and they were really intrigued by the results they were getting. They were getting all sorts of amazing results, and they started to dive deep into their data and figure out what was going on.
One of the questions they wanted to understand was what were people doing with the search engine box, what was happening there. They went and they started looking at the log file of every search term that people entered.
The most popular search term that was in the database of the log file was not, "How do I find my nearest ATM?" It was not, "What are the mortgage rates?" It was not, "How do I make a deposit?"
The most popular search term was nothing. The log file was filled with blank strings.
That's interesting, right? You've got this log file that's filled with empty search terms. What is it trying to tell you? What is that data trying to say?
The team came up with a bunch of hypotheses. One was that button-focus was broken.
This idea that maybe what was happening was people were entering their user name, their password, and then hitting the enter key. Because the focus was on search it was actually going off and doing a search query and not logging you in, thus creating a log of empty search terms. That might have been one thing.
Another theory was that they didn't enter any text at all -- that, in fact, this was a new pattern for people, this idea that you have a box to fill in and then you press Search.
Prior to this, on most websites you pressed Search first and then got a screen with boxes to specify how you wanted to search. This was new for people, and maybe they didn't understand that that was the pattern they were supposed to use.
Another theory that floated around was that people really wanted advanced search. In the previous version, that search screen was an advanced search screen.
There was a big concern amongst the team that once you take away advanced search you're going to upset the entire world. They were thinking, "Well, OK, maybe they're just pressing search so they can get to the advanced search screens, since there doesn't seem to be any affordance to catch them there."
Then there was a small contingent that actually thought it was the log software that was broken. That, in fact, people were typing stuff in, but for some reason it wasn't getting into the log file.
Here we have four different theories, and they're all based on one observation, the observation that the log file contained blanks. From that one observation we could draw four separate inferences.
From those four separate inferences the things we would do to make the design better are actually completely different. Depending on which inference we choose, we're going to go in a completely different direction.
How do we know which is the right inference? I guess we could do all four of these design decisions and hope we've covered all the bases.
Here's the thing: I've been running an experiment for years. I get a roomful of designers like you, and I show them that screen.
I'll have everybody in the room write down why they think it's happening. Write down the inference that comes to mind.
Sure enough, those four inferences will be represented in a group this big about equally. Interestingly enough, every single one of those people will have jumped to the conclusion that whatever inference they came up with is, in fact, the right one.
One of the things we've been learning as we've been studying great design teams is that the best designers never stop at the first inference. They use the first inference to create a set of tests to see if there are other explanations.
What they'll do is more research. For instance, they'll do some usability tests.
In the tests maybe they discover that users didn't know that they were supposed to type a query in the box. And as soon as you have that data, three of those inferences just drop away. And we only have one left.
So it turns out that the research we do is all about turning our inferences back into observations so that we can make better decisions. I want to show you an interesting trend that I recently discovered. I've got these four companies, and on their home pages I've discovered these four trends. Snapchat is the worst. Facebook is the best.
I want you to tell me: what do you think it is? What do you think causes this distribution? What could it be about a home page?
What do you think?
Audience: Time on site?
Jared: Time on site. That's a good thing. Maybe it's that. What else?
Audience: User registration?
Jared: User registration. That's an excellent thing. What else could it be?
Audience: Fail rate?
Jared: Failure rate, yeah.
Audience: User log ins?
Jared: User log ins. OK. What else?
Audience: Total page load time?
Jared: Total page load time. That's a great one. Performance is critical. It could be that. What else?
Audience: Tasks are different?
Jared: The tasks are different. Yeah.
Audience: How blue it is?
Jared: How blue it is. You're getting closer.
Jared: Here. I'm going to give you a hint. If we compare valuations, they line up almost perfectly. What is this data that could correlate so nicely with valuations?
Audience: Active users.
Jared: Ads. Active users. Yeah. None of those things. It's the number of E's on the home page.
Seriously. Snapchat has virtually no E's whatsoever, whereas AirBnB has more. Uber has about the same, and Facebook, they "E" out the wazoo.
Now here's the thing. Counting E's is a stupid metric.
Right? There isn't anybody here who thinks this is a good idea. But let's take that apart for a second. Let's take apart this word "metric," because we use this word all the time. I don't think it means what we think it means.
In order to understand this word, we sort of have to put it in contrast with two other words -- "measure" and "analytic." Two other words we use all the time. A measure is something we can count. So counting the E's is a measure. Is it a good measure? Is it a bad measure? It turns out there are no bad measures. Measures are just things we can count. There are hard measures. There are easy measures, but measures are measures.
Metrics are things that we track. We want to see some change in that. If in fact my hypothesis is correct, that the number of E's leads to higher valuations, increasing the number of E's on the Snapchat site could make the owners very rich.
So let's track how many E's show up on the site.
An analytic is something that we can get the computers to count. Something they can track.
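To make the three terms concrete, the E count clears the bar on all three fronts: it's countable, a computer can count it, and we can track it over time. A throwaway sketch (the tagline strings here are made up, not real homepage scrapes):

```python
def count_es(page_text):
    """The measure: something we can count."""
    return page_text.lower().count("e")

# The analytic: the computer does the counting for us.
# The metric: we track the count across snapshots over time.
snapshots = ["See what friends are up to", "Belong anywhere"]
history = [count_es(s) for s in snapshots]
```

None of which, of course, makes the number useful.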
What's missing from any of these definitions is whether these things are useful. We know the E thing is probably not useful, but what about a variable like "time on page?" Time on page is something that we can collect very easily. The software does it for us, so it's an analytic, and we can track it over time, so it's a metric. It's all three of those.
What does it tell us? This is a two-month period for an article that I wrote. Time on page is interesting, because it follows this cycle of people spend more time reading my articles during the week than on weekends, except for December 17th. What the hell happened on December 17th?
It took me a little bit of research to figure it out, but I figured it out. The reason that number is so much higher is because December 17th was National "I Have to Pee" Day. More people were peeing on that day, and the time on page just went through the roof. That's the only explanation I have.
Why does time on page change? Certainly, the number of words isn't changing. Is it just slower readers on December 17th? Is that just because we are so tired of Christmas shopping, we're just going to stare at our screen and not know what we're looking at? What is it about December 17th?
Here's the thing -- time on page never tells us anything useful. We can't tell if people are confused. We can't tell if people are interested. We can't tell anything based on time on page. It's something that the software will collect for us, and Google Analytics is happy to produce a chart for it, but it doesn't actually tell us anything. That's not the only thing Google does that doesn't tell us anything. They've got a whole list of things that don't tell us anything.
We can look at these numbers all we want, but there's nothing here. Bounce rate. Bounce rate is the most-cited statistic by people who are trying to validate their content decisions. "Our bounce rate is high, so we need to write better content." Or, "Our bounce rate is high, which means people are coming and finding out exactly what they want. Our content's good enough." You pick which side of that argument you're on, and then you can interpret bounce rate to support any argument you want.
This is not just an analytic. It's an agenda amplifier. Bring your agenda to the room, and we'll support it with the data we got. That's the problem, because when we look at something like this, we have no clue what we're supposed to do differently, so we come to the meeting, already know what we want to do differently, and we just read the data to support our decision.
Google Analytics, and all the analytics tools for the most part, have a raft of things they can't tell you. They cannot tell you what's useful. They cannot tell you who is spending the most money on your site. They cannot tell you how to improve your content. They can't even tell you why somebody clicked on something. We can collect all the observations we want, but the inferences are left open, because Google Analytics will not tell you why.
Without knowing why, we really can't make good design decisions. We do not need more analytics. What we need are metrics that help us improve our user experience. We have a tool at our fingertips to do just this. It's called a "journey map." Those of you who've been hanging around with Chris Risdon in the last two days have seen variations on this theme.
A journey map, basically, takes a series of things that the user does as they use our design, and we put it on a scale from "extreme frustration" to "extreme delight," and we map out whether they are frustrated or delighted.
This is a basic tool of user experience, and if you're not using it in every meeting, you are missing a huge opportunity, because it allows us to bring the user to the table and talk about the experience they're having.
One of the things that makes it so powerful is that it lets us home in on what is frustrating, because that's where the biggest design opportunities are.
The biggest design opportunities are what frustrates our users, and there are so many things that can frustrate our users. We have confusing content, incomplete information, they're having issues with their password, features they can't find, navigation they can't find, navigation they can find but it doesn't make any sense, and error messages.
We can home in on each of these, like error messages. So many error messages. Phone numbers can't have dashes or spaces. It takes 10 lines to write the code that validates and puts up the error that says, "You can't have spaces or dashes in your phone number."
It takes one line of code to take out the damn dashes and spaces.
Why does this even exist?
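The one-line fix being described -- normalize the input instead of rejecting it -- might look like this sketch (illustrative, not any particular site's code):

```python
def normalize_phone(raw):
    """Keep only the digits, silently discarding the dashes, spaces,
    dots, and parentheses the user typed."""
    return "".join(ch for ch in raw if ch.isdigit())
```

`normalize_phone("(800) 555-1212")` comes back as `"8005551212"`, no error message required.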
That stupid credit card security code that gets erased whenever there's some other error on the page. Then you correct the other error and submit it, and it says, "But you didn't put in your security code." And I say, "I did put in my security code. You have the memory of a goldfish."
And the all-time worst error message ever -- "User name and password do not match." Whose problem is that? I have logged into your site every day. You have seen me log into your site every day.
Yet, one day, I forget a letter or keep my shift key down when I shouldn't, and suddenly, you're all over my case about it.
Let me tell you a story about that. We were doing a study for a large e-commerce vendor -- huge e-commerce vendor, one of the top 10.
They had asked us to come in and figure out how to deal with the fact that people, during their checkout process, were dropping off. We asked them, "Show us the statistics for the checkout process."
They produced the page-view data for the checkout process, and sure enough, as you moved your way through it, there was a slight drop-off.
We're like, "OK. Let's look at this. We'll figure out why this is happening. We'll run some usability tests and do this." But one of the things we know about e-commerce sites is that you can't ask people to pretend to buy something.
When they pretend to buy something, they actually behave very differently than when they're buying something they actually want. We asked them if we could run the test with real customers buying real stuff that they wanted.
They said, "Sure. But to do that test, what we'd have to do is we'd actually start before the shipping information page." We said, "OK. Show us the data from what happens before the shipping information page."
Sure enough, when we put that in context, the story was really different. You'd shop for the product, and, by definition, the way we were looking at the data, there wasn't any drop-off, because we were only looking at people who were getting to checkout.
But then, when they clicked on the "Shopping Cart Review" page, pressing that button that said, "OK. I'm ready to buy," a lot of people weren't buying. More than 50 percent of the people who got to that page weren't buying.
We said, "OK. We can work on the rest of checkout, but this seems like a huge drop-off here." They said, "Oh, we've got that covered." "What do you mean?" "We've got that covered because this happens on all e-commerce sites. Everybody does this.
People are always putting things in their cart, but then they decide not to buy. We're going to fix that problem with marketing."
"We're going to send them emails saying, 'Hey, you've got stuff in your cart. Do you want to buy it?' We'll do that every couple of hours till they buy it."
We're like, "OK. You guys have been doing this for a long time. I'll trust you that that's it. We'll keep working on the checkout thing. That does seem a little weird to me."
Then we get into the usability test. We have the first customer come through, and what we learn is that it doesn't actually go from "Review Shopping Cart" straight to shipping information. In fact, there's a page where you have to log in; then another page where, if you can't log in, you reset your password; then another page where you have to get an email and click a link to reset; and then another page to enter your new password. Only then do you get to your shipping information.
One usability participant after another, because we were in the lab, was not remembering their email address, not remembering their user name, and we kept seeing these screens come over and over and over again.
We were fascinated by the fact that these screens kept showing up over and over and over again, so we wanted to figure out what was going on here.
Now, the data that we were originally given said this, but it didn't have any of those pages in it.
We asked them, "Can we get the page views for those intermediate things?" They said, "Absolutely, on Monday."
On Tuesday they said, "We can't," because it was Monday evening that they learned that they've never collected any statistics on any of those pages. They have to go and rig up their analytics tools to do that, which was complicated because those pages are actually owned by the security and fraud people and not the design people.
It turned out this was a difficult ordeal, but we finally started to get some data, and when we got it, it was fascinating. It turns out that the Login to Account page was getting three times as many views as the Review Shopping Cart page.
One inference from this is they're bookmarking the Login to Account page, but there is no other way from the site to get to that page other than clicking on that, "I'm Ready to Buy." Why would it be three times as big?
As we looked further at the data, we saw that the Request Password Reset page was not three times as big as the Review page, but almost as big. Almost everybody who was going to that page was resetting their password.
A bunch of them, but not all of them, in fact only about two-thirds of them, were clicking on the "Email to Reset." A handful of those people were actually making it and changing their password, and a whole bunch of people were not shopping after that.
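The shape of that funnel can be sketched with hypothetical page-view counts -- the talk gives ratios, not absolute numbers:

```python
# Made-up counts matching the ratios described: Login at ~3x Review,
# Reset almost as big as Review, about two-thirds clicking the email link.
funnel = [
    ("Review Shopping Cart",   100_000),
    ("Login to Account",       300_000),
    ("Request Password Reset",  90_000),
    ("Click Email Reset Link",  60_000),
    ("Shipping Information",    40_000),
]

def ratios_to_first(steps):
    """Each step's views as a multiple of the first step's views."""
    base = steps[0][1]
    return {name: views / base for name, views in steps}
```

A login page with three times the views of the page that feeds it is the observation; the inference still has to come from watching people fail.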
This is a very different story. This is a very different picture, and so we did something else.
We asked them to go into their database and add up the dollar value of all the shopping carts where people were not getting to the Shipping Information page -- that part of the funnel. It turns out that it was worth $300 million a year.
Why was that Login page three times as much? Our usability testing told us the answer to this.
People were putting in their user name and password wrong three times. The second and third times, they got the error message, "User name and password do not match." They were seeing that page three times, and then resetting their password.
It turns out that most people these days have multiple email addresses and they can't remember which one they signed up for. They would type in an email address and a password.
Maybe they got the password wrong. Maybe they got the email address wrong. They don't know which one and the system damn right ain't going to tell them because "bad guys" could figure out their email address and password, "So we'll just let the money go away."
Within two weeks of identifying this problem the team decided to build a guest checkout capability and they recovered all $300 million the next year.
What we did was we used qualitative findings to drive the quantitative analysis. This is completely different than opening up Google Analytics and saying, "What the hell does time on page tell us?"
We had specific questions, and we were seeking specific answers. We were able to get them with a bit of work.
It turned out that the team made a whole bunch of crappy inferences. Their checkout process steps were not what they thought they were.
They thought it was normal to lose customers when it wasn't. They didn't know that there were additional steps because they didn't have the analytics installed properly. They thought everything was instrumented right.
Here's the thing, we could look at this data and compare it to the journey map we saw in our usability tests and it matched up perfectly. We knew where the frustration was.
We knew what the issues were. Before this, we had been under the assumption that things in the lab were not the same as at home.
We would come up with rationales like, "Well, it's not their own computer. Because it's not their own computer, they don't have their setup right" -- which actually is true. The site did have a cookie that would remember your user name and password for six months or something, and then you'd have to log in again.
We didn't know how huge this problem was, yet the usability study told us everything. The lesson we took from this is that if we're going to do quantitative research we should start by focusing on all the frustrating bits in our apps.
Start there. Figure out how often does this happen in the real world.
Every time I start talking about this stuff with my clients I get these excuses, and they are no longer acceptable. One of them is, "Well, we don't have control over the analytics.
"That's a different group, in a different building, under a different vice president." That's not an acceptable thing anymore.
This idea that we own the experience and they own the quantitative data is the wrong answer. We've got to own the quantitative data, at least that data around behavior.
I couldn't give a crap about the data around attitude. But the data around behavior, what people are actually doing, that quantitative data, that's got to become part of the UX team's effort, and it is.
This is a guy named DJ Patil, and he just got a new job. About a month and a half ago, he became the chief data scientist for the US government. He works in the White House. His boss is the president, and his job is to look at the nation's analytics and figure out what it's trying to tell us.
Now, if the White House is smart enough to have a data scientist, we should have one too. Data science is now an essential skill for every UX team. If you don't have people who understand how to do data science, you cannot create great designs.
Another excuse I hear -- "I don't understand what the metrics mean." That might be because the metrics mean nothing. It could very well be that they are just gibberish. You should question what they mean.
Another one I hear all the time is, "I'm a designer. I ain't good with the numbers." It's not hard stuff. This is not rocket science. I know this is not rocket science, because NASA's one of our clients and they have very strict definitions as to what rocket science is, and they have told us this is definitely not it.
We have to start thinking in terms of adding qualitative research and quantitative research into our toolkit. These are essential tools, and they have to be blended together.
We've started referring to design as the rendering of intent. We have an intention in the world, and we're going to render it. That's what we do.
If you go to Twitter and look up Terry Virts's Twitter account, he tweets these amazing pictures pretty much every hour as part of the NASA effort. The interesting thing is, the pictures show up right in the tweets. You don't have to do anything special. You can just go to the Twitter account, scroll through, and see all the amazing pictures. Of course, if you click on them, you'll get them in higher resolution, and they're absolutely gorgeous. I highly recommend you do this.
Recently, my friend Mike Monteiro went to Japan, and he took a ton of pictures, but his pictures weren't showing up. They were just links. In order to see his pictures, I had to click on the link -- and it was worth doing. It's always worth doing.
Why wasn't the picture just showing up? Why did I have to keep clicking on the links? It's because Instagram doesn't want you to see the picture. They want you to go to instagram.com, because as soon as you go to instagram.com, you are what they call a monthly active user, and you've just made them more valuable. Turns out that Silicon Valley has convinced itself that the more monthly active users you have, the more valuable a company you are.
Now, keep in mind, Silicon Valley is desperate to figure out what makes companies valuable, because they keep creating companies that have no actual business model. Since you can't use the business to figure this out, they've made something up -- what they call MAUs, monthly active users. They're chasing the MAUs. It's a completely arbitrary, silly metric. The day Facebook acquired Instagram, they turned off the in-line pictures in Twitter, because they wanted the MAUs. That was what's important. When you hear Facebook report their quarterly earnings, the first statistic they give is how many more monthly active users they had than last quarter.
The thing is that this isn't just for bookkeeping purposes. It's changing design. LinkedIn's group capability, which is this discussion forum, used to give you the entire conversation in your email. Now they give you the first eight words and you have to click a link.
Facebook does the same thing. You'll get a comment on a post, but they won't tell you which post it was a comment on. You have to go to Facebook to figure it out. That ups their MAUs. The metrics they are using to value their company are driving the way the design works and creating a worse experience.
The medium of design is behavior. That's what we use as our craft, and what we're doing is letting the behaviors be dictated by these crazy metrics.
Now, not all of them are crazy. Some of them have some merit. Medium, for example, is letting us actually see not only how many people come and read an article we've written, but how many have actually read it all the way through. They separate out this idea of views and reads. A read is someone who's made it to the bottom of the piece. Again, something the software can calculate easily, but it turns out to have a lot of meaning. You can actually see the difference between views and reads, and you can start to make decisions about the quality of your writing based on that.
This is not an accident. This is intentional. This is designed. By focusing on that, they are driving their metric collection by thinking about the experience they want their users to have, not the other way around.
Now, I lied to you -- I do care about attitude. I'm going to give you a list of attitudinal words. I got a bunch of them here. I want to see if you can pick out the one that's a little different than the rest. I've got delightful, amazing, awesome, excellent, remarkable, incredible, satisfactory. Which word is different than the others?
[audience member speaks]
Yes, satisfactory. Why is that word different? Well, if we were talking about a restaurant, a restaurant we loved, would we say, "Wow, that was incredibly satisfactory"?
No. We'd use something like delicious. Satisfactory is this neutral word. It's like edible. No one raves about a restaurant that is edible. "Oh my God, you should've gone to this place we went to last night. It was extremely edible!"
Nobody says that. We don't strive for edibility in our restaurants, and we should not strive for satisfaction in our designs. We have set ourselves a low bar. We can do so much better.
It's about how we create the scale. We start with a neutral point in our scales. This is how a five-point Likert scale works. We add two poles to it -- in this case, satisfied and dissatisfied -- and then, because we think that people can't just be satisfied or dissatisfied, we enhance those with adjectives like "somewhat" or "extremely." But "extremely satisfied" is like "extremely edible." It's not that meaningful a term.
What if we made satisfaction the neutral point, and built the scale around delight and frustration? Now we've got something to work with here. Now we've got something that tells us a lot more. We should not be doing satisfaction surveys; we should be doing delight surveys. We need to change our language at its core to make sure we're focusing on the right thing. Otherwise, we get crap like this.
If you fill out a survey that has 45 10-point scales about every attribute of the product or the service, that company is basically telling you they couldn't give a shit about what your experience was, because there's absolutely no way they can tell the difference between a seven and a six. If suddenly you were giving sevens last week and this week you're giving sixes, what do they need to change? Can't tell.
I was so disappointed when the people at Medium sent me this. "How likely are you to recommend writing on Medium to a friend or colleague?" It's not even a 10-point scale. It's an 11-point scale, because 10 was not big enough.
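That 0-to-10 question is the standard Net Promoter question, and the arithmetic behind the score is simple: the percentage of promoters (9s and 10s) minus the percentage of detractors (0 through 6), with 7s and 8s counted as passives:

```python
def net_promoter_score(responses):
    """NPS: percent promoters (9-10) minus percent detractors (0-6).
    Responses of 7 or 8 are 'passives' and count in neither group."""
    n = len(responses)
    promoters = sum(1 for r in responses if r >= 9)
    detractors = sum(1 for r in responses if r <= 6)
    return 100 * (promoters - detractors) / n
```

Two promoters and two detractors out of five respondents net out to a score of zero: the scale compresses a lot of very different experiences into a single number.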
This is called a Net Promoter Score, and if you look at the industry averages that everybody wants to compare themselves to, the low end is typically in the mid-60s and the high end is typically in the mid-80s. You need an 11-point scale because, if you had a 3-point scale, you could never see a difference. Anytime you're enlarging the scale to see higher-resolution data, it's probably a flag that the data means nothing. Here's the deal: would a Net Promoter Score for a company like, say, United catch this problem?
Alton Brown bought a $50 guest pass to the United Club in LA and had to sit on the floor. I wonder what his net promoter score for that purchase would be? It probably wouldn't tell anybody at United what the problem is. But that's a negative. What about the positive side?
What's actually working well? Customers of Harley-Davidson are fond of Harley-Davidson, so fond that they actually tattoo the company's logo on their body. This is branding in the most primal of definitions.
Would you be able to capture that from net promoter score? I mean, that's loyalty.
Or would you catch the idea that people want your product so much that they will line up at four in the morning to be the first ones to buy it? The same people lining up are the people who, four months earlier, were complaining about how unsatisfactory the new features were while watching the live feed of the keynote.
You can't get that data from this, so where do we get it from? Turns out that there's a whole bunch of ways to do it. One of my favorites actually showed up in this article a while back called, "The Constant Customer." It was put out by the folks at Gallup, the people who do the presidential polls.
It turns out they did a study on what makes a customer engaged with a company. They came up with an 11-question survey that they divide into five categories, and it's brilliant.
It starts, actually, down at the bottom with the loyalty category, which basically says, "I'm satisfied, I'm likely to do this again, I would definitely recommend this to a friend."
Then it goes beyond that to confidence: "I can trust this company, they always deliver what they promise." I'm thinking Alton Brown would not agree with those statements.
They go to integrity, "The company always treats me fairly, I can count on a fair resolution to any problem I might have." That's a big question.
That's really huge because if I know that I can count on the company taking care of me I have a completely different attitude whenever a problem comes up.
I am much less frustrated by problems that I know will get good resolution than ones where I feel like now I'm going into the customer service pit of hell.
I'm proud to be a customer of this product's company, so proud I will forever wear their logo on my ass.
How many of you have customers at your company who will go out and tattoo your logo on their ass?
"This company always treats me with respect," that's huge. The best one, "This company is perfect for people like me." Ever see an Apple fanboy get defensive about Apple when Apple has done something incredibly stupid? "Oh yeah, well they..." [grumbling noises]
Finally, passion: "I can't imagine a world without this product's company." This is a bit extreme. It's way out there. But think about it for a second.
Think of the best restaurant you've ever eaten at, the one you would go to in a heartbeat. If you learned today that they were closing their doors forever would you be sad? Would that really disappoint you? There are people...Apple customers. If Apple disappeared they'd all dress in black. They already do.
There are Microsoft customers who believe these statements too. There are just not quite as many. That's the CE 11, customer engagement. The interesting thing is that even though it's an 11-question survey, each question just has three answers: "I agree," "I'm not sure," "I don't agree."
What we can do is we can quantify this by giving each one of those a score. Now we have 11 questions. If they answer all -1 it's -11, if they answer all +1 it's +11. We have a range. A nice 23-point range.
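As a sketch, that scoring is just a sum over the 11 answers. The question text and the exact answer labels here are paraphrased from the talk; the +1/0/-1 mapping is what produces the -11 to +11 range described:

```python
# Score a CE-11-style survey: "agree" = +1, "not sure" = 0,
# "disagree" = -1, so the total runs from -11 to +11.
SCORES = {"agree": 1, "not sure": 0, "disagree": -1}

def ce11_score(answers):
    """Sum the 11 per-question scores into one engagement number."""
    if len(answers) != 11:
        raise ValueError("expected answers to all 11 questions")
    return sum(SCORES[a] for a in answers)

print(ce11_score(["agree"] * 11))     # +11: fully engaged
print(ce11_score(["disagree"] * 11))  # -11: fully disengaged
```

Because every question contributes at most one point, a shift of a point or two in the total is traceable to specific questions, which is what lets you locate the problem.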
But it's interesting because we can hone in on exactly where the problem is by looking at the different questions. This is exactly what we've done.
We went out and studied five different e-commerce websites and had people shop for things they were ready to buy on those sites and we measured every step of the shopping process with this survey just to see what would happen.
It didn't annoy anybody that we kept asking those 11 questions because once you've done them like twice it takes you like 10 seconds to go through and rank them.
But what was really interesting was Amazon started at a 6.2, remember it goes -11 to +11. 6.2, that's really high. But that was before people started shopping on the site. When we asked them again when they were done with their purchase it had dropped to 5.5.
Not only that, because we watched them all we knew what was causing that problem.
Best Buy started at 4.5 but at the end of shopping had dropped just a little bit to 4.3, not nearly as frustrating. But it didn't start as good. People were not as engaged with the brand to begin with.
Dell started at 3.0 and dropped amazingly down to 1.4. HP started at 1.4 and dropped all the way down to -1. Extremely frustrating sites.
One of the ones that was most interesting to us was Walmart.com which started at 0.5 because no one believed anyone shopped on Walmart.com. Yet by the end of the shopping experience it had gone up to 1.1. Not a lot, but not the direction we expected.
That was interesting. It was like, "Hey, this actually can work. I'll give it better scores." But not as good a score as what Amazon ended up with.
Now we can do that metric thing of comparing baselines against other things using this score, but at the same time we can hone in on specific problems and start to look at what's causing them.
Here's the other interesting thing about the CE 11. It's a set of stairs. You can't get to confidence until you have gone through loyalty.
If you're not happy enough to recommend the product, you're not going to be confident. You're not going to feel it has good integrity if you're not confident. You're not going to have pride in it unless it has good integrity. And you're definitely not going to be passionate about them unless they do everything else.
This is actually a statistical construct known as a Guttman scale. Guttman scales are a series of questions that build on each other. In fact, we don't have to know the answers to all the questions to know where we stand. It turns out to be a really useful tool.
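A minimal sketch of that stair-step property: in a valid Guttman pattern, the answers are a run of "yes" up to some step and "no" after it, so the count of endorsements alone tells you which step the respondent reached. The step names below follow the five categories from the talk; the code itself is illustrative, not Gallup's method:

```python
STEPS = ["loyalty", "confidence", "integrity", "pride", "passion"]

def is_guttman_pattern(endorsed):
    """True if endorsements are cumulative: once a step is not
    endorsed, no later step is endorsed either."""
    seen_gap = False
    for yes in endorsed:
        if yes and seen_gap:
            return False  # endorsed a higher step after skipping one
        if not yes:
            seen_gap = True
    return True

def highest_step(endorsed):
    """For a valid Guttman pattern, the sum alone locates the respondent."""
    count = sum(endorsed)
    return STEPS[count - 1] if count else None

print(highest_step([True, True, True, False, False]))  # integrity
```

This is why "we don't have to know the answers to all the questions to know where we stand": the total is enough.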
We can use metrics to drive us, to eliminate where frustration's coming from and figure out how to deliver delight by having the right scales.
Now one of the metrics that is used all the time is conversion rate. Conversion rate is the number of people who purchase divided by the number of people who visit. You will hear conversion rate come up all the time.
Some people even use it on non-e-commerce things, like Luke Stevens kept referring to conversion rate for the number of people who actually put in email addresses. That's not really a conversion if you want to count sales as the final target.
In his case it should have been sales but because he never wrote the book he could never collect that data. We can take this apart.
We can say that the number of people who visit, let's say we have a million visitors to our site. The number of people who purchase, let's say it's 10,000. Well, that means our conversion rate is one percent.
We can then look at this in more depth. The problem with conversion rate is it's a ratio. Ratios make horrible metrics because there are actually two variables we're manipulating.
For example, we can up ourselves by saying we have 20,000 purchases with a million visitors, that gives us a two percent conversion rate. Definitely better than the one percent conversion rate we just had.
But we can also get to a two percent conversion rate by keeping the same number of purchasers and just halving the number of visitors. If we spend less on marketing we will actually raise our conversion rate and this is hard for people to get their heads around.
I stand in front of rooms of executives saying, "I can tell you how to raise your conversion rate tomorrow: just chop your marketing budget by 90 percent." They look at me like I just told them to throw out their child. But that's how it works.
If we start measuring this against something real, like the average purchase price, suddenly we see that a one percent conversion rate in this instance gets us a million dollars. A two percent conversion rate, when we have 20,000 purchasers over a million visitors, is $2 million. When we have 10,000 purchasers over half a million visitors, we're back at a million dollars, even though we have the higher conversion rate.
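The arithmetic above is easy to check directly. The $100 average purchase price is an assumption on my part that makes the talk's numbers work out; the point is that conversion rate and revenue can move independently:

```python
def conversion_rate(purchases, visitors):
    """Ratio of purchasers to visitors."""
    return purchases / visitors

def revenue(purchases, avg_price=100):
    """Revenue at an assumed $100 average purchase price."""
    return purchases * avg_price

# 10,000 purchases from 1,000,000 visitors: 1% conversion, $1M revenue.
print(conversion_rate(10_000, 1_000_000), revenue(10_000))
# 20,000 purchases, same visitors: 2% conversion, $2M revenue.
print(conversion_rate(20_000, 1_000_000), revenue(20_000))
# Same 10,000 purchases, half the visitors: 2% conversion, still $1M.
print(conversion_rate(10_000, 500_000), revenue(10_000))
```

The last two lines show the trap: both scenarios report "two percent," but one makes twice the money.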
I would tell them, I'll say, "You've got two doors to choose from. One, I can give you a better conversion rate, the other one, I can get you more revenue but a worse conversion rate, which one do you want?" And they have to think about it.
Because they have been so trained to chase the conversion rate that they're not parsing the question. And finally, after they've given it some solid thought they'll go, "I guess we want the more money."
At which point you're like, "Why are we talking about conversion rate?"
Here's another great problem with conversion rate. Let's say I've got a customer who comes to the site and checks out the product. Comes back, checks it out again. Comes back, checks it out again, and then finally makes a purchase. Is this one purchase to four visits, or is this one customer finally making their purchase?
Do we optimize for only first time purchases, or do we actually let our customers have the time they need to make a decision, to print out the information, to take it to a higher authority for approval, like their boss or their spouse?
It seems that we're optimizing for the wrong things. Conversion rate is like time on page: it could mean anything you want it to mean. If you torture data long enough, it will confess to anything you want.
In other words, I don't understand what the metrics mean. Let's figure out what they're really trying to tell us. We can look at this and we can figure this out.
Imagine an e-commerce site that has $4 million of revenue a day. That's actually not very hard. That comes to about $1.4 billion, and that would put it in the top 30 e-commerce sites right now. They have a million visitors a day, which means the average revenue per visitor is $4.
Let's draw this out. This circle represents all of those million people who visit every day. This site has a conversion rate of 1.6 percent, which means they have 16,000 buyers every day. Where do they appear on this circle? That little blue dot in the center.
Here's a question for you. That blue dot in the center represents the 1.6 percent of all the people visiting the site. The site makes $4 million a day. How much revenue is that blue dot responsible for?
[audience member speaks]
$4 million. Everything that's not blue has no value to the company. None. Zero.
Here's the other interesting thing. This is a real company that we work with. When we got to know them, we learned something. We learned that of that blue dot, 20 percent of them are their top buyers. Their top buyers produce 80 percent of that revenue -- 3,200 of those buyers every day.
We could draw them in too, though it was hard. My software doesn't like a circle that's only two pixels big.
That's it. 80 percent of the revenue. That means they are basically responsible for $3.2 million of that $4 million, that tiny dot there.
Let's say we're going into our analytics, looking at the data, just picking out users and trying to figure out what they're doing. The odds of us hitting one of those people who bought -- let alone one of the top buyers -- there's no way we're going to hit that target.
It's crazy to think that our analytics tool can tell us anything about those people. In fact, our odds for a buyer are 16 in 1,000; for a top buyer, it's 3 in 1,000. Would you go to the bank on design decisions where you took random data and hoped you got the 3 in 1,000 that's important?
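Those odds fall straight out of the numbers in this example (1,000,000 visitors a day, a 1.6 percent conversion rate, and top buyers being 20 percent of buyers, all reconstructed from the talk):

```python
visitors = 1_000_000
buyers = int(visitors * 0.016)    # 1.6% conversion -> 16,000 buyers a day
top_buyers = int(buyers * 0.20)   # top 20% of buyers -> 3,200 a day

# Odds of a randomly sampled visitor being a buyer / a top buyer
print(buyers * 1000 // visitors)      # 16 in 1,000
print(top_buyers * 1000 / visitors)   # 3.2 in 1,000
```

So a random sample from the analytics tool almost never lands on the people who matter most.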
Would you bet a year's salary that you made the right decision? What if we did this differently? What if we turned it upside down? What if we actually focused on the blue dot?
We start there. Those folks make up 80 percent of our buyers but contribute only 20 percent of our revenue. What if we then look at that yellow dot? They represent the 20 percent of our buyers who make up 80 percent of our revenue.
What if we just study them? Guess what? In e-commerce, we know how to do that because we have their username and password. We can see when they log in. We can pull out just that data and look at them. When we do that, what we're looking at is a large amount of money.
Every year, those 20 percent that represent 80 percent of our revenue account for $1.1 billion of revenue to the company. Those blue folks are almost another $300 million.
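Roughly, with $4 million a day and the 80/20 split the talk describes, the yearly figures check out (the talk rounds them to "$1.1 billion" and "almost another $300 million"):

```python
daily_revenue = 4_000_000
yearly_revenue = daily_revenue * 365            # $1.46 billion a year

# Top 20% of buyers produce 80% of revenue (the talk's 80/20 split).
top_buyer_revenue = yearly_revenue * 4 // 5     # ~$1.17B from top buyers
other_buyer_revenue = yearly_revenue - top_buyer_revenue  # ~$292M from the rest

print(top_buyer_revenue, other_buyer_revenue)
```

Integer arithmetic keeps the split exact; the talk's quoted figures are just these numbers rounded down.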
What happens if we start to map out their experiences? We create a little journey map. We figure out, for those yellow folks, what they are doing. We learned that there are 10 things that happen when they come and spend their money on our site. Of those 10 things, some are delightful and a couple are frustrating.
What if we picked two of those frustrating things and worked really hard, just on those two, to make them delightful? That has the capability of increasing revenue by $385 million. Do you think you can find a budget in that? How do we make that happen?
That's just from the yellow dudes. We're not even talking about whether the same fixes make things better for the blue dudes, or whether those same fixes turn the gray dudes into purchasers. That would be even more money. What we've done here is start with the most valuable people on our site and ask, "What's the qualitative experience they're having?" Then we went to the metrics to figure out how we measure whether we're improving that experience.
In fact, design is not metrically opposed. As designers, we need to accept and embrace the world of metrics and use their amazing powers to change the way we're doing things.
This is what I came to talk to you about. First, pay close attention to the inferences you're making. Anytime you talk about research you've done, separate the observations from the inferences. Put them into separate columns on the whiteboard and in the report. Do not confuse them. Ask the question, "Am I just picking the first inference that comes to mind?"
Second, don't just accept the metrics and analytics that the software gives you. Go and find what you need. Match it up to the actual experience. Look for the frustrating points.
Here's a quick pro tip: count the error messages. Go through every error message in your design and make sure it has a counter. Even if you're using Google Analytics, it's one line of code to count that someone has just gotten an error message.
It's amazing how much time we spend studying bounce rate and how little we know about the things that cause frustration in our product. Start producing statistics on your most-generated error messages, and you're going to see a completely different response from the people who pay attention to data.
Then we have to get smart about data science. We have to understand it. We have to make it work. Finally, we need to drive our quantitative agenda from our qualitative research. We need to invert the process and make it happen.
Ladies and gentlemen, that is what I came to talk to you about. Thank you very much for encouraging my behavior.