Florian_Cheval

How do you measure it?

Blog Post created by Florian_Cheval Employee on Nov 10, 2015

Yesterday, we tried to come up with a working definition of "good software", and we arrived at:

 

"Good software does what you hope it will. (And is not ugly)"

 

I said I would talk about how to create it, but we're doing Agile Development here, and our users have asked a pressing question.

 

Users first, so let's answer the question: "Ok, but how do you measure it?"

 

maxresdefault.jpg

So let me propose the highly scientific patent-pending empirical method to measure the quality of a piece of software: The Good Software Evaluation.

 

As we discovered, the quality of a piece of software needs to be evaluated against a particular goal. There is no "good" in general, there's only "good at this".

 

So the first thing you do is decide upon the goal. That's right, you arbitrarily choose a task in your head and you decide that this is the one you're going to measure against. To be of any use, I advise you to choose a task that you think users typically try to complete when they use this piece of software. Of course, choosing the right task/goal is where you need to be smart, and you can cheat if you want. I can probably state that all software is *perfect* at some irrelevant task. Yes, let's make that a first law:

 

1st Law of the Good Software Evaluation: "All software scores perfectly at some irrelevant task"

Kim.png

 

So choose an irrelevant task, and your Product will score 100%! That should feel good, but will probably not help your users much.

 

Ok, so you chose a task/goal that you believe is relevant for the user(s) of the software. It can be anything really:

  • The user needs to sign up
  • The user needs to fill in basic profile information
  • The user creates a project
  • The user finishes a project
  • The user is laughing
  • The user is sleeping
  • The world is a better place
  • A new cat video is posted on youtube
  • Both users get married and live happily ever after

 

Now, let's start some measuring!

Find a user. Yourself. Your coworkers. Your wife and kids. Your customers. Strange people on the Internet.

Put them in front of your software, and ask them to reach the goal/complete the task.

Start a timer.

If the user cannot complete the task, the software gets 0 and you can stop the timer now. Let's make that the second law:

 

2nd Law of the Good Software Evaluation: "If users fail to complete the chosen task, the software scores 0"

Cat_FAIL.jpg

 

Once the user completes the task, stop the timer and mark down the result X. Yes, it took your test subject 103 seconds to sign up. Yes, it took him 3 months and 2 days to finish a project. It only took 2 weeks for these 2 to fly to Vegas and get married.

 

So, now you compute the grade for this particular test: 1 / ( X + 1 )

 

If it takes a very very long time, X will be very big, and the score will be close to 0.

 

If the task has been accomplished "instantaneously", X=0, and the score is equal to 1.

 

Of course you're going to tell me that I didn't specify in which unit you should count X. In seconds? In hours? In microseconds? The truth is that it has no importance. You're not trying to compute an abstract value for your software here, you're actually trying to measure how well it does at accomplishing a particular task for a user *against an alternative solution*. So the unit doesn't matter: choose one and stick to it. If you really don't know, measure in seconds.

 

Now run the same test with other users. Each time, you get a grade.

 

The grade for your software is now the mean of all individual test grades.

 

3rd Law of the Good Software Evaluation: "The software grade is the mean of individual test grades, where each test grade is equal to 1/(X+1), where X is the time it takes for the user to complete the chosen task"

Mean.png

We're done with the math! Let's look at a few applications and how it relates to my definition.

 

I'm going to put you in front of my software.

I'm going to ask you to try and do something with it. Create a project, upload a video, write a blog entry, send a message to Wendy, whatever.

As soon as those words are out of my mouth and reach your brain, suddenly you form an expectation that the software can do the very thing I just asked you to do, so I'm going to start the timer.

Now you're going to try and accomplish the task.

Once you're done, I'll stop the timer. Did the software do what you thought it would do? Apparently yes because you just completed the task that I was measuring. Was it easy/intuitive/pleasant? Let's look at the chronometer. Did you really hope it would take approximately this amount of time or less? If the answer is yes, then the software is pretty good. If the answer is no, then you're disappointed, and you think it should be faster/easier.

 

Ergo: "Good software does what you hope it will" (and is not ugly), and you now have a highly scientific method to measure it.

 

Now we're going to compare 2 Products against one another.

The chosen task is "I want to know what time it is"

 

On the left, my crappy old Swatch; on the right, my shiny smartphone:

Battle.png

 

Swatch: pull up my sleeve a little, lower my eyes, look at the screen, internalize the numbers. Total time: about 1 second.

 

Smartphone: take it out of my pants, argh, why are these pants so tight, take it out of my pants, why does it have to hurt so much to take a phone out of one's pocket? Pfff, finally got it, press the button, look at the screen, be distracted by the reminder, finally see what time it is. Total time: about 5 seconds.

 

And the winner is my good old-fashioned, faithful watch. In this test, it is a *better* product than my phone for finding out what time it is.

 

Tomorrow, we'll talk about an interesting implication.
