I tip up front. Actually, I know a lot of people who
occasionally do this, but to my knowledge I’m the only one with a managerial
principle of statistical validity to back up ‘why’.
It stems, originally, from an article by W. Edwards Deming
called “On Probability As a Basis For Action”, (The American Statistician,
November 1975, pg. 150).
Attractive title, huh? Well, it’s profound, if misunderstood.
In it Deming explains how data can be gathered in different ways, and used to
different ends, and how people are not well enough versed in statistics to
understand when they are using one type of data to show something that cannot
be shown. He calls these two types of data ‘enumerative’ and ‘analytic’.
Enumerative data is the information you gain after the fact. The census is a great example.
We survey people after an event and then tabulate the data to show what they
said.
Analytic data is information about a process. It shows, often with math, ‘how’ something
is done, how the process works, what steps cause effects, and what those
effects then cause in a chain of reasoning.
In my own words I like to think of enumerative data as
showing ‘what’ happened, and analytic data as showing ‘how’ and/or ‘why’ it
happened.
Back to the census. To make my point I did a quick Google
search for ‘1999 census’. I immediately found that 12.4 percent of the American people (33.9
million) reported income in 1999 that was below the poverty line. I scrolled down
and quickly found a chart that shows States and Regional Poverty Rates: 1989
and 1999. The poverty rate fell the most in Mississippi, -5.3%. The biggest
rise in poverty was in Washington D.C., +3.3%.
Interesting numbers to be sure. But what can we glean from
these numbers? Here are a few examples of silly conclusions: [These conclusions
are false!]
1. Poor people from Mississippi must have moved to the D.C. area
in large numbers.
2. Poor people in D.C. must be killing the wealthy people in very
large numbers, thus resulting in a shift in the ratio of those in poverty.
3. Mississippi’s new job creation initiative must be working for
them to show such substantial gains.
Now of course, these conclusions are patently false. But
what if we find some ‘other’ set of data that ALSO relates to this issue? Then
can we tie the two together and show they support each other?
No. The fact that they don’t make sense is not ‘why’ the
conclusions are false. Rather, they are false because they are using enumerative
data to show the ‘cause’ of poverty, which is an analytic conclusion.
In other words, we used data from type A, to make a
conclusion of type B. This CANNOT be done. It means the conclusions above are
false on their face, without any need of analysis. The same kind of error can
be made in reverse. We can perform an analytic study which we then use to claim
an enumerative conclusion. We can’t do that either, but we won’t discuss it
here as it doesn’t relate to ‘why I tip up front’.
Okay, we’ve shown how enumerative data, collected after the
fact, CANNOT be used to show the ‘cause’ of the data.
What if we collected the data more often? THEN could we use
it to show a trend of causes? No. No matter how often we collect enumerative
data it CANNOT show the cause of the results. It is not simply ‘difficult’,
it’s actually impossible.
So, in what other arenas do we collect data after the fact
and then blame the result on a cause?
Public schools of course. We stick little Johnny in a class,
ask him to perform, then we test him at the end. He gets a score, and we say
“Wow, Johnny did great,” or “Oh my, Johnny is not too bright, is he?” You see? We
attribute his ‘score’ to his ‘intelligence’. Directly. Without question.
In December of 2001, Valerie Strauss, writing for the
Washington Post, wrote an article called ‘Revealed:
School board member who took standardized test’.
It describes the experience of a school board member who
took the test and scored dismally despite having 3 degrees and educational
experience. At the end of the article he mentions how the questions on the test
are simply not valid for today’s ‘working life’. The test is not testing skills
and knowledge actually used in real life.
He may be right. But only to a point. Even if the test had
been perfectly aligned with knowledge and skills kids actually need in order to
function, the test is still not valid. Why?
Because it is attributing ‘cause’ to data collected ‘after the fact’.
It’s claiming to be ‘analytic’ when the data gathered are ‘enumerative’.
This cannot be overstated.
We ‘blame’ the kid taking the test, 100%, for his/her
performance on the test.
Recall, perhaps, the same author’s article on principals and
staff evaluations which tie pay to
student performance.
That article shows how New York educators are up in arms about a system of evaluation that
ties teacher AND principal pay to student performance on standardized tests.
Hang on to your booties, here is where it gets interesting.
The staff, justifiably as we now know, argue that connecting
pay to performance in this way cannot be correct. Student performance is
affected by so many different factors that blaming staff and administrators for
the outcome is just not reasonable in any way.
Did you get that? Student performance varies so widely, from
factors out of the control of staff and administrators, that educators cannot
fairly be evaluated on how their students perform on the test.
Huh.
And yet, correct me if I’m wrong, these very same teachers
and administrators say nothing of subjecting these students to these tests, and
then evaluating the students on their performance. Hmmm. Johnny did great, see?
It says so right here! Oh my, Johnny
lost some ground this year compared to last year, so he’s really going to need
to buckle down this coming year to make up HIS losses.
That CAN’T be right.
They are essentially saying that the school has little or
nothing to do with how well Johnny did on his test. OR, at best, they are
saying that it is impossible to tell which part of Johnny’s grade had to do
with the school, and which had to do with Johnny’s effort.
That last part’s true. Among A-the school, B-Johnny’s effort,
C-his home life, D-the bus ride home, E-the weather, and F-any sort of
learning difficulty he might have, it’s impossible to tell which
factors play how much of a role in his academic success.
Why is it impossible? Let’s see, on the face of it it’s
easy to grasp. We are trying to determine why Johnny is successful. Let’s say
his success is a result of the equation:
A+B+C+D+E+F = 100%
Now B is Johnny’s effort. No one can take a single equation with
two or more unknowns and solve for the value of each one. Let’s simplify this
some to make the point really clear.
Let’s call X Johnny’s effort, and Y the school’s effort to
help Johnny. So:
X+Y+XY = 100. That is, Johnny’s effort plus the school’s
effort, plus the result of Johnny working with the school’s system, equals the
output, or level of Johnny’s success.
Okay, now we have one equation with two variables. How do we
figure out X and Y? We can’t. It’s an algebraic law: you cannot pin down
two unknowns if you only have one equation, because many different pairs of
values fit it equally well. No one can. (Even my
9th grader interrupted me to point this out).
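To see why, here is a minimal sketch in Python. It is only an illustration: the function name school_effort and the sample values of X are my own assumptions, and the equation is the simplified X+Y+XY = 100 from above, not a real model of learning. For any X you pick, there is a Y that makes the equation come out to 100, so the total by itself cannot tell you which pair is the real one.

```python
def school_effort(x, total=100.0):
    """Given Johnny's effort x, return the y that satisfies x + y + x*y = total.

    Rearranged: y * (1 + x) = total - x, so y = (total - x) / (1 + x).
    """
    return (total - x) / (1.0 + x)


# Hypothetical values for Johnny's effort; every one of them "fits" the data.
for x in [0, 10, 25, 50, 100]:
    y = school_effort(x)
    outcome = x + y + x * y
    print(f"X = {x:>3}, Y = {y:6.2f}, X + Y + XY = {outcome:.1f}")
```

Every row reaches the same outcome, whether Johnny contributes nothing and the school everything, or the other way around. Only a second, independent equation (that is, knowledge of the process itself) could narrow it down.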
Quickly, let’s reflect. The staff in New York then are
claiming something like this. You can’t evaluate us on Johnny’s performance
because you can’t solve for Y. Yep. That’s true.
But they then turn around and pretend there is no Y. They
blame Johnny for 100% of his success or failure.
Put another way. If a teacher has a year when her students
do more poorly than usual, she can say to herself, ‘my, this year the kids just
weren’t as smart as last year’. Almost plausible. It even sounds typical. But
why couldn’t the teacher say, ‘Oh my, my system did not work as well as I’d
hoped. I’d better see if I can improve the way I do things to better meet their
needs’?
In one she’s attributing success or failure to the students.
In the other she attributes it to herself. Two completely different invalid
conclusions based on exactly the same data.
Wait! Invalid? Why invalid?
Well, remember that thing about ‘enumerative’ data being
used to show analytic results? That can’t work. Here we are doing that same
thing, either way she looks at it she’s using enumerative data to support an
analytic conclusion. That CAN’T be correct.
Now, remember the school board member who suggested that new
and better questions were needed in order for the testing to be valid? He
obviously does not grasp the meaning of the difference between enumerative data
and analytic studies. If he did, he’d say ‘Standardized tests are not valid
evaluations of achievement because they are not VALID.’
It is impossible for the tests to show the ‘cause’ of the
student’s success. Did the student do well because he was smart? Or because he
went to a great school? Or because his parents both went to college? (This is
the number one most accurate predictor of academic success by the way).
Or did the student do badly because his family’s first
language is not English? Did he receive insufficient help with his disability (as
mandated by law)? Was he improperly evaluated early in school and then passed
on to grade after grade without the skills to succeed due to a broken system?
Did his parents complain too loudly about a teacher who then chose not to spend
the time with him he/she could have? Is he suffering from an economic or racial
bias from the staff that may not be measurable but may be having an effect
anyway?
The same argument can be made against performance
evaluations at work. It is difficult, at the very best, to form an evaluation
that has any meaning, and the time it takes to perform one could almost always be
better used in providing training, coaching and leadership to help employees
improve their work.
Oh? Really? People at work, who get evaluated, often in
order to get a raise? Those are invalid too? Absolutely invalid.
And that brings us right round, doesn’t it?
Why do I tip up front?
Because judging my waitress cannot be valid. I cannot
correctly blame my waitress for the quality of my service.
So I calculate 20% (an estimate, that is), and I pay him/her
that many dollars when I sit down. I tell them, ‘It doesn’t really matter ‘why’
I do it this way. If you’re really curious later you can ask me and I’ll
explain.’ They smile, thinking the quirky crazy guy on table 6 already tipped me. Then they
go about their business.
I can hear it being said, (actually I’m remembering all my
friends), “What if you get crappy service? Then what?”
If you’re asking this, then you didn’t really grasp the
point. I don’t do it to get better service. I do it because it ‘helps’ the
system. I am going to tip her. If I wait, she has the ‘threat’ that I might tip
her badly hanging over her the whole meal. If I tip her up front, at least she
knows she doesn’t have that to worry about. She’s then a little bit less
stressed out about her tips, and everybody’s service improves, a teeny bit.
More importantly, I release her from the fear that as her
customer, ‘I’ am going to make an impossible (and unfair), and mathematically
flawed judgment of her performance and then punish her for it (unjustly).
Try it a few times. See what happens. Regardless, let go of
the idea that you are ‘rating’ your server for her performance. Because that’s
just not possible.