Back in the late 1860s, the legend of John Henry was born. As the story goes, he was the railroad’s best steel driver and could drill through more rock than anyone.

Despite John Henry’s prowess, it was not enough to satisfy his employer. When the crew was partway through a West Virginia mountain, the boss purchased a steam-powered drill to finish the job more quickly.

The men were concerned that a machine might replace them permanently, so they issued a challenge and pitted John Henry against this drill to see who was better. John was victorious, but his victory was short-lived. The legend states that he died right after.

Most hiring managers I have met are like John Henry. Despite overwhelming evidence to the contrary, they compete with “the machine” when selecting people. Unfortunately, the belief that they must compete with technology, fueled by the fear that computers will replace them, may do to their teams and careers what the contest did to John Henry.

Granted, hiring decisions are more complicated than drilling holes. Some nuances and subtleties make candidate evaluation difficult. Nonetheless, it’s time to stop competing with technology and start collaborating.

Are you convinced?

Here is my case for why collaborating with technology is more effective for the hiring manager and fairer for the candidate.

I’ll first show evidence, known since the 1950s, that even simple models and mechanical processes make better people decisions than humans.

Next, I’ll show that success rates increase when humans provide the decision criteria and technology measures and models them.

Finally, I’ll discuss the biggest obstacle to a successful process: accurate candidate data. I’ll share a few options for acquiring the relevant information to feed the formula.

The Evidence

In 1954, Paul Meehl reviewed twenty studies in which researchers compared clinical judgment to mechanical prediction (reference at the end of this article). Almost 70 years ago, he concluded that even simple models were superior to human judgment when evaluating others.

Consider the following logic. If humans were reliable, a small group with first-hand experience of a person would likely rate that person consistently. A perfect example is the multirater review (360). This method requires a group of people with daily experiences of an individual, often over several years, to rate them on several characteristics. These raters have far more direct exposure to a person than an interviewer does.

At Psynet Group, we have delivered over 300 multirater assessments. Each instance has 8 to 15 raters who know the subject well. Considering that each group of raters answers about 30 questions, there are 9,000 chances for everyone to agree. However, not once have all the raters agreed.
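The arithmetic behind that figure, and what a unanimity check looks like, can be sketched in a few lines of Python. The ratings below are hypothetical and the helper name `all_raters_agree` is only illustrative; none of it comes from Psynet Group’s actual data.

```python
# Counting the chances for unanimous agreement described above.
assessments = 300               # multirater assessments delivered
questions_per_assessment = 30   # approximate number of questions in each

chances_to_agree = assessments * questions_per_assessment


def all_raters_agree(ratings):
    """Return True only if every rater gave the same score on a question."""
    return len(set(ratings)) == 1


# Hypothetical scores (1-5 scale) from eight raters on a single question.
example_ratings = [4, 4, 5, 4, 3, 4, 4, 4]

print(chances_to_agree)
print(all_raters_agree(example_ratings))
```

A single dissenting rater is enough to break unanimity on a question, so the bar gets harder to clear as the rater group grows.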

This example should raise some doubt about the accuracy of the interpretation generated by a 60-minute interview. If a group with hundreds of hours of observation cannot agree on their ratings, can we trust an interview process where candidates present the best version of themselves for an hour?

It’s worse than you think.

Jason Dana, a researcher at Yale, has reviewed research results that further build Meehl’s case.

In his NY Times article, Dana told a story about Texan students who were admitted to medical school. Those who passed an interview to gain admission did not outperform students who were initially rejected at the interview but later received entry due to a government mandate.
Raters asked to predict future work performance did more poorly when interviewing the candidates than when they only had background data.
Subjects who responded randomly to interview questions were rated higher by evaluators than those who responded honestly.
In the same study, even after the raters learned the researchers had duped them, they still believed the fake interview was better than resume data.

The reasons for our inability to judge other humans are manifold and beyond the scope of this blog post. However, when evaluating others, we are no John Henry.

Why are humans so bad at evaluating other humans?

There are several explanations, potentially enough for an entire book, but I’ll focus briefly on two.

Humans integrate information inconsistently. Different people value characteristics differently. Some of us overvalue traits that are similar to ours. Some value traits based on their current needs. Others value traits based on what they recently read or heard from others. The situations are many and varied, but all fall under context.

Furthermore, the same person values characteristics differently on different days. We had several meetings with a financial client with a long list of important candidate characteristics. In the middle of the process, the executive committee fired one of its directors for creating a toxic culture. At our next meeting, filtering out “assholes” became the managing director’s top priority.

Starting with a long list of variables and spontaneously adding to it multiplies our inconsistency. As humans, we are under the illusion that adding complexity increases accuracy. Science indicates that the gains from subtle individual rules in human judgment cannot compensate for the detrimental effects of inconsistency.

Spontaneous variable accumulation occurs naturally because the candidate presents additional information during the interview. Just like the kid who didn’t know she wanted ice cream until she walked past the ice cream parlor, we may not have known we wanted a candidate to have a characteristic until it came up in the process. I experienced this as a candidate for a job as a psychologist in my early 30s. The initial small talk revealed that I was a decent softball player and that the institution had not won a game the previous year. Six months after I was hired (while we were enjoying a beer after the opening day softball win), I learned that my ability to hit with power to the opposite field had factored into the hiring decision. Like the kid with ice cream, they did not realize softball skills would be important until the topic came up spontaneously.

The second obstacle to accurate evaluation is the data. The data comes from a candidate whose primary goal is to filter out the negative. Chris Rock summarized this issue by saying, “when you meet somebody for the first time, you’re not meeting them, you are meeting their representative.” The information is filtered a second time internally as we consciously and unconsciously choose what to pay attention to. No one is immune to this. I interviewed a woman who told me during the interview that she wanted to take over my role if I hired her, but I ignored the statement. A month later, I learned she had started backchanneling with other team members to undermine me. I so badly wanted another strong woman on our team that I filtered out an essential piece of information.

A Twist: A Model of You is Better than You at Hiring.

Humans have a crucial advantage over formulas and machines: judgment. Algorithms and technology inform decisions but cannot make them.

If the advantage of our humanity is judgment, and our disadvantage is inconsistency and data distortion, why not use technology to improve data accuracy and remove inconsistency?

The idea is not new. According to Daniel Kahneman, a review of fifty years of research revealed that models of judges consistently outperformed the judges they modeled. In his book “Noise,” he used researcher Lewis Goldberg’s study as an example to show how this pairing was possible and successful. Goldberg modeled the characteristics important to the hiring manager and applied that model to decision-making. He then pitted the model of the hiring manager against the original, and the model won.

Kahneman says replacing the hiring manager with a model based on the hiring manager eliminates two forms of inconsistency. The first he called subtlety (individual interpretation of data and data combinations), and the second pattern noise (personal biases).
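To make the idea of a “model of you” concrete, here is a minimal sketch in Python. It assumes the model is a simple linear one fitted by ordinary least squares to a hiring manager’s past overall ratings; that choice, the attribute scores, the ratings, and the function names `fit_judge_model` and `model_rating` are all hypothetical illustrations, not a reproduction of Goldberg’s study or of any Psynet Group tool.

```python
def fit_judge_model(attribute_scores, judge_ratings):
    """Fit least-squares weights (intercept first) to a judge's past ratings."""
    rows = [[1.0] + [float(x) for x in scores] for scores in attribute_scores]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y]
    system = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, judge_ratings))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(system[row][col]))
        system[col], system[pivot] = system[pivot], system[col]
        for row in range(k):
            if row != col:
                factor = system[row][col] / system[col][col]
                system[row] = [a - factor * b
                               for a, b in zip(system[row], system[col])]
    return [system[i][k] / system[i][i] for i in range(k)]


def model_rating(weights, scores):
    """Apply the fitted weights to one candidate, the same way every time."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], scores))


# Hypothetical history: (curiosity, energy, collaboration) scores on a 1-10
# scale, paired with the overall rating the manager gave at the time.
past_candidates = [(7, 5, 6), (4, 8, 5), (9, 6, 3), (5, 5, 8), (8, 9, 7), (3, 4, 4)]
manager_ratings = [6.5, 5.0, 7.0, 6.0, 8.5, 3.5]

weights = fit_judge_model(past_candidates, manager_ratings)
new_candidate = (6, 7, 5)
print(round(model_rating(weights, new_candidate), 2))
```

The manager’s judgment still decides which attributes matter and how past candidates were rated. The model only guarantees that two candidates with the same scores receive the same rating, whatever day they are evaluated.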

Garbage In/Garbage Out

If you are still reading, I hope it is because I convinced you that human judgment is good at identifying success factors, and technology is good at managing inconsistency. Furthermore, integrating human judgment and technology brings the best of both worlds.

I have worked with dozens of human resources professionals and hiring managers whose excellent judgment created these models. These are the top twelve variables they included in their models (based on work with 32 clients, presented roughly alphabetically).

Attitude– They want positive people who are proactive and take responsibility.
Agility– They want someone who can pivot in response to a constantly changing economic climate.
No Assholes– They want to minimize the impact of toxic behaviors on their culture.
Collaborative– They want people who enjoy working in teams, especially cross-functional ones, but can also work autonomously.
Critical Thinking and Complex Problem Solving– They want intelligent people who can think through a plan and solve issues.
Curiosity– They want people with a desire to learn and grow.
Energy– They want candidates who get things done.
Perfectionism (in moderation)– They want someone who manages details but still has a bias to action.
Self Awareness and Emotional Intelligence– They believe these are core leadership skills from which other skills arise.
Sensitivity (in moderation)– They want empathetic candidates who are not so sensitive that they are distracted or distracting.
Stress Management– They want people who rise up when things are difficult.
Unconventional (but not too much)– They want someone who provides a different perspective but will still conform to the culture.

Several more characteristics, like entrepreneurialism or rule-based thinking, fit with some companies but not others. However, this list is almost universal to our clients.

Now it should be easy. All we need is a model of these attributes to make better decisions.

But where do we get the candidates’ data? Remember when we discussed how candidates filter information and hiring managers misinterpret it? The data source is a problem.

We could improve data accuracy by reviewing someone’s background and history. Unfortunately, this solution bumps into two problems. First, the data provided by the candidate is often inaccurate. According to Daniel Steingold, 34% of LinkedIn profiles contain significant lies, and 11% are almost entirely made up. (It’s worse for resumes: 55% of Americans admit to lying on them.) Second, even if the data is accurate, we infer the characteristics from their history. When humans make inferences, the problems of subtlety and pattern noise reduce accuracy.

We could invest in background checks, including verifying diplomas and certifications. But this is expensive and time-consuming, and it solves the accuracy issue but not the inconsistency of interpretation.

Many companies solve the problem by assessing personality. This trend is rising:

About 80 million people complete a personality test yearly.
80% of Fortune 500 companies use personality assessments for hiring.
Industry analysts expect personality testing to be a $6.5 billion industry by 2027.

This approach is an excellent application of technology to reduce (but not eliminate) human inconsistency. However, a recent NYT article highlighted how outdated most popular personality assessments are and the need for an update, a sentiment echoed by Adam Grant of Wharton in his article criticizing the MBTI. Even modernized personality tests require humans to infer behavior from the results. This inference is complex because only some of the criteria from our list are related primarily to personality.

I believe a Psychometric, an advanced cousin of the personality test, is the best way to pair technology with human judgment to improve hiring decisions. A good Psychometric takes years to build, and few effective ones exist. It took me 15 years and required volunteers to invest 14,000 hours in developing Psybil, Psynet Group’s Psychometric platform.

For a psychometric to be successful, there are a few basic requirements:

The items should be selected using a principal components analysis. This statistical technique uses math similar to that used in artificial intelligence. The process increases validity and reliability and makes the assessment much harder for the candidate to game or trick.
The scales should directly predict the candidate’s attributes and not infer them from a secondary personality trait unless the two are highly correlated.
The reporting should be direct about strengths and flaws. Many psychometric developers temper bad news, which has the unintended consequence of misinterpretation.
It should be customizable to take advantage of humanity’s core decision-making strength: Judgment.
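As a rough illustration of the first requirement, the Python sketch below runs a very simplified principal components analysis on a handful of made-up questionnaire responses and keeps only the items that load strongly on the first component. Everything here is an assumption for illustration: the data, the 0.6 cutoff, the power-iteration shortcut, and the function names. It is not how Psybil or any particular psychometric selects its items.

```python
import math


def correlation_matrix(rows):
    """Pearson correlations between items (assumes no item is constant)."""
    n, k = len(rows), len(rows[0])
    cols = [[r[j] for r in rows] for j in range(k)]
    devs = [[x - sum(c) / n for x in c] for c in cols]
    norms = [math.sqrt(sum(d * d for d in dv)) for dv in devs]
    return [[sum(a * b for a, b in zip(devs[i], devs[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def first_component_loadings(corr, iterations=200):
    """Approximate the first principal component by power iteration."""
    k = len(corr)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k))
                     for i in range(k))
    # Loading = eigenvector entry scaled by the square root of the eigenvalue
    return [x * math.sqrt(eigenvalue) for x in v]


# Hypothetical answers (1-5 scale): eight respondents, four candidate items.
# Items 0-2 were written to tap the same attribute; item 3 is a weaker fit.
responses = [
    [5, 4, 5, 2],
    [4, 4, 4, 4],
    [2, 1, 2, 4],
    [1, 2, 1, 2],
    [3, 3, 3, 3],
    [5, 5, 4, 3],
    [2, 2, 3, 2],
    [4, 5, 5, 4],
]

loadings = first_component_loadings(correlation_matrix(responses))
kept_items = [i for i, load in enumerate(loadings) if abs(load) >= 0.6]
print(kept_items)
```

A real instrument would use far more respondents and items and would examine more than one component, but the principle is the same: the data, not the author’s intuition, decides which questions belong on a scale.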

Conclusion

I assume you have read this far because you know how important hiring decisions are to the success of an organization. It is also likely that you have been surprised by how different the employee who came to work is from the candidate who interviewed.

I also assume you are unwilling to hand over hiring decisions to your laptop or a mathematical formula.

Hopefully, this article has given you a path forward. Your hiring decisions will improve significantly by doing two things:

Creating a model of you that takes advantage of your judgment while minimizing your inconsistencies.
Implementing a tool or process that provides the most accurate candidate data.

Curious about how to make this happen at your company? Reach out to us at Psynet Group at [email protected].

References that could not be linked.

Kahneman, Daniel; Sibony, Olivier; Sunstein, Cass R. Noise. Little, Brown and Company.

Paul E. Meehl (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. University of Minnesota Press.