Our Assessment Methodology

Ten modules ordered by what predicts success. Structured methodology with confidence scoring. Honest about what we know and what we are still learning.

Schmidt and Hunter's 1998 meta-analysis in Psychological Bulletin, among the most comprehensive studies of hiring predictors ever published, established that multi-method assessment outperforms any single hiring method. Structured interviews predict job performance better than unstructured interviews. Work sample tests predict better than references. And combinations of methods predict better than any method alone.

We built our methodology on this principle. Ten assessment modules, ordered not alphabetically but by what research says predicts failure earliest. Motivation and character first, because 89% of new hire failures are attitudinal. Cognitive ability and skills second, because these determine what someone can learn and do. Candidate-company fit third, because even a motivated, skilled person will struggle in the wrong environment.

Why does assessment order matter?

Most assessment platforms organize their modules by category: personality tests in one section, skills tests in another, cognitive assessments in a third. The order is arbitrary, driven by product organization rather than hiring science.

Our assessment order is deliberate. It follows the research on what predicts new hire failure, from most predictive to least:

Tier 1 assesses motivation and character: conscientiousness, work drive, coachability, emotional regulation, and professional integrity. These are the dimensions that account for 89% of new hire failures in the Leadership IQ study. If a candidate does not pass this tier, their skills are irrelevant.

Tier 2 assesses cognitive ability and skills: reading comprehension, arithmetic reasoning, situational judgment, and AI literacy. These dimensions predict what someone can learn and how they solve problems. Frank Schmidt's research at the University of Iowa found that general mental ability is the single strongest predictor of job performance across all occupations.

Tier 3 assesses candidate-company fit: the Big Five personality profile calibrated against the target company's work environment, with separate measurement of fit preference and fit readiness.

The order is not arbitrary. It follows what research says predicts failure earliest.

What are the ten assessment modules?

Tier 1 — Motivation and Character: Big Five personality assessment (using dimensions from Costa and McCrae's model, developed at the National Institutes of Health), work drive measurement, and an integrity and detail check that measures conscientiousness through behavioral indicators rather than self-report.

Tier 2 — Cognitive Ability and Skills: Reading comprehension, arithmetic reasoning, situational judgment tests (SJTs), and AI familiarity. The SJTs present realistic workplace scenarios and measure decision-making quality. AI familiarity is a newer module that assesses whether candidates can effectively use AI tools in their work, an increasingly relevant skill across all roles.

Tier 3 — Candidate-Company Fit: Writing assessment (measures communication clarity and professional tone), emotional intelligence scenarios, and the fit preference and fit readiness evaluation described in our approach to fit over credentials.

Four of these modules are scored algorithmically: cognitive ability, Big Five, work drive, and reading comprehension. The remaining six require human evaluation: situational judgment, writing, the integrity and detail check, AI familiarity, the emotional intelligence scenarios, and the fit preference and fit readiness evaluation. We use human scoring for these because the responses require contextual judgment that automated scoring cannot reliably provide.

What is confidence scoring and why does it matter?

Every assessment result we produce includes a confidence score. This is our most significant departure from how other assessment platforms present results.

Most platforms give you a number: a score of 78 out of 100, or a percentile ranking. That precision implies certainty that the underlying measurement does not support. A candidate who scores 78 on a 15-item conscientiousness scale could plausibly score anywhere from 70 to 86 if they took the same assessment again. Presenting 78 as a definitive number is misleading.
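The 70 to 86 range follows from the standard error of measurement (SEM), a standard psychometric quantity. A minimal sketch, using hypothetical values (a scale standard deviation of about 12 and reliability of about 0.885, chosen to reproduce the example above; these are not our platform's actual parameters):

```python
import math

def sem_interval(score, scale_sd, reliability, z=1.96):
    """Approximate 95% confidence interval from the standard error of
    measurement: SEM = SD * sqrt(1 - reliability), interval = score +/- z * SEM."""
    sem = scale_sd * math.sqrt(1.0 - reliability)
    return (score - z * sem, score + z * sem)

# Hypothetical parameters for a short conscientiousness scale
low, high = sem_interval(78, scale_sd=12, reliability=0.885)
print(round(low), round(high))  # roughly 70 and 86
```

The shorter the scale and the lower its reliability, the wider this interval becomes, which is why a single point score overstates what a 15-item instrument can measure.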

Our confidence scoring makes the uncertainty explicit. You see the score AND how reliable that score is. A candidate with a conscientiousness score of 78 with high confidence (based on consistent responses across multiple measurement points) is a very different signal than a score of 78 with low confidence (based on inconsistent or limited data).

No competitor that we have found publishes confidence ranges alongside their scores. We believe this transparency is a competitive advantage, not a weakness. Hiring managers who understand the reliability of their data make better decisions than those who treat imprecise numbers as precise.

We show you how reliable each score is, not just a number. No competitor publishes confidence ranges.

How do we handle cross-cultural validity?

Cross-cultural validity is one of the hardest problems in psychometric assessment. The Big Five personality model has been replicated by Robert McCrae at the National Institutes of Health and research teams across more than 50 countries, and Geert Hofstede's work at Maastricht University on cultural dimensions shows how trait expression tracks culture. The dimensions exist across cultures. But their expression differs.

Filipino pakikisama, the deep cultural value placed on interpersonal harmony, can suppress scores on assertiveness scales. Latin American personalismo, which emphasizes personal relationships in professional settings, can inflate agreeableness scores. Eastern European directness can appear as low agreeableness on Western-normed instruments.

We calibrate our instruments to account for these cultural patterns. This is not the same as adjusting scores, which would introduce its own biases. It means interpreting results within cultural context: a Filipino candidate who scores in the 40th percentile on assertiveness relative to Western norms may be perfectly assertive within their professional context.
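Interpreting within cultural context amounts to reading the same raw score against different norm groups. A minimal sketch with hypothetical norm parameters (the means and standard deviations below are invented for illustration, not our actual norm tables), assuming approximately normal score distributions:

```python
from math import erf, sqrt

# Hypothetical norm tables: mean and SD of raw assertiveness scores per group
NORMS = {
    "western": {"mean": 52.0, "sd": 8.0},
    "philippines": {"mean": 46.0, "sd": 7.0},
}

def percentile(raw_score, norm_group):
    """Percentile of a raw score within a norm group, assuming normality."""
    norm = NORMS[norm_group]
    z = (raw_score - norm["mean"]) / norm["sd"]
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))

# The same raw score reads very differently against each norm group
score = 50.0
print(f"vs Western norms:    {percentile(score, 'western'):.0f}th percentile")
print(f"vs Philippine norms: {percentile(score, 'philippines'):.0f}th percentile")
```

With these invented parameters, a raw score at the 40th percentile on Western norms lands well above the median on the local norms, which is the interpretive gap the calibration is meant to surface.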

We are honest about the limitations. Cross-cultural psychometric calibration is an active area of research, not a solved problem. Our calibration is based on published research and our own experience across four regions, but it is not perfect. We label our calibration approach as a work in progress and update it as we gather more data.

What are we still learning?

Assessment platforms rarely talk about their limitations. We think that is a mistake. Transparency about what you do not know is a stronger trust signal than claiming perfection.

Our cross-cultural calibration is based on published research and operational experience, but we have not yet conducted our own validation study with sufficient sample sizes to publish independent results. This is a goal, not a current capability.

Confidence scoring is our best current tool for expressing measurement uncertainty, but it does not capture all sources of error. A candidate having a bad day, misunderstanding a question due to language nuance, or gaming the assessment through social desirability bias can all affect results in ways that confidence scoring cannot fully detect.

The six human-scored modules introduce inter-rater variability. Two evaluators scoring the same writing sample will not always agree. We mitigate this through structured scoring rubrics, but we cannot eliminate it. We track inter-rater reliability and disclose it when asked.
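One common way to track inter-rater reliability for categorical rubric scores is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch on invented data (the rubric scores below are hypothetical):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement between two raters, corrected
    for the agreement expected by chance from each rater's label rates."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical rubric scores (1-4) from two evaluators on ten writing samples
a = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
b = [3, 2, 3, 3, 1, 2, 2, 4, 2, 3]
print(f"kappa = {cohens_kappa(a, b):.2f}")
```

Here 80% raw agreement drops to a kappa of about 0.71 once chance agreement is removed, which is why raw percent agreement alone overstates rater consistency.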

We believe that acknowledging these limitations makes our results more trustworthy, not less. A platform that claims 99% accuracy is either lying or measuring something trivial. A platform that tells you exactly what it can and cannot measure gives you the information you need to make good hiring decisions.

What this means for your team

You get a structured, multi-dimensional assessment of every candidate, ordered by what research says matters most. You see confidence levels alongside every score so you know how much weight to put on each signal.

The assessment platform is included for every Get Claude client at no additional cost. You are not paying for software licenses. You are paying for our expertise in applying structured methodology to your specific hiring needs.

Every candidate gets the same evaluation regardless of background. No name-brand degrees required. No resume keyword filtering. Just demonstrated ability, measured through validated instruments from industrial-organizational psychology.

Frequently Asked Questions

How long does the full assessment take?

The complete ten-module assessment takes approximately 90 to 120 minutes for a candidate to complete. It can be completed in multiple sessions. Most candidates complete it in two to three sittings. The time investment is deliberate: shorter assessments sacrifice measurement quality for convenience.

What is the scientific basis for these modules?

The personality dimensions come from the Five Factor Model developed by Paul Costa and Robert McCrae at the National Institutes of Health. Cognitive assessment is based on general mental ability research by Frank Schmidt and John Hunter at the University of Iowa. Situational judgment tests follow the methodology established by Michael McDaniel at Virginia Commonwealth University. Each module traces to peer-reviewed research.

How do you prevent candidates from gaming the assessment?

We use several approaches: situational judgment tests that present realistic trade-offs rather than obvious right answers, behavioral indicators that cross-validate self-report responses, consistency checks across related questions, and time-based analysis that flags unusually fast completion. No anti-gaming system is perfect, but structured assessment is significantly harder to game than unstructured interviews.
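The time-based analysis mentioned above can be as simple as flagging completions far below the cohort's typical pace. A minimal sketch of one such heuristic (the 40% threshold and the completion times are hypothetical, not our production rule):

```python
from statistics import median

def flag_fast_completions(times_sec, floor_ratio=0.4):
    """Return indexes of candidates whose completion time falls below
    `floor_ratio` of the cohort median. Flagged for review, not rejected."""
    cutoff = floor_ratio * median(times_sec)
    return [i for i, t in enumerate(times_sec) if t < cutoff]

# Hypothetical completion times (seconds) for one module across a cohort
times = [620, 540, 710, 180, 590, 655, 150, 600]
print(flag_fast_completions(times))  # indexes of unusually fast candidates
```

Using the median rather than the mean keeps one or two extreme times from dragging the cutoff around, and flagging rather than auto-rejecting leaves room for the contextual judgment that human review provides.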

Can I use just some of the modules?

Yes. While the full assessment provides the most comprehensive picture, you can select specific modules based on your hiring priorities. We recommend always including the Tier 1 motivation and character modules because they are the strongest predictors of success. Adding Tier 2 and 3 modules increases the reliability and breadth of the assessment.

How does this compare to other assessment platforms?

Most assessment platforms offer modular testing with standardized scoring. Our key differences are: the deliberate assessment order based on failure prediction research, confidence scoring on every result, cross-cultural calibration for distributed teams, and transparency about limitations. We also include the platform as part of our service rather than licensing it separately.

What does the assessment cost?

The assessment platform is included for every Get Claude client. There is no separate licensing fee, per-candidate charge, or module pricing. You pay for our hiring and team development services, and the assessment platform is the tool we use to deliver them.

Ready to hire differently?

Tell us what you need. We will follow up within two business days.
