Adaptive test
An adaptive test adjusts the difficulty of subsequent questions based on your answers to previous ones. Get a question right and the next one is harder. Get it wrong and the next one is easier. SHL Verify Interactive and Talent Q Elements are the best-known adaptive tests in hiring.
Adaptive tests converge on your ability level faster than static tests, typically within 20 to 30 questions. They also change the strategic calculus: skipping or leaving blank is usually treated as wrong, which eliminates the skip-and-return tactic that works on static tests.
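The feedback loop can be sketched as a simple one-up/one-down staircase. Real adaptive tests use item response theory to pick items, so this is an illustration of the adjustment logic only, not any vendor's algorithm; the difficulty scale and candidate behavior below are invented.

```python
# Minimal sketch of adaptive difficulty adjustment (a 1-up/1-down
# staircase). Correct -> harder next question; wrong or skipped -> easier.

def run_adaptive(answer_fn, n_questions=25, levels=10):
    """answer_fn(difficulty) -> True/False; returns the difficulty trace."""
    difficulty = levels // 2          # start in the middle of the scale
    trace = []
    for _ in range(n_questions):
        correct = answer_fn(difficulty)
        trace.append(difficulty)
        difficulty = min(levels, difficulty + 1) if correct else max(1, difficulty - 1)
    return trace

# A hypothetical candidate who reliably answers questions up to difficulty 7:
trace = run_adaptive(lambda d: d <= 7)
print(trace[:8])  # [5, 6, 7, 8, 7, 8, 7, 8] -- oscillates around ability
```

The trace climbs quickly, then oscillates around the candidate's true level, which is why adaptive tests converge in relatively few questions.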
Raw score
Raw score is the count of correct answers. It is the simplest scoring method and the least informative in isolation. A raw score of 28 on a 50-question test is meaningless without the norm group: it could be the 60th percentile or the 90th depending on who else took the test.
Always convert raw scores to percentile or role-mapped target scores before trying to interpret them.
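The conversion itself is straightforward; a sketch, assuming the common midpoint convention (percent scoring below, plus half of those with the same score) and an invented ten-person norm group:

```python
# Convert a raw score to a percentile against a norm group.
# Convention used here (one of several): percent below + half the ties.

def percentile(raw, norm_scores):
    below = sum(s < raw for s in norm_scores)
    ties = sum(s == raw for s in norm_scores)
    return 100 * (below + 0.5 * ties) / len(norm_scores)

norm = [18, 22, 25, 25, 28, 30, 31, 33, 36, 41]  # hypothetical norm group
print(round(percentile(28, norm)))  # 28/50 correct -> 45th percentile here
```

Against a weaker norm group the same 28 would land far higher, which is exactly why the raw number is uninterpretable on its own.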
Percentile
Percentile is your rank against the norm group. The 70th percentile means you scored better than 70 percent of the comparison population. This is the number employers use for most hiring decisions because it is comparable across tests and across norm groups.
Percentile scores compress at the top. The difference between the 95th and 97th percentiles represents a much larger raw score gap than the difference between the 50th and 52nd. Keep that in mind when interpreting high scores.
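If raw scores are roughly normal, the compression is easy to quantify: equal percentile gaps near the top of the scale cover several times the raw-score distance of the same gaps mid-scale. A sketch using standard-deviation units:

```python
from statistics import NormalDist

# Under a roughly normal score distribution, compare the raw-score
# distance (in SD units) behind two equal 2-point percentile gaps.
z = NormalDist().inv_cdf

gap_mid = z(0.52) - z(0.50)   # 50th -> 52nd percentile
gap_top = z(0.97) - z(0.95)   # 95th -> 97th percentile

print(f"{gap_mid:.3f} SD vs {gap_top:.3f} SD")  # 0.050 SD vs 0.236 SD
```

The top gap is nearly five times the mid-scale gap, so small percentile differences among high scorers reflect substantial ability differences.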
Norm group
Norm group is the reference population against which your raw score is compared to generate a percentile. Norm groups can be general applicants, role-specific applicants, country-specific applicants, or custom subgroups defined by the employer.
The norm group choice matters enormously. A 70th percentile against a general applicant norm group is not the same as a 70th percentile against an MBA-only norm group. When comparing scores, always ask about the underlying norm group.
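A toy comparison makes the point concrete; both norm groups below are invented, and the percentile here is simply the percent scoring lower:

```python
# Same raw score, two hypothetical norm groups, very different percentiles.
def percentile(raw, norm):
    return 100 * sum(s < raw for s in norm) / len(norm)

general = [15, 18, 20, 22, 24, 25, 27, 29, 32, 35]  # general applicant pool
mba     = [26, 28, 29, 30, 31, 32, 33, 35, 37, 39]  # MBA-only pool

raw = 30
print(percentile(raw, general))  # 80.0 -> 80th percentile vs general
print(percentile(raw, mba))      # 30.0 -> 30th percentile vs MBA pool
```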
Cutoff score
Cutoff score is the minimum score required to advance in the hiring process. Below the cutoff, your application ends. Cutoffs are usually proprietary and not disclosed to candidates, though vendor-published role-family bands provide a reasonable estimate.
Cutoffs can be set at raw score thresholds, percentile thresholds, or role-mapped target scores. The underlying logic is the same: a gate you must clear to continue.
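Whatever scale the threshold lives on, the mechanism reduces to a single comparison; a minimal sketch with an invented percentile cutoff:

```python
# A cutoff is a gate: the threshold may be a raw score, a percentile,
# or a role-mapped target, but the check is the same comparison.

def advances(score, cutoff):
    return score >= cutoff

# Hypothetical 70th-percentile cutoff for an analyst role:
print(advances(74, 70))  # True  -> candidate continues
print(advances(65, 70))  # False -> application ends
```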
Situational Judgment Test (SJT)
A Situational Judgment Test presents workplace scenarios and asks you to rank or select responses. SJTs are common in consulting, professional services, and graduate recruitment. They are technically not cognitive ability tests, though they often appear alongside them.
SJTs usually have defensible correct answers developed through expert consensus, though scoring often awards partial credit when you rank an acceptable but not optimal response highly.
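Partial-credit scoring can be sketched with expert-assigned option values; the scenarios, option names, and point values below are all invented for illustration:

```python
# Partial-credit SJT scoring sketch: each scenario's response options
# carry expert-assigned values (2 = best, 1 = acceptable, 0 = poor).

key = {
    "peer_conflict":   {"talk_privately": 2, "tell_manager": 1, "ignore": 0},
    "missed_deadline": {"notify_early": 2, "work_overtime": 1, "blame_team": 0},
}

def score_sjt(picks, key):
    """picks maps each scenario to the candidate's chosen option."""
    return sum(key[scenario][choice] for scenario, choice in picks.items())

picks = {"peer_conflict": "tell_manager", "missed_deadline": "notify_early"}
print(score_sjt(picks, key))  # 3: acceptable (1) + best (2)
```

An acceptable-but-not-optimal pick still earns points, unlike the all-or-nothing scoring of most cognitive items.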
Speeded test
A speeded test is one where most candidates cannot finish within the time limit. CCAT, Wonderlic, and PI Cognitive are all speeded. The design intent is to measure speed as well as accuracy.
Speeded tests reward pacing strategy, skip discipline, and rapid triage. Trying to answer every question is usually a losing strategy because accuracy collapses at that pace.
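The expected-score arithmetic behind skip discipline is simple; the accuracy figures below are hypothetical, assuming a 50-question test with four answer options:

```python
# Expected correct answers under two strategies on a hypothetical
# 50-question speeded test with 4-option multiple choice.

rush_all = 50 * 0.50              # attempt everything, accuracy drops to 50%
triage = 38 * 0.80 + 12 * 0.25    # 38 careful answers at 80%, guess the rest

print(f"{rush_all:.1f} vs {triage:.1f}")  # 25.0 vs 33.4
```

Even with a quarter of the questions reduced to blind guesses, disciplined triage comes out well ahead in expectation.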
Power test
A power test is one where time is generous but question difficulty increases. The design intent is to measure maximum capability rather than speed. The Watson-Glaser leans toward the power-test end because its time limit is more generous than those of most speeded tests.
Pure power tests are rare in hiring because of their length. Most cognitive tests are predominantly speeded, with a power element at the hardest questions.
Validity
Validity measures how well a test predicts what it claims to predict. In hiring, the most important variety is predictive validity: how well the test score predicts on-the-job performance.
Cognitive ability tests have the highest predictive validity of any hiring tool, with validity coefficients around 0.51 across meta-analyses. Interviews sit around 0.4. Reference checks are below 0.3. Personality tests vary from 0.1 to 0.4 depending on the trait and the role.
Reliability
Reliability measures how consistently a test produces similar scores for the same candidate across separate attempts. The major vendors typically publish reliability coefficients above 0.85, which means test scores are stable across reasonable time windows.
A test cannot have predictive validity without reliability. Low-reliability tests produce noisy scores that cannot predict anything useful.
Proctored and unproctored
Proctored tests are supervised, either by a live human or by AI monitoring. Unproctored tests are self-administered, usually at home, with no live supervision. Unproctored tests are often followed by proctored verification to confirm results.
The distinction matters because proctored tests have stricter environment requirements, and unproctored tests have higher distraction risk. Both come with scoring integrity considerations.