Glossary

Plain-language definitions of key terms used in psychological assessment — validity, reliability, Cronbach's alpha, subscales, and more.

Validity

The extent to which a psychological assessment measures what it claims to measure.

Reliability

The consistency of an assessment's results across repeated administrations and across items.

Cronbach's alpha (α)

A statistic that estimates how internally consistent the items on a scale are. It usually falls between 0 and 1, and values of 0.70 or higher are conventionally treated as acceptable.
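
As a rough illustration, alpha can be computed from the number of items, the variance of each item, and the variance of respondents' total scores. The sketch below uses the standard formula with sample variances; the function name `cronbach_alpha` and the response data are made up for the example.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items on the scale
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 4 respondents answering a 2-item scale.
responses = [[2, 3], [4, 4], [3, 5], [5, 5]]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

The function assumes complete data (no missing responses) and at least two items; it divides by zero if every respondent has the same total score.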

Subscale

A group of items within a larger assessment that measures one specific facet of the overall construct.

Likert scale

An ordered response format (often 5- or 7-point) for rating agreement, frequency, or intensity.

Internal consistency

The degree to which items on a scale correlate with one another, suggesting they measure the same construct.

Test-retest reliability

The consistency of an assessment's results when the same person takes it twice, separated by time.
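
Test-retest reliability is often summarized as the correlation between scores from the two administrations. The sketch below assumes a Pearson correlation is the chosen index (other indices exist); the function name and the scores are hypothetical.

```python
import math


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    # Sum of cross-products and sums of squared deviations.
    cross = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    return cross / math.sqrt(ss_1 * ss_2)


# Hypothetical scores for the same five people at time 1 and time 2.
time_1 = [12, 18, 25, 30, 22]
time_2 = [14, 17, 27, 29, 20]
retest_r = pearson_r(time_1, time_2)
```

Values closer to 1 indicate that people kept roughly the same relative position across the two sessions.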

Construct validity

Evidence that an assessment actually measures the abstract concept (construct) it claims to measure.

Self-report

An assessment format in which respondents rate their own thoughts, feelings, or behaviors.

Cutoff score

A specific value on an assessment used to classify respondents into categories. Falling above or below a cutoff is a classification, not a diagnosis.

Face validity

Whether an assessment appears, on the surface, to measure what it claims to — based on inspection rather than statistical evidence.

Content validity

Whether the items on an assessment cover the full content domain of the construct being measured.

Criterion validity

Whether scores on an assessment correlate with a meaningful external outcome — the criterion.

Convergent validity

Evidence that an assessment correlates strongly with other measures of the same construct.

Discriminant validity

Evidence that an assessment does not correlate strongly with measures of unrelated constructs.

Inter-rater reliability

The consistency of scores when different raters or observers evaluate the same responses, behaviors, or material.

Standard error of measurement (SEM)

An estimate of how much a person's observed score is expected to vary from their true score due to measurement error.
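
Under classical test theory, the SEM is commonly computed as the scale's standard deviation multiplied by the square root of one minus its reliability. The sketch below assumes that formula; the function names and the numbers (a standard deviation of 15 and a reliability of 0.91) are illustrative only.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability between 0 and 1."""
    return sd * math.sqrt(1 - reliability)


def score_band(observed, sem, z=1.96):
    """Approximate band around an observed score (z=1.96 gives about 95%)."""
    return (observed - z * sem, observed + z * sem)


sem = standard_error_of_measurement(sd=15, reliability=0.91)  # about 4.5
low, high = score_band(observed=100, sem=sem)
```

A more reliable scale gives a smaller SEM and therefore a narrower band around each observed score.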

Norms

Reference distributions of scores from a defined population, used to interpret an individual's score in context.

Standardization

Administering, scoring, and interpreting an assessment under uniform conditions, with reference norms from a defined sample.

Percentile rank

The percentage of scores in a reference group that fall at or below a given score.
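
The definition above translates directly into a count. This is a minimal sketch using the "at or below" convention stated here (some sources count only scores strictly below, or split ties); the function name and reference scores are hypothetical.

```python
def percentile_rank(score, reference_scores):
    """Percentage of reference scores that fall at or below the given score."""
    at_or_below = sum(1 for s in reference_scores if s <= score)
    return 100 * at_or_below / len(reference_scores)


# Hypothetical reference group of five scores.
rank = percentile_rank(3, [1, 2, 3, 4, 5])  # 3 of 5 scores are <= 3
```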

Raw score

The direct, untransformed score from an assessment — typically the sum of item responses.

T-score

A standardized score with a mean of 50 and a standard deviation of 10, commonly used in psychological testing.

Z-score

A standardized score expressing how many standard deviations a raw score is from the mean.
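
Z-scores and T-scores are linked by simple arithmetic: a z-score subtracts the reference mean and divides by the reference standard deviation, and a T-score rescales that z-score to a mean of 50 and a standard deviation of 10. The function names and the example values below are assumptions for illustration.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations a raw score lies from the mean."""
    return (raw - mean) / sd


def t_score(z):
    """Rescale a z-score to a mean of 50 and a standard deviation of 10."""
    return 50 + 10 * z


# Hypothetical scale with a reference mean of 24 and SD of 4.
z = z_score(raw=30, mean=24, sd=4)  # 1.5 standard deviations above the mean
t = t_score(z)                      # 65.0
```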

Normal distribution

A symmetric, bell-shaped probability distribution that approximates how scores on many psychological traits are spread across a population.

Factor analysis

A family of statistical methods for identifying the underlying latent factors that explain correlations among observed items.

Social desirability bias

The tendency of respondents to answer in ways that present themselves favorably rather than accurately.

Acquiescence bias

The tendency of respondents to agree with statements regardless of content, inflating scores on positively worded items.

Response style

Systematic ways respondents answer items that aren't driven by item content, such as agreeing with everything or always picking the middle option.

Reverse-scored item

An item worded so that endorsement indicates the opposite of the construct being measured; its score is flipped during calculation.
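
Flipping a score usually means mirroring it around the midpoint of the response scale: the reversed value is the scale minimum plus the scale maximum, minus the response. The sketch below assumes a numeric scale with known endpoints; the function name is hypothetical.

```python
def reverse_score(response, scale_min, scale_max):
    """Mirror a response around the midpoint of the response scale."""
    return scale_min + scale_max - response


# On a 1-to-5 scale, a response of 2 becomes 4 and the midpoint 3 stays 3.
flipped = reverse_score(2, scale_min=1, scale_max=5)
```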

Forced-choice item

An item format that asks respondents to choose between options designed to be equally desirable, reducing social desirability effects.

Ceiling effect

When too many respondents score at or near the maximum, the assessment can't distinguish between them or detect further increase.

Floor effect

When too many respondents score at or near the minimum, the assessment can't distinguish between them or detect further decrease.
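
One simple way to look for ceiling and floor effects is to check what share of respondents land exactly on the highest or lowest possible score. The sketch below counts only exact extremes, which is a simplifying assumption (the definitions also cover scores near the extremes), and it does not set a threshold for "too many". The function name and scores are hypothetical.

```python
def extreme_score_shares(scores, minimum, maximum):
    """Return (floor_share, ceiling_share) as proportions between 0 and 1."""
    n = len(scores)
    floor_share = sum(1 for s in scores if s == minimum) / n
    ceiling_share = sum(1 for s in scores if s == maximum) / n
    return floor_share, ceiling_share


# Hypothetical total scores on a scale that runs from 0 to 10.
floor_share, ceiling_share = extreme_score_shares([0, 10, 10, 5], 0, 10)
```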

Sensitivity

The proportion of true cases that an assessment correctly identifies — also called the true positive rate.

Specificity

The proportion of true non-cases that an assessment correctly identifies as negative — also called the true negative rate.
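
Sensitivity and specificity can both be computed from four counts: true positives, false negatives, true negatives, and false positives. The function names and counts below are illustrative, not taken from any real screening study.

```python
def sensitivity(true_positives, false_negatives):
    """Share of true cases that the assessment flags as positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Share of true non-cases that the assessment flags as negative."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical results: 50 true cases and 100 true non-cases.
sens = sensitivity(true_positives=40, false_negatives=10)  # 40 of 50 cases caught
spec = specificity(true_negatives=90, false_positives=10)  # 90 of 100 non-cases cleared
```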

Screening

Using a brief assessment to identify people who may benefit from further evaluation — not to diagnose.

Effect size

A quantitative measure of the magnitude of a difference or relationship. Unlike a p-value, it does not grow simply because the sample is larger.
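
Cohen's d is one widely used effect size for the difference between two group means: the difference in means divided by the pooled standard deviation. The sketch below is one example among several effect size measures; the function name and group scores are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_variance)


# Hypothetical scores for two small groups.
d = cohens_d([2, 4, 6], [1, 3, 5])  # means differ by 1, pooled SD is 2
```

Here the groups differ by half a pooled standard deviation, so d comes out to 0.5.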