Inter-rater reliability

The consistency of scores when different raters or observers evaluate the same responses, behaviors, or material.

Inter-rater reliability is the degree to which different raters assign consistent scores or judgments to the same material. It matters most for assessments that require human scoring, such as clinical interviews, behavioral observations, projective tests, and coding of open-ended responses.

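Common statistics for inter-rater reliability include percent agreement, Cohen's kappa (for categorical judgments by two raters, corrected for the agreement expected by chance), and intraclass correlation coefficients (for continuous ratings). Percent agreement is the simplest of these, but it does not correct for chance, so it can overstate reliability when one category dominates. Self-report assessments don't have inter-rater reliability in the usual sense, because there's only one rater: the respondent.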
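To make the difference between percent agreement and Cohen's kappa concrete, here is a minimal Python sketch for two raters assigning categorical labels. The function names and the example ratings are hypothetical and exist only for illustration. Kappa is computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's category frequencies.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of items on which the two raters gave the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters: agreement corrected for chance."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)  # observed agreement

    # Chance agreement: for each category, multiply the two raters'
    # marginal proportions, then sum over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if p_e == 1.0:
        # Both raters used a single identical category for every item.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical data: two raters code the same ten responses as "yes" or "no".
rater_a = ["yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no", "no"]
rater_b = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "yes"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(f"percent agreement: {agreement:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
```

With these made-up ratings the raters agree on 8 of 10 items, so percent agreement is 0.80. Each rater uses "yes" and "no" equally often, so chance agreement is 0.50 and kappa comes out at about 0.60, which is lower because half of the raw agreement could be expected by chance alone.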

Related terms

Reliability — The consistency of an assessment's results across repeated administrations and across items.
Test-retest reliability — The consistency of an assessment's results when the same person takes it twice, separated by time.
Self-report — An assessment format in which respondents rate their own thoughts, feelings, or behaviors.