Assessment Glossary

Core Assessment & Evaluation Terminology

Terminology from the National Council on Measurement in Education (NCME)

(Selected terms and their definitions are provided below; the full glossary is available from NCME.)

  • Assessment - A tool or method of obtaining information from tests or other sources about the achievement or abilities of individuals. Often used interchangeably with test.
  • Authentic assessment - An assessment containing items that are judged to be measuring the ability to apply and use knowledge in real-world contexts.
  • Bias - Systematic errors in test content, test administration, and/or scoring procedures that can cause some test takers to get either lower or higher scores than their true ability would merit. The source of the bias is irrelevant to the trait the test is intended to measure. (See also error score.)
  • Criterion-referenced score interpretation - An interpretation that involves comparing a test taker’s score with an absolute standard, or an ordered set of performance descriptions, rather than with the scores of other individuals. Comparisons to a cut score, an expectancy table, or an ordered set of behavior descriptions are examples. These are contrasted with norm-referenced score interpretations.
  • Cut score - The point on a score scale that differentiates the interpretations made about those scoring above it from those scoring below it. Pass-fail, accepted-rejected, and proficient-not proficient are examples. Cut scores also are known as cutoff scores.
  • Derived score - A score on a scale to which raw scores are converted to enhance their interpretation. Examples are percentile ranks, standard scores, and grade-equivalent scores.
  • Evaluation - The process of gathering information to make a judgment about the quality or worth of some program or performance. The term also is used to refer to the judgment itself, as in “My evaluation of his work is . . . .”
  • Extraneous variance - The variability in test scores that occurs among individuals in a group because of differences in those persons that are irrelevant to what the test is intended to measure. For example, a science test that requires mathematics skills and a reading ability beyond what its content domain specifies will have two sources of extraneous variance. In this case, students’ science scores might differ, not only because of differences in their science achievement, but also because of differences in their (extraneous) mathematics and reading abilities. (See also construct irrelevance.)
  • Formative use of assessments - The use of assessments during the instructional process to monitor the progress of learning and the effectiveness of instruction so that adjustments can be made, as needed. This use is contrasted with the summative use of assessments.
  • Objective test - A test containing items that can be scored without any personal interpretation (subjectivity) required on the part of the scorer. Tests that contain multiple choice, true-false, and matching items are examples. (See also subjective test.)
  • Summative use of assessments - Using assessments at the end of an instructional segment to determine the level of students’ achievement of intended learning outcomes or whether learning is complete enough to warrant advancing the student to the next segment in the sequence. This is contrasted with formative use of assessments.
  • Test - An evaluation instrument, usually composed of questions or items that have right or best answers, used to measure an individual’s aptitude or level of achievement in some domain. Tests are usually distinguished from inventories, questionnaires, and checklists as evaluation devices.
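Several of the scoring terms above (cut score, derived score, criterion-referenced vs. norm-referenced interpretation) can be illustrated with a brief sketch. The scores and the cut score below are hypothetical, chosen only to show the two contrasting interpretations of the same raw score:

```python
# Hypothetical raw scores for a group of ten test takers.
raw_scores = [52, 61, 78, 45, 90, 67, 73, 84, 58, 70]

# Criterion-referenced interpretation: compare a score to an absolute
# standard (a cut score), not to other individuals' scores.
CUT_SCORE = 65  # hypothetical proficiency cut score

def criterion_referenced(score, cut=CUT_SCORE):
    """Classify a score against a fixed cut score."""
    return "proficient" if score >= cut else "not proficient"

# Norm-referenced interpretation via a derived score: the percentile
# rank expresses a raw score as the percentage of the group scoring
# below it, enhancing interpretation relative to the raw scale.
def percentile_rank(score, scores):
    """Percentage of the group with scores strictly below `score`."""
    below = sum(s < score for s in scores)
    return 100.0 * below / len(scores)

print(criterion_referenced(78))              # proficient
print(percentile_rank(78, raw_scores))       # 70.0 (7 of 10 score below 78)
```

The same raw score of 78 thus supports two distinct interpretations: "proficient" against the absolute standard, and "at the 70th percentile" relative to the group.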

Last Update: 2/1/2016