OPM ADG, Section VI: Annotated References, Part II
Dubois, D., Shalin, V. L., Levi, K. R., & Borman, W.
C. (1993). Job knowledge test design: A cognitively-oriented approach. U.S. Office of Naval Research Report, Institute Report 241, i-47.
This study applied cognitive methods to the measurement of
performance using tests of job knowledge. The research goal was to improve the
usefulness of job knowledge tests as a proxy for hands-on performance. The
land navigation skills of 358 Marines were tested with a written job knowledge
test consisting of multiple-choice questions, hands-on proficiency tests, and a
work-sample performance test. Results indicate cognitively-oriented job
knowledge tests show improved correspondence with hands-on measures of
performance, compared with existing content-oriented test development
Dye, D. A., Reck, M., & McDaniel, M. A. (1993). The
validity of job knowledge measures. International Journal of Selection and
Assessment, 1, 153-157.
The results of this study
demonstrated the validity of job knowledge tests for many jobs. Job knowledge
was defined as the "cumulation of facts, principles, concepts, and other pieces
of information considered important in the performance of one's job" (p.
153). In their meta-analysis of 502 validity coefficients based on 363,528
individuals, they found high levels of validity for predicting training and job
Ree, M. J., Carretta, T. R.,
& Teachout, M. S. (1995). Role of ability and prior job knowledge in
complex training performance. Journal of Applied Psychology, 80(6),
A causal model of the role of
general cognitive ability and prior job knowledge in subsequent job knowledge
acquisition and work sample performance during training was developed.
Participants were 3,428 U.S. Air Force officers in pilot training. The
measures of ability and prior job knowledge came from the Air Force Officer
Qualifying Test. The measures of job knowledge acquired during training were
derived from classroom grades. Work sample measures came from check flight
ratings. The model showed ability directly influenced the acquisition of
job knowledge. General cognitive ability influenced work samples through job
knowledge. Prior job knowledge had almost no influence on subsequent job
knowledge but directly influenced the early work sample. Early training job
knowledge influenced subsequent job knowledge and work sample performance.
Finally, early work sample performance strongly influenced subsequent work sample
Roth, P. L., Huffcutt, A. I.,
& Bobko, P. (2003). Ethnic group differences in measures of job
performance: A new meta-analysis. Journal of Applied Psychology, 88(4),
The authors conducted a
meta-analysis of ethnic group differences in job performance. Analyses of
Black-White differences within categories of job performance were conducted and
subgroup differences within objective and subjective measurements were
compared. Contrary to one perspective sometimes adopted in the field,
objective measures are associated with very similar, if not somewhat larger,
standardized ethnic group differences than subjective measures across a variety
of indicators. This trend was consistent across quality, quantity, and
absenteeism measures. Further, work samples and job knowledge tests are
associated with larger ethnic group differences than performance ratings or
measures of absenteeism. Analysis of Hispanic-White standardized differences
shows they are generally lower than Black-White differences in several
Sapitula, L., & Shartzer,
M. C. (2001). Predicting the job performance of maintenance workers using a job
knowledge test and a mechanical aptitude test. Applied H.R.M. Research, 6(1-2),
This study examined the
predictive validity of the Job Knowledge Written Test (JKWT) and the Wiesen
Test of Mechanical Aptitude (WTMA, J. P. Wiesen, 1997), and the effects of
race, gender, and age on scores. A total of 782 applicants completed the JKWT
and the WTMA, and 102 maintenance workers were administered the JKWT, the WTMA,
and a job performance appraisal. Results show no significant relationship
between job performance ratings and either the JKWT or WTMA. Male applicants
scored higher than did female applicants and White applicants scored higher
than did minority applicants.
Barrick, M. R., & Mount, M. K. (1991). The Big Five
personality dimensions and job performance: A meta-analysis. Personnel
Psychology, 44, 1-26.
Investigated the relation of the
"Big Five" personality dimensions to three job performance criteria (job
proficiency, training proficiency, and personnel data) for five occupational
groups (professionals, police, managers, sales, and skilled/semi-skilled). A
review of 117 studies yielded 162 samples totaling 23,994 subjects.
Conscientiousness showed consistent relations with all job performance criteria
for all occupational groups. Extraversion was a valid predictor for two
occupations involving social interaction (managers and sales). Also, openness
to experience and extraversion were valid predictors of the training
proficiency criterion across occupations. Overall, results illustrate the
benefits of using the five-factor model of personality to accumulate empirical
findings. Study results have implications for research and practice in
Hogan, R., Hogan, J., & Roberts, B. W. (1996). Personality
measurement and employment decisions: Questions and answers. American
Psychologist, 51, 469-477.
Summarizes information needed to answer the most frequent
questions about the use of personality measures in applied contexts.
Conclusions are (1) well-constructed measures of normal personality are valid
predictors of performance in virtually all occupations, (2) they do not result
in adverse impact for job applicants from minority groups, and (3) using
well-developed personality measures for pre-employment screening is a way to
promote social justice and increase organizational productivity.
Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D.,
& McCloy, R. A. (1990). Criterion-related validities of personality
constructs and the effect of response distortion on those validities. Journal
of Applied Psychology, 75, 581-595.
An inventory of six personality constructs and four response
validity scales measuring accuracy of self-description were administered in
three contexts: a concurrent criterion-related validity study, a faking
experiment, and an applicant setting. Results showed (a) validities were in
the .20s against targeted criterion constructs, (b) respondents successfully
distorted their self-descriptions when instructed to do so, (c) response
validity scales were responsive to different types of distortion, (d)
applicants' responses did not reflect evidence of distortion, and (e) validities
remained stable regardless of possible distortion.
Hough, L. M., & Oswald, F. L. (2000). Personnel
selection: Looking toward the future — Remembering the past. Annual Review of
Psychology, 51, 631-664.
Reviews personnel selection
research from 1995-1999. Areas covered are job analysis; performance criteria;
cognitive ability and personality predictors; interview, assessment center, and
biodata assessment methods; measurement issues; meta-analysis and validity
generalization; evaluation of selection systems in terms of differential
prediction, adverse impact, utility, and applicant reactions; emerging topics
on team selection and cross-cultural issues; and finally professional, legal,
and ethical standards. Three major themes are revealed: (1) better taxonomies
produce better selection decisions; (2) the nature and analyses of work
behavior are changing, influencing personnel selection practices; (3) the field
of personality research is healthy, as new measurement methods, personality
constructs, and compound constructs of well-known traits are being researched
and applied to personnel selection.
Tett, R. P., Jackson, D. N., & Rothstein, M. (1991).
Personality measures as predictors of job performance: A meta-analytic review. Personnel
Psychology, 44, 703-742.
Based on 97 independent samples, a meta-analysis was used to
(a) assess overall validity of personality measures as predictors of job
performance, (b) investigate moderating effects of several study
characteristics on personality scale validity, and (c) appraise predictability
of job performance as a function of eight categories of personality content.
Results indicated studies using confirmatory research strategies produced
corrected mean personality scale validity more than twice as high as studies
adopting exploratory strategies. An even higher mean validity was obtained
based on studies using job analysis explicitly in selection of personality
Aamodt, M. G. (2006). Validity of recommendations and
references. Assessment Council News, February, 4-6.
Reference data are subject to inflation and low reliability
and generally reach only moderate levels of predictive validity. Even so,
organizations are encouraged to check the references of their applicants
because of widespread resume fraud and potential liability in the form of
Taylor, P. J., Pajo, K., Cheung, G. W., &
Stringfield, P. (2004). Dimensionality and validity of a structured telephone
reference check procedure. Personnel Psychology, 57, 745-772.
Reports that reference checking, when properly structured,
can prevent defamation litigation and add significant value to the selection
process. Specifically tests the hypothesis that utilizing a structured,
competency-based approach to reference checking can increase the predictive
validity of ratings in much the same way as structuring the employment
interview process. A structured job analysis was used to identify the core
job-related competencies deemed essential to effective performance in a family
of customer-contact jobs within a 10,000-employee service organization. These
competencies (Commitment, Teamwork, and Customer Service) were incorporated
into a structured reference form and contacts were asked to rate applicants on
a number of behavioral indicators within each competency. A structured telephone
interview with contacts was then used to obtain evidence of actual occurrences
to support the ratings. Results indicated using a structured telephone
reference check increased the employer's ability to predict future job
performance. Results also indicated a shorter contact-applicant relationship
does not undermine predictions of future job performance.
U.S. Merit Systems Protection Board. (2005). Reference checking
in federal hiring: Making the call. Washington, DC: Author. Note: Report
Hiring officials should check references. The quality of reference checking can be improved by insisting job
applicants provide at least three references who have observed their
performance on the job. Supervisors should discuss the performance of
their current and former employees with prospective employers. Some former
supervisors will only provide basic facts about work histories (e.g.,
employment dates and positions held) because they are concerned with protecting
the privacy of former employees. Their concern is understandable but need not
interfere with reference checking. So long as reference checking discussions
focus on job-related issues such as performance, reference giving is
appropriate and legally defensible. Former supervisors who support reference
checking inquiries can reward good employees for their past contributions and
avoid "passing on" a problem employee to another agency. Agency human
resources personnel can work to remove barriers to effective reference
checking. For example, applicants should be required to complete Declaration
of Federal Employment (OF-306) forms early in the application process. This
form explicitly grants permission to check references and sets
applicants' expectations appropriately: their performance in previous employment
will be investigated.
Hanson, M. A., Horgen, K. E.,
& Borman, W. C. (1998, April). Situational judgment tests (SJT) as measures
of knowledge/expertise. Paper presented at the 13th Annual Conference of the
Society for Industrial and Organizational Psychology, Dallas, TX.
This paper discusses the situational judgment test (SJT)
methodology and reasons for its popularity. This paper also investigates the
nature of the construct(s) measured by these tests, why they are valid, when
they are valid, and why they are sometimes not valid. The authors propose the
SJT methodology is best suited for measuring knowledge or expertise, and
discuss available construct validity evidence consistent with this
perspective. This perspective generates several testable hypotheses, and
additional research is proposed. Finally, the implications of this perspective
for the development of valid and useful SJTs are discussed.
McDaniel, M. A., Morgeson, F. P., Finnegan, E. B., Campion,
M. A., & Braverman, E. P. (2001). Use of situational judgment tests to
predict job performance: A clarification of the literature. Journal of
Applied Psychology, 86, 730-740.
This article reviews the history
of situational judgment tests (SJT) and presents the results of a meta-analysis
on criterion-related and construct validity. SJTs showed useful levels of
validity across all jobs and situations studied. The authors also found a
relatively strong relationship between SJTs and cognitive ability and the
relationship depended on how the test had been developed. On the basis of the
literature review and meta-analytic findings, implications for the continued
use of SJTs are discussed, particularly in terms of recent investigations into
McDaniel, M. A., Whetzel, D. L., & Nguyen, N. T. (2006).
Situational judgment tests for personnel selection. Alexandria, VA: IPMA Assessment Council.
Employers should take into account several factors before
choosing to develop their own in-house situational judgment tests (SJTs). For
example, SJT developers must make a number of decisions about the content of
items, response options, response instructions, and answer key. This monograph
also describes the major steps in building a situational judgment test such as
conducting a critical incident workshop, creating item stems from critical
incidents, generating item responses, developing item response instructions,
and choosing among several scoring key methods.
Motowidlo, S. J., Dunnette,
M. D., & Carter, G. W. (1990). An alternative selection procedure: The
low-fidelity simulation. Journal of Applied Psychology, 75,
A low-fidelity simulation was
developed for selecting entry-level managers in the telecommunications
industry. The simulation presents applicants with descriptions of work
situations and five alternative responses for each situation. Applicants
select one response they would most likely make and one they would least likely
make in each situation. Results indicated simulation scores correlated from
.28 to .37 with supervisory ratings of performance. These results show samples
of even hypothetical work behavior can predict performance.
Motowidlo, S. J., &
Tippins, N. (1993). Further studies of the low-fidelity simulation in the form
of a situational inventory. Journal of Occupational and Organizational
Psychology, 66, 337-344.
Authors examined two studies
that extend the results of S. J. Motowidlo et al. (1990) by providing further
evidence about relations between situational inventory scores, job performance,
and demographic factors. Combined results from both studies yield an overall
validity estimate of .20, with small differences between race and sex
subgroups, and confirm the potential usefulness of the low-fidelity simulation
in the form of a situational inventory for employee selection.
Weekley, J. A., & Jones, C. (1999). Further studies
of situational tests. Personnel Psychology, 52(3), 679-700.
Results are reported for two
different situational judgment tests (SJTs). Across the two studies, situational
test scores were significantly related to cognitive ability and experience. In
one study, there was a slight tendency for experience and cognitive ability to
interact in the prediction of situational judgment, such that cognitive ability
became less predictive as experience increased. Situational judgment fully
mediated the effects of cognitive ability in one study, but not in the other.
SJT race effect sizes were consistent with past research and were smaller than
those typically observed for cognitive ability tests. The evidence indicates
situational judgment measures mediate a variety of job relevant skills.
Campion, M. A., Palmer, D. K., & Campion, J. E.
(1997). A review of structure in the selection interview. Personnel
Psychology, 50(3), 655-702.
Reviews the research literature
and describes and evaluates the many ways selection interviews can be
structured. Fifteen components of structure are identified which may enhance
either the content of or the evaluation process in the interview. Each
component is critiqued in terms of its impact on numerous forms of reliability,
validity, and user reactions. Finally, recommendations for research and
practice are presented. The authors conclude interviews can be easily
enhanced by using some of the many possible components of structure, and the
improvement of this popular selection procedure should be a high priority for
future research and practice.
Conway, J. M., Jako, R. A., & Goodman, D. F. (1995).
A meta-analysis of interrater and internal consistency reliability of selection
interviews. Journal of Applied Psychology, 80(5), 565-579.
A meta-analysis of 111
inter-rater reliability coefficients and 49 coefficient alphas from selection
interviews was conducted. Moderators of inter-rater reliability included study
design, interviewer training, and three dimensions of interview structure
(standardization of questions, of response evaluation, and of combining
multiple ratings). Standardizing questions increased reliability of
ratings more for individual than for panel interviews. Multiple ratings were
useful when combined mechanically, but there was no evidence of usefulness when
they were combined subjectively. Standardization of questions and number of ratings
made resulted in greater levels of validity. Upper limits of validity were
estimated to be .67 for highly structured interviews and .34 for unstructured
Huffcutt, A. I., & Arthur, W. (1994). Hunter and
Hunter (1984) revisited: Interview validity for entry-level jobs. Journal of
Applied Psychology, 79(2), 184-190.
By adopting the theory of
planned behavior, this study tried to predict human resources managers'
intentions toward unstructured and structured interview techniques. Managers
evaluated case descriptions of both techniques and were interviewed about their
own practices. The data revealed stronger intentions toward unstructured
interviewing than toward structured interviewing, which was consistent with
their own practices in selecting staff, which appeared to be rather
unstructured. Ajzen's (1991) theory appeared to be a useful framework for
predicting managers' intentions. In particular, attitudes and subjective norms
were predictive of intentions to engage in either method. Only intentions
toward the unstructured case were related to managers' actual behavior.
Huffcutt, A. I., & Roth,
P. L. (1998). Racial group differences in employment interview evaluations. Journal
of Applied Psychology, 83(2), 179-189.
The purpose of this
meta-analysis was to research the various factors that can play a role in
racial group differences resulting from an interview, such as the level of
structure in the interview, job complexity, etc. Results suggest, in general,
employment interviews do not affect minorities as much as other assessments
(e.g., mental ability tests). Moreover, structured interviews tend to limit or
decrease the influence of bias and stereotypes in ratings. High job complexity
resulted in mean negative effect sizes for Black and Hispanic applicants,
meaning they received higher overall ratings than White applicants. Behavior
description interviews averaged smaller group differences than situational
interviews, and group differences tended to be larger when there was a larger
percentage of a minority (i.e., Black or Hispanic) in the applicant pool.
Huffcutt, A. I., Weekley, J. A., Wiesner, W. H., DeGroot,
T. G., & Jones, C. (2001). Comparison of situational and behavior
description interview questions for higher-level positions. Personnel
Psychology, 54(3), 619-644.
This paper discusses two
structured interview studies involving higher-level positions (military officer
and district manager), each with matching situational interview and behavior
description interview (BDI) questions written to assess the same job
characteristics. Results confirmed previous findings that
situational interviews are less effective for higher-level positions than
BDIs. Moreover, results indicated very little correspondence between
situational and behavior description questions written to assess the same job
characteristic, and a link between BDI ratings and the personality trait
Extroversion. Possible reasons for the lower situational interview
effectiveness are discussed.
McFarland, L. A., Ryan, A.
M., Sacco, J. M., & Kriska, S. D. (2004). Examination of structured interview
ratings across time: The effects of applicant race, rater race, and panel
composition. Journal of Management, 30(4), 435-452.
This study looked at the effect
of race on interview ratings for structured panel interviews (candidates were
interviewed and rated by three raters of varying races). Results indicated
panel composition produced the largest effect. Specifically, predominately
White panels provided significantly more favorable ratings (of all candidates,
regardless of race) than panels consisting of predominately Black
raters. Panel composition also affected ratings, such that Black raters
provided higher ratings to Black candidates only when the panel was
predominately Black. However, the authors caution these effects were rather
small; therefore, the results should be cautiously interpreted.
Taylor, P., & Small, B. (2002). Asking
applicants what they would do versus what they did do: A meta-analytic
comparison of situational and past behavior employment interview questions. Journal
of Occupational & Organizational Psychology, 75(3), 277-294.
Criterion-related validities and
inter-rater reliabilities for structured employment interview studies using
situational interview (SI) questions were compared with those from studies
using behavioral description interview (BDI) questions. Validities and
reliabilities were further analyzed in terms of whether descriptively-anchored
rating scales were used to judge interviewees' answers, and validities for each
question type were also assessed across three levels of job complexity. While
both question formats yielded high validity estimates, studies using BDI
questions, when used with descriptively anchored answer rating scales, yielded
a substantially higher mean validity estimate than studies using the SI
question format with descriptively-anchored answer rating scales (.63 vs .47).
Question type (SI vs. BDI) was found to moderate interview validity. Inter-rater
reliabilities were similar for both SI and BDI questions, provided
descriptively-anchored rating scales were used, although they were slightly
lower for BDI question studies lacking such rating scales.
Lyons, T. J. (1989). Validity
of Education and Experience Measured in Traditional Rating Schedule Procedures:
A Review of the Literature. Office of Personnel Research and Development, U.S. Office of Personnel Management, Washington, DC, OPRD-89-02.
This paper reviews research on
the validity of specific education and experience measures common to
traditional rating schedule procedures used by the Federal Government. The
validity of each measure is discussed and recommendations for rating schedule
use are offered.
Lyons, T. J. (1988). Validity
Research on Rating Schedule Methods: Status Report. Office of Personnel
Research and Development, U.S. Office of Personnel Management, Washington, DC, OED-88-17.
This report summarizes the
findings from a series of studies conducted on rating schedule validity. The
first objective was to investigate the criterion-related validity of rating
schedules used in the Federal Government and the second was to study the
validity of three rating schedule methodologies. Results indicated little
evidence of validity for a rating schedule method based on training and
experience at either entry-level or full performance level jobs. Findings
supported the validity of a Knowledge, Skills, and Abilities (KSA)-based rating
schedule method for full performance level jobs, but not for entry level jobs.
Except for one entry level study, results indicated the most promising validity
coefficients (in the mid to upper .20s) for rating procedures employing
behavioral consistency measures for both entry and full performance level
McCauley, D. E. (1987). Task-Based
Rating Schedules: A Review. Office of Examination Development, U.S. Office of Personnel Management, Washington, DC, OED 87-15.
This paper reviews the evidence
for the validity and practicality of the task-based rating schedule (TBRS), a
self-report instrument used to assess applicants' training and experience in
relation to job-required tasks. The background of the TBRS and the
assumptions on which it is based are discussed. In addition, a discussion of
meta-analytic results on the predictive validity of the TBRS is provided.
McDaniel, M. A., Schmidt, F.
L., & Hunter, J. E. (1988). A meta-analysis of the validity of methods for
rating training and experience in personnel selection. Personnel Psychology,
This paper discusses a
meta-analysis of validity evidence of the methods (point, task, behavioral
consistency, grouping, and job element) used to evaluate training and
experience (T&E) ratings in personnel selection. Results indicate validity
varied with the type of T&E evaluation procedure used. The job element and
behavioral consistency methods each demonstrated useful levels of validity.
Both the point and task methods yielded low mean validities with larger
variability. Partial support was found for both the point and task methods
being affected by a job experience moderator. Moderator analyses suggested the
point method was most valid when the applicant pool had low mean levels of job
experience and was least valid with an experienced applicant pool.
Schwartz, D. J. (1977). A job sampling approach to merit
system examining. Personnel Psychology, 30(2), 175-185.
A method for collecting content validity
evidence for a merit examining process or rating schedule without violating the
principles of content validity is presented. This technique, called the job
sampling approach, is a task-based, structured system of eliciting the
information necessary to construct the rating schedule from sources most able
to provide that information, and for using the information to construct the
rating schedule and linking it to job performance. The steps include
definition of the performance domain of the job in terms of process statements,
identification of the selection and measurement objectives of the organization,
development of the measurement domain in relation to the performance domain and
to the selection and measurement objectives, and demonstration that a close
match between the performance domain and the measurement domain was in fact
Sproule, C. F. (1990). Personnel Assessment
Monographs: Recent Innovations in Public Sector Assessment (Vol. 2, No. 2).
International Personnel Management Association Assessment Council (IPMAAC).
This report reviews selected assessment methods and
procedures (e.g., training and experience measures) used frequently in the
public sector during the 1980s. Many of these, including the rating schedule,
are still used today. Each section on assessments contains a variety of
examples describing public sector methods, as well as a summary of related
research findings. Other sections include discussions on selected Federal
assessment innovations, application of technology to assessment, use of test
scores, legal provisions related to assessment, and employment testing of
persons with disabilities.