Utility-analysis taxonomy for personnel selection
Source: vignettes/utility-analysis-taxonomy.Rmd
Purpose
The package is organised around a diagnostic question: what kind of selection problem are you analysing? Many disagreements in the utility-analysis literature arise because a method designed for one type of problem is applied to another. Taylor and Russell (1939) framed utility as a classificatory success-ratio problem. Naylor and Shine (1965) moved the focus to expected criterion gain in standard-deviation units. Brogden (1946, 1949) and Cronbach and Gleser (1965) developed the decision-theoretic monetary formulation. Boudreau (1983, 1991) added economic realism through discounting, taxes, and employee flows. Holling (1998), Sturman (2000, 2001), Thomas, Owen, and Gunst (1977), and Ock and Oswald (2018) each contributed substantive corrections or extensions. The taxonomy used here is intended to make the model-problem match explicit before any number is reported.
The two-by-two taxonomy
The taxonomy crosses two dimensions:
- Criterion scale: is the criterion treated as continuous/monetary, or dichotomised into success/failure?
- Selection structure: is selection compensatory, based on a composite score, or conjunctive/multiple-hurdle, based on passing multiple cutoffs or stages?
model_taxonomy()
#> criterion_scale
#> 1 classification/dichotomized
#> 2 classification/dichotomized
#> 3 classification/dichotomized
#> 4 continuous/monetary
#> 5 continuous/monetary
#> 6 continuous/monetary
#> 7 continuous/monetary
#> 8 multi-attribute
#> selection_structure
#> 1 compensatory or single predictor
#> 2 conjunctive multiple-hurdle with specified marginal cutoffs
#> 3 conjunctive multiple-hurdle with target joint selection ratio
#> 4 compensatory top-down
#> 5 incremental compensatory system
#> 6 sequential/multiple-hurdle simulation
#> 7 diagnostics and SDy input estimation
#> 8 multiple criteria
#> model_family
#> 1 Taylor-Russell (1939)
#> 2 Thomas-Owen-Gunst (1977)
#> 3 Thomas-Owen-Gunst tables / equal-cutoff design
#> 4 Naylor-Shine / BCG / SHP / Boudreau
#> 5 Sturman-style restricted canonical validity
#> 6 Ock-Oswald-style comparison
#> 7 Holling-style empirical checks and SDy estimation
#> 8 MAUA / Pareto decision support
#> primary_functions
#> 1 tr_classic(), tr_solve()
#> 2 tr_multivariate()
#> 3 tr_multivariate_equal_cutoff(), tr_binomial_success_probability()
#> 4 naylor_shine(), bcg_utility(), shp_utility(), boudreau_utility()
#> 5 restricted_canonical_validity(), incremental_validity()
#> 6 compensatory_selection(), multiple_hurdle_selection(), multiple_hurdle_selection_staged(), compare_selection_systems_staged()
#> 7 sdy_observed(), sdy_cost_accounting(), sdy_crepid(), utility_regression_diagnostics()
#> 8 multiattribute_utility(), pareto_frontier(), utility_fairness_frontier()

This yields four practical cells.
| Criterion scale | Compensatory selection | Multiple-hurdle selection |
|---|---|---|
| Dichotomised / classificatory | Taylor-Russell on one predictor or a composite | Thomas-Owen-Gunst multivariate Taylor-Russell |
| Continuous / monetary | Naylor-Shine, Brogden-Cronbach-Gleser, Boudreau, Sturman incremental validity | Stage-wise utility or simulation-based comparisons |
The four cells are not interchangeable. A selection system can have the same predictors but a different appropriate utility model depending on the decision rule. If cognitive ability, conscientiousness, biodata, and interview ratings are added into one composite and applicants are selected top-down, the system is compensatory. If applicants must first pass a cheap screening composite and only then a more expensive interview stage, the system is staged multiple-hurdle. The distinction is not cosmetic. Ock and Oswald (2018) emphasise the cost-reliability trade-off: a compensatory composite typically uses more information and yields higher expected criterion performance, whereas a multiple-hurdle system can be cheaper because expensive stages are administered only to applicants who survive earlier screens.
Why the choice of cell matters
A simple thought experiment makes the point. Suppose two predictors $X_1$ and $X_2$ correlate $r_{12}$ with each other and have validities of $r_{1y}$ and $r_{2y}$ against a job-performance criterion. If selection is compensatory and applicants are ranked on the equally weighted sum, the implied composite validity is $(r_{1y} + r_{2y}) / \sqrt{2 + 2 r_{12}}$, the standard correction for predictor intercorrelation. If selection is conjunctive on the same two predictors, with each marginal cutoff passing the same top fraction $SR$ of applicants, the operative quantity is the joint selection ratio $SR_{conj}$, which is materially smaller than $SR$ and carries a different positive predictive value. Reporting one number when the other is operationally relevant misrepresents the system. The package therefore encourages the analyst to specify the cell first and choose the function second.
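The contrast can be made concrete with a few lines of base R. This is a minimal sketch, not the package's implementation: the helper bvn_upper() is hypothetical, and the inputs are illustrative assumptions that reuse the correlations from the matrix R in the tr_multivariate() example of this vignette (intercorrelation .30, validities .40 and .35, marginal selection ratios of .50).

```r
# P(X1 > c1, X2 > c2) for a standard bivariate normal with correlation rho,
# obtained by integrating the conditional tail probability of X2 over X1.
# Hypothetical helper for illustration only.
bvn_upper <- function(c1, c2, rho) {
  integrand <- function(x) {
    dnorm(x) * pnorm((c2 - rho * x) / sqrt(1 - rho^2), lower.tail = FALSE)
  }
  integrate(integrand, lower = c1, upper = Inf)$value
}

# Illustrative inputs (assumptions)
r12 <- 0.30           # predictor intercorrelation
r_y <- c(0.40, 0.35)  # predictor-criterion validities
sr  <- 0.50           # marginal selection ratio for each predictor

# Compensatory rule: validity of the equally weighted sum of two
# standardised predictors
composite_validity <- sum(r_y) / sqrt(2 + 2 * r12)

# Conjunctive rule: proportion of applicants passing both cutoffs
cutoff   <- qnorm(1 - sr)
joint_sr <- bvn_upper(cutoff, cutoff, r12)

round(c(composite_validity = composite_validity, joint_sr = joint_sr), 3)
```

Under these assumptions the composite validity is about .47, while the joint selection ratio is about .30, well below the marginal .50 and close to the joint_selection_ratio printed by tr_multivariate() for the same inputs.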
Argument naming and conventions
The package uses readable R argument names while preserving close correspondence with the notation used in the literature.
head(argument_glossary(), 12)
#> argument notation
#> 1 base_rate BR or phi
#> 2 selection_ratio SR
#> 3 selection_ratios SR_i
#> 4 joint_selection_ratio SR_conj
#> 5 validity r_xy
#> 6 validities r_xi,y
#> 7 predictor_cor R_XX
#> 8 R R = cor(X_1,...,X_k,Y)
#> 9 sdy SD_y
#> 10 n_selected N_s
#> 11 n_treated N_treated
#> 12 n_applicants N
#> meaning
#> 1 Applicant-population probability of criterion success before selection.
#> 2 Overall proportion selected by a single cutoff or composite.
#> 3 Vector of marginal selection ratios for multiple predictors or hurdles.
#> 4 Overall conjunctive probability of passing all multiple cutoffs.
#> 5 Focal predictor-criterion validity coefficient.
#> 6 Vector of predictor-criterion correlations.
#> 7 Predictor intercorrelation matrix only.
#> 8 Full predictor-plus-criterion correlation matrix; predictors first, criterion last.
#> 9 Standard deviation of job performance in monetary or criterion units.
#> 10 Number of selected applicants/employees.
#> 11 Number of employees receiving an intervention.
#> 12 Number of applicants assessed by a selection system.
#> note
#> 1 Use base_rate rather than BR in the public API.
#> 2 Use singular form for one decision cutoff.
#> 3 Use plural form when there is one SR per predictor or stage.
#> 4 Distinct from marginal selection_ratios; central for Thomas-Owen-Gunst tables.
#> 5 More readable than rxy, but documentation always gives the r_xy mapping.
#> 6 Used when multiple predictors enter a composite.
#> 7 Do not include the criterion in predictor_cor.
#> 8 Required by tr_multivariate() and simulation helpers.
#> 9 The package uses sdy because SDy is the conventional label in the literature.
#> 10 Avoid N when the role of the count is ambiguous.
#> 11 Used in Schmidt-Hunter-Pearlman intervention utility.
#> 12 Preferred name in v0.4.0; applicant_n is accepted as a legacy alias.

The most frequent names are:
- base_rate: population proportion successful before selection; classical Taylor-Russell notation often uses $BR$ or $\phi$.
- selection_ratio: proportion selected by a single cutoff or composite; classical notation uses $SR$.
- selection_ratios: vector of marginal selection ratios in multiple-predictor or multiple-stage models.
- joint_selection_ratio: overall conjunctive selection ratio after all cutoffs.
- validity: predictor-criterion validity coefficient, usually $r_{xy}$.
- validities: vector of predictor-criterion correlations.
- sdy: standard deviation of job performance in monetary or criterion units, usually $SD_y$.
- baseline_validity: validity of the operating system that the focal procedure is being compared against (Sturman, 2000, 2001).
- n_applicants, n_selected: sizes of the assessed and selected groups.
- tenure: time horizon, usually in years.
Recommended workflow
Step 1: Specify the selection decision
Start by writing down the rule used in practice. Is selection based on a single test, a composite, several simultaneous cutoffs, or a staged process? Do not choose a utility formula before specifying the rule. A common error is to apply Brogden-Cronbach-Gleser to a problem that is operationally a multiple-hurdle decision, which masks both the joint selection ratio and the cost differential between stages.
Step 2: Specify the criterion scale
If the criterion is success/failure, use Taylor-Russell-style
functions: tr_classic(), tr_multivariate(), or
tr_multivariate_equal_cutoff(). If the criterion is
continuous or monetary, use naylor_shine(),
bcg_utility(), or boudreau_utility(). The
choice should be driven by how the organisation actually evaluates job
performance, not by computational convenience.
Step 3: Specify the baseline
Following Sturman (2000, 2001), the realistic comparator is rarely
random selection. Almost every organisation already operates with some
procedure: reference checks, unstructured interviews, biodata. Treating
random selection as the implicit baseline inflates the estimated utility
of a new procedure (Sturman, 2000, 2001). Use baseline_validity in
bcg_utility() and boudreau_utility() whenever
the operating procedure is identifiable.
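The effect of the baseline can be illustrated with the Brogden-Cronbach-Gleser arithmetic written out in base R. This is a hedged sketch rather than the package's code: selected_mean_z() and bcg_net_utility() are hypothetical helpers, the baseline is assumed to operate at the same selection ratio as the focal procedure, and the inputs are the ones used in the bcg_utility() example of this vignette.

```r
# Mean standardised predictor score of applicants selected top-down at
# selection ratio sr: normal ordinate at the cutoff divided by sr.
selected_mean_z <- function(sr) dnorm(qnorm(1 - sr)) / sr

# Net utility of a focal procedure relative to a baseline procedure.
# Assumes both procedures use the same selection ratio and that `cost`
# is the incremental cost of the focal procedure.
bcg_net_utility <- function(validity, baseline_validity = 0, sr, sdy,
                            n_selected, tenure, cost) {
  delta_z <- (validity - baseline_validity) * selected_mean_z(sr)
  n_selected * tenure * sdy * delta_z - cost
}

# Focal validity .35 against random selection (baseline validity 0)
vs_random <- bcg_net_utility(.35, 0, sr = .20, sdy = 50000,
                             n_selected = 100, tenure = 3, cost = 75000)

# The same procedure against an operating procedure with validity .20
vs_operating <- bcg_net_utility(.35, .20, sr = .20, sdy = 50000,
                                n_selected = 100, tenure = 3, cost = 75000)

round(c(vs_random = vs_random, vs_operating = vs_operating))
```

With these inputs the random-selection comparison gives roughly 7.27 million, in line with the net_utility printed by bcg_utility(), whereas the comparison against an operating procedure with validity .20 gives roughly 3.07 million, less than half as much.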
Step 4: Estimate or triangulate $SD_y$
Holling (1998) shows that $SD_y$
is the central vulnerability of monetary utility analysis. The package
implements the four families documented by Holling: cost accounting
(sdy_cost_accounting()), global percentile judgements
(sdy_percentile()), proportional rules
(sdy_proportional(), sdy_rbn()), and
individualised job-analysis methods (sdy_crepid(),
sdy_superior_equivalents()). Triangulating two or three of
these, rather than relying on a single estimate, is the practice
supported by the empirical comparisons reported in Bobko, Karren, and
Parkington (1983), Becker and Huselid (1992), and Hakstian, Wooley,
Woolsey, and Kryger (1991).
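Triangulation can be sketched in base R without relying on the package helpers, whose signatures are not shown here. The example below is an illustration under stated assumptions: the supervisor judgements and the mean salary are hypothetical numbers, the global estimate averages the two one-standard-deviation distances around the median, and the proportional rule uses the commonly cited 40% of mean salary.

```r
# Hypothetical supervisor judgements of the yearly monetary value of
# performers at the 15th, 50th, and 85th percentiles (one row per judge).
judgements <- data.frame(
  p15 = c(38000, 42000, 40000),
  p50 = c(60000, 65000, 62000),
  p85 = c(85000, 90000, 88000)
)

# Global percentile estimate: for each judge, average the distance from
# the 50th to the 85th percentile and from the 15th to the 50th, then
# average across judges.
per_judge  <- ((judgements$p85 - judgements$p50) +
               (judgements$p50 - judgements$p15)) / 2
sdy_global <- mean(per_judge)

# Proportional rule: a fixed share of mean salary (40% assumed here).
mean_salary      <- 60000   # hypothetical
sdy_proportional <- 0.40 * mean_salary

# Report the range across methods rather than a single value.
estimates <- c(global = sdy_global, proportional = sdy_proportional)
range(estimates)
```

Reporting the range keeps the dependence of the final utility figure on the estimation method visible; in applied work the estimates will usually disagree far more than in this tidy illustration.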
Step 5: Report uncertainty and sensitivity
Ock and Oswald (2018) demonstrate via Monte Carlo simulation that
utility estimates exhibit sample-to-sample variability of the same order
of magnitude as the mean effect. A point estimate without an interval is
therefore not a complete report. The package includes
utility_monte_carlo() for full uncertainty propagation,
sensitivity_grid() for exploring how the estimate varies
under perturbations of the inputs, and
break_even_validity() for computing the validity required
to break even at given costs. Cronshaw, Alexander, Wiesner, and Barrick
(1987) introduced sensitivity and break-even analysis to selection
utility precisely because its point estimates
tend to be reported with implausible precision.
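The logic of uncertainty propagation and break-even analysis can be sketched in base R. This is not how utility_monte_carlo() or break_even_validity() are implemented; it is a minimal illustration in which the input distributions (normal, with the standard deviations shown) are assumptions chosen for the example, and the design inputs are those of the bcg_utility() example in this vignette.

```r
set.seed(1)
n_draws <- 10000

# Mean standardised predictor score of those selected at ratio sr
selected_mean_z <- function(sr) dnorm(qnorm(1 - sr)) / sr

# Assumed input uncertainty (illustrative, not estimated from data);
# draws are clamped to admissible ranges.
validity <- pmin(pmax(rnorm(n_draws, mean = .35, sd = .05), 0), 1)
sdy      <- pmax(rnorm(n_draws, mean = 50000, sd = 10000), 0)

# Fixed design inputs
sr         <- .20
n_selected <- 100
tenure     <- 3
cost       <- 75000

# Brogden-Cronbach-Gleser net utility for every draw
net_utility <- n_selected * tenure * sdy * validity * selected_mean_z(sr) - cost

# Interval summary instead of a single point estimate
interval <- quantile(net_utility, c(.05, .50, .95))
interval

# Validity at which gross utility equals cost, holding SDy at 50000
break_even <- cost / (n_selected * tenure * 50000 * selected_mean_z(sr))
break_even
```

Under these assumptions the break-even validity is about .004, far below any plausible lower bound for the validity estimate, while the 90% interval for net utility spans several million. Both facts belong in the report alongside the point estimate.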
Minimal examples by model family
Dichotomous success criterion
For a single predictor or a composite that has already collapsed
several predictors into one score, use tr_classic().
tr_classic(base_rate = .50, selection_ratio = .20, validity = .35)
#> <psu_tr>
#> base_rate: 0.5
#> selection_ratio: 0.2
#> validity: 0.35
#> predictor_cutoff_z: 0.841621
#> criterion_cutoff_z: 0
#> true_positive: 0.13931
#> false_positive: 0.0606895
#> false_negative: 0.36069
#> true_negative: 0.43931
#> ppv: 0.696552
#> success_ratio: 0.696552
#> incremental_success: 0.196552
#> sensitivity: 0.278621
#> specificity: 0.878621
#> digits: 3

The output includes the full $2 \times 2$
table (true positives, false positives, true negatives, false
negatives), the positive predictive value, sensitivity, and specificity.
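The cells of this table follow from the bivariate normal model and can be checked in base R. The function below is a hypothetical stand-in written for illustration, not the code behind tr_classic(); it assumes standard bivariate normality of predictor and criterion and integrates the conditional success probability over the selected region.

```r
# Taylor-Russell cell probabilities under a standard bivariate normal model.
# Hypothetical helper for illustration only.
taylor_russell <- function(base_rate, selection_ratio, validity) {
  x_cut <- qnorm(1 - selection_ratio)  # predictor cutoff (z)
  y_cut <- qnorm(1 - base_rate)        # criterion cutoff (z)

  # P(selected and successful): integrate P(Y > y_cut | X = x) over x > x_cut
  tp <- integrate(function(x) {
    dnorm(x) * pnorm((y_cut - validity * x) / sqrt(1 - validity^2),
                     lower.tail = FALSE)
  }, lower = x_cut, upper = Inf)$value

  c(true_positive  = tp,
    false_positive = selection_ratio - tp,
    false_negative = base_rate - tp,
    true_negative  = 1 - selection_ratio - base_rate + tp,
    ppv            = tp / selection_ratio)
}

tr <- taylor_russell(base_rate = .50, selection_ratio = .20, validity = .35)
round(tr, 4)
```

For the inputs used above this gives a positive predictive value of about .70, matching the ppv and success_ratio fields of the printed object; the remaining three cells follow from the margins.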
For multiple simultaneous cutoffs, use tr_multivariate()
with the predictor-criterion correlation matrix.
R <- matrix(c(
1.00, .30, .40,
.30, 1.00, .35,
.40, .35, 1.00
), nrow = 3, byrow = TRUE)
tr_multivariate(selection_ratios = c(.50, .50), base_rate = .50, R = R)
#> <psu_tr>
#> base_rate: 0.5
#> joint_selection_ratio: 0.298451
#> criterion_cutoff_z: 0
#> true_positive: 0.210393
#> false_positive: 0.0880588
#> false_negative: 0.289607
#> true_negative: 0.411941
#> ppv: 0.704948
#> success_ratio: 0.704948
#> incremental_success: 0.204948
#> sensitivity: 0.420785
#> specificity: 0.823882
#> digits: 3

Continuous criterion
For expected standardised criterion gain without a monetary unit, use
naylor_shine().
naylor_shine(validity = .35, selection_ratio = .20)
#> <psu_ns>
#> validity: 0.35
#> selection_ratio: 0.2
#> selected_mean_z: 1.39981
#> expected_criterion_z: 0.489933
#> sdy: 1
#> n_selected: 1
#> tenure: 1
#> cost: 0
#> gross_utility: 0.489933
#> net_utility: 0.489933

For monetary utility, bcg_utility() provides a
transparent baseline model.
bcg_utility(
validity = .35,
selection_ratio = .20,
sdy = 50000,
n_selected = 100,
tenure = 3,
cost = 75000
)
#> <psu_bcg>
#> validity: 0.35
#> selection_ratio: 0.2
#> baseline_validity: 0
#> baseline_selection_ratio: 0.2
#> selected_mean_z: 1.39981
#> baseline_selected_mean_z: 1.39981
#> focal_expected_criterion_z: 0.489933
#> baseline_expected_criterion_z: 0
#> incremental_criterion_z: 0.489933
#> sdy: 50000
#> n_selected: 100
#> tenure: 3
#> cost: 75000
#> gross_utility: 7349000
#> net_utility: 7274000

If the analysis spans several periods or requires discounting, taxes,
contribution margins, or employee flows, use
boudreau_utility().
boudreau_utility(
validity = .35,
baseline_validity = .20,
selection_ratio = .20,
sdy = 50000,
n_by_period = c(100, 90, 80),
contribution_margin = .30,
tax_rate = .25,
discount_rate = .08,
cost_by_period = c(75000, 10000, 10000)
)
#> <psu_boudreau>
#> delta_z_y: 0.209971
#> sdy: 50000
#> variable_value: 0
#> contribution_margin: 0.3
#> effective_margin: 0.3
#> tax_rate: 0.25
#> discount_rate: 0.08
#> net_present_value: 465045

Effect-size conversions
In contemporary selection research, particularly when classification
or algorithmic models are involved, predictive performance is sometimes
reported as the area under the ROC curve (AUC) rather than as a validity
coefficient. The package separates three conceptually distinct
conversions, following Hanley and McNeil (1982), Rice and Harris (2005),
and Salgado (2018). First, auc_to_rank_biserial() returns
the dominance summary $2\,\mathrm{AUC} - 1$,
which follows from interpreting AUC as the probability of a favourable
ordering of one positive and one negative case (Hanley & McNeil,
1982; Kerby, 2014). Second, auc_to_d_equal_variance()
converts AUC to Cohen’s $d$
under the equal-variance binormal model (Rice & Harris, 2005;
Salgado, 2018). Third, auc_to_point_biserial() converts
that $d$
to a point-biserial correlation for a specified base rate, making the
base-rate dependence explicit.
auc_to_rank_biserial(.75)
#> [1] 0.5
auc_to_d_equal_variance(.75)
#> [1] 0.9538726
auc_to_point_biserial(.75, base_rate = c(.50, .30, .20, .10))
#> [1] 0.4304822 0.4005260 0.3564821 0.2751188

These conversions should be used as bridges between reported classification performance and utility-analysis inputs, not as assumption-free substitutes for validation studies. If the selection criterion is binary and the available evidence is AUC, the reporting analyst should state which conversion was used and whether a base rate was assumed.
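The formulas behind the three conversions are short enough to write out in base R. The sketch below is an illustration of the standard formulas, not the package source: the rank-biserial is $2\,\mathrm{AUC} - 1$, the equal-variance binormal model gives $d = \sqrt{2}\,\Phi^{-1}(\mathrm{AUC})$, and the point-biserial follows from $d$ and the base rate $p$ as $d / \sqrt{d^2 + 1/(p(1-p))}$.

```r
auc <- .75

# Rank-biserial correlation (dominance summary)
rank_biserial <- 2 * auc - 1

# Cohen's d under the equal-variance binormal model
d <- sqrt(2) * qnorm(auc)

# Point-biserial correlation for several assumed base rates
p <- c(.50, .30, .20, .10)
point_biserial <- d / sqrt(d^2 + 1 / (p * (1 - p)))

rank_biserial
d
round(point_biserial, 4)
```

These expressions reproduce the values printed by the package functions above, and they make the base-rate dependence visible: the same AUC of .75 corresponds to a point-biserial correlation of about .43 at a base rate of .50 but only about .28 at a base rate of .10.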
Reporting checklist
A complete utility analysis should report: (i) the selection rule, (ii) the criterion scale, (iii) the base rate when a classificatory model is used, (iv) the selection ratio, (v) all validity coefficients and their sources, (vi) how $SD_y$ was estimated and whether it was triangulated, (vii) whether validity and $SD_y$ were corrected for unreliability or range restriction and which corrections were applied, (viii) the baseline comparator (random or operating procedure), (ix) costs disaggregated by period and stage when relevant, (x) the time horizon and discount rate, (xi) uncertainty intervals through Monte Carlo or bootstrap, and (xii) sensitivity and break-even analyses for the most uncertain inputs. Items (viii) and (xi) are the two most frequently omitted in the empirical literature, and their omission is the principal source of the practitioner scepticism documented by Latham and Whyte (1994), Whyte and Latham (1997), and König, Bösch, Reshef, and Winkler (2013).
How to proceed in applied work
- Specify the selection rule before opening the package: single test, composite, simultaneous cutoffs, or staged process.
- Locate your problem in the two-by-two taxonomy and use model_taxonomy() as a checklist.
- Use argument_glossary() to map your existing notation to the package argument names before writing code.
- Specify the operating baseline; if it cannot be identified, report this limitation explicitly and treat random-selection results as upper bounds.
- Triangulate $SD_y$ across at least two methods; report the range, not a single value.
- Propagate uncertainty using utility_monte_carlo() or sensitivity_grid(); do not report a deterministic point estimate on its own.
- Use break_even_validity() to identify the validity floor at which the new procedure breaks even, and compare it to the lower bound of the validity confidence interval.
References
Becker, B. E., & Huselid, M. A. (1992). Direct estimates of $SD_y$ and the implications for utility analysis. Journal of Applied Psychology, 77, 227–233.
Bobko, P., Karren, R., & Parkington, J. J. (1983). Estimation of standard deviations in utility analyses: An empirical test. Journal of Applied Psychology, 68, 170–176.
Boudreau, J. W. (1983). Economic considerations in estimating the utility of human resource productivity improvement programs. Personnel Psychology, 36, 551–576.
Boudreau, J. W. (1991). Utility analysis for decisions in human resource management. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (Vol. 2, pp. 621–745). Consulting Psychologists Press.
Brogden, H. E. (1946). On the interpretation of the correlation coefficient as a measure of predictive efficiency. Journal of Educational Psychology, 37, 65–76.
Brogden, H. E. (1949). When testing pays off. Personnel Psychology, 2, 171–183.
Cronbach, L. J., & Gleser, G. C. (1965). Psychological tests and personnel decisions (2nd ed.). University of Illinois Press.
Cronshaw, S. F., Alexander, R. A., Wiesner, W. H., & Barrick, M. R. (1987). Incorporating risk into selection utility: Two models for sensitivity analysis and risk simulation. Organizational Behavior and Human Decision Processes, 40, 270–286.
Hakstian, A. R., Wooley, R. M., Woolsey, L. K., & Kryger, B. R. (1991). Management selection by multiple-domain assessment: II. Utility to the organisation. Educational and Psychological Measurement, 51, 899–911.
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36.
Holling, H. (1998). Utility analysis of personnel selection: An overview and empirical study based on objective performance measures. Methods of Psychological Research Online, 3(1), 5–24.
Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, 3, 11.IT.3.1.
König, C. J., Bösch, F., Reshef, A., & Winkler, S. (2013). Human resource managers’ attitudes toward utility analysis. Journal of Personnel Psychology, 12, 152–156.
Latham, G. P., & Whyte, G. (1994). The futility of utility analysis. Personnel Psychology, 47, 31–46.
Naylor, J. C., & Shine, L. C. (1965). A table for determining the increase in mean criterion score obtained by using a selection device. Journal of Industrial Psychology, 3, 33–42.
Ock, J., & Oswald, F. L. (2018). The utility of personnel selection decisions: Comparing compensatory and multiple-hurdle selection models. Journal of Personnel Psychology, 17(4), 172–182.
Rice, M. E., & Harris, G. T. (2005). Comparing effect sizes in follow-up studies: ROC area, Cohen’s $d$, and $r$. Law and Human Behavior, 29(5), 615–620.
Salgado, J. F. (2018). Transforming the area under the normal curve (AUC) into Cohen’s $d$, Pearson’s $r_{pb}$, odds-ratio, and natural log odds-ratio: Two conversion tables. The European Journal of Psychology Applied to Legal Context, 10(1), 35–47.
Sturman, M. C. (2000). Implications of utility analysis adjustments for estimates of human resource intervention value. Journal of Management, 26, 281–299.
Sturman, M. C. (2001). Utility analysis for multiple selection devices and multiple outcomes. Journal of Human Resource Costing and Accounting, 6(2), 9–28.
Taylor, H. C., & Russell, J. T. (1939). The relationship of validity coefficients to the practical effectiveness of tests in selection. Journal of Applied Psychology, 23, 565–578.
Thomas, J. G., Owen, D. B., & Gunst, R. F. (1977). Improving the use of educational tests as selection tools. Journal of Educational Statistics, 2(1), 55–77.
Whyte, G., & Latham, G. P. (1997). The futility of utility analysis revisited: When even an expert fails. Personnel Psychology, 50, 601–610.