Utility-analysis taxonomy for personnel selection

Purpose

The package is organised around a diagnostic question: what kind of selection problem are you analysing? Many disagreements in the utility-analysis literature arise because a method designed for one type of problem is applied to another. Taylor and Russell (1939) framed utility as a classificatory success-ratio problem. Naylor and Shine (1965) moved the focus to expected criterion gain in standard-deviation units. Brogden (1946, 1949) and Cronbach and Gleser (1965) developed the decision-theoretic monetary formulation. Boudreau (1983, 1991) added economic realism through discounting, taxes, and employee flows. Thomas, Owen, and Gunst (1977); Holling (1998); Sturman (2000, 2001); and Ock and Oswald (2018) each contributed substantive corrections or extensions. The taxonomy used here is intended to make the match between model and problem explicit before any number is reported.

library(personnelSelectionUtility)

The two-by-two taxonomy

The taxonomy crosses two dimensions:

Criterion scale: is the criterion treated as continuous/monetary, or dichotomised into success/failure?
Selection structure: is selection compensatory, based on a composite score, or conjunctive/multiple-hurdle, based on passing multiple cutoffs or stages?

model_taxonomy()
#>               criterion_scale
#> 1 classification/dichotomized
#> 2 classification/dichotomized
#> 3 classification/dichotomized
#> 4         continuous/monetary
#> 5         continuous/monetary
#> 6         continuous/monetary
#> 7         continuous/monetary
#> 8             multi-attribute
#>                                             selection_structure
#> 1                              compensatory or single predictor
#> 2   conjunctive multiple-hurdle with specified marginal cutoffs
#> 3 conjunctive multiple-hurdle with target joint selection ratio
#> 4                                         compensatory top-down
#> 5                               incremental compensatory system
#> 6                         sequential/multiple-hurdle simulation
#> 7                          diagnostics and SDy input estimation
#> 8                                             multiple criteria
#>                                        model_family
#> 1                             Taylor-Russell (1939)
#> 2                          Thomas-Owen-Gunst (1977)
#> 3    Thomas-Owen-Gunst tables / equal-cutoff design
#> 4               Naylor-Shine / BCG / SHP / Boudreau
#> 5       Sturman-style restricted canonical validity
#> 6                       Ock-Oswald-style comparison
#> 7 Holling-style empirical checks and SDy estimation
#> 8                    MAUA / Pareto decision support
#>                                                                                                               primary_functions
#> 1                                                                                                      tr_classic(), tr_solve()
#> 2                                                                                                             tr_multivariate()
#> 3                                                             tr_multivariate_equal_cutoff(), tr_binomial_success_probability()
#> 4                                                              naylor_shine(), bcg_utility(), shp_utility(), boudreau_utility()
#> 5                                                                       restricted_canonical_validity(), incremental_validity()
#> 6 compensatory_selection(), multiple_hurdle_selection(), multiple_hurdle_selection_staged(), compare_selection_systems_staged()
#> 7                                         sdy_observed(), sdy_cost_accounting(), sdy_crepid(), utility_regression_diagnostics()
#> 8                                                      multiattribute_utility(), pareto_frontier(), utility_fairness_frontier()

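Crossing the two dimensions yields four practical cells. Rows 7 and 8 of the model_taxonomy() output (diagnostics with $SD_y$ estimation, and multi-attribute models) are supporting tools rather than cells of the grid.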

| Criterion scale | Compensatory selection | Multiple-hurdle selection |
|---|---|---|
| Dichotomised / classificatory | Taylor-Russell on one predictor or a composite | Thomas-Owen-Gunst multivariate Taylor-Russell |
| Continuous / monetary | Naylor-Shine, Brogden-Cronbach-Gleser, Boudreau, Sturman incremental validity | Stage-wise utility or simulation-based comparisons |

The four cells are not interchangeable. A selection system can have the same predictors but a different appropriate utility model depending on the decision rule. If cognitive ability, conscientiousness, biodata, and interview ratings are added into one composite and applicants are selected top-down, the system is compensatory. If applicants must first pass a cheap screening composite and only then a more expensive interview stage, the system is staged multiple-hurdle. The distinction is not cosmetic. Ock and Oswald (2018) emphasise the cost-reliability trade-off: a compensatory composite typically uses more information and yields higher expected criterion performance, whereas a multiple-hurdle system can be cheaper because expensive stages are administered only to applicants who survive earlier screens.

Why the choice of cell matters

A simple thought experiment makes the point. Suppose two predictors $X_1$ and $X_2$ correlate $.30$ with each other and have validities of $.40$ and $.30$ against a job-performance criterion. If selection is compensatory and applicants are ranked on the equally weighted sum of the standardised predictors, the composite validity is $(r_{1y} + r_{2y}) / \sqrt{2 + 2 r_{12}} = .70 / \sqrt{2.60} \approx .43$. If selection is conjunctive on the same two predictors with marginal cutoffs at the top $50\%$ each, the operative quantity is the joint selection ratio, which is materially smaller than $.50$ (about $.30$ under bivariate normality) and has a different positive predictive value. Reporting one number when the other is operationally relevant misrepresents the system. The package therefore encourages the analyst to specify the cell first and choose the function second.
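As a rough check on the thought experiment, both quantities can be computed in base R. This is a minimal sketch, not part of the package: it assumes standardised predictors, bivariate normality, and median cutoffs on both predictors, for which the joint pass probability has the closed form $1/4 + \arcsin(r_{12}) / (2\pi)$. All variable names are illustrative.

```r
# Inputs from the thought experiment
r12 <- 0.30                 # predictor intercorrelation
validities <- c(0.40, 0.30) # predictor-criterion correlations

# Compensatory cell: validity of the equally weighted sum of two
# standardised predictors (the variance of the sum is 2 + 2 * r12)
composite_validity <- sum(validities) / sqrt(2 + 2 * r12)

# Conjunctive cell: probability of exceeding the median on both
# predictors under bivariate normality (closed form for median cutoffs)
joint_selection_ratio <- 0.25 + asin(r12) / (2 * pi)

round(c(composite_validity = composite_validity,
        joint_selection_ratio = joint_selection_ratio), 4)
```

The two cells therefore describe different systems even with identical predictors: one is summarised by a single validity at whatever selection ratio is chosen, the other by a joint selection ratio well below either marginal cutoff.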

Argument naming and conventions

The package uses readable R argument names while preserving close correspondence with the notation used in the literature.

head(argument_glossary(), 12)
#>                 argument               notation
#> 1              base_rate              BR or phi
#> 2        selection_ratio                     SR
#> 3       selection_ratios                   SR_i
#> 4  joint_selection_ratio                SR_conj
#> 5               validity                   r_xy
#> 6             validities                 r_xi,y
#> 7          predictor_cor                   R_XX
#> 8                      R R = cor(X_1,...,X_k,Y)
#> 9                    sdy                   SD_y
#> 10            n_selected                    N_s
#> 11             n_treated              N_treated
#> 12          n_applicants                      N
#>                                                                                meaning
#> 1              Applicant-population probability of criterion success before selection.
#> 2                         Overall proportion selected by a single cutoff or composite.
#> 3              Vector of marginal selection ratios for multiple predictors or hurdles.
#> 4                     Overall conjunctive probability of passing all multiple cutoffs.
#> 5                                      Focal predictor-criterion validity coefficient.
#> 6                                          Vector of predictor-criterion correlations.
#> 7                                              Predictor intercorrelation matrix only.
#> 8  Full predictor-plus-criterion correlation matrix; predictors first, criterion last.
#> 9                Standard deviation of job performance in monetary or criterion units.
#> 10                                            Number of selected applicants/employees.
#> 11                                      Number of employees receiving an intervention.
#> 12                                Number of applicants assessed by a selection system.
#>                                                                              note
#> 1                                 Use base_rate rather than BR in the public API.
#> 2                                      Use singular form for one decision cutoff.
#> 3                    Use plural form when there is one SR per predictor or stage.
#> 4  Distinct from marginal selection_ratios; central for Thomas-Owen-Gunst tables.
#> 5        More readable than rxy, but documentation always gives the r_xy mapping.
#> 6                                Used when multiple predictors enter a composite.
#> 7                                  Do not include the criterion in predictor_cor.
#> 8                           Required by tr_multivariate() and simulation helpers.
#> 9   The package uses sdy because SDy is the conventional label in the literature.
#> 10                               Avoid N when the role of the count is ambiguous.
#> 11                          Used in Schmidt-Hunter-Pearlman intervention utility.
#> 12           Preferred name in v0.4.0; applicant_n is accepted as a legacy alias.

The most frequent names are:

base_rate: population proportion successful before selection; classical Taylor-Russell notation often uses $BR$ or $\phi$.

selection_ratio: proportion selected by a single cutoff or composite; classical notation uses $SR$.

selection_ratios: vector of marginal selection ratios in multiple-predictor or multiple-stage models.

joint_selection_ratio: overall conjunctive selection ratio after all cutoffs.

validity: predictor-criterion validity coefficient, usually $r_{xy}$.

validities: vector of predictor-criterion correlations.

sdy: standard deviation of job performance in monetary or criterion units, usually $SD_y$.

baseline_validity: validity of the operating system that the focal procedure is being compared against (Sturman, 2000, 2001).

n_applicants, n_selected: numbers of applicants assessed and of applicants selected.

tenure: time horizon, usually in years.

Recommended workflow

Step 1: Specify the selection decision

Start by writing down the rule used in practice. Is selection based on a single test, a composite, several simultaneous cutoffs, or a staged process? Do not choose a utility formula before specifying the rule. A common error is to apply Brogden-Cronbach-Gleser to a problem that is operationally a multiple-hurdle decision, which masks both the joint selection ratio and the cost differential between stages.

Step 2: Specify the criterion scale

If the criterion is success/failure, use Taylor-Russell-style functions: tr_classic(), tr_multivariate(), or tr_multivariate_equal_cutoff(). If the criterion is continuous or monetary, use naylor_shine(), bcg_utility(), or boudreau_utility(). The choice should be driven by how the organisation actually evaluates job performance, not by computational convenience.

Step 3: Specify the baseline

Following Sturman (2000, 2001), the realistic comparator is rarely random selection. Almost every organisation already operates some procedure, such as reference checks, unstructured interviews, or biodata. Treating random selection as the implicit baseline inflates the estimated utility of a new procedure by approximately $60\%$ on average (Sturman, 2000, 2001). Use baseline_validity in bcg_utility() and boudreau_utility() whenever the operating procedure is identifiable.
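In the glossary notation, and assuming the focal and operating procedures are applied at the same selection ratio, the comparison can be sketched as the standard Brogden-Cronbach-Gleser form. The symbols $r_{\text{base}}$ (baseline validity), $\bar{z}_x$ (mean standardised predictor score of those selected), $T$ (tenure), and $C$ (cost) are introduced here for illustration:

$$
\Delta U = N_s \, T \, \left(r_{xy} - r_{\text{base}}\right) \, \bar{z}_x \, SD_y - C
$$

Setting $r_{\text{base}} = 0$ recovers the random-selection comparison, so any positive baseline validity lowers the estimated gain.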

Step 4: Estimate or triangulate $SD_y$

Holling (1998) shows that $SD_y$ is the central vulnerability of monetary utility analysis. The package implements the four families documented by Holling: cost accounting (sdy_cost_accounting()), global percentile judgements (sdy_percentile()), proportional rules (sdy_proportional(), sdy_rbn()), and individualised job-analysis methods (sdy_crepid(), sdy_superior_equivalents()). Triangulating two or three of these, rather than relying on a single estimate, is the practice supported by the empirical comparisons reported in Bobko, Karren, and Parkington (1983), Becker and Huselid (1992), and Hakstian, Wooley, Woolsey, and Kryger (1991).
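A minimal base-R sketch of triangulation follows. It does not call the package functions and should not be read as their implementation. It assumes two widely used heuristics: the global percentile method, taken here as the mean difference between judged 85th- and 50th-percentile performance values, and a proportional rule set at 40% of salary. All figures are hypothetical.

```r
# Hypothetical supervisor judgements of the annual value of performance
# at the 50th and 85th percentiles (illustrative numbers only)
p50 <- c(60000, 65000, 58000)
p85 <- c(82000, 90000, 75000)

# Global percentile estimate: mean judged difference between the
# 85th and 50th percentiles (roughly one SD under normality)
sdy_global <- mean(p85 - p50)

# Proportional rule: assumed here to be 40% of mean annual salary
mean_salary <- 55000
sdy_prop <- 0.40 * mean_salary

# Report the range across methods rather than a single value
sdy_range <- range(c(sdy_global, sdy_prop))
sdy_range
```

Carrying the lower and upper ends of this range through the utility calculation shows directly how much of the final figure depends on the $SD_y$ method.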

Step 5: Report uncertainty and sensitivity

Ock and Oswald (2018) demonstrate via Monte Carlo simulation that utility estimates exhibit sample-to-sample variability of the same order of magnitude as the mean effect. A point estimate without an interval is therefore not a complete report. The package includes utility_monte_carlo() for full uncertainty propagation, sensitivity_grid() for exploring how the estimate varies under perturbations of the inputs, and break_even_validity() for computing the validity required to break even at given costs. Cronshaw, Alexander, Wiesner, and Barrick (1987) introduced sensitivity and break-even analysis to selection utility precisely because point estimates of $\Delta U$ tend to be reported with implausible precision.
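The idea can be sketched in base R without the package. The block below is an illustration only and is not how utility_monte_carlo() or break_even_validity() are implemented. The input distributions (normal for validity, uniform for $SD_y$) and their parameters are assumptions chosen for the example, and the comparison is against random selection.

```r
set.seed(1)
n_draws <- 5000

# Fixed design inputs (illustrative values)
selection_ratio <- 0.20
n_selected <- 100
tenure <- 3
cost <- 75000

# Mean standardised predictor score of those selected top-down
mean_z <- dnorm(qnorm(1 - selection_ratio)) / selection_ratio

# Assumed input uncertainty: validity ~ N(.35, .05) truncated to [0, 1],
# SDy ~ Uniform(40000, 60000)
validity <- pmin(pmax(rnorm(n_draws, mean = 0.35, sd = 0.05), 0), 1)
sdy <- runif(n_draws, min = 40000, max = 60000)

# Net utility for each draw
net_utility <- n_selected * tenure * validity * mean_z * sdy - cost

# 90% interval and median instead of a single point estimate
interval <- quantile(net_utility, probs = c(0.05, 0.50, 0.95))

# Break-even validity at the midpoint SDy: the validity at which
# gross utility exactly equals cost
break_even <- cost / (n_selected * tenure * 50000 * mean_z)

interval
break_even
```

Comparing break_even with the lower bound of the validity confidence interval gives a quick indication of whether the sign of the net utility is in doubt.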

Minimal examples by model family

Dichotomous success criterion

For a single predictor or a composite that has already collapsed several predictors into one score, use tr_classic().

tr_classic(base_rate = .50, selection_ratio = .20, validity = .35)
#> <psu_tr>
#>   base_rate: 0.5
#>   selection_ratio: 0.2
#>   validity: 0.35
#>   predictor_cutoff_z: 0.841621
#>   criterion_cutoff_z: 0
#>   true_positive: 0.13931
#>   false_positive: 0.0606895
#>   false_negative: 0.36069
#>   true_negative: 0.43931
#>   ppv: 0.696552
#>   success_ratio: 0.696552
#>   incremental_success: 0.196552
#>   sensitivity: 0.278621
#>   specificity: 0.878621
#>   digits: 3

The output includes the full $2 \times 2$ table (true positives, false positives, true negatives, false negatives), the positive predictive value, sensitivity, and specificity. For multiple simultaneous cutoffs, use tr_multivariate() with the predictor-criterion correlation matrix.

R <- matrix(c(
  1.00, .30, .40,
  .30, 1.00, .35,
  .40, .35, 1.00
), nrow = 3, byrow = TRUE)
tr_multivariate(selection_ratios = c(.50, .50), base_rate = .50, R = R)
#> <psu_tr>
#>   base_rate: 0.5
#>   joint_selection_ratio: 0.298451
#>   criterion_cutoff_z: 0
#>   true_positive: 0.210393
#>   false_positive: 0.0880588
#>   false_negative: 0.289607
#>   true_negative: 0.411941
#>   ppv: 0.704948
#>   success_ratio: 0.704948
#>   incremental_success: 0.204948
#>   sensitivity: 0.420785
#>   specificity: 0.823882
#>   digits: 3
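The single-predictor table from tr_classic() can be checked in base R. The sketch below assumes a bivariate normal predictor and criterion and obtains the true-positive proportion by one-dimensional numerical integration. It is an independent check, not the package's internal algorithm, and the variable names are illustrative.

```r
base_rate <- 0.50
selection_ratio <- 0.20
validity <- 0.35

# Cutoffs on the standard normal scale
z_x <- qnorm(1 - selection_ratio)  # predictor cutoff
z_y <- qnorm(1 - base_rate)        # criterion cutoff

# P(Y > z_y | X = x) weighted by the density of X, for x above the cutoff
integrand <- function(x) {
  dnorm(x) * pnorm((validity * x - z_y) / sqrt(1 - validity^2))
}
true_positive <- integrate(integrand, lower = z_x, upper = Inf)$value

# Remaining cells follow from the margins
false_positive <- selection_ratio - true_positive
false_negative <- base_rate - true_positive
true_negative  <- 1 - true_positive - false_positive - false_negative

ppv <- true_positive / selection_ratio
sensitivity <- true_positive / base_rate
specificity <- true_negative / (1 - base_rate)

round(c(ppv = ppv, sensitivity = sensitivity, specificity = specificity), 4)
```

The multivariate case needs a multivariate normal probability over several cutoffs at once, which base R does not provide, so this check is limited to the single-predictor table.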

Continuous criterion

For expected standardised criterion gain without a monetary unit, use naylor_shine().

naylor_shine(validity = .35, selection_ratio = .20)
#> <psu_ns>
#>   validity: 0.35
#>   selection_ratio: 0.2
#>   selected_mean_z: 1.39981
#>   expected_criterion_z: 0.489933
#>   sdy: 1
#>   n_selected: 1
#>   tenure: 1
#>   cost: 0
#>   gross_utility: 0.489933
#>   net_utility: 0.489933
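The printed values are consistent with the standard normal-theory result for top-down selection. As a worked check, with $z_c$ the predictor cutoff, $\varphi$ the standard normal density, and $\bar{z}_x$, $\bar{z}_y$ the mean standardised predictor and criterion scores of those selected (symbols introduced here for illustration):

$$
\bar{z}_x = \frac{\varphi(z_c)}{SR} \approx \frac{\varphi(0.8416)}{0.20} \approx \frac{0.27996}{0.20} \approx 1.3998,
\qquad
\bar{z}_y = r_{xy}\,\bar{z}_x \approx 0.35 \times 1.3998 \approx 0.4899
$$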

For monetary utility, bcg_utility() provides a transparent baseline model.

bcg_utility(
  validity = .35,
  selection_ratio = .20,
  sdy = 50000,
  n_selected = 100,
  tenure = 3,
  cost = 75000
)
#> <psu_bcg>
#>   validity: 0.35
#>   selection_ratio: 0.2
#>   baseline_validity: 0
#>   baseline_selection_ratio: 0.2
#>   selected_mean_z: 1.39981
#>   baseline_selected_mean_z: 1.39981
#>   focal_expected_criterion_z: 0.489933
#>   baseline_expected_criterion_z: 0
#>   incremental_criterion_z: 0.489933
#>   sdy: 50000
#>   n_selected: 100
#>   tenure: 3
#>   cost: 75000
#>   gross_utility: 7349000
#>   net_utility: 7274000
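The gross and net figures can be reproduced with a few lines of base R. This is a sketch of the arithmetic against a random-selection baseline (baseline_validity = 0), not the package source. The variable names mirror the arguments above.

```r
validity <- 0.35
selection_ratio <- 0.20
sdy <- 50000
n_selected <- 100
tenure <- 3
cost <- 75000

# Mean standardised predictor score of those selected top-down
selected_mean_z <- dnorm(qnorm(1 - selection_ratio)) / selection_ratio

# Gross utility: people x years x expected criterion gain x SDy
gross_utility <- n_selected * tenure * validity * selected_mean_z * sdy
net_utility <- gross_utility - cost

signif(c(gross_utility = gross_utility, net_utility = net_utility), 6)
```

Because cost is small relative to the gross figure here, the result is driven almost entirely by validity, $SD_y$, and the selection ratio.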

If the analysis spans several periods or requires discounting, taxes, contribution margins, or employee flows, use boudreau_utility().

boudreau_utility(
  validity = .35,
  baseline_validity = .20,
  selection_ratio = .20,
  sdy = 50000,
  n_by_period = c(100, 90, 80),
  contribution_margin = .30,
  tax_rate = .25,
  discount_rate = .08,
  cost_by_period = c(75000, 10000, 10000)
)
#> <psu_boudreau>
#>   delta_z_y: 0.209971
#>   sdy: 50000
#>   variable_value: 0
#>   contribution_margin: 0.3
#>   effective_margin: 0.3
#>   tax_rate: 0.25
#>   discount_rate: 0.08
#>   net_present_value: 465045
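The net present value can be approximated by hand in base R. The conventions in this sketch are assumptions inferred from the printed result rather than taken from the package source: end-of-period discounting, benefits scaled by the contribution margin and by (1 - tax_rate), and costs entered as given without a tax adjustment.

```r
validity <- 0.35
baseline_validity <- 0.20
selection_ratio <- 0.20
sdy <- 50000
n_by_period <- c(100, 90, 80)
contribution_margin <- 0.30
tax_rate <- 0.25
discount_rate <- 0.08
cost_by_period <- c(75000, 10000, 10000)

# Incremental expected criterion gain over the operating procedure
selected_mean_z <- dnorm(qnorm(1 - selection_ratio)) / selection_ratio
delta_z_y <- (validity - baseline_validity) * selected_mean_z

# Discount factors for periods 1, 2, 3 (assumed end-of-period)
discount <- (1 + discount_rate)^seq_along(n_by_period)

# After-tax contribution of the retained cohort in each period
benefit <- n_by_period * delta_z_y * sdy * contribution_margin * (1 - tax_rate)

# Assumed: costs are discounted but not tax-adjusted
net_present_value <- sum(benefit / discount) - sum(cost_by_period / discount)
round(net_present_value)
```

Under these assumptions the hand calculation agrees with the printed net_present_value to the displayed precision, and it shows how far the margin, tax, and baseline adjustments reduce the estimate relative to the unadjusted bcg_utility() figure.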

Effect-size conversions

In contemporary selection research, particularly when classification or algorithmic models are involved, predictive performance is sometimes reported as the area under the ROC curve (AUC) rather than as a validity coefficient. The package separates three conceptually distinct conversions, following Hanley and McNeil (1982), Rice and Harris (2005), and Salgado (2018). First, auc_to_rank_biserial() returns the dominance summary $2 \cdot AUC - 1$, which follows from interpreting AUC as the probability of a favourable ordering of one positive and one negative case (Hanley & McNeil, 1982; Kerby, 2014). Second, auc_to_d_equal_variance() converts AUC to Cohen’s $d$ under the equal-variance binormal model (Rice & Harris, 2005; Salgado, 2018). Third, auc_to_point_biserial() converts that $d$ to a point-biserial correlation for a specified base rate, making the base-rate dependence explicit.

auc_to_rank_biserial(.75)
#> [1] 0.5
auc_to_d_equal_variance(.75)
#> [1] 0.9538726
auc_to_point_biserial(.75, base_rate = c(.50, .30, .20, .10))
#> [1] 0.4304822 0.4005260 0.3564821 0.2751188
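The three conversions can be written out in base R. This is a sketch of the formulas under the equal-variance binormal model, $d = \sqrt{2}\,\Phi^{-1}(AUC)$ and $r_{pb} = d / \sqrt{d^2 + 1/(p(1-p))}$ with $p$ the base rate, and is not the package source. The variable names are illustrative.

```r
auc <- 0.75
base_rate <- c(0.50, 0.30, 0.20, 0.10)

# 1. Rank-biserial (dominance) summary
rank_biserial <- 2 * auc - 1

# 2. Cohen's d under the equal-variance binormal model
d <- sqrt(2) * qnorm(auc)

# 3. Point-biserial correlation implied by d at each base rate
r_pb <- d / sqrt(d^2 + 1 / (base_rate * (1 - base_rate)))

rank_biserial
d
r_pb
```

The same AUC maps to a point-biserial correlation of about .43 at a base rate of .50 but only about .28 at .10, which is why the base rate has to be stated alongside any converted validity.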

These conversions should be used as bridges between reported classification performance and utility-analysis inputs, not as assumption-free substitutes for validation studies. If the selection criterion is binary and the available evidence is AUC, the reporting analyst should state which conversion was used and whether a base rate was assumed.

Reporting checklist

A complete utility analysis should report: (i) the selection rule, (ii) the criterion scale, (iii) the base rate when a classificatory model is used, (iv) the selection ratio, (v) all validity coefficients and their sources, (vi) how $SD_y$ was estimated and whether it was triangulated, (vii) whether validity and $SD_y$ were corrected for unreliability or range restriction and which corrections were applied, (viii) the baseline comparator (random or operating procedure), (ix) costs disaggregated by period and stage when relevant, (x) the time horizon and discount rate, (xi) uncertainty intervals through Monte Carlo or bootstrap, and (xii) sensitivity and break-even analyses for the most uncertain inputs. Items (viii) and (xi) are the two most frequently omitted in the empirical literature, and their omission is the principal source of the practitioner scepticism documented by Latham and Whyte (1994), Whyte and Latham (1997), and König, Bösch, Reshef, and Winkler (2013).

How to proceed in applied work

Specify the selection rule before opening the package: single test, composite, simultaneous cutoffs, or staged process.
Locate your problem in the $2 \times 2$ taxonomy and use model_taxonomy() as a checklist.
Use argument_glossary() to map your existing notation to the package argument names before writing code.
Specify the operating baseline; if it cannot be identified, report this limitation explicitly and treat random-selection results as upper bounds.
Triangulate $SD_y$ across at least two methods; report the range, not a single value.
Propagate uncertainty using utility_monte_carlo() and explore input perturbations with sensitivity_grid(); do not report a deterministic point estimate on its own.
Use break_even_validity() to identify the validity floor at which the new procedure breaks even, and compare it to the lower bound of the validity confidence interval.

References

Becker, B. E., & Huselid, M. A. (1992). Direct estimates of $SD_y$ and the implications for utility analysis. Journal of Applied Psychology, 77, 227–233.

Bobko, P., Karren, R., & Parkington, J. J. (1983). Estimation of standard deviations in utility analyses: An empirical test. Journal of Applied Psychology, 68, 170–176.

Boudreau, J. W. (1983). Economic considerations in estimating the utility of human resource productivity improvement programs. Personnel Psychology, 36, 551–576.

Boudreau, J. W. (1991). Utility analysis for decisions in human resource management. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (Vol. 2, pp. 621–745). Consulting Psychologists Press.

Brogden, H. E. (1946). On the interpretation of the correlation coefficient as a measure of predictive efficiency. Journal of Educational Psychology, 37, 65–76.

Brogden, H. E. (1949). When testing pays off. Personnel Psychology, 2, 171–183.

Cronbach, L. J., & Gleser, G. C. (1965). Psychological tests and personnel decisions (2nd ed.). University of Illinois Press.

Cronshaw, S. F., Alexander, R. A., Wiesner, W. H., & Barrick, M. R. (1987). Incorporating risk into selection utility: Two models for sensitivity analysis and risk simulation. Organizational Behavior and Human Decision Processes, 40, 270–286.

Hakstian, A. R., Wooley, R. M., Woolsey, L. K., & Kryger, B. R. (1991). Management selection by multiple-domain assessment: II. Utility to the organisation. Educational and Psychological Measurement, 51, 899–911.

Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36.

Holling, H. (1998). Utility analysis of personnel selection: An overview and empirical study based on objective performance measures. Methods of Psychological Research Online, 3(1), 5–24.

Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, 3, 11.IT.3.1.

König, C. J., Bösch, F., Reshef, A., & Winkler, S. (2013). Human resource managers’ attitudes toward utility analysis. Journal of Personnel Psychology, 12, 152–156.

Latham, G. P., & Whyte, G. (1994). The futility of utility analysis. Personnel Psychology, 47, 31–46.

Naylor, J. C., & Shine, L. C. (1965). A table for determining the increase in mean criterion score obtained by using a selection device. Journal of Industrial Psychology, 3, 33–42.

Ock, J., & Oswald, F. L. (2018). The utility of personnel selection decisions: Comparing compensatory and multiple-hurdle selection models. Journal of Personnel Psychology, 17(4), 172–182.

Rice, M. E., & Harris, G. T. (2005). Comparing effect sizes in follow-up studies: ROC area, Cohen’s $d$, and $r$. Law and Human Behavior, 29(5), 615–620.

Salgado, J. F. (2018). Transforming the area under the normal curve (AUC) into Cohen’s $d$, Pearson’s $r_{pb}$, odds-ratio, and natural log odds-ratio: Two conversion tables. The European Journal of Psychology Applied to Legal Context, 10(1), 35–47.

Sturman, M. C. (2000). Implications of utility analysis adjustments for estimates of human resource intervention value. Journal of Management, 26, 281–299.

Sturman, M. C. (2001). Utility analysis for multiple selection devices and multiple outcomes. Journal of Human Resource Costing and Accounting, 6(2), 9–28.

Taylor, H. C., & Russell, J. T. (1939). The relationship of validity coefficients to the practical effectiveness of tests in selection. Journal of Applied Psychology, 23, 565–578.

Thomas, J. G., Owen, D. B., & Gunst, R. F. (1977). Improving the use of educational tests as selection tools. Journal of Educational Statistics, 2(1), 55–77.

Whyte, G., & Latham, G. P. (1997). The futility of utility analysis revisited: When even an expert fails. Personnel Psychology, 50, 601–610.