install.packages("renv")
Introduction
The School Context Opportunity Score (SCOscore) is a measure of potentiality for learners in a school community. It is a composite score that combines multiple factors, particularly structural factors such as race and SES, as well as performance factors such as test scores and graduation pathway completion rates, to provide a more holistic view of school corporations.
The SCOscore was developed as method for identifying and prioritizing P-12 schools and school corporations for the Indiana University Indianapolis 2030 Strategic Plan. Specifically, SCOscore was developed for Goal 2 of the Service to Our State and Beyond Pillar: Engage State P-12 to Strengthen Education and Educational Pipelines in Indiana.
The SCOscore of a school corporation is calculated through the following:
\[ S_{O} = \frac{\left[\left(P_{urm} \times 1.5\right) + \left(P_{frl} \times 1.5\right) + S_{ac}\right]}{3} \]
where \(S_{O}\) is the SCOscore, \(P_{urm}\) is the proportion of underrepresented minority students, \(P_{frl}\) is the proportion of students eligible for free or reduced lunch, and \(S_{ac}\) is an academic achievement score for the school as a whole. The academic achievement score (\(S_{ac}\)) accounts for 3rd grade ELA proficiency, 6th grade math proficiency, and graduation pathways completion rates.
Requirements and Getting Started
The SCOscore script and algorithm have been developed in the R Statistical Computing Language (R Core Team 2023). As such, R and the IDE RStudio should be installed. Instructions for Installing R and RStudio can be found on the Installing R and RStudio page of the online book Hands-On Programming with R.
All necessary files are available as a GitHub repository, scoscore-iui2030. GitHub provides a tutorial on downloading repositories.
Once the repository is downloaded, the files are part of a renv
reproducible environment. The renv
package needs to be installed with the following command in the RStudio Console:
All libraries and packages necessary for running the R scripts can be installed with the following command in the RStudio Console:
::restore() renv
Once the command is given, all libraries and packages associated with this project will be installed in the local environment.
The script itself can be initiated either by:
- Changing the Constants (see below) to fit the need; and
- Clicking on the [->Render] button for
index.qmd
or [->Run] forprocess-scoscore.R
.
The output will be an Excel .xlsx
file in the output
folder named *-in-scoscore-schools.xlsx
where *
represents the focus year.
File Structure
The scoscore-iui2030
directory is structured in the following manner:
scoscore-iui2030/
┣ _quatro.yml <- Configuration file for site
┣ .gitignore <- Files and directories to be ignored by Git
┣ assets/ <- Directory for storing auxiliary files
┃ ┗ references.bib <- Bibliography for this project in BibTeX format
┃ ┗ service-logo.png <- IUI Strategic Plan Pillar logo
┃ ┗ style.css <- Cascading style sheet for the project
┣ CITATION.cff <- Machine-readable citation format
┣ CODE_OF_CONDUCT.md <- Code of Conduct for project contributors
┣ data/ <- Directory for all input data
┃ ┗ *-in-school-corp-locale.csv <- Urban centric locale data by year
┃ ┗ *-in-school-demographics.csv <- School demographic data by year
┃ ┗ *-in-school-gcr.csv <- School graduate completion rate data by year
┃ ┗ *-in-school-ilearn6.csv <- School ILEARN-6 math data by year
┃ ┗ *-in-school-iread3.csv <- School IREAD-3 data by year
┣ docs/ <- Directory for the rendered website
┣ index.qmd <- This file, the documentation in QMD format
┣ LICENSE <- License (MIT) for project
┣ output/ <- Directory for collecting output files
┃ ┗ *-in-scoscore-schools.xlsx <- Excel SCOscore output files by year
┣ R/ <- Directory for storing R scripts
┃ ┗ process-scoscore.R <- R script distilled from this file
┣ README.md <- General overview of the project
┣ renv/ <- renv directory
┣ renv.lock <- renv lockfile
┗ scoscore-iui2023.Rproj <- R Project file
Data Preparation
All “raw” data files are available through the IDOE Data Center & Reports page. These files do need some adjustments to standardize them and they need to be saved as .csv
, not .xlsx
files.
All of these files represent key metrics in the IDOE’s Graduates Prepared to Succeed reporting system.
IREAD-3 Test Scores
The IREAD-3 data can be found under the IREAD-3 Assessment Results heading. The IREAD-3 Corporation and School Results file for the appropriate year should be downloaded and opened with Microsoft Excel. For the purposes of this process, only the School
worksheet is required.
Any extraneous rows, such as titles, explainations, or notes, should be removed. Once the extra rows have been removed, the IREAD PASS N
and IREAD TEST N
columns should be removed.
The remaining columns should be renamed (all lowercase, with underscores _
in appropriate places) in the following manner:
corp_id | corp_name | school_id | school_name | iread_rate |
---|---|---|---|---|
number | text | number | text | percent |
Once that has been completed, the School
worksheet should be saved in the data
folder using the following naming convention: YYYY-in-school-iread3.csv
where YYYY
represents the year of the data.
ILEARN-6 Math Scores
The ILEARN-6 Math data can be found under the ILEARN Assessment Results heading. The ILEARN Grade 3-8 School Results file for the appropriate year should be downloaded and opened with Microsoft Excel. For the purposes of this process, only the Math
worksheet is required.
Any extraneous rows, such as titles, explainations, or notes, should be removed. Once the extra rows have been removed, only 6th grade data is required, in particular the 6th grade Math Proficient %
column. All other grades and indicators should be removed.
The remaining columns should be renamed (all lowercase, with underscores _
in appropriate places) in the following manner:
corp_id | corp_name | school_id | school_name | ilearn_rate |
---|---|---|---|---|
number | text | number | text | percent |
Once that has been completed, the Math
worksheet should be saved in the data
folder using the following naming convention: YYYY-in-school-ilearn6.csv
where YYYY
represents the year of the data.
High School Graduation Rates
The high school graduation pathways completion data can be found under the Graduation & College/Career Readiness heading. The State Graduation Rate Data file for the appropriate year should be downloaded and opened with Microsoft Excel. For the purposes of this process, only the School Pub Disagg
worksheet is required.
Any extraneous rows, such as titles, explainations, or notes, should be removed. Once the extra rows have been removed, only data under the Total
heading is required, in particular the Graduation Rate
column. All other indicators should be removed.
The remaining columns should be renamed (all lowercase, with underscores _
in appropriate places) in the following manner:
corp_id | corp_name | school_id | school_name | grad_rate |
---|---|---|---|---|
number | text | number | text | percent |
Once that has been completed, the School Pub Disagg
worksheet should be saved in the data
folder using the following naming convention: YYYY-in-school-gcr.csv
where YYYY
represents the year of the data.
Literate Code
Computer scientist Don Knuth (1984) first coined the term “literate programming” to describe a form of programming that is created as a human-readable narrative. It has been taken up as a format that is rich in comments and documentation to illustrate and illuminate the choices and decisions that were made in the act of programming. Literate code is also an essential aspect of research to promote reproducibility of analysis (Dekker 2018; Vassilev et al. 2016). In this case, as a community-engaged study, we are more interested in exemplifying trust, transparency, and accountability (Chou and Frazier 2020; Mullins et al. 2020; Sabatello et al. 2022), literate code provides a clear window for community members and participants into the inner processes of data analysis and visualization methodologies.
Load Libraries
Libraries are packages that are loaded in to extend the functionality to the base R programming language. The following libraries are utilized:
readr
(Wickham, Hester, and Bryan 2023) allows for efficient and straightforward reading and writing of local CSV files;here
(Müller 2020) provides a simple way to find files in folders;openxlsx
(Schauberger and Walker 2023) provides R with the functionality to write and read Microsoft Excel (.xlsx
) files;glue
(Hester and Bryan 2022),tidyr
(Wickham, Vaughan, and Girlich 2023),dplyr
(Wickham et al. 2023), andstringr
(Wickham 2022) are used to process and clean data;DT
provides the ability to display the web-based table at the end of this document;showtext
(Qiu and software 2023) allows Google Fonts to be loaded and displayed.curl
is included to ensure thatshowtext
can run within arenv
reproducible environment.
library(readr)
library(here)
library(openxlsx)
library(glue)
library(tidyr)
library(dplyr)
library(stringr)
library(curl)
library(showtext)
library(DT)
Set Constants
Constants are values that are used across the entire R script. The following constants are set for later use:
critical_threshold
is the SCOscore value at which schools become prioritized for partnership and attention, for IU Indianapolis it is set at 1.;focus_year
is the year for the generation of the SCOscores;round_threshold
is the SCOscore value at which SCOscores are rounded up to 1, for IU Indianapolis it is set at 0.970.state_3rd_proficiency
is the statewide mean for the focus year of 3rd graders showing proficiency on the IREAD-3 standardized exam;state_6th_proficiency
is the statewide mean for the focus year of 6th graders meeting their growth targets on the math ILEARN standardized exam;state_gpc
is the statewide mean for the focus year of 12th graders who complete graduation requirements.
<- 1.000
critical_threshold <- 2023
focus_year <- 0.970
round_threshold <- 0.816
state_3rd_proficiency <- 0.341
state_6th_proficiency <- 0.864 state_gpc
These metrics are determined by the state on an annual basis and can be obtained from the Indiana Department of Education’s Indiana Graduates Prepared to Succeed website.
They should be updated each year.
Define Functions
Functions provide “shortcodes” for repeating calculations or analyses multiple times with different variables, datasets, or networks.
The prune_frame
function removes any rows with missing values (NA
). This function also filters out private, charter, and other schools that are not required to submit data to the Indiana Department of Education, nor are they accountable to the communities they serve.
<- function(working_frame) {
prune_frame |>
working_frame # Remove NA's
na.omit() |>
# Keep only public schools
filter(corp_id %in% crosswalk_frame$corp_id)
}
The set_numeric_ids
function sets the School Corporation ID and School ID columns as numeric for consistency, so that all IDs in all dataframes are set consistently.
<- function(working_frame) {
set_numeric_ids |>
working_frame # Set the School Corporation ID column to numeric
mutate(
corp_id = as.numeric(corp_id)
|>
) # Set the School ID column to numeric
mutate(
school_id = as.numeric(school_id)
) }
Read and Process Data Files
The data files need to be read in to R and processed in a consistent manner so that School Context Opportunity Scores can be calculated successfully.
LEA ID-School Corporation ID Crosswalk
The Urban Centric Locale data is set by the Federal, rather than the Indiana, Department of Education. The Federal DOE utilizes a standardized LEA ID while the IDOE utilizes their own School Corporation ID.
This code block creates a crosswalk to bridge the DOE’s LEA ID and the IDOE’s School Corporation ID.
# ____________________________________________________________________________
# Create Federal-IDOE Crosswalk ####
## ............................................................................
## Load Locale File ####
## The Locale File is directly from a Federal DOE database and includes
## the Federal LEA ID codes, rather than the Indiana DOE School
## Corporation codes. lea_name is renamed to corp_name for consistency.
<- read_csv(
locale_frame here(
"data",
glue("{focus_year}-in-school-corp-locale.csv")
)|>
) rename(corp_name = lea_name)
## ............................................................................
## Load IREAD File ####
## The IREAD File contains, on the otherhand, the Indiana DOE School
## Corporation codes, but not the Federal DOE LEA codes.
<- read_csv(
iread_frame here(
"data",
glue("{focus_year}-in-school-iread3.csv")
)
)
## ............................................................................
## Create Crosswalk Frame ####
## The two dataframes are joined together by matching the School
## Corporation name so that the IDOE and USDOE codes match. This can be
## used to allow different sources of data sets to align with one another.
<- iread_frame |>
crosswalk_frame select(-iread_rate) |>
full_join(locale_frame, by = "corp_name") |>
na.omit() |>
select(corp_id, school_id, leaid)
Urban-Centric Locale
The National Center for Education Statistics provides a framework for recognizing the “locale” of a school corporation, from city to suburb to town to rural. The framework is the Urban-Centric Locale Categories, and the school corporation designations are included here for accountability purposes.
# ____________________________________________________________________________
# Process Locale Frame ####
# Processing on the locale frame is completed by adding the IDOE
# identifiers on to the frame and ensuring the School ID and School
# Corporation ID columns are numeric for consistency.
<- locale_frame |>
locale_frame full_join(crosswalk_frame, by = "leaid") |>
select(-leaid) |>
set_numeric_ids()
Demographics
Demographic indicators—specifically the percentage of students who are a part of underrepresented racial minority communities and the percentage of students who receive free and reduced lunch—are critical components of the School Context Opportunity Score. They provide a measure of the potentiality of students who are often held back because of existing structural barriers and inequalities.
# ____________________________________________________________________________
# Load and Process Demographics File ####
## ............................................................................
## Load Demographics File ####
## The Demographics File is loaded and any blank or "omitted" ("***")
## values are replaced with NA for easier processing.
<-
read_demographics_frame read_csv(here("data", glue("{focus_year}-in-school-demographics.csv")),
col_names = TRUE,
na = c("", "***")
|>
) ## The Crosswalk Frame is used to only retain the school corporations that
## are of interest (excluding private, charter, etc., schools) and that
## consistently report data and are fully accountable to the communities
## they serve.
filter(corp_id %in% crosswalk_frame$corp_id) |>
## Any NA is then replaced with 0 in the demographics columns.
mutate(across(
.cols = aina:nhpi,
.fns = ~ if_else(is.na(.x) == TRUE, 0, .x)
))
## ............................................................................
## Process Demographics Frame ####
## An "underrepresented racial minority" column is created by calculating
## the proportion of non-white students in the school. Similarly, a Free
## and Reduced Lunch proportion is also calculated.
<- read_demographics_frame |>
process_demographics_frame mutate(
urm_percent =
+
(aina +
asian +
black +
latine +
multiracial /
nhpi)
enrollment|>
) mutate(
frl_percent = frl / enrollment
)
## ............................................................................
## Finalize Demographics Frame ####
## Keep only the necessary columns in the dataframe and convert all ID
## columns to numeric for consistency.
<- process_demographics_frame |>
demographics_frame select(
corp_id,
corp_name,
school_id,
school_name,
urm_percent,
frl_percent|>
) prune_frame() |>
set_numeric_ids()
## ............................................................................
## Calculate Demographic Means ####
## Means for the demographic categories-underrepresented racial minorities and
## free and reduced lunch-are calculated for future use in the counterfactual
## analysis.
<- round(mean(demographics_frame$urm_percent), 3)
urm_percent_mean <- round(mean(demographics_frame$frl_percent), 3) frl_percent_mean
3rd Grade IREAD
# ____________________________________________________________________________
# Load and Process IREAD File ####
## ............................................................................
## Load IREAD File ####
##
<- read_csv(
read_iread_frame here(
"data",
glue("{focus_year}-in-school-iread3.csv")
),na = c("", "***")
|>
) prune_frame() |>
set_numeric_ids()
## ............................................................................
## Calculate IREAD Rates ####
<- read_iread_frame |>
iread_frame mutate(iread_rate = as.numeric(str_sub(iread_rate, end = -2)) / 100)
## ............................................................................
## Calculate IREAD Mean ####
<- round(mean(iread_frame$iread_rate), 3) iread_mean
6th Grade ILEARN
# ____________________________________________________________________________
# Load and Process ILEARN File ####
## ............................................................................
## Load ILEARN File ####
<- read_csv(
read_ilearn_frame here(
"data",
glue("{focus_year}-in-school-ilearn6.csv")
),na = c("", "***")
|>
) prune_frame() |>
set_numeric_ids()
## ............................................................................
## Calculate ILEARN Rates ####
<- read_ilearn_frame |>
ilearn_frame mutate(ilearn_rate = as.numeric(str_sub(ilearn_rate, end = -2)) / 100)
## ............................................................................
## Calculate ILEARN Mean ####
<- round(mean(ilearn_frame$ilearn_rate), 3) ilearn_mean
Graduation Pathways Completion
# ____________________________________________________________________________
# Load and Process Graduation Pathways Completion File ####
## ............................................................................
## Load Graduation Pathways Completion File ####
<- read_csv(
read_grad_frame here(
"data",
glue("{focus_year}-in-school-gcr.csv")
),na = c("", "***")
|>
) prune_frame() |>
set_numeric_ids()
## ............................................................................
## Process Graduation Completion Rates File ####
<- read_grad_frame |>
grad_frame mutate(grad_rate = as.numeric(str_sub(grad_rate, end = -2)) / 100)
## ............................................................................
## Calculate Graduate Pathways Completion Rate Mean ####
<- round(mean(grad_frame$grad_rate), 3) grad_mean
Calculate SCOscore
<- locale_frame |>
scoscore_demographics full_join(
demographics_frame,by = c("corp_id", "school_id"),
suffix=c("",".y")) |>
select(-ends_with(".y"))
<- scoscore_demographics |>
scoscore_iread full_join(
iread_frame,by = c("corp_id", "school_id"),
suffix=c("",".y")) |>
select(-ends_with(".y"))
<- scoscore_iread |>
scoscore_ilearn full_join(
ilearn_frame,by = c("corp_id", "school_id"),
suffix=c("",".y")) |>
mutate(corp_name = if_else(
is.na(corp_name),
corp_name.y,
corp_name|>
)) mutate(school_name = if_else(
is.na(school_name),
school_name.y,
school_name|>
)) select(-ends_with(".y"))
<- scoscore_ilearn |>
scoscore_grad full_join(
grad_frame,by = c("corp_id", "school_id"),
suffix=c("",".y")) |>
mutate(corp_name = if_else(
is.na(corp_name),
corp_name.y,
corp_name|>
)) mutate(school_name = if_else(
is.na(school_name),
school_name.y,
school_name|>
)) select(-ends_with(".y"))
<- scoscore_grad |>
scoscore_frame filter(!is.na(corp_name)) |>
filter(!is.na(urm_percent)) |>
filter(!is.na(frl_percent)) |>
mutate(iread_rate_adj = as.numeric(if_else(
< state_3rd_proficiency,
iread_rate - iread_rate + 1),
(state_3rd_proficiency 0
|>
))) mutate(ilearn_rate_adj = as.numeric(if_else(
< state_6th_proficiency,
ilearn_rate - ilearn_rate + 1),
(state_6th_proficiency 0
|>
))) mutate(grad_rate_adj = as.numeric(if_else(
< state_gpc,
grad_rate - grad_rate + 1),
(state_gpc 0
|>
))) rowwise() |>
mutate(
adj_academic = round(
mean(
c(iread_rate_adj, ilearn_rate_adj, grad_rate_adj),
na.rm = TRUE),
3)) |>
ungroup() |>
select(
-contains(c(
"iread",
"ilearn",
"grad"))) |>
mutate(
scoScore = (
* 1.5) + (frl_percent * 1.5) + (adj_academic)) / 3
((urm_percent
)|>
) mutate(rounded_up = if_else(
< critical_threshold & scoScore >= round_threshold,
scoScore TRUE,
FALSE
|>
)) mutate(scoScore = if_else(
< critical_threshold & scoScore >= round_threshold,
scoScore 1,
scoScore|>
)) mutate(counter_factual = FALSE)
Conduct a counterfactual analysis to ensure fairness for school corporations. If a school corporation has an adjusted academic score above mean academic score + 1 standard deviation then adjust the non-white and free and reduced lunch percents to the state average if these are lower than the state mean. This gives some school corporations a little “bump” so that it cannot be said that any group is given an “unfair” advantage. If the newly calculated SCOscore is then >=1, then keep that counterfactual score.
<- round(
adj_ac_mean mean(scoscore_frame$adj_academic),
3)
<- round(
adj_ac_sd sd(scoscore_frame$adj_academic),
3)
<- adj_ac_mean + adj_ac_sd
adj_ac_cutoff
<- scoscore_frame |>
whatif_frame filter(adj_academic >= adj_ac_cutoff) |>
filter(scoScore < 1) |>
select(-scoScore) |>
mutate(urm_percent = ifelse(
< urm_percent_mean,
urm_percent
urm_percent_mean,
urm_percent|>
)) mutate(frl_percent = ifelse(
< frl_percent_mean,
frl_percent
frl_percent_mean,
frl_percent|>
)) mutate(
scoScore = (
* 1.5) + (frl_percent * 1.5) + (adj_academic)) / 3
((urm_percent
)|>
) mutate(
scoScore = round(scoScore, 3)
|>
) mutate(rounded_up = if_else(
< critical_threshold & scoScore >= round_threshold,
scoScore TRUE,
FALSE
|>
)) mutate(scoScore = if_else(
< 1 & scoScore >= 0.970,
scoScore 1,
scoScore|>
)) filter(
>= 1
scoScore |>
) mutate(
counter_factual = TRUE
|>
) select(
corp_id,
school_id,
scoScore,
rounded_up,
counter_factual
)
<- scoscore_frame |>
scoscore_frame left_join(
whatif_frame,by = c("corp_id", "school_id")
|>
) mutate(counter_factual = if_else(!is.na(counter_factual.y),
counter_factual.y,|>
counter_factual.x)) mutate(rounded_up = if_else(!is.na(rounded_up.y),
rounded_up.y,|>
rounded_up.x)) mutate(scoScore = if_else(!is.na(scoScore.y),
scoScore.y,|>
scoScore.x)) select(-(ends_with(c(".x", ".y"))))
<- scoscore_frame |>
scoscore_frame select(
corp_id,
school_id,
corp_name,
school_name,
urban_centric_locale,
urm_percent,
frl_percent,
adj_academic,
scoScore,
rounded_up,
counter_factual|>
) arrange(
corp_id,
school_id
)
write.xlsx(
scoscore_frame,here(
"output",
glue("{focus_year}-in-scoscore-schools.xlsx")
),colNames = TRUE
)
The End Result
Land Acknowledgement
Indiana University Indianapolis acknowledges our location on the traditional and ancestral territory of the Miami, Potawatomi and Shawnee people. We honor the heritage of Native peoples, what they teach us about the stewardship of the earth and their continuing efforts today to protect the planet.
Founded in 1969, Indiana University Indianapolis stands on the historic homelands of Native peoples and, more recently, that of a vibrant Black community, also displaced. As the present stewards of the land, we honor them all as we live, work and study at Indiana University Indianapolis.
Session Information
Session information is provided for reproducibility purposes.
::session_info(pkgs = "attached") sessioninfo
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.1 (2023-06-16)
os macOS Sonoma 14.0
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Indiana/Indianapolis
date 2023-10-27
pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
! package * version date (UTC) lib source
P curl * 5.1.0 2023-10-02 [?] RSPM (R 4.3.0)
P dplyr * 1.1.3 2023-09-03 [?] RSPM (R 4.3.0)
P DT * 0.30 2023-10-05 [?] RSPM (R 4.3.0)
P glue * 1.6.2 2022-02-24 [?] CRAN (R 4.3.0)
P here * 1.0.1 2020-12-13 [?] RSPM (R 4.3.0)
P openxlsx * 4.2.5.2 2023-02-06 [?] RSPM (R 4.3.0)
P readr * 2.1.4 2023-02-10 [?] RSPM (R 4.3.0)
P showtext * 0.9-6 2023-05-03 [?] RSPM (R 4.3.0)
P showtextdb * 3.0 2020-06-04 [?] RSPM (R 4.3.0)
P stringr * 1.5.0 2022-12-02 [?] CRAN (R 4.3.0)
P sysfonts * 0.8.8 2022-03-13 [?] RSPM (R 4.3.0)
P tidyr * 1.3.0 2023-01-24 [?] RSPM (R 4.3.0)
[1] /Users/jfprice/github/scoscore-iui2030/renv/library/R-4.3/aarch64-apple-darwin20
[2] /Users/jfprice/Library/Caches/org.R-project.R/R/renv/sandbox/R-4.3/aarch64-apple-darwin20/ac5c2659
P ── Loaded and on-disk path mismatch.
──────────────────────────────────────────────────────────────────────────────
References
Reuse
Citation
@online{price2023,
author = {Price, Jeremy},
title = {School {Context} {Opportuntity} {Score}},
date = {2023-10-27},
url = {https://jeremyfprice.github.io/scoscore-iui2030/},
doi = {10.5281/zenodo.10047883},
langid = {en}
}