Study design and population
We conducted a retrospective observational study to estimate the association between repeated pollutant exposure and 5-km race times among male NCAA collegiate track & field athletes. To acquire race observations, we used the following procedure: (1) identified universities with top-tier running programs, defined as universities with at least one member who competed in the NCAA Division I 5-km final race during 2010–2014; (2) selected athletes from these universities who competed in an identified NCAA-sanctioned outdoor track & field 5-km race between March and June during the 2010–2014 NCAA track & field seasons. The study population was restricted to male athletes because of the time-intensive nature of the manual data collection. Our analysis was exempt from institutional review because all data were publicly available and our research activity did not involve any interaction with individuals.
Race observations
Race results and athlete information were obtained from the Track & Field Results Reporting System (TFRRS) database, maintained by Direct Athletics Incorporated25. Details on the database and the method of obtaining data are in the supplement (Supplement section A).
Pollutant exposures
Exposure profiles were developed for a 21-day period: the 20 training days prior to the meet and the day of the meet. This exposure profile accounts for the potential cumulative impact of pollutant exposure on the cardiorespiratory system during training. Daily air pollution concentrations were assigned using the EPA downscaler model26, 27: for the 20 days prior to the meet, from the census tract of each athlete’s home university, and for the meet date, from the census tract of the meet location. The 21-day period thus accounts for days spent training at the home university and competing at races away from the home university. Further details are provided in supplemental sections B and C.
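The assembly of a 21-day exposure profile can be sketched as follows. This is a minimal Python illustration, not the authors' code (the analysis was run in R); the function name `exposure_profile`, the dictionary layout keyed by (tract, date), and the tract labels are all assumptions made for the example.

```python
from datetime import date, timedelta

def exposure_profile(daily, home_tract, meet_tract, meet_date, n_training_days=20):
    """Return concentrations ordered from the meet day back to 20 days before it.

    `daily` is a hypothetical lookup {(tract_id, date): concentration}.
    The meet day uses the meet location's tract; the preceding training
    days use the tract of the athlete's home university.
    """
    profile = [daily[(meet_tract, meet_date)]]
    for days_before in range(1, n_training_days + 1):
        day = meet_date - timedelta(days=days_before)
        profile.append(daily[(home_tract, day)])
    return profile

# Toy data: made-up concentrations for two hypothetical tracts.
meet_day = date(2012, 5, 5)
daily = {}
for k in range(21):
    d = meet_day - timedelta(days=k)
    daily[("home", d)] = 10.0 + k
    daily[("meet", d)] = 99.0

profile = exposure_profile(daily, "home", "meet", meet_day)
print(len(profile), profile[0], profile[1])
```

The first element is the meet-day value at the meet location; every later element comes from the home-university tract.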
To calculate the AQI, the EPA designates pollutant-specific concentration breakpoints and provides a piecewise linear function18. The AQI breakpoints signify the level of health concern: good, moderate, unhealthy for sensitive groups, unhealthy, very unhealthy, and hazardous. When multiple pollutants are measured, the reported AQI is the highest index value among all pollutants. While AQI is traditionally reported based on five pollutants (carbon monoxide, ozone, lead, PM2.5, and sulfur dioxide), we calculated the two-pollutant threshold AQI using PM2.5 and ozone, which drove 91% of observed AQI values in our study, as further analyzed in supplement section D. Moreover, we hypothesized that the impact of air pollution exposure on performance is not independent between pollutants, so we also evaluated the combination of the two pollutants as additive rather than substitutionary, defined as the summed two-pollutant AQI value (the sum of the PM2.5-specific and ozone-specific AQI values). We compared the two-pollutant threshold AQI values with the summed two-pollutant AQI values using Kendall’s tau, a measure of correspondence between two measurement approaches.
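The two AQI definitions and the Kendall's tau comparison can be sketched in Python as below. This is an illustrative sketch only: the breakpoint tables are assumed values patterned on published EPA tables and should be checked against the EPA documentation, the paper does not state which breakpoint revision was used, and the helper names (`sub_index`, `threshold_aqi`, `summed_aqi`, `kendall_tau_b`) are hypothetical. The EPA's concentration truncation rules are omitted for brevity.

```python
import math

# Assumed breakpoints: (conc_low, conc_high, index_low, index_high).
PM25_BREAKPOINTS = [  # 24-hour PM2.5, micrograms per cubic meter
    (0.0, 12.0, 0, 50), (12.1, 35.4, 51, 100), (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200), (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400), (350.5, 500.4, 401, 500),
]
OZONE_BREAKPOINTS = [  # 8-hour ozone, ppm
    (0.000, 0.054, 0, 50), (0.055, 0.070, 51, 100), (0.071, 0.085, 101, 150),
    (0.086, 0.105, 151, 200), (0.106, 0.200, 201, 300),
]

def sub_index(conc, breakpoints):
    """Piecewise linear interpolation of a concentration onto the index scale."""
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if conc <= c_hi:
            c = max(conc, c_lo)  # values in a gap between segments snap to the segment start
            return round(i_lo + (i_hi - i_lo) * (c - c_lo) / (c_hi - c_lo))
    raise ValueError("concentration above the tabulated range")

def threshold_aqi(pm25, ozone):
    """Two-pollutant threshold AQI: the higher of the two pollutant-specific indices."""
    return max(sub_index(pm25, PM25_BREAKPOINTS), sub_index(ozone, OZONE_BREAKPOINTS))

def summed_aqi(pm25, ozone):
    """Summed two-pollutant AQI: the sum of the two pollutant-specific indices."""
    return sub_index(pm25, PM25_BREAKPOINTS) + sub_index(ozone, OZONE_BREAKPOINTS)

def kendall_tau_b(x, y):
    """Kendall's tau-b from pairwise concordance counts (O(n^2), fine for a sketch)."""
    conc = disc = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            ties_x += dx == 0
            ties_y += dy == 0
            if dx * dy > 0:
                conc += 1
            elif dx * dy < 0:
                disc += 1
    n_pairs = n * (n - 1) // 2
    return (conc - disc) / math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
```

For example, `threshold_aqi(35.4, 0.085)` takes the larger of the two sub-indices, while `summed_aqi(35.4, 0.085)` adds them, so the summed value can rise even when the dominant pollutant is unchanged.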
Confounders
The meteorological conditions during the meet were measured by matching track race location data to the corresponding grid cell and time in the North American Land Data Assimilation System phase 2 (NLDAS-2) project28, 29. The NLDAS data are a 0.125° (~13 × 13 km grid cells) gridded product that provides hourly values of temperature 2 m above ground in kelvin (K), specific humidity 2 m above ground in kilograms per kilogram (kg/kg), and 10-m zonal and 10-m meridional wind speed (m/s). Wind speed was calculated as the hypotenuse of the zonal and meridional components.
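The wind speed calculation amounts to the Euclidean norm of the two wind components. A minimal Python sketch, with a hypothetical function name:

```python
import math

def wind_speed(zonal, meridional):
    """Scalar wind speed (m/s) as the hypotenuse of the zonal (u) and meridional (v) components."""
    return math.hypot(zonal, meridional)

print(wind_speed(3.0, 4.0))
```

The sign of each component (direction) does not affect the resulting speed.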
In addition to meteorological variables, we controlled for several performance variables specific to each athlete-race combination. We controlled for the athlete’s personal record prior to the 5-km race being evaluated and the athlete’s previous 5-km race time, both of which represent the athlete’s ability. We also controlled for the number of days since the previous 5-km race, the athlete’s year in school, and the number of days into the calendar year as a proxy for how athletes develop over a season. We included random effects for (a) the athlete’s home university and (b) the race.
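The date-derived and history-derived covariates described above could be computed along the following lines. This is a hedged Python sketch, not the authors' procedure: the function name, the input layout (a date-sorted list of races for one athlete), and the toy race times are assumptions.

```python
from datetime import date

def race_covariates(history):
    """Derive per-race covariates from one athlete's date-sorted 5-km history.

    `history` is a hypothetical list of (race_date, time_seconds) tuples.
    The first race is skipped because it has no prior race to refer to.
    """
    rows = []
    for idx in range(1, len(history)):
        race_date, race_time = history[idx]
        prev_date, prev_time = history[idx - 1]
        prior_times = [t for _, t in history[:idx]]
        rows.append({
            "race_date": race_date,
            "race_time": race_time,
            "personal_record": min(prior_times),   # best time before this race
            "previous_time": prev_time,            # most recent prior 5-km time
            "days_since_previous": (race_date - prev_date).days,
            "day_of_year": race_date.timetuple().tm_yday,
        })
    return rows

# Made-up race times for illustration only.
history = [
    (date(2012, 3, 24), 900.0),
    (date(2012, 4, 7), 890.0),
    (date(2012, 5, 5), 895.0),
]
rows = race_covariates(history)
print(rows[1]["personal_record"], rows[1]["days_since_previous"])
```

Note that the personal record is computed only from races before the one being evaluated, so the outcome never leaks into its own covariate.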
Non-linear distributed lag model
We employed distributed lag non-linear models (DLNMs) to characterize the lagged effects of exposure30. DLNMs can capture complex exposure-lag-response relationships by simultaneously adjusting for exposure at each lag via non-linear terms such as natural splines. This type of model has been used in other studies of the lagged effects of air pollution and temperature on health outcomes31,32,33,34,35,36. In a DLNM, lagged exposure is represented as a cross-basis term, a combined basis matrix for the exposure dimension and the lag dimension. For each exposure, the constraints of its basis matrix, i.e., its functional form, were chosen via the Akaike information criterion (AIC). We tested various degrees of freedom (df = 3, 4, 5) and equal and logarithmic knot placements for both the exposure and lag dimensions.
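The cross-basis idea can be illustrated with a small Python sketch: each cross-basis column combines one exposure basis function with one lag basis function and sums their product over the lags. The sketch swaps in simple polynomial basis functions for the natural splines used in the study, and it is not the `dlnm` implementation; the function name and toy inputs are assumptions.

```python
def cross_basis(lagged_exposures, exposure_basis, lag_basis):
    """Build a cross-basis matrix from lagged exposure histories.

    lagged_exposures: list of rows, each row the exposure at lag 0, 1, 2, ...
    exposure_basis:   list of functions f(x) applied to exposure values
    lag_basis:        list of functions g(lag) applied to the lag index
    Each output column is sum over lags of f(x_lag) * g(lag).
    """
    matrix = []
    for history in lagged_exposures:
        row = []
        for f in exposure_basis:
            for g in lag_basis:
                row.append(sum(f(x) * g(lag) for lag, x in enumerate(history)))
        matrix.append(row)
    return matrix

# Stand-in polynomial bases (the study used natural cubic splines instead).
exposure_basis = [lambda x: x, lambda x: x * x]
lag_basis = [lambda lag: 1.0, lambda lag: float(lag)]

cb = cross_basis([[1.0, 2.0, 3.0]], exposure_basis, lag_basis)
print(cb)
```

With 2 exposure basis functions and 2 lag basis functions the matrix has 4 columns per observation; with spline bases of 5 df in each dimension it would have 25.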
Final model
We estimated the association between air pollutant exposure and race times using linear mixed-effects models. First, we used a backward stepwise AIC procedure to select the covariates of the base linear mixed-effects model without the cross-basis terms for air pollution exposure. Next, for each of the four exposures, we ran the model with the selected covariates and compared AIC values while varying the functional form of the cross-basis matrix. Our final model is represented by Eq. (1).
$$E[O_{i,j,k}] = \sum_{l=1}^{21} \beta_{l} T_{l} + \boldsymbol{\lambda} X_{i} + \delta_{i} + \gamma_{k}$$
(1)
In this equation, i, j, and k represent the race, athlete, and athlete’s home university of interest, respectively; \(O_{i,j,k}\) represents the 5-km race outcome of interest in total altitude-adjusted seconds; \(T_{l}\) is the air pollution exposure matrix obtained by applying the basis functions to either PM2.5, ozone, two-pollutant threshold AQI, or summed two-pollutant AQI; \(\beta_{l}\) represents the coefficients for the lagged air pollution exposure matrix \(T_{l}\), differentiated by the lag day (l), which ranges from 0 to 21 days. For each of our four exposures, according to the AIC, we identified the optimal form of the cross-basis matrix for both the exposure–response and lag-response functions to be a natural cubic spline with 5 degrees of freedom (df) and equally spaced knots. The covariates in our equation are represented by \(X_{i}\), and \(\boldsymbol{\lambda}\) is the vector of coefficients. Random intercepts for the race and the athlete’s home university are represented by \(\delta_{i}\) and \(\gamma_{k}\), respectively. In the final model, the backward AIC algorithm selected all covariates except the previous 5-km race time. We estimated lag-response relationships and cumulative effect estimates comparing exposure at the 80th percentile with exposure at the 20th percentile.
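Backward stepwise selection by AIC can be sketched generically as follows. This Python sketch is not the R routine used in the study: the function names are hypothetical, the scoring function stands in for refitting the mixed-effects model, and the toy scores are invented purely to exercise the loop.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood

def backward_stepwise(covariates, score):
    """Drop one covariate at a time while doing so lowers the score (e.g. AIC).

    `score` is a caller-supplied function mapping a tuple of covariate
    names to a model score; in practice it would refit the model.
    """
    current = list(covariates)
    best = score(tuple(current))
    improved = True
    while improved and current:
        improved = False
        trials = [(score(tuple(c for c in current if c != drop)), drop)
                  for drop in current]
        trial_score, drop = min(trials)
        if trial_score < best:
            best = trial_score
            current.remove(drop)
            improved = True
    return current, best

# Toy score: two hypothetical covariates help the fit, one only adds a penalty.
def toy_score(covs):
    return 100 - 5 * ("temperature" in covs) - 3 * ("personal_record" in covs) + 2 * len(covs)

selected, final_score = backward_stepwise(
    ["temperature", "personal_record", "noise"], toy_score)
print(selected, final_score)
```

In the toy run the uninformative covariate is dropped first, and the loop stops once removing any remaining covariate would raise the score.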
All analyses were conducted using R version 4.0.3 and the following packages: dlnm version 2.4.6 for the distributed lag model, lme4 version 1.1-27.1 for linear mixed-effects models, and MLmetrics version 1.1.1 for evaluation metrics30, 37, 38. All included data are publicly available; the data and code used for the analysis are available on GitHub (https://github.com/marikamaecusick/RunningAP).
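The cumulative effect estimate described in this section, a contrast between the 80th and 20th percentiles of exposure summed over lags, can be sketched in Python as below. The lag-specific response function here is a made-up placeholder for the fitted exposure-lag-response surface, and the function name and toy exposure values are assumptions.

```python
import statistics

def cumulative_contrast(response, high, low, n_lags):
    """Sum over lags of the difference in predicted response at `high` vs `low` exposure."""
    return sum(response(high, lag) - response(low, lag) for lag in range(n_lags))

# Toy exposure distribution; quintile cut points give the 20th and 80th percentiles.
exposures = list(range(0, 101, 10))
cuts = statistics.quantiles(exposures, n=5, method="inclusive")
p20, p80 = cuts[0], cuts[-1]

# Placeholder response: an effect on the two most recent days only.
def toy_response(x, lag):
    return 0.5 * x if lag < 2 else 0.0

effect = cumulative_contrast(toy_response, p80, p20, n_lags=21)
print(p20, p80, effect)
```

In the actual analysis the response at each lag would come from the fitted cross-basis coefficients rather than a hand-written function.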
Sensitivity analyses
First, given uncertainty about the impact that different lagged exposures may have, we considered alternative numbers of lags by running the models using 14-day and 28-day lags for all four of our exposures. Second, we fit 1000 negative exposure models, which evaluate the impact on race outcomes of a perturbed exposure matrix randomly sampled from exposures in our study. If we detected statistically significant associations for more than 5% of the models, this would suggest that our confidence intervals are overly confident (i.e., too narrow)39, 40. We assessed the proportion of perturbed exposure matrices that had a statistically significant cumulative effect for each of our exposures when comparing the 20th with the 80th percentile of exposure.
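The negative exposure check can be sketched as a simple loop: perturb the exposure rows, refit, and count how often the result is significant. In this Python sketch the perturbation is a random permutation of exposure rows across observations, which is one plausible reading of "randomly sampled" and is an assumption; the `is_significant` callback stands in for refitting the model and testing the cumulative effect, and all names are hypothetical.

```python
import random

def negative_exposure_rate(exposures, outcomes, is_significant, n_models=1000, seed=0):
    """Proportion of perturbed-exposure models flagged as significant.

    Each iteration shuffles the exposure rows so they no longer line up
    with the outcomes, then asks the caller-supplied `is_significant`
    whether the refitted model finds an association.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_models):
        perturbed = rng.sample(exposures, k=len(exposures))  # permutation of rows
        if is_significant(perturbed, outcomes):
            hits += 1
    return hits / n_models

# Toy inputs; the callbacks below are placeholders, not real model fits.
exposures = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
outcomes = [900.0, 905.0, 910.0]
rate_never = negative_exposure_rate(exposures, outcomes, lambda e, o: False, n_models=50)
rate_always = negative_exposure_rate(exposures, outcomes, lambda e, o: True, n_models=50)
print(rate_never, rate_always)
```

A rate well above 0.05 at the 5% significance level would point to confidence intervals that are too narrow.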