Data Management and Biostatistics Services

Published on Feb 09, 2021 · Last Updated 2 years 3 months ago


The Data Management and Biostatistics Program team helps CHOP investigators manage, clean, and analyze data; develops and applies novel methodologies to complex pediatric research questions; and co-authors grants and manuscripts.


Protocol / Grant Development

Engaging with biostatistician faculty and staff early in the research process helps ensure a well-designed study in which data collection and management are tailored to the analytic methods needed to answer the research question. At the inception of a study proposal, whether for a grant application, IRB protocol, or unfunded study, biostatisticians provide expertise in study design, statistical analysis plans, and sample size and power calculations. Working closely with the study team, biostatisticians ensure the proposed design and analysis plan align with the research questions and the data, whether prospectively or retrospectively collected.
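As an illustration of the kind of sample size question addressed at this stage, the sketch below estimates the number of participants per group needed to compare two group means. It is a minimal example under stated assumptions, not the team's method: it uses the standard normal approximation, a two-sided test, equal group sizes, and a standardized effect size. The function name `n_per_group` and its defaults are hypothetical choices made for this example.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided comparison of two means.

    Assumes equal group sizes and a normal approximation; effect_size is the
    standardized difference (difference in means divided by the common SD).
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for the two-sided test
    z_beta = z.inv_cdf(power)             # quantile corresponding to desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Halving the effect size roughly quadruples the required sample size.
medium_effect_n = n_per_group(0.5)
small_effect_n = n_per_group(0.25)
```

A calculation based on the t distribution gives slightly larger numbers, and real studies usually also account for attrition, clustering, or multiple comparisons, which is part of what the statistical analysis plan works out.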

Typically, faculty are funded in grant proposals as co-investigators for a specific percent effort. DSBU staff are funded for a specific percent effort, either as key personnel or other staff. BDMC staff are funded for an hourly rate, either as key personnel or other staff.

Data Collection and Management

Appropriately collecting, recording, and structuring research data is essential to a valid and reproducible research study. For prospective data collection, investigators often use REDCap or OnCore to collect and store research data. At the beginning of the study, biostatisticians work with the study team to ensure prospectively collected data are in the format that will most easily translate to an analytic dataset. During this stage, biostatisticians help the study team create data dictionaries and codebooks. Throughout data collection, biostatisticians may serve as data managers and monitor for data anomalies. Data management staff who specialize in setting up these platforms for data collection and anomaly monitoring are also available.
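To make the role of a data dictionary in anomaly monitoring concrete, the sketch below checks a single record against a small dictionary of expected types, ranges, and allowed values. It is an illustrative example only: the field names, limits, and the `find_anomalies` helper are hypothetical, and it does not use or represent the REDCap or OnCore interfaces.

```python
# Hypothetical data dictionary: each field lists its expected type plus
# either a numeric range or a set of allowed codes.
DATA_DICTIONARY = {
    "age_years": {"type": (int, float), "min": 0, "max": 21},
    "sex": {"type": str, "allowed": {"F", "M"}},
}


def find_anomalies(record, dictionary=DATA_DICTIONARY):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field, rule in dictionary.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not isinstance(value, rule["type"]):
            problems.append(f"{field}: unexpected type {type(value).__name__}")
        elif "min" in rule and value < rule["min"]:
            problems.append(f"{field}: {value} below minimum {rule['min']}")
        elif "max" in rule and value > rule["max"]:
            problems.append(f"{field}: {value} above maximum {rule['max']}")
        elif "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{field}: {value!r} not in allowed values")
    return problems


clean = find_anomalies({"age_years": 7, "sex": "F"})
flagged = find_anomalies({"age_years": 40, "sex": "X"})
```

Running checks like this on a schedule during data collection, rather than once at the end, lets the study team correct entry errors while the source information is still easy to verify.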

For retrospective data collection, the source(s) of data often dictate which research teams are involved. If a study requires CHOP patient data for research, the Clinical Reporting Unit serves as CHOP’s honest broker to pull and structure patient data. The DSBU has access to many large administrative datasets (e.g., IBM MarketScan), national surveys of inpatient and emergency department data (e.g., HCUP), PHIS, and other datasets.

For studies involving the FDA and other regulatory bodies, the BDMC has statistical programmers who, once data collection is complete, create datasets and documentation that meet the rigorous submission criteria of these agencies.

Finally, research data should be stored securely; biostatisticians partner with Research IS to ensure folders and drives have the capacity and security to store research data.

Data Analyses

Once research data are structured and cleaned, the dataset is ready for analysis. Analyses are guided by the statistical analysis plan, which is preferably created in the early stages of the research study but is always prepared prior to analysis and based on the primary study question(s). Most often, biostatisticians use statistical software such as SAS, R, Python, or Stata to analyze data. Their code is annotated and structured to facilitate replication of study results. Over the course of the analyses, the research team meets periodically to review and discuss results, ensure clinicians understand them, and determine the best way to display the data for publication.

CHOP faculty biostatisticians have expertise in a wide range of statistical methods, including but not limited to causal inference, longitudinal data analysis, infectious diseases outbreak modeling, unsupervised and supervised machine learning methods, statistical genetics, and Bayesian analyses.

DSBU staff have expertise in matching methodology (entropy balancing, coarsened exact matching, propensity score matching), mixed-effects modeling, latent variable mixture models, machine learning, supervised and unsupervised data analysis, and simulations. Expertise in geospatial visualizations and analyses is also available within the DSBU. Examples of the DSBU's analyses can be found here.


Interpretation and Dissemination

Once results of the study are ready to report, biostatisticians typically serve as co-authors for abstracts and manuscripts (with a focus on methods sections), prepare clinical study reports, and participate in future related grants and proposals.

Data visualization is also a large component of disseminating study results. DSBU staff have experience publishing research data via web apps built with tools such as R Shiny and ArcGIS.

BDMC’s focus is to produce text, tables, figures and listings for manuscripts, abstracts and/or submission to regulatory governing bodies, e.g. FDA, NIH, DOD and CDC.