Inimary Toby-Ogundeji, PhD
Assistant Professor
University of Dallas
JupyterLab notebooks provide a user-friendly environment for learning data analysis. The platform is easy to work with, and it also provides a variety of datasets that can be used directly for case study discussions. One follow-up task that can extend this data analysis activity is performing logistic regression; an example approach using Firth’s logistic regression method is provided here (https://bit.ly/31gb7vG). JupyterLab provides a temporary workspace for basic tasks in R. One limitation is that it does not keep the user’s data or work once the browser is closed: analyses cannot be saved to the virtual platform, although files from the work session can be exported and saved externally. Users who want to save work sessions and move between JupyterLab sessions in a streamlined manner can set up a free account.
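As a hedged illustration of that follow-up task, the sketch below fits an ordinary (maximum-likelihood) logistic regression with base R's glm(). It is not Firth's method: Firth's penalized version is usually fitted with the add-on logistf package, which this sketch does not assume is installed. The data frame and its columns (pack_years, disease) are simulated purely for illustration and are not taken from the article's dataset.

```r
# Simulated example data: NOT from meta.csv.
set.seed(42)
n <- 100
pack_years <- round(runif(n, min = 0, max = 60), 1)

# Simulate a binary outcome whose probability rises with pack-years
p <- plogis(-3 + 0.08 * pack_years)
disease <- rbinom(n, size = 1, prob = p)
example_df <- data.frame(pack_years = pack_years, disease = disease)

# Ordinary logistic regression (logit link), fitted by maximum likelihood
fit <- glm(disease ~ pack_years, data = example_df, family = binomial)
summary(fit)

# Estimated odds ratio per additional pack-year
exp(coef(fit))["pack_years"]
```

Ordinary maximum likelihood can fail or give extreme estimates when outcomes are rare or perfectly separated by a predictor; that situation is the usual motivation for switching to a penalized approach such as Firth's, as in the linked example.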
The activity described in this article highlights a user-friendly way to learn some basic data analysis skills. It is ideal for students with little to no experience in biostatistics, bioinformatics, or data science. The article gives students an opportunity to reflect on and practice the analysis of data collected from biological experiments within an online learning environment. The activity is suitable for an instructor-led session (using an app with screen-sharing capabilities). This article provides basic knowledge of how to use R for simple data analysis on the JupyterLab virtual notebook platform.
The goal of this activity is to familiarize the user with the basic steps for importing a data file, retrieving its contents, and generating a histogram using R within a JupyterLab environment. The workflow steps to accomplish these tasks are outlined below:
- Access JupyterLab
- Access “R”
- Access datasets
- Perform summary statistics
- Data visualization
Workflow: step-by-step instructions and screenshots from JupyterLab
1. Access JupyterLab
a) Log in to JupyterLab here: https://mybinder.org/v2/gh/jupyterlab/jupyterlab-demo/try.jupyter.org?urlpath=lab

2. Access “R”
a) Select the (+) symbol at the top left of the JupyterLab screen;
b) Select R


3. Access the dataset
a) Select the directory titled: “UPMC_cohort”;
b) Identify the filename “meta.csv”.
c) Type data <- read.csv("meta.csv", header = TRUE, stringsAsFactors = FALSE)
d) Click run
e) Type data
f) Click run
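Because meta.csv exists only inside the JupyterLab session, the sketch below writes a tiny made-up CSV to a temporary file and reads it back with the same read.csv() arguments, so the import step can be tried anywhere. The Sample_ID column and all values are hypothetical. It also shows head() and dim() as lighter alternatives to printing the whole data frame, and file.exists() as a quick check that R can see the file.

```r
# Hypothetical stand-in for meta.csv; all values are invented for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Sample_ID,Cigarette_Pack_Years",
  "S01,0",
  "S02,12.5",
  "S03,30",
  "S04,",
  "S05,45"
), csv_path)

# Confirm the file is visible before reading it
stopifnot(file.exists(csv_path))

# Same call as in step 3c, pointed at the temporary file
data <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

head(data)  # first rows only
dim(data)   # number of rows and columns
str(data)   # the blank entry for S04 is read as NA in a numeric column
```

In the live session, getwd() shows the folder R is working in and list.files() shows what it can see there. If meta.csv is not listed, the path may need to include its folder, for example file.path("UPMC_cohort", "meta.csv"), assuming that folder sits inside the working directory.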

4. Perform summary statistics (on variable Cigarette_Pack_Years)
a) Type str(data)
b) Click run
c) Type data$Cigarette_Pack_Years
d) Click run
e) Type summary(data$Cigarette_Pack_Years)
f) Click run
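summary() reports the minimum, quartiles, median, mean, and maximum, plus a count of missing values when there are any. The sketch below computes the same quantities one at a time on a short made-up vector (not the article's data). The na.rm = TRUE argument matters because functions such as mean() and sd() return NA if any value is missing.

```r
# Invented pack-year values with one missing entry (not from meta.csv)
pack_years <- c(0, 12.5, 30, NA, 45, 20)

summary(pack_years)

n_missing <- sum(is.na(pack_years))                          # 1
avg <- mean(pack_years, na.rm = TRUE)                         # 21.5
med <- median(pack_years, na.rm = TRUE)                       # 20
s <- sd(pack_years, na.rm = TRUE)                             # sqrt(292.5), about 17.1
q <- quantile(pack_years, probs = c(0.25, 0.75), na.rm = TRUE) # 12.5 and 30
```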


5. Draw a histogram using the “hist” function
a) Type hist(data$Cigarette_Pack_Years, breaks = 100, main = "Cigarette Use (in pack-years)", xlab = "Cigarette Pack Years", ylab = "Frequency")
b) Click run
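Since work in the temporary JupyterLab session is lost when the browser closes, it can help to write the histogram to a file that can be downloaded from the file browser. The sketch below does this with pdf() and dev.off(), then calls hist(..., plot = FALSE) to inspect the bin counts behind the plot. The file name and the values are assumptions for illustration. Note that R treats a single number passed to breaks as a suggestion, so the actual number of bins may differ from the number requested.

```r
# Invented values standing in for data$Cigarette_Pack_Years
pack_years <- c(0, 12.5, 30, NA, 45, 20, 5, 18, 33, 27)

# Save the plot to a PDF in the current working directory
out_file <- "pack_years_histogram.pdf"  # hypothetical file name
pdf(out_file, width = 8, height = 6)
hist(pack_years, breaks = 10,
     main = "Cigarette Use (in pack-years)",
     xlab = "Cigarette Pack Years", ylab = "Frequency")
invisible(dev.off())  # close the file device so the PDF is written

# Compute the bins without drawing; hist() drops NA values automatically
h <- hist(pack_years, breaks = 10, plot = FALSE)
data.frame(lower = head(h$breaks, -1),
           upper = tail(h$breaks, -1),
           count = h$counts)
```

The counts table is a convenient way to check how the values were binned, which is especially useful with a large breaks value like 100, where many bins may be empty.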

References:
JupyterLab: https://jupyterlab.readthedocs.io/en/latest/getting_started/overview.html
R programming: https://www.r-project.org/
GitHub: https://github.com/initoby/JupyterLab_R_basics/blob/master/PECOP

Dr. Toby holds a PhD in Biomedical Sciences (specialization in Organ Systems Biology) from The Ohio State University College of Medicine. Her postdoctoral training was in functional genomics at the FAA-Civil Aerospace Medical Institute in Oklahoma City. She is currently an Assistant Professor of Biology at the University of Dallas, where she teaches several courses, including Human Biology, Bioinformatics, and Biostatistics. She enjoys mentoring undergraduate students and is an active member of the APS. Dr. Toby’s research program at UD focuses on the cell signaling consequences that occur at the cellular/molecular interface of lung diseases. She is also using computational methods to assess immune sequencing and other types of high-throughput sequencing data as a means to better understand lung diseases.