BCB 520 - Midterm Portfolio Post

Let’s look at funding!

Let’s do some visualizations on how well University of Idaho is acquiring grants
Midterm
DataViz
Visualization
Author

Geraline Trossi-Torres

Published

March 29, 2024

PREAMBLE

In this blog post, you will get more insight into how much federal funding is awarded at the University of Idaho (UI) by the following federal agencies: the Department of Energy (DOE), the US Department of Agriculture (NIFA), the National Institute of Health (NIH), and National Science Foundation (NSF). These agencies provide funding to different types/ areas of research (agriculture, engineering, biology, computer science, physics, social science, etc.), and we want to determine how much money is going into these types of research. We also are looking at the timeline trend of how much funding is awarded at UI compared to previous years and if there’s any positive or negative impact. We can also see the longevity of the funding, like how long that funding is available. We also did a comparison between institutions in how much funding is awarded across institutions, we compared UI with Boise State University and Idaho State University. This information will provide a general understanding of where UI stands and how it competes with other institutions in acquiring federal funding from those agencies. By the end of this blog post, you will understand the distribution of awarded federal funding across these agencies.

DATA

The data was acquired by the database of each of these federal agencies, the data is open for the public and you can obtain information of currently active/past research projects by institution and principal investigator (PI).

Data Dictionary

This data dictionary provides an overview of the variables that was used in each individual dataset, along with their descriptions and data types. This will give us more understanding of the structure of the data that was selected and will facilitate in the analysis and the interpretation of how UI stands in means of funding from each individual federal agency.

Code
DataDictionary <- read_xlsx("Data Dictionary.xlsx")

knitr::kable(head(DataDictionary ))
Variable Description Data Type
Award Number Unique identifier for the award Character
Institution Name of the institution receiving the award Character
PI Principal Investigator Character
Start Date Start date of the award Date
End Date End date of the award Date
Amount Amount awarded for the project Numeric

Summary of Data Sources

Each of these data sets will provide information in how much money is entering UI. This provide insight, where these funds is being used, like these awarded funding are used to support the activities that are conducted to support the research project, which this is also includes the funding for faculty salaries, graduate student stipends, equipment purchases, and other research-related expenses. This will also help the UI administration to have an overall general idea how these resources are being used and how is benefiting the institution.

NOTE

Some of these data sets are long and extensive and it was shorten, for your benefit to be able to visualize how the data table looks like.

DOE Awards Data (DOEawards.xlsx)

This dataset contains information about awards provided by the Department of Energy (DOE).

Code
DOEawardsUI <- read_xlsx("DOEawards.xlsx", .name_repair = "minimal")

DOEawardsUI <- read_xlsx("DOEawards.xlsx")

DOEUI_General <- DOEawardsUI %>% 
  dplyr::filter(Institution == 'Regents of the University of Idaho')

DOEUI_New_Awards <- DOEUI_General %>%
  select(Title, Institution, PI, Status, `Action Type`, `Program Office`, `Start Date`, `End Date`, `Most Recent Award Date`, `Amount Awarded to Date`)

knitr::kable(head(DOEUI_New_Awards))
Title Institution PI Status Action Type Program Office Start Date End Date Most Recent Award Date Amount Awarded to Date
Nuclear Theory at the University of Idaho Regents of the University of Idaho Sammarruca, Francesca Active Renewal Office of Nuclear Physics 12/01/2021 11/30/2024 12/29/2023 1812000
Converting methoxy groups on lignin-derived aromatics from a toxic hurdle to a useful resource: a systems-driven approach Regents of the University of Idaho Marx, Christopher Active New Office of Biological & Environmental Research 09/01/2021 08/31/2024 08/02/2023 1404162
Integrative Imaging of Plant Roots during Symbiosis with Mycorrhizal Fungi Regents of the University of Idaho Vasdekis, Andreas Active New Office of Biological & Environmental Research 08/15/2021 08/14/2024 06/26/2023 1519359
Nutrient and Fine Sediment Transport Driven by Perturbations in River Bed Movement Regents of the University of Idaho Yager, Elowyn Active New Office of Biological & Environmental Research 09/01/2020 08/31/2024 04/12/2023 603903
Code
library(readr)
DOEawardsUI <- suppressMessages(read_xlsx("DOEawards.xlsx"))

USDA to UI Awards Data (USDAtoUI.csv):

This dataset includes awards data related to the University of Idaho (UI) received from the U.S. Department of Agriculture (USDA).

Code
suppressPackageStartupMessages(library(readxl))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(tidyverse))
Code
library(tidyverse)
library(readxl)
library(knitr)

USDAUI  <- read.csv("USDAtoUI.csv")
knitr::kable(head(USDAUI))
Award.Date Grant.Number Proposal.Number Grant.Title State.Name Grantee.Name Award.Dollars Program.Name Program.Area.Name
2010-09-30 2010-48679-01200 N/A N/A IDAHO SAES - UNIVERSITY OF IDAHO 7495 N/A N/A
2009-09-30 2009-48679-01200 N/A N/A IDAHO SAES - UNIVERSITY OF IDAHO 6813 N/A N/A
2008-09-30 2008-48679-01200 N/A N/A IDAHO SAES - UNIVERSITY OF IDAHO 8524 N/A N/A
2003-09-30 2003-48604-01200 N/A N/A IDAHO SAES - UNIVERSITY OF IDAHO 1097 N/A N/A
2010-09-30 2010-48024-01200 N/A N/A IDAHO SAES - UNIVERSITY OF IDAHO 11997 N/A N/A
2009-09-30 2009-48024-01200 N/A N/A IDAHO SAES - UNIVERSITY OF IDAHO 14990 N/A N/A
Code
library(dplyr)

# Assuming 'Date' is the column containing the grant date information
start_date <- as.Date("2021-01-01")
end_date <- as.Date("2024-03-15")

UIgrants_recent_grantsUSDA <- USDAUI %>% 
  filter(Award.Date >= start_date & Award.Date <= end_date) %>%
  arrange(desc(Award.Date))

library(dplyr)

start_date <- as.Date("2024-01-01")
end_date <- as.Date("2024-12-31")

UIgrants_2024_grantsUSDA <- UIgrants_recent_grantsUSDA %>% 
  filter(Award.Date >= start_date & Award.Date <= end_date) %>%
  summarise(UIgrants_2024_grantsUSDA = n())

NSF UI Awards Data (NSFUI_2.xlsx)

This dataset consists of awards data from the National Science Foundation (NSF) received by the University of Idaho (UI).

Code
library(readxl)
library(dplyr)
library(tidyverse)
library(knitr) # Ensure knitr is explicitly loaded for kable()

# Reading the dataset from an Excel file
NSFUIAwardsActive <- read_xlsx("NSFUI_2.xlsx")

# Selecting specific columns, ensure there are no leading or trailing spaces in column names
NSFUI_New_Awards_Specific <- NSFUIAwardsActive %>%
  select(Title, NSFOrganization, StartDate, LastAmendmentDate, EndDate, AwardedAmountToDate)

# Displaying the first few rows in a table format
knitr::kable(head(NSFUI_New_Awards_Specific))
Title NSFOrganization StartDate LastAmendmentDate EndDate AwardedAmountToDate
RII Track-1: Idaho Community-engaged Resilience for Energy-Water Systems (I-CREWS) OIA 08/01/2023 09/11/2023 07/31/2028 $2,099,031.00
RII Track-1: Linking Genome to Phenome to Predict Adaptive Responses of Organisms to Changing Landscapes OIA 10/01/2018 09/07/2022 03/31/2024 $20,000,000.00
RII Track-2 FEC: Developing a Circular Bio-Based Framework For Architecture, Engineering and Construction Through Additive Manufacturing OIA 10/01/2021 08/23/2023 09/30/2025 $2,999,475.00
Phase III IUCRC at University of Idaho: Center for Advanced Forestry Systems EEC 12/15/2019 03/11/2024 11/30/2024 $693,814.00
Conference: NSF EPSCoR Workshop: Intelligent Manufacturing for Extreme Environments OIA 09/01/2023 08/17/2023 08/31/2024 $99,445.00
Collaborative Research: As above so below: Quantifying the role of simultaneous LLSVPs and continents on Earth’s cooling history using numerical simulations of mantle convection EAR 07/01/2023 06/08/2023 06/30/2026 $120,952.00

NIH UI Awards Data (NIHUI_2.xlsx)

This dataset contains information about awards received by the University of Idaho (UI) from the National Institutes of Health (NIH).

Code
library(readxl)
library(dplyr)
library(tidyverse)

NIHUIAwardsActive <- read_xlsx("NIHUI_2.xlsx")

NIHUI_New_Awards_Specific <- NIHUIAwardsActive %>%
  select('Project Title', 'Administering IC', 'Award Notice Date', `Opportunity Number`, `Project Number`, `Project Start Date`, `Project End Date`, `Budget Start Date`, `Budget End Date`, 'Total Cost', 'Total Cost (Sub Projects)', 'Funding IC(s)', 'Direct Cost IC', 'InDirect Cost IC', 'Total Cost IC')

knitr::kable(head(NIHUI_New_Awards_Specific))
Project Title Administering IC Award Notice Date Opportunity Number Project Number Project Start Date Project End Date Budget Start Date Budget End Date Total Cost Total Cost (Sub Projects) Funding IC(s) Direct Cost IC InDirect Cost IC Total Cost IC
Idaho INBRE Administrative Core NIGMS 9/18/2023 PA-20-272 3P20GM103408-23S4 9/30/2001 4/30/2024 5/1/2023 4/30/2024 NA 848625 NA 610878 237747 NA
Center for Modeling Complex Interactions NIGMS 9/12/2023 PA-20-272 3P20GM104420-09S1 3/15/2015 6/30/2025 7/1/2023 6/30/2024 266181 NA NIGMS 375566 181166 266181
Idaho INBRE Program NIGMS 9/18/2023 PA-20-272 3P20GM103408-23S4 9/30/2001 4/30/2024 5/1/2023 4/30/2024 848625 NA NIGMS 610878 237747 848625
Identifying phage-bacteria interactions using a multispecies model NIGMS 8/17/2023 PAR-19-312 5P20GM104420-09 3/15/2015 6/30/2025 7/1/2023 6/30/2024 NA 152932 NA 106362 46570 NA
Sequence-structure-function relationships in human visual photopigments NIGMS 8/17/2023 PAR-19-312 5P20GM104420-09 3/15/2015 6/30/2025 7/1/2023 6/30/2024 NA 156572 NA 109348 47224 NA
Idaho INBRE Administrative Core NIGMS 5/8/2023 PA-20-272 3P20GM103408-23S1 9/30/2001 4/30/2024 5/1/2023 4/30/2024 NA 190515 NA 165515 25000 NA

DATA VISUALIZATION AND ANALYSIS

The following visualizations is going to help us visualize UI’s longevity and distribution of the university’s portfolio of current and past awards. It will helps us identify if there are awards that are nearing their expiration.Also, will be doing a comparison with UI between two peer institutions: Boise State University and Idaho State University. This will help us understand UI’s performance in securing federal funding, and how those it measures up against to the other institutions within the same region.

University of Idaho - Current and Future Portfolio

I am creating visualization that will displays the active awards from each sponsor, including their start and end dates, the amount of the award, and the name of the Principal Investigator. This will provide insights of the longevity of the University of Idaho current portfolio of awards. This will also help us identify what awards are near expiration, and which one ones has a longer duration, and to identify if any of these patterns may differ across federal agencies. Ultimately, these visualizations will provide an aid to have an understanding of the sustainability of UIs funding from different federal agencies.

In our first visualization (Figure 1), its a general overview of the longevity of our current active awards. By, looking at this Timeline chart, we can easily point out that our NSF awards has the longest longevity (meaning we will have funding until 2028). Looking at NIH and DOE awards timeline their awards are ending in the year 2024, that is something to keep in mind and following up with the PIs that have those funding to see if they applied for new grants.

USDA data was not added in this timeline chart because it didn’t provide an end date of their current active awards, so for this time of grant will be analyzed and visualized differently

Code
library(readxl)
library(dplyr)
library(ggplot2)

Q1_Data <- read_xlsx("Q1_Compilled_Data.xlsx")

# Convert StartDate and EndDate to Date objects
Q1_Data$StartDate <- as.Date(Q1_Data$StartDate, format = "%m/%d/%Y")
Q1_Data$EndDate <- as.Date(Q1_Data$EndDate, format = "%m/%d/%Y")

# Filter out rows with NA values in StartDate or EndDate
Q1_Data <- Q1_Data[complete.cases(Q1_Data$StartDate, Q1_Data$EndDate), ]

# Create the Gantt chart
# Create the Gantt chart
ggplot(Q1_Data, aes(y = Sponsor, x = StartDate, xend = EndDate, yend = Sponsor)) +
  geom_segment(size = 10, color = "black") +  # Use linewidth instead of size
  scale_x_date(date_breaks = "1 year", date_labels = "%Y", limits = c(as.Date("2021-01-01"), as.Date("2029-01-01"))) +
  labs(title = "Active Awards Timeline",
       x = "Timeline",
       y = "Sponsor") +
  theme_minimal() +
  theme(legend.position = "bottom")
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
Warning: Removed 6 rows containing missing values (`geom_segment()`).

Figure 1: Active Awards at UI. This Gantt Chart provides information of the longevity of the current awards that are at UI.

The previous visualization (Figure 1) was just an overview of the longevity of the current awards that are at UI. The following visualizations, we can see the active awards by sponsor and Principal Investigator (PI). By looking at these visualization we can determine the longevity of their active awards and compare/contrast between sponsors. By looking at these three visualizations (Figure 1, Figure 2, Figure 3), at a glance we can observe that we have more active awards in NSF compared to NIH and DOE, and these NSF awards has a longer end date by 2028, but still we some awards that ends in between 2024/2025 (like for the DOE and NIH awards).

By looking at these visualizations, we can say that for the sponsor DOE, we have only 4 current PIs that have active awards (their expiration date is in late 2024). Which comes to a concern, because why PIs are not applying for awards at DOE, it is something to look into. Like, look for information what the DOE asks to apply for their awards, and also talk to those current PIs at UI, to have their perspective and how can we expand the university’s portfolio this specific sponsor. For the NIH and NSF awards, we have an active flow of awards, but still we have to look into those awards that are to be expired, and what are the plans from those current PIs.

Code
library(ggplot2)
library(readxl)
library(dplyr)
library(lubridate)

# Read the data
Q1_Data_PI <- read_xlsx("Q1_Compilled_Data.xlsx")

# Filter the data for DOE sponsor
Q1_PI_DOE <- Q1_Data_PI %>%
  filter(Sponsor == "DOE")

# Convert StartDate and EndDate to Date objects
Q1_PI_DOE$StartDate <- as.Date(Q1_PI_DOE$StartDate, format = "%m/%d/%Y")
Q1_PI_DOE$EndDate <- as.Date(Q1_PI_DOE$EndDate, format = "%m/%d/%Y")

# Filter out rows with NA values in StartDate or EndDate
Q1_PI_DOE <- Q1_PI_DOE[complete.cases(Q1_PI_DOE$StartDate, Q1_PI_DOE$EndDate), ]

# Create the Gantt chart
ggplot(Q1_PI_DOE, aes(y = PI, x = StartDate, xend = EndDate, yend = PI)) +
  geom_segment(size = 5, color = "darkgrey") +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  labs(title = "Active Awards Timeline by PI",
       x = "Timeline",
       y = "Principal Investigator") +
  theme_minimal() +
  theme(legend.position = "bottom",
        plot.title = element_text(hjust = 0.5))  # Set horizontal justification to center

Figure 2: Active Award by PI (DOE). This Gantt Chart provides information of the longevity of the current awards that are at UI by PI for DOE.

Code
library(ggplot2)
library(readxl)
library(dplyr)
library(lubridate)

# Read the data
Q1_Data_PI <- read_xlsx("Q1_Compilled_Data.xlsx")

# Filter the data for DOE sponsor
Q1_PI_NIH <- Q1_Data_PI %>%
  filter(Sponsor == "NIH")

# Convert StartDate and EndDate to Date objects
Q1_PI_NIH$StartDate <- as.Date(Q1_PI_NIH$StartDate, format = "%m/%d/%Y")
Q1_PI_NIH$EndDate <- as.Date(Q1_PI_NIH$EndDate, format = "%m/%d/%Y")

# Filter out rows with NA values in StartDate or EndDate
Q1_PI_NIH <- Q1_PI_NIH[complete.cases(Q1_PI_NIH$StartDate, Q1_PI_NIH$EndDate), ]

# Create the Gantt chart
ggplot(Q1_PI_NIH, aes(y = PI, x = StartDate, xend = EndDate, yend = PI)) +
  geom_segment(size = 5, color = "black") +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  labs(title = "Active Awards Timeline by PI",
       x = "Timeline",
       y = "Principal Investigator") +
  theme_minimal() +
  theme(legend.position = "bottom",
        plot.title = element_text(hjust = 0.5))  # Set horizontal justification to center

Figure 3:Active Award by PI (NIH). This Gantt Chart provides information of the longevity of the current awards that are at UI by PI for NIH.

Code
library(ggplot2)
library(readxl)
library(dplyr)
library(lubridate)

# Read the data
Q1_Data_PI <- read_xlsx("Q1_Compilled_Data.xlsx")

# Filter the data for DOE sponsor
Q1_PI_NSF <- Q1_Data_PI %>%
  filter(Sponsor == "NSF")

# Convert StartDate and EndDate to Date objects
Q1_PI_NSF$StartDate <- as.Date(Q1_PI_NSF$StartDate, format = "%m/%d/%Y")
Q1_PI_NSF$EndDate <- as.Date(Q1_PI_NSF$EndDate, format = "%m/%d/%Y")

# Filter out rows with NA values in StartDate or EndDate
Q1_PI_NSF <- Q1_PI_NSF[complete.cases(Q1_PI_NSF$StartDate, Q1_PI_NSF$EndDate), ]

# Create the Gantt chart with adjusted y-axis labels
ggplot(Q1_PI_NSF, aes(y = PI, x = StartDate, xend = EndDate, yend = PI)) +
  geom_segment(size = 5, color = "gold") +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  labs(title = "Active Awards Timeline by PI",
       x = "Timeline",
       y = "Principal Investigator") +
  theme_minimal() +
  theme(legend.position = "bottom",
        plot.title = element_text(hjust = 0.5),
        axis.text.y = element_text(size = 6))  # Reduce the size of y-axis labels

Figure 4Active Award by PI (NSF). This Gantt Chart provides information of the longevity of the current awards that are at UI by PI for NSF.

For the following visualizations, is going to show the total of award money of each of the sponsors. On Figure 3 we have a bar chart representing each of the sponsors and in the y axis we have the total of award money in millions. By looking at this graph we can say that NSF has the highest amount of award of our current active awards, and the least amount of awarded money is from the DOE, which is to be expected because we don’t have many active award from them.

Code
library(ggplot2)
library(scales)  # For formatting labels

Attaching package: 'scales'
The following object is masked from 'package:purrr':

    discard
The following object is masked from 'package:readr':

    col_factor
Code
# Read the data
Q1_Data_Amount <- read_xlsx("Q1_Compilled_Data_4.xlsx")

# Convert Amount to numeric
Q1_Data_Amount$Amount <- as.numeric(Q1_Data_Amount$Amount)
Warning: NAs introduced by coercion
Code
# Check if there are any non-numeric values in Amount
non_numeric <- Q1_Data_Amount[!is.na(as.numeric(Q1_Data_Amount$Amount)), ]

# Check the structure of the Amount variable
str(Q1_Data_Amount$Amount)
 num [1:126] 0 266181 848625 0 0 ...
Code
# Define colors for each sponsor
sponsor_colors <- c("DOE" = "darkgray", "NSF" = "gold", "NIH" = "black", "USDA" = "lightgray")

# Create the bar plot with colors assigned to each sponsor
ggplot(Q1_Data_Amount, aes(x = Sponsor, y = Amount, fill = Sponsor)) +
  geom_bar(stat = "summary", fun = "sum") +
  labs(title = "Total Amount by Sponsor",
       x = "Sponsor",
       y = "Total Amount (Millions)") +
  scale_y_continuous(labels = scales::unit_format(unit = "M")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5)) +  # Adjust title alignment
  scale_fill_manual(values = sponsor_colors)  # Use manually defined colors for each sponsor
Warning: Removed 51 rows containing non-finite values (`stat_summary()`).

Figure 5: Total Amount of Award in Curent Active Award from each Sponsor

Figure 6 is going to show the total of award money of each of the sponsors by each PI at UI. In this bar chart representing each of the sponsors, in the y axis we have the total of award money in millions, and in the x axis are the PIs that currently have funding from those sponsor. This will provide information how much funding is awarded to each PI for their research project. Knowing this type of information, you will now if that lab has funding to purchase materials, equipment, stipend for graduate/undergraduate for conducting research in their lab and also provide salary for their research technician or post-docs. The highest funding that we have is from NSF from a specific PI, and the following is NIH, for USDA the data is not presented, because in their data set doesn’t provide the list of the PIs.

Code
# Convert Amount to numeric
Q1_Data$Amount <- as.numeric(Q1_Data$Amount)
Warning: NAs introduced by coercion
Code
# Define colors for each sponsor
sponsor_colors <- c("DOE" = "darkgray", "NSF" = "gold", "NIH" = "black")

# Create the bar plot with PI on the x-axis and filled bars by Sponsor
ggplot(Q1_Data, aes(x = PI, y = Amount, fill = Sponsor)) +
  geom_bar(stat = "summary", fun = "sum") +
  labs(title = "Total Amount by PI and Sponsor",
       x = "Principal Investigator",
       y = "Total Amount (Millions)") +
  scale_y_continuous(labels = scales::unit_format(unit = "M")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 8),  # Adjust font size
        plot.title = element_text(hjust = 0.5)) +  # Adjust title alignment
  scale_fill_manual(values = sponsor_colors)  # Use manually defined colors for each sponsor
Warning: Removed 51 rows containing non-finite values (`stat_summary()`).

Figure 6:Total Amount of Award in Current Active Award from each Sponsor by PIs

University of Idaho - Past Portfolio (10 Years)

We already saw the current/future portfolio of UI, now lets see how was the performance of UI from the past 10 years. The reason that I am doing these specific visualization is to have a representation of new awards over the past 10 years. By doing this, we can identify trends that may lead to positive and/or negative developments in terms of UIs funding sponsors.By observing these trends, if we have an increasing of awards indicates a positive support from those sponsors, while if we have decrease of awards indicates a negative support from those sponsors, and we have to look more in detail why is there that decline and find solutions to improve in achieving acquiring awards from those sponsors.

In our first visualization (Figure 7), we can see the performance of UI for the past 10 years. This will provide an understanding of how UI performed in the past compared to our current standing. By looking at the graph we can say that the sponsor USDA has had the highest number of awards compared from the other sponsors, but looking at Figure 8 we can see that the USDA doesn’t bring the highest total amount of awarded money in that case is NSF across the past 10 years. Meaning we may have a lot of grants coming from USDA but the awarded money is not the same as a grant from NSF. Now focusing on FIGURE 8, like I mention NSF is the highest in the total amount that is being award but across these 10 years there’s has been like an up and down, especially in 2019 and 2022 has the highest funding compared to the following years.

Code
# Read the Excel file
DOEawardsUI_Dates <- read_xlsx("DOEawards.xlsx")
New names:
• `` -> `...27`
Code
filtered_data_UI_DOE <- DOEawardsUI_Dates %>%
  filter(Institution == 'Regents of the University of Idaho')

# Assuming the start date column is named "Start_Date"
ten_years_ago <- Sys.Date() - years(10)  # 

# Filter the data to include only awards that
filtered_data_UI_10y_DOE <- filtered_data_UI_DOE %>%
  filter(`Start Date` >= ten_years_ago)

#USDA

# Read the Excel file
USDAawardsUI_Dates <- read_xlsx("USDAtoUI_Edited_YR.xlsx")

filtered_data_UI_USDA <- USDAawardsUI_Dates %>%
  filter(Institution == 'University of Idaho')

# Assuming the start date column is named "Start_Date"
ten_years_ago <- Sys.Date() - years(10)  # 

# Filter the data to include only awards that
filtered_data_UI_10y_USDA <- filtered_data_UI_USDA %>%
  filter(`Award Date` >= ten_years_ago)

#NIH

# Read the Excel file
NIHawardsUI_Dates <- read_xlsx("UI_NIH_ALL_YR.xlsx")

filtered_data_UI_NIH <- NIHawardsUI_Dates %>%
  filter(Institution == 'UNIVERSITY OF IDAHO')

# Assuming the start date column is named "Start_Date"
ten_years_ago <- Sys.Date() - years(10)  # 

# Filter the data to include only awards that
# Assuming the date format is Month/Day/Year
filtered_data_UI_10y_NIH <- filtered_data_UI_NIH %>%
  filter(as.Date(`Project Start Date`, format = "%m/%d/%Y") >= ten_years_ago)

#NSF
# Read the Excel file
NSFawardsUI_Dates <- read_xlsx("NSF_ALL_YR.xlsx")

filtered_data_UI_NSF <- NSFawardsUI_Dates %>%
  filter(Institution == 'University of Idaho')

# Assuming the start date column is named "Start_Date"
ten_years_ago <- Sys.Date() - years(10)  # 

# Filter the data to include only awards that
# Assuming the date format is Month/Day/Year
filtered_data_UI_10y_NSF <- filtered_data_UI_NSF %>%
  filter(as.Date(StartDate, format = "%m/%d/%Y") >= ten_years_ago)

# Assuming the column names for "Start Date" vary across datasets, replace "Start Date" with the actual column name for each dataset.

# For DOE dataset
filtered_data_UI_10y_DOE <- filtered_data_UI_10y_DOE %>%
  mutate(Start_Date = as.Date(`Start Date`, format = "Start Date"))

# For USDA dataset
filtered_data_UI_10y_USDA <- filtered_data_UI_10y_USDA %>%
  mutate(Start_Date = as.Date(`Award Date`, format = "Award Date"))

# For NIH dataset
filtered_data_UI_10y_NIH <- filtered_data_UI_10y_NIH %>%
  mutate(Start_Date = as.Date(`Project Start Date`, format = "Project Start Date"))

# For NSF dataset
filtered_data_UI_10y_NSF <- filtered_data_UI_10y_NSF %>%
  mutate(Start_Date = as.Date(StartDate, format = "StartDate"))

# Count the number of awards by award date for each dataset
award_counts_DOE <- filtered_data_UI_10y_DOE %>%
  count(`Start Date`)

award_counts_USDA <- filtered_data_UI_10y_USDA %>%
  count(`Award Date`)

award_counts_NIH <- filtered_data_UI_10y_NIH %>%
  count(`Project Start Date`)

award_counts_NSF <- filtered_data_UI_10y_NSF %>%
  count(StartDate)

# Add Sponsor column to each data frame
award_counts_DOE <- award_counts_DOE %>% mutate(Sponsor = "DOE")
award_counts_USDA <- award_counts_USDA %>% mutate(Sponsor = "USDA")
award_counts_NIH <- award_counts_NIH %>% mutate(Sponsor = "NIH")
award_counts_NSF <- award_counts_NSF %>% mutate(Sponsor = "NSF")

# For DOE dataset
award_counts_DOE <- award_counts_DOE %>%
  rename(Start_Date = `Start Date`) %>%
  mutate(Sponsor = "DOE")

# For USDA dataset
award_counts_USDA <- award_counts_USDA %>%
  rename(Start_Date = `Award Date`) %>%
  mutate(Sponsor = "USDA")

# For NIH dataset
award_counts_NIH <- award_counts_NIH %>%
  rename(Start_Date = `Project Start Date`) %>%
  mutate(Sponsor = "NIH")

# For NSF dataset
award_counts_NSF <- award_counts_NSF %>%
  rename(Start_Date = StartDate) %>%
  mutate(Sponsor = "NSF")

# For DOE dataset
award_counts_DOE <- award_counts_DOE %>%
  mutate(Start_Date = as.Date(Start_Date, format = "%m/%d/%Y")) %>%
  mutate(Sponsor = "DOE")

# For USDA dataset
award_counts_USDA <- award_counts_USDA %>%
  mutate(Start_Date = as.Date(Start_Date, format = "%m/%d/%Y")) %>%
  mutate(Sponsor = "USDA")

# For NIH dataset
award_counts_NIH <- award_counts_NIH %>%
  mutate(Start_Date = as.Date(Start_Date, format = "%m/%d/%Y")) %>%
  mutate(Sponsor = "NIH")

# For NSF dataset
award_counts_NSF <- award_counts_NSF %>%
  mutate(Start_Date = as.Date(Start_Date, format = "%m/%d/%Y")) %>%
  mutate(Sponsor = "NSF")


# Combine all dataframes into a single dataframe
all_award_counts <- bind_rows(
  award_counts_DOE,
  award_counts_USDA,
  award_counts_NIH,
  award_counts_NSF
)

# Define colors for each sponsor
sponsor_colors <- c("DOE" = "darkgray", "NSF" = "gold", "NIH" = "black", "USDA" = "lightgray")

# Plot the timeline with overlapping lines, adjusted x-axis labels, individual colors, and centralized title
ggplot(all_award_counts, aes(x = Start_Date, color = Sponsor, group = Sponsor)) +
  geom_freqpoly(binwidth = 30, size = 1) +
  labs(title = "Awards Timeline by Sponsor",
       x = "Start Date",
       y = "Count") +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y", expand = c(0, 0)) +
  scale_color_manual(values = sponsor_colors) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5)
  )

Figure 7: Performance of UI from the past 10 years on total amount of awards

Code
# Read the Excel file
DOEawardsUI_Dates <- read_xlsx("DOEawards.xlsx")
New names:
• `` -> `...27`
Code
filtered_data_UI_DOE <- DOEawardsUI_Dates %>%
  filter(Institution == 'Regents of the University of Idaho')

# Assuming the start date column is named "Start_Date"
ten_years_ago <- Sys.Date() - years(10)  # 

# Filter the data to include only awards that
filtered_data_UI_10y_DOE <- filtered_data_UI_DOE %>%
  filter(`Start Date` >= ten_years_ago)

#USDA

# Read the Excel file
USDAawardsUI_Dates <- read_xlsx("USDAtoUI_Edited_YR.xlsx")

filtered_data_UI_USDA <- USDAawardsUI_Dates %>%
  filter(Institution == 'University of Idaho')

# Assuming the start date column is named "Start_Date"
ten_years_ago <- Sys.Date() - years(10)  # 

# Filter the data to include only awards that
filtered_data_UI_10y_USDA <- filtered_data_UI_USDA %>%
  filter(`Award Date` >= ten_years_ago)

#NIH

# Read the Excel file
NIHawardsUI_Dates <- read_xlsx("UI_NIH_ALL_YR.xlsx")

filtered_data_UI_NIH <- NIHawardsUI_Dates %>%
  filter(Institution == 'UNIVERSITY OF IDAHO')

# Assuming the start date column is named "Start_Date"
ten_years_ago <- Sys.Date() - years(10)  # 

# Filter the data to include only awards that
# Assuming the date format is Month/Day/Year
filtered_data_UI_10y_NIH <- filtered_data_UI_NIH %>%
  filter(as.Date(`Project Start Date`, format = "%m/%d/%Y") >= ten_years_ago)

#NSF
# Read the Excel file
NSFawardsUI_Dates <- read_xlsx("NSF_ALL_YR.xlsx")

filtered_data_UI_NSF <- NSFawardsUI_Dates %>%
  filter(Institution == 'University of Idaho')

# Assuming the start date column is named "Start_Date"
ten_years_ago <- Sys.Date() - years(10)  # 

# Filter the data to include only awards that
# Assuming the date format is Month/Day/Year
filtered_data_UI_10y_NSF <- filtered_data_UI_NSF %>%
  filter(as.Date(StartDate, format = "%m/%d/%Y") >= ten_years_ago)

# Remove dollar signs ($) and commas (,) from numeric columns
filtered_data_UI_10y_NSF <- filtered_data_UI_10y_NSF %>%
  mutate(AwardedAmountToDate = as.numeric(gsub("[\\$,]", "", AwardedAmountToDate)))


# Sum the total amount by start date for each sponsor
amount_sum_DOE <- filtered_data_UI_10y_DOE %>%
  group_by(`Start Date`) %>%
  summarize(Total_Amount = sum(`Amount Awarded to Date`))

amount_sum_USDA <- filtered_data_UI_10y_USDA %>%
  group_by(`Award Date`) %>%
  summarize(Total_Amount = sum(`Award Dollars`))

amount_sum_NIH <- filtered_data_UI_10y_NIH %>%
  group_by(`Project Start Date`) %>%
  summarize(Total_Amount = sum(`Total Cost`))

amount_sum_NSF <- filtered_data_UI_10y_NSF %>%
  group_by(StartDate) %>%
  summarize(Total_Amount = sum(AwardedAmountToDate))

# Add Sponsor column to each data frame
amount_sum_DOE <- amount_sum_DOE %>% mutate(Sponsor = "DOE")
amount_sum_USDA <- amount_sum_USDA %>% mutate(Sponsor = "USDA")
amount_sum_NIH <- amount_sum_NIH %>% mutate(Sponsor = "NIH")
amount_sum_NSF <- amount_sum_NSF %>% mutate(Sponsor = "NSF")

# For DOE dataset
amount_sum_DOE <- amount_sum_DOE %>%
  rename(Start_Date = `Start Date`) %>%
  mutate(Sponsor = "DOE")

# For USDA dataset
amount_sum_USDA <- amount_sum_USDA %>%
  rename(Start_Date = `Award Date`) %>%
  mutate(Sponsor = "USDA")

# For NIH dataset
amount_sum_NIH <- amount_sum_NIH  %>%
  rename(Start_Date = `Project Start Date`) %>%
  mutate(Sponsor = "NIH")

# For NSF dataset
amount_sum_NSF <- amount_sum_NSF %>%
  rename(Start_Date = StartDate) %>%
  mutate(Sponsor = "NSF")

# Assuming Start_Date is in character format, convert it to datetime
amount_sum_DOE <- amount_sum_DOE %>%
  mutate(Start_Date = as.Date(Start_Date, format = "%m/%d/%Y"))

amount_sum_USDA <- amount_sum_USDA %>%
  mutate(Start_Date = as.Date(Start_Date, format = "%m/%d/%Y"))

amount_sum_NIH <- amount_sum_NIH %>%
  mutate(Start_Date = as.Date(Start_Date, format = "%m/%d/%Y"))

amount_sum_NSF <- amount_sum_NSF %>%
  mutate(Start_Date = as.Date(Start_Date, format = "%m/%d/%Y"))

# Combine the data frames
all_amount_sum <- bind_rows(
  amount_sum_DOE %>% mutate(Sponsor = "DOE"),
  amount_sum_USDA %>% mutate(Sponsor = "USDA"),
  amount_sum_NIH %>% mutate(Sponsor = "NIH"),
  amount_sum_NSF %>% mutate(Sponsor = "NSF")
)

# Define colors for each sponsor
sponsor_colors <- c("DOE" = "darkgray", "NSF" = "gold", "NIH" = "black", "USDA" = "lightgray")

library(scales)

# Plot the timeline with overlapping lines, adjusted x-axis labels, individual colors, and centralized title
ggplot(all_amount_sum, aes(x = Start_Date, y = Total_Amount / 1e6, color = Sponsor, group = Sponsor)) +
  geom_line(size = 1) +
  labs(title = "Awards Timeline by Sponsor - Total Amount",
       x = "Start Date",
       y = "Total Amount (Millions)") +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y", expand = c(0, 0)) +
  scale_y_continuous(labels = scales::unit_format(unit = "M")) +
  scale_color_manual(values = sponsor_colors) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5)
  )
Warning: Removed 1 row containing missing values (`geom_line()`).

Figure 8:Performance of UI from the past 10 years of total of amount money awarded

Comparison Between Peers Institutions

We already saw how UI is performing with these sponsors, but now lest compare with the following peer institutions: Boise State University and Idaho State University. The aim for the following visualizations is to understand how UIs performance in securing awards, conducting research, and acquiring funding measures up against similar institutions in the region or within the same academic field.

NOTE

The following data sets are from the institutions of Boise State University and Idaho State University, the data set from University of Idaho is already been presented at the beginning of the blog post. Also, these data sets are long and extensive and it was shorten, for your benefit to be able to visualize how the data table looks like.

Boise State University - Data Sources

Department of Agriculture (NIFA)

Code
BSUUSDA  <- read.csv("USDABSU.csv")
knitr::kable(head(BSUUSDA,4))
Award.Date Grant.Number Proposal.Number Grant.Title State.Name Grantee.Name Award.Dollars Program.Name Program.Area.Name
2003-11-14 2004-35302-14138 2003-01470 Host Selection Decisions and Mass Colonization in the Douglas-Fir Beetle, Dendroctonus Pseudotsugae (Coleoptera: Scolytidae) IDAHO BOISE STATE UNIVERSITY 70000 Organismal & Population Biology of Arthropods & Nematodes National Research Initiative Competitive Grants Program
2003-08-13 2003-35101-13682 2003-01569 The Effects of Wildfire on Trophic Structure and Food Web Dynamics in Stream Ecosystems IDAHO BOISE STATE UNIVERSITY 66867 Managed Ecosystems National Research Initiative Competitive Grants Program
2004-07-22 2004-35102-14802 2004-00882 Utilizing Ground-Penetrating Radar and Solute Tracer Experiments to Determine the Extent of the Hyporheic Zone in Mountain Streams IDAHO BOISE STATE UNIVERSITY 83489 Water and Watersheds National Research Initiative Competitive Grants Program
2006-07-24 2006-35101-17430 2006-01372 Understanding Linkages Between Agricultural and Natural Systems: Trophic Structure, Pesticide Exposure, and Costs and Benefits of Group Liv IDAHO BOISE STATE UNIVERSITY 100000 Managed Ecosystems National Research Initiative Competitive Grants Program

Department of Energy

Code
DOEawardsBSU <- read_xlsx("DOEawards.xlsx")
New names:
• `` -> `...27`
Code
DOEBSU_General <- DOEawardsBSU %>% 
  dplyr::filter(Institution == 'Boise State University')

DOEBSU_New_Awards <- DOEBSU_General %>%
  select(Title, Institution, PI, Status, `Action Type`, `Program Office`, `Start Date`, `End Date`, `Most Recent Award Date`, `Amount Awarded to Date`)

knitr::kable(head(DOEBSU_New_Awards,4))
Title Institution PI Status Action Type Program Office Start Date End Date Most Recent Award Date Amount Awarded to Date
Emerging Properties through Controlled Phase Transformations for High Energy Sodium Ion Batteries Boise State University Xiong, Hui (Claire) Active New Office of Basic Energy Sciences 08/15/2023 08/14/2026 09/23/2023 599992
Uptake mechanisms of REE in sedimentary phosphorite mineral Sponsor: Department of Energy Boise State University Kohn, Matthew Active New Office of Basic Energy Sciences 09/01/2023 08/31/2026 09/14/2023 888250
DNA-Controlled Dye Aggregation ¿ A Path to Create Quantum Entanglement Boise State University Knowlton, William Active Renewal Office of Basic Energy Sciences 08/15/2023 08/14/2025 08/31/2023 12500000
Neuromorphic Systems for Power Grid Cyber-Resilience Boise State University Cantley, Kurtis Active New Office of Basic Energy Sciences 09/01/2022 08/31/2025 08/07/2023 708985

National Institutes of Health (NIH)

Code
NIHBSUAwardsActive <- read_xlsx("NIHBSU_2.xlsx")

NIHBSU_New_Awards_Specific <- NIHBSUAwardsActive %>%
  select('Project Title', 'Administering IC', 'Award Notice Date', `Opportunity Number`, `Project Number`, `Project Start Date`, `Project End Date`, `Budget Start Date`, `Budget End Date`, 'Total Cost', 'Total Cost (Sub Projects)', 'Funding IC(s)', 'Direct Cost IC', 'InDirect Cost IC', 'Total Cost IC')

knitr::kable(head(NIHBSU_New_Awards_Specific,4))
Project Title Administering IC Award Notice Date Opportunity Number Project Number Project Start Date Project End Date Budget Start Date Budget End Date Total Cost Total Cost (Sub Projects) Funding IC(s) Direct Cost IC InDirect Cost IC Total Cost IC
Administrative Core NIGMS 8/24/2023 PA-20-272 3P20GM109095-10S1 8/1/2014 5/31/2024 6/1/2023 5/31/2024 NA 723429 NA 517798 205631 NA
Center of Biomedical Research Excellence in Matrix Biology Phase II NIGMS 8/24/2023 PA-20-272 3P20GM109095-10S1 8/1/2014 5/31/2024 6/1/2023 5/31/2024 723429 NA NIGMS 517798 205631 723429
Role of LINC-mediated Mechanosignaling in MSC Aging NIA 1/29/2024 PA-16-442 5R01AG059923-05 3/1/2020 1/31/2025 2/1/2024 1/31/2025 252208 NA NIA 184500 67708 252208
Equipment for Spatiotemporal Dynamics of the Genome by 3D Orbital Tracking NIGMS 5/23/2023 PA-20-272 3R15GM123446-02A1S1 5/17/2017 7/31/2025 8/1/2022 7/31/2025 97574 NA NIGMS 97574 0 97574

National Science Foundation (NSF)

Code
# Reading the dataset from an Excel file
NSFBSUAwardsActive <- read_xlsx("NSFBSU_2.xlsx")

# Selecting specific columns, ensure there are no leading or trailing spaces in column names
NSFBSU_New_Awards_Specific <- NSFBSUAwardsActive %>%
  select(Title, NSFOrganization, StartDate, LastAmendmentDate, EndDate, AwardedAmountToDate)

# Displaying the first few rows in a table format
knitr::kable(head(NSFBSU_New_Awards_Specific,4))
Title NSFOrganization StartDate LastAmendmentDate EndDate AwardedAmountToDate
Planning: Track 1: Curriculum and Advancements in Recruitment, Education, and Engineering Retention (CAREER) EEC 07/15/2022 07/14/2022 06/30/2024 $99,808.00
IUCRC Phase II Boise State University: Center for Atomically Thin Multifunctional Coatings (ATOMIC) EEC 08/01/2021 11/14/2023 07/31/2026 $582,631.00
MRI: Acquisition of a 600 MHz NMR Console and Cryoprobe to Support Research and Education at Boise State University DBI 10/01/2022 08/10/2023 09/30/2025 $769,221.00
MRI: Track 1: Acquisition of a Liquid Chromatography-High Resolution Mass Spectrometry System for Multidisciplinary Research and Training CHE 09/01/2023 08/22/2023 08/31/2026 $710,000.00

Idaho State University - Data Sources

Department of Energy

Code
DOEaward <- read_xlsx("DOEawards.xlsx")
New names:
• `` -> `...27`
Code
DOEIDAHOSATTE_General <- DOEaward %>% 
  dplyr::filter(Institution == 'Idaho State University')

DOEDAHOSATTE_New_Awards <- DOEIDAHOSATTE_General %>%
  select(Title, Institution, PI, Status, `Action Type`, `Program Office`, `Start Date`, `End Date`, `Most Recent Award Date`, `Amount Awarded to Date`)

knitr::kable(head(DOEDAHOSATTE_New_Awards,4))
Title Institution PI Status Action Type Program Office Start Date End Date Most Recent Award Date Amount Awarded to Date
Precision Electroweak Probe of BSM Physics Idaho State University McNulty, Dustin Active Renewal Office of Nuclear Physics 09/01/2023 08/31/2026 08/31/2023 1195000
Mechanistic and Kinetic Analysis of Polymer Deconstruction and Modification by Irradiation for Polymer Upcycling Idaho State University Jenkins, Courtney Active New Office of Basic Energy Sciences 09/01/2022 08/31/2025 07/24/2023 583930

Department of Agriculture (NIFA)

Code
ISUUSDA  <- read.csv("USDAISU.csv")
knitr::kable(head(ISUUSDA,4))
Award.Date Grant.Number Proposal.Number Grant.Title State.Name Grantee.Name Award.Dollars Program.Name Program.Area.Name
2002-07-19 2002-35320-12359 2002-00673 Instrumentation for Evaluating the Role of Photosynthetic Ecophysiology in Plant Invasions of Semiarid Communities IDAHO IDAHO STATE UNIVERSITY 24464 Biology of Weedy & Invasive Species in Agroecosystems National Research Initiative Competitive Grants Program
2003-07-28 2003-35206-13612 2003-03242 Metabolic Consequences of Lipid Suppression on Carbohydrate Tolerance and Growth Performance in Rainbow Trout (Oncorhynchus Mykiss) IDAHO IDAHO STATE UNIVERSITY 74759 Animal Growth and Nutrient Utilization National Research Initiative Competitive Grants Program
2006-08-23 2006-35320-17463 2006-03625 Land Effects:Peristence Exotic Forbs in Sagebrush Steppe:Are Loss of Foundation Species/Disruption Soil Resource Partitioning Causal Links? IDAHO IDAHO STATE UNIVERSITY 323764 Biology of Weedy & Invasive Species in Agroecosystems National Research Initiative Competitive Grants Program
2010-01-12 2010-85320-20506 2009-04939 Exotic Bromus grasses in agroecosystems of the western US: REEnet synthesis of current and future invasions, impacts, and management. IDAHO IDAHO STATE UNIVERSITY 81029 Biology of Weedy and Invasive Species in Agroecosystems Agriculture and Food Research Initiative

National Institutes of Health (NIH)

Code
NIHISUAwardsActive <- read_xlsx("NIHISU_2.xlsx")

NIHISU_New_Awards_Specific <- NIHISUAwardsActive %>%
  select('Project Title', 'Administering IC', 'Award Notice Date', `Opportunity Number`, `Project Number`, `Project Start Date`, `Project End Date`, `Budget Start Date`, `Budget End Date`, 'Total Cost', 'Total Cost (Sub Projects)', 'Funding IC(s)', 'Direct Cost IC', 'InDirect Cost IC', 'Total Cost IC')

knitr::kable(head(NIHISU_New_Awards_Specific,4))
Project Title Administering IC Award Notice Date Opportunity Number Project Number Project Start Date Project End Date Budget Start Date Budget End Date Total Cost Total Cost (Sub Projects) Funding IC(s) Direct Cost IC InDirect Cost IC Total Cost IC
The Brain-Behavior Relationship: Age, Hearing, and Their Effects on Understanding Speech in Noise NIDCD 1/12/2023 PA-13-302 7R01DC015240-06 8/15/2016 7/31/2024 6/1/2022 7/31/2024 251216 NA NIDCD 200394 50822 251216
MOLECULAR ANALYSIS OF MALARIA MITOCHONDRIAL GENE REGULATION NIAID 12/18/2023 PAR-20-259 5DP2AI164244-03 1/1/2022 12/31/2026 1/1/2024 12/31/2024 364371 NA NIAID 262096 102275 364371
Timely Response to In-Hospital Deterioration Through Design of Actionable Augmented Intelligence NIGMS 6/20/2023 PA-19-056 5R01GM137083-04 7/15/2020 6/30/2024 7/1/2023 6/30/2024 397867 NA NIGMS 356308 41559 397867
The role of metal ion homeostasis in regulating bacterial capsule production NIAID 8/10/2022 PAR-18-714 1R15AI149725-01A1 8/10/2022 7/31/2025 8/10/2022 7/31/2025 408497 NA NIGMS 235007 84993 320000

National Science Foundation (NSF)

Code
# Reading the dataset from an Excel file
NSFISUAwardsActive <- read_xlsx("NSFISU_2.xlsx")

# Selecting specific columns, ensure there are no leading or trailing spaces in column names
NSFISU_New_Awards_Specific <- NSFISUAwardsActive %>%
  select(Title, NSFOrganization, StartDate, LastAmendmentDate, EndDate, AwardedAmountToDate)

# Displaying the first few rows in a table format
knitr::kable(head(NSFISU_New_Awards_Specific,4))
Title NSFOrganization StartDate LastAmendmentDate EndDate AwardedAmountToDate
CDS&E: Immersive Virtual Reality for Discovering Hidden Chemical Information and Improving Multivariate Modeling and Predication CHE 09/15/2023 09/13/2023 08/31/2026 $449,994.00
GP-IN: Pathways to tribal geosciences careers through cultural connections to iconic landscapes RISE 01/01/2022 08/16/2021 12/31/2024 $284,964.00
Supporting Transfer Student Success Using a Multidisciplinary Approach DUE 10/01/2022 07/25/2022 09/30/2028 $1,499,956.00
Reynolds Creek Carbon Critical Zone Observatory EAR 12/01/2013 07/24/2023 05/31/2024 $3,755,249.00

Data Comparison of Awarded Money

The following visualizations will show us how much is awarded money is coming to these institutions, and will provide insights into UI’s competitiveness and standing within its peer group.

In our first visualization (Figure 9) we are looking at the total of awarded money of current active awards from the USDA, and UI’s excelling compared to Boise State University. There’s no data of Idaho State University,because they don’t have current active awards with them. This indicates that UI’s is on top in the game in regards of obtaining funds from the USDA. On Figure 10 we have for the sponsor DOE in contrast of what we saw in the previous graph, UI is under performing and Boise State is excelling in obtaining funds. Which is something that we need to take into consideration and look for more information, why is UI not performing well in securing funds from DOE. Figure 11. In our last visualization (Figure 11), we have the sponsor NIH, in which UI is performing well comparing to its peers. Looking at these data we can determine in the areas the UI is excelling and under performing, and we can make the necessary adjustments to keep excelling in the academic field.

Idaho State University doesn’t have currently active USDA award that is why is not added in the following visualization

Code
# COMPARISON BETWEEEN USDA - AWARD MONEY

USDA_Compare_Data <- read_xlsx("USDA_Combined_Data_Recent_Awards.xlsx")

# Define a vector of old names and a new name
old_names <- c("SAES - UNIVERSITY OF IDAHO", "FRST - UNIVERSITY OF IDAHO-FORESTRY SCHOOL")
new_name <- "UNIVERSITY OF IDAHO"

# Use mutate with ifelse and %in% to change multiple old names to the new name
USDA_Compare_Data <- mutate(USDA_Compare_Data, Grantee_Name = ifelse(Grantee_Name %in% old_names, new_name, Grantee_Name))

# Calculate total award money for each institution
USDA_total_award_money <- USDA_Compare_Data %>%
  group_by(Grantee_Name) %>%
  summarise(USDA_total_award_money = sum(Award_Dollars))

my_colors_6 <- c("BOISE STATE UNIVERSITY" = "orange", "UNIVERSITY OF IDAHO" = "gold")

ggplot(USDA_total_award_money, aes(x = Grantee_Name, y = USDA_total_award_money)) +
  geom_bar(stat = "identity", fill = my_colors_6) +
  ggtitle("Total of Award Money from USDA Grants") +
  xlab("") +
  ylab("Total Award Money") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.text.y = element_text(size = 12),
        plot.title = element_text(hjust = 0.5)) +  # Corrected position of plot title
  scale_y_continuous(labels = scales::number_format(scale = 1e-6, suffix = "M"))

Figure 9: Total of Award Money from USDA Grants

Code
# COMPARISON BETWEEEN DOE - AWARD MONEY

DOE_Compare_Data <- read_xlsx("DOEawards_Combined_Data.xlsx")

# Calculate total award money for each institution
DOE_total_award_money <- DOE_Compare_Data %>%
  group_by(Institution) %>%
  summarise(DOE_total_award_money = sum(`Amount Awarded to Date`))

# Colors
my_colors <- c("Boise State University" = "orange", "Idaho State University" = "black", "Regents of the University of Idaho" = "gold")


ggplot(DOE_total_award_money, aes(x = Institution, y = DOE_total_award_money)) +
  geom_bar(stat = "identity", position = "dodge", fill = my_colors)  +
  ggtitle("Total of Award Money from DOE Grants") +
  xlab("") +
  ylab("Total Award Money") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 12),  # Adjust size of x-axis labels
        axis.text.y = element_text(size = 12),  # Adjust size of y-axis labels
        legend.text = element_text(size = 12),  # Adjust size of legend text
        plot.title = element_text(hjust = 0.5, size = 14)) +  # Adjust size of plot title
  scale_y_continuous(labels = scales::number_format(scale = 1e-6, suffix = "M"))

Figure 10:Total of Award Money from DOE Grants

Code
# COMPARISON BETWEEEN NSF - AWARD MONEY - NO APARECE LA DATA

NSF_Compare_Data <- read_xlsx("NSF_Combined_Data.xlsx")

# Filter the data for each institution
UI_NSF_data <- NSF_Compare_Data %>%
  filter(Organization == "Regents of the University of Idaho")

BSU_NSF_data <- NSF_Compare_Data %>%
  filter(Organization == "Boise State University")

ISU_NSF_data <- NSF_Compare_Data %>%
  filter(Organization == "Idaho State University")

# Convert AwardedAmountToDate column to numeric
NSF_Compare_Data$AwardedAmountToDate <- as.numeric(gsub("\\$", "", NSF_Compare_Data$AwardedAmountToDate))
Warning: NAs introduced by coercion
Code
# Remove dollar signs ($) and commas (,) from numeric columns
UI_NSF_data <- UI_NSF_data %>%
  mutate(AwardedAmountToDate = as.numeric(gsub("[\\$,]", "", AwardedAmountToDate)))

# Calculate total award money for each institution
NSF_total_award_money <- NSF_Compare_Data %>%
  group_by(Organization) %>%
  summarise(NSF_total_award_money = sum(AwardedAmountToDate, na.rm = TRUE))

# Calculate total award money for each institution
NSF_total_award_money <- NSF_Compare_Data %>%
  group_by(Organization) %>%
  summarise(NSF_total_award_money = sum(AwardedAmountToDate))

# Colors
my_colors <- c("Boise State University" = "orange", "Idaho State University" = "black", "Regents of the University of Idaho" = "gold")

ggplot(NSF_total_award_money, aes(x = Organization, y = NSF_total_award_money)) +
  geom_bar(stat = "identity", position = "dodge", fill = my_colors)  +
  ggtitle("Total of Award Money from NSF Grants") +
  xlab("") +
  ylab("Total Award Money") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 12),  # Adjust size of x-axis labels
        axis.text.y = element_text(size = 12),  # Adjust size of y-axis labels
        legend.text = element_text(size = 12),  # Adjust size of legend text
        plot.title = element_text(hjust = 0.5, size = 14)) +  # Adjust size of plot title
  scale_y_continuous(labels = scales::number_format(scale = 1e-6, suffix = "M"))
Warning: Removed 3 rows containing missing values (`geom_bar()`).

Figure 11:Total of Award Money from NSF Grants

Code
# COMPARISON BETWEEEN NIH - AWARD MONEY - CALCULATED THE AVERAGE INSTEAD THE TOTAL AMMOUNT

NIH_Compare_Data <- read_xlsx("NIH_Combined_Data.xlsx")

# Calculate total award money for each institution, removing NA values in TotalCost

NIH_total_award_money <- NIH_Compare_Data %>%
  group_by(OrganizationName) %>%
  summarise(NIH_total_award_money = sum(TotalCost, na.rm = TRUE))

# Colors

my_colors_3 <- c("BOISE STATE UNIVERSITY" = "orange", "IDAHO STATE UNIVERSITY" = "black", "UNIVERSITY OF IDAHO" = "gold")

ggplot(NIH_total_award_money, aes(x = OrganizationName, y = NIH_total_award_money)) +
  geom_bar(stat = "identity", position = "dodge", fill = my_colors_3)  +
  ggtitle("Total of Award Money from NIH Grants") +
  xlab("") +
  ylab("Total Award Money") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 12),  # Adjust size of x-axis labels
        axis.text.y = element_text(size = 12),  # Adjust size of y-axis labels
        legend.text = element_text(size = 12),  # Adjust size of legend text
        plot.title = element_text(hjust = 0.5, size = 14)) +  # Adjust size of plot title
  scale_y_continuous(labels = scales::number_format(scale = 1e-6, suffix = "M"))

Figure 12:Total of Award Money from NIH Grants

Data Comparison of Grant Duration

The following visualizations will show us how the duration of the awarded grants of these institutions, and will provide insights into UI’s competitiveness and standing within its peer group. These visualization are for the current active awards from each sponsor.

In our first visualization (Figure 13), its the duration of the current active awards for the sponsor DOE. By, looking at these bar chart, we can easily point out that our DOE awards ends in 4 years compared to Boise State University that ends in 5 years. This means that during that year gap, UI have to look for renewing awards or to apply for new awards. For the NSF award (Figure 14) Idaho State University has the longest duration of their current active awards for 10 years, and University of Idaho comes second. We are coming in par with the Idaho State University, also to clarify some of these awarded grants, depending of the type of grant can go for longer than 5 years. For the NIH award (Figure 15) University of Idaho, is at the top for the the longest duration of their current active awards for the NIH, over 20 years. Like I mentioned before some of these grants can go longer than 5 years.

USDA data was not added for this data visualization because it didn’t provide an end date of their current active awards

Code
# COMPARISON BETWEEEN DOE - Grant Duration

library(readxl)

DOE_Compare_Data <- read_xlsx("DOEawards_Combined_Data.xlsx")

library(dplyr)

# Assuming 'start_date' and 'end_date' are columns containing the start and end dates of the grants,
# and 'institution' is a column indicating the institution for each grant

# Filter the data for each institution
UI_data <- DOE_Compare_Data %>%
  filter(Institution == "Regents of the University of Idaho")

BSU_data <- DOE_Compare_Data %>%
  filter(Institution == "Boise State University")

ISU_data <- DOE_Compare_Data %>%
  filter(Institution == "Idaho State University")

# Calculate the duration of grants for each institution
# Convert "Start Date" and "End Date" columns to date objects
UI_data <- UI_data %>%
  mutate(`Start Date` = as.Date(`Start Date`, format = "%m/%d/%Y"),
         `End Date` = as.Date(`End Date`, format = "%m/%d/%Y"))

# Assuming 'grant_duration' is in days
UI_data <- UI_data %>%
  mutate(grant_duration = `End Date` - `Start Date`)

UI_data <- UI_data %>%
  mutate(grant_duration_years = as.numeric(grant_duration) / 365.25)

# BSU
BSU_data <- BSU_data %>%
  mutate(`Start Date` = as.Date(`Start Date`, format = "%m/%d/%Y"),
         `End Date` = as.Date(`End Date`, format = "%m/%d/%Y"))

# Assuming 'grant_duration' is in days
BSU_data <- BSU_data %>%
  mutate(grant_duration = `End Date` - `Start Date`)

BSU_data <- BSU_data %>%
  mutate(grant_duration_years = as.numeric(grant_duration) / 365.25)

# ISU
ISU_data <- ISU_data %>%
  mutate(`Start Date` = as.Date(`Start Date`, format = "%m/%d/%Y"),
         `End Date` = as.Date(`End Date`, format = "%m/%d/%Y"))

# Assuming 'grant_duration' is in days
ISU_data <- ISU_data %>%
  mutate(grant_duration = `End Date` - `Start Date`)

ISU_data <- ISU_data %>%
  mutate(grant_duration_years = as.numeric(grant_duration) / 365.25)

# Visualization

library(dplyr)

# Assuming UI_data contains the grant duration information for one institution
# and other_datasets contain the grant duration information for the other institutions

# Combine datasets
combined_data <- bind_rows(UI_data, BSU_data, ISU_data)

# Colors

my_colors <- c("Boise State University" = "orange", "Idaho State University" = "black", "Regents of the University of Idaho" = "gold")

# Plotting grant duration as a bar graph
# Plotting grant duration as a bar graph
library(ggplot2)

ggplot(combined_data, aes(x = Institution, y = grant_duration_years)) +
  geom_bar(stat = "identity", position = "dodge", fill = c(my_colors[combined_data$Institution])) +  
  ggtitle("Grant Duration by Institution") +
  xlab("") +
  ylab("Grant Duration (Years)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 12),  
        axis.text.y = element_text(size = 12),  
        plot.title = element_text(hjust = 0.5, size = 14))  

Figure 13: Grant Duration by Institution (DOE). This Bar Chart provides information of the duration of the current active awards by Institution for DOE.

Code
# COMPARISON BETWEEEN NSF - Grant Duration

library(readxl)

NSF_Compare_Data <- read_xlsx("NSF_Combined_Data.xlsx")

library(dplyr)

# Filter the data for each institution
UI_NSF_data <- NSF_Compare_Data %>%
  filter(Organization == "Regents of the University of Idaho")

BSU_NSF_data <- NSF_Compare_Data %>%
  filter(Organization == "Boise State University")

ISU_NSF_data <- NSF_Compare_Data %>%
  filter(Organization == "Idaho State University")

# Calculate the duration of grants for each institution
# Convert "Start Date" and "End Date" columns to date objects
#UI

UI_NSF_data <- UI_NSF_data %>%
  mutate(`Start Date` = as.Date(StartDate, format = "%m/%d/%Y"),
         `End Date` = as.Date(EndDate, format = "%m/%d/%Y"))

# Assuming 'grant_duration' is in days
UI_NSF_data <- UI_NSF_data %>%
  mutate(grant_duration = `End Date` - `Start Date`)

UI_NSF_data <- UI_NSF_data %>%
  mutate(grant_duration_years = as.numeric(grant_duration) / 365.25)

# BSU
BSU_NSF_data  <- BSU_NSF_data %>%
  mutate(`Start Date` = as.Date(StartDate, format = "%m/%d/%Y"),
         `End Date` = as.Date(EndDate, format = "%m/%d/%Y"))

# Assuming 'grant_duration' is in days
BSU_NSF_data  <- BSU_NSF_data %>%
  mutate(grant_duration = `End Date` - `Start Date`)

BSU_NSF_data  <- BSU_NSF_data  %>%
  mutate(grant_duration_years = as.numeric(grant_duration) / 365.25)

# ISU
ISU_NSF_data<- ISU_NSF_data %>%
  mutate(`Start Date` = as.Date(StartDate, format = "%m/%d/%Y"),
         `End Date` = as.Date(EndDate, format = "%m/%d/%Y"))

# Assuming 'grant_duration' is in days
ISU_NSF_data <- ISU_NSF_data %>%
  mutate(grant_duration = `End Date` - `Start Date`)

ISU_NSF_data <- ISU_NSF_data %>%
  mutate(grant_duration_years = as.numeric(grant_duration) / 365.25)

# Visualization

library(dplyr)

# Assuming UI_data contains the grant duration information for one institution
# and other_datasets contain the grant duration information for the other institutions

# Combine datasets
combined_data_NSF <- bind_rows(UI_NSF_data, BSU_NSF_data, ISU_NSF_data)

# Colors

my_colors <- c("Boise State University" = "orange", "Idaho State University" = "black", "Regents of the University of Idaho" = "gold")

# Plotting grant duration as a bar graph
library(ggplot2)

ggplot(combined_data_NSF, aes(x = Organization, y = grant_duration_years)) +
  geom_bar(stat = "identity", position = "dodge", fill = c(my_colors[combined_data_NSF$Organization])) +
  ggtitle("Grant Duration by Organization") +
  xlab("") +
  ylab("Grant Duration (Years)") +
  scale_fill_manual(values = my_colors) +  # Apply custom colors
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 12),  # Adjust size of x-axis labels
        axis.text.y = element_text(size = 12),  # Adjust size of y-axis labels
        legend.text = element_text(size = 12),  # Adjust size of legend text
        plot.title = element_text(hjust = 0.5, size = 14))  # Adjust size of plot title

Figure 14:Grant Duration by Institution (NSF). This Bar Chart provides information of the duration of the current active awards by Institution for NSF.

Code
# COMPARISON BETWEEEN NIH - Grant Duration
library(readxl)
NIH_Compare_Data <- read_xlsx("NIH_Combined_Data.xlsx")

library(dplyr)

# Filter the data for each institution
UI_NIH_data <- NIH_Compare_Data %>%
  filter(OrganizationName == "UNIVERSITY OF IDAHO")

BSU_NIH_data <- NIH_Compare_Data %>%
  filter(OrganizationName == "BOISE STATE UNIVERSITY")

ISU_NIH_data <- NIH_Compare_Data %>%
  filter(OrganizationName == "IDAHO STATE UNIVERSITY")

# Calculate the duration of grants for each institution
# Convert "Start Date" and "End Date" columns to date objects
#UI

UI_NIH_data <- UI_NIH_data%>%
  mutate(`Start Date` = as.Date(ProjectStartDate, format = "%m/%d/%Y"),
         `End Date` = as.Date(ProjectEndDate, format = "%m/%d/%Y"))

# Assuming 'grant_duration' is in days
UI_NIH_data <- UI_NIH_data %>%
  mutate(grant_duration = `End Date` - `Start Date`)

UI_NIH_data <- UI_NIH_data %>%
  mutate(grant_duration_years = as.numeric(grant_duration) / 365.25)

# BSU - no quiere funcionar no tengo la menor idea pq
BSU_NIH_data  <- BSU_NIH_data %>%
  mutate(`Start Date` = as.Date(ProjectStartDate, format = "%m/%d/%Y"),
         `End Date` = as.Date(ProjectEndDate, format = "%m/%d/%Y"))

# Assuming 'grant_duration' is in days
BSU_NIH_data  <- BSU_NIH_data %>%
  mutate(grant_duration = `End Date` - `Start Date`)

BSU_NIH_data  <- BSU_NIH_data  %>%
  mutate(grant_duration_years = as.numeric(grant_duration) / 365.25)

# ISU
ISU_NIH_data<- ISU_NIH_data %>%
  mutate(`Start Date` = as.Date(ProjectStartDate, format = "%m/%d/%Y"),
         `End Date` = as.Date(ProjectEndDate, format = "%m/%d/%Y"))

# Assuming 'grant_duration' is in days
ISU_NIH_data <- ISU_NIH_data %>%
  mutate(grant_duration = `End Date` - `Start Date`)

ISU_NIH_data <- ISU_NIH_data %>%
  mutate(grant_duration_years = as.numeric(grant_duration) / 365.25)

# Visualization

library(dplyr)

# Assuming UI_data contains the grant duration information for one institution
# and other_datasets contain the grant duration information for the other institutions

# Combine datasets
combined_data_NIH <- bind_rows(UI_NIH_data, BSU_NIH_data, ISU_NIH_data)

# Colors

my_colors_3 <- c("BOISE STATE UNIVERSITY" = "orange", "IDAHO STATE UNIVERSITY" = "black", "UNIVERSITY OF IDAHO" = "gold")

# Plotting grant duration as a bar graph
library(ggplot2)

ggplot(combined_data_NIH, aes(x = OrganizationName, y = grant_duration_years, fill = OrganizationName)) +
  geom_bar(stat = "identity", position = "dodge", fill = my_colors_3[combined_data_NIH$OrganizationName]) +
  ggtitle("Grant Duration by Organization") +
  xlab("") +
  ylab("Grant Duration (Years)") +
  scale_fill_manual(values = my_colors_3) +  # Apply custom colors
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 12),  # Adjust size of x-axis labels
        axis.text.y = element_text(size = 12),  # Adjust size of y-axis labels
        legend.text = element_text(size = 12),  # Adjust size of legend text
        plot.title = element_text(hjust = 0.5, size = 14))  # Adjust size of plot title

Figure 15:Grant Duration by Institution (NIH). This Bar Chart provides information of the duration of the current active awards by Institution for NIH.

By looking at these visualizations, we can say that for the sponsor DOE, is the only one that we the grant duration is less compared with the other sponsors by institution. Which comes to a concern, like how we can expand our portfolio in regards of acquiring more awards from DOE. For NIH and NSF awards, we have an active flow of awards, and some of these awards are going to be active for the formidable future.

Data Comparison of Grant Status

For the last visualizations, I mentioned in the previous section, there’s different types of awarded grants. The following graphs will give us and understanding what type of grants (current active grants) are coming to UI compared to our peer institution.

The only available data for this type of analysis is from the sponsors DOE and NSF

In our first visualization (Figure 16), we are looking at the type of grants that these institutions have in DOE. We have two types of grants which are: new and renewal, for our current active we have 3 that are new and 1 renewal. This a type of data to keep in mind, to see how many grants are coming that are new, or if we have that are renewal. But, looking ar this Boise State University has more in new and renewal for DOE sponsor.

It is not surprising that UI is under performing in acquiring grants from this sponsor. We have seen that from previous visualizations

Figure 17 we are looking at the type of grants that these institutions have in NSF. In contrast to DOE sponsor, we have more types of grants in NSF, which are the following: continuing grant, cooperative agreement, fellowship award, and standard grant. I mentioned before (Grant duration section) that some grants have a longer duration, and that’s because it depends on the type of grant it has (that also applies how much money is being awarded). In general as a whole, Boise State University has more active current awards compared to University of Idaho, so the number of types of grants are going to be different then University of Idaho. So, we are going to focus on University of Idaho, that highest type of grant that we current have is the standard grant followed by continuing grants. The one that is under performing is Fellowship award, and that is because most of these awards are awarded to graduate students. Looking at this we can see that not a lot of graduate students at UI are applying to fellowships in NSF, which is something that can be changed, giving the proper guidance.

Code
# COMPARISON BETWEEEN DOE - Grant Status
library(dplyr)
library(ggplot2)

# Calculate total grant action count for each institution
DOE_Grant_Action <- DOE_Compare_Data %>%
  group_by(Institution, `Action Type`) %>%
  summarise(count = n(), .groups = "drop") %>%
  ungroup()

library(ggplot2)

# Assuming DOE_Grant_Action is your summarized data frame with counts of Action Type by Institution

ggplot(DOE_Grant_Action, aes(x = Institution, y = count, fill = `Action Type`)) +
  geom_bar(stat = "identity", position = "dodge") +
  ggtitle("Number of Counts of Action Type by Institution") +
  xlab("") +
  ylab("Number of Active Grants") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 12),  # Adjust size of x-axis labels
        axis.text.y = element_text(size = 12),  # Adjust size of y-axis labels
        legend.text = element_text(size = 12),  # Adjust size of legend text
        plot.title = element_text(hjust = 0.5, size = 14))

Figure 16: Type of Grant for current active awards for DOE.

Code
# COMPARISON BETWEEEN NSF - Grant Status
library(dplyr)
library(ggplot2)

# Calculate total grant action count for each institution
NSF_Grant_Award_Instrument <- NSF_Compare_Data %>%
  group_by(Organization, AwardInstrument) %>%
  summarise(count = n(), .groups = "drop") %>%
  ungroup()

# Check unique levels of AwardInstrument
unique_levels <- unique(NSF_Grant_Award_Instrument$AwardInstrument)

# Adjust my_colors_4 to match the number of unique levels
my_colors_4 <- c("navyblue", "darkgreen", "darkorange", "darkred")  # or any other color palette you prefer

# Plot with adjusted colors
ggplot(NSF_Grant_Award_Instrument, aes(x = Organization, y = count, fill = AwardInstrument)) +
  geom_bar(stat = "identity", position = "dodge") +
  ggtitle("Number of Counts of Action Type by Institution") +
  xlab("") +
  ylab("Number of Active Grants") +
  scale_fill_manual(values = my_colors_4) +  # Add this line to set custom colors
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 12),  
        axis.text.y = element_text(size = 12),  
        legend.text = element_text(size = 12),  
        plot.title = element_text(hjust = 0.5, size = 14))

Figure 17: Type of Grant for current active awards for NSF.

By looking at these visualizations, we can have an understanding of the types of grants that we have at UI for our current awards. I think to expand UIs portfolio is to provide information and workshops for graduate students to have the opportunity to apply for fellowship awards, which will help them to further continue on their research projects.

CONLUSIONS/SUMMARY

Looking back at all the previous visualizations, we got a better understanding in how much federal funding is awarded at the University of Idaho (UI) by the following federal agencies: the Department of Energy (DOE), the US Department of Agriculture (NIFA), the National Institute of Health (NIH), and National Science Foundation (NSF). UI excelled in some of them in securing funding in USDA, NIH and NSF grants, the only one that we need to looked into is DOE. We have only 4 currently PIs that have funding through them. It is important for the administration to gain more information, how UI can obtain more funding from DOE. The other thing we have to keep in mind is the current projects that are about to expire in 2024, and if there’s plan for renewal or writing for a new grant. Overall, University of Idaho is a academic institution that have grants coming to the university and is increasing in research projects that are on par with other institutions.

##IMPROVED VISUALIZATIONS

Question #1: NSF Active Awards by PI

Code
library(ggplot2)
library(readxl)
library(dplyr)
library(lubridate)

# Read the data
Q1_Data_PI <- read_xlsx("Q1_Compilled_Data.xlsx")

# Filter the data for DOE sponsor
Q1_PI_NSF <- Q1_Data_PI %>%
  filter(Sponsor == "NSF")

# Convert StartDate and EndDate to Date objects
Q1_PI_NSF$StartDate <- as.Date(Q1_PI_NSF$StartDate, format = "%m/%d/%Y")
Q1_PI_NSF$EndDate <- as.Date(Q1_PI_NSF$EndDate, format = "%m/%d/%Y")

# Filter out rows with NA values in StartDate or EndDate
Q1_PI_NSF <- Q1_PI_NSF[complete.cases(Q1_PI_NSF$StartDate, Q1_PI_NSF$EndDate), ]

# Determine the minimum and maximum dates for the x-axis
min_date <- min(Q1_PI_NSF$StartDate)
max_date <- max(Q1_PI_NSF$EndDate)

# Extend the x-axis by a certain margin (e.g., 1 year)
extended_min_date <- min_date - lubridate::years(1)
extended_max_date <- max_date + lubridate::years(1)

# Create the Gantt chart with adjusted y-axis labels and amounts
ggplot(Q1_PI_NSF, aes(y = PI, x = StartDate, xend = EndDate)) +
  geom_segment(aes(xend = EndDate, y = PI, yend = PI), size = 5, color = "gold") +  # Segment colored by PI
  geom_text(aes(x = EndDate, label = Amount), vjust = -0.5, hjust = -0.1, size = 2.5) +  # Add text labels for amounts at the end of bars with reduced size
  scale_x_date(date_breaks = "1 year", date_labels = "%Y", limits = c(extended_min_date, extended_max_date)) +
  labs(title = "Active Awards Timeline by PI",
       x = "Timeline",
       y = "Principal Investigator") +
  theme_minimal() +
  theme(legend.position = "none",  # Remove the legend
        plot.title = element_text(hjust = 0.5),
        axis.text.y = element_text(size = 6))  # Reduce the size of y-axis label

Amount of Current Active Awards in UI

Code
library(ggplot2)
library(scales)  # For formatting labels

# Read the data
Q1_Data_Amount_2 <- read_xlsx("Q1_Compilled_Data_4.xlsx")

# Make a copy of the Amount column
Q1_Data_Amount_2$Original_Amount <- Q1_Data_Amount_2$Amount

# Remove dollar signs from the Amount column
Q1_Data_Amount_2$Amount <- gsub("\\$", ",", Q1_Data_Amount_2$Amount)

# Convert the Amount column to numeric
Q1_Data_Amount_2$Amount <- as.numeric(Q1_Data_Amount_2$Amount)
Warning: NAs introduced by coercion
Code
# Check for NA values in the Amount column
any(is.na(Q1_Data_Amount_2$Amount))
[1] TRUE
Code
# Check the structure of the Amount variable
str(Q1_Data_Amount_2$Amount)
 num [1:126] 0 266181 848625 0 0 ...
Code
# Define colors for each sponsor
sponsor_colors <- c("DOE" = "darkgray", "NSF" = "gold", "NIH" = "black", "USDA" = "lightgray")

# Create the bar plot with colors assigned to each sponsor
ggplot(Q1_Data_Amount, aes(x = Sponsor, y = Amount, fill = Sponsor)) +
  geom_bar(stat = "summary", fun = "sum") +
  labs(title = "Total Amount by Sponsor",
       x = "Sponsor",
       y = "Total Amount (Millions)") +
  scale_y_continuous(labels = scales::unit_format(unit = "M")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5)) +  # Adjust title alignment
  scale_fill_manual(values = sponsor_colors)  # Use manually defined colors for each sponsor
Warning: Removed 51 rows containing non-finite values (`stat_summary()`).