Every Breath You Take, Visualization and Visual Analytics

About

Introduction

This is the second project for the CS 424 Visualization and Visual Analytics class at UIC. It consists of various visualizations and interactive plots in a web application created using the Shiny library for R. The visualizations are based on the Air Quality dataset by County in the United States, concentrating on the daily and hourly data.

Authors

Mirko Mantovani
Abhishek Vasudevan
Ashwani Khemani

Code

Go to Github Repository

Data

Data link

Access the application

Access the application on an EVL Shiny server (optimized for big wall displays)
Access the application on Shinyapps.io (optimized for normal displays)

(Please note that the application hosted on shinyapps.io is only a demo: the full application is too big for a free plan, so hourly and daily data for the US are not available there.)

How to run the application

Accessing the web application online

Just click on the provided link to access the web application hosted on the EVL server or the one hosted on shinyapps.io from your browser. It is recommended to use a recent version of Google Chrome.

Hosting the web application locally

Clone the project repository. Download R and RStudio if they are not installed on your machine, then open app.R with RStudio. Install every R library used in the project (see "Used R libraries") by running install.packages("library") in the RStudio console, replacing library with each package name. For better performance, the application needs the data from the EPA website to be converted into R data objects. The required data can be downloaded from the data link provided at the start of the page. To regenerate the data, follow the preprocessing steps below before running the application.
  • Four preprocessing scripts generate the data files needed by the application:

    dataRead.R, preProcess.R, preprocess_hourly_italy.R, preprocess_daily_italy.R

  • The first two scripts, dataRead.R and preProcess.R, download the data from the US EPA website and convert the data files into fst format. Run dataRead.R first, from the root directory of the project. It downloads all the files and unzips them into a "data" folder.
  • Next, run the preProcess.R script from the root directory to convert the data files into fst format. The converted files are generated in a new "fst" folder inside the project folder. This script might take some time to complete.
  • The CSV files containing pollutant data for Italy are already available in the project folder of the GitHub repository. They need to be converted to the fst format for the app to run. To do so, run the preprocess_daily_italy.R and preprocess_hourly_italy.R scripts from the root directory; they generate the fst files inside the italy folder. Note that the fst files for Italy are also already provided in the italy folder of the project.
Once all the necessary data/fst files are available, the application can be run. Start it from RStudio by clicking Run App at the top right of the main RStudio panel. Access it from a browser using the local machine address and the port on which the application was started (http://127.0.0.1:6676/).
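The library installation step above can be scripted instead of being typed once per package. The following is only a minimal sketch: missing_packages is a hypothetical helper name, and the required vector lists just a subset of the packages named under "Used R libraries".

```r
# Subset of the packages listed under "Used R libraries"; extend as needed.
required <- c("shiny", "shinydashboard", "ggplot2", "leaflet",
              "plotly", "fst", "future", "data.table")

# Hypothetical helper: returns the entries of `wanted` that are not installed.
missing_packages <- function(wanted, installed = rownames(installed.packages())) {
  setdiff(wanted, installed)
}

to_install <- missing_packages(required)

# Only install when the script is run by hand, for example from the RStudio console.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}

# After preprocessing, the app could also be started from the console
# (assuming the working directory is the project root):
# shiny::runApp(".", port = 6676)
```

Keeping the check in a small function makes it easy to see which libraries are still missing before anything is downloaded.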

Functionalities

Getting started and Sidebar

The application starts in full screen. You can open the menu items for different categories in the sidebar to see more options. The inputs item helps the user to switch between metric or imperial units for the whole application.

Yearly visualizations: Yearly trends

This first tab shows AQI and pollutant time series. The left box contains various inputs that make the plots interactive. You can choose the color of the plot background, choose the county from an alphabetically ordered list of (County - State) pairs, and select the range of years on which you want to concentrate.
By clicking on the settings button you can also change the grid and text colors in the plots to black, in case the background color is too bright. In addition, you can change the colors of the pollutants in the second plot.

The first tab shows a plot of AQI statistics over time. The second tab contains a time series plot of the percentage of days in which each pollutant was the main pollutant, together with a table of those percentages. The third tab is a map showing the location of the selected county, along with all the counties in the US, which are highlighted in white when you hover over them.

Yearly visualizations: Year details for County

The main panel is divided into two boxes. The one on the left shows AQI (Air Quality Index) levels. It contains a pie chart showing the percentage of days (sometimes estimated, if there are missing data) with a certain level of AQI in a specific year for the selected County. Under the pie chart there are a bar chart and a table, both showing the number of days in the year with each level.
The right box, instead, shows data about the detected pollutants. The first tab in this box shows a pie chart for each pollutant with the percentage of days in which that pollutant was the main pollutant. The second tab shows a bar chart with the number of days in which each pollutant was the main pollutant in that year. Again, the table at the bottom shows the same information as the bar chart in tabular form.

Daily AQI

This panel allows the user to visualize daily AQI trends in the selected year for all six pollutants: Ozone, SO2, CO, NO2, PM2.5, PM10. The left box allows the user to select a particular county from an alphabetically sorted list of counties in the US. The user can also choose the year for which the AQI data is displayed using the slider in the left panel. Three visualizations are available for daily AQI. The first is a line graph showing the AQI values over all days of the selected year. The color of each point shows which pollutant had the highest value on that day. Clicking on any point of the line graph displays a tooltip with the date, the AQI value and the major pollutant for that date.
The second visualization is a stacked bar chart showing the number of days in each AQI category for all 12 months of the selected year. Different shades of gray indicate the AQI category, as shown in the legend.
The third visualization is a table which, like the bar chart, shows the number of days in each AQI category for all 12 months of the selected year. The user can also see the daily trends for the top 12 counties in the US by using the switch button provided in the left box.
The conversion to imperial units can be made from the Inputs tab in the main sidebar.
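The counts behind the stacked bar chart and the table can be illustrated with a few lines of base R. This is only a sketch on made-up data: the daily data frame and its column names are hypothetical, while the category boundaries are the standard ones published by the US EPA.

```r
# Hypothetical daily records for one county; in the app these would come from
# the preprocessed daily AQI file.
daily <- data.frame(
  Month = c(1, 1, 1, 2, 2),
  AQI   = c(42, 55, 160, 38, 101)
)

# AQI category boundaries as published by the US EPA.
aqi_labels <- c("Good", "Moderate", "Unhealthy for Sensitive Groups",
                "Unhealthy", "Very Unhealthy", "Hazardous")
daily$Category <- cut(daily$AQI,
                      breaks = c(-Inf, 50, 100, 150, 200, 300, Inf),
                      labels = aqi_labels)

# Rows are months, columns are AQI categories, cells are numbers of days.
days_per_category <- table(Month = daily$Month, Category = daily$Category)
print(days_per_category)
```

Because Category is a factor, categories with no days in a month still appear in the table with a count of zero, which keeps the bar chart legend stable across counties.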

Hourly pollutants

This panel allows the user to visualize the hourly data for the six pollutants (Ozone, SO2, CO, NO2, PM2.5, PM10) along with wind speed and temperature. The left box allows the user to select a particular county from an alphabetically sorted list of counties in the US. The user can then select any day of the year 2018 and see the hourly trends for the pollutants, wind speed and temperature.

Data Selection Panel

The user can pick any subset, or all, of hourly Ozone, SO2, CO, NO2, PM2.5, PM10, wind, and temperature and see them as different lines on the same line chart by using the checkboxes below the chart. A tick mark indicates that a checkbox is selected. The user can also see the hourly trends for the top 12 counties in the US by using the switch button provided in the left box. To change the units of the hourly data, the user can use the switch to imperial units option in the Inputs section of the main sidebar.

Legend and units

The legend shows the mapping of data to color along with the units. Notice that you can change from imperial to metric units from the sidebar switch in the Inputs tab.

Pollutants heatmap

The interactive map allows the user to visualize a heatmap of a pollutant (or of AQI) over all the counties of the US. It is possible to see data for an entire year or to switch to daily data through the switch found in one of the two input panels ("Time and Pollutant").

Like the rest of the application, this map is responsive. In particular, it was created for the big display wall in our classroom (11520 by 3240 pixels). To provide a practical user experience on that touch screen wall, the UI of this version of the application has been designed to work best in that configuration, as described in the panels below.

Shown counties panel

The shown counties panel is static and positioned in the bottom right corner together with the legend. Its circular input slider was chosen with touch screen use in mind. This is why the whole panel is static and why the Shiny input variable behind the scenes is only updated approximately half a second after the user starts interacting with it. The user can also type in a number with a hardware keyboard after clicking on the displayed number. An additional input is present in this panel: a confidence level slider. It controls whether counties with less data, and thus less confidence in the computed percentages, are drawn with less opacity (slider towards 0) or whether all counties are drawn with the same opacity (slider towards 1).
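One way to read the confidence level slider is as a blend between "opacity equals data coverage" and "every county fully opaque". The sketch below is an assumption made for illustration, not necessarily the formula used in the app; county_opacity, coverage and confidence are hypothetical names.

```r
# Hypothetical mapping from data coverage to fill opacity.
# coverage:   fraction of days in the period with data for a county (0 to 1)
# confidence: slider value; at 1 every county is fully opaque,
#             at 0 the opacity of a county equals its coverage
county_opacity <- function(coverage, confidence) {
  stopifnot(all(coverage >= 0 & coverage <= 1),
            confidence >= 0, confidence <= 1)
  confidence + (1 - confidence) * coverage
}

coverage <- c(0.25, 0.5, 1)
opacity_low  <- county_opacity(coverage, confidence = 0)  # same as coverage
opacity_high <- county_opacity(coverage, confidence = 1)  # all equal to 1
```

A vector like this could be passed as a fill opacity when drawing the county polygons on the map.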

Time and Pollutant panel

The second panel, which controls the Time and Pollutant inputs, is dynamic: users can drag and drop it anywhere on the map. This is useful when the screen is very large and users would otherwise have to walk a few steps just to reach the panel and change an input. The inputs in this panel are: the pollutant (or AQI, for yearly data), a switch between yearly and daily data, the year for yearly data, and the month and day for daily data (only the year 2018 is available).

Legend, units and colors

The legend shows the mapping of data range to color. Notice that you can change from imperial to metric units with the sidebar switch in the Inputs tab. The color scale is continuous and uses the Viridis palette, which is designed to remain distinguishable for viewers with common forms of color blindness.

Map interactions

Hovering over any County shows its name. Clicking on any County shows the name of the State and County, the precise pollutant value, the percentage of available days (confidence level), the total number of days available and a link to the Wikipedia page of that County.

Italy: daily trends

This panel allows the user to visualize a line graph of daily pollutant values in Italy for the six pollutants: Ozone, SO2, CO, NO2, PM2.5, PM10. The left panel allows the user to choose any Italian city for which data is available. The line graph shows a differently colored line for each pollutant. The x-axis denotes the date and the y-axis the pollutant value. The user can focus on selected pollutants by turning off the others with the checkboxes below the graph. The user can also click on a point to display a tooltip showing its exact date.
The conversion to imperial units can be made from the Inputs tab in the main sidebar. By default, the dataset contains all pollutants in ug/m3. After switching to imperial, the values are converted to e-12 oz/ft3.
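The conversion from ug/m3 to e-12 oz/ft3 can be worked out from the definitions of the avoirdupois ounce and the foot. The function below is only a sketch of that arithmetic; ugm3_to_e12_ozft3 is a hypothetical name, not necessarily the one used in the app.

```r
GRAMS_PER_OUNCE <- 28.349523125             # avoirdupois ounce
CUBIC_METERS_PER_CUBIC_FOOT <- 0.3048^3     # one foot is exactly 0.3048 m

# Convert a concentration from micrograms per cubic meter
# to units of 1e-12 ounces per cubic foot.
ugm3_to_e12_ozft3 <- function(x) {
  ounces_per_m3  <- x * 1e-6 / GRAMS_PER_OUNCE
  ounces_per_ft3 <- ounces_per_m3 * CUBIC_METERS_PER_CUBIC_FOOT
  ounces_per_ft3 * 1e12
}

factor_for_one <- ugm3_to_e12_ozft3(1)   # roughly 998.85
```

In other words, a value in ug/m3 is multiplied by a factor just under 1000 when the switch is set to imperial.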

Italy: hourly trends

This panel allows the user to visualize the hourly data of Italy over a period of 90 days (9 December - 8 March 2019) for the six pollutants: Ozone, SO2, CO, NO2, PM2.5, PM10. The left box allows the user to select a particular city from an alphabetically sorted list of cities in Italy. The user can then select any day in the given period and see the hourly trends for the pollutants.

Data Selection Panel

The user can pick any subset, or all, of hourly Ozone, SO2, CO, NO2, PM2.5 and PM10 and see them as different lines on the same line chart by using the checkboxes below the chart. A tick mark indicates that a checkbox is selected. To change the units of the hourly data, the user can use the switch to imperial units option in the Inputs section of the main sidebar.

Legend and units

The legend shows the mapping of data to color along with the units. Notice that you can change from imperial to metric units from the sidebar switch in the Inputs tab.

Italy: Totals over 90 days

This panel allows the user to visualize the average value of the pollutants for any selected city in Italy in the date range from December 8, 2018 to March 9, 2019. The left panel allows the user to choose any city in Italy. The bar chart shows the average value of each of the 6 pollutants. The checkboxes at the bottom can be used to hide pollutants from the bar chart. The average is calculated as follows: for a given pollutant, its values over all days with data are added and divided by the number of those days. For example, if the pollutant has values for only 30 days, its sum over those 30 days is divided by 30. In this way, the user can understand which pollutant is more prevalent in a particular city.
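The averaging rule described above, where only days with data count towards the denominator, corresponds to a mean that skips missing values. A minimal sketch with made-up numbers (average_pollutant and pm10 are hypothetical names):

```r
# Average over the days that actually have a value; days without data
# (NA) are excluded from both the sum and the count.
average_pollutant <- function(values) {
  mean(values, na.rm = TRUE)
}

# Five days, two of them without a PM10 reading.
pm10 <- c(30, NA, 50, NA, 40)
pm10_average <- average_pollutant(pm10)   # (30 + 50 + 40) / 3
```

Dividing by the full number of days instead would understate pollutants that were measured less often.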

Data, libraries and implementation

Data

The data for the application was downloaded from the following sources:
United States Environmental Protection Agency
OpenAQ
United States Counties shape in GeoJSON

More information about downloading and preprocessing the data can be found in the "How to run the application" section and in the "Preprocessing" section under "Problems during the development".

Missing data

The dataset has missing data for some specific years and counties. This is handled by warning the user with an alert message whenever they select a year and county for which data is missing.

Units of measure

To make the data meaningful to users from both the United States and the rest of the world, we provide a switch in the sidebar, under the Inputs tab, that lets the user convert the data from metric to imperial units and vice versa.

Used R libraries

This is the list of R libraries used for this project:
  • shiny
  • shinydashboard
  • ggplot2
  • scales
  • shinythemes
  • dashboardthemes
  • ggthemes
  • shinyalert
  • leaflet
  • rgdal
  • geojson
  • geojsonio
  • colourpicker
  • shinyWidgets
  • viridis
  • cdlTools
  • htmltools
  • plotly
  • RColorBrewer
  • reshape2
  • fst
  • future
  • data.table
  • ggvis

Problems during the development

Data size and slow loading problem

One of the challenges we faced while developing this application was the size of the dataset (around 7-8 GB), since we were dealing with yearly, daily and hourly data.

Preprocessing

To make the size of the dataset manageable and to make our application efficient in terms of memory and response time, we preprocess the dataset downloaded from the EPA website. This is done in an automated manner by running the preprocessing scripts before running the application. Please note that, because the files are very large, the preprocessing can take a considerable amount of time to complete. Two scripts are used: one downloads the data files for the US (dataRead.R) and the other preprocesses the downloaded data (preProcess.R). Together, the preprocessing scripts perform the following operations:
  • Download the data required by the application from the EPA website. This is done by the data read script provided with the application.
  • Create three output files, one each for daily AQI data, daily data for all pollutants, and hourly data for pollutants, wind and temperature.
  • Read the various data files provided by the EPA quickly. We use the fread() function from the data.table package to read the very large files efficiently.
  • Reduce the size of the dataset by storing it in the fst format. The decision to use this package was made after experimenting with various R data formats such as rds, rda and feather, benchmarking them, and checking the performance of the application in terms of memory usage and response time.
  • Select the relevant columns from the various data files, so that only the columns needed for the visualizations are stored and read by the application.
  • The columns selected for daily AQI data are: "State Name","County Name","AQI","Category","Defining Parameter","Year","Month","Day".
  • The columns selected for hourly data are: "State Name","County Name","SO2","CO","NO2","Ozone","PM2.5","PM10","Wind Direction", "Wind Speed","Temperature","Year","Month","Day".
  • The columns selected for daily pollutants data are: "State Name","County Name","SO2","CO","NO2","Ozone","PM2.5","PM10","Year", "Month","Day".
  • Aggregate records that describe the same quantity by averaging them. For example, pollutant values are averaged across multiple sites in a county and across multiple monitors at the same site.
  • Merge the datasets for the various pollutants, wind and temperature whenever possible, to reduce the number of files and keep all data of a given type (daily AQI, hourly data, daily pollutant data) together. Special attention is paid to not losing data during the merge when one of the parameters is unavailable while the others are available.
  • Split the date provided in the dataset into its individual components (day, month, year) to avoid these computations while the application is running.
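The aggregation, merge and date-splitting steps listed above can be sketched on a tiny example. This is an illustration in base R with made-up data and hypothetical column names, not the actual preProcess.R script, which works on the much larger EPA files.

```r
# Hypothetical miniature of two per-pollutant daily files.
ozone <- data.frame(
  state = "Illinois", county = "Cook",
  date  = c("2018-01-01", "2018-01-01", "2018-01-02"),  # two sites on Jan 1
  value = c(0.030, 0.034, 0.028),
  stringsAsFactors = FALSE
)
co <- data.frame(
  state = "Illinois", county = "Cook",
  date  = c("2018-01-02", "2018-01-03"),
  value = c(0.4, 0.6),
  stringsAsFactors = FALSE
)

# Average readings that describe the same county and day (several sites or monitors).
average_by_county_day <- function(df, name) {
  out <- aggregate(value ~ state + county + date, data = df, FUN = mean)
  names(out)[names(out) == "value"] <- name
  out
}

# Full outer join: a day is kept even if only one of the pollutants was measured.
daily <- merge(
  average_by_county_day(ozone, "Ozone"),
  average_by_county_day(co, "CO"),
  by = c("state", "county", "date"), all = TRUE
)

# Split the date once so the app does not have to parse it at run time.
parsed <- as.Date(daily$date)
daily$Year  <- as.integer(format(parsed, "%Y"))
daily$Month <- as.integer(format(parsed, "%m"))
daily$Day   <- as.integer(format(parsed, "%d"))

# The result could then be stored with the fst package (path is an assumption):
# fst::write_fst(daily, "fst/daily.fst")
```

The all = TRUE argument is what prevents rows from being dropped when only some of the parameters are available for a given day.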

Preprocessing for Italy pollutant files

The preprocessing for Italy is more straightforward than for the USA.
  • Download the CSV data (already available in the GitHub repository) for the past 90 days for each of Italy's cities using the API provided by EPA.
  • Two preprocessing scripts are used to generate the fst files for Italy: one for the daily data and one for the hourly data.
  • The CSV files contain hourly data. The hourly preprocessing script removes unwanted columns from the data and converts the data frames into fst format.
  • The daily preprocessing script creates a new data frame containing the value of each pollutant for each day. Since the CSV files contain only hourly data, the pollutant value for a particular day is found by averaging the pollutant values over the 24 hours of that day.
  • There are 45 Italian cities in total. Each city has several locations from which data is obtained for the same day. The pollutant value for a particular city is found by averaging the pollutant values over all locations of that city.
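The two averaging steps for Italy (hours into days, then locations into cities) can be sketched as follows. The data, the column names and the order of the two steps are assumptions made for illustration; the actual scripts may combine them differently.

```r
# Hypothetical hourly readings for one city measured at two locations.
hourly <- data.frame(
  city      = "Roma",
  location  = c("A", "A", "B", "B"),
  timestamp = c("2019-01-01 08:00:00", "2019-01-01 20:00:00",
                "2019-01-01 08:00:00", "2019-01-01 20:00:00"),
  no2       = c(40, 60, 20, 40),
  stringsAsFactors = FALSE
)

# Keep only the calendar day of each reading.
hourly$date <- substr(hourly$timestamp, 1, 10)

# Step 1: daily value per location, averaging over the hours of that day.
per_location <- aggregate(no2 ~ city + location + date, data = hourly, FUN = mean)

# Step 2: daily value per city, averaging over its locations.
per_city <- aggregate(no2 ~ city + date, data = per_location, FUN = mean)
```

Averaging per location first gives every location the same weight in the city value, even if one location reported fewer hours than another.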

High application start up time

To make the application load faster, we use futures to delay the loading of data files such as the JSON files for the maps. This is done with the futures functionality available in R (the future package), so that data is loaded only when the application needs it. Together with the use of the fst file format, this made the application more responsive even with the big dataset used.
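The idea of loading data only when it is first needed can be shown without any extra package. The sketch below uses base R's delayedAssign as a stand-in, which is a different mechanism from the future package used in the project; with that package the analogous calls would be future() to start the load and value() to collect the result. load_counties is a hypothetical placeholder for reading a large GeoJSON file.

```r
load_count <- 0

# Hypothetical stand-in for reading a large GeoJSON file from disk.
load_counties <- function() {
  load_count <<- load_count + 1
  list(type = "FeatureCollection", features = list())
}

# Nothing is read here: `counties` is only a promise at this point.
delayedAssign("counties", load_counties())
count_before_use <- load_count

# The first access triggers the load; later accesses reuse the result.
first_type  <- counties$type
second_type <- counties$type
```

The application start-up then pays only for the data that the first visible tab actually touches.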

Interesting Insights

Overall AQI comparison 20 years ago and now

From the heatmap it seems that the overall AQI in the US has slightly improved over the past 20 years. I would say the average AQI over all the states in 2018 was around 40, while 20 years ago it was maybe closer to 50. Moreover, it looks like the AQI is more uniform nowadays than it was in the past.
AQI 1998


Screen

AQI 2018


Screen

PM2.5 comparison 20 years ago and now

PM2.5 is one of the emerging and problematic pollutants. As we can see, in 20 years a lot of counties went from never having PM2.5 as the main pollutant during the year to having it as the main pollutant on every day of the year. We can also notice that, except for the big cities on the coasts, the difference between neighboring counties is sharp rather than gradual, suggesting that the problem is local and that the pollutant is unlikely to spread far from where it originates.
PM2.5 % of days as main pollutant in 1998


Screen

PM2.5 % of days as main pollutant in 2018


Screen

Urban CO pollution in Italy

Carbon monoxide levels in urban areas are significantly higher than those of the other pollutants. In contrast, CO levels in rural areas like Alfonsine are practically non-existent. Carbon monoxide is high in urban areas because of the larger amount of vehicular emissions.
Average pollutants value in Rome (Urban)


Screen

Average pollutants value in Alfonsine (Rural)


Screen

Volcanic activity in Hawaii

Due to the high level of volcanic activity in Hawaii, there is a high level of SO2 on the island throughout the year.
Screen

Cleanest air in the USA

Chittenden County in Vermont is reputed to have some of the cleanest air in the US. This is confirmed by the daily AQI bar chart shown in the image below, where the AQI category is good for most of the days.
Screen

Hourly Data - US

The particulate matter values (PM10/PM2.5) and Ozone are usually higher during the daytime or late at night than in the evening. Carbon monoxide (CO) and nitrogen dioxide (NO2) have comparatively higher values in the evening than in the afternoon. Temperatures are low during the night and early morning hours, peak in the afternoon and gradually decrease in the evening. Temperatures also follow seasonal trends and are higher in the summer months than in fall and winter. The wind speed is generally higher during the daytime than during the night and early morning hours.
PM values


Screen

Wind - Temperature


Screen

CO/NO2/Ozone


Screen

Hourly Data - Italy

Carbon monoxide (CO) and nitrogen dioxide (NO2) generally have higher values in the evening and early morning hours than in the afternoon. With some exceptions, Ozone values are generally higher at night than during the daytime.
Ozone


Screen

CO and NO2


Screen

Some screenshots of the app on the big wall

Map


Screen

Map zoomed


Screen

Line Chart


Screen

Weekly Updates on progress

Week 1

Mirko

In our first meeting we decided to use part of my project 1, Just Breathe, as the starting point for the second project. During this week I adapted that code as the base structure for the second project and reorganized it (extracting css and JS from the app.js, loading the new data, creating the project tabs). I also implemented the map as required by parts C and B, and tuned the parameters to make it responsive on large displays.

Abhishek

Started working on daily AQI data. Analyzed the structure of the dataset and worked out the most efficient way to extract data with less overhead.

Ashwani

During the first week, we brainstormed different ideas on how to read large data files. We discussed various approaches along with their pros and cons. I wrote a script to automatically download files from the EPA website and unzip them in our local repository. I also preprocessed the data by storing it in an R data format to reduce file sizes and by loading only the columns required for the analysis.

Week 2

Mirko

During this week I implemented a date picker for the hourly data plot, which allows the user to select Year, Month and Day for a county and shows only the dates for which data is available. I was also trying to figure out how to produce the heatmap, but I got stuck because of the incorrect data that we got from the source.

Abhishek

Implemented the graph, bar chart and table for daily AQI statistics. Need to integrate this with the app's interface.

Ashwani

During this week, I worked on getting the daily data for pollutants and preprocessing it in R object format. I worked on merging the hourly and daily data for all the pollutants to make the dataset more compact and easy to use. I started working on the line charts to show the hourly data for pollutants for a particular county and a day.

Alpha version

Mirko

Completed A part: The map UI was improved allowing the user to switch back and forth between the daily heatmap and yearly visualizations. When the daily heatmap is selected, the Month and Day input for the year 2018 are shown, the pollutants choice is also modified and AQI is removed from the dropdown list because of the lack of the corresponding column in the daily data. I also decided to adopt a more user-friendly widget for the county number input, using a circular input in a fixed panel at the bottom right of the screen.

Abhishek

Completed the daily data visualizations given in part C. Should continue improving this with better data loading methods to make the app boot up faster.

Ashwani

During this week, I preprocessed the hourly wind and temperature data and stored them in R object format. I created a single data file for all the yearly AQI data to make the loading of the data faster in the application. I completed all the hourly line charts for pollutants, wind and temperature, allowing the user to pick any of the plots for analysis.

Week 3

Mirko

Changed UI sidebar. Added convert to metric input and handled conversion to metric units for pollutants map.

Abhishek

Worked on preprocessing daily and hourly data for Italy (graduate students' requirements). Started working on the UI for displaying these data in the Shiny app.

Ashwani

During this week, I experimented with and benchmarked various R data object formats to make our reading of data more efficient in both time and memory. I made use of futures to delay the loading of data until it is required. I also started working on adding imperial units for the hourly data. The hourly data plots were modified to display default plots, and the wind direction plot was removed.

Week 4

Mirko

No scrolling is possible in the application. Modified panels for map. Added confidence input in map and modified color opacity based on days with data in map. Minor fixes to colors and sizes for Yearly plots.

Abhishek

Completed the daily and totals panels for Italy. Made several minor bug fixes. Added imperial conversion for daily AQI for both the USA and Italy. Implemented the touch feature for ggvis plots.

Ashwani

Metric to imperial conversion for hourly plots for the US. Worked on the creation of hourly charts for one more country, Italy. Added a switch for the top 12 counties in the US. Updated the hourly preprocessing script for Italy for performance. Minor fixes for yearly plots.

Video

Video presentation

Contact Us

Contact Info

Where I live

Chicago, IL
60622 USA

Email Me At

mrk23 at hotmail dot it
