Chicago Real-Time Air Quality, Visualization and Visual Analytics

Introduction

This is the documentation for the third project of the CS 424 Visualization and Visual Analytics class at UIC. The project consists of visualizations of Chicago real-time air quality and weather data in an interactive web application created with the Shiny R library.

Authors

Mirko Mantovani
Abhishek Vasudevan
Ashwani Khemani

Access the application

Access the application on EVL Shiny server (optimized for big wall displays)
Access the application on Shinyapps.io (optimized for normal displays)

Data

Array of Things
Open Air Quality
Dark Sky
Chicago Traffic Tracker

Code

Go to GitHub Repository

How to run the application

Accessing the web application online

Just click on the provided link to access the web application hosted on the EVL server from your browser. It is recommended to use a recent version of Google Chrome.

Hosting the web application locally

Clone the project repository and open app.R with RStudio (download and install R and RStudio first if you don't have them on your machine). Install each R library used in the project by running install.packages("library") in the RStudio console, replacing library with the package name. The application queries data in real time from its data sources (Array of Things, Dark Sky, OpenAQ and Chicago Traffic Tracker) to build the visualizations. Before running the application, follow the steps below to regenerate the required data file.
  • One preprocessing step is required, to generate the data file for the OpenAQ part of the application. The script generates latitude and longitude pairs for all the OpenAQ locations in the city of Chicago and stores them as an fst file, which the app loads when it runs.
  • Run preprocess_aqi.R to generate the file; it creates an fst file at the following location: fst/openaq.fst
Once all the necessary data/fst files are in place, run the application from RStudio by clicking Run App at the top right of the main RStudio panel. Access it from a browser using the local machine address and the port on which the application started (http://127.0.0.1:port/).

Functionalities

Getting started and Sidebar

The application starts in full screen. Open the menu items for the different categories in the sidebar to see more options. The inputs item lets the user switch between metric and imperial units for the whole application, enable or disable the heatmap visualization, and show or hide the table of sensor nodes.

Map

The application has a single main page: a full-screen map centered on the Chicago area. Users can click the sensor nodes on the map to get real-time and historical data from them, and click the real-time traffic segments to get more information. Like the rest of the application, the map is responsive; in particular, it was designed for the large touch-screen display wall in our classroom (11520 by 3240 pixels). The UI works best in that configuration, which is the one applied to this version of the application.

Nodes

Nodes are groups of hardware sensors located in the Chicago area; each node contains multiple sensors. The nodes come from 2 different sources: Array of Things (AoT) and Open Air Quality (OpenAQ). AoT nodes are blue and OpenAQ nodes are green. Red nodes are inactive AoT nodes: nodes that are deployed but, when queried, return no observations for any of their sensors.
Clicking on a node opens a popup with more information about it. The node is also queried, and after a few seconds the data it provides appears in the plots panel. At the same time we query the Dark Sky APIs, which return weather data and other measures for the selected node; these are displayed in the plots panel as well. Keep in mind that loading can take a while because the data is fetched in real time. Gathering and preprocessing the data is especially slow for the last 7 days and last 24 hours, because multiple requests are performed and the AoT APIs in particular have a very slow response time.
Screen
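
The node coloring rule described above can be sketched as follows. This is a minimal Python illustration (the application itself is written in R); the function names and the shape of the `observations` mapping are assumptions made for the example, not the project's actual data structures.

```python
from typing import Dict, List

# Hypothetical shape: sensor name -> list of observation values returned by a query.
Observations = Dict[str, List[float]]


def is_inactive(observations: Observations) -> bool:
    """A node counts as inactive when none of its sensors returned any observation."""
    return all(len(values) == 0 for values in observations.values())


def node_color(source: str, observations: Observations) -> str:
    """Pick the marker color: green for OpenAQ, blue or red for AoT nodes."""
    if source == "openaq":
        return "green"
    return "red" if is_inactive(observations) else "blue"
```

Under these assumptions, an AoT node is only marked red when every one of its sensors comes back empty; a single reporting sensor is enough to keep it blue.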

Nodes Table

The Array of Things nodes and their locations are shown in tabular form on the left. The table can be shown or hidden according to the user's preference. It lists, for each AoT node, which measures it is reporting and its overall status (active or inactive). A node that is reporting a given measure has "True" in the corresponding column. The user can filter the table with the checkboxes at the bottom of the panel to show or hide sites based on the selected measures. Selecting a node in the table displays the measures for that site in graphical and tabular form. To compare two sites, select one row at a time: the visualizations for the two sites then appear in the current and previous sections of the graphical and tabular panels.
Screen
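
The current/previous comparison described above amounts to remembering the last two distinct selections. The following Python sketch is only an illustration of that idea (the application itself is written in R); the class and its attribute names are hypothetical.

```python
from typing import Optional, Tuple


class SelectionHistory:
    """Keeps the currently selected node and the one selected before it."""

    def __init__(self) -> None:
        self.current: Optional[str] = None
        self.previous: Optional[str] = None

    def select(self, node_id: str) -> Tuple[Optional[str], Optional[str]]:
        # Re-selecting the current node does not shift the history, so the
        # same node never appears in both the current and previous views.
        if node_id != self.current:
            self.previous = self.current
            self.current = node_id
        return self.current, self.previous
```

For example, selecting node 08D and then 08F leaves 08F as the current node and 08D as the previous one; clicking 08F again changes nothing.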

Map background

The map background can be changed using the select boxes in the layers menu. There are 4 different map backgrounds. The default is a drawn map with train stations, streets and highways, parks, and icons for the most important locations.
The Dark Matter background makes good use of the different colors (traffic and nodes), which pop out against the black background.
The satellite background is a standard satellite image of Chicago. The terrain background mixes the previous features: its predominant white/grey color makes the other colors easier to see, it shows terrain features (see the green areas and water), and it highlights streets and highways.

Map backgrounds

Default

Screen

Dark matter

Screen

Satellite

Screen

Terrain

Screen

Traffic Data

Traffic data comes from Chicago Traffic Tracker. It is real-time data (at most 20 minutes old) describing the congestion status of the streets of Chicago. In particular, the dataset contains the current estimated speed for about 1250 segments covering 300 miles of arterial roads.
We decided to integrate traffic data because it could help explain the real-time variations of pollutants such as CO, PM2.5, PM10 and NO2, which are also produced by vehicles. The roads are colored following the usual traffic colors of common GPS navigation applications (e.g. Google Maps). The color conventions are: grey: no traffic (no cars or no data provided), blue: normal/high speed and above, yellow: medium speed, orange: low speed, red: very low or zero speed.
Each road is clickable, and more information is provided in a popup.
Screen
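
The color convention above can be expressed as a simple speed-to-color mapping. The Python sketch below is an illustration only (the application itself is written in R): the numeric thresholds are hypothetical, since the actual cut-off speeds are not stated here, and treating a missing or negative speed as "no data" is also an assumption.

```python
from typing import Optional

# Hypothetical thresholds in mph; the real cut-offs used by the app are not documented here.
VERY_LOW_MPH = 10.0
LOW_MPH = 20.0
MEDIUM_MPH = 30.0


def traffic_color(speed_mph: Optional[float]) -> str:
    """Map a segment's estimated speed to the color convention described above."""
    if speed_mph is None or speed_mph < 0:
        return "grey"    # no estimate available for this segment (assumed encoding)
    if speed_mph < VERY_LOW_MPH:
        return "red"     # very low or zero speed
    if speed_mph < LOW_MPH:
        return "orange"  # low speed
    if speed_mph < MEDIUM_MPH:
        return "yellow"  # medium speed
    return "blue"        # normal/high speed and above
```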

Heatmap

The heatmap can be used to visualize the intensity of the various measures across the city of Chicago. It has three inputs: type of measure, value type (min, max, average) and time range (current, last 24 hours, last 7 days). The user can select any measure from the dropdown for the three data sources and visualize it for the chosen value type and time range. The heatmap is created by interpolating the data of all the nodes of a given data source, in order to cover the entire city of Chicago. Interpolation is not possible for some measures when too few nodes are reporting them; in that case a message is shown to the user on the screen. A legend shows the values represented on the heatmap.
Screen
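
To give an idea of how node readings can be interpolated over the city, here is a minimal Python sketch of inverse distance weighting (IDW). This is an assumption made for illustration: the interpolation method actually used by the application (which is written in R) is not specified above. The minimum node count `MIN_NODES` is hypothetical, and distances are computed in plain degrees, which is only a rough approximation at city scale.

```python
import math
from typing import List, Optional, Tuple

Reading = Tuple[float, float, float]  # (latitude, longitude, value)

MIN_NODES = 3  # hypothetical minimum number of reporting nodes


def idw(readings: List[Reading], lat: float, lon: float,
        power: float = 2.0) -> Optional[float]:
    """Estimate the value at (lat, lon); returns None when too few nodes report."""
    if len(readings) < MIN_NODES:
        return None  # the caller can show a "not enough data" message instead
    weighted_sum = 0.0
    weight_total = 0.0
    for r_lat, r_lon, value in readings:
        dist = math.hypot(lat - r_lat, lon - r_lon)
        if dist == 0.0:
            return value  # exactly on a node: use its reading directly
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

Evaluating `idw` on a grid of points covering the city would give the values to color on the map; closer nodes dominate each estimate, and the estimate always stays between the smallest and largest reading.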

Panels and menus

Plots panel

This is the most important panel of the application. It shows the data queried from the various nodes as line charts or tables. The panel has two inputs: the first is the time range, which can be current (the most recent data), last 24 hours or last 7 days; the second is a set of checkboxes that let users select only the pollutants or measures they want to visualize.
The panel is divided into two tabs: the first lists the pollutant metrics and the second lists the weather/climate metrics.
Each tab has a graphical and a tabular view. The graphical view displays the metrics for the currently selected node and also shows the graph for the previously selected node. Similarly, the tabular view shows the metrics for the currently selected node in a data table, along with the data table for the previously selected node.
Tab 1 - Pollutants
The following metrics can be displayed in the first tab:
1) CO, 2) H2S, 3) NO2, 4) O3, 5) SO2, 6) PM2.5, 7) PM10, 8) BC
Metrics can be removed from the graph using the checkboxes below the second graph. The following screenshot shows a sample graph from the first tab:
Screen
The legend shows the color used for each metric along with its units (note that imperial units can be enabled from the main panel on the left). Distinct colors are used for each metric. The same data can also be viewed in tabular format:
Screen
Tab 2 - Weather measures
As in tab 1, tab 2 has graphs and tables. The following measures can be displayed in tab 2:
1) Temperature, 2) Humidity, 3) Intensity, 4) Wind Speed, 5) Cloud coverage, 6) Visibility, 7) Pressure, 8) Ozone
Temperature, humidity and intensity are available in both the Dark Sky and the Array of Things data. To distinguish them in the legend, AoT measures have the trailing text "(AOT)" in their name, while Dark Sky measures have no trailing text. This can be seen in the example graph below:
Screen
As usual, you can convert the units to imperial, and the conversion is reflected in the legend text. The data can also be viewed in tabular format, as in the example table below. Because there are many weather metrics, this tab is best used by filtering out variables with the checkboxes at the end of the panel.
Screen
Note that all the data in these graphs and tables is real time and is refreshed every minute. No data is stored locally. Even the graph and table for the previously selected node are fetched in real time.

Layers menu

The layers menu can be opened by hovering over or clicking on the corresponding menu in the top right corner. From there you can select one of the 4 map backgrounds, and toggle the nodes on the map on or off based on the type of measure they provide or on whether they are active or inactive. The last toggle controls the traffic data: it shows or hides the real-time road traffic layer.
Screen

Units of measure

To make this data meaningful to users from both the United States and the rest of the world, we provide a switch in the sidebar, under the inputs tab, that lets the user convert data from metric to imperial units and vice versa. Pollutants measured in ppm (parts per million) have no equivalent in the imperial system and hence remain the same. The conversion details for the other pollutants are listed below:
Screen
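
The unit switch can be thought of as a small conversion table applied to each value before it is plotted. The Python sketch below is illustrative only (the application itself is written in R): the set of units handled and the unit labels are assumptions, while the conversion factors are the standard ones. Values in ppm pass through unchanged, as described above.

```python
from typing import Tuple


def to_imperial(value: float, unit: str) -> Tuple[float, str]:
    """Convert a metric value to its imperial counterpart, if one exists."""
    if unit == "°C":
        return value * 9.0 / 5.0 + 32.0, "°F"
    if unit == "km":
        return value * 0.621371, "mi"
    if unit == "m/s":
        return value * 2.236936, "mph"
    # Units without an imperial equivalent (e.g. ppm) are returned as they are.
    return value, unit
```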

Used R libraries

This is the list of R libraries used for this project:
  • shiny
  • devtools
  • shinydashboard
  • ggplot2
  • scales
  • shinythemes
  • dashboardthemes
  • ggthemes
  • shinyalert
  • leaflet
  • rgdal
  • shinyWidgets
  • viridis
  • cdlTools
  • htmltools
  • RColorBrewer
  • reshape2
  • fst
  • data.table
  • dplyr
  • tidyr
  • lubridate
  • tidyverse
  • ropenaq
  • darksky
  • RSocrata
  • base
  • sp
  • raster
  • gstat

Interesting Insights

Using traffic data

As explained in the documentation, traffic data could help explain the real-time variations of pollutants such as CO, PM2.5, PM10 and NO2, which are also produced by vehicles. One strategy for this kind of visual analytics is to select, in the layers menu, the nodes that report those pollutants, then pick nodes near roads with different traffic levels and compare their current pollutant values. Unfortunately, the data provided by the nodes is not very good: the CO measurement is negative most of the time (the APIs don't explain what negative values mean; they may stand for 0), NO2 is 0 everywhere in Chicago, and PM2.5 and PM10 are reported by almost no sensor. To illustrate the application's functionality, I provide a screenshot of what the application looks like when interactions of this type are performed.
Screen

Last 7 days, downtown vs uptown

I tried to compare 2 locations that differ in traffic, buildings, number of people and closeness to the lake. The upper plot shows the last 7 days of data for a node located uptown, close to the lake. The bottom plot shows the last 7 days of data for a node located downtown, a little further west (further from the lake). I noticed that downtown the humidity level is more constant and overall lower than uptown, that H2S is only present downtown (it is 0 in the 08F node), and that the temperature is higher uptown.
Screen

Heatmap for Particulate Matter 2.5 over the week

I compared the average PM2.5 measurements from the OpenAQ source for the last week, and it looks like the south side of Chicago has more PM2.5 particles than the north side. This might be due to the presence of scrap yards, distribution warehouses and low-income neighborhoods in the southern part of Chicago.
Screen

Heatmap for Light Intensity over the last week

I compared the light intensity across the Chicago area for the last week. A few areas have a comparatively very high light intensity compared to the rest of the city. This may be because some locations have better access to light than others, leading to higher intensity readings for those locations.
Screen

Temperature over the last week using Darksky

I compared the temperature values for the last week from the Dark Sky API using the heatmap, and Dark Sky sometimes shows absurd temperature values (-60 F), as shown in the figure. The reason might be faulty sensors: from further reading I learned that Dark Sky sensors are prone to errors and can sometimes give wrong measurements.
Screen

Data obtained from OpenAQ

OpenAQ reports more or less the same values for the weather measures. This is possibly because the weather is constant throughout Chicago, or because the sensors report inaccurate values. The following table shows two OpenAQ nodes that have almost the same values for many of the metrics.
Screen

Comparing data from AoT and openAQ

For two locations that are geographically close, the values obtained for the pollutant ozone are quite different, as seen in the data from OpenAQ (top graph) and from AoT (bottom graph). The measure is reported in ppm, so even a small difference is significant.
Screen

Temperature obtained from Darksky and AoT

For a given AoT node, the Dark Sky temperature is clearly wrong: it shows -56 degrees Celsius, as shown in the screenshot. The AoT sensor, on the other hand, reports the temperature accurately for the past 7 days.
Screen

Weekly Updates on progress

Week 1

Mirko

For the first week I started by designing the initial high-level graphical user interface with sidebar, tabs and colors. I then concentrated on the Array of Things APIs. I integrated the functions we would need in our project, inspired by the official R wrapper for the APIs, which is currently not working. I noticed a few problems in the AoT APIs: with the queries they provide, it is tricky to get the initial information we need for the map (a dataset with all the nodes and the sensors each node provides). Since there is apparently no trivial way right now to get the list of sensors from a node, I obtained the same information by parsing the observations of each individual node. This process is slow, because each node must be queried and the AoT servers respond slowly; it takes approximately 30 seconds. Once I had created the dataset I noticed some interesting problems: only 38 nodes currently respond with data observations (the total number of nodes already deployed is 119). Also, I cannot explain why 2 nodes with different vsn (070 and 072) have the exact same location and address. I then plotted the locations of the sensors on the map and created the selection layers, with which users can choose to visualize only a subset of the sensors. I saved the node information in FST format and created a variable that controls whether the node information is updated or the previously saved copy is loaded. I also created a basic plot, and the comparison, for the current data of all the tracked measures coming from the AoT source: when a node is clicked, the plot is displayed for the selected measures, and the user can then select another node and compare it to the previous one through an identical plot shown below the most recent one.

Abhishek

Started working on preprocessing the OpenAQ data. Task is to bring it to a certain dataframe format, so that reading it becomes easy.

Ashwani

During this week, I tried working with the R library for AoT and contacted the owner of the repository about a fix for the recent changes. We then moved forward with using the AoT API directly. We discussed the various project requirements and how to implement them. I worked on creating some of the graphical charts for the Array of Things data and added the corresponding units of measure for them.

Week 2

Mirko

If I thought the Array of Things data source was bad from my first week's impressions, I can now confirm these are the worst APIs and service I've ever used. The challenges I faced this week were related to showing the last 24 hours and last 7 days of data in a line chart upon clicking on a sensor. I discovered that a lot of sensors that have data randomly shut down and turn back on, disrupting the service and making it difficult to understand what is going on with the data. I had trouble finding a node that provided data reasonably consistently; eventually I picked node 08D. The next issue was how to query the APIs to get the data I wanted. I initially tried to get all the observations in the last 24 hours and then preprocess them. I soon realized this was infeasible because of the latency of downloading and preprocessing such a large quantity of data (around 100k rows) in real time: it would take around 1 minute from when the user clicks the node to when the plot is generated. Way too much. I thought about preloading and processing the data for each node when the application starts, but that would take too much time, and the user expects the application to react quickly. My final solution was to subsample the data (200 observations per hour or per day) based on the timestamp. It also took me a long time to figure out how to create an HTTP request to the AoT APIs that filters by timestamp; the official API documentation currently lacks examples and proper explanations. Eventually I got it to work. Another nice surprise AoT gave us is that there are places in Chicago with -30 ppm of SO2, which would be great if it were true, but I'm afraid it's not physically possible.
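
The subsampling approach described above can be sketched as splitting the time range into windows and requesting a capped number of observations for each one. The Python code below is only an illustration under stated assumptions (the application itself is written in R): the dictionary keys `start`, `end` and `size` are hypothetical placeholders and are not the actual AoT query parameter names.

```python
from datetime import datetime, timedelta
from typing import Dict, List

MAX_OBS_PER_WINDOW = 200  # cap per hour (24-hour view) or per day (7-day view)


def build_windows(now: datetime, count: int,
                  step: timedelta) -> List[Dict[str, object]]:
    """Return one request description per window, oldest window first."""
    windows = []
    for i in range(count):
        end = now - i * step
        start = end - step
        windows.append({
            "start": start.isoformat(),  # hypothetical parameter name
            "end": end.isoformat(),      # hypothetical parameter name
            "size": MAX_OBS_PER_WINDOW,  # hypothetical parameter name
        })
    windows.reverse()
    return windows
```

With these assumptions, `build_windows(now, 24, timedelta(hours=1))` describes the 24 hourly requests for the last-24-hours view and `build_windows(now, 7, timedelta(days=1))` the 7 daily requests for the last-7-days view, so at most 200 observations are downloaded per window.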

Abhishek

Done with the preprocessing script for openAQ data. Need to start working on creating 4 new tabs for displaying graph and tables for two subsets of data identified during our discussion.

Ashwani

During this week, I created the table of node and sensor information for the Array of Things dataset. I then worked on the tabular visualization of the current Dark Sky data in a separate panel. I added the functionality to filter sites by measure to the table representation of the sites. The graphical charts for Array of Things can now take input from the table and show the corresponding visualizations. Selecting multiple inputs from the table is a bit tricky and needs to be looked into in detail for the comparison task.

Alpha version

Mirko

During the third week and for the Alpha version I integrated the OpenAQ nodes on the map, created the nodes legend on the map, added 4 different map backgrounds, and made the charts refresh every minute, which completed all the C requirements.

Abhishek

Merged panels for openAQ and AoT pollutants which will be tab 1. Tab 2 will contain other variables like intensity, humidity, etc. Merge needs to be done for this also which will act as tab 2.

Ashwani

During this week, I fixed the issue with map and table inputs for the graphical plots so that they are handled independently. I preprocessed the Dark Sky data and created the last 24 hours and last 7 days charts for Dark Sky. I worked on the plot comparison issue in which the same node was shown in both plots. The comparison fix is currently in place for the Array of Things map and table inputs and needs to be added for Dark Sky once the reorganization of the charts is done.

Week 4

Mirko

After the professor's approval of the new data source as the additional graduate requirement, I integrated real-time traffic and congestion data from Chicago Traffic Tracker into the application. In particular, the dataset comes with 5 important features: the start and end longitude and latitude, and the current traffic speed. I decided to show this data graphically on the map as colored lines with the following color convention: grey: no traffic (no cars), blue: normal/high speed and above, yellow: medium speed, orange: low speed, red: very low or zero speed. Clicking on a street opens a popup showing data such as the name of the main street and the crossings it joins, the numeric speed in mph, and a link to the APIs. The street layer can be toggled on or off from the layers menu.

Abhishek

Successfully merged AoT, OpenAQ and Dark Sky into one panel. This involves querying Dark Sky and OpenAQ simultaneously, as parameters from both sources are displayed on the graph. I also added tables displaying the metrics and values for the current node, and a comparison table for the previously chosen node.

Ashwani

During this week, I changed the sensor table presentation to make it visually better. I fixed the filtering of sites from sensor table to show sites based on measure availability. I added a button to hide the table based on user preference. I fixed various issues related to graphical plots for Darksky. I started working on the heatmaps and preprocessed the Array of Things API data required for its heatmap.

Final delivery

Mirko

For the final week I adjusted all the sizes to make the application work well on the class wall and fixed some recurring bugs. I also tried to decrease the loading time of the last 24 hours of data for the Array of Things by getting all the entries of the last 24 hours in a single request. The advantage over the method that samples data for each hour is that only 1 request has to be made: AoT has a slow response time, so 1 request with a lot more data is faster than 24 requests with little data. The problem is that if you pass as query parameters the instants defining the interval between 24 hours ago and now and don't modify the maximum size of the retrieved data, you only get 200 results. If you do modify the size, the API has a limit of 100k, which is not enough for some nodes that can have more than 100k entries in the last 24 hours. In the end I decided to switch back to the old, slow method. There is not much we can do to improve the speed of data gathering and plot generation for AoT, since their APIs are the main bottleneck and they are also not working as expected.

Abhishek

Fixed various minor bugs. I changed colors for all variables used in the graph so that they all have colors which are easily differentiable. I also added documentation on the website related to the merged panel and also how the data is being displayed from the various sources into one panel.

Ashwani

During this week, I worked on completing the heatmaps for all three data sources: Array of Things, Dark Sky and OpenAQ. I preprocessed all three data sources to get data for all of their nodes. I created the heatmap inputs for all the required measures. The heatmaps are created for the value types min, max and average, and for all time ranges: current, last 24 hours and last 7 days. The Dark Sky key is randomized: a key is picked at random from a given set of keys. I fixed the slowness of the previous-click display in the graphical plots. I explored the possibility of saving the previous data, but it might cause some problems, so the data is queried in real time, which might affect performance depending on the response time of the API and the amount of data requested.
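
The key randomization mentioned above can be sketched in a few lines. This is a hypothetical Python illustration (the application itself is written in R); the function name and the key values are placeholders, not the project's code.

```python
import random
from typing import Optional, Sequence


def pick_api_key(keys: Sequence[str],
                 rng: Optional[random.Random] = None) -> str:
    """Pick one key uniformly at random from the configured set of keys."""
    if not keys:
        raise ValueError("at least one API key is required")
    # Fall back to the module-level generator when no seeded one is given.
    return (rng or random).choice(keys)
```

Choosing a key at random for each request spreads the calls across the available keys, which presumably helps when a single key has a limited request quota.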

Video

Video presentation

Contact Us

Reach out for a new project or just say hello

Contact Info

Where I live

Chicago, IL
60622 USA

Email Me At

mrk23 at hotmail dot it