Qchlorophyll
R package for geographical data analysis and prediction.
The package aims to offer a set of tools for analysing marine bioregions and applying machine learning techniques (supervised and unsupervised).
An article on MilanoR describes the main workflow in which the package was used.
Example plot of interpolated net heat flux over a user-specified spatial grid:
Functionalities
The package provides the following functions:
Functions for loading geographical .nc data into a data frame
- A set of functions for loading and extracting geographical data from .nc files.
- A set of functions for loading geographical data stored at different frequencies in .nc files.
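As a rough illustration of what loading a .nc file into a data frame involves, the sketch below calls the `ncdf4` dependency directly rather than the package's own loaders. The variable name `qnet`, the grid values and the temporary file are assumptions made only to keep the example self-contained.

```r
library(ncdf4)

# Build a tiny .nc file so the example does not depend on external data.
# The "qnet" variable and the 3 x 2 grid are illustrative only.
lon <- ncdim_def("lon", "degrees_east", c(-10, -9, -8))
lat <- ncdim_def("lat", "degrees_north", c(40, 41))
qnet_def <- ncvar_def("qnet", "W/m^2", list(lon, lat), missval = -9999)
path <- tempfile(fileext = ".nc")
nc <- nc_create(path, qnet_def)
ncvar_put(nc, qnet_def, matrix(1:6, nrow = 3, ncol = 2))
nc_close(nc)

# Read the file back and flatten the lon x lat array into a data frame.
nc <- nc_open(path)
values <- ncvar_get(nc, "qnet")
grid <- expand.grid(lon = nc$dim$lon$vals, lat = nc$dim$lat$vals)
nc_close(nc)

# expand.grid varies lon fastest, which matches the column-major
# order of the lon x lat array returned by ncvar_get.
qnet_df <- cbind(grid, qnet = as.vector(values))
```

Each row of `qnet_df` is one grid cell with its longitude, latitude and value, which is the shape the later analysis steps expect.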
Functions for data cleaning and descriptive statistics
- A set of functions for cleaning the data and calculating user-defined descriptive statistics or indices.
K-means unsupervised analysis functions
- A set of functions for running k-means on the data and extracting information from the analysis.
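To give an idea of the kind of analysis these functions wrap, here is a minimal sketch that clusters grid cells with base R's `stats::kmeans`. It does not use the package's own functions; the `chl` column, the grid and the choice of two clusters are assumptions for illustration.

```r
set.seed(42)

# Synthetic grid: cells north of 42 degrees have a higher value.
df <- expand.grid(lon = seq(-10, -5, by = 1), lat = seq(40, 45, by = 1))
df$chl <- ifelse(df$lat > 42, 2, 0.5) + rnorm(nrow(df), sd = 0.05)

# Standardise the feature, then run k-means with several random starts.
fit <- kmeans(scale(df$chl), centers = 2, nstart = 10)

# Attach the cluster label to each grid cell (a candidate "bioregion").
df$cluster <- fit$cluster
table(df$cluster)
```

In practice the number of clusters would be chosen from the data, for example by comparing a clustering index across several values of `centers`.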
Missing data imputation functions
- A set of functions for imputing missing data.
Random forest fitting, prediction and plotting functions
- A set of functions for loading sets of yearly .csv files grouped in local folders.
- A set of functions for fitting a random forest model, getting information from the fitted model, predicting and plotting predictions in a geographical map.
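The fit-then-predict workflow can be sketched with the `randomForest` dependency used directly. The predictors `depth` and `sst`, the response `y` and the simulated data are hypothetical and are not taken from the package.

```r
library(randomForest)

set.seed(1)

# Simulated training data: y rises with sst and falls with depth.
train <- data.frame(
  depth = runif(200, 0, 100),
  sst   = runif(200, 10, 25)
)
train$y <- 0.1 * train$sst - 0.01 * train$depth + rnorm(200, sd = 0.05)

# Fit the forest and keep variable importance measures.
fit <- randomForest(y ~ depth + sst, data = train,
                    ntree = 200, importance = TRUE)

# Predict on new cells; with lon/lat columns attached, these
# predictions could then be drawn on a map.
new_cells <- expand.grid(depth = c(10, 50), sst = c(12, 20))
new_cells$pred <- predict(fit, newdata = new_cells)

importance(fit)
```

The importance table indicates how much each predictor contributes to the fit, which is one way of getting information from the fitted model.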
Geographical data manipulation functions
- A set of functions for changing (increasing or decreasing) the resolution of geographical data. Given a spatial grid (i.e., a set of longitude and latitude coordinates), the functions convert the supplied data to the resolution of that grid.
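One possible way to resample data onto a user-specified grid is inverse distance weighting with the `sp` and `gstat` dependencies, sketched below. Whether the package uses this interpolation method is an assumption; the coarse grid, the target grid and the `qnet` values are made up for illustration.

```r
library(sp)
library(gstat)

# Coarse 1-degree grid with a smooth synthetic qnet field.
coarse <- expand.grid(lon = seq(0, 4, by = 1), lat = seq(40, 44, by = 1))
coarse$qnet <- 100 + 5 * coarse$lon - 3 * (coarse$lat - 40)
coordinates(coarse) <- ~ lon + lat

# Target grid at 0.5-degree resolution (the user-specified grid).
target <- expand.grid(lon = seq(0, 4, by = 0.5), lat = seq(40, 44, by = 0.5))
coordinates(target) <- ~ lon + lat

# Inverse distance weighted interpolation onto the target grid.
interp <- idw(qnet ~ 1, locations = coarse, newdata = target, idp = 2)

# Back to a plain data frame with lon, lat and the interpolated value.
resized <- data.frame(coordinates(interp), qnet = interp$var1.pred)
```

The same call decreases the resolution if the target grid is coarser than the input, although averaging within cells may be a better choice in that direction.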
Dependencies
Requires the following packages:
- ncdf4
- dplyr (>= 1.0.2)
- tidyr (>= 0.4.1)
- lubridate (>= 1.5.6)
- ggplot2 (>= 2.1.0)
- clusterSim
- mice
- lattice
- randomForest (>= 4.6-12)
- grid (>= 3.2.2)
- lazyeval (>= 0.1.10)
- stringi
- sp
- gstat
Examples
The scripts folder contains some usage examples. The .Rmd example files are listed below:
- Definitive guide to data loading in Qchlorophyll
- Descriptive summary statistics calculation with Qchlorophyll
- K-means analysis example script with Qchlorophyll
- Random forest model: data loading, fitting and predicting with Qchlorophyll
- Spatial data resizing in Qchlorophyll
Example figures:
- Heat map of net heat flux.
- Partial dependence plot of y against other variables.
- Predictive map of bioregion variables for each year, and the average predicted map.
- Spatial data resizing on the qnet variable (net heat flux): heat map of the outcome and, as a comparison, the original available data.
Notes
Global ocean heat flux and evaporation products were provided by the WHOI OAFlux project (http://oaflux.whoi.edu), funded by the NOAA Climate Observations and Monitoring (COM) program.