Project author: OldLipe

Project description:
ggplot extension to visualize SOM networks
Language: R
Repository: git://github.com/OldLipe/ggsom.git
Created: 2018-05-23T16:55:09Z
Project page: https://github.com/OldLipe/ggsom

License: Other

ggsom


Overview

The aim of this package is to offer a wider variety of graphics for self-organizing maps (SOM). The SOM (Kohonen, 1982) is an unsupervised neural network that uses competitive learning to map multidimensional input vectors onto a low-dimensional grid, typically two-dimensional with a rectangular or hexagonal topology. In short, the output nodes compete for each input vector, and at each iteration the winning node, the best-matching unit (BMU), is the one whose weight vector has the shortest Euclidean distance to the input vector. Once the BMU is chosen, it and the nodes within a given neighborhood radius update their weight vectors to move closer to that input pattern (Kohonen, 2013).
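The competitive-learning step described above can be sketched in a few lines of base R. This is an illustration under simplifying assumptions, not the kohonen package's implementation: it uses a Gaussian neighborhood (matching `neighbourhood.fct = "gaussian"` in the examples below) instead of a hard radius cut-off, and fixed `alpha` and `radius` values, whereas a real training run typically shrinks both over the `rlen` iterations. The function name `som_step()` is hypothetical.

```r
# Minimal sketch of one online SOM training step (not kohonen's code).
# codes:   codebook matrix, one row (weight vector) per neuron
# grid_xy: coordinates of each neuron on the 2-D output grid
# x:       a single input vector
som_step <- function(codes, grid_xy, x, alpha = 0.05, radius = 1) {
  # Competition: squared Euclidean distance from x to every codebook vector
  d2 <- rowSums(sweep(codes, 2, x)^2)
  bmu <- which.min(d2)
  # Gaussian neighborhood: weight 1 at the BMU, decaying with grid distance
  grid_d2 <- rowSums(sweep(grid_xy, 2, grid_xy[bmu, ])^2)
  h <- exp(-grid_d2 / (2 * radius^2))
  # Adaptation: move each codebook vector toward x by a fraction alpha * h
  codes <- codes - alpha * h * sweep(codes, 2, x)
  list(codes = codes, bmu = bmu)
}

# Usage: a 5x5 rectangular grid with random 4-dimensional codebook vectors
set.seed(1)
grid_xy <- as.matrix(expand.grid(x = 1:5, y = 1:5))
codes <- matrix(runif(25 * 4), nrow = 25, ncol = 4)
x <- c(0.2, 0.4, 0.6, 0.8)
step <- som_step(codes, grid_xy, x)
step$bmu
```

Because the BMU gets weight 1 and every other neuron a weight between 0 and 1, no codebook vector ends up farther from `x` than it was before the step.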

The tool is built on two R packages: kohonen (Wehrens & Buydens, 2007) and ggplot2 (Wickham, 2011). The kohonen package is used to train the SOM, and ggplot2 to draw the parallel coordinate plot. The ggsom package thus acts as a bridge between the two: it reshapes the data produced by kohonen so that it can be visualized with ggplot2.
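To illustrate the kind of reshaping involved, here is a minimal base-R sketch, assuming the plot draws each observation as a line (parallel coordinates) in the panel of its winning neuron. The helper `som_to_long()` is hypothetical and is not part of ggsom's API; with a trained kohonen model, the `unit` vector would come from its `unit.classif` component.

```r
# Hypothetical helper (not ggsom's API): reshape a data matrix into long
# format, one row per (observation, variable), tagged with the observation's
# winning neuron and class label.
som_to_long <- function(data, unit, class) {
  data <- as.matrix(data)
  n <- nrow(data)
  p <- ncol(data)
  data.frame(
    id       = rep(seq_len(n), times = p),
    unit     = rep(unit, times = p),
    class    = rep(class, times = p),
    # as.vector() reads the matrix column by column, so variables repeat "each"
    variable = factor(rep(colnames(data), each = n), levels = colnames(data)),
    value    = as.vector(data)
  )
}

# Usage with iris; the neuron assignment here is a placeholder
# (with a trained model you would pass iris_som$unit.classif instead)
unit <- rep(1:3, each = 50)
long <- som_to_long(iris[1:4], unit = unit, class = iris$Species)

# A parallel-coordinate plot faceted by neuron, if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(variable, value,
                                          group = id, colour = class)) +
    ggplot2::geom_line() +
    ggplot2::facet_wrap(~ unit)
}
```

Keeping `variable` as a factor with the original column order stops ggplot2 from sorting the axes alphabetically.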

Installing Requirements

    # Install the development version from GitHub (requires devtools)
    devtools::install_github("oldlipe/ggsom")
    # Or install the released version from CRAN
    install.packages("ggsom")
    # kohonen: functions to train self-organising maps (SOMs)
    install.packages("kohonen")
    # ggplot2: the plotting system ggsom builds on
    install.packages("ggplot2")

Basic example of using the ggsom package

    # Load ggplot2
    library(ggplot2)
    # Use the built-in iris dataset
    data(iris)
    # Fix the random seed so the SOM initialization is reproducible
    set.seed(42)
    # Train a 5x5 SOM on the four numeric iris measurements
    iris_som <- kohonen::som(X = as.matrix(iris[1:4]),
                             grid = kohonen::somgrid(xdim = 5,
                                                     ydim = 5,
                                                     neighbourhood.fct = "gaussian",
                                                     topo = "rectangular"),
                             rlen = 100)
    # Plot with ggsom; x_o/y_o and x_e/y_e appear to set the positions
    # of the N and E labels inside each neuron panel
    ggsom::geom_class(iris_som, class = iris$Species,
                      x_o = 1, y_o = 5.8, x_e = 1.1, y_e = 7.4)

The upper-left corner of each neuron's panel shows the number of observations mapped to that neuron (N) and its purity, measured by entropy (E): the lower the entropy, the more the neuron is dominated by a single class.
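The N and E labels can be derived from the neuron assignments of a trained SOM. The sketch below is a hypothetical helper, not ggsom's code; it assumes E is the Shannon entropy (in bits) of the class labels mapped to each neuron, and ggsom may use a different logarithm base or normalization. E is 0 for a neuron whose observations all share one class and grows as classes mix.

```r
# Hypothetical helper (not part of ggsom): observation count N and class
# entropy E (Shannon entropy in bits) for each neuron that received data.
neuron_purity <- function(unit, class) {
  class <- as.factor(class)
  units <- sort(unique(unit))
  rows <- lapply(units, function(u) {
    counts <- table(class[unit == u])   # class frequencies in neuron u
    p <- counts[counts > 0] / sum(counts)
    data.frame(unit = u, N = sum(counts), E = -sum(p * log2(p)))
  })
  do.call(rbind, rows)
}

# Toy example: neuron 1 is pure (E = 0), neuron 2 is an even two-class mix (E = 1)
purity <- neuron_purity(unit = c(1, 1, 2, 2), class = c("a", "a", "a", "b"))
purity
# With a trained model: neuron_purity(iris_som$unit.classif, iris$Species)
```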

Example of customization

    library(ggplot2)
    library(cowplot) # ggplot2 themes and helpers
    theme_set(theme_cowplot())
    # Customize the ggsom plot (iris_som from the basic example) with ggplot2 layers
    ggsom::geom_class(iris_som, class = iris$Species,
                      x_o = 1, y_o = 5.8, x_e = 1.1, y_e = 7.4) +
      labs(x = "Attributes", y = "Values", title = "ggsom plot",
           caption = "Source: Felipe") +
      scale_color_manual(name = "Classes",
                         labels = c("setosa", "versicolor", "virginica"),
                         values = c("#ffd319", "#005500", "#ff0000")) +
      background_grid(minor = "none") +
      panel_border()

Time series example

In this example we use Earth surface temperature data from the Kaggle platform (monthly land temperatures by country). The continent of each country comes from a separate country-to-continent table (countryContinent.csv).

    library(readr)   # read rectangular data
    library(dplyr)   # data manipulation
    library(tidyr)   # reshape data into tidy form
    library(ggplot2)
    library(cowplot) # ggplot2 themes
    theme_set(theme_cowplot())
    # Read the monthly temperature data
    temperature_countries <- readr::read_csv("./example/GlobalLandTemperaturesByCountry.csv")
    # Read the country-to-continent table and keep only the columns we need
    continent <- readr::read_csv("./example/countryContinent.csv") %>%
      dplyr::select(country, continent)
    # Keep records from January 2000 onward and aggregate by annual mean
    # (a simple approach that ignores seasonality; requires lubridate)
    year_temperature <- temperature_countries %>%
      dplyr::filter(dt >= as.Date("2000-01-01")) %>%
      dplyr::mutate(dt = lubridate::year(dt)) %>%
      dplyr::group_by(Country, dt) %>%
      dplyr::summarise(year_mean = mean(AverageTemperature, na.rm = TRUE))
    # Join continents by country name and reshape to one column per year
    final_dataset <- year_temperature %>%
      dplyr::rename(country = Country) %>%
      dplyr::left_join(continent, by = "country") %>%
      dplyr::filter(!is.na(continent)) %>%
      tidyr::pivot_wider(names_from = dt, values_from = year_mean) %>%
      dplyr::select(-`2013`) # 2013 is incomplete in the source data
    # Save the final dataset to inst/extdata (the package ships it, so you can use it directly)
    write.csv(final_dataset, "./inst/extdata/climate_changes_annual.csv")
    # Convert the yearly columns into a numeric matrix
    matrix_temperature <- final_dataset %>%
      dplyr::ungroup() %>%
      dplyr::select(-country, -continent) %>%
      as.matrix()
    # Train a 6x6 SOM
    som_temperature <- kohonen::som(X = matrix_temperature,
                                    grid = kohonen::somgrid(xdim = 6,
                                                            ydim = 6,
                                                            neighbourhood.fct = "gaussian",
                                                            topo = "rectangular"),
                                    rlen = 1000)
    # Plot the trained SOM with ggsom
    ggsom::geom_class(som_temperature, class = final_dataset$continent,
                      x_o = 2.8, y_o = 1.3, x_e = 2.8, y_e = 7.4) +
      labs(x = "Year", y = "Temperature (°C)", title = "ggsom plot") +
      scale_color_manual(name = "Continents",
                         labels = c("Africa", "Americas", "Asia", "Europe", "Oceania"),
                         values = c("#7fc97f", "#beaed4", "#fdc086", "#ffff99", "#386cb0")) +
      background_grid(minor = "none")
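The aggregation step in the pipeline above (monthly readings to one row per country and one column per year) can also be sketched in base R with `tapply()`. The toy `monthly` data frame is made up for illustration and only mimics the `Country`, `dt` and `AverageTemperature` columns of the Kaggle file; `annual_matrix()` is a hypothetical helper.

```r
# Toy monthly data (made up) mimicking GlobalLandTemperaturesByCountry.csv
monthly <- data.frame(
  Country = rep(c("A", "B"), each = 4),
  dt = as.Date(rep(c("1999-12-01", "2000-01-01", "2000-07-01", "2001-01-01"), 2)),
  AverageTemperature = c(5, 1, 3, 2, 7, 10, NA, 12)
)

# Annual mean per country from a start date onward, as a country x year matrix
annual_matrix <- function(df, from = "2000-01-01") {
  df <- df[df$dt >= as.Date(from), ]  # >= keeps January of the start year
  year <- format(df$dt, "%Y")         # calendar year as a string
  tapply(df$AverageTemperature, list(df$Country, year), mean, na.rm = TRUE)
}

m <- annual_matrix(monthly)
m
```

Each row of `m` is one SOM input vector, with country names kept in `rownames(m)`; a country with no data for a year gets `NA` in that cell.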

Acknowledgments

  • Rafael Santos

References

  • Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43(1):59–69.
  • Kohonen, T. (2013). Essentials of the self-organizing map. Neural Networks, 37:52–65.
  • Wehrens, R., & Buydens, L. M. C. (2007). Self- and super-organizing maps in R: the kohonen package. Journal of Statistical Software, 21(5):1–19.
  • Wickham, H. (2011). ggplot2. Wiley Interdisciplinary Reviews: Computational Statistics, 3(2):180–185.