Project author: btskinner

Project description:
Rename and encode variables using external crosswalk files
Language: R
Project URL: git://github.com/btskinner/crosswalkr.git
Created: 2017-10-14T22:46:47Z
Project community: https://github.com/btskinner/crosswalkr

License: Other


crosswalkr

Overview

This package offers a pair of functions, renamefrom() and
encodefrom(), for renaming and encoding variables in data frames using
external crosswalk files. It is especially useful when constructing
master data sets from multiple smaller data sets that do not name or
encode variables consistently across files. It is based on the
renamefrom and encodefrom Stata commands written by Sally Hudson and
team.

Installation

Install the latest release version from CRAN with

    install.packages('crosswalkr')

Install the latest development version from GitHub with

    devtools::install_github('btskinner/crosswalkr')

Usage

    library(crosswalkr)
    library(dplyr)
    library(haven)

    ## starting data frame
    df <- data.frame(state = c('Kentucky','Tennessee','Virginia'),
                     fips = c(21,47,51),
                     region = c('South','South','South'))
    df
    ##       state fips region
    ## 1  Kentucky   21  South
    ## 2 Tennessee   47  South
    ## 3  Virginia   51  South

    ## crosswalk with which to convert old names to new names with labels
    cw <- data.frame(old_name = c('state','fips'),
                     new_name = c('stname','stfips'),
                     label = c('Full state name', 'FIPS code'))
    cw
    ##   old_name new_name           label
    ## 1    state   stname Full state name
    ## 2     fips   stfips       FIPS code
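Because crosswalks usually live in external files, it can help to see the same crosswalk round-tripped through a CSV. The sketch below uses only base R; the temporary path and the name `cw_from_file` are illustrative assumptions, and the final call is commented out because it needs crosswalkr installed.

```r
## same crosswalk as above, with strings kept as character vectors
cw <- data.frame(old_name = c("state", "fips"),
                 new_name = c("stname", "stfips"),
                 label = c("Full state name", "FIPS code"),
                 stringsAsFactors = FALSE)

## write it to a temporary CSV (a stand-in for a project crosswalk file)
cw_path <- tempfile(fileext = ".csv")
write.csv(cw, cw_path, row.names = FALSE)

## read it back and compare with the in-memory version
cw_from_file <- read.csv(cw_path, stringsAsFactors = FALSE)
all.equal(cw, cw_from_file)

## the data frame read from disk can then be used like `cw`:
## df1 <- renamefrom(df, cw_file = cw_from_file, raw = old_name,
##                   clean = new_name, label = label)
```

Keeping the crosswalk in a file means the name mappings can be edited without touching the analysis script. See ?renamefrom for which file types, if any, can be passed to cw_file directly as a path.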

Renaming

Convert old variable names to new names and add labels from crosswalk.

    df1 <- renamefrom(df, cw_file = cw, raw = old_name, clean = new_name,
                      label = label)
    df1
    ##      stname stfips
    ## 1  Kentucky     21
    ## 2 Tennessee     47
    ## 3  Virginia     51

Convert old variable names to new names using old names as labels
(ignoring labels in crosswalk).

    df2 <- renamefrom(df, cw_file = cw, raw = old_name, clean = new_name,
                      name_label = TRUE)
    df2
    ##      stname stfips
    ## 1  Kentucky     21
    ## 2 Tennessee     47
    ## 3  Virginia     51

Convert old variable names to new names, but keep unmatched old names in
the data frame.

    df3 <- renamefrom(df, cw_file = cw, raw = old_name, clean = new_name,
                      drop_extra = FALSE)
    df3
    ##      stname stfips region
    ## 1  Kentucky     21  South
    ## 2 Tennessee     47  South
    ## 3  Virginia     51  South
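To make the renaming logic concrete, here is a minimal base R sketch of the idea: match column names against the crosswalk, attach labels, rename, and optionally drop unmatched columns. The helper `rename_with_crosswalk()` is hypothetical and is not how crosswalkr is implemented. Storing the label in a `label` attribute is likewise an assumption made for illustration.

```r
## hypothetical helper, NOT part of crosswalkr
rename_with_crosswalk <- function(data, cw, drop_extra = TRUE) {
    ## position of each column name among the crosswalk's old names (NA if absent)
    idx <- match(names(data), cw$old_name)
    matched <- !is.na(idx)
    ## attach the crosswalk label to each matched column
    for (j in which(matched)) {
        attr(data[[j]], "label") <- cw$label[idx[j]]
    }
    ## swap in the new names for matched columns
    names(data)[matched] <- cw$new_name[idx[matched]]
    ## optionally drop columns the crosswalk does not mention
    if (drop_extra) data <- data[matched]
    data
}

df <- data.frame(state = c("Kentucky", "Tennessee", "Virginia"),
                 fips = c(21, 47, 51),
                 region = c("South", "South", "South"),
                 stringsAsFactors = FALSE)
cw <- data.frame(old_name = c("state", "fips"),
                 new_name = c("stname", "stfips"),
                 label = c("Full state name", "FIPS code"),
                 stringsAsFactors = FALSE)

out <- rename_with_crosswalk(df, cw)
names(out)                    # "stname" "stfips"
attr(out$stfips, "label")     # "FIPS code"

out_all <- rename_with_crosswalk(df, cw, drop_extra = FALSE)
names(out_all)                # "stname" "stfips" "region"
```

The `drop_extra` argument mirrors the option of the same name shown above: columns without a crosswalk entry are either dropped or kept under their original names.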

Encoding

    ## starting data frame
    df <- data.frame(state = c('Kentucky','Tennessee','Virginia'),
                     stfips = c(21,47,51),
                     cenregnm = c('South','South','South'))
    df
    ##       state stfips cenregnm
    ## 1  Kentucky     21    South
    ## 2 Tennessee     47    South
    ## 3  Virginia     51    South

    ## use state crosswalk data file from package
    cw <- get(data(stcrosswalk))
    cw
    ## # A tibble: 51 x 7
    ##    stfips stabbr stname               cenreg cenregnm  cendiv cendivnm
    ##     <int> <chr>  <chr>                 <int> <chr>      <int> <chr>
    ##  1      1 AL     Alabama                   3 South          6 East South Central
    ##  2      2 AK     Alaska                    4 West           9 Pacific
    ##  3      4 AZ     Arizona                   4 West           8 Mountain
    ##  4      5 AR     Arkansas                  3 South          7 West South Central
    ##  5      6 CA     California                4 West           9 Pacific
    ##  6      8 CO     Colorado                  4 West           8 Mountain
    ##  7      9 CT     Connecticut               1 Northeast      1 New England
    ##  8     10 DE     Delaware                  3 South          5 South Atlantic
    ##  9     11 DC     District of Columbia      3 South          5 South Atlantic
    ## 10     12 FL     Florida                   3 South          5 South Atlantic
    ## # … with 41 more rows

Create a new column with factor-encoded values.

    df$state2 <- encodefrom(df, var = state, cw_file = cw, raw = stname,
                            clean = stfips, label = stabbr)
    df
    ##       state stfips cenregnm state2
    ## 1  Kentucky     21    South     KY
    ## 2 Tennessee     47    South     TN
    ## 3  Virginia     51    South     VA

Create a new column with labelled values. When the input is a tibble,
encodefrom() returns a labelled vector rather than a factor.

    ## convert to tbl_df
    df <- tibble::as_tibble(df)
    df$state3 <- encodefrom(df, var = state, cw_file = cw, raw = stname,
                            clean = stfips, label = stabbr)

Create a new column with factor-encoded values (ignoring the fact that
df is a tibble).

    df$state4 <- encodefrom(df, var = state, cw_file = cw, raw = stname,
                            clean = stfips, label = stabbr,
                            ignore_tibble = TRUE)

Show factors with labels:

    as_factor(df)
    ## # A tibble: 3 x 6
    ##   state     stfips cenregnm state2 state3 state4
    ##   <chr>      <dbl> <chr>    <fct>  <fct>  <fct>
    ## 1 Kentucky      21 South    KY     KY     KY
    ## 2 Tennessee     47 South    TN     TN     TN
    ## 3 Virginia      51 South    VA     VA     VA

Strip the labels to show the underlying values (state3 reverts to its
integer codes, while the true factors are unchanged):

    zap_labels(df)
    ## # A tibble: 3 x 6
    ##   state     stfips cenregnm state2 state3 state4
    ##   <chr>      <dbl> <chr>    <fct>   <int> <fct>
    ## 1 Kentucky      21 South    KY         21 KY
    ## 2 Tennessee     47 South    TN         47 TN
    ## 3 Virginia      51 South    VA         51 VA
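The difference between the two outputs above comes down to how a labelled vector stores its data: the codes are the values, and the labels are metadata. The short sketch below builds such a vector by hand with haven (assumed to be installed) to show both views. The vector `x` is a hand-made stand-in for the state3 column, not output from encodefrom().

```r
## hand-built labelled vector: integer codes plus value labels
x <- haven::labelled(c(21L, 47L, 51L),
                     labels = c(KY = 21L, TN = 47L, VA = 51L))

## view 1: replace codes with their labels, giving an ordinary factor
x_fct <- haven::as_factor(x)
as.character(x_fct)       # "KY" "TN" "VA"

## view 2: drop the labels, leaving the bare integer codes
x_int <- haven::zap_labels(x)
x_int                     # 21 47 51
```

Both views derive from the same object, so a labelled column can be shown with readable labels for reporting while the numeric codes stay available for merging.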