Angrist and Krueger (1991) データセットの整形

パッケージ sketching に含まれていますデータセット AK を整形するコードを関数としてまとめています。

以下の資料を引用参照しています。

  1. Angrist, J. D., & Keueger, A. B. (1991). Does Compulsory School Attendance Affect Schooling and Earnings? The Quarterly Journal of Economics, 106(4), 979–1014.
  2. https://r-packages.io/datasets/AK

始めにデータセット AK を確認します。

library(sketching)
data(AK)
dplyr::glimpse(AK)
Rows: 247,199
Columns: 42
$ EDUC     <int> 11, 12, 12, 12, 16, 12, 14, 9, 12, 17, 17, 16, 8, 10, 9, 12, …
$ LWKLYWGE <dbl> 5.023558, 5.061540, 5.378315, 5.178639, 6.378776, 4.997411, 5…
$ YR20     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ YR21     <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ YR22     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0…
$ YR23     <dbl> 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0…
$ YR24     <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0…
$ YR25     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ YR26     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ YR27     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ YR28     <dbl> 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0…
$ QTR120   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR121   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR122   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR123   <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR124   <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR125   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR126   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR127   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR128   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR129   <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1…
$ QTR220   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR221   <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR222   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0…
$ QTR223   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR224   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR225   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR226   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR227   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR228   <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR229   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR320   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR321   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR322   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR323   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0…
$ QTR324   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR325   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR326   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR327   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QTR328   <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0…
$ QTR329   <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ CNST     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…

AK データセットを次のコードで EDUCLWKLYWGEyearquarterage の5列のデータフレームに整形します。

なお、全ての YR列 および QTR列 が 0 の行の yearquarterNA としています。置き換える場合は、AK[rrr,‘year’] <- NAAK[rrr,‘quarter’] <- NANA を変更してください。

library(dplyr)
function_reshape_angrist_and_krueger_1991_men_born_1920_1929_data <- function(rrr) {
  buf0 <- AK[rrr, grep("YR|QTR", colnames(AK))]
  if ((buf0[1, ] %>% sum()) == 0) {
    AK[rrr, "year"] <- NA
    AK[rrr, "quarter"] <- NA
  } else {
    buf1 <- buf0[, buf0[1, ] == 1, drop = F]
    if ((grepl("QTR", colnames(buf1)) %>% sum()) == 1) {
      AK[rrr, "year"] <- colnames(buf1) %>%
        gsub("QTR([0-9])([0-9]+)$", "\\2", .) %>%
        {
          as.numeric(.) + 1900
        } %>%
        na.omit()
      AK[rrr, "quarter"] <- colnames(buf1) %>%
        gsub("^QTR([0-9])([0-9]+)$", "\\1", .) %>%
        as.numeric(.) %>%
        na.omit()
    } else {
      AK[rrr, "year"] <- colnames(buf1) %>%
        gsub("^YR([0-9]+)$", "\\1", .) %>%
        {
          as.numeric(.) + 1900
        } %>%
        na.omit()
      AK[rrr, "quarter"] <- 4
    }
  }
  AK[rrr, "age"] <- (1970 - AK[rrr, "year"]) - round((AK[rrr, "quarter"] - 1) / 4, 1)
  # print(rrr)
  AK[rrr, c("EDUC", "LWKLYWGE", "year", "quarter", "age")]
}

行番号1から10までを整形します。

lapply(1:10, FUN = function(rrr) function_reshape_angrist_and_krueger_1991_men_born_1920_1929_data(rrr)) %>% Reduce(function(x, y) rbind(x, y), .)
   EDUC LWKLYWGE year quarter  age
1    11 5.023558 1929       3 40.5
2    12 5.061540 1929       1 41.0
3    12 5.378315 1928       3 41.5
4    12 5.178639 1923       4 46.2
5    16 6.378776 1924       1 46.0
6    12 4.997411 1923       1 47.0
7    14 5.273452 1921       2 48.8
8     9 5.061540 1928       2 41.8
9    12 6.827130 1922       3 47.5
10   17 5.835241 1925       2 44.8

行番号41から50までを整形します。

lapply(41:50, FUN = function(rrr) function_reshape_angrist_and_krueger_1991_men_born_1920_1929_data(rrr)) %>% Reduce(function(x, y) rbind(x, y), .)
   EDUC LWKLYWGE year quarter  age
41   18 5.969134 1921       2 48.8
42    9 5.232419 1921       3 48.5
43   12 6.196784   NA      NA   NA
44   18 6.113574 1921       1 49.0
45   11 5.061540 1927       1 43.0
46   12 5.337704 1927       2 42.8
47    9 4.554805 1921       1 49.0
48   11 4.934970 1927       4 42.2
49    5 4.826167 1921       3 48.5
50   13 5.422415 1924       3 45.5

行番号3961から3970までを整形します。

lapply(3961:3970, FUN = function(rrr) function_reshape_angrist_and_krueger_1991_men_born_1920_1929_data(rrr)) %>% Reduce(function(x, y) rbind(x, y), .)
     EDUC LWKLYWGE year quarter  age
3961    8 5.396253 1928       1 42.0
3962    7 4.634062 1921       1 49.0
3963   10 4.558491   NA      NA   NA
3964    8 4.270520 1925       1 45.0
3965    2 4.154431 1925       2 44.8
3966    1 2.083743 1923       4 46.2
3967    8 4.775905 1929       3 40.5
3968   11 4.824306 1922       4 47.2
3969   13 4.554805 1928       4 41.2
3970   11 6.224345 1926       2 43.8

行番号83001から83010までを整形します。

lapply(83001:83010, FUN = function(rrr) function_reshape_angrist_and_krueger_1991_men_born_1920_1929_data(rrr)) %>% Reduce(function(x, y) rbind(x, y), .)
      EDUC LWKLYWGE year quarter  age
83001    9 4.830518 1921       2 48.8
83002    8 4.707497 1925       3 44.5
83003   10 5.378315 1927       2 42.8
83004    8 4.424785 1921       4 48.2
83005    5 5.087951 1926       1 44.0
83006    1 4.249972   NA      NA   NA
83007   12 5.061540 1927       3 42.5
83008   12 5.776630 1926       4 43.2
83009    6 5.061540 1922       1 48.0
83010   10 4.249972 1925       3 44.5

行番号1、100000および200000を整形します。

lapply(c(1, 100000, 200000), FUN = function(rrr) function_reshape_angrist_and_krueger_1991_men_born_1920_1929_data(rrr)) %>% Reduce(function(x, y) rbind(x, y), .)
       EDUC LWKLYWGE year quarter  age
1        11 5.023558 1929       3 40.5
100000   12 5.073886 1928       2 41.8
200000   11 4.725042 1921       2 48.8

以上です。