purrr Like a Kitten till the Lake Pipes RoaR

I really should make a minimal effort to resist opening a data analysis blog post with Beach Boys’ lyrics, but this time the combination is too apt. We use the purrr package to show how to let your pipes roar in R.

The tidyverse GitHub site contains a simple example illustrating how well pipes and purrr work together. For more learning, try Jenny Bryan’s purrr tutorial.

First, load the tidyverse. purrr is one of its core packages, so the explicit library(purrr) call is redundant but harmless.

library(tidyverse)
library(purrr)
#devtools::install_github("jennybc/repurrrsive")
#library(repurrrsive)


If you want to be more adventurous, you can install Jenny’s repurrrsive package from GitHub. The code for that is commented out in the chunk above.

You Don’t Know What I Got

The great thing about using pipes is that you can chain a series of steps together to produce exactly the output you want. In this case, we start with the base R dataset airquality and apply a sequence of functions to get, for each month from May through September, the adjusted R-squared value of a linear regression of Ozone on Temp.

Here is what the full process looks like, from beginning to end.

airquality %>% 
    split(.$Month) %>% 
    map(~ lm(Ozone ~ Temp, data = .)) %>% 
    map(summary) %>% 
    map_dbl('adj.r.squared')

     5      6      7      8      9 
0.2781 0.3676 0.5024 0.3307 0.6742

The problem, of course, is the output generated by the intermediate steps stays behind the scenes. For a beginner, this can be a bit confusing because it isn’t clear what is going on. So let’s break the full chunk into its constituent pieces so we can see what purrr is doing and how pipes tie the whole thing together.

Start at the beginning and take things step-by-step.

Print the airquality data set to see what we are dealing with. We can see the data is collected daily for five months. Ozone looks to be the target, so it is natural to wonder if there is any relationship between it and the other variables.

airquality

    Ozone Solar.R Wind Temp Month Day
1      41     190  7.4   67     5   1
2      36     118  8.0   72     5   2
3      12     149 12.6   74     5   3
4      18     313 11.5   62     5   4
5      NA      NA 14.3   56     5   5
6      28      NA 14.9   66     5   6
7      23     299  8.6   65     5   7
8      19      99 13.8   59     5   8
9       8      19 20.1   61     5   9
10     NA     194  8.6   69     5  10
11      7      NA  6.9   74     5  11
12     16     256  9.7   69     5  12
13     11     290  9.2   66     5  13
14     14     274 10.9   68     5  14
15     18      65 13.2   58     5  15
16     14     334 11.5   64     5  16
17     34     307 12.0   66     5  17
18      6      78 18.4   57     5  18
19     30     322 11.5   68     5  19
20     11      44  9.7   62     5  20
21      1       8  9.7   59     5  21
22     11     320 16.6   73     5  22
23      4      25  9.7   61     5  23
24     32      92 12.0   61     5  24
25     NA      66 16.6   57     5  25
26     NA     266 14.9   58     5  26
27     NA      NA  8.0   57     5  27
28     23      13 12.0   67     5  28
29     45     252 14.9   81     5  29
30    115     223  5.7   79     5  30
31     37     279  7.4   76     5  31
32     NA     286  8.6   78     6   1
33     NA     287  9.7   74     6   2
34     NA     242 16.1   67     6   3
35     NA     186  9.2   84     6   4
36     NA     220  8.6   85     6   5
37     NA     264 14.3   79     6   6
38     29     127  9.7   82     6   7
39     NA     273  6.9   87     6   8
40     71     291 13.8   90     6   9
41     39     323 11.5   87     6  10
42     NA     259 10.9   93     6  11
43     NA     250  9.2   92     6  12
44     23     148  8.0   82     6  13
45     NA     332 13.8   80     6  14
46     NA     322 11.5   79     6  15
47     21     191 14.9   77     6  16
48     37     284 20.7   72     6  17
49     20      37  9.2   65     6  18
50     12     120 11.5   73     6  19
51     13     137 10.3   76     6  20
52     NA     150  6.3   77     6  21
53     NA      59  1.7   76     6  22
54     NA      91  4.6   76     6  23
55     NA     250  6.3   76     6  24
56     NA     135  8.0   75     6  25
57     NA     127  8.0   78     6  26
58     NA      47 10.3   73     6  27
59     NA      98 11.5   80     6  28
60     NA      31 14.9   77     6  29
61     NA     138  8.0   83     6  30
62    135     269  4.1   84     7   1
63     49     248  9.2   85     7   2
64     32     236  9.2   81     7   3
65     NA     101 10.9   84     7   4
66     64     175  4.6   83     7   5
67     40     314 10.9   83     7   6
68     77     276  5.1   88     7   7
69     97     267  6.3   92     7   8
70     97     272  5.7   92     7   9
71     85     175  7.4   89     7  10
72     NA     139  8.6   82     7  11
73     10     264 14.3   73     7  12
74     27     175 14.9   81     7  13
75     NA     291 14.9   91     7  14
76      7      48 14.3   80     7  15
77     48     260  6.9   81     7  16
78     35     274 10.3   82     7  17
79     61     285  6.3   84     7  18
80     79     187  5.1   87     7  19
81     63     220 11.5   85     7  20
82     16       7  6.9   74     7  21
83     NA     258  9.7   81     7  22
84     NA     295 11.5   82     7  23
85     80     294  8.6   86     7  24
86    108     223  8.0   85     7  25
87     20      81  8.6   82     7  26
88     52      82 12.0   86     7  27
89     82     213  7.4   88     7  28
90     50     275  7.4   86     7  29
91     64     253  7.4   83     7  30
92     59     254  9.2   81     7  31
93     39      83  6.9   81     8   1
94      9      24 13.8   81     8   2
95     16      77  7.4   82     8   3
96     78      NA  6.9   86     8   4
97     35      NA  7.4   85     8   5
98     66      NA  4.6   87     8   6
99    122     255  4.0   89     8   7
100    89     229 10.3   90     8   8
101   110     207  8.0   90     8   9
102    NA     222  8.6   92     8  10
103    NA     137 11.5   86     8  11
104    44     192 11.5   86     8  12
105    28     273 11.5   82     8  13
106    65     157  9.7   80     8  14
107    NA      64 11.5   79     8  15
108    22      71 10.3   77     8  16
109    59      51  6.3   79     8  17
110    23     115  7.4   76     8  18
111    31     244 10.9   78     8  19
112    44     190 10.3   78     8  20
113    21     259 15.5   77     8  21
114     9      36 14.3   72     8  22
115    NA     255 12.6   75     8  23
116    45     212  9.7   79     8  24
117   168     238  3.4   81     8  25
118    73     215  8.0   86     8  26
119    NA     153  5.7   88     8  27
120    76     203  9.7   97     8  28
121   118     225  2.3   94     8  29
122    84     237  6.3   96     8  30
123    85     188  6.3   94     8  31
124    96     167  6.9   91     9   1
125    78     197  5.1   92     9   2
126    73     183  2.8   93     9   3
127    91     189  4.6   93     9   4
128    47      95  7.4   87     9   5
129    32      92 15.5   84     9   6
130    20     252 10.9   80     9   7
131    23     220 10.3   78     9   8
132    21     230 10.9   75     9   9
133    24     259  9.7   73     9  10
134    44     236 14.9   81     9  11
135    21     259 15.5   76     9  12
136    28     238  6.3   77     9  13
137     9      24 10.9   71     9  14
138    13     112 11.5   71     9  15
139    46     237  6.9   78     9  16
140    18     224 13.8   67     9  17
141    13      27 10.3   76     9  18
142    24     238 10.3   68     9  19
143    16     201  8.0   82     9  20
144    13     238 12.6   64     9  21
145    23      14  9.2   71     9  22
146    36     139 10.3   81     9  23
147     7      49 10.3   69     9  24
148    14      20 16.6   63     9  25
149    30     193  6.9   70     9  26
150    NA     145 13.2   77     9  27
151    14     191 14.3   75     9  28
152    18     131  8.0   76     9  29
153    20     223 11.5   68     9  30

The first step is to break the data up by month, so we make use of base R’s split() function. Notice that the result is a list of data frames, one for each month.

airquality %>% 
    split(.$Month)

$`5`
   Ozone Solar.R Wind Temp Month Day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
7     23     299  8.6   65     5   7
8     19      99 13.8   59     5   8
9      8      19 20.1   61     5   9
10    NA     194  8.6   69     5  10
11     7      NA  6.9   74     5  11
12    16     256  9.7   69     5  12
13    11     290  9.2   66     5  13
14    14     274 10.9   68     5  14
15    18      65 13.2   58     5  15
16    14     334 11.5   64     5  16
17    34     307 12.0   66     5  17
18     6      78 18.4   57     5  18
19    30     322 11.5   68     5  19
20    11      44  9.7   62     5  20
21     1       8  9.7   59     5  21
22    11     320 16.6   73     5  22
23     4      25  9.7   61     5  23
24    32      92 12.0   61     5  24
25    NA      66 16.6   57     5  25
26    NA     266 14.9   58     5  26
27    NA      NA  8.0   57     5  27
28    23      13 12.0   67     5  28
29    45     252 14.9   81     5  29
30   115     223  5.7   79     5  30
31    37     279  7.4   76     5  31

$`6`
   Ozone Solar.R Wind Temp Month Day
32    NA     286  8.6   78     6   1
33    NA     287  9.7   74     6   2
34    NA     242 16.1   67     6   3
35    NA     186  9.2   84     6   4
36    NA     220  8.6   85     6   5
37    NA     264 14.3   79     6   6
38    29     127  9.7   82     6   7
39    NA     273  6.9   87     6   8
40    71     291 13.8   90     6   9
41    39     323 11.5   87     6  10
42    NA     259 10.9   93     6  11
43    NA     250  9.2   92     6  12
44    23     148  8.0   82     6  13
45    NA     332 13.8   80     6  14
46    NA     322 11.5   79     6  15
47    21     191 14.9   77     6  16
48    37     284 20.7   72     6  17
49    20      37  9.2   65     6  18
50    12     120 11.5   73     6  19
51    13     137 10.3   76     6  20
52    NA     150  6.3   77     6  21
53    NA      59  1.7   76     6  22
54    NA      91  4.6   76     6  23
55    NA     250  6.3   76     6  24
56    NA     135  8.0   75     6  25
57    NA     127  8.0   78     6  26
58    NA      47 10.3   73     6  27
59    NA      98 11.5   80     6  28
60    NA      31 14.9   77     6  29
61    NA     138  8.0   83     6  30

$`7`
   Ozone Solar.R Wind Temp Month Day
62   135     269  4.1   84     7   1
63    49     248  9.2   85     7   2
64    32     236  9.2   81     7   3
65    NA     101 10.9   84     7   4
66    64     175  4.6   83     7   5
67    40     314 10.9   83     7   6
68    77     276  5.1   88     7   7
69    97     267  6.3   92     7   8
70    97     272  5.7   92     7   9
71    85     175  7.4   89     7  10
72    NA     139  8.6   82     7  11
73    10     264 14.3   73     7  12
74    27     175 14.9   81     7  13
75    NA     291 14.9   91     7  14
76     7      48 14.3   80     7  15
77    48     260  6.9   81     7  16
78    35     274 10.3   82     7  17
79    61     285  6.3   84     7  18
80    79     187  5.1   87     7  19
81    63     220 11.5   85     7  20
82    16       7  6.9   74     7  21
83    NA     258  9.7   81     7  22
84    NA     295 11.5   82     7  23
85    80     294  8.6   86     7  24
86   108     223  8.0   85     7  25
87    20      81  8.6   82     7  26
88    52      82 12.0   86     7  27
89    82     213  7.4   88     7  28
90    50     275  7.4   86     7  29
91    64     253  7.4   83     7  30
92    59     254  9.2   81     7  31

$`8`
    Ozone Solar.R Wind Temp Month Day
93     39      83  6.9   81     8   1
94      9      24 13.8   81     8   2
95     16      77  7.4   82     8   3
96     78      NA  6.9   86     8   4
97     35      NA  7.4   85     8   5
98     66      NA  4.6   87     8   6
99    122     255  4.0   89     8   7
100    89     229 10.3   90     8   8
101   110     207  8.0   90     8   9
102    NA     222  8.6   92     8  10
103    NA     137 11.5   86     8  11
104    44     192 11.5   86     8  12
105    28     273 11.5   82     8  13
106    65     157  9.7   80     8  14
107    NA      64 11.5   79     8  15
108    22      71 10.3   77     8  16
109    59      51  6.3   79     8  17
110    23     115  7.4   76     8  18
111    31     244 10.9   78     8  19
112    44     190 10.3   78     8  20
113    21     259 15.5   77     8  21
114     9      36 14.3   72     8  22
115    NA     255 12.6   75     8  23
116    45     212  9.7   79     8  24
117   168     238  3.4   81     8  25
118    73     215  8.0   86     8  26
119    NA     153  5.7   88     8  27
120    76     203  9.7   97     8  28
121   118     225  2.3   94     8  29
122    84     237  6.3   96     8  30
123    85     188  6.3   94     8  31

$`9`
    Ozone Solar.R Wind Temp Month Day
124    96     167  6.9   91     9   1
125    78     197  5.1   92     9   2
126    73     183  2.8   93     9   3
127    91     189  4.6   93     9   4
128    47      95  7.4   87     9   5
129    32      92 15.5   84     9   6
130    20     252 10.9   80     9   7
131    23     220 10.3   78     9   8
132    21     230 10.9   75     9   9
133    24     259  9.7   73     9  10
134    44     236 14.9   81     9  11
135    21     259 15.5   76     9  12
136    28     238  6.3   77     9  13
137     9      24 10.9   71     9  14
138    13     112 11.5   71     9  15
139    46     237  6.9   78     9  16
140    18     224 13.8   67     9  17
141    13      27 10.3   76     9  18
142    24     238 10.3   68     9  19
143    16     201  8.0   82     9  20
144    13     238 12.6   64     9  21
145    23      14  9.2   71     9  22
146    36     139 10.3   81     9  23
147     7      49 10.3   69     9  24
148    14      20 16.6   63     9  25
149    30     193  6.9   70     9  26
150    NA     145 13.2   77     9  27
151    14     191 14.3   75     9  28
152    18     131  8.0   76     9  29
153    20     223 11.5   68     9  30
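
To check what split() handed back without scrolling through the full printout, a quick inspection along these lines may help. This is only a sketch, and the name by_month is mine rather than part of the original pipeline.

```r
library(purrr)

# split() returns a named list of data frames, one element per value of Month.
# by_month is an illustrative name, not something from the original post.
by_month <- split(airquality, airquality$Month)

class(by_month)           # "list"
names(by_month)           # "5" "6" "7" "8" "9"

# Number of daily rows in each monthly data frame.
rows_per_month <- map_int(by_month, nrow)
rows_per_month
```

The names of the list ("5" through "9") are what later show up as the labels on the final output.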

In the second step, we use the purrr map() function to fit a linear regression with lm() to each monthly group. We want the adjusted R-squared value for each month for Ozone. I played around with the variables a bit to find which predictor illustrated the adjusted R-squared values best and settled on Temp, but you can choose any other besides Month and Day.

The map() command applies the lm() function to each monthly group and yields the typical lm() output for each month. We now have five linear regression models, one for each month, but no adjusted R-squared values.

airquality %>% 
    split(.$Month) %>% 
    map(~ lm(Ozone ~ Temp, data = .))

$`5`

Call:
lm(formula = Ozone ~ Temp, data = .)

Coefficients:
(Intercept)         Temp  
    -102.16         1.88  


$`6`

Call:
lm(formula = Ozone ~ Temp, data = .)

Coefficients:
(Intercept)         Temp  
     -91.99         1.55  


$`7`

Call:
lm(formula = Ozone ~ Temp, data = .)

Coefficients:
(Intercept)         Temp  
    -372.92         5.15  


$`8`

Call:
lm(formula = Ozone ~ Temp, data = .)

Coefficients:
(Intercept)         Temp  
    -238.86         3.56  


$`9`

Call:
lm(formula = Ozone ~ Temp, data = .)

Coefficients:
(Intercept)         Temp  
    -149.35         2.35  
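
If the ~ and . shorthand inside map() looks cryptic, it may help to see it next to an ordinary anonymous function. The sketch below assumes the two forms are interchangeable here; the names by_month, fits_formula, fits_function and df are mine.

```r
library(purrr)

by_month <- split(airquality, airquality$Month)

# Formula shorthand: "." stands for the current list element (one month's data).
fits_formula <- map(by_month, ~ lm(Ozone ~ Temp, data = .))

# The same step written as an explicit anonymous function.
fits_function <- map(by_month, function(df) lm(Ozone ~ Temp, data = df))

# Both versions should produce the same coefficients for every month.
same_coefs <- all.equal(map(fits_formula, coef), map(fits_function, coef))
same_coefs
```

The explicit function is longer, but it makes clear that map() simply calls lm() once per month and collects the five fitted models in a list.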

To generate the adjusted R-squared values, we need to map() the summary() command to each group. This is the third step. Again, the results are familiar. We have the typical summary of the linear model, one for each month.

airquality %>% 
    split(.$Month) %>% 
    map(~ lm(Ozone ~ Temp, data = .)) %>%
    map(summary)

$`5`

Call:
lm(formula = Ozone ~ Temp, data = .)

Residuals:
   Min     1Q Median     3Q    Max 
-30.32  -8.62  -2.41   5.32  68.26 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) -102.159     38.750   -2.64   0.0145 * 
Temp           1.885      0.578    3.26   0.0033 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 18.9 on 24 degrees of freedom
  (5 observations deleted due to missingness)
Multiple R-squared:  0.307, Adjusted R-squared:  0.278 
F-statistic: 10.6 on 1 and 24 DF,  p-value: 0.00331


$`6`

Call:
lm(formula = Ozone ~ Temp, data = .)

Residuals:
   Min     1Q Median     3Q    Max 
-12.99  -9.34  -6.31  11.08  23.27 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -91.991     51.312   -1.79    0.116  
Temp           1.552      0.653    2.38    0.049 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 14.5 on 7 degrees of freedom
  (21 observations deleted due to missingness)
Multiple R-squared:  0.447, Adjusted R-squared:  0.368 
F-statistic: 5.65 on 1 and 7 DF,  p-value: 0.0491


$`7`

Call:
lm(formula = Ozone ~ Temp, data = .)

Residuals:
   Min     1Q Median     3Q    Max 
-32.11 -14.52  -1.16   7.58  75.29 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -372.92      84.45   -4.42  0.00018 ***
Temp            5.15       1.01    5.12    3e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 22.3 on 24 degrees of freedom
  (5 observations deleted due to missingness)
Multiple R-squared:  0.522, Adjusted R-squared:  0.502 
F-statistic: 26.2 on 1 and 24 DF,  p-value: 3.05e-05


$`8`

Call:
lm(formula = Ozone ~ Temp, data = .)

Residuals:
   Min     1Q Median     3Q    Max 
-40.42 -17.65  -8.07   9.97 118.58 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) -238.861     82.023   -2.91   0.0076 **
Temp           3.559      0.974    3.65   0.0013 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 32.5 on 24 degrees of freedom
  (5 observations deleted due to missingness)
Multiple R-squared:  0.357, Adjusted R-squared:  0.331 
F-statistic: 13.4 on 1 and 24 DF,  p-value: 0.00126


$`9`

Call:
lm(formula = Ozone ~ Temp, data = .)

Residuals:
   Min     1Q Median     3Q    Max 
-27.45  -8.59  -3.69  11.04  31.39 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -149.347     23.688   -6.30  9.5e-07 ***
Temp           2.351      0.306    7.68  2.9e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 13.8 on 27 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.686, Adjusted R-squared:  0.674 
F-statistic: 58.9 on 1 and 27 DF,  p-value: 2.95e-08
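
Each of these summaries is a list underneath the printed output, which is why the adjusted R-squared value can be pulled out by name in the next step. Here is a small sketch for May alone; the names may and fit_summary are illustrative.

```r
# Fit the model for May only and keep its summary object.
may <- subset(airquality, Month == 5)
fit_summary <- summary(lm(Ozone ~ Temp, data = may))

# The summary is a list; adj.r.squared is one of its named elements.
has_adj <- "adj.r.squared" %in% names(fit_summary)
has_adj

# Extract the single value by name.
fit_summary$adj.r.squared
```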

She Blows em Outta the Water Like You Never Seen

Now things get interesting, because we can put it all together.

Step 4 involves using the specialized map_dbl() command to pull the adjusted R-squared value from each month’s linear model summary and return them all as a single named numeric vector.

How do we know the adjusted R-squared value is a double? Well, it looks like one, since it is a number with a fractional part. But we could also find out by trial and error. If we were to use the map_int() command, we would get an error that tells us the value is a double. If we guessed character, we could use map_chr(). That would run, but we would have the output in character form, which isn’t what we want.

So we can recognize its data form or we can figure it out through trial and error. Either way we end up at the same place, back where we started.

airquality %>% 
    split(.$Month) %>% 
    map(~ lm(Ozone ~ Temp, data = .)) %>% 
    map(summary) %>% 
    map_dbl('adj.r.squared')

     5      6      7      8      9 
0.2781 0.3676 0.5024 0.3307 0.6742
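
Rather than guessing the type, you can also ask R directly. The following is a sketch of that check, with the map_int() failure caught by tryCatch() so the script keeps running. The names by_month, summaries, adj_r2 and int_attempt are my own.

```r
library(purrr)

by_month  <- split(airquality, airquality$Month)
summaries <- map(by_month, function(df) summary(lm(Ozone ~ Temp, data = df)))

# Inspect the type of one extracted value before picking a map_*() variant.
value_type <- typeof(summaries[["5"]]$adj.r.squared)
value_type

# Passing a string to map_dbl() extracts the element of that name from each summary.
adj_r2 <- map_dbl(summaries, "adj.r.squared")
adj_r2

# map_int() will not silently truncate doubles; record the failure instead of stopping.
int_attempt <- tryCatch(
  map_int(summaries, "adj.r.squared"),
  error = function(e) "error"
)
int_attempt
```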

And if That Ain’t Enough to Make You Flip Your Lid

That’s one more thing I’ve got to rethink, daddy.

UPDATE, 3 May 2018

Fellow R blogger Chuck Powell suggests this neat twist on the theme, which yields a more complete statistical table, changing only the last line. I like it. Thanks Chuck!

airquality %>% 
    split(.$Month) %>% 
    map(~ lm(Ozone ~ Temp, data = .)) %>% 
    map(summary) %>% 
    map_dfr(~ broom::glance(.), .id = "Month")

  Month r.squared adj.r.squared sigma statistic   p.value df
1     5    0.3070        0.2781 18.88    10.632 3.315e-03  2
2     6    0.4467        0.3676 14.48     5.651 4.909e-02  2
3     7    0.5223        0.5024 22.32    26.241 3.048e-05  2
4     8    0.3575        0.3307 32.46    13.353 1.256e-03  2
5     9    0.6858        0.6742 13.78    58.942 2.945e-08  2
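
If broom is not installed, a smaller version of this table can be assembled from the summary objects with map_dbl() alone. This is only a sketch of an alternative, not Chuck’s method, and it covers just three of the columns. The names summaries and tab are mine.

```r
library(purrr)

by_month  <- split(airquality, airquality$Month)
summaries <- map(by_month, function(df) summary(lm(Ozone ~ Temp, data = df)))

# Pull a few named elements out of each summary and bind them into one data frame.
tab <- data.frame(
  Month         = names(summaries),
  r.squared     = map_dbl(summaries, "r.squared"),
  adj.r.squared = map_dbl(summaries, "adj.r.squared"),
  sigma         = map_dbl(summaries, "sigma"),
  row.names     = NULL
)
tab
```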
