Recently I read an article by Megan McArdle about income inequality in which she speculated that the income share of the top 1% declines during a recession. She posted a graph:
Income share of the top 1% in the US from 1913
I wasn't a fan of the graph. The tick marks distracted from the data being presented, and recessions were not highlighted, as they are on graphs from places like FRED. Fortunately, the data behind the graph is available, so we can take a run at it using R and ggplot.

Data:

- Inequality data is available from Saez's website: Data updated through 2008 (.xls). Look for the sheet "data-Fig2", which has the top 1% income share including capital gains for 1913-2008.
- Recession data comes from FRED using the series code USREC, although a list that can be copy-pasted into R/SAS is available at the FRED FAQ page.
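
If you start from a monthly indicator series rather than the ready-made list, the recession periods can be derived with run-length encoding. The sketch below is illustrative only: the `usrec` data frame is a small made-up sample (not real FRED data), its column names `date` and `value` are assumptions, and it assumes a 0/1 monthly indicator. It simply treats the first and last flagged month of each run as the start and end, which may not match the peak/trough convention of the FAQ list.

```r
# Made-up monthly indicator: 1 = recession month, 0 = otherwise (not real data)
usrec <- data.frame(
  date  = seq(as.Date("2000-01-01"), by = "month", length.out = 12),
  value = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0)
)

# Run-length encode the indicator to find consecutive runs of 0s and 1s
runs   <- rle(usrec$value)
ends   <- cumsum(runs$lengths)      # index of the last month of each run
starts <- ends - runs$lengths + 1   # index of the first month of each run

# Keep only the runs where the indicator equals 1
is_rec <- runs$values == 1
periods <- data.frame(
  Start = usrec$date[starts[is_rec]],
  End   = usrec$date[ends[is_rec]]
)
periods
```

The resulting `periods` data frame has one row per run of flagged months, which is the shape that a rectangle layer needs for its left and right edges.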

The inspiration for this graph comes from a post about using R+ggplot to show recession bars and silver prices.  I thought the addition of recession bars and elimination of chartjunk would be nice.

The code is below. It includes all the data in a self-contained package and outputs a nice anti-aliased PNG using ggsave:

#load dependencies (library() stops with an error if ggplot2 is missing)
library(ggplot2)
 
 
#grab recession data and put it into a dataframe recessions.df
recessions.df = read.table(textConnection(
  "Peak, Trough
1857-06-01, 1858-12-01
  1860-10-01, 1861-06-01
  1865-04-01, 1867-12-01
  1869-06-01, 1870-12-01
  1873-10-01, 1879-03-01
  1882-03-01, 1885-05-01
  1887-03-01, 1888-04-01
  1890-07-01, 1891-05-01
  1893-01-01, 1894-06-01
  1895-12-01, 1897-06-01
  1899-06-01, 1900-12-01
  1902-09-01, 1904-08-01
  1907-05-01, 1908-06-01
  1910-01-01, 1912-01-01
  1913-01-01, 1914-12-01
  1918-08-01, 1919-03-01
  1920-01-01, 1921-07-01
  1923-05-01, 1924-07-01
  1926-10-01, 1927-11-01
  1929-08-01, 1933-03-01
  1937-05-01, 1938-06-01
  1945-02-01, 1945-10-01
  1948-11-01, 1949-10-01
  1953-07-01, 1954-05-01
  1957-08-01, 1958-04-01
  1960-04-01, 1961-02-01
  1969-12-01, 1970-11-01
  1973-11-01, 1975-03-01
  1980-01-01, 1980-07-01
  1981-07-01, 1982-11-01
  1990-07-01, 1991-03-01
  2001-03-01, 2001-11-01
  2007-12-01, 2009-06-01"), sep=',',
colClasses=c('Date', 'Date'), header=TRUE,
strip.white=TRUE) #strip the spaces around each date before conversion
 
#grab inequality data and put it into a dataframe inequality.df
inequality.df = read.table(textConnection(
  "year,percent
1913,0.18
1914,0.182
1915,0.176
1916,0.193
1917,0.177
1918,0.16
1919,0.164
1920,0.148
1921,0.156
1922,0.171
1923,0.156
1924,0.174
1925,0.202
1926,0.199
1927,0.21
1928,0.239
1929,0.224
1930,0.172
1931,0.155
1932,0.156
1933,0.165
1934,0.164
1935,0.167
1936,0.193
1937,0.171
1938,0.158
1939,0.162
1940,0.165
1941,0.158
1942,0.134
1943,0.123
1944,0.113
1945,0.125
1946,0.133
1947,0.12
1948,0.122
1949,0.117
1950,0.128
1951,0.118
1952,0.108
1953,0.099
1954,0.108
1955,0.111
1956,0.107
1957,0.102
1958,0.102
1959,0.106
1960,0.1
1961,0.106
1962,0.099
1963,0.099
1964,0.105
1965,0.109
1966,0.102
1967,0.107
1968,0.112
1969,0.104
1970,0.09
1971,0.094
1972,0.096
1973,0.092
1974,0.091
1975,0.089
1976,0.089
1977,0.09
1978,0.09
1979,0.1
1980,0.1
1981,0.1
1982,0.108
1983,0.116
1984,0.12
1985,0.127
1986,0.159
1987,0.127
1988,0.155
1989,0.145
1990,0.143
1991,0.134
1992,0.147
1993,0.142
1994,0.142
1995,0.152
1996,0.167
1997,0.18
1998,0.191
1999,0.2
2000,0.215
2001,0.182
2002,0.169
2003,0.175
2004,0.198
2005,0.219
2006,0.228
2007,0.235
2008,0.209"), sep=',', colClasses=c('numeric', 'numeric'), header=TRUE)
 
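#trim recessions to those starting in or after the first year of the inequality data
recessions.trim <- subset(recessions.df, as.numeric(format(Peak, "%Y")) >= min(inequality.df$year))
 
#build a ggplot object with the inequality data. Note: the x axis is a "date" scale, not "continuous"
#strptime() with only %Y fills in the current month and day for each year
g <- ggplot(inequality.df) + geom_line(aes(x=as.Date(strptime(year, format="%Y")), y=percent)) + theme_grey()
#date_labels/date_breaks replace the format/major/minor arguments of older ggplot2 releases
g <- g + scale_x_date('Year', date_labels="%Y", date_breaks="5 years", date_minor_breaks="1 year")
#labels=scales::percent replaces formatter='percent'
g <- g + scale_y_continuous('Percent of total income', labels=scales::percent)
 
#add recession boxes spanning the full height of the panel
g <- g + geom_rect(data=recessions.trim, aes(xmin=Peak, xmax=Trough, ymin=-Inf, ymax=Inf), fill='blue', alpha=0.2)
 
#add title (ggtitle() replaces opts(title=...))
g <- g + ggtitle("Income share of top 1%\n1913-2008")
#print graph
print(g)
#save the graph as a PNG
ggsave('output.png', plot=g)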

The lessons from this project partly involve the grammar of ggplot2 but, more importantly, how to use data types and axis scaling. I originally made this graph at 6am on a Friday morning. As a shortcut, I simply collapsed the recession data into yearly data so that I had two vectors of numeric data. In this more carefully thought-out version, I use scale_x_date() instead of scale_x_continuous(). This gave me better control over axis formatting, at the cost of having to figure out how to convert both %Y and %Y-%m dates to POSIX dates using strptime(year, format = "%Y"). as.Date() with its default formats, like the other functions I tried, would not accept a date without a month and day.
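
The year-to-date conversion discussed above can also be made explicit about the month and day. The following is a minimal sketch, not the code used for the graph: it assumes each yearly observation should sit on January 1 of its year, and the helper name `year_to_date` is made up for illustration.

```r
# Hypothetical helper: turn a bare year (numeric or character) into a Date
# on January 1 of that year. The choice of January 1 is an assumption.
year_to_date <- function(year) {
  as.Date(paste0(year, "-01-01"), format = "%Y-%m-%d")
}

years <- c(1913, 1929, 2008)
dates <- year_to_date(years)

# Show the converted dates as text
format(dates, "%Y-%m-%d")
# "1913-01-01" "1929-01-01" "2008-01-01"
```

Because strptime() with only %Y fills in the current month and day, the plotted points shift slightly within each year depending on when the script is run. Pinning the month and day avoids that.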

In the end, instead of fussing with PNG graphics devices, which lack anti-aliasing under Windows, I used ggplot's built-in function ggsave(), and was very happy with the result:

Final graph of income share of the top 1% with recession bars