Recently I read an article by Megan McArdle about income inequality in which she speculated that the top 1%'s share of income gets worse during a recession. She posted a graph:

I wasn't a fan of the graph. The tick marks distracted from the data being presented, and recessions were not highlighted, as they are on graphs from places like FRED. Fortunately, the data behind the graph is available, so we can take a run at it using R and ggplot2.
Data:
- Inequality data: available from Saez's website, updated through 2008 (.xls). Look for the sheet "data-Fig2" for the top 1% income share, including capital gains, from 1913-2008.
- Recession data: from FRED, series code USREC. A list of peak and trough dates that can be copy-pasted into R or SAS is also available on the FRED FAQ page.
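The FRED FAQ list is already in peak/trough form. If you download the USREC series itself instead, you get a monthly 0/1 indicator that has to be collapsed into start/end pairs before geom_rect() can draw it. Below is a minimal base-R sketch, assuming the download has been read into a data frame with columns DATE and USREC; the values are made up for illustration, and indicator_to_intervals() is a hypothetical helper name, not part of any package.

```r
# Hypothetical input shaped like FRED's USREC download (DATE, USREC).
# These 0/1 values are invented for illustration only.
usrec <- data.frame(
  DATE  = seq(as.Date("2000-01-01"), by = "month", length.out = 12),
  USREC = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0)
)

# Collapse each run of consecutive 1s into a Start/End pair of months.
indicator_to_intervals <- function(dates, flag) {
  runs   <- rle(flag)                    # run lengths of identical values
  ends   <- cumsum(runs$lengths)         # index of the last month in each run
  starts <- ends - runs$lengths + 1      # index of the first month in each run
  keep   <- runs$values == 1             # only runs flagged as recession
  data.frame(Start = dates[starts[keep]], End = dates[ends[keep]])
}

intervals <- indicator_to_intervals(usrec$DATE, usrec$USREC)
print(intervals)
```

The resulting Start/End columns can be passed to geom_rect() as xmin and xmax. FRED flags the months following a peak through the trough, so Start dates built this way may sit a month after the NBER peak dates in the FAQ list.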
The inspiration for this graph comes from a post about using R and ggplot to show recession bars alongside silver prices. I thought adding recession bars and eliminating chartjunk would make the graph clearer.
Here is the code. It includes all the data, so it is self-contained, and it outputs a nicely anti-aliased PNG using ggsave():
```r
# load dependencies
library(ggplot2)

# grab recession data and put it into a data frame, recessions.df
recessions.df <- read.table(text =
"Peak, Trough
1857-06-01, 1858-12-01
1860-10-01, 1861-06-01
1865-04-01, 1867-12-01
1869-06-01, 1870-12-01
1873-10-01, 1879-03-01
1882-03-01, 1885-05-01
1887-03-01, 1888-04-01
1890-07-01, 1891-05-01
1893-01-01, 1894-06-01
1895-12-01, 1897-06-01
1899-06-01, 1900-12-01
1902-09-01, 1904-08-01
1907-05-01, 1908-06-01
1910-01-01, 1912-01-01
1913-01-01, 1914-12-01
1918-08-01, 1919-03-01
1920-01-01, 1921-07-01
1923-05-01, 1924-07-01
1926-10-01, 1927-11-01
1929-08-01, 1933-03-01
1937-05-01, 1938-06-01
1945-02-01, 1945-10-01
1948-11-01, 1949-10-01
1953-07-01, 1954-05-01
1957-08-01, 1958-04-01
1960-04-01, 1961-02-01
1969-12-01, 1970-11-01
1973-11-01, 1975-03-01
1980-01-01, 1980-07-01
1981-07-01, 1982-11-01
1990-07-01, 1991-03-01
2001-03-01, 2001-11-01
2007-12-01, 2009-06-01",
  sep = ",", header = TRUE, strip.white = TRUE,
  colClasses = c("Date", "Date"))

# grab inequality data (share of income held by the top 1%) and put it into a data frame, inequality.df
inequality.df <- read.table(text =
"year,percent
1913,0.18
1914,0.182
1915,0.176
1916,0.193
1917,0.177
1918,0.16
1919,0.164
1920,0.148
1921,0.156
1922,0.171
1923,0.156
1924,0.174
1925,0.202
1926,0.199
1927,0.21
1928,0.239
1929,0.224
1930,0.172
1931,0.155
1932,0.156
1933,0.165
1934,0.164
1935,0.167
1936,0.193
1937,0.171
1938,0.158
1939,0.162
1940,0.165
1941,0.158
1942,0.134
1943,0.123
1944,0.113
1945,0.125
1946,0.133
1947,0.12
1948,0.122
1949,0.117
1950,0.128
1951,0.118
1952,0.108
1953,0.099
1954,0.108
1955,0.111
1956,0.107
1957,0.102
1958,0.102
1959,0.106
1960,0.1
1961,0.106
1962,0.099
1963,0.099
1964,0.105
1965,0.109
1966,0.102
1967,0.107
1968,0.112
1969,0.104
1970,0.09
1971,0.094
1972,0.096
1973,0.092
1974,0.091
1975,0.089
1976,0.089
1977,0.09
1978,0.09
1979,0.1
1980,0.1
1981,0.1
1982,0.108
1983,0.116
1984,0.12
1985,0.127
1986,0.159
1987,0.127
1988,0.155
1989,0.145
1990,0.143
1991,0.134
1992,0.147
1993,0.142
1994,0.142
1995,0.152
1996,0.167
1997,0.18
1998,0.191
1999,0.2
2000,0.215
2001,0.182
2002,0.169
2003,0.175
2004,0.198
2005,0.219
2006,0.228
2007,0.235
2008,0.209",
  sep = ",", header = TRUE, colClasses = c("numeric", "numeric"))

# trim recessions to those starting no earlier than the first year of data
# (compare years as numbers, not as strings)
recessions.trim <- subset(recessions.df,
                          as.numeric(format(Peak, "%Y")) >= min(inequality.df$year))

# convert each year to a Date; strptime(year, format = "%Y") alone would fill in
# today's month and day, so pin every observation to January 1 instead
inequality.df$date <- as.Date(paste0(inequality.df$year, "-01-01"))

# build a ggplot object with the inequality data. Note: the x axis is a date scale, not continuous
g <- ggplot(inequality.df) +
  geom_line(aes(x = date, y = percent)) +
  theme_grey()
g <- g + scale_x_date("Year", date_labels = "%Y",
                      date_breaks = "5 years", date_minor_breaks = "1 year")
g <- g + scale_y_continuous("Percent of total income", labels = scales::percent)

# add recession boxes spanning the full height of the panel
g <- g + geom_rect(data = recessions.trim,
                   aes(xmin = Peak, xmax = Trough, ymin = -Inf, ymax = Inf),
                   fill = "blue", alpha = 0.2)

# add title
g <- g + labs(title = "Income share of top 1%\n1913-2008")

# print graph and save it as a PNG
print(g)
ggsave(filename = "output.png", plot = g)
```
The lessons from this project mainly involve the grammar of ggplot2, and more importantly how to use data types and axis scaling. I originally made this graph at 6 a.m. on a Friday morning. As a shortcut, I simply collapsed the recession data into yearly data so that I had two vectors of numeric data. In this more carefully thought-out version, I use scale_x_date() instead of scale_x_continuous(). This gave me better control over axis formatting, at the cost of having to convert both the year-only (%Y) inequality data and the %Y-%m-%d recession dates into dates. For the years I used strptime(year, format = "%Y"), because as.Date() and other functions would not accept a date without a month and day. Be aware that strptime() fills in an unspecified month and day from the current date, so pinning each year to January 1, e.g. as.Date(paste0(year, "-01-01")), gives more predictable results.
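A small base-R sketch of the two conversions described above, using a few years from the inequality data and two recession peaks from the script. The variable names (years, year_dates, peaks, peak_years) are illustrative and not taken from the original code.

```r
years <- c(1913, 1929, 2008)

# Year-only values: pin each one to January 1 so the result does not depend on
# today's date (strptime(years, format = "%Y") would fill in the current month and day).
year_dates <- as.Date(paste0(years, "-01-01"))
print(year_dates)

# Recession dates already carry year, month, and day, so they parse directly.
peaks <- as.Date(c("1929-08-01", "2007-12-01"))
print(peaks)

# Extract the year as a number, e.g. for trimming recessions to the data's range.
peak_years <- as.numeric(format(peaks, "%Y"))
print(peak_years >= min(years))
```

Once both series are Date objects, they share one x axis under scale_x_date(), which is what lets the recession rectangles line up with the yearly line.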
In the end, instead of fussing with PNG graphics devices, which did not anti-alias output under Windows, I used ggplot2's built-in ggsave() function and was very happy with the result:
