Growth in the cross-section (of countries)

NOTE: The present notebook is coded in R. It relies heavily on the tidyverse ecosystem of packages. We load the tidyverse below as a prerequisite for the rest of the notebook - along with a few other libraries.

\(\rightarrow\) Don’t forget that code flows sequentially. A random chunk may not work if the previous ones have not been executed.

library(tidyverse)    # Package for data wrangling
library(readxl)       # Package to import MS Excel files
library(latex2exp)    # Package for LaTeX expressions
library(quantmod)     # Package for stock data extraction
library(highcharter)  # Package for reactive plots
library(ggcorrplot)   # Package for correlation plots
library(ggrepel)      # Package for neat annotations
library(plm)          # Package for panel models
library(fixest)       # Other Package for panels
library(reactable)    # Package for neat tables
library(WDI)          # Package for World Bank data
library(broom)        # Package for neat regression output
library(tidyfit)      # Package for grouped regressions
library(httr)         # package to fetch data online
impute <- function(v, n = 6){     # Imputation function
  for(j in 1:n){
    ind <- which(is.na(v))
    if(length(ind)>0){
      if(ind[1]==1){ind <- ind[-1]}
      v[ind] <- v[ind-1]
    }
  }
  return(v)
}
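
A quick check of what impute() does, on made-up toy vectors. The definition is repeated only so that the snippet runs on its own; the inputs are purely illustrative.

```r
# Same forward-fill helper as above, repeated so the snippet is self-contained
impute <- function(v, n = 6){
  for(j in 1:n){
    ind <- which(is.na(v))                # positions still missing
    if(length(ind) > 0){
      if(ind[1] == 1){ind <- ind[-1]}     # a leading NA has no predecessor
      v[ind] <- v[ind - 1]                # copy the previous value, one step per pass
    }
  }
  return(v)
}

filled_short <- impute(c(NA, 1, NA, NA, 3, NA))   # gaps shorter than n are fully filled
filled_long  <- impute(c(1, rep(NA, 8)), n = 6)   # a gap longer than n is only partly filled
filled_short
filled_long
```

A leading missing value stays missing (there is nothing to carry forward), and at most n consecutive missing points are filled.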

The content of the notebook is heavily inspired by the book Advanced Macro-economics - An Easy Guide.

Introduction

Until now, we have sought to explain growth through particular lenses (human capital, technology) but have mostly failed to do so.

In this session/notebook, we are interested in a broader and much more data-centric approach. We seek to analyze a large cross-section of countries as well as many explanatory factors at the same time.

Foundations

Theory (sort of)

Suppose that in all generality, the production function is a generalization of the Cobb-Douglas (multiplicative) form:

\[Y=\prod_{n=1}^N X_n^{a_n}, \quad X_n >0, \quad a_n >0,\] so that we envision \(N\) different factors contributing to the economic output. Note that returns to scale are determined by \(\sum_{n=1}^Na_n\).
Taking the logarithm: \[\log(Y) = \sum_{n=1}^N a_n \log(X_n)\] and differentiating (w.r.t. time) yields

\[\frac{\dot{Y}}{Y} = \sum_{n=1}^N a_n \frac{\dot{X}_n}{X_n}.\]

This is a sound grounding for simple (linear) regression models which we’ll cover below.
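
As a numerical sanity check of this decomposition, here is a minimal sketch with three hypothetical factors growing at constant continuous rates. The elasticities and growth rates are assumed values chosen for illustration, not estimates.

```r
a <- c(0.3, 0.5, 0.2)        # elasticities a_n (assumed values)
g <- c(0.02, 0.01, 0.03)     # growth rates of the three factors (assumed values)
time <- 0:50

X <- sapply(g, function(r) exp(r * time))    # 51 x 3 matrix, X_n(t) = exp(g_n * t)
Y <- apply(X, 1, function(x) prod(x^a))      # multiplicative production function

growth_Y <- diff(log(Y))                     # log-growth of output, period by period
weighted_sum <- sum(a * g)                   # sum of a_n times the growth rate of X_n
all.equal(growth_Y, rep(weighted_sum, 50))
```

Output growth coincides with the elasticity-weighted sum of factor growth rates, which is the linear relationship that the regressions try to recover.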

Fetching & wrangling the data

Below, we extract a sufficient number (~10) of potential predictors of growth and impute a few points along the way (to increase sample size).

wb_raw <- WDI(                              # World Bank data
  indicator = c(
    "labor" = "SL.TLF.TOTL.IN",             # Labor force (# individuals)
    "savings_rate" = "NY.GDS.TOTL.ZS",      # Savings rate (% GDP)
    "inflation" = "FP.CPI.TOTL.ZG",         # Inflation rate
    "trade" = "NE.TRD.GNFS.ZS",             # Trade as % of GDP
    "pop" = "SP.POP.TOTL",                  # Population
    "pop_growth" = "SP.POP.GROW",           # Population growth
    "capital_formation" = "NE.GDI.TOTL.ZS", # Gross capital formation (% GDP)
    "gdp_percap" = "NY.GDP.PCAP.CD",        # GDP per capita
    "RD_percap" = "GB.XPD.RSDV.GD.ZS",      # R&D expenditure (% of GDP), despite the name
    "educ_level" = "SE.SEC.CUAT.LO.ZS",     # % pop reaching second. educ. level
    "educ_spending" = "SE.XPD.TOTL.GD.ZS",  # Education spending (%GDP)
    "nb_researchers" = "SP.POP.SCIE.RD.P6", # Nb researchers per million inhab.
    "debt" = "GC.DOD.TOTL.GD.ZS",           # Central gov. debt (% of GDP)
    "gdp" = "NY.GDP.MKTP.CD"                # Gross Domestic Product (GDP)
  ), 
  extra = TRUE,
  start = 1960,
  end = 2024) |>
  mutate(across(everything(), as.vector)) |>
  select(-status, -lending, -iso2c, -iso3c) |>  
  filter(region != "Aggregates", income != "Aggregates") |>
  arrange(country, year) |>
  group_by(country) |>
  mutate(across(everything(), impute)) |>
  mutate(gdp_growth = gdp_percap/dplyr::lag(gdp_percap) - 1, 
        .before = "region") |>   
  mutate(gdp_percap = log(dplyr::lag(gdp_percap))) |> # log-lag transformation
  ungroup() |>
  filter(lastupdated == max(lastupdated)) |>
  arrange(country, year) |>
  mutate(capital_percap = capital_formation / labor, .before = "region")

GDP per capita growth was added in the pipeline above. Here we only make sure (again) that aggregate entries are dropped, so that only country-level rows remain.

wb_growth <- wb_raw |> 
  filter(region != "Aggregates", income != "Aggregates") 

wb_growth |> tail(3)
country year lastupdated labor savings_rate inflation trade pop pop_growth capital_formation gdp_percap RD_percap educ_level educ_spending nb_researchers debt gdp gdp_growth capital_percap region capital longitude latitude income
Zimbabwe 2022 2025-10-07 6118687 5.591073 104.7052 64.76361 16069056 1.706209 14.610339 7.620973 NA 64.94 2.0504899 NA NA 32789657378 0.1833459 2.4e-06 Sub-Saharan Africa Harare 31.0672 -17.8312 Lower middle income
Zimbabwe 2023 2025-10-07 6232464 8.637877 104.7052 50.79496 16340822 1.677096 16.274040 7.676026 NA 64.94 0.3847713 NA NA 35231369343 0.0565964 2.6e-06 Sub-Saharan Africa Harare 31.0672 -17.8312 Lower middle income
Zimbabwe 2024 2025-10-07 6386440 -3.993908 104.7052 52.67301 16634373 1.780482 4.467751 7.884731 NA NA 0.3847713 NA NA 44187704410 0.2320813 7.0e-07 Sub-Saharan Africa Harare 31.0672 -17.8312 Lower middle income

First analyses

Let’s have a look at missing data, after imputation!
In some cases, when no data exists, forward-filling is not possible.

wb_growth |> is.na() |> colMeans()   # share of missing values per column
vars <- c("gdp_percap", "savings_rate", "inflation", "trade", "capital_formation" , "pop_growth", "educ_level", "educ_spending")

Debt and R&D cause a lot of data depletion: we will remove them from the analysis.
Which countries are the most represented in the sample (with no missing point, after imputation)?

wb_growth |>
  select(all_of(c(vars, "country"))) |>
  na.omit() |>
  group_by(country) |>
  count(sort = T) |>
  head(13)
country n
Canada 54
Portugal 45
Spain 44
Korea, Rep. 43
Sweden 42
Ecuador 41
Italy 38
Chile 35
Israel 35
Ireland 34
Indonesia 32
Mexico 32
Czechia 31

\(\rightarrow\) Many large and developed countries do not make it to the top, mostly due to the R&D field, but also debt variables.

Next, let us check whether there is collinearity among variables. Indeed, high correlations between independent variables are likely to perturb inference.

wb_growth |>
  select(all_of(vars)) |>
  #na.omit() |>
  cor(use = "pairwise.complete") |>
  ggcorrplot(lab = TRUE, digits = 1L) +
  scale_fill_viridis_c(alpha = 0.7) +
  theme(legend.position = "none")

Usually, a correlation of 0.5 (in absolute value) is considered already high. A value above 0.7 is prohibitive…
So here, it seems we are relatively fine (education and GDP per capita are close, though).
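
The rule of thumb can also be applied programmatically. Below is a minimal sketch on simulated data (the variables x1, x2 and x3 are hypothetical and x2 is built to be strongly correlated with x1); it lists the pairs whose absolute correlation exceeds 0.5.

```r
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + sqrt(1 - 0.9^2) * rnorm(n)   # strongly correlated with x1 by construction
x3 <- rnorm(n)                                # independent of the other two

C   <- cor(cbind(x1, x2, x3))
idx <- which(abs(C) > 0.5 & upper.tri(C), arr.ind = TRUE)   # each pair counted once

flagged <- data.frame(
  var_1       = rownames(C)[idx[, "row"]],
  var_2       = colnames(C)[idx[, "col"]],
  correlation = C[idx]
)
flagged
```

With the notebook's data, the same logic could be applied to the matrix returned by cor(use = "pairwise.complete"); the 0.5 threshold is the heuristic mentioned above, not a formal test.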

Let’s have a brief look at numbers; take inflation.
Are there outliers or false values?

wb_raw |>
  group_by(country) |>
  summarise(avg_inflation = mean(inflation, na.rm = T)) |>
  na.omit() |> head(9)
country avg_inflation
Afghanistan 4.862333
Albania 2.599840
Algeria 5.018634
Angola 22.210968
Antigua and Barbuda 2.430727
Argentina 86.293375
Armenia 3.251960
Aruba 2.672916
Australia 4.679567

Panel models

We now turn to an exploration of the concepts and variables seen and mentioned until today. Indeed, models are only worthwhile if they are able to explain (or predict) salient empirical properties of the economy.

Note

We follow here a panel approach from Economic Growth in a Cross Section of Countries, though we apply it to GDP per capita and not to raw GDP:

\[g_{t,i}=\textbf{X}_{t,i}\boldsymbol{\beta}+a \log(y_{t-1,i}) + e_{t,i},\]

where \(g_{t,i}\) is the growth rate of country \(i\) at date (year) \(t\) and \(y_{t,i}\) is GDP per capita. The matrix \(\textbf{X}_{t,i}\) embeds all variables of interest.

Baseline estimation

As a first attempt, we proceed with the traditional two-way fixed effects (TWFE) model, meaning that the equation is: \[g_{t,i}=\textbf{X}_{t,i}\boldsymbol{\beta}+a \log(y_{t-1,i}) + b_i + c_t + e_{t,i},\] where the errors have zero means and the \(b_i\) and \(c_t\) are dummy variables that code the country and year rows, respectively.
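
To see what the two sets of dummies do, here is a minimal base R sketch on a simulated balanced panel (all names and numbers are hypothetical). It compares the slope obtained with explicit country and year dummies to the one obtained after removing country means and year means from the data (the "within" transformation). The equality relies on the panel being balanced.

```r
set.seed(1)
n_c <- 5; n_t <- 8
d <- expand.grid(country = factor(1:n_c), year = factor(1:n_t))
d$x <- rnorm(nrow(d))
b_i <- rnorm(n_c)[as.integer(d$country)]      # country effects
c_t <- rnorm(n_t)[as.integer(d$year)]         # year effects
d$g <- 0.5 * d$x + b_i + c_t + rnorm(nrow(d), sd = 0.1)

# (1) Explicit dummies for countries and years
beta_dummies <- coef(lm(g ~ x + country + year, data = d))["x"]

# (2) Two-way demeaning: subtract country means and year means, add back the grand mean
demean2 <- function(v, i, t) v - ave(v, i) - ave(v, t) + mean(v)
d$g_w <- demean2(d$g, d$country, d$year)
d$x_w <- demean2(d$x, d$country, d$year)
beta_within <- coef(lm(g_w ~ x_w - 1, data = d))["x_w"]

all.equal(unname(beta_dummies), unname(beta_within))
```

Both routes give the same slope; dedicated panel packages typically avoid building the dummy columns explicitly.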

The {plm} package is likely the most used for panel models in R. We use it below (at first).

Importantly, we need enough data per country because of the fixed effects that generate additional dummy columns. This reduces the number of countries to just 14.

plm_data <- wb_growth |> dplyr::select(all_of(c("country", "year", "gdp_growth", "pop", vars))) |> 
  na.omit() |> group_by(country) |> mutate(n = n()) |> filter(n > 21) |> select(-n) |> ungroup()
fit_two_way <- plm(formula = gdp_growth ~ . , 
                   data = plm_data,
                   model = "within", effect = "twoways", 
                   index = c("country", "year"))
# summary(fit_two_way) # PLM hints towards singularity, upon verification, this is false...

We also store the results for future use.

fit_two_way <- fit_two_way$coef |> data.frame() |> 
  rownames_to_column(var = "variable") |> 
  mutate(type = "two_way")
colnames(fit_two_way)[2] <- "estimate"

To detect which variables matter, we look at p-values: they indicate the probability of obtaining a value at least as “extreme” as the one observed under the assumption that the coefficient is equal to zero (this hypothesis is called the null). Hence if a p-value is close to zero, the data are hard to reconcile with the null, which signals support for a link (not necessarily causal) between the dependent and independent variables.
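
For intuition, the p-value of a coefficient can be recomputed by hand from its t-statistic. The sketch below uses a simple simulated regression (hypothetical data, plain lm() rather than a panel model) and checks the manual two-sided p-value against the one reported by summary().

```r
set.seed(123)
x <- rnorm(100)
y <- 0.3 * x + rnorm(100)            # true slope of 0.3 (assumed for the simulation)
fit <- lm(y ~ x)
ct  <- summary(fit)$coefficients

t_stat   <- ct["x", "Estimate"] / ct["x", "Std. Error"]   # estimate divided by its standard error
p_manual <- 2 * pt(-abs(t_stat), df = fit$df.residual)    # two-sided tail probability under the null

all.equal(p_manual, ct["x", "Pr(>|t|)"])
```

The further the t-statistic is from zero, the smaller the tail probability, hence the usual |t| > 2 shortcut for the 5% level in large samples.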

Here, the savings rate, R&D both have (mildly) significant positive coefficients…
trade and population growth have negative coefficients.

Mono effect models

Below, we test whether the results remain robust if we only consider one dimension of effects.

fit_indiv <- plm(formula = gdp_growth ~ . , 
                 data = plm_data,
                 model = "within", effect = "individual",
                 index = c("country", "year"))$coef |> data.frame() |> 
  rownames_to_column(var = "variable") |> 
  mutate(type = "indiv")
colnames(fit_indiv)[2] <- "estimate"
fit_time <- plm(formula = gdp_growth ~ . , 
                data = plm_data,
                model = "within", effect = "time",
                index = c("country", "year"))$coef |> data.frame() |> 
  rownames_to_column(var = "variable") |> 
  mutate(type = "time")
colnames(fit_time)[2] <- "estimate" 

Let’s see the differences between models.

fit_two_way |> 
  bind_rows(fit_indiv) |> 
  bind_rows(fit_time) |> 
  filter(variable %in% vars) |> select(variable, estimate, type) |>
  pivot_wider(names_from = type, values_from = estimate) 
variable two_way indiv time
gdp_percap -0.0085779 -0.0085779 -0.0085779
savings_rate 0.0039724 0.0039724 0.0039724
inflation -0.0002962 -0.0002962 -0.0002962
trade -0.0006141 -0.0006141 -0.0006141
capital_formation 0.0022876 0.0022876 0.0022876
pop_growth -0.0222256 -0.0222256 -0.0222256
educ_level -0.0002145 -0.0002145 -0.0002145
educ_spending -0.0078161 -0.0078161 -0.0078161

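The values are identical. This should not be the case; there must be a problem here! One plausible culprit (not verified here) is the gdp_growth ~ . formula: the dot may also pull the country and year index columns in as regressors, so that all three specifications end up with both sets of dummies.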
We thus repeat this exercise, but with another library, the {fixest} package.

fml_0 <- "gdp_growth ~ gdp_percap + savings_rate + inflation + trade + "
fml_0 <- paste(fml_0, "capital_formation + pop_growth + educ_level + educ_spending")
fml <- as.formula(paste(fml_0, "| year"))
fit_time <- ((feols(fml, data = plm_data)$coefficients)/sqrt(diag(feols(fml, data = plm_data)$cov.iid))) |> 
  data.frame() |> rownames_to_column(var = "variable") |> mutate(type = "time")
colnames(fit_time)[2] <- "statistic"

fml <- as.formula(paste(fml_0, "| country"))
fit_indiv <- ((feols(fml, data = plm_data)$coefficients)/sqrt(diag(feols(fml, data = plm_data)$cov.iid))) |> 
  data.frame() |> rownames_to_column(var = "variable") |> mutate(type = "indiv")
colnames(fit_indiv)[2] <- "statistic"

fml <- as.formula(paste(fml_0, "| country + year"))
fit_twoway <- ((feols(fml , data = plm_data)$coefficients)/sqrt(diag(feols(fml, data = plm_data)$cov.iid))) |> 
  data.frame() |> rownames_to_column(var = "variable") |> mutate(type = "two_way")
colnames(fit_twoway)[2] <- "statistic"

NOTE that we store test statistics here and not raw coefficients. The former carry more information…

fit_twoway |> 
  bind_rows(fit_time) |>
  bind_rows(fit_indiv) |>
  pivot_wider(names_from = type, values_from = statistic)
variable two_way time indiv
gdp_percap -0.6361210 -4.636578 -2.573712
savings_rate 5.1617455 3.516396 4.692063
inflation -4.5424615 -4.443799 -4.147527
trade -2.8608038 -0.198431 -1.078309
capital_formation 1.7947934 3.517437 4.001261
pop_growth -3.2777516 -4.395138 -2.000779
educ_level -0.6274451 1.053713 -1.228028
educ_spending -2.0060849 -1.457869 -3.083658

Ok, so now we do find some differences in test statistics. The good news is that signs are consistent across models for most variables.
But there are a few conflicts, too. These can be of two types:

  • either the sign changes between the models;
  • or the significance level changes (e.g., if some models have t-stats above 2 in absolute values, while others do not).
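
Both types of conflict can be flagged mechanically. The sketch below re-enters the t-statistics from the table above by hand and marks, for each variable, whether the sign or the significance (with the |t| > 2 rule of thumb) differs across the three specifications.

```r
tstats <- data.frame(
  variable = c("gdp_percap", "savings_rate", "inflation", "trade",
               "capital_formation", "pop_growth", "educ_level", "educ_spending"),
  two_way  = c(-0.6361210, 5.1617455, -4.5424615, -2.8608038,
               1.7947934, -3.2777516, -0.6274451, -2.0060849),
  time     = c(-4.636578, 3.516396, -4.443799, -0.198431,
               3.517437, -4.395138, 1.053713, -1.457869),
  indiv    = c(-2.573712, 4.692063, -4.147527, -1.078309,
               4.001261, -2.000779, -1.228028, -3.083658)
)
m <- as.matrix(tstats[, c("two_way", "time", "indiv")])

# TRUE when the three models do not all agree
tstats$sign_conflict   <- apply(sign(m), 1, function(s) length(unique(s)) > 1)
tstats$signif_conflict <- apply(abs(m) > 2, 1, function(s) length(unique(s)) > 1)

# Variables that are significant with the same sign in all three models
robust <- tstats$variable[!tstats$sign_conflict & !tstats$signif_conflict & abs(m[, "two_way"]) > 2]
robust
```

The threshold of 2 is only the usual approximation of the 5% level; a stricter cutoff would shorten the list.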

These results suggest that population growth, inflation and the savings rate (to a slightly lower extent) are solid drivers of growth in wealth.

Grouping

Until now, we have run the models on all of the data, but we could also consider sub-groups, either geographical ones, or income clusters. We could use grouping (in {tidyverse}/{dplyr} parlance) below, but in fact, for lower income countries, we do not have enough data points to proceed with estimation…

Indeed, if we look at the full sample,

wb_growth |> group_by(income) |> count()
income n
High income 5460
Low income 1625
Lower middle income 3315
Not classified 65
Upper middle income 3510
# wb_growth |> group_by(region) |> count()

It may seem manageable; but if we impose non-missing points…
Then the story is a bit different.

wb_growth |> na.omit() |> group_by(income) |> count()
income n
High income 236
Low income 20
Lower middle income 64
Upper middle income 250
# wb_growth |> na.omit() |> group_by(region) |> count()

The sample size becomes too small. Indeed, recall that with fixed effects, when estimated with OLS, the number of columns increases by a lot (+ number of countries + number of years in the sample).
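
A tiny illustration of this column inflation, on a hypothetical toy panel with 4 countries, 5 years and a single regressor:

```r
d <- expand.grid(country = factor(paste0("c", 1:4)), year = factor(2001:2005))
d$x <- seq_len(nrow(d))                        # one placeholder regressor

X <- model.matrix(~ x + country + year, data = d)
n_obs  <- nrow(X)   # 4 x 5 = 20 observations
n_cols <- ncol(X)   # intercept + x + (4 - 1) country dummies + (5 - 1) year dummies
c(n_obs, n_cols)
```

One dummy per dimension is dropped to avoid perfect collinearity with the intercept. With only 20 complete rows, as for low income countries above, a handful of regressors plus the dummies quickly exhausts the degrees of freedom.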

Let’s proceed with manual filters.

plm_rich <- wb_growth |> filter(income == "High income") |>
  dplyr::select(all_of(c("country", "year", "gdp_growth", vars))) |> 
  na.omit() |> group_by(country) |> mutate(n = n()) |> filter(n > 8) |> select(-n)
fit_rich <- ((feols(fml, data = plm_rich)$coefficients)/sqrt(diag(feols(fml, data = plm_rich)$cov.iid))) |> 
  data.frame() |> rownames_to_column(var = "variable") |> mutate(income = "rich")
colnames(fit_rich)[2] <- "statistic"

plm_mid <- wb_growth |> filter(income == "Upper middle income") |>
  dplyr::select(all_of(c("country", "year", "gdp_growth", vars))) |> 
  na.omit() |> group_by(country) |> mutate(n = n()) |> filter(n > 8) |> select(-n)
fit_mid <- ((feols(fml, data = plm_mid)$coefficients)/sqrt(diag(feols(fml, data = plm_mid)$cov.iid))) |> 
  data.frame() |> rownames_to_column(var = "variable") |> mutate(income = "middle")
colnames(fit_mid)[2] <- "statistic"

fit_rich |> bind_rows(fit_mid) |> pivot_wider(names_from = income, values_from = statistic)
variable rich middle
gdp_percap -2.3797800 -0.0406204
savings_rate 8.8445465 5.6322502
inflation -4.6128496 -2.0029601
trade -3.1693687 -0.3766862
capital_formation 1.5203099 -1.0265394
pop_growth -5.4839572 -2.1934949
educ_level -1.0358756 0.7645079
educ_spending 0.5128888 -1.8905733

Pretty consistent! Except perhaps for a few variables…

Predictive models

The above regression model seeks to link growth with contemporaneous variables. But we could also be interested in lagged predictors, to examine if some characteristics of nations imply future growth. Because most features are persistent (highly autocorrelated), we should not see too much difference. But the devil can be in the details… and in the lags!

# To be done/discussed in class => grouping!
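
One possible way to build lagged predictors, sketched in base R on a made-up two-country panel (the numbers are purely illustrative, and this is not necessarily the approach that will be discussed in class). The key point is that the lag must be taken within each country, so that the first year of a country never inherits a value from the previous country.

```r
toy <- data.frame(
  country      = rep(c("A", "B"), each = 4),
  year         = rep(2001:2004, times = 2),
  savings_rate = c(20, 21, 22, 23, 10, 11, 12, 13),
  gdp_growth   = c(0.02, 0.03, 0.01, 0.04, 0.05, 0.06, 0.04, 0.07)
)
toy <- toy[order(toy$country, toy$year), ]     # the lag relies on chronological order

lag1 <- function(v) c(NA, head(v, -1))         # shift by one period
toy$savings_rate_lag <- ave(toy$savings_rate, toy$country, FUN = lag1)

toy
fit_lag <- lm(gdp_growth ~ savings_rate_lag, data = toy)   # growth on last year's savings rate
```

In the tidyverse pipeline used above, the same idea would amount to a group_by(country) followed by a mutate() with dplyr::lag() on the predictors.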

Country-specific estimates

Until now, estimations were grouped: we used the information in the cross-section of countries (and chronology) to see the “global” effect of some variables on growth. But perhaps this effect is country-specific (or maybe even time-dependent)?

Below, we run the same regressions (in terms of ‘independent’ variables), but we will display results one variable at a time. The models are nevertheless multivariate.

fit_countries <- wb_growth |> 
  dplyr::select(all_of(c("country", "year", "gdp_growth", vars))) |> 
  na.omit() |>
  group_by(country) |>
  mutate(n = n()) |>
  filter(n > 20) |>
  select(-year) |>
  regress(gdp_growth ~ ., m("lm")) |>
  coef() 

Let’s have a look at a few variables.

First, population growth.

fit_countries |>
  filter(term == "pop_growth") |>
  ggplot(aes(x = estimate, y = reorder(country, estimate))) + geom_col() +
  theme_classic() + theme(axis.title.y = element_blank()) +
  xlab("estimate (population growth)") 

A majority of estimates are negative, with a handful of exceptions, including France.

What about capital formation?

fit_countries |>
  filter(term == "capital_formation") |>
  ggplot(aes(x = estimate, y = reorder(country, estimate))) + geom_col() +
  theme_classic() + theme(axis.title.y = element_blank()) +
  xlab("estimate (capital formation)") 

Here again, a majority (~2/3) of countries share the same sign of coefficient, and France remains a clear outlier.

Finally, a focus on the savings rate.

fit_countries |>
  filter(term == "savings_rate") |>
  ggplot(aes(x = estimate, y = reorder(country, estimate))) + geom_col() +
  theme_classic() + theme(axis.title.y = element_blank()) +
  xlab("estimate (savings rate)") 

Again, a split that is not evenly balanced, which suggests both some consistency and some exceptions.
Hard to find a pattern, though (Eastern Europe is represented for the negative coefficients, but nothing jumps to the eye).

Data sources outside the World Bank

The Energy Institute compiles a valuable (& updated) dataset on energy production by source. The data is too rich (90 variables), hence we restrict it to a few items.

url <- "https://www.energyinst.org/__data/assets/excel_doc/0006/1656348/Statistical-Review-of-World-Energy-Data.xlsx"
GET(url, write_disk(tf <- tempfile(fileext = ".xlsx")))
Response [https://www.energyinst.org/__data/assets/excel_doc/0006/1656348/Statistical-Review-of-World-Energy-Data.xlsx]
  Date: 2025-10-14 15:19
  Status: 200
  Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  Size: 15.2 MB
<ON DISK>  /var/folders/1d/5dm3k4954vl74y6x8l4h11d80000gn/T//RtmpbNu44s/file11261447c3d38.xlsx
df <- read_excel(tf) 
data_energy <- df |> 
  select(-ISO3166_alpha3, -ISO3166_numeric, -CIS) |>
  mutate(Var = Var |> str_replace_all("_ej", "")) |>
  filter(Var %in% c("oilcons", "gascons", "coalcons", "renewables"))
colnames(data_energy) <- tolower(colnames(data_energy))
head(data_energy)
country year region subregion opec eu oecd var value
Algeria 1965 Africa Northern Africa 1 0 0 coalcons 0.0029308
Algeria 1965 Africa Northern Africa 1 0 0 gascons 0.0267498
Algeria 1965 Africa Northern Africa 1 0 0 oilcons 0.0554589
Algeria 1965 Africa Northern Africa 1 0 0 renewables 0.0014400
Algeria 1966 Africa Northern Africa 1 0 0 coalcons 0.0028470
Algeria 1966 Africa Northern Africa 1 0 0 gascons 0.0277893

NOTE: “ej” means exajoules (=\(10^{18}\)J). It’s an important unit in physics and energy-related data.

data_energy |> 
  filter(country == "Total World") |>
  ggplot(aes(x = year, y = value, fill = var)) + geom_area(alpha = 0.8) +
  theme_classic() + 
  theme(axis.title = element_blank(),
        legend.title = element_blank(),
        legend.position = c(0.2, 0.8)) +
  scale_fill_viridis_d()

We then need to join this data with the WB data…
(with a snapshot of the outcome)

data_energy_wide <- data_energy |> 
  pivot_wider(names_from = var, values_from = value)
data_join <- plm_data |> 
  full_join(data_energy_wide, by = c("year", "country"))
data_join |> na.omit() |> select(-region, -subregion) |> head()
country year gdp_growth pop gdp_percap savings_rate inflation trade capital_formation pop_growth educ_level educ_spending opec eu oecd coalcons gascons oilcons renewables
Brazil 2001 -0.1567108 176301203 8.063469 16.54900 6.840359 26.93629 18.74186 1.303355 37.80 3.84468 0 0 0 0.5356368 0.4423030 3.981208 1.316308
Brazil 2002 -0.1008564 178503484 7.957156 18.29190 8.450164 27.61836 17.44908 1.241421 39.20 3.75037 0 0 0 0.5184374 0.5225851 3.885513 1.416803
Brazil 2003 0.0821680 180622688 8.036123 19.07787 14.714920 28.14038 16.85669 1.180214 40.80 3.75037 0 0 0 0.5379074 0.5849025 3.743762 1.492475
Brazil 2004 0.1854704 182675143 8.206262 21.32584 6.597185 29.67825 17.91257 1.129914 42.41 3.97448 0 0 0 0.5639648 0.6956094 3.873170 1.583042
Brazil 2005 0.3176896 184688101 8.482142 20.60575 6.869537 27.08680 17.20488 1.095906 43.10 4.47908 0 0 0 0.5439045 0.7245762 3.939635 1.670693
Brazil 2006 0.2291659 186653106 8.688478 20.52341 4.183568 26.04170 17.81647 1.058338 45.46 4.87060 0 0 0 0.5363063 0.7621121 4.035304 1.713275

We need to scale energy consumption by a proxy for the size of the country (population, below). This will reduce collinearity due to size.
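
To make the units of this scaling explicit: since 1 EJ = \(10^{9}\) GJ, dividing exajoules by population and multiplying by \(10^9\) yields gigajoules per person. A small sketch with Brazil's 2001 oil consumption and population, both copied from the snapshot above:

```r
oilcons_ej <- 3.981208      # Brazil, 2001, oil consumption in EJ (from the snapshot above)
pop        <- 176301203     # Brazil, 2001, population (from the snapshot above)

oil_percap_gj <- oilcons_ej / pop * 10^9   # EJ per person converted to GJ per person
round(oil_percap_gj, 2)
```

Values of this order (tens of GJ per capita) are on a scale closer to that of the other regressors than raw exajoules per person would be.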

Let us thus look at correlations.

data_join <- data_join |>
  mutate(oilcons = oilcons / pop * 10^9,
         coalcons = coalcons / pop * 10^9,
         gascons = gascons / pop * 10^9,
         renewables = renewables / pop * 10^9) |>
  select(year, country, gdp_growth, coalcons, gascons, oilcons, renewables, all_of(vars)) |>
  ungroup() |>
  distinct()

cor(data_join |> dplyr::select(all_of(c(vars, "coalcons", "gascons", "oilcons", "renewables"))), 
    use = "pairwise.complete") |>
  ggcorrplot() + 
  theme(text = element_text(size = 15),
        axis.text = element_text(size = 15))

Energy variables remain quite correlated.
Basically, we should only keep two, say renewables and gas… But let’s continue anyway with all four variables, out of curiosity.

We build a model omitting a few variables.
(this is just for illustration)

fml_0 <- "gdp_growth ~ gdp_percap + savings_rate + pop_growth + educ_level + coalcons + renewables"
fml <- as.formula(paste(fml_0, "| year + country"))
(feols(fml, data = data_join))$coeftable 
Estimate Std. Error t value Pr(>|t|)
gdp_percap 0.0417195 0.0162756 2.5633239 0.0105601
savings_rate 0.0024491 0.0007659 3.1976221 0.0014434
pop_growth -0.0225878 0.0073137 -3.0884106 0.0020860
educ_level 0.0000835 0.0004855 0.1719713 0.8635062
coalcons 0.0002241 0.0003797 0.5902965 0.5551685
renewables 0.0011278 0.0010598 1.0641805 0.2875872

Energy variables are not very significant in the end…

Convergence?

Convergence in economics refers to the idea that poor countries will eventually catch up with rich ones: rich countries would stagnate in terms of wealth whereas developing countries, with a cheaper labor force, would benefit from higher growth rates, resulting in a reduction of inequalities. This line of reasoning may be subject to conditions, naturally.

This is something that we can test, too. First, let’s see if countries with low income have growth rates superior to those with high income.

t.test(wb_growth |> filter(income == "High income") |> pull(gdp_growth),
       wb_growth |> filter(income == "Low income") |> pull(gdp_growth))

    Welch Two Sample t-test

data:  pull(filter(wb_growth, income == "High income"), gdp_growth) and pull(filter(wb_growth, income == "Low income"), gdp_growth)
t = 4.2416, df = 1884, p-value = 2.327e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.01291440 0.03512813
sample estimates:
 mean of x  mean of y 
0.06943830 0.04541704 

What does this mean?

What about dynamic patterns?

wb_growth |> 
  filter(income %in% c("High income", "Low income")) |>
  ggplot(aes(x = year, y = gdp_growth, color = income)) + 
  geom_point(aes(alpha = income)) + geom_smooth(se = F) +
  theme_classic() +
  theme(legend.position = c(0.8, 0.95),
        legend.title = element_blank(),
        axis.title = element_blank()) + 
  scale_color_manual(values = c("#4A4443", "#63E693")) + 
  scale_alpha_manual(values = c(0.2, 0.4)) + 
  ylim(-0.065, 0.065)

Growth across income groups
  • the 1960-2010 period argues against convergence;
  • the most recent points (2024) indicate a status-quo, or a reversion…?

A step back: heuristic sources of growth

(see CSV, section 7.2)

  • luck: initial conditions may have an impact.
  • geography: natural resources (minerals, grains, cattle, etc.) are a key driver of growth. Diseases are also more frequent in some parts of the globe.
  • culture (customary beliefs and values): they can drive economic decisions, but are also hard to measure.
  • institutions: property rights, labor markets, regulation.