Growth in the cross-section (of countries)

NOTE: The present notebook is coded in R. It relies heavily on the tidyverse ecosystem of packages. We load the tidyverse below as a prerequisite for the rest of the notebook - along with a few other libraries.

\(\rightarrow\) Don’t forget that code flows sequentially. A random chunk may not work if the previous ones have not been executed.

library(tidyverse)    # Package for data wrangling
library(readxl)       # Package to import MS Excel files
library(latex2exp)    # Package for LaTeX expressions
library(quantmod)     # Package for stock data extraction
library(highcharter)  # Package for reactive plots
library(ggcorrplot)   # Package for correlation plots
library(ggrepel)      # Package for neat annotations
library(plm)          # Package for panel models
library(fixest)       # Other Package for panels
library(reactable)    # Package for neat tables
library(WDI)          # Package for World Bank data
library(broom)        # Package for neat regression output
library(tidyfit)      # Package for grouped regressions
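library(httr)         # Package to fetch data online
impute <- function(v, n = 6){     # Imputation function: forward-fills up to n consecutive NAs
  for(j in 1:n){                  # Each pass fills one more point in every gap
    ind <- which(is.na(v))        # Positions of the missing points
    if(length(ind)>0){
      if(ind[1]==1){ind <- ind[-1]}   # A leading NA has no predecessor: skip it
      v[ind] <- v[ind-1]              # Copy the previous value
    }
  }
  return(v)
}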
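To see what the imputation helper does, here is a small check on toy vectors (the values are made up for illustration). Missing points are forward-filled from the last known value, a leading NA stays missing, and gaps longer than n are only partially filled.

```r
# Same helper as above, repeated so that the snippet runs on its own
impute <- function(v, n = 6){
  for(j in 1:n){
    ind <- which(is.na(v))
    if(length(ind)>0){
      if(ind[1]==1){ind <- ind[-1]}
      v[ind] <- v[ind-1]
    }
  }
  return(v)
}

filled <- impute(c(NA, 1, NA, NA, 3))            # Toy vector (illustrative values)
filled                                           # NA 1 1 1 3

partial <- impute(c(1, NA, NA, NA, NA), n = 2)   # Gap of 4 points, but only 2 passes
partial                                          # 1 1 1 NA NA
```

The default n = 6 therefore means that a value is carried forward for at most six years.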

The content of the notebook is heavily inspired by the book Advanced Macroeconomics: An Easy Guide.

Introduction

Until now, we have sought to explain growth through particular lenses (human capital, technology) but have mostly failed to do so.

In this session/notebook, we are interested in a broader and much more data-centric approach. We seek to analyze a large cross-section of countries as well as many explanatory factors at the same time.

Foundations

Theory (sort of)

Suppose that, in all generality, the production function is a generalization of the Cobb-Douglas (multiplicative) form:

\[Y=\prod_{n=1}^N X_n^{a_n}, \quad X_n >0, \quad a_n >0,\]

so that we envision \(N\) different factors contributing to the economic output. Note that returns to scale are determined by \(\sum_{n=1}^Na_n\).

Taking the logarithm gives

\[\log(Y) = \sum_{n=1}^N a_n \log(X_n),\]

and differentiating with respect to time (holding the exponents \(a_n\) constant) yields

\[\frac{\dot{Y}}{Y} = \sum_{n=1}^N a_n \frac{\dot{X}_n}{X_n}.\]

This is a sound grounding for the simple (linear) regression models which we’ll cover below: output growth is a linear combination of the growth rates of the factors.
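The decomposition can be checked numerically. The sketch below assumes two factors that grow at constant (continuous) rates; the exponents and growth rates are illustrative values, not estimates.

```r
a <- c(0.3, 0.7)       # Exponents a_n (illustrative values)
g <- c(0.02, 0.01)     # Constant growth rates of the two factors (illustrative values)

X <- function(t) exp(g * t)           # Factor levels: X_n(t) = exp(g_n * t)
Y <- function(t) prod(X(t)^a)         # Output: product of the X_n^a_n

growth_Y <- log(Y(1)) - log(Y(0))     # Log-growth of output between t = 0 and t = 1
growth_Y                              # 0.013
sum(a * g)                            # 0.013: the weighted sum of factor growth rates
```

With discrete (yearly) growth rates, as in the data used below, the equality only holds approximately.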

Fetching & wrangling the data

Below, we extract a sufficient number (~10) of potential predictors of growth and impute a few points along the way (to increase sample size).

wb_raw <- WDI(                              # World Bank data
  indicator = c(
    "labor" = "SL.TLF.TOTL.IN",             # Labor force (# individuals)
    "savings_rate" = "NY.GDS.TOTL.ZS",      # Savings rate (% GDP)
    "inflation" = "FP.CPI.TOTL.ZG",         # Inflation rate
    "trade" = "NE.TRD.GNFS.ZS",             # Trade as % of GDP 
    "pop" = "SP.POP.TOTL",                  # Population
    "pop_growth" = "SP.POP.GROW",           # Population growth
    "capital_formation" = "NE.GDI.TOTL.ZS", # Gross capital formation (% GDP)
    "gdp_percap" = "NY.GDP.PCAP.CD",        # GDP per capita
    "RD_percap" = "GB.XPD.RSDV.GD.ZS",      # R&D expenditure (% GDP)
    "educ_level" = "SE.SEC.CUAT.LO.ZS",     # % pop reaching second. educ. level
    "educ_spending" = "SE.XPD.TOTL.GD.ZS",  # Education spending (%GDP)
    "nb_researchers" = "SP.POP.SCIE.RD.P6", # Nb researchers per million inhab.
    "debt" = "GC.DOD.TOTL.GD.ZS",           # Central gov. debt (% of GDP)
    "gdp" = "NY.GDP.MKTP.CD"                # Gross Domestic Product (GDP)
  ), 
  extra = TRUE,
  start = 1960,
  end = 2024) |>
  mutate(across(everything(), as.vector)) |>
  select(-status, -lending, -iso2c, -iso3c) |>  
  filter(region != "Aggregates", income != "Aggregates") |>
  arrange(country, year) |>
  group_by(country) |>
  mutate(across(everything(), impute)) |>
  mutate(gdp_growth = gdp_percap/dplyr::lag(gdp_percap) - 1, 
        .before = "region") |>   
  mutate(gdp_percap = log(dplyr::lag(gdp_percap))) |>   # log-lag transformation
  ungroup() |>
  filter(lastupdated == max(lastupdated)) |>
  arrange(country, year) |>
  mutate(capital_percap = capital_formation / labor, .before = "region")

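The pipeline above already adds GDP per capita growth (gdp_growth) and replaces gdp_percap with the log of its lagged value. Below, we store the result under a new name, making sure that aggregates are excluded.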

wb_growth <- wb_raw |> 
  filter(region != "Aggregates", income != "Aggregates") 

wb_growth |> tail(3)

country	year	lastupdated	labor	savings_rate	inflation	trade	pop	pop_growth	capital_formation	gdp_percap	RD_percap	educ_level	educ_spending	nb_researchers	debt	gdp	gdp_growth	capital_percap	region	capital	longitude	latitude	income
Zimbabwe	2022	2025-10-07	6118687	5.591073	104.7052	64.76361	16069056	1.706209	14.610339	7.620973	NA	64.94	2.0504899	NA	NA	32789657378	0.1833459	2.4e-06	Sub-Saharan Africa	Harare	31.0672	-17.8312	Lower middle income
Zimbabwe	2023	2025-10-07	6232464	8.637877	104.7052	50.79496	16340822	1.677096	16.274040	7.676026	NA	64.94	0.3847713	NA	NA	35231369343	0.0565964	2.6e-06	Sub-Saharan Africa	Harare	31.0672	-17.8312	Lower middle income
Zimbabwe	2024	2025-10-07	6386440	-3.993908	104.7052	52.67301	16634373	1.780482	4.467751	7.884731	NA	NA	0.3847713	NA	NA	44187704410	0.2320813	7.0e-07	Sub-Saharan Africa	Harare	31.0672	-17.8312	Lower middle income

First analyses

Let’s have a look at missing data after imputation.
In some cases, when no data exists at all, forward-filling is not possible.

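wb_growth |> 
  select(where(is.numeric)) |>   # Note: select() without argument keeps no column and returns numeric(0)
  is.na() |> colMeans()          # Proportion of missing points per variable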

vars <- c("gdp_percap", "savings_rate", "inflation", "trade", "capital_formation" , "pop_growth", "educ_level", "educ_spending")

Debt and R&D have many missing values and would deplete the sample considerably: we remove them from the analysis (they are not included in vars above).
Which countries are the most represented in the sample (with no missing point, after imputation)?

wb_growth |>
  select(all_of(c(vars, "country"))) |>
  na.omit() |>
  group_by(country) |>
  count(sort = T) |>
  head(13)

country	n
Canada	54
Portugal	45
Spain	44
Korea, Rep.	43
Sweden	42
Ecuador	41
Italy	38
Chile	35
Israel	35
Ireland	34
Indonesia	32
Mexico	32
Czechia	31

\(\rightarrow\) Many large and developed countries do not make it to the top. Since debt and R&D are no longer in vars, the gaps must come from the remaining variables.

Next, let us check whether there is collinearity among variables. Indeed, high correlations between independent variables are likely to perturb inference.

wb_growth |>
  select(all_of(vars)) |>
  #na.omit() |>
  cor(use = "pairwise.complete") |>
  ggcorrplot(lab = TRUE, digits = 1L) +
  scale_fill_viridis_c(alpha = 0.7) +
  theme(legend.position = "none")

Usually, a correlation of 0.5 (in absolute value) is considered already high. A value above 0.7 is prohibitive…
So here, it seems we are relatively fine (education and GDP per capita are close, though).
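To illustrate why correlated regressors perturb inference, the following sketch compares the standard error of one coefficient when the two regressors are uncorrelated and when their correlation is close to 0.9. All data are simulated and all parameter values are arbitrary.

```r
set.seed(123)
n <- 500                                          # Sample size (arbitrary)
se_of_x1 <- function(rho){
  x1 <- rnorm(n)
  x2 <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)     # x2 has a correlation of about rho with x1
  y <- 1 + 0.5 * x1 + 0.5 * x2 + rnorm(n)         # Both regressors matter equally
  summary(lm(y ~ x1 + x2))$coefficients["x1", "Std. Error"]
}
se_low <- se_of_x1(0)       # Uncorrelated regressors
se_high <- se_of_x1(0.9)    # Strongly correlated regressors
c(low = se_low, high = se_high)
```

The standard error is markedly larger in the correlated case, which mechanically lowers t-statistics and makes signs less stable.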

Let’s have a brief look at numbers; take inflation.
Are there outliers or false values?

wb_raw |>
  group_by(country) |>
  summarise(avg_inflation = mean(inflation, na.rm = T)) |>
  na.omit() |> head(9)

country	avg_inflation
Afghanistan	4.862333
Albania	2.599840
Algeria	5.018634
Angola	22.210968
Antigua and Barbuda	2.430727
Argentina	86.293375
Armenia	3.251960
Aruba	2.672916
Australia	4.679567

Panel models

We now turn to an exploration of the concepts and variables seen and mentioned until today. Indeed, models are only worthwhile if they are able to explain (or predict) salient empirical properties of the economy.

Note

We follow here a panel approach from Economic Growth in a Cross Section of Countries, though we apply it to GDP per capita and not to raw GDP:

\[g_{t,i}=\textbf{X}_{t,i}\boldsymbol{\beta}+a \log(y_{t-1,i}) + e_{t,i},\]

where \(g_{t,i}\) is the growth rate of country \(i\) at date (year) \(t\) and \(y_{t,i}\) is GDP per capita. The matrix \(\textbf{X}_{t,i}\) embeds all variables of interest.

Baseline estimation

As a first attempt, we proceed with the traditional two-way fixed effect (TWFE) model, meaning that the equation is: \[g_{t,i}=\textbf{X}_{t,i}\boldsymbol{\beta}+a \log(y_{t-1,i}) + b_i + c_t + e_{t,i},\] where the errors have zero means and the \(b_i\) and \(c_t\) are country and year fixed effects, i.e., the coefficients of dummy variables that flag the country and year of each row, respectively.
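As a minimal sketch of what fixed effects do, the example below uses a simulated panel with country effects only (the year dimension works in the same way). It shows that adding one dummy per country gives the same slope as the "within" transformation, in which country means are subtracted before running OLS. The panel, the effect sizes and the variable names are hypothetical.

```r
set.seed(1)
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 10),    # Hypothetical panel: 3 countries, 10 dates
  x = rnorm(30)
)
effect <- c(A = 0, B = 2, C = -1)                # Country-specific intercepts b_i (arbitrary)
toy$y <- 0.5 * toy$x + unname(effect[toy$country]) + rnorm(30, sd = 0.1)

# (1) Explicit dummy variables for each country
fit_dummies <- lm(y ~ x + factor(country), data = toy)

# (2) Within transformation: subtract country means, then run OLS without intercept
toy$x_dm <- toy$x - ave(toy$x, toy$country)
toy$y_dm <- toy$y - ave(toy$y, toy$country)
fit_within <- lm(y_dm ~ x_dm - 1, data = toy)

c(dummies = unname(coef(fit_dummies)["x"]),
  within  = unname(coef(fit_within)["x_dm"]))    # Both slopes coincide
```

Dedicated packages rely on this kind of transformation rather than on explicit dummy columns, which is much cheaper when there are many countries and years.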

The {plm} package is likely the most used for panel models in R. We use it below (at first).

Importantly, we need enough data per country because of the fixed effects that generate additional dummy columns. This reduces the number of countries to just 14.

plm_data <- wb_growth |> dplyr::select(all_of(c("country", "year", "gdp_growth", "pop", vars))) |> 
  na.omit() |> group_by(country) |> mutate(n = n()) |> filter(n > 21) |> select(-n) |> ungroup()
fit_two_way <- plm(formula = gdp_growth ~ . , 
                   data = plm_data,
                   model = "within", effect = "twoways", 
                   index = c("country", "year"))
# summary(fit_two_way) # PLM hints towards singularity, upon verification, this is false...

We also store the results for future use.

fit_two_way <- fit_two_way$coef |> data.frame() |> 
  rownames_to_column(var = "variable") |> 
  mutate(type = "two_way")
colnames(fit_two_way)[2] <- "estimate"

To detect which variables matter, we look at p-values: they indicate the probability of obtaining a value at least as “extreme” as the one observed under the assumption that the coefficient is equal to zero (this hypothesis is called the null). Hence, if a p-value is close to zero, the data are hard to reconcile with the null, which signals support for a link (not necessarily causal) between the dependent and independent variables.
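The link between an estimate, its t-statistic and its p-value can be sketched on simulated data (the true slope of 0.3 and the sample size are arbitrary). The two-sided p-value is the probability mass of a Student distribution beyond the observed t-statistic, in both tails.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- 0.3 * x + rnorm(n)                                   # True slope of 0.3 (illustrative)
fit <- lm(y ~ x)
tab <- summary(fit)$coefficients

t_stat <- tab["x", "Estimate"] / tab["x", "Std. Error"]   # Estimate divided by standard error
p_manual <- 2 * pt(-abs(t_stat), df = fit$df.residual)    # Two-sided p-value under the null
c(t_stat = t_stat, p_manual = p_manual, p_reported = tab["x", "Pr(>|t|)"])
```

The manual p-value matches the one reported by summary(). As a rule of thumb, a t-statistic above 2 in absolute value corresponds to a p-value below 5% in large samples.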

Here, the savings rate has a (mildly) significant positive coefficient…
trade and population growth have negative coefficients. Note that R&D was left out of vars above and is therefore not part of this model.

Mono effect models

Below, we check whether the results remain robust if we only consider one dimension of effects.

fit_indiv <- plm(formula = gdp_growth ~ . , 
                 data = plm_data,
                 model = "within", effect = "individual",
                 index = c("country", "year"))$coef |> data.frame() |> 
  rownames_to_column(var = "variable") |> 
  mutate(type = "indiv")
colnames(fit_indiv)[2] <- "estimate"
fit_time <- plm(formula = gdp_growth ~ . , 
                data = plm_data,
                model = "within", effect = "time",
                index = c("country", "year"))$coef |> data.frame() |> 
  rownames_to_column(var = "variable") |> 
  mutate(type = "time")
colnames(fit_time)[2] <- "estimate"

Let’s see the differences between models.

fit_two_way |> 
  bind_rows(fit_indiv) |> 
  bind_rows(fit_time) |> 
  filter(variable %in% vars) |> select(variable, estimate, type) |>
  pivot_wider(names_from = type, values_from = estimate)

variable	two_way	indiv	time
gdp_percap	-0.0085779	-0.0085779	-0.0085779
savings_rate	0.0039724	0.0039724	0.0039724
inflation	-0.0002962	-0.0002962	-0.0002962
trade	-0.0006141	-0.0006141	-0.0006141
capital_formation	0.0022876	0.0022876	0.0022876
pop_growth	-0.0222256	-0.0222256	-0.0222256
educ_level	-0.0002145	-0.0002145	-0.0002145
educ_spending	-0.0078161	-0.0078161	-0.0078161

The values are identical. This should not be the case; there must be a problem here!
We thus repeat this exercise, but with another library, the {fixest} package.

fml_0 <- "gdp_growth ~ gdp_percap + savings_rate + inflation + trade + "
fml_0 <- paste(fml_0, "capital_formation + pop_growth + educ_level + educ_spending")

# Helper: fit the model once and return t-statistics based on i.i.d. standard errors
t_stats <- function(fml, data, type){
  fit <- feols(fml, data = data)
  out <- (fit$coefficients / sqrt(diag(fit$cov.iid))) |> 
    data.frame() |> rownames_to_column(var = "variable") |> mutate(type = type)
  colnames(out)[2] <- "statistic"
  out
}

fml <- as.formula(paste(fml_0, "| year"))             # Time fixed effects
fit_time <- t_stats(fml, plm_data, "time")

fml <- as.formula(paste(fml_0, "| country"))          # Country fixed effects
fit_indiv <- t_stats(fml, plm_data, "indiv")

fml <- as.formula(paste(fml_0, "| country + year"))   # Two-way fixed effects
fit_twoway <- t_stats(fml, plm_data, "two_way")

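NOTE that we store test statistics (the estimate divided by its standard error) here and not raw coefficients. The former carry more information, because they also reflect the precision of the estimates…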

fit_twoway |> 
  bind_rows(fit_time) |>
  bind_rows(fit_indiv) |>
  pivot_wider(names_from = type, values_from = statistic)

variable	two_way	time	indiv
gdp_percap	-0.6361210	-4.636578	-2.573712
savings_rate	5.1617455	3.516396	4.692063
inflation	-4.5424615	-4.443799	-4.147527
trade	-2.8608038	-0.198431	-1.078309
capital_formation	1.7947934	3.517437	4.001261
pop_growth	-3.2777516	-4.395138	-2.000779
educ_level	-0.6274451	1.053713	-1.228028
educ_spending	-2.0060849	-1.457869	-3.083658

Ok, so now we do find some differences in test statistics. The good news is that signs are consistent for most variables.
But there are a few conflicts, too. They can be of two types:

either the sign changes between the models;
or the significance level changes (e.g., if some models have t-stats above 2 in absolute values, while others do not).

These results suggest that population growth, inflation and the savings rate (to a slightly lower extent) are solid drivers of growth in wealth.

Grouping

Until now, we have run the models on all of the data, but we could also consider sub-groups, either geographical ones, or income clusters. We could use grouping (in {tidyverse}/{dplyr} parlance) below, but in fact, for lower income countries, we do not have enough data points to proceed with estimation…

Indeed, if we look at the full sample,

wb_growth |> group_by(income) |> count()

income	n
High income	5460
Low income	1625
Lower middle income	3315
Not classified	65
Upper middle income	3510

# wb_growth |> group_by(region) |> count()

It may seem manageable; but if we impose non-missing points…
Then the story is a bit different.

wb_growth |> na.omit() |> group_by(income) |> count()

income	n
High income	236
Low income	20
Lower middle income	64
Upper middle income	250

# wb_growth |> na.omit() |> group_by(region) |> count()

The sample size becomes too small. Indeed, recall that with fixed effects, when estimated with OLS, the number of columns increases by a lot (+ number of countries + number of years in the sample).
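To make the column count concrete, here is a small sketch with a hypothetical balanced panel of 5 countries and 4 years and a single regressor. The design matrix has one intercept, one slope, and one dummy per country and per year (minus one reference level in each dimension).

```r
toy <- expand.grid(country = paste0("C", 1:5), year = 2001:2004)   # Hypothetical 5 x 4 panel
toy$x <- seq_len(nrow(toy))                                        # Placeholder regressor

X <- model.matrix(~ x + factor(country) + factor(year), data = toy)
ncol(X)     # 9 columns = 1 intercept + 1 regressor + 4 country dummies + 3 year dummies
nrow(X)     # 20 rows
```

With only 20 rows and 9 columns, few degrees of freedom remain, which is the issue faced by the small income groups above.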

Let’s proceed with manual filters.

plm_rich <- wb_growth |> filter(income == "High income") |>
  dplyr::select(all_of(c("country", "year", "gdp_growth", vars))) |> 
  na.omit() |> group_by(country) |> mutate(n = n()) |> filter(n > 8) |> select(-n)
fit_rich <- ((feols(fml, data = plm_rich)$coefficients)/sqrt(diag(feols(fml, data = plm_rich)$cov.iid))) |> 
  data.frame() |> rownames_to_column(var = "variable") |> mutate(income = "rich")
colnames(fit_rich)[2] <- "statistic"

plm_mid <- wb_growth |> filter(income == "Upper middle income") |>
  dplyr::select(all_of(c("country", "year", "gdp_growth", vars))) |> 
  na.omit() |> group_by(country) |> mutate(n = n()) |> filter(n > 8) |> select(-n)
fit_mid <- ((feols(fml, data = plm_mid)$coefficients)/sqrt(diag(feols(fml, data = plm_mid)$cov.iid))) |> 
  data.frame() |> rownames_to_column(var = "variable") |> mutate(income = "middle")
colnames(fit_mid)[2] <- "statistic"

fit_rich |> bind_rows(fit_mid) |> pivot_wider(names_from = income, values_from = statistic)

variable	rich	middle
gdp_percap	-2.3797800	-0.0406204
savings_rate	8.8445465	5.6322502
inflation	-4.6128496	-2.0029601
trade	-3.1693687	-0.3766862
capital_formation	1.5203099	-1.0265394
pop_growth	-5.4839572	-2.1934949
educ_level	-1.0358756	0.7645079
educ_spending	0.5128888	-1.8905733

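Pretty consistent for the savings rate, inflation and population growth, which keep their sign and remain significant in both groups. In contrast, capital formation, education level and education spending switch signs (though none of them has a t-statistic above 2 in absolute value), while GDP per capita and trade are only significant for high-income countries.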

Predictive models

The above regression model seeks to link growth with contemporaneous variables. But we could also be interested in lagged predictors, to examine whether some characteristics of nations are associated with future growth. Because most features are persistent (highly autocorrelated), we should not see too much difference. But the devil can be in the details… and in the lags!

# To be done/discussed in class => grouping!
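As a starting point for that discussion, here is a minimal sketch of how a lagged predictor can be built with grouping. The toy panel and the column name savings_rate_lag are hypothetical; the key point is that the lag must be computed within each country, otherwise the first observation of a country would inherit the last value of the previous one.

```r
library(dplyr)

toy <- tibble(                                     # Hypothetical panel (made-up values)
  country = rep(c("A", "B"), each = 4),
  year = rep(2001:2004, times = 2),
  savings_rate = c(20, 21, 22, 23, 10, 11, 12, 13),
  gdp_growth = c(0.02, 0.03, 0.01, 0.02, 0.05, 0.04, 0.06, 0.05)
)

toy_lagged <- toy |>
  arrange(country, year) |>                        # Lags require chronological order
  group_by(country) |>                             # One lag series per country
  mutate(savings_rate_lag = dplyr::lag(savings_rate)) |>
  ungroup()

toy_lagged
```

The first year of each country has a missing lag, so one observation per country is lost when estimating a model on lagged predictors.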

Country-specific estimates

Until now, estimations were pooled: we used the information in the cross-section of countries (and chronology) to see the “global” effect of some variables on growth. But perhaps this effect is country-specific (or maybe even time-dependent)?

Below, we run the same regressions (in terms of ‘independent’ variables), but we will display results one variable at a time. The models are nevertheless multivariate.

fit_countries <- wb_growth |> 
  dplyr::select(all_of(c("country", "year", "gdp_growth", vars))) |> 
  na.omit() |>
  group_by(country) |>
  mutate(n = n()) |>
  filter(n > 20) |>
  select(-year, -n) |>     # Drop the year and the helper count n, which are not regressors
  regress(gdp_growth ~ ., m("lm")) |>
  coef()

Let’s have a look at a few variables.

First, population growth.

fit_countries |>
  filter(term == "pop_growth") |>
  ggplot(aes(x = estimate, y = reorder(country, estimate))) + geom_col() +
  theme_classic() + theme(axis.title.y = element_blank()) +
  xlab("estimate (population growth)")

A majority of estimates are negative, with a handful of exceptions, including France.

What about capital formation?

fit_countries |>
  filter(term == "capital_formation") |>
  ggplot(aes(x = estimate, y = reorder(country, estimate))) + geom_col() +
  theme_classic() + theme(axis.title.y = element_blank()) +
  xlab("estimate (capital formation)")

Here again, a majority (~2/3) of countries share the same sign, and France remains a clear outlier.

Finally, a focus on the savings rate.

fit_countries |>
  filter(term == "savings_rate") |>
  ggplot(aes(x = estimate, y = reorder(country, estimate))) + geom_col() +
  theme_classic() + theme(axis.title.y = element_blank()) +
  xlab("estimate (savings rate)")

Again, the split is not evenly balanced, which suggests both some consistency and some exceptions.
It is hard to find a pattern, though (Eastern Europe is represented among the negative coefficients, but nothing jumps out).

Data sources outside the World Bank

The Energy Institute compiles a valuable (& updated) dataset on energy production by source. The data is too rich (90 variables), hence we restrict it to a few items.

url <- "https://www.energyinst.org/__data/assets/excel_doc/0006/1656348/Statistical-Review-of-World-Energy-Data.xlsx"
GET(url, write_disk(tf <- tempfile(fileext = ".xlsx")))

Response [https://www.energyinst.org/__data/assets/excel_doc/0006/1656348/Statistical-Review-of-World-Energy-Data.xlsx]
  Date: 2025-10-16 11:05
  Status: 200
  Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  Size: 15.2 MB
<ON DISK>  /var/folders/1d/5dm3k4954vl74y6x8l4h11d80000gn/T//RtmpdlZQIV/filefa32410f31c6.xlsx

df <- read_excel(tf) 
data_energy <- df |> 
  select(-ISO3166_alpha3, -ISO3166_numeric, -CIS) |>
  mutate(Var = Var |> str_replace_all("_ej", "")) |>
  filter(Var %in% c("oilcons", "gascons", "coalcons", "renewables"))
colnames(data_energy) <- tolower(colnames(data_energy))
head(data_energy)

country	year	region	subregion	opec	var	value
Algeria	1965	Africa	Northern Africa	1	coalcons	0.0029308
Algeria	1965	Africa	Northern Africa	1	gascons	0.0267498
Algeria	1965	Africa	Northern Africa	1	oilcons	0.0554589
Algeria	1965	Africa	Northern Africa	1	renewables	0.0014400
Algeria	1966	Africa	Northern Africa	1	coalcons	0.0028470
Algeria	1966	Africa	Northern Africa	1	gascons	0.0277893

NOTE: “ej” means exajoules (=\(10^{18}\)J). It’s an important unit in physics and energy-related data.

data_energy |> 
  filter(country == "Total World") |>
  ggplot(aes(x = year, y = value, fill = var)) + geom_area(alpha = 0.8) +
  theme_classic() + 
  theme(axis.title = element_blank(),
        legend.title = element_blank(),
        legend.position = c(0.2, 0.8)) +
  scale_fill_viridis_d()

We then need to join this data with the WB data…
(with a snapshot of the outcome)

data_energy_wide <- data_energy |> 
  pivot_wider(names_from = var, values_from = value)
data_join <- plm_data |> 
  full_join(data_energy_wide, by = c("year", "country"))
data_join |> na.omit() |> select(-region, -subregion) |> head()

country	year	gdp_growth	pop	gdp_percap	savings_rate	inflation	trade	capital_formation	pop_growth	educ_level	educ_spending	coalcons	gascons	oilcons	renewables
Brazil	2001	-0.1567108	176301203	8.063469	16.54900	6.840359	26.93629	18.74186	1.303355	37.80	3.84468	0.5356368	0.4423030	3.981208	1.316308
Brazil	2002	-0.1008564	178503484	7.957156	18.29190	8.450164	27.61836	17.44908	1.241421	39.20	3.75037	0.5184374	0.5225851	3.885513	1.416803
Brazil	2003	0.0821680	180622688	8.036123	19.07787	14.714920	28.14038	16.85669	1.180214	40.80	3.75037	0.5379074	0.5849025	3.743762	1.492475
Brazil	2004	0.1854704	182675143	8.206262	21.32584	6.597185	29.67825	17.91257	1.129914	42.41	3.97448	0.5639648	0.6956094	3.873170	1.583042
Brazil	2005	0.3176896	184688101	8.482142	20.60575	6.869537	27.08680	17.20488	1.095906	43.10	4.47908	0.5439045	0.7245762	3.939635	1.670693
Brazil	2006	0.2291659	186653106	8.688478	20.52341	4.183568	26.04170	17.81647	1.058338	45.46	4.87060	0.5363063	0.7621121	4.035304	1.713275
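The join above matches rows on country names, which may be spelled differently across sources. A quick way to spot such mismatches is an anti-join, sketched below on toy tables; the names and values are hypothetical and only serve as illustration.

```r
library(dplyr)

wb_toy <- tibble(                                  # Hypothetical World Bank style names
  country = c("Brazil", "Korea, Rep.", "Canada"),
  year = 2001
)
energy_toy <- tibble(                              # Hypothetical names from another source
  country = c("Brazil", "South Korea", "Canada"),
  year = 2001,
  oilcons = c(4.0, 4.5, 4.1)                       # Made-up values
)

# Rows of the first table without any match in the second one
unmatched <- anti_join(wb_toy, energy_toy, by = c("year", "country"))
unmatched$country                                  # "Korea, Rep."
```

Countries listed this way would need to be renamed in one of the two tables before joining, otherwise they silently drop out after na.omit().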

We need to scale energy consumption by a proxy for the size of the country (here, population). This will reduce collinearity due to size.

Let us thus look at correlations…

data_join <- data_join |>
  mutate(oilcons = oilcons / pop * 10^9,          # From exajoules to gigajoules per capita
         coalcons = coalcons / pop * 10^9,
         gascons = gascons / pop * 10^9,
         renewables = renewables / pop * 10^9) |>
  select(year, country, gdp_growth, coalcons, gascons, oilcons, renewables, all_of(vars)) |>
  ungroup() |>
  distinct()

cor(data_join |> dplyr::select(all_of(c(vars, "coalcons", "gascons", "oilcons", "renewables"))), 
    use = "pairwise.complete") |>
  ggcorrplot() + 
  theme(text = element_text(size = 15),
        axis.text = element_text(size = 15))

Energy variables remain quite correlated.
Basically, we should only keep two, say renewables and gas… But let’s continue anyway with all four variables, out of curiosity.

We build a model omitting a few variables.
(this is just for illustration)

fml_0 <- "gdp_growth ~ gdp_percap + savings_rate + pop_growth + educ_level + coalcons + renewables"
fml <- as.formula(paste(fml_0, "| year + country"))
(feols(fml, data = data_join))$coeftable

	Estimate	Std. Error	t value	Pr(>|t|)
gdp_percap	0.0417195	0.0162756	2.5633239	0.0105601
savings_rate	0.0024491	0.0007659	3.1976221	0.0014434
pop_growth	-0.0225878	0.0073137	-3.0884106	0.0020860
educ_level	0.0000835	0.0004855	0.1719713	0.8635062
coalcons	0.0002241	0.0003797	0.5902965	0.5551685
renewables	0.0011278	0.0010598	1.0641805	0.2875872

Energy variables are not significant in the end (p-values of 0.56 for coal and 0.29 for renewables)…

Convergence?

Convergence in economics refers to the idea that poor countries will eventually catch-up with rich ones; that rich countries would stagnate in terms of wealth whereas developing countries, with cheaper labor force would benefit from higher growth rates, resulting in a reduction of inequalities. This line of reasoning may be subject to conditions, naturally.

This is something that we can test, too. First, let’s see whether countries with low income have growth rates superior to those with high income.

t.test(wb_growth |> filter(income == "High income") |> pull(gdp_growth),
       wb_growth |> filter(income == "Low income") |> pull(gdp_growth))


    Welch Two Sample t-test

data:  pull(filter(wb_growth, income == "High income"), gdp_growth) and pull(filter(wb_growth, income == "Low income"), gdp_growth)
t = 4.2416, df = 1884, p-value = 2.327e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.01291440 0.03512813
sample estimates:
 mean of x  mean of y 
0.06943830 0.04541704

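What does this mean? On average, high-income countries grew faster than low-income ones (about 6.9% versus 4.5% per year), and the difference is statistically significant. Taken at face value, this goes against unconditional convergence. Note that GDP per capita is measured here in current US dollars, so these growth rates are nominal.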
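For reference, the Welch statistic reported by t.test() is simply the difference in means divided by a standard error that allows for unequal variances. The sketch below reproduces it by hand on simulated samples (sizes, means and standard deviations are arbitrary).

```r
set.seed(1)
x <- rnorm(50, mean = 0.07, sd = 0.10)     # Simulated growth rates, group 1 (arbitrary)
y <- rnorm(40, mean = 0.045, sd = 0.15)    # Simulated growth rates, group 2 (arbitrary)

se <- sqrt(var(x) / length(x) + var(y) / length(y))   # Standard error with unequal variances
t_manual <- (mean(x) - mean(y)) / se                   # Welch t-statistic

tt <- t.test(x, y)                                     # Welch test is the default in R
c(manual = t_manual, reported = unname(tt$statistic))
```

A positive statistic means that the first sample has the higher mean, which is how the sign of the test above should be read.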

What about dynamic patterns?

wb_growth |> 
  filter(income %in% c("High income", "Low income")) |>
  ggplot(aes(x = year, y = gdp_growth, color = income)) + 
  geom_point(aes(alpha = income)) + geom_smooth(se = F) +
  theme_classic() +
  theme(legend.position = c(0.8, 0.95),
        legend.title = element_blank(),
        axis.title = element_blank()) + 
  scale_color_manual(values = c("#4A4443", "#63E693")) + 
  scale_alpha_manual(values = c(0.2, 0.4)) + 
  ylim(-0.065, 0.065)

the 1960-2010 period argues against convergence;
the most recent points (2024) indicate a status-quo, or a reversion…?

This is supported by What Remains of Cross-Country Convergence?:

The above table shows the transitions from income groups in periods with and without crises. It shows that moving upward (to a higher income group) is much more likely outside crisis periods, but that the probability of moving downwards increases during downturns.

A step back: heuristic sources of growth

(see CSV, section 7.2)

luck: initial conditions may have an impact.
geography: natural resources (minerals, grains, cattle, etc.) are a key driver of growth. Diseases are also more frequent in some parts of the globe.
culture (customary beliefs and values): they can drive economic decisions, but are also hard to measure.
institutions: property rights, labor markets, regulation.