Growth in the cross-section (of countries)
NOTE: The present notebook is coded in R. It relies heavily on the tidyverse ecosystem of packages. We load the tidyverse below as a prerequisite for the rest of the notebook - along with a few other libraries.
\(\rightarrow\) Don’t forget that code flows sequentially. A random chunk may not work if the previous ones have not been executed.
library(tidyverse) # Package for data wrangling
library(readxl) # Package to import MS Excel files
library(latex2exp) # Package for LaTeX expressions
library(quantmod) # Package for stock data extraction
library(highcharter) # Package for reactive plots
library(ggcorrplot) # Package for correlation plots
library(ggrepel) # Package for neat annotations
library(plm) # Package for panel models
library(fixest) # Other Package for panels
library(reactable) # Package for neat tables
library(WDI) # Package for World Bank data
library(broom) # Package for neat regression output
library(tidyfit) # Package for grouped regressions
library(httr) # package to fetch data online
impute <- function(v, n = 6){ # Imputation function (forward-fill, at most n steps)
  for(j in 1:n){
    ind <- which(is.na(v))
    if(length(ind)>0){
      if(ind[1]==1){ind <- ind[-1]}
      v[ind] <- v[ind-1]
    }
  }
  return(v)
}
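To see what the imputation helper does, here is a small sketch on two made-up vectors (the helper is restated so that the chunk runs on its own; `short_gap` and `long_gap` are illustrative names). Each pass copies the previous value into the first missing slot of every gap, so at most `n` consecutive missing points are filled, and a leading `NA` is left untouched.

```r
# Forward-fill helper, restated so that this sketch runs on its own
impute <- function(v, n = 6) {
  for (j in 1:n) {
    ind <- which(is.na(v))              # positions still missing
    if (length(ind) > 0) {
      if (ind[1] == 1) ind <- ind[-1]   # a leading NA has no predecessor
      v[ind] <- v[ind - 1]              # copy the previous value
    }
  }
  return(v)
}

short_gap <- impute(c(NA, 1, NA, NA, 5, NA))  # gaps shorter than n are fully filled
long_gap  <- impute(c(1, rep(NA, 8)))         # 8 missing points, only n = 6 are filled
print(short_gap)
print(long_gap)
```

The cap `n` keeps stale values from being propagated too far in time.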
The content of the notebook is heavily inspired by the book Advanced Macro-economics - An Easy Guide.
Introduction
Until now, we have sought to explain growth through particular lenses (human capital, technology) but have mostly failed to do so.
In this session/notebook, we are interested in a broader and much more data-centric approach. We seek to analyze a large cross-section of countries as well as many explanatory factors at the same time.
Foundations
Suppose that in all generality, the production function is a generalization of the Cobb-Douglas (multiplicative) form:
\[Y=\prod_{n=1}^N X_n^{a_n}, \quad X_n >0, \quad a_n >0,\] so that we envision \(N\) different factors contributing to the economic output. Note that returns to scale are determined by \(\sum_{n=1}^Na_n\).
Taking the logarithm: \[\log(Y) = \sum_{n=1}^N a_n \log(X_n)\] and differentiating (w.r.t. time) yields
\[\frac{\dot{Y}}{Y} = \sum_{n=1}^N a_n \frac{\dot{X}_n}{X_n}.\]
This is a sound grounding for simple (linear) regression models which we’ll cover below.
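As a quick numerical check of this decomposition, the sketch below simulates three factors growing at constant (continuous) rates and verifies that the log-growth of output equals the weighted sum of factor growth rates. The exponents `a` and rates `g` are made-up values, chosen only for illustration.

```r
a <- c(0.3, 0.5, 0.2)        # hypothetical exponents a_n
g <- c(0.02, 0.01, 0.03)     # hypothetical growth rates of the factors X_n
dates <- 0:10

# Each column is one factor: X_n(t) = exp(g_n * t)
X <- sapply(g, function(rate) exp(rate * dates))

# Generalized Cobb-Douglas output: Y = prod_n X_n^{a_n}
Y <- apply(X, 1, function(x) prod(x^a))

growth_Y <- diff(log(Y))     # log-growth of output between consecutive dates
print(growth_Y)
print(sum(a * g))            # weighted sum of factor growth rates
```

Both printed quantities should coincide, which is exactly the linear relationship exploited by the regressions.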
Fetching & wrangling the data
Below, we extract a sufficient number (~10) of potential predictors of growth and impute a few points along the way (to increase sample size).
wb_raw <- WDI( # World Bank data
  indicator = c(
    "labor" = "SL.TLF.TOTL.IN", # Labor force (# individuals)
    "savings_rate" = "NY.GDS.TOTL.ZS", # Savings rate (% GDP)
    "inflation" = "FP.CPI.TOTL.ZG", # Inflation rate
    "trade" = "NE.TRD.GNFS.ZS", # Trade as % of GDP
    "pop" = "SP.POP.TOTL", # Population
    "pop_growth" = "SP.POP.GROW", # Population growth
    "capital_formation" = "NE.GDI.TOTL.ZS", # Gross capital formation (% GDP)
    "gdp_percap" = "NY.GDP.PCAP.CD", # GDP per capita
    "RD_percap" = "GB.XPD.RSDV.GD.ZS", # R&D expenditure (% of GDP)
    "educ_level" = "SE.SEC.CUAT.LO.ZS", # % pop reaching second. educ. level
    "educ_spending" = "SE.XPD.TOTL.GD.ZS", # Education spending (%GDP)
    "nb_researchers" = "SP.POP.SCIE.RD.P6", # Nb researchers per million inhab.
    "debt" = "GC.DOD.TOTL.GD.ZS", # Central gov. debt (% of GDP)
    "gdp" = "NY.GDP.MKTP.CD" # Gross Domestic Product (GDP)
  ), extra = TRUE,
  start = 1960,
  end = 2024) |>
  mutate(across(everything(), as.vector)) |>
  select(-status, -lending, -iso2c, -iso3c) |>
  filter(region != "Aggregates", income != "Aggregates") |>
  arrange(country, year) |>
  group_by(country) |>
  mutate(across(everything(), impute)) |>
  mutate(gdp_growth = gdp_percap/dplyr::lag(gdp_percap) - 1,
         .before = "region") |>
  mutate(gdp_percap = log(lag(gdp_percap))) |> # log-lag transformation
  ungroup() |>
  filter(lastupdated == max(lastupdated)) |>
  arrange(country, year) |>
  mutate(capital_percap = capital_formation / labor, .before = "region")
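The growth and log-lag transformations of the pipeline can be checked on a toy panel. The sketch below uses made-up numbers; `toy` and `log_lag_gdp` are illustrative names (the pipeline itself overwrites `gdp_percap` instead of creating a new column).

```r
library(dplyr)

# Toy panel with made-up values (two countries, three years)
toy <- data.frame(
  country    = rep(c("A", "B"), each = 3),
  year       = rep(2000:2002, 2),
  gdp_percap = c(100, 110, 121, 200, 190, 209)
)

toy_growth <- toy |>
  arrange(country, year) |>
  group_by(country) |>                      # lags must not cross country borders
  mutate(gdp_growth  = gdp_percap / dplyr::lag(gdp_percap) - 1,
         log_lag_gdp = log(dplyr::lag(gdp_percap))) |>
  ungroup()

print(toy_growth)
```

The first year of each country has no predecessor, hence a missing growth rate and a missing log-lagged level.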
We make a few adjustments to the data, adding GDP per capita growth.
wb_growth <- wb_raw |>
  filter(region != "Aggregates", income != "Aggregates")
wb_growth |> tail(3)
country | year | lastupdated | labor | savings_rate | inflation | trade | pop | pop_growth | capital_formation | gdp_percap | RD_percap | educ_level | educ_spending | nb_researchers | debt | gdp | gdp_growth | capital_percap | region | capital | longitude | latitude | income |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Zimbabwe | 2022 | 2025-10-07 | 6118687 | 5.591073 | 104.7052 | 64.76361 | 16069056 | 1.706209 | 14.610339 | 7.620973 | NA | 64.94 | 2.0504899 | NA | NA | 32789657378 | 0.1833459 | 2.4e-06 | Sub-Saharan Africa | Harare | 31.0672 | -17.8312 | Lower middle income |
Zimbabwe | 2023 | 2025-10-07 | 6232464 | 8.637877 | 104.7052 | 50.79496 | 16340822 | 1.677096 | 16.274040 | 7.676026 | NA | 64.94 | 0.3847713 | NA | NA | 35231369343 | 0.0565964 | 2.6e-06 | Sub-Saharan Africa | Harare | 31.0672 | -17.8312 | Lower middle income |
Zimbabwe | 2024 | 2025-10-07 | 6386440 | -3.993908 | 104.7052 | 52.67301 | 16634373 | 1.780482 | 4.467751 | 7.884731 | NA | NA | 0.3847713 | NA | NA | 44187704410 | 0.2320813 | 7.0e-07 | Sub-Saharan Africa | Harare | 31.0672 | -17.8312 | Lower middle income |
First analyses
Let’s have a look at missing data, after imputation!
In some cases, when no data exists, forward-filling is not possible.
wb_growth |> select() |> is.na() |> colMeans()
numeric(0)
vars <- c("gdp_percap", "savings_rate", "inflation", "trade", "capital_formation" , "pop_growth", "educ_level", "educ_spending")
Debt and R&D deplete the sample considerably (too many missing points): we will remove them from the analysis.
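A minimal sketch of how such a screening can be done: compute the share of missing points per column, then keep the columns below some tolerance. The data and the 50% threshold are made up for illustration.

```r
# Toy data with made-up values
toy <- data.frame(
  growth = c(0.02, NA, 0.03, 0.01),
  debt   = c(NA, NA, NA, 60),
  trade  = c(50, 55, 52, 58)
)

na_share <- colMeans(is.na(toy))           # proportion of missing points per column
print(na_share)

kept <- names(na_share)[na_share <= 0.5]   # hypothetical tolerance of 50%
print(kept)
```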
Which countries are the most represented in the sample (with no missing point, after imputation)?
wb_growth |>
  select(all_of(c(vars, "country"))) |>
  na.omit() |>
  group_by(country) |>
  count(sort = T) |>
  head(13)
country | n |
---|---|
Canada | 54 |
Portugal | 45 |
Spain | 44 |
Korea, Rep. | 43 |
Sweden | 42 |
Ecuador | 41 |
Italy | 38 |
Chile | 35 |
Israel | 35 |
Ireland | 34 |
Indonesia | 32 |
Mexico | 32 |
Czechia | 31 |
\(\rightarrow\) Many large and developed countries do not make it to the top, mostly due to the R&D field, but also debt variables.
Next, let us check whether there is collinearity among variables. Indeed, high correlations between independent variables are likely to perturb inference.
wb_growth |>
  select(all_of(vars)) |>
  #na.omit() |>
  cor(use = "pairwise.complete") |>
  ggcorrplot(lab = TRUE, digits = 1L) +
  scale_fill_viridis_c(alpha = 0.7) +
  theme(legend.position = "none")
Usually, a correlation of 0.5 (in absolute value) is considered already high. A value above 0.7 is prohibitive…
So here, it seems we are relatively fine (education and GDP per capita are close, though).
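The visual check can be complemented by a programmatic one. The sketch below, on simulated data, lists the pairs of variables whose absolute correlation exceeds the 0.7 threshold mentioned above; the variable names and the data are purely illustrative.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # built to be strongly correlated with x1
x3 <- rnorm(n)                  # independent of the other two

C <- cor(cbind(x1, x2, x3))

# Keep each pair once (upper triangle) and flag the high correlations
high  <- which(abs(C) > 0.7 & upper.tri(C), arr.ind = TRUE)
pairs <- data.frame(var1 = rownames(C)[high[, 1]],
                    var2 = colnames(C)[high[, 2]],
                    cor  = C[high])
print(pairs)
```

In such a case, one would typically drop (or combine) one of the two flagged variables before running regressions.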
Let’s have a brief look at numbers; take inflation.
Are there outliers or false values?
wb_raw |>
  group_by(country) |>
  summarise(avg_inflation = mean(inflation, na.rm = T)) |>
  na.omit() |> head(9)
country | avg_inflation |
---|---|
Afghanistan | 4.862333 |
Albania | 2.599840 |
Algeria | 5.018634 |
Angola | 22.210968 |
Antigua and Barbuda | 2.430727 |
Argentina | 86.293375 |
Armenia | 3.251960 |
Aruba | 2.672916 |
Australia | 4.679567 |
Panel models
We now turn to an exploration of the concepts and variables seen and mentioned until today. Indeed, models are only worthwhile if they are able to explain (or predict) salient empirical properties of the economy.
We follow here a panel approach from Economic Growth in a Cross Section of Countries, though we apply it to GDP per capita and not to raw GDP:
\[g_{t,i}=\textbf{X}_{t,i}\boldsymbol{\beta}+a \log(y_{t-1,i}) + e_{t,i},\]
where \(g_{t,i}\) is the growth rate of country \(i\) at date (year) \(t\) and \(y_{t,i}\) is GDP per capita. The matrix \(\textbf{X}_{t,i}\) embeds all variables of interest.
Baseline estimation
As a first attempt, we proceed with the traditional two-way fixed effect (TWFE) model, meaning that the equation is: \[g_{t,i}=\textbf{X}_{t,i}\boldsymbol{\beta}+a \log(y_{t-1,i}) + b_i + c_t + e_{t,i},\] where the errors have zero means and the \(b_i\) and \(c_t\) are fixed effects (coefficients on dummy variables) that code the country and year of each row, respectively.
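To make the role of the dummies concrete, here is a sketch on a simulated, balanced panel (all names and numbers are made up). It compares the slope obtained by OLS with explicit country and year dummies to the one obtained after the two-way "within" transformation, i.e., after removing country means and year means. On a balanced panel, the two should coincide.

```r
set.seed(42)
N <- 10; n_years <- 8
d <- expand.grid(country = factor(1:N), year = factor(1:n_years))

b_i <- rnorm(N)[as.integer(d$country)]      # country effects
c_t <- rnorm(n_years)[as.integer(d$year)]   # year effects
d$x <- rnorm(nrow(d)) + b_i                 # regressor correlated with country effects
d$g <- 0.5 * d$x + b_i + c_t + rnorm(nrow(d), sd = 0.1)

# (1) OLS with explicit dummies
beta_lsdv <- unname(coef(lm(g ~ x + country + year, data = d))["x"])

# (2) Two-way within transformation (valid as such for balanced panels)
two_way_demean <- function(v, i, t) v - ave(v, i) - ave(v, t) + mean(v)
g_w <- two_way_demean(d$g, d$country, d$year)
x_w <- two_way_demean(d$x, d$country, d$year)
beta_within <- unname(coef(lm(g_w ~ 0 + x_w)))

print(c(lsdv = beta_lsdv, within = beta_within))
```

Dedicated packages implement the same idea, but handle unbalanced panels and standard errors properly.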
The {plm} package is likely the most used for panel models in R. We use it below (at first).
Importantly, we need enough data per country because the fixed effects generate additional dummy columns. This reduces the number of countries to just 14.
plm_data <- wb_growth |> dplyr::select(all_of(c("country", "year", "gdp_growth", "pop", vars))) |>
  na.omit() |> group_by(country) |> mutate(n = n()) |> filter(n > 21) |> select(-n) |> ungroup()
fit_two_way <- plm(formula = gdp_growth ~ . ,
                   data = plm_data,
                   model = "within", effect = "twoways",
                   index = c("country", "year"))
# summary(fit_two_way) # PLM hints towards singularity, upon verification, this is false...
We also store the results for future use.
fit_two_way <- fit_two_way$coef |> data.frame() |>
  rownames_to_column(var = "variable") |>
  mutate(type = "two_way")
colnames(fit_two_way)[2] <- "estimate"
To detect which variables matter, we look at p-values: they give the probability of obtaining a value at least as “extreme” as the one observed, under the assumption that the coefficient is equal to zero (this hypothesis is called the null). Hence, if a p-value is close to zero, the data are hard to reconcile with the null, which supports the existence of a link (not necessarily causal) between the dependent and independent variables.
Here, the savings rate, R&D both have (mildly) significant positive coefficients…
trade and population growth have negative coefficients.
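The link between a coefficient, its test statistic and its p-value can be sketched on simulated data (a plain linear model here, not the panel above; all numbers are made up). The t-statistic is the estimate divided by its standard error, and the two-sided p-value is the probability mass of the Student distribution beyond that statistic.

```r
set.seed(1)
x <- rnorm(100)
y <- 0.3 * x + rnorm(100)

fit <- lm(y ~ x)
tab <- summary(fit)$coefficients

t_stat <- tab["x", "Estimate"] / tab["x", "Std. Error"]
p_val  <- 2 * pt(-abs(t_stat), df = df.residual(fit))   # two-sided p-value

print(c(t_stat = t_stat, p_val = p_val))
print(tab["x", ])    # same numbers, as reported by summary()
```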
Mono effect models
Below, we test whether the results remain robust if we only consider one dimension of effects.
fit_indiv <- plm(formula = gdp_growth ~ . ,
                 data = plm_data,
                 model = "within", effect = "individual",
                 index = c("country", "year"))$coef |> data.frame() |>
  rownames_to_column(var = "variable") |>
  mutate(type = "indiv")
colnames(fit_indiv)[2] <- "estimate"
fit_time <- plm(formula = gdp_growth ~ . ,
                data = plm_data,
                model = "within", effect = "time",
                index = c("country", "year"))$coef |> data.frame() |>
  rownames_to_column(var = "variable") |>
  mutate(type = "time")
colnames(fit_time)[2] <- "estimate"
Let’s see the differences between models.
fit_two_way |>
  bind_rows(fit_indiv) |>
  bind_rows(fit_time) |>
  filter(variable %in% vars) |> select(variable, estimate, type) |>
  pivot_wider(names_from = type, values_from = estimate)
variable | two_way | indiv | time |
---|---|---|---|
gdp_percap | -0.0085779 | -0.0085779 | -0.0085779 |
savings_rate | 0.0039724 | 0.0039724 | 0.0039724 |
inflation | -0.0002962 | -0.0002962 | -0.0002962 |
trade | -0.0006141 | -0.0006141 | -0.0006141 |
capital_formation | 0.0022876 | 0.0022876 | 0.0022876 |
pop_growth | -0.0222256 | -0.0222256 | -0.0222256 |
educ_level | -0.0002145 | -0.0002145 | -0.0002145 |
educ_spending | -0.0078161 | -0.0078161 | -0.0078161 |
The values are identical. This should not be the case; there must be a problem here!
We thus repeat this exercise, but with another library, the {fixest} package.
fml_0 <- "gdp_growth ~ gdp_percap + savings_rate + inflation + trade + "
fml_0 <- paste(fml_0, "capital_formation + pop_growth + educ_level + educ_spending")
fml <- as.formula(paste(fml_0, "| year"))
fit_time <- ((feols(fml, data = plm_data)$coefficients)/sqrt(diag(feols(fml, data = plm_data)$cov.iid))) |>
  data.frame() |> rownames_to_column(var = "variable") |> mutate(type = "time")
colnames(fit_time)[2] <- "statistic"
fml <- as.formula(paste(fml_0, "| country"))
fit_indiv <- ((feols(fml, data = plm_data)$coefficients)/sqrt(diag(feols(fml, data = plm_data)$cov.iid))) |>
  data.frame() |> rownames_to_column(var = "variable") |> mutate(type = "indiv")
colnames(fit_indiv)[2] <- "statistic"
fml <- as.formula(paste(fml_0, "| country + year"))
fit_twoway <- ((feols(fml , data = plm_data)$coefficients)/sqrt(diag(feols(fml, data = plm_data)$cov.iid))) |>
  data.frame() |> rownames_to_column(var = "variable") |> mutate(type = "two_way")
colnames(fit_twoway)[2] <- "statistic"
NOTE that we store test statistics here and not raw coefficients. The former carry more information…
fit_twoway |>
  bind_rows(fit_time) |>
  bind_rows(fit_indiv) |>
  pivot_wider(names_from = type, values_from = statistic)
variable | two_way | time | indiv |
---|---|---|---|
gdp_percap | -0.6361210 | -4.636578 | -2.573712 |
savings_rate | 5.1617455 | 3.516396 | 4.692063 |
inflation | -4.5424615 | -4.443799 | -4.147527 |
trade | -2.8608038 | -0.198431 | -1.078309 |
capital_formation | 1.7947934 | 3.517437 | 4.001261 |
pop_growth | -3.2777516 | -4.395138 | -2.000779 |
educ_level | -0.6274451 | 1.053713 | -1.228028 |
educ_spending | -2.0060849 | -1.457869 | -3.083658 |
Ok, so now we do find some differences in test statistics. The good news is that signs are consistent across models for most variables.
But there are a few conflicts, too. They can be of two types:
- either the sign changes between the models;
- or the significance level changes (e.g., if some models have t-stats above 2 in absolute values, while others do not).
These results suggest that population growth, inflation and the savings rate (to a slightly lower extent) are solid drivers of growth in wealth.
Grouping
Until now, we have run the models on all of the data, but we could also consider sub-groups, either geographical ones, or income clusters. We could use grouping (in {tidyverse}/{dplyr} parlance) below, but in fact, for lower income countries, we do not have enough data points to proceed with estimation…
Indeed, if we look at the full sample,
wb_growth |> group_by(income) |> count()
income | n |
---|---|
High income | 5460 |
Low income | 1625 |
Lower middle income | 3315 |
Not classified | 65 |
Upper middle income | 3510 |
# wb_growth |> group_by(region) |> count()
It may seem manageable; but if we impose non-missing points…
Then the story is a bit different.
wb_growth |> na.omit() |> group_by(income) |> count()
income | n |
---|---|
High income | 236 |
Low income | 20 |
Lower middle income | 64 |
Upper middle income | 250 |
# wb_growth |> na.omit() |> group_by(region) |> count()
The sample size becomes too small. Indeed, recall that with fixed effects, when estimated with OLS, the number of columns increases by a lot (+ number of countries + number of years in the sample).
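To get a feel for this inflation in the number of columns, the sketch below builds the design matrix of a model with one regressor plus country and year dummies, for a hypothetical panel of 14 countries observed over 21 years (the regressor `x` is a placeholder).

```r
# Hypothetical balanced panel: 14 countries, 21 years
d <- expand.grid(country = factor(paste0("C", 1:14)),
                 year    = factor(2000:2020))
d$x <- seq_len(nrow(d))          # placeholder regressor

X <- model.matrix(~ x + country + year, data = d)
print(dim(X))                    # rows = observations, columns = estimated coefficients
```

One regressor thus turns into 35 columns (intercept, `x`, 13 country dummies, 20 year dummies; one level per factor serves as reference), which is why small sub-samples quickly become unusable.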
Let’s proceed with manual filters.
plm_rich <- wb_growth |> filter(income == "High income") |>
  dplyr::select(all_of(c("country", "year", "gdp_growth", vars))) |>
  na.omit() |> group_by(country) |> mutate(n = n()) |> filter(n > 8) |> select(-n)
fit_rich <- ((feols(fml, data = plm_rich)$coefficients)/sqrt(diag(feols(fml, data = plm_rich)$cov.iid))) |>
  data.frame() |> rownames_to_column(var = "variable") |> mutate(income = "rich")
colnames(fit_rich)[2] <- "statistic"
plm_mid <- wb_growth |> filter(income == "Upper middle income") |>
  dplyr::select(all_of(c("country", "year", "gdp_growth", vars))) |>
  na.omit() |> group_by(country) |> mutate(n = n()) |> filter(n > 8) |> select(-n)
fit_mid <- ((feols(fml, data = plm_mid)$coefficients)/sqrt(diag(feols(fml, data = plm_mid)$cov.iid))) |>
  data.frame() |> rownames_to_column(var = "variable") |> mutate(income = "middle")
colnames(fit_mid)[2] <- "statistic"
fit_rich |> bind_rows(fit_mid) |> pivot_wider(names_from = income, values_from = statistic)
variable | rich | middle |
---|---|---|
gdp_percap | -2.3797800 | -0.0406204 |
savings_rate | 8.8445465 | 5.6322502 |
inflation | -4.6128496 | -2.0029601 |
trade | -3.1693687 | -0.3766862 |
capital_formation | 1.5203099 | -1.0265394 |
pop_growth | -5.4839572 | -2.1934949 |
educ_level | -1.0358756 | 0.7645079 |
educ_spending | 0.5128888 | -1.8905733 |
Pretty consistent! Except perhaps for a few variables…
Predictive models
The above regression model seeks to link growth with contemporaneous variables. But we could also be interested in lagged predictors, to examine whether some characteristics of nations imply future growth. Because most features are persistent (highly autocorrelated), we should not see too much difference. But the devil can be in the details… and in the lags!
# To be done/discussed in class => grouping!
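As a possible starting point for this discussion, here is a sketch of how lagged predictors can be built with grouping, so that lags never cross country borders. The toy data is made up, and `savings_rate_lag1` is an illustrative name; the same pattern would apply to any column of the panel.

```r
library(dplyr)

# Toy panel with made-up values
toy <- data.frame(
  country      = rep(c("A", "B"), each = 4),
  year         = rep(2001:2004, 2),
  gdp_growth   = c(0.02, 0.03, 0.01, 0.04, 0.05, 0.02, 0.03, 0.06),
  savings_rate = c(20, 22, 21, 23, 30, 28, 29, 31)
)

toy_lagged <- toy |>
  arrange(country, year) |>
  group_by(country) |>                                     # lag within each country
  mutate(savings_rate_lag1 = dplyr::lag(savings_rate)) |>
  ungroup()

# Predictive regression: growth at t on the savings rate at t-1
fit_lagged <- lm(gdp_growth ~ savings_rate_lag1, data = toy_lagged)
print(toy_lagged)
print(nobs(fit_lagged))   # the first year of each country is lost
```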
Country-specific estimates
Until now, estimations were pooled: we used the information in the cross-section of countries (and chronology) to see the “global” effect of some variables on growth. But perhaps this effect is country-specific (or even time-dependent)?
Below, we run the same regressions (in terms of ‘independent’ variables), but we will display results one variable at a time. The models are nevertheless multivariate.
fit_countries <- wb_growth |>
  dplyr::select(all_of(c("country", "year", "gdp_growth", vars))) |>
  na.omit() |>
  group_by(country) |>
  mutate(n = n()) |>
  filter(n > 20) |>
  select(-year) |>
  regress(gdp_growth ~ ., m("lm")) |>
  coef()
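For intuition, the same idea (one multivariate regression per country) can be sketched in base R with `split()` and `lapply()`, here on simulated data. This is not the {tidyfit} pipeline used above, only an illustration of what grouped estimation amounts to; the country labels and slopes are made up.

```r
set.seed(3)
toy <- data.frame(
  country      = rep(c("A", "B", "C"), each = 30),
  pop_growth   = rnorm(90),
  savings_rate = rnorm(90)
)
true_slope <- c(A = -1, B = 0.5, C = -0.2)     # country-specific effects (made up)
toy$gdp_growth <- true_slope[toy$country] * toy$pop_growth +
  0.3 * toy$savings_rate + rnorm(90, sd = 0.1)

# One regression per country, coefficients stacked in a matrix
fits <- lapply(split(toy, toy$country),
               function(d) coef(lm(gdp_growth ~ pop_growth + savings_rate, data = d)))
est <- do.call(rbind, fits)
print(round(est, 2))
```

Each row of `est` holds the coefficients of one country, which is the information plotted variable by variable in the charts.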
Let’s have a look at a few variables.
First, population growth.
fit_countries |>
  filter(term == "pop_growth") |>
  ggplot(aes(x = estimate, y = reorder(country, estimate))) + geom_col() +
  theme_classic() + theme(axis.title.y = element_blank()) +
  xlab("estimate (population growth)")
A majority of estimates are negative, with a handful of exceptions, including France.
What about capital formation?
fit_countries |>
  filter(term == "capital_formation") |>
  ggplot(aes(x = estimate, y = reorder(country, estimate))) + geom_col() +
  theme_classic() + theme(axis.title.y = element_blank()) +
  xlab("estimate (capital formation)")
Here again, a majority (~2/3) of countries share the same coefficient sign, and France remains a clear outlier.
Finally, a focus on the savings rate.
fit_countries |>
  filter(term == "savings_rate") |>
  ggplot(aes(x = estimate, y = reorder(country, estimate))) + geom_col() +
  theme_classic() + theme(axis.title.y = element_blank()) +
  xlab("estimate (savings rate)")
Again, the split is not evenly balanced, which suggests both some consistency and some exceptions.
Hard to find a pattern, though (Eastern Europe is represented among the negative coefficients, but nothing stands out).
Data sources outside the World Bank
The Energy Institute compiles a valuable (& updated) dataset on energy production by source. The data is too rich (90 variables), hence we restrict it to a few items.
url <- "https://www.energyinst.org/__data/assets/excel_doc/0006/1656348/Statistical-Review-of-World-Energy-Data.xlsx"
GET(url, write_disk(tf <- tempfile(fileext = ".xlsx")))
Response [https://www.energyinst.org/__data/assets/excel_doc/0006/1656348/Statistical-Review-of-World-Energy-Data.xlsx]
Date: 2025-10-14 15:19
Status: 200
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size: 15.2 MB
<ON DISK> /var/folders/1d/5dm3k4954vl74y6x8l4h11d80000gn/T//RtmpbNu44s/file11261447c3d38.xlsx
df <- read_excel(tf)
data_energy <- df |>
  select(-ISO3166_alpha3, -ISO3166_numeric, -CIS) |>
  mutate(Var = Var |> str_replace_all("_ej", "")) |>
  filter(Var %in% c("oilcons", "gascons", "coalcons", "renewables"))
colnames(data_energy) <- tolower(colnames(data_energy))
head(data_energy)
country | year | region | subregion | opec | eu | oecd | var | value |
---|---|---|---|---|---|---|---|---|
Algeria | 1965 | Africa | Northern Africa | 1 | 0 | 0 | coalcons | 0.0029308 |
Algeria | 1965 | Africa | Northern Africa | 1 | 0 | 0 | gascons | 0.0267498 |
Algeria | 1965 | Africa | Northern Africa | 1 | 0 | 0 | oilcons | 0.0554589 |
Algeria | 1965 | Africa | Northern Africa | 1 | 0 | 0 | renewables | 0.0014400 |
Algeria | 1966 | Africa | Northern Africa | 1 | 0 | 0 | coalcons | 0.0028470 |
Algeria | 1966 | Africa | Northern Africa | 1 | 0 | 0 | gascons | 0.0277893 |
NOTE: “ej” means exajoules (=\(10^{18}\)J). It’s an important unit in physics and energy-related data.
data_energy |>
  filter(country == "Total World") |>
  ggplot(aes(x = year, y = value, fill = var)) + geom_area(alpha = 0.8) +
  theme_classic() +
  theme(axis.title = element_blank(),
        legend.title = element_blank(),
        legend.position = c(0.2, 0.8)) +
  scale_fill_viridis_d()
We then need to join this data with the WB data…
(with a snapshot of the outcome)
data_energy_wide <- data_energy |>
  pivot_wider(names_from = var, values_from = value)
data_join <- plm_data |>
  full_join(data_energy_wide, by = c("year", "country"))
data_join |> na.omit() |> select(-region, -subregion) |> head()
country | year | gdp_growth | pop | gdp_percap | savings_rate | inflation | trade | capital_formation | pop_growth | educ_level | educ_spending | opec | eu | oecd | coalcons | gascons | oilcons | renewables |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Brazil | 2001 | -0.1567108 | 176301203 | 8.063469 | 16.54900 | 6.840359 | 26.93629 | 18.74186 | 1.303355 | 37.80 | 3.84468 | 0 | 0 | 0 | 0.5356368 | 0.4423030 | 3.981208 | 1.316308 |
Brazil | 2002 | -0.1008564 | 178503484 | 7.957156 | 18.29190 | 8.450164 | 27.61836 | 17.44908 | 1.241421 | 39.20 | 3.75037 | 0 | 0 | 0 | 0.5184374 | 0.5225851 | 3.885513 | 1.416803 |
Brazil | 2003 | 0.0821680 | 180622688 | 8.036123 | 19.07787 | 14.714920 | 28.14038 | 16.85669 | 1.180214 | 40.80 | 3.75037 | 0 | 0 | 0 | 0.5379074 | 0.5849025 | 3.743762 | 1.492475 |
Brazil | 2004 | 0.1854704 | 182675143 | 8.206262 | 21.32584 | 6.597185 | 29.67825 | 17.91257 | 1.129914 | 42.41 | 3.97448 | 0 | 0 | 0 | 0.5639648 | 0.6956094 | 3.873170 | 1.583042 |
Brazil | 2005 | 0.3176896 | 184688101 | 8.482142 | 20.60575 | 6.869537 | 27.08680 | 17.20488 | 1.095906 | 43.10 | 4.47908 | 0 | 0 | 0 | 0.5439045 | 0.7245762 | 3.939635 | 1.670693 |
Brazil | 2006 | 0.2291659 | 186653106 | 8.688478 | 20.52341 | 4.183568 | 26.04170 | 17.81647 | 1.058338 | 45.46 | 4.87060 | 0 | 0 | 0 | 0.5363063 | 0.7621121 | 4.035304 | 1.713275 |
We need to scale energy consumption by a proxy for the size of the country (here, population). This will reduce collinearity due to size.
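As an illustration of this scaling, the sketch below converts the oil consumption of Brazil in 2001 (values taken from the snapshot above) into a per-capita figure. Since one exajoule equals \(10^9\) gigajoules, dividing by population and multiplying by \(10^9\) gives gigajoules per person; `brazil_2001` and `ej_to_gj` are illustrative names.

```r
ej_to_gj <- 1e9   # 1 exajoule = 10^18 J = 10^9 gigajoules

# Values copied from the joined table (Brazil, 2001)
brazil_2001 <- data.frame(country = "Brazil", year = 2001,
                          pop = 176301203, oilcons = 3.981208)

# Oil consumption per capita, in gigajoules per person
brazil_2001$oilcons_percap <- brazil_2001$oilcons / brazil_2001$pop * ej_to_gj
print(brazil_2001)
```

The result is of a manageable order of magnitude (tens of gigajoules per person), which is more convenient in regressions than raw exajoules.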
Let us thus look at correlations…
data_join <- data_join |>
mutate(oilcons = oilcons / pop * 10^9,
coalcons = coalcons / pop * 10^9,
gascons = gascons / pop * 10^9,
renewables = renewables / pop * 10^9) |>
select(year, country, gdp_growth, coalcons, gascons, oilcons, renewables, all_of(vars)) |>
ungroup() |>
distinct()
cor(data_join |> dplyr::select(all_of(c(vars, "coalcons", "gascons", "oilcons", "renewables"))),
use = "pairwise.complete") |>
ggcorrplot() +
theme(text = element_text(size = 15),
axis.text = element_text(size = 15))
Energy variables remain quite correlated.
Basically, we should only keep two, say renewables and gas… But let’s continue anyway with all four variables, out of curiosity.
We build a model omitting a few variables.
(this is just for illustration)
fml_0 <- "gdp_growth ~ gdp_percap + savings_rate + pop_growth + educ_level + coalcons + renewables"
fml <- as.formula(paste(fml_0, "| year + country"))
(feols(fml, data = data_join))$coeftable
 | Estimate | Std. Error | t value | Pr(>\|t\|) |
---|---|---|---|---|
gdp_percap | 0.0417195 | 0.0162756 | 2.5633239 | 0.0105601 |
savings_rate | 0.0024491 | 0.0007659 | 3.1976221 | 0.0014434 |
pop_growth | -0.0225878 | 0.0073137 | -3.0884106 | 0.0020860 |
educ_level | 0.0000835 | 0.0004855 | 0.1719713 | 0.8635062 |
coalcons | 0.0002241 | 0.0003797 | 0.5902965 | 0.5551685 |
renewables | 0.0011278 | 0.0010598 | 1.0641805 | 0.2875872 |
Energy variables are not very significant in the end…
Convergence?
Convergence in economics refers to the idea that poor countries will eventually catch up with rich ones; that rich countries would stagnate in terms of wealth whereas developing countries, with a cheaper labor force, would benefit from higher growth rates, resulting in a reduction of inequalities. This line of reasoning may be subject to conditions, naturally.
This is something that we can test, too. First, let’s see if countries with low income have growth rates superior to those with high income.
t.test(wb_growth |> filter(income == "High income") |> pull(gdp_growth),
       wb_growth |> filter(income == "Low income") |> pull(gdp_growth))
Welch Two Sample t-test
data: pull(filter(wb_growth, income == "High income"), gdp_growth) and pull(filter(wb_growth, income == "Low income"), gdp_growth)
t = 4.2416, df = 1884, p-value = 2.327e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.01291440 0.03512813
sample estimates:
mean of x mean of y
0.06943830 0.04541704
What does this mean?
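In the output above, the average growth of high-income countries (about 6.9%) exceeds that of low-income countries (about 4.5%), and the difference is statistically significant: at face value, this does not support unconditional convergence. To see where the test statistic comes from, the sketch below replicates a Welch test by hand on simulated samples (the means, volatilities and sample sizes are made up).

```r
set.seed(7)
high <- rnorm(500, mean = 0.07,  sd = 0.15)   # simulated growth rates, "rich" group
low  <- rnorm(300, mean = 0.045, sd = 0.20)   # simulated growth rates, "poor" group

tt <- t.test(high, low)                       # Welch test (unequal variances) by default

# Same statistic by hand: difference in means over its standard error
welch_t <- (mean(high) - mean(low)) /
  sqrt(var(high) / length(high) + var(low) / length(low))

print(c(by_hand = welch_t, t_test = unname(tt$statistic)))
print(tt$conf.int)                            # confidence interval for the difference
```

A positive statistic means that the first group grows faster on average; convergence would rather require the opposite sign.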
What about dynamic patterns?
wb_growth |>
  filter(income %in% c("High income", "Low income")) |>
  ggplot(aes(x = year, y = gdp_growth, color = income)) +
  geom_point(aes(alpha = income)) + geom_smooth(se = F) +
  theme_classic() +
  theme(legend.position = c(0.8, 0.95),
        legend.title = element_blank(),
        axis.title = element_blank()) +
  scale_color_manual(values = c("#4A4443", "#63E693")) +
  scale_alpha_manual(values = c(0.2, 0.4)) +
  ylim(-0.065, 0.065)
- the 1960-2010 period argues against convergence;
- the most recent points (2024) indicate a status-quo, or a reversion…?
A step back: heuristic sources of growth
(see CSV, section 7.2)
- luck: initial conditions may have an impact.
- geography: natural resources (minerals, grains, cattle, etc.) are a key driver of growth. Diseases are also more frequent in some parts of the globe.
- culture (customary beliefs and values): they can drive economic decisions, but are also hard to measure.
- institutions: property rights, labor markets, regulation.