Technological change

NOTE: The present notebook is coded in R. It relies heavily on the tidyverse ecosystem of packages. We load the tidyverse below as a prerequisite for the rest of the notebook - along with a few other libraries.

\(\rightarrow\) Don’t forget that code flows sequentially. A given chunk may not work if the previous ones have not been executed.

library(tidyverse)    # Package for data wrangling
library(readxl)       # Package to import MS Excel files
library(latex2exp)    # Package for LaTeX expressions
library(quantmod)     # Package for stock data extraction
library(highcharter)  # Package for reactive plots
library(ggcorrplot)   # Package for correlation plots
library(plm)          # Package for panel models
library(WDI)          # Package for World Bank data
library(broom)        # Package for neat regression output

The content of the notebook is heavily inspired by the book Advanced Macro-economics - An Easy Guide.

Context

It’s not easy to explain growth endogenously. Adding new factors (e.g., human capital) to the production function does not help if constant returns to scale (CRS) are assumed.

Back to Solow

Recall a Cobb-Douglas production function, \(y=Ak^\alpha\). Suppose now that technological progress allows \(A\) to grow, i.e., \(\dot{A}_t/A_t=\gamma_A\) (\(A_t=A_0e^{\gamma_A t}\)). Then,

\[\dot{y}_t= \dot{A}_t k_t^\alpha + A_t\alpha \dot{k}_tk_t^{\alpha-1}\]

and \[\frac{\dot{y}_t}{y_t}= \frac{\dot{A}_t}{A_t}+\alpha \frac{\dot{k}_t}{k_t}=\gamma_A + \alpha \gamma_k.\] The trick is that technology does not enter the budget constraint (unlike capital). This is why it can have a nonzero growth rate in equilibrium. Capital remains constant, but technology grows indefinitely.

This is cheating, i.e., growth is exogenous.
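The decomposition \(\gamma_y=\gamma_A+\alpha\gamma_k\) can be checked numerically. The sketch below is only an illustration: the parameter values (`alpha`, `gamma_A`, `gamma_k`) and initial levels are arbitrary assumptions, not taken from the text.

```r
# Illustrative values (assumptions, not from the notebook)
alpha   <- 0.3    # capital share
gamma_A <- 0.02   # exogenous growth rate of technology
gamma_k <- 0.01   # assumed growth rate of capital
A0 <- 1           # initial technology level
k0 <- 1           # initial capital level

t <- 0:50                              # yearly time grid
A <- A0 * exp(gamma_A * t)             # A_t = A_0 * exp(gamma_A * t)
k <- k0 * exp(gamma_k * t)             # capital grows at rate gamma_k
y <- A * k^alpha                       # Cobb-Douglas output

growth_y <- diff(log(y))               # continuous growth rate over unit time steps
growth_theory <- gamma_A + alpha * gamma_k

range(growth_y)                        # constant over time
growth_theory                          # matches the values above
```

Setting `gamma_k <- 0` reproduces the case discussed above, in which capital is constant and output grows at the exogenous rate `gamma_A`.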

A seemingly segmented economy

One possible “way out” is to posit a novel form of the production function. Here we follow Romer’s Endogenous Technological Change. Originally, the model assumes a variety of products, \(X(i)\) - \(i\) being the index. Now, there are also different types of products: the final ones and the intermediate (or raw) ones, which serve as inputs for final output. \(X(i)\) refers to the quantity of intermediate input of variety \(i\) used by the economy. But in fact, in the end (as we’ll see), the only thing that matters is the dichotomy between intermediate products and the final output. Indeed, the production function is seemingly more intricate:

\[Y(X)=\left(\int_0^M X(i)^\alpha di \right)^{1/\alpha},\] where \(M\) is the range of varieties (basically, the integral is a sum). Another way to see this is to imagine sectors that are infinitely small (hence the integral). Labor is left out to ease the computations - but can also be viewed as already incorporated in the \(X(i)\).

The firms that produce the final output take the prices of intermediate products (\(p(i)\)) as given. They seek to minimize costs for a given unit of good produced, i.e.,

\[\min_{X(i)} \int_0^Mp(i)X(i)di, \quad s.t. \quad \int_0^MX(i)^\alpha di=1 \tag{1}\]

The Lagrange formulation is

\[L=\int_0^Mp(i)X(i)di - \lambda \left(\int_0^MX(i)^\alpha di-1 \right)\]

and

\[\frac{\partial L}{\partial X(i)}=p(i)-\lambda \alpha X(i)^{\alpha-1}\]

so that the FOCs lead to \[X(i)=\left(\frac{\alpha \lambda}{p(i)} \right)^{1/(1-\alpha)}.\]

Demand is logically downward sloping: if variety \(i\) costs more, then demand for it will shrink.
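As a minimal sketch, the demand function implied by the FOCs can be coded and checked against the first-order condition. The helper name `demand_x` and the values of `alpha`, `lambda` and the price grid are hypothetical choices made for illustration only.

```r
# Demand for variety i implied by the FOC: X(i) = (alpha * lambda / p(i))^(1 / (1 - alpha))
# (demand_x is a hypothetical helper, not part of the notebook)
demand_x <- function(p, lambda, alpha) {
  (alpha * lambda / p)^(1 / (1 - alpha))
}

alpha  <- 0.5                  # assumed curvature parameter
lambda <- 2                    # assumed value of the Lagrange multiplier
p <- c(0.5, 1, 2, 4)           # assumed grid of input prices

x <- demand_x(p, lambda, alpha)

# Residual of the first-order condition p(i) - lambda * alpha * X(i)^(alpha - 1)
foc_residual <- p - lambda * alpha * x^(alpha - 1)

data.frame(price = p, demand = x, foc_residual = foc_residual)
```

Demand falls as the price rises, and the FOC residual is zero (up to rounding) at every price.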

But in fact, under the simplifying assumption that there is no heterogeneity in the cross-section of intermediate products, the distinction vanishes. Indeed, upon setting \(X(i)=Z/M\), where \(Z\) represents the total resources required to produce the intermediate inputs, we get

\[Y=(M(Z/M)^\alpha)^{1/\alpha}=M^{1/\alpha}\,\frac{Z}{M}=ZM^{1/\alpha-1},\]

which, from the perspective of \(Z\), is equivalent to the \(AK\) model.

Importantly, the varieties are not fixed once and for all; they may change, due to innovations and R&D. Hence, while \(Z\) is fixed, \(M\) is not and, for simplicity, we assume \(\dot{M}_t=\gamma_M M_t\) (i.e., \(M_t=M_0e^{\gamma_M t}\)). It holds that

\[\dot{Y}_t=Z\dot{M}_t (1/\alpha-1)M_t^{1/\alpha-2}\] so

\[\frac{\dot{Y}_t}{Y_t}=(1/\alpha-1)\frac{\dot{M}_t}{M_t}=(1/\alpha-1)\gamma_M.\] In the original paper, \(\gamma_M\) depends on \(\alpha\), on labor and, crucially, on the productivity of innovation. It is linearly increasing in the latter two variables.
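The symmetric case can be verified numerically by evaluating the production function directly at \(X(i)=Z/M\). Simplifying \((M(Z/M)^\alpha)^{1/\alpha}\) gives \(ZM^{1/\alpha-1}\), so output should grow at rate \((1/\alpha-1)\gamma_M\). The sketch below checks this; the helper `final_output` and all parameter values are assumptions chosen for illustration.

```r
# Final output when every variety uses the same quantity X(i) = Z / M
# (final_output is a hypothetical helper; the integral reduces to M * (Z / M)^alpha)
final_output <- function(Z, M, alpha) {
  (M * (Z / M)^alpha)^(1 / alpha)
}

alpha   <- 0.5     # assumed curvature parameter
Z       <- 10      # assumed (fixed) total resources
M0      <- 2       # assumed initial range of varieties
gamma_M <- 0.03    # assumed growth rate of varieties

t <- 0:20
M <- M0 * exp(gamma_M * t)               # M_t = M_0 * exp(gamma_M * t)

Y_direct <- final_output(Z, M, alpha)    # from the definition
Y_closed <- Z * M^(1 / alpha - 1)        # simplified closed form

growth_Y      <- diff(log(Y_direct))     # growth rate of output
growth_theory <- (1 / alpha - 1) * gamma_M

all.equal(Y_direct, Y_closed)
range(growth_Y)
growth_theory
```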

Semi-endogenous growth

Here we follow R&D-based models of economic growth.

Here, the total labor force is split in two: \(L_Y\) for the labor that directly produces output and \(L_A\) for the workforce employed in R&D. The production function is

\[Y=K^{1-a}(AL_Y)^a\]

and the interesting part here is the evolution of \(A\), which is defined as the productivity of knowledge. We know that specifying \(\dot{A}/A=\delta\) is cheating, as this leads to exogenous growth. Instead, suppose

\[\dot{A}=\tilde{\delta} L_A^\lambda, \quad \lambda \in (0,1],\]

i.e., change in innovation is driven by the R&D headcount but possibly at a power smaller than one. \(\tilde{\delta}\) is the rate at which “scientists” discover new ideas and products. This rate could depend on the level of knowledge in the economy. Here we assume that

\[\tilde{\delta}=\delta A^\phi,\]

where \(\phi\) determines the returns to knowledge. Note that it can be negative! In the end,

\[\dot{A}=\delta A^\phi L_A^\lambda \quad \Leftrightarrow \quad \gamma_A= \frac{\dot{A}}{A}=\delta A^{\phi-1}L_A^\lambda \]

If we differentiate with respect to \(t\), we get

\[\frac{\partial \gamma_A}{\partial t}=\delta(\lambda L_A^{\lambda-1}A^{\phi-1}\dot{L}_A+\dot{A}(\phi-1)A^{\phi-2}L_A^\lambda)\]

If the growth rate of \(A\) remains constant, the above quantity is zero. Dividing by \(\delta A^{\phi-1}L_A^\lambda\) and assuming \(\phi<1\), this gives

\[\frac{\lambda}{1-\phi}\frac{\dot{L}_A}{L_A}=\frac{\dot{A}}{A}\]

If the growth rate of \(L_A\) is \(n\), then we have

\[\gamma_A=\frac{\lambda n}{1-\phi}, \tag{2}\]

hence the parameter \(\phi\) plays a crucial role. This is all the more evident if we recall that, under standard assumptions, production factors follow dynamics such as

\[ \gamma_x= \frac{\dot{x}_t}{x_t}=s\frac{y_t}{x_t}-(\delta+n),\]

where \(\delta\) now denotes the depreciation rate (not the R&D parameter above). Hence, if \(\gamma_x\) is constant, the ratio \(y_t/x_t\) should be constant too, i.e., all quantities grow at the same rate, which will be given by Equation 2 in the model.
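To see Equation 2 at work, one can simulate \(\dot{A}=\delta A^\phi L_A^\lambda\) with an R&D workforce growing at rate \(n\) and check that \(\gamma_A\) converges to \(\lambda n/(1-\phi)\). This is a rough sketch: the parameter values, the time step and the simple discretization (updating \(\log A\) with the current growth rate) are all assumptions made for illustration.

```r
# Assumed parameters (illustrative only)
delta  <- 0.1     # productivity of R&D
phi    <- 0.5     # returns to knowledge (must be < 1 here)
lambda <- 0.8     # returns to R&D headcount
n      <- 0.02    # growth rate of the R&D workforce

dt    <- 0.1      # time step
steps <- 10000L   # number of steps (horizon = steps * dt = 1000)

A   <- 1          # initial knowledge stock
L_A <- 1          # initial R&D workforce
gamma_A <- numeric(steps)

for (s in seq_len(steps)) {
  gamma_A[s] <- delta * A^(phi - 1) * L_A^lambda  # current growth rate of A
  A   <- A * exp(gamma_A[s] * dt)                 # update knowledge
  L_A <- L_A * exp(n * dt)                        # workforce grows at rate n
}

gamma_limit <- lambda * n / (1 - phi)   # Equation 2

gamma_A[1]        # initial growth rate of A
tail(gamma_A, 1)  # growth rate at the end of the simulation
gamma_limit       # theoretical long-run value
```

Whatever the starting point, the growth rate of \(A\) drifts towards the value given by Equation 2, which depends on \(n\), \(\lambda\) and \(\phi\) but not on \(\delta\).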

What the data says

The model

We now turn to an empirical exploration of the concepts and variables discussed so far. Indeed, models are only worthwhile if they are able to explain and predict salient empirical properties of the economy.

We follow here a panel approach from Economic Growth in a Cross Section of Countries:

\[g_{t,i}=\textbf{X}_{t,i}\boldsymbol{\beta}+a \log(y_{t-1,i}) + e_{t,i},\]

where \(g_{t,i}\) is the growth rate of country \(i\) at date (year) \(t\) and \(y_{t,i}\) is GDP per capita. The matrix \(\textbf{X}_{t,i}\) embeds all variables of interest.
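The two GDP-based terms of this specification, \(g_{t,i}\) and \(\log(y_{t-1,i})\), can be built country by country from GDP per capita. The sketch below uses a made-up toy panel (the data frame `toy`, the helper `lag1` and all numbers are hypothetical) and assumes rows are sorted by country and year.

```r
# Toy panel (fabricated numbers, for illustration only), sorted by country and year
toy <- data.frame(
  country    = rep(c("A", "B"), each = 4),
  year       = rep(2000:2003, times = 2),
  gdp_percap = c(100, 110, 121, 133.1, 200, 210, 220.5, 231.525)
)

# One-period lag within a vector (hypothetical helper)
lag1 <- function(x) c(NA, head(x, -1))

# Growth rate g_{t,i} = y_{t,i} / y_{t-1,i} - 1, computed within each country
toy$growth <- ave(toy$gdp_percap, toy$country,
                  FUN = function(x) x / lag1(x) - 1)

# Lagged log GDP per capita, log(y_{t-1,i}), computed within each country
toy$log_gdp_lag <- ave(toy$gdp_percap, toy$country,
                       FUN = function(x) log(lag1(x)))

toy
```

The first year of each country is `NA` for both columns, since no lagged value exists.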

Fetching & wrangling the data

impute <- function(v, n = 12){     # Imputation function
  for(j in 1:n){
    ind <- which(is.na(v))
    if(length(ind)>0){
      if(ind[1]==1){ind <- ind[-1]}
      v[ind] <- v[ind-1]
    }
  }
  return(v)
}
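A small usage example may help to see what `impute()` does: it carries the last observed value forward, filling at most `n` consecutive missing points, and leaves leading `NA`s untouched. The function is repeated below only so that the snippet runs on its own; the input vector is made up.

```r
# Copy of impute() from above, so that the example is self-contained
impute <- function(v, n = 12){
  for(j in 1:n){
    ind <- which(is.na(v))
    if(length(ind)>0){
      if(ind[1]==1){ind <- ind[-1]}
      v[ind] <- v[ind-1]
    }
  }
  return(v)
}

v <- c(NA, 1, NA, NA, NA, 5, NA)   # toy series with gaps

filled_2 <- impute(v, n = 2)       # fill gaps of length up to 2
filled_2
# NA  1  1  1 NA  5  5

filled_3 <- impute(v, n = 3)       # fill gaps of length up to 3
filled_3
# NA  1  1  1  1  5  5
```

With `n = 2`, the third consecutive missing value after the `1` stays `NA`, which is how the `n` argument limits how far old values are propagated.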

wb_growth <- WDI(                           # World Bank data
  indicator = c(
    "labor" = "SL.TLF.TOTL.IN",             # Labor force (# individuals)
    "savings_rate" = "NY.GDS.TOTL.ZS",      # Savings rate (% GDP)
    "inflation" = "FP.CPI.TOTL.ZG",         # Inflation rate
    "trade" = "NE.TRD.GNFS.ZS",             # Trade as % of GDP
    "pop" = "SP.POP.TOTL",                  # Population
    "pop_growth" = "SP.POP.GROW",           # Population growth
    "capital_formation" = "NE.GDI.TOTL.ZS", # Gross capital formation (% GDP)
    "gdp_percap" = "NY.GDP.PCAP.CD",        # GDP per capita (current US$)
    "RD_percap" = "GB.XPD.RSDV.GD.ZS",      # R&D expenditure (% GDP), despite the name
    "educ_level" = "SE.SEC.CUAT.LO.ZS",     # % pop. (25+) with at least lower second. educ.
    "educ_spending" = "SE.XPD.TOTL.GD.ZS",  # Education spending (% GDP)
    "debt" = "GC.DOD.TOTL.GD.ZS",           # Central gov. debt (% of GDP)
    "gdp" = "NY.GDP.MKTP.CD"                # Gross Domestic Product (GDP, current US$)
  ), 
  extra = TRUE,
  start = 1960,
  end = 2024) |>
  mutate(across(everything(), as.vector)) |>
  select(-status, -lending, -iso2c, -iso3c) |>
  filter(lastupdated == max(lastupdated)) |>
  arrange(country, year) |>
  mutate(capital_percap = capital_formation / labor, .before = "region")

We make a few adjustments to the data, adding GDP per capita growth and imputing a few points along the way (to increase sample size).

wb_growth <- wb_growth |> 
  filter(region != "Aggregates") |> # Remove continents & co. 
  group_by(country) |>
  mutate(gdp_growth = gdp_percap/dplyr::lag(gdp_percap) - 1, .before = "region") |>
  mutate(across(labor:capital_percap, ~ impute(.x, n = 3))) |>
  ungroup()

wb_growth |> head(9)

country	year	lastupdated	labor	savings_rate	inflation	trade	pop	pop_growth	capital_formation	gdp_percap	RD_percap	educ_level	educ_spending	debt	gdp	capital_percap	gdp_growth	region	capital	longitude	latitude	income
Afghanistan	1960	2024-09-19	NA	NA	NA	NA	8622466	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	South Asia	Kabul	69.1761	34.5228	Low income
Afghanistan	1961	2024-09-19	NA	NA	NA	NA	8790140	1.925952	NA	NA	NA	NA	NA	NA	NA	NA	NA	South Asia	Kabul	69.1761	34.5228	Low income
Afghanistan	1962	2024-09-19	NA	NA	NA	NA	8969047	2.014879	NA	NA	NA	NA	NA	NA	NA	NA	NA	South Asia	Kabul	69.1761	34.5228	Low income
Afghanistan	1963	2024-09-19	NA	NA	NA	NA	9157465	2.078997	NA	NA	NA	NA	NA	NA	NA	NA	NA	South Asia	Kabul	69.1761	34.5228	Low income
Afghanistan	1964	2024-09-19	NA	NA	NA	NA	9355514	2.139651	NA	NA	NA	NA	NA	NA	NA	NA	NA	South Asia	Kabul	69.1761	34.5228	Low income
Afghanistan	1965	2024-09-19	NA	NA	NA	NA	9565147	2.216007	NA	NA	NA	NA	NA	NA	NA	NA	NA	South Asia	Kabul	69.1761	34.5228	Low income
Afghanistan	1966	2024-09-19	NA	NA	NA	NA	9783147	2.253524	NA	NA	NA	NA	NA	NA	NA	NA	NA	South Asia	Kabul	69.1761	34.5228	Low income
Afghanistan	1967	2024-09-19	NA	NA	NA	NA	10010030	2.292638	NA	NA	NA	NA	NA	NA	NA	NA	NA	South Asia	Kabul	69.1761	34.5228	Low income
Afghanistan	1968	2024-09-19	NA	NA	NA	NA	10247780	2.347351	NA	NA	NA	NA	NA	NA	NA	NA	NA	South Asia	Kabul	69.1761	34.5228	Low income

First analyses

Let’s have a look at missing data.

vars <- c("savings_rate", "inflation", "trade", "debt", "capital_formation" , "pop_growth",
          "RD_percap", "educ_level", "educ_spending")

wb_growth |> select(all_of(vars)) |> is.na() |> colMeans()

     savings_rate         inflation             trade              debt 
       0.37827035        0.36271802        0.35268895        0.82957849 
capital_formation        pop_growth         RD_percap        educ_level 
       0.38190407        0.01780523        0.78953488        0.71722384 
    educ_spending 
       0.53393895

Debt and R&D account for most of the data loss: roughly 83% and 79% of their observations are missing, respectively.
Which countries are the most represented in the data?

vars <- c("savings_rate", "inflation", "trade", "debt", "capital_formation" , "pop_growth",
          "RD_percap", "educ_level", "educ_spending")
wb_growth |>
  select(all_of(c(vars, "country"))) |>
  na.omit() |>
  group_by(country) |>
  count(sort = TRUE) |>
  head(12)

country	n
Canada	28
Portugal	26
Hungary	23
Italy	23
Luxembourg	23
Bulgaria	22
Malaysia	22
Romania	22
Sweden	22
Australia	21
Croatia	21
Denmark	21

Canada and Portugal make it to the top.

Next, let us check whether there is collinearity among the variables. Indeed, high correlations between independent variables are likely to distort inference.

wb_growth |>
  select(all_of(vars)) |>
  na.omit() |>
  cor() |>
  ggcorrplot(lab = TRUE, digits = 1L) +
  scale_fill_viridis_c(alpha = 0.7)

Usually, a correlation of 0.5 (in absolute value) is considered already high. A value above 0.7 is prohibitive…
So here, it seems we are fine.

Panel estimation

plm(formula = gdp_growth ~ . , 
    data = wb_growth |> 
      dplyr::select(all_of(c("country", "year", "gdp_growth", vars))) |> 
      na.omit() |>
      group_by(country) |>
      mutate(n = n()) |>
      filter(n > 10),
    effect = "twoways",
    index = c("country", "year"),
    model = "within") |>
  tidy()

term	estimate	std.error	statistic	p.value
savings_rate	0.0033133	0.0008066	4.1078015	0.0000440
inflation	-0.0003243	0.0007822	-0.4146078	0.6785385
trade	-0.0001284	0.0001632	-0.7867243	0.4316728
debt	0.0000349	0.0001578	0.2210084	0.8251415
capital_formation	0.0027687	0.0007905	3.5022791	0.0004865
pop_growth	-0.0167985	0.0042139	-3.9864333	0.0000731
RD_percap	0.0078383	0.0109675	0.7146836	0.4750100
educ_level	-0.0002141	0.0005232	-0.4092601	0.6824567
educ_spending	0.0011688	0.0044388	0.2633038	0.7923833

To detect which variables matter, we look at p-values: they indicate the probability of obtaining an estimate at least as “extreme” as the one observed, under the assumption that the true coefficient is equal to zero (this hypothesis is called the null). Hence, if a p-value is close to zero, the data are hard to reconcile with the null, which supports the existence of a link (not necessarily causal) between the dependent and independent variables.
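As a rough illustration, the test statistic and an approximate p-value for the savings rate can be recomputed from the rounded estimate and standard error in the table. The sketch uses a normal approximation, which is an assumption on our part: the reported p-value is presumably based on a Student distribution with the model's residual degrees of freedom, so the two numbers differ slightly.

```r
# Values copied (rounded) from the regression table, row "savings_rate"
estimate  <- 0.0033133
std_error <- 0.0008066

# Test statistic under the null of a zero coefficient
statistic <- estimate / std_error

# Two-sided p-value under a normal approximation (assumption, see above)
p_normal <- 2 * pnorm(-abs(statistic))

statistic   # close to the reported 4.1078
p_normal    # same order of magnitude as the reported 0.0000440
```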

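Here, the savings rate and capital formation both have significant positive coefficients, while population growth has a significant negative one. The remaining variables are not statistically significant at conventional levels.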

A step back: heuristic sources of growth

(see CSV, section 7.2)

luck: initial conditions may have an impact.
geography: natural resources (minerals, grains, cattle, etc.) are a key driver of growth. Diseases are also more frequent in some parts of the globe.
culture (customary beliefs and values): they can drive economic decisions, but are also hard to measure.
institutions: property rights, labor markets, regulation.