Movies

library(dplyr)  # mutate(), select(), %>%
library(FNN)    # get.knnx()

var_dist <- c("duration", "budget", "earnings", "imdb_score")
rescale <- function(x) (x - min(x)) / (max(x) - min(x))     # Min-max scaling to [0, 1]
movies_2 <- movies %>%
    mutate(duration = rescale(duration),
           budget = rescale(log(budget)),
           earnings = rescale(log(earnings + 1)),           # +1 because some earnings are 0
           imdb_score = rescale(imdb_score))
target <- movies_2[1,]                                      # Target: first row of the data
neighbors <- get.knnx(data = movies_2 %>% select(all_of(var_dist)),   # Data source: be careful with the columns!
                      query = target %>% select(all_of(var_dist)),  # Target (with the right columns)
                      k = 10)                               # Number of neighbors (the target itself comes first)
movies[neighbors$nn.index,]
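
To make the output of get.knnx easier to read, here is a minimal sketch on made-up points (the matrix `pts` is purely illustrative and not part of the movies data). It shows that the result holds row indices and distances, and that a query point taken from the data is returned as its own first neighbor.

```r
library(FNN)   # get.knnx()

# Toy data: four points in the plane (made up for illustration)
pts <- matrix(c(0, 0,
                1, 0,
                0, 2,
                5, 5), ncol = 2, byrow = TRUE)

# Query with the first point; drop = FALSE keeps it as a 1 x 2 matrix
res <- get.knnx(data = pts, query = pts[1, , drop = FALSE], k = 3)
res$nn.index   # Row indices of the 3 nearest points, closest first
res$nn.dist    # Matching Euclidean distances, starting at 0 for the point itself
```

If the target should not appear among its own neighbors, one option is to ask for k + 1 neighbors and drop the first.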

Clothing data

In this dataset, the variable of interest is RESPONSERATE: it is the one we want to predict, and it is missing for the last customer (#1001).

  1. Format the data appropriately: scale all variables so that they are comparable.
  2. Perform a k-NN search of the last customer (#1001).
  3. Predict RESPONSERATE for this customer by averaging the values of its nearest neighbors. The true value is 10%!
  4. Perform a k-means clustering of the customers over some columns (potentially all, except the first one (KEY) and RESPONSERATE).
  5. Plot the clusters and look at what happens when the \(x\)-axis and \(y\)-axis change.
  6. Build a decision tree that seeks to explain RESPONSERATE with the other variables (all but the first and the last, of course).
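
Steps 1 to 4 above can be sketched as follows. This is not a solution on the real file: the data are simulated, and apart from KEY and RESPONSERATE the column names (`spend`, `visits`) are hypothetical stand-ins. Distances are computed by hand in base R so the block runs without extra packages; get.knnx from the Movies section would do the same job.

```r
set.seed(42)                                    # Reproducible simulated data
n <- 200
# Simulated stand-in for the clothing data: only KEY and RESPONSERATE are real column names
clothing <- data.frame(KEY = 1:n,
                       spend = rexp(n, 1 / 100),     # Hypothetical feature
                       visits = rpois(n, 5),         # Hypothetical feature
                       RESPONSERATE = runif(n, 0, 50))
clothing$RESPONSERATE[n] <- NA                  # The last customer has no response rate

# Step 1: min-max scaling of every feature (KEY and RESPONSERATE excluded)
features <- setdiff(names(clothing), c("KEY", "RESPONSERATE"))
rescale <- function(x) (x - min(x)) / (max(x) - min(x))
scaled <- as.data.frame(lapply(clothing[features], rescale))

# Step 2: k nearest neighbors of the last customer (Euclidean distance)
k <- 10
target <- unlist(scaled[n, ])
d <- sqrt(rowSums(sweep(as.matrix(scaled[-n, ]), 2, target)^2))
nn <- order(d)[1:k]                             # Row indices of the k closest customers

# Step 3: predict by averaging the neighbors' response rates
prediction <- mean(clothing$RESPONSERATE[nn])

# Step 4: k-means on the scaled features (3 clusters is an arbitrary choice)
clusters <- kmeans(scaled, centers = 3, nstart = 10)
table(clusters$cluster)
```

On the real data, comparing `prediction` for several values of `k` against the true 10% gives a feel for how sensitive the average is to the neighborhood size.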

Gapminder

  1. Pick your native country. In the data for the year 2007, find the 5 countries that are closest to it over the following criteria: population, gdpPercap and lifeExp.
  2. Over these variables, perform a \(k\)-means clustering.
  3. Build a tree that seeks to explain lifeExp with population and gdpPercap.
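
One possible way through the three steps above, as a sketch under assumptions: it relies on the gapminder package being installed (where population is stored in the column `pop`), uses France as a placeholder home country, takes logs of `pop` and `gdpPercap` before scaling, and fixes 4 clusters. All of these are illustrative choices, not requirements of the exercise.

```r
library(gapminder)   # Provides the gapminder data frame (assumed installed)
library(rpart)       # Decision trees

gap_2007 <- subset(gapminder, year == 2007)

# Scale the three criteria to [0, 1]; logs tame the skew of pop and gdpPercap
rescale <- function(x) (x - min(x)) / (max(x) - min(x))
scaled <- data.frame(pop = rescale(log(gap_2007$pop)),
                     gdpPercap = rescale(log(gap_2007$gdpPercap)),
                     lifeExp = rescale(gap_2007$lifeExp))

# Step 1: the 5 closest countries (Euclidean distance on the scaled criteria)
home <- which(gap_2007$country == "France")     # Placeholder: replace with your own country
d <- sqrt(rowSums(sweep(as.matrix(scaled), 2, unlist(scaled[home, ]))^2))
closest <- order(d)[2:6]                        # Position 1 is the country itself (distance 0)
gap_2007$country[closest]

# Step 2: k-means over the same scaled variables
clusters <- kmeans(scaled, centers = 4, nstart = 10)

# Step 3: tree explaining lifeExp with population and GDP per capita
tree <- rpart(lifeExp ~ pop + gdpPercap, data = gap_2007)
tree
```

The tree is fitted on the unscaled data because splits depend only on the ordering of each variable, so scaling would not change them.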