Movies
- Using the movies database, find the 10 closest neighbors of “Avatar”.
- Build a decision tree that seeks to explain the earnings of the movies with the other variables (keep title, director and actors out of the model).
library(tidyverse)  # dplyr verbs and the pipe
library(FNN)        # get.knnx()
var_dist <- c("duration", "budget", "earnings", "imdb_score")
# Min-max scaling to [0, 1]: (x - min) / (max - min)
minmax <- function(x) (x - min(x)) / (max(x) - min(x))
movies_2 <- movies %>%
    mutate(duration = minmax(duration),
           budget = minmax(log(budget)),
           earnings = minmax(log(earnings + 1)),   # +1 because some earnings are 0
           imdb_score = minmax(imdb_score))
target <- movies_2[1,]                             # Assumes "Avatar" is the first row
neighbors <- get.knnx(data = movies_2 %>% select(all_of(var_dist)),   # Data source: be careful with the columns!
                      query = target %>% select(all_of(var_dist)),    # Target (with the right columns)
                      k = 10)                                         # Nb of neighbors (the target itself comes first)
movies[neighbors$nn.index[1, ], ]
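For the second question, one possible starting point is a regression tree fitted with rpart. The sketch below is only illustrative: it runs on a synthetic table (toy_movies, with made-up values) that reuses the numeric column names from the code above, and the tree depth and complexity parameter are arbitrary choices, not recommended settings.

```r
library(rpart)

# Synthetic stand-in for the movies table (made-up values, same column names as above)
set.seed(42)
n <- 200
toy_movies <- data.frame(
  duration   = round(runif(n, 80, 180)),
  budget     = exp(runif(n, log(1e6), log(2e8))),
  imdb_score = round(runif(n, 3, 9), 1)
)
# Earnings loosely driven by budget and score, plus multiplicative noise
toy_movies$earnings <- toy_movies$budget * toy_movies$imdb_score / 5 *
  exp(rnorm(n, sd = 0.5))

# Regression tree: explain log-earnings with the numeric features only
fit <- rpart(log(earnings + 1) ~ duration + budget + imdb_score,
             data = toy_movies,
             method = "anova",
             control = rpart.control(maxdepth = 3, cp = 0.01))

print(fit)                # Text description of the splits
fit$variable.importance   # Which variables drive the splits
```

On the real data, replace toy_movies with movies and extend the formula with any other columns that are not title, director or actors.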
Clothing data
In this dataset, the variable of interest is RESPONSERATE: it is the one we want to predict, and it is missing for the last customer (#1001).
- Format the data appropriately: scale all variables so that they are comparable.
- Perform a k-NN search of the last customer (#1001).
- Try to predict the RESPONSERATE for this customer via some average value. The true value is 10%!
- Perform a k-means clustering of the clients over some columns (potentially all, except the first one (KEY) and RESPONSERATE).
- Plot the clusters and look at what happens when the \(x\)-axis and \(y\)-axis change.
- Build a decision tree that seeks to explain RESPONSERATE with other variables (all but first and last of course).
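The first three steps (scaling, k-NN search, prediction by averaging) could be sketched as follows. Since the actual columns of the clothing data are not listed here, the example runs on a synthetic table: clothing_toy, FEAT_A and FEAT_B are hypothetical names with made-up values, and only KEY and RESPONSERATE mirror the real dataset. The choice of k = 10 is arbitrary.

```r
library(FNN)

# Synthetic stand-in for the clothing table: KEY first, RESPONSERATE last.
# FEAT_A and FEAT_B are hypothetical feature names with made-up values.
set.seed(1)
n <- 1001
clothing_toy <- data.frame(
  KEY    = 1:n,
  FEAT_A = rnorm(n, mean = 100, sd = 30),
  FEAT_B = runif(n, 0, 5)
)
clothing_toy$RESPONSERATE <- pmin(pmax(5 * clothing_toy$FEAT_B + rnorm(n, sd = 2), 0), 100)
clothing_toy$RESPONSERATE[n] <- NA   # Last customer: the value to predict

# 1. Scale the features (everything except KEY and RESPONSERATE) to mean 0, sd 1
features <- setdiff(names(clothing_toy), c("KEY", "RESPONSERATE"))
X <- scale(clothing_toy[, features])

# 2. k-NN search: the query is the last customer, the data are all the others
known <- 1:(n - 1)
nn <- get.knnx(data  = X[known, , drop = FALSE],
               query = X[n, , drop = FALSE],
               k     = 10)

# 3. Predict by averaging the response rates of the neighbors
pred <- mean(clothing_toy$RESPONSERATE[known[nn$nn.index[1, ]]])
pred
```

Keeping the last customer out of the data argument avoids finding the customer as their own nearest neighbor, whose RESPONSERATE is missing anyway.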
Gapminder
- Pick your native country. In the data for the year 2007, find the 5 countries that are closest to it according to the following criteria: population, gdpPercap and lifeExp.
- Over these variables, perform a \(k\)-means clustering.
- Build a tree that seeks to explain lifeExp with population and gdpPercap.
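A minimal sketch of the clustering step, assuming the data come from the gapminder R package (where the population column is named pop). The log transforms and the number of clusters (4) are illustrative choices, not part of the exercise statement.

```r
library(gapminder)   # Assumption: the gapminder package is installed

gap07 <- subset(gapminder, year == 2007)

# Log-transform the skewed variables, then standardize all three
X <- scale(cbind(pop       = log(gap07$pop),
                 gdpPercap = log(gap07$gdpPercap),
                 lifeExp   = gap07$lifeExp))

set.seed(123)                               # k-means starts from random centers
km <- kmeans(X, centers = 4, nstart = 20)   # 4 clusters: an arbitrary choice

table(km$cluster)                           # Cluster sizes

# Clusters in the (gdpPercap, lifeExp) plane; swap the columns to change the axes
plot(X[, "gdpPercap"], X[, "lifeExp"], col = km$cluster, pch = 19,
     xlab = "log GDP per capita (scaled)", ylab = "Life expectancy (scaled)")
```

Without scaling, pop would dominate the Euclidean distances because its values are several orders of magnitude larger than those of the other two variables.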