Movies
- Using the movies database, find the 10 nearest neighbors of “Avatar”.
- Build a decision tree that seeks to explain the earnings of the movies with the other variables (leave title, director and actors out of the model).
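As a hint for the tree exercise, the core step of a regression tree can be sketched as follows: try every threshold on every variable and keep the split that minimizes the sum of squared errors around the two group means. This is only a minimal sketch in plain Python. The data are made-up toy values, the column names (budget, duration) are assumptions and not necessarily columns of the movies database, and a real tree repeats this step recursively on each side of the split.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y):
    """Return (feature index, threshold, total SSE) of the best binary split."""
    best = None
    for j in range(len(X[0])):
        levels = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = sse(left) + sse(right)
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


# Made-up toy data: columns are (budget, duration); the target is earnings.
X = [(10, 95), (20, 120), (150, 100), (200, 130), (180, 160), (15, 150)]
y = [12, 25, 400, 520, 450, 18]

feature, threshold, score = best_split(X, y)
print(f"split on column {feature} at {threshold}")
```

On this toy sample the first column separates low and high earnings cleanly, so it is chosen as the first split.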
Clothing data
In this dataset, the variable of interest is RESPONSERATE: it is the one we want to predict, and it is missing for the last customer (#1001).
- Format the data appropriately: scale all variables so that they are comparable.
- Perform a \(k\)-NN search for the last customer (#1001).
- Predict RESPONSERATE for this customer by averaging the values of its nearest neighbors. The true value is 10%!
- Perform a \(k\)-means clustering of the customers over some columns (potentially all of them except the first one, KEY, and RESPONSERATE).
- Plot the clusters and look at what happens when the \(x\)-axis and \(y\)-axis change.
- Build a decision tree that seeks to explain RESPONSERATE with the other variables (all but the first and the last, of course).
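The first three steps above (scale, search for neighbors, average) can be sketched as follows. This is a minimal illustration in plain Python, not a reference solution: the two feature columns and all values are made up, the helper names `standardize` and `knn_predict` are hypothetical, and the last row plays the role of the customer whose response rate is unknown.

```python
import math
import statistics


def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]  # assumes no constant column
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def knn_predict(features, target, query_index, k):
    """Average the target over the k rows closest to the query row."""
    scaled = standardize(features)
    query = scaled[query_index]
    distances = [
        (math.dist(row, query), i)
        for i, row in enumerate(scaled)
        if i != query_index  # the query is not its own neighbor
    ]
    neighbors = [i for _, i in sorted(distances)[:k]]
    return neighbors, sum(target[i] for i in neighbors) / k


# Made-up toy data: two numeric features per customer; last row is the query.
features = [(100, 2), (110, 3), (900, 20), (950, 22), (120, 2), (105, 2)]
target = [0.10, 0.20, 0.60, 0.70, 0.30, None]  # unknown for the last row

neighbors, prediction = knn_predict(features, target, query_index=5, k=3)
print(neighbors, round(prediction, 3))
```

Scaling matters because the Euclidean distance would otherwise be dominated by whichever column has the largest units. The choice of \(k\) is left open here; trying several values and comparing the predictions with the true value is part of the exercise.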
Gapminder
- Pick your native country. In the data for the year 2007, find the 5 countries that are closest to it on the following criteria: population, gdpPercap and lifeExp.
- Over these variables, perform a \(k\)-means clustering.
- Build a tree that seeks to explain lifeExp with population and gdpPercap.
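For the clustering step, a bare-bones version of \(k\)-means (Lloyd's algorithm) might look like the sketch below. It is written in plain Python under a few assumptions: the points are made-up values standing in for already-scaled variables, the function name `kmeans` is hypothetical, and the starting centers are chosen by hand for reproducibility, whereas library implementations usually pick them at random.

```python
import math


def kmeans(points, centers, n_iter=100):
    """Lloyd's algorithm: alternate assignment and center update steps."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its points.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Made-up, already-scaled toy points forming two visible groups.
points = [(-1.0, -1.1), (-0.9, -1.0), (-1.1, -0.8),
          (1.0, 0.9), (1.2, 1.1), (0.9, 1.0)]

labels, centers = kmeans(points, centers=[points[0], points[3]])
print(labels)
```

On real data the variables should be scaled first, as in the clothing exercise, since population and gdpPercap live on very different scales.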