Software


We will be using Python 3. If you use Linux, it is easy to install the Python interpreter and standard libraries on your computer.

The Anaconda distribution is specifically designed for data analysis and can be installed on Windows, Mac and Linux. The default installation includes most of the software that we will use.

You can install most of the code libraries in your Python distribution using pip (or the older easy_install).
  • Code for the examples and Python notebooks for the course
  • kemlglearn (Python library of the course on GitHub, installable using pip)
  • scikit-learn (machine learning algorithms)
  • pyclustering (clustering algorithms)
  • scipy (numerical routines and data structures)
  • numpy (numerical routines and data structures)
  • pandas (data structures and routines for data analysis)
  • matplotlib (graphics)
  • seaborn (graphics)
  • networkx (data structures and algorithms for graphs)
  • python-louvain (graph communities)
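
Once the libraries are installed, a quick way to confirm that your environment is ready is to check that each one can be imported. The following is a minimal sketch, not part of the course code: the function name check_libraries and the COURSE_LIBRARIES table are hypothetical, and the import names are assumptions where they differ from the pip name (scikit-learn is imported as sklearn, python-louvain as community, and kemlglearn is assumed to import under its own name).

```python
from importlib.util import find_spec

# Distribution name -> top-level import name.
# Assumption: kemlglearn imports under its own name; python-louvain
# is imported as "community" and scikit-learn as "sklearn".
COURSE_LIBRARIES = {
    "kemlglearn": "kemlglearn",
    "scikit-learn": "sklearn",
    "pyclustering": "pyclustering",
    "scipy": "scipy",
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "networkx": "networkx",
    "python-louvain": "community",
}


def check_libraries(libraries=COURSE_LIBRARIES):
    """Return a dict mapping each distribution name to True if importable."""
    # find_spec looks the module up without actually importing it.
    return {
        dist: find_spec(module) is not None
        for dist, module in libraries.items()
    }


if __name__ == "__main__":
    for dist, found in check_libraries().items():
        status = "ok" if found else "missing (install it with pip)"
        print(f"{dist:15s} {status}")
```

Running the script prints one line per library, so you can see at a glance which ones still need to be installed. The check only looks the module up, so it does not verify that the installed version works with the course notebooks.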

How to run the Jupyter notebooks

  • You can install Jupyter Notebook and all its dependencies on your computer
  • You can upload the notebooks and the data to the Microsoft Azure Notebooks website (you will need a Microsoft account)
  • You can upload the notebooks and the data to IBM's Data Scientist Workbench (you will need a Google account)
  • You can install Docker and the data-science notebook image, which comes with Jupyter and most of the Python dependencies installed