Python Data Science Environment Setup Options – Getting Started

In our previous post, we looked at the overall components required for a Python data science/analytics environment. Setting up these components individually can be challenging. In this post, we look at some options to get your Python data science environment up and running quickly.

[Diagram: Setting up a Python data science environment on your machine]

Setting up your Python data science environment on your local machine

Setting up the Python scripting environment and installing the individual Python libraries one by one can be a pain. Thankfully, there is no need to go through that trouble.

Set up your data analytics platform in a few clicks with Anaconda

All you need to do is head to Anaconda.com and download the Anaconda Python Data Science Platform distribution. When you run the installer, it sets up the Python environment, the relevant Python data science libraries and analytics clients such as Jupyter on your local machine (see the diagram above).

It also provides a package manager, so you can manage the Python libraries installed on your machine through a point-and-click interface.
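Once the installation finishes, you can confirm from Python that the libraries are available. The snippet below is a minimal sketch using only the standard library; the package names in `PACKAGES` are my own assumed examples of commonly used libraries, not an official list of what Anaconda ships.

```python
from importlib import util

# Assumed examples of commonly used data science packages; adjust as needed.
PACKAGES = ["numpy", "pandas", "matplotlib", "scipy", "sklearn"]


def check_packages(names):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(PACKAGES).items():
        print(f"{name:12} {'installed' if found else 'MISSING'}")
```

Run it from a Jupyter notebook or from the Python interpreter that Anaconda installed. Any package reported as missing can then be added through the package manager.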

Set up your database and data storage on a local machine

PostgreSQL and MySQL are both available in open-source editions. Download and install them on your machine.

You will also need database administration software to connect to your database and manage its tables. Download the following:

Cloud-hosted database solutions

[Diagram: Data analytics environment with AWS]

Alternatively, you can skip the hassle of setting up databases on your own physical machine and opt for the cloud-hosted PostgreSQL and MySQL solutions offered by cloud computing platforms such as Google Cloud Platform, Microsoft Azure and Amazon Web Services. These services offer some free plans, but charges apply depending on data usage. Whether you choose cloud hosting or a local machine, you will still need some knowledge of database installation and administration.
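Whether the database runs locally or in the cloud, Python code usually reaches it through a connection URL. The sketch below shows one way to build such a URL. The helper name `build_db_url` and every user name, password, host and database name here are hypothetical placeholders; the ports are the PostgreSQL and MySQL defaults.

```python
from urllib.parse import quote_plus


def build_db_url(dialect, user, password, host, port, database):
    """Build a connection URL such as postgresql://user:pass@host:5432/db."""
    # Percent-encode the credentials so characters like '@' or '!' are safe.
    return (
        f"{dialect}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Placeholder values only; replace them with your own server details.
local_url = build_db_url("postgresql", "analyst", "s3cret!", "localhost", 5432, "sales")
cloud_url = build_db_url("mysql", "analyst", "p@ss", "db.example.com", 3306, "sales")

print(local_url)
print(cloud_url)
```

A URL in this form can be passed to a library such as SQLAlchemy (`create_engine`), provided a matching database driver is installed. Only the host changes between a local and a cloud-hosted setup.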

Simple data warehouse hosted on the cloud

If you are looking for a less complex data warehouse solution, the following products are good alternatives:

  • Google BigQuery from Google Cloud Platform
  • Amazon Redshift from Amazon Web Services

I recommend Google BigQuery, as it makes creating, maintaining and querying your database tables much simpler.
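To give a feel for querying BigQuery from Python, here is a rough sketch. It assumes the `google-cloud-bigquery` client library is installed and that Google Cloud credentials are already configured on your machine. The table and column names are hypothetical, and `top_n_sql` and `run_query` are illustrative helpers of my own, not part of any library.

```python
def top_n_sql(table, column, n):
    """Build a SQL query that counts rows per value and keeps the top n."""
    return (
        f"SELECT {column}, COUNT(*) AS n\n"
        f"FROM `{table}`\n"
        f"GROUP BY {column}\n"
        f"ORDER BY n DESC\n"
        f"LIMIT {int(n)}"
    )


def run_query(sql):
    """Run a SQL string on BigQuery and return the result rows as a list."""
    # Imported here so this file still loads if the client library is absent.
    from google.cloud import bigquery

    client = bigquery.Client()  # picks up your default credentials and project
    return list(client.query(sql).result())


# Hypothetical table name in project.dataset.table form.
sql = top_n_sql("my-project.my_dataset.orders", "country", 5)
print(sql)
# rows = run_query(sql)  # uncomment once your credentials are set up
```

There is no server to install or tune here: the only setup on the Python side is the client library and your credentials.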

Cloud-hosted Python Data Science Environment

Finally, there’s also the option to go entirely cloud-based. If you want to get started quickly and only need the standard, frequently used Python data analytics libraries, this is a highly recommended option.

Instead of installing Anaconda on your local machine, sign up for an account on Azure Machine Learning Studio. You will be able to access a virtual machine with the Python data science environment installed.

The diagram below describes the framework behind a cloud-hosted environment. It is an environment that facilitates quick setup and is great for building prototype data science projects.

