In our previous post, we looked at the overall components required for a Python data science/analytics environment. Setting up these components individually can be challenging. In this post, we look at some options to get your Python data science environment up and running quickly.
Setting up a Python Data Science environment on your machine
Setting up the Python scripting environment and installing the individual Python libraries one by one can be a pain. Thankfully, there is no need to go through that trouble.
Set up your data analytics platform in a few clicks with Anaconda
All you need to do is head to Anaconda.com and download the Anaconda Python Data Science Platform distribution. When you run the installer, it sets up the Python environment, the relevant Python data science libraries, and analytics clients like Jupyter on your local machine (see the diagram above).
It also provides you with a package manager, giving you the flexibility to manage the Python libraries installed on your machine, all through a point-and-click interface.
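Once the installation finishes, a quick sanity check from Python can confirm that the libraries are actually importable. The snippet below is a minimal sketch, not part of Anaconda itself: the package list is an assumption about what a typical data science setup includes, and `check_packages` is a hypothetical helper name, so adjust both to your own needs.

```python
from importlib.util import find_spec

# Assumed list of commonly used top-level modules; edit to match your needs.
EXPECTED_PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn", "notebook"]


def check_packages(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print one line per package so missing libraries are easy to spot.
    for name, available in check_packages(EXPECTED_PACKAGES).items():
        print(f"{name:12s} {'OK' if available else 'missing'}")
```

Run it in the Python environment that Anaconda installed. Anything reported as missing can then be added through the package manager.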
Set up your database and data storage on a local machine
PostgreSQL and MySQL are both available in open source editions. Download and install them on your machine.
You will need database administration software to connect to your database and manage its tables. Download one of the following:
- pgAdmin to manage PostgreSQL databases
- MySQL Workbench to manage MySQL databases
- or third-party tools that allow you to perform the same admin functions and run queries across different database types
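Besides the admin tools, you will usually want to query the database from Python itself. The sketch below shows the general connect, execute and fetch pattern using `sqlite3` from the Python standard library as a stand-in, so that it runs without a database server. It is not PostgreSQL or MySQL code: those need their own driver packages and connection details, and the placeholder style in SQL statements differs between drivers. The table name, columns and the `load_and_query` helper are made up for illustration.

```python
import sqlite3


def load_and_query(rows):
    """Create an in-memory table, insert rows, and return the total amount per region."""
    # ":memory:" gives a throwaway database; a real server needs host and credentials.
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        # Parameterised inserts avoid building SQL strings by hand.
        cur.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
        conn.commit()
        cur.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("north", 120.0), ("south", 80.5), ("north", 30.0)]
    print(load_and_query(sample))  # [('north', 150.0), ('south', 80.5)]
```

The same shape (open a connection, get a cursor, execute, fetch, close) carries over to the Python drivers for PostgreSQL and MySQL, with driver-specific connection arguments.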
Cloud-hosted database solutions
Alternatively, you can skip the hassle of setting up the databases on your own physical machine and opt for cloud-hosted PostgreSQL and MySQL solutions offered by cloud computing platforms like Google Cloud Platform, Microsoft Azure and Amazon Web Services. These services offer some free plans but also charge depending on data usage. Whether you host in the cloud or on a local machine, you will still need some knowledge of database installation and administration.
Simple data warehouse hosted on the cloud
If you are looking for a less complex data warehouse solution, the following products are good alternatives:
- Google BigQuery from Google Cloud Platform
- Amazon Redshift from Amazon Web Services
I recommend Google BigQuery, as it makes creating, maintaining and querying your database tables much simpler.
Cloud-hosted Python Data Science Environment
Finally, there’s also an option to go entirely cloud-based. If you want to get started quickly with standard, frequently used Python data analytics libraries, this is a highly recommended option.
Instead of installing Anaconda on your local machine, sign up for an account on Azure Machine Learning Studio. You will be able to access a virtual machine with the Python data science environment installed.
The diagram below describes the framework behind a cloud-hosted environment. It is an environment that facilitates quick setup and is great for building prototype data science projects.