Setting up a data analytics environment that runs Python

Setting up your data analytics environment

Before we start analysing our data, we need to set up a data analytics environment and ensure that we have the right software and tools to get started.

In this post, I will look at setting up a data analytics environment that runs on Python. Python is a general-purpose programming language used to build software applications. It has become increasingly popular in data science and data analytics.

The Python developer community has built open-source modules and libraries for data analysis, machine learning and scientific computing. Using Python, data scientists and analysts can easily use these modules to analyse their data and apply predictive algorithms to it.

Setting up a Python Data Analytics Environment

The diagram provides an overview of what a Python data analytics environment looks like.

[Diagram: overview of a Python data analytics environment]

Data environment for analytics

You will want access to a database: either one installed on your machine, such as PostgreSQL or MySQL, or a cloud-hosted one such as Google BigQuery. You may also need storage space for text files or Excel files containing data.
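To make the idea of a data environment a little more concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in for a database server such as PostgreSQL or MySQL, and a CSV string as a stand-in for a text file. The table name, column names and values are made up for illustration.

```python
import csv
import io
import sqlite3

# Stand-in for a database server: an in-memory SQLite database.
# With PostgreSQL or MySQL you would connect through that database's own driver.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0)],
)

# Query the table the way an analysis script would.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

# Stand-in for a text file on disk: CSV content held in a string.
csv_text = "region,amount\nEast,50\nWest,70\n"
file_rows = list(csv.DictReader(io.StringIO(csv_text)))

print(totals)
print(file_rows)
```

Note that values read from a CSV file arrive as strings, so they usually need converting to numbers before any analysis.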

Scripting environment for analytics

For your computer to interpret and execute Python scripts, you will need the Python scripting environment (the Python interpreter) installed on it.
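A quick way to confirm that the scripting environment is in place is to ask Python about itself. The sketch below prints the interpreter version and location. The minimum version it checks against is a hypothetical value I picked for the example, not a requirement stated anywhere in this post.

```python
import platform
import sys

# Hypothetical minimum version, chosen only for this example.
MINIMUM_VERSION = (3, 8)


def python_is_recent_enough(minimum=MINIMUM_VERSION):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


print("Python version:", platform.python_version())
print("Interpreter path:", sys.executable)
print("Meets minimum:", python_is_recent_enough())
```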

Installation of Python analytics libraries, modules and packages

Once your scripting environment is set up, you will have access to a wide range of Python libraries and modules that make your analytics work easier and more efficient.
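Most of these libraries are installed separately from Python itself. As a small sketch, the snippet below checks whether a few commonly used analytics libraries can be found by the interpreter. The list of names is my own selection for illustration; note that scikit-learn is imported under the name sklearn.

```python
import importlib.util

# Import names to look for; "sklearn" is the import name of scikit-learn.
LIBRARIES = ["pandas", "numpy", "scipy", "sklearn"]


def is_installed(module_name):
    """Return True if the interpreter can locate the given top-level module."""
    return importlib.util.find_spec(module_name) is not None


status = {name: is_installed(name) for name in LIBRARIES}
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Anything reported as missing can typically be installed with pip, for example `pip install pandas`.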

Apply these modules to your data and you will be able to run various data aggregation, data manipulation and statistical computing functions.

  • pandas DataFrames
    Store data in a 2-dimensional data structure that enables fast labelling, manipulation, aggregation and filtering of data.
  • NumPy and SciPy
    Transform your data into N-dimensional arrays to perform complex scientific computing calculations.
  • scikit-learn
    Access a library of machine learning modules. Run your prepared data through these modules to gain insights, train models and make forecasts.
  • ggplot
    A package that helps you plot and visualise your data.
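To give a feel for the first two items in the list, here is a minimal sketch that assumes pandas and NumPy are installed. The column names and figures are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data, for illustration only.
df = pd.DataFrame(
    {
        "region": ["North", "South", "North", "South"],
        "units": [10, 4, 6, 8],
        "price": [2.5, 3.0, 2.5, 3.0],
    }
)

# Manipulation: derive a new column from existing ones.
df["revenue"] = df["units"] * df["price"]

# Filtering: keep only the rows with more than 5 units.
large_orders = df[df["units"] > 5]

# Aggregation: total revenue per region.
revenue_by_region = df.groupby("region")["revenue"].sum()

# NumPy: take a column as an array and compute its mean.
units = df["units"].to_numpy()
mean_units = np.mean(units)

print(revenue_by_region)
print(large_orders)
print(mean_units)
```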

Execute Python scripts, access Python libraries and view outputs through a web-based client, Jupyter Notebook

Jupyter is a browser-based app that allows you to write and execute Python code. You can write scripts that:

  • access data in your data environment,
  • run your data against Python libraries and modules,
  • view the output (such as charts and printed output) of your Python scripts.
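The three steps above can be sketched as the contents of a single notebook cell. To keep the example free of extra installs, it uses only the standard library: a CSV string stands in for the data source, the statistics module stands in for an analytics library, and a text bar chart stands in for a real plot. The data is made up.

```python
import csv
import io
import statistics

# 1. Access data: CSV text standing in for a file or a query result.
raw = "month,visits\nJan,120\nFeb,90\nMar,150\n"
rows = list(csv.DictReader(io.StringIO(raw)))
visits = [int(row["visits"]) for row in rows]

# 2. Run the data through a library (here the standard-library statistics module).
average = statistics.mean(visits)

# 3. View the output: a printed summary and a simple text bar chart.
print(f"Average visits: {average}")
for row in rows:
    bar = "#" * (int(row["visits"]) // 10)
    print(f"{row['month']} {bar}")
```

In a notebook, the printed output appears directly below the cell that produced it.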

Installation of dashboard applications to produce custom reports

You can build visually appealing, interactive custom reports with tools like Google Data Studio and Qlik. These tools can access and load data from your data environment. Google Data Studio is a cloud-hosted app, while Qlik is a free app that you can download and install on your desktop.

In future posts, I will look into how to set up your data analytics environment quickly.
