
Data Science Life Cycle

Today, we will look at a basic overview of what a typical Data Science Project Life Cycle looks like. In any analytics project, the cycle fits into & adds value to project planning & execution.

Based on my past experiences, I have put together a small pictorial representation of how it works in practice. The structure can vary, but at a high level it remains the same.


The following is the Step-By-Step Approach:

Data Collection

In most cases, the client provides the data for the experiment. Nevertheless, there are scenarios where we have to collect or create the data ourselves.

Data collection can be done in 3 different ways:

  • Taking data from open source forums or platforms

  • Collecting educational data published on various websites

  • Data Augmentation, where we create data as per the client's need (e.g., Data Augmentation is widely used in image-based projects, where the same sample image is tilted, rotated, flipped, etc. to create fresh data samples)
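To make the augmentation idea concrete, here is a minimal sketch in Python using NumPy, assuming each image is held as an array of shape H x W (or H x W x C). The function name augment_image is hypothetical, and only flips and 90-degree rotations are shown; tilting by an arbitrary small angle would need an interpolating routine such as scipy.ndimage.rotate, which is not used here.

```python
import numpy as np


def augment_image(image: np.ndarray) -> list:
    """Hypothetical helper: return flipped and rotated copies of one image array."""
    return [
        np.fliplr(image),      # horizontal flip (mirror left-right)
        np.flipud(image),      # vertical flip (mirror top-bottom)
        np.rot90(image, k=1),  # rotate 90 degrees counter-clockwise
        np.rot90(image, k=2),  # rotate 180 degrees
        np.rot90(image, k=3),  # rotate 270 degrees counter-clockwise
    ]


# A tiny 2 x 3 array stands in for a real photo.
sample = np.arange(6).reshape(2, 3)
augmented = augment_image(sample)
print(len(augmented))  # five new samples from one original
```

Each transformed copy keeps the original label, which is why one labelled image can yield several training samples.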

Data Understanding

In order to understand more about the data, we need to ask clients the right set of questions & get to know the answers and assumptions. Some assumptions might prove costly down the line, so proper care has to be taken before accepting any assumption. Whenever we create data pipelines, understanding the core data is even more important. Sometimes, the data might have to be transformed so that it is usable.
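A quick column-by-column profile is one way to prepare the right questions for the client. The sketch below is an illustration only, assuming pandas and a small hypothetical extract (the column names policy_id, premium and region and the helper name profile are made up). It reports type, missing percentage and distinct counts, which can reveal problems such as numbers stored as text.

```python
import pandas as pd


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: summarise each column to guide questions for the client."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),                     # storage type per column
        "missing_pct": (df.isna().mean() * 100).round(1),   # share of empty cells
        "n_unique": df.nunique(),                           # distinct non-missing values
    })


# Hypothetical client extract: "premium" holds numbers as text, plus an "n/a" entry.
raw = pd.DataFrame({
    "policy_id": [101, 102, 103, 104],
    "premium": ["1200", "950", "n/a", "1100"],
    "region": ["North", "South", None, "North"],
})
print(profile(raw))
```

Here the profile would prompt two questions: what "n/a" means for a premium, and why one region is blank. The answers decide how the column is transformed.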

Data Pre-Processing

Data pre-processing is one of the key steps in analytics experiments, as it forms the base of the pyramid. Almost all the other steps depend on this stage, because it directly affects the output.

Some of the processing strategies are as follows:

  1. Handling of Skewness

  2. Missing value treatment

  3. Handling of characters & numerical values in a single variable

  4. Binning of values

  5. Statistical Transformation of variables
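The five strategies above can be sketched with pandas and NumPy as follows. This is one possible treatment per item under stated assumptions, not a prescription: the data frame and its columns are hypothetical, a log transform is assumed for skewness, median imputation for missing values, coercion to NaN for mixed text/numbers, fixed age bands for binning, and z-score standardisation as the statistical transformation.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data illustrating the five issues listed above.
df = pd.DataFrame({
    "income": [10_000, 20_000, 40_000, 80_000, 160_000],    # right-skewed (each value doubles)
    "age": [25, np.nan, 40, 35, 52],                         # one missing value
    "claim_amount": ["100", "250", "unknown", "400", "50"],  # text mixed with numbers
})

# 1. Skewness: log1p (log of 1 + x) compresses the long right tail.
df["log_income"] = np.log1p(df["income"])

# 2. Missing values: impute with the median, which is robust to outliers.
df["age"] = df["age"].fillna(df["age"].median())

# 3. Characters & numbers in one variable: non-numeric entries become NaN for later treatment.
df["claim_amount"] = pd.to_numeric(df["claim_amount"], errors="coerce")

# 4. Binning: group a continuous variable into labelled ranges (right-inclusive intervals).
df["age_band"] = pd.cut(df["age"], bins=[0, 30, 45, 120], labels=["young", "middle", "senior"])

# 5. Statistical transformation: standardise to zero mean and unit variance (z-score).
df["age_z"] = (df["age"] - df["age"].mean()) / df["age"].std(ddof=0)

print(df)
```

The choices here (median versus mean, where to cut the bins) are judgement calls that usually go back to the client during Data Understanding.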


Model Building & Evaluation

I strongly believe that it's not always a Machine Learning model that provides the solution.

One must evaluate whether a computational model is needed at all, rather than just basic analytics. A machine learning model is not always required to solve the client's problem.

In one of my past experiences at a top Insurance company, the clients clearly mentioned that irrespective of technique/model/method, the objectives of the business remained the same. That is a powerful statement.

Once the idea is validated, model building involves many techniques. The basics of handling the model mechanism remain the same irrespective of the technique. The following have to be taken care of while running a machine learning model:

  • Hyper-Parameter Tuning

  • Model Parameters Selection

  • Over-fitting of the model

  • ROC Evaluation

  • Accuracy/Precision/Recall/MAPE metrics have to be verified

All the above pointers help in testing and refining the model.
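The checklist above can be illustrated with scikit-learn on synthetic data. This is a sketch under assumptions, not a recommended setup: logistic regression and the grid of C values are arbitrary choices, a "most frequent class" dummy model serves as the basic-analytics baseline, over-fitting is judged by the train/test accuracy gap, and the mape helper is hand-written so the formula stays visible (recent scikit-learn versions also provide mean_absolute_percentage_error, which returns a fraction rather than a percentage).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data stands in for a real client dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Baseline: if a trivial rule scores almost as well, a full model may not be needed.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

# Hyper-parameter tuning: search the regularisation strength C with 5-fold cross-validation.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)
model = search.best_estimator_

# Over-fitting check: a large gap between train and test accuracy is a warning sign.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_pred = model.predict(X_test)
test_acc = accuracy_score(y_test, test_pred)

# ROC evaluation uses the predicted probability of the positive class.
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print("best params:", search.best_params_)
print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"train/test accuracy: {train_acc:.3f} / {test_acc:.3f}")
print(f"precision: {precision_score(y_test, test_pred):.3f}  recall: {recall_score(y_test, test_pred):.3f}")
print(f"ROC AUC: {test_auc:.3f}")


def mape(y_true, y_pred):
    """Mean absolute percentage error for regression targets (assumes no zeros in y_true)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)
```

Comparing against the baseline first ties back to the point above: if the model barely beats a trivial rule, basic analytics may serve the client just as well.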

Business Insights

The most important part of the entire project is bringing value to the customers. Customers are fine with any tech stack or model, as long as it has the fewest vulnerable areas & provides the best insights. The business decisions taken on top of these models must improve the Customer Experience, not degrade it.

Some of the common strategies for delivering insights are:

  • Building a visualization that surfaces the KPIs the client requires

  • Revising the models regularly if required & delivering the revised insights

  • Using model coefficients for a comparative study between different business variables

  • Using % metrics, which enhances the study because percentages make decisions easier to take
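As an illustration of the last two points, the sketch below fits a linear model with NumPy least squares on standardised inputs, so each coefficient means "effect of a one-standard-deviation change", and then reports each driver's share of the total effect as a percentage. The variables (marketing, discount, sales) and their generating formula are synthetic assumptions, and the percentage-share metric is just one possible choice; coefficient comparisons like this are only meaningful when the drivers are not strongly correlated with each other.

```python
import numpy as np

# Synthetic business drivers and outcome; real data would come from the client.
rng = np.random.default_rng(0)
n = 200
marketing = rng.normal(50, 10, n)  # hypothetical marketing spend
discount = rng.normal(5, 2, n)     # hypothetical discount percentage
sales = 3.0 * marketing + 8.0 * discount + rng.normal(0, 5, n)

# Standardise inputs so coefficients are on a comparable scale.
X = np.column_stack([marketing, discount])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
design = np.column_stack([np.ones(n), X_std])  # intercept column + standardised drivers
coef, *_ = np.linalg.lstsq(design, sales, rcond=None)

# Express each driver's share of the total absolute standardised effect as a percentage.
effects = np.abs(coef[1:])
share_pct = 100 * effects / effects.sum()
for name, pct in zip(["marketing", "discount"], share_pct):
    print(f"{name}: {pct:.1f}% of total standardised effect")
```

Without standardisation, the raw coefficients (about 3 per unit of marketing versus 8 per discount point) would suggest the opposite ranking, which is why the comparison is done on a common scale.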
