Data Analytics - What is it?

You’ve likely heard something about the massive amount of data that is generated on a daily basis all around the world. Tech companies seem to have an unquenchable thirst for your data. But why?

I am now a Teacher and Student

I made what I think has been the best move in my career a little over three years ago now. Up until then, I had done nearly five years of full-stack web development, followed by two years of front-end web development. Then, somewhat on a whim, I applied for a new job within the company I was already working for.

The job listing was posted by a development group manager within the company looking for a Sr. Software Engineer to join the analytics/reporting team. Incidentally, she was one of the leaders I was interviewed by when I first joined the company in 2018. Thankfully, my second interview with her was even easier than the first, and my transfer to the new team was as fast as she could make it happen.

On this team, I work with data. A lot of it. While our group is not directly responsible for the collection and management of analytics data (that is handled by another team), the group I am a part of is responsible for the meaningful organization and aggregation of that data for a huge number of reports and data exports, among other things.

During my time with this team, I have discovered that I love working with servers, architecture, and large amounts of data. The problems that need to be solved are fascinating to me, particularly around dealing with large amounts of data efficiently. Further, when seeing how much data is available, I can’t help but wonder what else we could do with it.

I have recently begun pursuing a Master’s degree in Data Analytics. Having so much data available and not understanding its potential is something I want to address. I know how to write code - I’ve been doing it for over ten years. Now I want to learn how to solve more business problems.

As a part of my study, I want to share the things I learn as I go along so you can come with me and see if data analytics is something that might interest you too. It will be interesting to be enrolled in classes while teaching undergrad students at the same time. I look forward to the experience and the insights I can find experiencing life as a formal student once again.

Why so much data?

You’ve likely heard something about the massive amount of data that is generated on a daily basis all around the world. Tech companies seem to have an unquenchable thirst for your data.

But why? Why hoard so much data? Why is it that we hear things like “data rules the world” these days?

It comes down to intelligence. Not the “how smart you are” intelligence, but the other definition - information that can influence action.

Think about it - if you want to know how a piece of software is doing, what’s the most efficient way to get information about it? Not from hearing about issues from customers, that’s for sure. It comes from the information that you can collect yourself. As a software engineer, I can attest to the value of good, informative logs vs. a customer complaint that says “the app doesn’t work sometimes.” There is not much you can do with that.

But it goes deeper than that. Much deeper. Data analytics influences the direction of billion-dollar companies. It influences governments and policies. Essentially, data influences just about every business decision that is made today. But how are these analyses done? That’s what we’re going to take a look at first.

The Data Analytics Journey

As one might imagine, you cannot simply look at a bunch of data and know what to do with it. Often, you don’t even know what information lies in the data - you just know that you have a ton of it. There are a lot of options available to you, but there is a general process that is followed when trying to gain intelligence from data. This is referred to as the Data Analytics Life Cycle.

The Data Analytics Life Cycle

The data analytics life cycle outlines the steps, or phases, that can take place over the course of a data analytics project. Depending on the context, this could be for a single analytics project, or it could be an ongoing process for an ever-changing data set. There are generally seven phases to the life cycle:

  • Discovery
  • Data Acquisition
  • Data Cleaning
  • Data Exploration
  • Predictive Modeling
  • Data Mining
  • Representation and Reporting

We’ll cover each phase at a high level. Note that these phases are not usually executed in a fixed order, with some exceptions: you cannot report on a data analytics project until you have actually done it, for instance. Barring that, the process tends to be iterative, particularly in cases where the analysis is ongoing.

For simplicity though, we’ll talk about each phase as if we were going in order on a single project.

The Discovery Phase

The first phase of the data analytics life cycle is the discovery phase, sometimes also referred to as the “business understanding” or “planning” phase. This is when you, along with any project stakeholders (those interested in the outcome of the analysis), will determine what the goals of the project are.

In addition to deciding what questions a project is meant to answer, this is the time when project resources will also be determined, along with any additional needs of the stakeholders, like formatting, the manner of reporting, etc. Once everything is in place, the project is ready to begin.

Data Acquisition

Just like it sounds, once you have your goals and objectives established, it is time to go and get the data. The ways in which data is gathered vary widely, as do the kinds of data. The kind of data you collect and the way you collect it will largely depend on the objectives of the project.

For example, if you are looking for sales information and insights, you would potentially pull all of the data you need from existing databases that you have available. Or, if you are working on answering a question about a piece of software, then you could start storing information in the form of logs with relevant information that you will need for your analysis. If you are a large tech company, you could just log the movements of every person on the planet with the GPS on the phones in their pocket for whatever you want. Creepy? A bit. But that’s how data is collected sometimes.
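To make the first case a little more concrete, here is a minimal sketch of pulling sales data out of an existing database. Everything specific in it is an assumption for illustration: the `sales` table, its columns, and the sample rows are all made up, and an in-memory SQLite database stands in for whatever system a real company would use.

```python
import sqlite3

def load_sales(conn):
    """Pull the rows needed for the analysis from an existing database."""
    cursor = conn.execute(
        "SELECT product, category, price, sold_at FROM sales ORDER BY sold_at"
    )
    # Turn each row into a dict keyed by column name so later steps are readable.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]

# Hypothetical in-memory database standing in for a real sales system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, category TEXT, price REAL, sold_at TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Desk lamp", "home", 24.99, "2023-03-01T09:15:00"),
        ("Notebook", "office", 3.50, "2023-03-01T13:40:00"),
        ("Desk lamp", "home", 24.99, "2023-03-02T17:05:00"),
    ],
)

rows = load_sales(conn)
print(len(rows), "rows acquired")
```

In a real project the query would be shaped by the questions agreed on during discovery, so you only pull what you expect to need.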

Data Cleaning

Just as important as having data to work with is making sure you have data that you can actually use. Often, the data that is available or collected contains way more information than you need if you are trying to answer a very specific question.

Let’s take sales data for example. If we collect a large amount of data regarding the sales of products for a company, and we want to do an analysis on the kinds of things that were sold, then there is likely to be some data that we do not need. We may not care about the specific prices of the products. We may not care about the time of day the items were purchased. Data cleaning is the process of traversing the data that you have acquired and stripping out anything you do not need.

If you are pulling data from multiple sources, such as multiple database tables, the data cleaning phase is also the time when you would arrange the data so it is consistent. It is highly likely that the data is not all going to be formatted in exactly the same way, which makes any kind of analysis much more difficult and potentially more error-prone.

By cleaning the data, you can get a single representation of clear, concise data that will help you meet the specific objectives that were outlined during the discovery phase of the process.
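Here is a rough sketch of what that can look like for the sales example. The two sources, their field names (`product` vs. `item_name`), and the records are all hypothetical; the point is only to show stripping unneeded fields and making two differently formatted sources agree.

```python
def clean_record(record):
    """Keep only the fields the analysis needs, in one consistent format."""
    # The two (made-up) sources disagree on the field name, so accept either.
    name = record.get("product") or record.get("item_name") or ""
    category = record.get("category") or ""
    # Price and time of day are dropped on purpose: this analysis is only
    # about what kinds of things were sold.
    return {
        "product": name.strip().lower(),
        "category": category.strip().lower(),
    }

# Hypothetical records from two differently formatted sources.
store_sales = [
    {"product": "Desk Lamp ", "category": "Home", "price": 24.99, "sold_at": "09:15"},
]
web_sales = [
    {"item_name": "desk lamp", "category": "HOME", "price": "24.99"},
    {"item_name": "  ", "category": "office"},  # unusable: no product name
]

cleaned = [clean_record(r) for r in store_sales + web_sales]
cleaned = [r for r in cleaned if r["product"]]  # drop records with no product name
print(cleaned)
```

After this step both sources describe the same product in the same way, which is what makes counting and grouping trustworthy later on.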

Data Exploration

Once you have collected enough data to analyze, you first have to learn about it. Just because you have a trove of data does not mean that you are going to instantly have the answers you are looking for. Real statistics and data analytics are a lot more complicated than that.

This is the phase where an analyst would start exploring the data. What kind of correlations can be found in the data? Do different data points belong to different groups or classifications? Are there any groups or classifications at all?

A good analyst will make sure they understand the data that they are dealing with before they start trying to model the data or find insights about it. Without this insight, an analyst would not be able to draw meaningful or even correct information from the data.

There are a lot of tools that can aid with data exploration, like data visualization software, histograms, and other charts. This, however, is not where the main analysis is done. Exploration will help you understand and describe the data, but what about using the data to influence business decisions? That’s where Predictive Modeling comes in.
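As a small illustration of the exploration step, the sketch below summarizes a data set, bins it into a crude histogram, and checks whether two measurements move together. The daily visit and order counts are invented for the example, and the Pearson correlation is written out by hand so nothing beyond the standard library is assumed.

```python
import statistics
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def histogram_bins(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return sorted(counts.items())

# Made-up daily numbers for one week.
visits = [120, 135, 150, 160, 180, 210, 240]
orders = [12, 14, 15, 15, 19, 22, 25]

print("median visits:", statistics.median(visits))
for start, count in histogram_bins(visits, 50):
    print(f"{start}-{start + 49}: {'#' * count}")

r = pearson(visits, orders)
print(f"correlation between visits and orders: {r:.2f}")
```

A strong correlation here only describes the data; it does not yet predict anything, and it does not say which measurement drives the other.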

Predictive Modeling

Predictive modeling is the part that I think most of us think about when we hear the term “Data Analytics”. This is the phase where we move beyond just describing the data, and try to create models that enable us to make predictions regarding events or outcomes we have an interest in.

Some examples of questions that predictive modeling could answer are these:

“Based on the number of positive test results of a disease, how can we expect it to spread?”

“Based on the number of sales during the year, when can we predict our website will see the most traffic?”

Having answers to these questions can help businesses and organizations prepare for the future. A hospital can schedule more staff to be on hand when they expect to see more cases of the flu for example. A company can know when to bolster its server infrastructure to handle an influx of web traffic during peak sale times to avoid outages.

By answering these questions, companies and organizations can optimize their business practices and make decisions based on hard data, rather than just having to guess.
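To give a feel for what a very simple predictive model looks like, here is a sketch that fits a straight line to past weekly traffic and projects the next week. The traffic numbers are invented, and a straight-line trend is an assumption made purely to keep the example short; real traffic forecasting would usually need to account for seasonality and much more.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical weekly site traffic, in thousands of visits.
weeks = [1, 2, 3, 4, 5]
traffic = [110, 125, 145, 160, 170]

slope, intercept = fit_line(weeks, traffic)
next_week = slope * 6 + intercept
print(f"trend: {slope:.1f}k more visits per week")
print(f"projected traffic for week 6: {next_week:.1f}k")
```

The model is only as good as the assumption behind it, which is why the exploration phase matters so much before you get here.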

Data Mining

Data mining and predictive modeling are two phases that can go hand in hand with one another. Both look for patterns in the data to identify and/or classify, which we can then analyze further or gain insights from.

The difference with data mining is that it tends to be applied to really large sets of data. Often, machine learning is employed to help look for patterns. Training and testing data sets are created during the data mining phase, and these can then be used to generate predictive models.

This is a phase that is of particular interest to me. Sometimes, there is so much data that there is no way to know exactly what you have just by throwing together some histograms. It is only by unleashing computing power on it that we can begin to understand what we have in a data set.
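The idea of training and testing data sets can be sketched in a few lines. The example below shuffles a tiny, made-up data set, holds some of it back for testing, and trains a nearest-centroid classifier on the rest. The points, the "low"/"high" labels, and the choice of classifier are all assumptions for illustration; real data mining works on far larger data with far more capable tools.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle the rows and hold back a fraction of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    test_count = round(len(shuffled) * test_fraction)
    return shuffled[test_count:], shuffled[:test_count]

def fit_centroids(train):
    """Learn one average point (centroid) per label from the training rows."""
    sums = {}
    for (x, y), label in train:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

def predict(model, point):
    """Assign the label whose centroid is closest to the point."""
    x, y = point
    return min(
        model,
        key=lambda label: (x - model[label][0]) ** 2 + (y - model[label][1]) ** 2,
    )

# Made-up, well-separated groups, e.g. "low" and "high" spending customers.
low = [((1 + i * 0.1, 1 + (i % 3) * 0.2), "low") for i in range(10)]
high = [((8 + i * 0.1, 8 + (i % 3) * 0.2), "high") for i in range(10)]

train, test = train_test_split(low + high)
model = fit_centroids(train)
correct = sum(predict(model, point) == label for point, label in test)
accuracy = correct / len(test)
print(f"trained on {len(train)} rows, tested on {len(test)} rows")
print(f"accuracy on held-out data: {accuracy:.0%}")
```

Holding data back matters because a model should be judged on rows it has never seen, not on the ones it learned from.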

Representation and Reporting

The final phase of a project is when an analyst tells the story of the data to the stakeholders. Think of giving a presentation, or completing a report.

This is when the results of the project are presented to the stakeholders in a manner from which they can then make business decisions. The exact way this is done depends on the project and the objectives of the project. Sometimes it could be a report, sometimes it could be a new software tool that can be used to gain real-time insights into business interests. It all depends on the project.

Just scratching the surface

There is a ton more to be learned. What kinds of questions can you answer with data analytics? What kinds and forms of data can you use?

There is also a large selection of tools at your disposal to complete and assist with data analytics tasks.

Next time, we’ll explore some of the analytics tools and techniques that are available to analysts.