Basic Steps for Designing Big Data Architecture

In my earlier post I talked about the basics of Big Data and how it can become a Future Nightmare, followed by Must Know Facts of Big Data. Today, let us talk about a very important and basic step for working with Big Data, i.e. “Big Data Architecture”.

Big data architecture is the logical and/or physical structure of how big data will be stored, accessed and managed within a big data or IT environment. It logically defines how big data solutions will work based on core components (hardware, database, software, storage) used, flow of information, security, and more. Big data architecture primarily serves as the key design reference for big data infrastructures and solutions.

Big Data Types:

Big data can be stored, acquired, processed, and analyzed in many ways. Every big data source has different characteristics, including frequency, volume, velocity, type, and veracity of data. When big data is processed and stored, additional dimensions come into play, such as governance, security, and policies.

Designing a Big Data architecture is already a complex task. Add to that the pace of technological innovation and the number of competing products in the market, and it becomes a mammoth task for any Big Data Architect.

Before designing a big data reference architecture, the most vital step is identifying whether a particular business scenario is a Big Data problem or not. These problems can be further categorized into types. Categorizing big data problems by type makes it easier to determine the individual characteristics of each data type. Big Data types can be categorized as follows:

Machine-Generated Data
Web and Social Data
Transaction Data
Human-Generated Data
Biometrics

Classification of Big Data Characteristics using Big Data Type

Data from different sources has different characteristics; for example, social media data can have video, images, and unstructured text such as blog posts. Once data is classified according to its characteristics, it can easily be matched with the appropriate big data pattern. Listed below are some of the common characteristics by which data is assessed and categorized.

Analysis Type: Real-Time Analysis or Batch Analysis

Give careful consideration to choosing the analysis type, since it affects several other decisions about products, tools, hardware, data sources, and expected data frequency. A mix of both types may be required by the use case:

Fraud Detection: Real-Time Analysis required
Trend Analysis / Business Decisions: Batch Mode Analysis
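
As a rough illustration, the choice between the two modes can be expressed as a rule on how quickly a result is needed. The function name, the latency figures and the 5-second threshold below are assumptions made for this sketch, not values from any standard; a real project would take them from its own requirements.

```python
from enum import Enum


class AnalysisType(Enum):
    REAL_TIME = "real-time"
    BATCH = "batch"


def choose_analysis_type(max_latency_seconds: float,
                         real_time_threshold_seconds: float = 5.0) -> AnalysisType:
    """Pick an analysis type from how quickly a result is needed.

    The threshold is a hypothetical cut-off used only for illustration.
    """
    if max_latency_seconds <= real_time_threshold_seconds:
        return AnalysisType.REAL_TIME
    return AnalysisType.BATCH


# Assumed requirements: fraud detection must react within a second,
# while a trend report can wait a full day.
print(choose_analysis_type(1.0))        # AnalysisType.REAL_TIME
print(choose_analysis_type(24 * 3600))  # AnalysisType.BATCH
```

A use case that needs both modes would simply call this once per workload rather than once per system.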

Processing Methodology: Type of technique to be applied for processing data

The selected methodology helps in choosing the appropriate tools and techniques for the Big Data solution.

Data Frequency and Size: Amount of data and the speed at which it will be obtained.

This characteristic of data helps in deciding the storage mechanism, format and pre-processing tools. Size and Frequency vary for different data sources:

On Demand – Social Media Data
Continuous Feed / Real Time – Weather Data, Transactional Data
Time Series – Time Based Data

Data Type: Type of Data to be processed.

Knowing the data type helps in segregating the data in storage.

Content Format: Format of Incoming Data

The format tells us how the incoming data needs to be processed and which tools and techniques should be used. The format can be structured (RDBMS), unstructured (audio, video, images), or semi-structured.
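
One simple way to make this routing decision is to look at the file extension of incoming data. The sketch below is only an assumption about how that could be done: the extension-to-format mapping is hypothetical and incomplete, and real pipelines usually inspect the content as well.

```python
from pathlib import Path

# Hypothetical mapping from file extension to content format.
FORMAT_BY_EXTENSION = {
    ".csv": "structured",
    ".parquet": "structured",
    ".json": "semi-structured",
    ".xml": "semi-structured",
    ".mp3": "unstructured",
    ".mp4": "unstructured",
    ".jpg": "unstructured",
    ".png": "unstructured",
}


def content_format(filename: str) -> str:
    """Return the assumed content format for a file, or 'unknown'."""
    extension = Path(filename).suffix.lower()
    return FORMAT_BY_EXTENSION.get(extension, "unknown")


for name in ["orders.csv", "tweets.json", "call_recording.MP3", "notes"]:
    print(name, "->", content_format(name))
```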

Data Source: Sources of Data Generation

Identifying the data sources is vital in determining the scope from a business perspective. Examples include web and social media, machine-generated and human-generated sources.

Data Consumers: List of possible consumers of processed data

Business Processes
Business Users
Enterprise Applications
Individual people in Various Business Roles
Part of process flows
Other data repositories or enterprise applications

Hardware: Hardware on which the Big Data Solution is to be implemented

Understanding the limitations of the hardware helps inform the choice of Big Data solution.
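
To keep the classification consistent across sources, the characteristics listed above can be recorded in one small structure per data source. This is a minimal sketch, not a standard schema: the class name, the field names, the helper function and the example values marked as hypothetical are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataSourceProfile:
    """One record per data source, with a field for each characteristic."""
    name: str
    analysis_type: str            # "real-time" or "batch"
    processing_methodology: str   # free text
    frequency: str                # "on demand", "continuous feed", "time series"
    data_type: str
    content_format: str           # "structured", "semi-structured", "unstructured"
    source: str
    consumers: List[str] = field(default_factory=list)
    hardware: str = "unspecified"


def needs_streaming(profile: DataSourceProfile) -> bool:
    """Flag sources that arrive continuously or must be analyzed in real time."""
    return (profile.analysis_type == "real-time"
            or profile.frequency == "continuous feed")


fraud_feed = DataSourceProfile(
    name="card transactions",
    analysis_type="real-time",
    processing_methodology="rule-based scoring",  # hypothetical value
    frequency="continuous feed",
    data_type="transaction data",
    content_format="structured",
    source="payment systems",                     # hypothetical value
    consumers=["business processes", "enterprise applications"],
)
print(needs_streaming(fraud_feed))  # True
```

Filling in one profile per source makes gaps visible early, for example a source with no identified consumers.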

6 Basic Steps of Big Data Architecture Designing:

Once we have analyzed the company's big data scenario, the characteristics of the data and the type of Big Data pattern, we can move on to planning the Big Data reference architecture. We can design the reference architecture by following the 6 steps listed below:

Analyze the Problem:

The task to be performed at this step is similar to what has been explained in the previous sections. We need to analyze whether we need a Big Data solution or not, the characteristics of the data and the type of Big Data pattern.

Vendor Selection:

This decision is made solely on the basis of the functionality we have to achieve through the tools. There are many vendors in the market with a very wide range of tools for different tasks. It is up to the organization to decide which kind of tool it would like to opt for.

Deployment Strategy:

This step determines whether the solution will be on-premise, cloud-based or a mix of both.

An on-premise solution tends to be more secure; however, the hardware maintenance costs a lot more money, effort and time.
A cloud-based solution is more cost-effective in terms of scalability, procurement and maintenance.
A mixed deployment strategy gives us a bit of both worlds, and data storage can be planned according to how the data is used.

Capacity Planning:

At this step we evaluate hardware and infrastructure sizing considering the below factors:

Data volume for the one-time historical load
Daily data ingestion volume
Retention period of Data
Data Replication for critical Data
Time period for which the cluster is sized, after which the cluster is scaled horizontally
Multi Datacenter deployment
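
The factors above can be combined into a first-pass storage estimate. The formula below is a simplified sketch under stated assumptions: the historical load is kept for the whole sizing period, ingested data is kept for the retention period or the sizing period (whichever is shorter), every byte is replicated the same number of times in every datacenter, and a fixed headroom is left free. The default replication factor of 3, the 25% headroom and the example volumes are illustrative, not recommendations.

```python
def required_storage_tb(historical_load_tb: float,
                        daily_ingest_tb: float,
                        retention_days: int,
                        sizing_period_days: int,
                        replication_factor: int = 3,
                        headroom_fraction: float = 0.25,
                        datacenters: int = 1) -> float:
    """Estimate the disk capacity needed, in terabytes."""
    # Ingested data older than the retention period is assumed to be deleted.
    days_kept = min(retention_days, sizing_period_days)
    raw_tb = historical_load_tb + daily_ingest_tb * days_kept
    # Each datacenter is assumed to hold a full, replicated copy.
    replicated_tb = raw_tb * replication_factor * datacenters
    # Leave free space for temporary files and unexpected growth.
    return replicated_tb * (1 + headroom_fraction)


# Hypothetical inputs: 100 TB of history, 0.5 TB per day,
# one year of retention, cluster sized for two years.
estimate = required_storage_tb(100, 0.5, 365, 730)
print(f"{estimate:.1f} TB")  # 1059.4 TB
```

When the sizing period runs out, the same calculation can be repeated with updated volumes to decide how many nodes to add.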

Infrastructure Sizing:

The inferences from the previous step help in infrastructure planning, such as deciding the type of hardware required. This step also involves deciding the number of environments required. Important factors to be considered:

Type of processing: memory-intensive or I/O-intensive
Type of disk
Number of disks per machine
Memory size and HDD size
Number of CPUs and cores
Data retained and stored in each environment
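
As a rough sketch of how disk-related factors translate into a machine count, the function below divides a storage requirement by the usable capacity of one machine. It sizes by storage only; memory and CPU need a separate check. The function name, the 80% usable fraction (space left after the operating system and temporary files) and the example figures are assumptions.

```python
import math


def nodes_required(total_storage_tb: float,
                   disks_per_node: int,
                   disk_size_tb: float,
                   usable_fraction: float = 0.8) -> int:
    """Return the number of machines needed to hold total_storage_tb."""
    usable_per_node_tb = disks_per_node * disk_size_tb * usable_fraction
    # Round up: a partly filled machine is still a whole machine.
    return math.ceil(total_storage_tb / usable_per_node_tb)


# Hypothetical cluster: 1,000 TB to store on machines with 12 disks of 8 TB each.
print(nodes_required(1000, disks_per_node=12, disk_size_tb=8))  # 14
```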

Backup and Disaster Recovery Sizing:

Backup and disaster recovery is a very important part of planning, and involves the following considerations:

The criticality of data stored
RPO (Recovery Point Objective) and RTO (Recovery Time Objective) requirements
Active-Active or Active-Passive Disaster recovery
Multi datacenter deployment
Backup Interval (can be different for different types of data)
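
The backup interval and the RPO are directly linked: in the worst case a failure happens just before the next backup, so up to one full interval of data is lost. A minimal check, assuming backups complete instantly and with hypothetical data types and values, could look like this:

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss is one backup interval, so it must fit within the RPO."""
    return backup_interval <= rpo


# Hypothetical policy: (backup interval, RPO) per type of data.
policies = {
    "transactions": (timedelta(minutes=15), timedelta(hours=1)),
    "clickstream logs": (timedelta(hours=24), timedelta(hours=6)),
}

for data_type, (interval, rpo) in policies.items():
    status = "OK" if meets_rpo(interval, rpo) else "backup too infrequent"
    print(f"{data_type}: {status}")
```

The RTO needs a different kind of test, namely timing an actual restore, which a calculation like this cannot replace.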

In my next post I will discuss the different layers of the architecture and the functionality of each of them. Till then, let me know in the comments below if I have left anything out of the planning steps.

January 23, 2023
