Big Data Testing vs. Traditional Database Testing

October 14, 2016

Big Data:

Big data has emerged as a modern-day technological phenomenon that spans many areas, including database testing.

In today's world of digitization, we are overloaded with data that needs to be recorded for future reference. Any mishandling of that data can leave users dissatisfied. Big data offers a solution to such problems, as it makes it possible to manage large datasets that are otherwise difficult to handle with older, traditional database systems. Broadly, big data refers to the creation, storage, retrieval and analysis of data that is huge in volume, carried out with a range of specialised big data tools.

There has been a radical shift in how databases are conceived and maintained. Data warehousing is a term one may naturally associate with big data: data warehousing is about storing huge volumes of data in one place, and so is big data. The aim is to manage both structured and unstructured data as securely and consistently as possible.

Today, entities such as Facebook, LinkedIn, Amazon, eBay and many more handle terabytes and even petabytes of data on an hourly or daily basis. The primary challenge is handling data from disparate sources while maintaining consistency within the database.

Big data offers a solution to the problem of maintaining data consistency across the globe. Big data emerged when organisations started facing huge volumes of data, continuous data streams and varied data formats (numeric, symbolic, video, audio, email, etc.), with the size of data growing almost exponentially.

A vast range of services is now available over the internet, and one can access a website or its information from anywhere in the world. Hence, data provided by users, in whatever form and on whatever occasion, has to be stored so that it can be accessed at any time by anyone with the right credentials, irrespective of where they are located.

Big data solutions:

Data warehouses
Massively parallel processing databases
Data mining grids
Distributed file systems
Distributed databases
Cloud computing
Internet

One of the biggest developments to ease the handling of big data is a framework named Hadoop. Hadoop is an open source project from the Apache Software Foundation. It is a framework used to distribute the storage and processing of large datasets across clusters of computers. Hadoop is designed to handle huge datasets with high volume, velocity and variety, and it is designed to detect and handle failures at the application layer rather than relying on the hardware for reliability.

Hadoop offers a practical solution for large processing tasks such as scientific analysis, business and sales planning, and a wide range of other activities. Hadoop's design is based on Google's MapReduce programming model. Hadoop gained wide recognition because it can work with a wide variety of data sources, both structured and unstructured, in formats ranging from relational tables to fixed-size records.
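The MapReduce model splits a job into a map phase, a shuffle (grouping) phase and a reduce phase. The sketch below simulates those phases in plain Python for a word count. It illustrates the programming model only and is not Hadoop's API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical. Everything runs in a single process, whereas Hadoop would run many map and reduce tasks in parallel across a cluster.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Hypothetical driver: in Hadoop each input split would be mapped on a
    # different node; here every line is mapped sequentially in one process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data testing", "big data tools"]))
    # → {'big': 2, 'data': 2, 'testing': 1, 'tools': 1}
```

Because each map call looks at only one line and each reduce call at only one key, both phases can be parallelised independently; that independence is what lets a framework like Hadoop spread the work across many machines.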

Big Data Testing:

Big data testing is about verifying the accuracy of data processing rather than testing the individual components of a software application. Functional testing and performance testing are its key components.
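One common way to verify the accuracy of data processing is reconciliation: recompute the expected result independently from the source data and compare it with what the pipeline actually produced. The sketch below assumes a hypothetical job that totals sales amounts per region; the record fields (`region`, `amount`) and the function names are illustrative and not part of any particular testing tool.

```python
import math
from collections import defaultdict


def expected_totals(source_records: list[dict]) -> dict[str, float]:
    """Independently recompute per-region totals from the raw input records."""
    totals: dict[str, float] = defaultdict(float)
    for record in source_records:
        totals[record["region"]] += record["amount"]
    return dict(totals)


def find_discrepancies(source_records: list[dict],
                       pipeline_output: dict[str, float]) -> list[str]:
    """Compare the pipeline output with recomputed totals; an empty list means they agree."""
    expected = expected_totals(source_records)
    problems = []
    # Keys present in the source but dropped by the pipeline.
    for region in sorted(expected.keys() - pipeline_output.keys()):
        problems.append(f"missing region: {region}")
    # Keys the pipeline produced that the source does not justify.
    for region in sorted(pipeline_output.keys() - expected.keys()):
        problems.append(f"unexpected region: {region}")
    # Compare values with a tolerance, since floating-point sums can differ slightly.
    for region in sorted(expected.keys() & pipeline_output.keys()):
        if not math.isclose(expected[region], pipeline_output[region], rel_tol=1e-9):
            problems.append(
                f"{region}: expected {expected[region]}, got {pipeline_output[region]}"
            )
    return problems


if __name__ == "__main__":
    source = [
        {"region": "EU", "amount": 10.0},
        {"region": "EU", "amount": 5.5},
        {"region": "US", "amount": 7.0},
    ]
    print(find_discrepancies(source, {"EU": 15.5, "US": 7.0}))  # → []
    print(find_discrepancies(source, {"EU": 15.0}))
```

The same pattern scales to record counts, checksums or any other aggregate: the important design choice is that the expected value is computed by a path independent of the pipeline under test.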

Types of big data testing:

Data type validation - This checks whether the input provided by the user matches the expected data type (for example, a number, a date or text), as defined by the system's rules.
Range and constraint validation - There is a certain range within which the user is allowed to input data. For instance, a name field may contain at most 25 characters. This test ensures that the minimum and maximum range constraints are respected.
Code and cross-reference validation - This checks that the input conforms to the rules, data types and other validity constraints. The input data is cross-referenced against a predefined set of rules (for example, a list of valid codes) to check whether it meets the criteria.
Structured validation - This combines the basic validation types above with more complex algorithms, and may include testing complex sets of process operations within the system.
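The validation types listed above can be sketched as small, composable checks. This is a minimal illustration assuming a hypothetical record with `name`, `age` and `country_code` fields; the 25-character name limit comes from the example in the text, while the age bounds and the set of valid country codes are made-up assumptions. Here the structured check only combines the basic ones; a real structured validation would typically add more complex, multi-step rules.

```python
# Hypothetical reference data for code / cross-reference validation.
VALID_COUNTRY_CODES = {"IN", "US", "GB", "DE"}

NAME_MAX_LENGTH = 25          # name-length limit from the example in the text
AGE_MIN, AGE_MAX = 0, 150     # assumed bounds, for illustration only


def validate_data_type(record: dict) -> list[str]:
    """Data type validation: each field must have the expected type."""
    errors = []
    if not isinstance(record.get("name"), str):
        errors.append("name must be a string")
    age = record.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        errors.append("age must be an integer")
    return errors


def validate_range(record: dict) -> list[str]:
    """Range and constraint validation: values must lie within allowed bounds."""
    errors = []
    name = record.get("name")
    if isinstance(name, str) and not 1 <= len(name) <= NAME_MAX_LENGTH:
        errors.append(f"name must be 1-{NAME_MAX_LENGTH} characters")
    age = record.get("age")
    if isinstance(age, int) and not AGE_MIN <= age <= AGE_MAX:
        errors.append(f"age must be between {AGE_MIN} and {AGE_MAX}")
    return errors


def validate_cross_reference(record: dict) -> list[str]:
    """Code and cross-reference validation: codes must exist in the reference set."""
    if record.get("country_code") not in VALID_COUNTRY_CODES:
        return ["unknown country_code"]
    return []


def validate_record(record: dict) -> list[str]:
    """Structured validation (simplified): run every basic check and collect all errors."""
    return (validate_data_type(record)
            + validate_range(record)
            + validate_cross_reference(record))


if __name__ == "__main__":
    print(validate_record({"name": "Asha", "age": 34, "country_code": "IN"}))  # → []
    print(validate_record({"name": "x" * 30, "age": "34", "country_code": "XX"}))
```

Collecting every error rather than stopping at the first one makes a test report more useful when a large batch of records is validated at once.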

Traditional database:

The traditional, much older concept of a database was simple: records were maintained in paper-based systems. Things were not very complex, as the number of users or customers was small enough to count.

Traditional database management was cumbersome and required many people to collect and maintain end-user records. The records were mostly handwritten on paper, so they were prone to damage, loss or theft.

However, with advances in technology and a growing number of users, the shift gradually moved toward more robust and secure forms of data storage. Databases such as Oracle Database, IBM Db2 and MySQL came to the rescue and supported the move toward online web services. Working with these databases required people to know a query language such as SQL, and often a programming language such as Java, C# (.NET) or C++, to interact with the backend database.
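To show what interacting with a backend relational database from a program looks like, here is a sketch using Python's built-in `sqlite3` module, with an in-memory SQLite database standing in for a server database such as MySQL. The `customers` table and the helper functions are hypothetical. The application sends SQL statements to the database, using `?` placeholders so that user input is passed as parameters rather than pasted into the query text.

```python
import sqlite3


def create_customer_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " city TEXT NOT NULL)"
    )
    return conn


def add_customer(conn: sqlite3.Connection, name: str, city: str) -> int:
    """Insert one customer using parameter placeholders and return its new id."""
    cur = conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", (name, city))
    conn.commit()
    return cur.lastrowid


def customers_in_city(conn: sqlite3.Connection, city: str) -> list[str]:
    """Return customer names for one city, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", (city,)
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = create_customer_db()
    add_customer(conn, "Ravi", "Pune")
    add_customer(conn, "Anita", "Pune")
    add_customer(conn, "John", "Leeds")
    print(customers_in_city(conn, "Pune"))  # → ['Anita', 'Ravi']
    conn.close()
```

Against a server database the connection step would use that database's own driver, but the pattern of sending parameterised SQL from application code stays the same.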

Demerits of traditional databases:

Storage capacity was a major issue.
Data was easy to mishandle, as it was readily accessible to the people handling it.
Data duplication was frequent.
Searching for records took a long time, as records were not computerised.
Shortlisting data was a cumbersome task.