
Data and Big Data

Data is information in raw or unorganized form (such as letters, numbers, or symbols) that refers to, or represents, conditions, ideas, or objects. Data is limitless and present everywhere in the universe.

Big Data refers to volumes of data so large that traditional data processing applications cannot process them effectively.

Data vs. Big Data

Data: Small enough for human comprehension, in a volume and format that makes it accessible, informative, and actionable.
Big Data: Data sets so large or complex that traditional data processing applications cannot deal with them.

Data: In most cases, in the range of tens or hundreds of gigabytes (GB).
Big Data: More than a few terabytes (TB).

Data: Controlled and steady data flow.
Big Data: Data can arrive at very high speeds.

Data: Data accumulates slowly.
Big Data: Enormous amounts of data can accumulate within very short periods of time.

Data: Structured data in tabular format with a fixed schema, and semi-structured data in JSON or XML format.
Big Data: High-variety data sets that include tabular data, text files, images, videos, audio, XML, JSON, logs, sensor data, etc.

Data: Contains less noise, because the data is collected in a controlled manner.
Big Data: Data quality is usually not guaranteed. Rigorous data validation is required before processing.

Data: Business intelligence, analysis, and reporting.
Big Data: Complex data mining for prediction, recommendation, pattern finding, etc.

Data: Historical data sets remain equally valid, because the data represents solid business interactions.
Big Data: In some cases data goes stale quickly, for example in fraud detection.

Data: Data resides within the enterprise, on local servers, etc.
Big Data: Mostly in distributed storage in the cloud or in external file systems.

Data: Predictable resource allocation. Mostly vertically scalable hardware.
Big Data: More agile infrastructure with a horizontally scalable architecture. The load on the system varies a lot.
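
The comparison above says that Big Data quality is usually not guaranteed and that validation is required before processing. As a rough illustration, here is a minimal Python sketch of that idea. It assumes the incoming records are JSON lines from a sensor feed; the field names (sensor_id, reading) and the helper functions are hypothetical and only serve the example. It is one possible approach, not a complete validation framework.

```python
import json

# Hypothetical schema for an illustrative sensor record (an assumption,
# not something defined in this post).
REQUIRED_FIELDS = {"sensor_id": str, "reading": (int, float)}


def validate_record(line):
    """Return the parsed record if it is well formed, otherwise None."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None  # not valid JSON at all
    if not isinstance(record, dict):
        return None  # valid JSON, but not an object
    for field, expected_type in REQUIRED_FIELDS.items():
        value = record.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            return None
    return record


def split_valid_invalid(lines):
    """Separate clean records from raw lines that failed validation."""
    valid, rejected = [], []
    for line in lines:
        record = validate_record(line)
        if record is None:
            rejected.append(line)
        else:
            valid.append(record)
    return valid, rejected


raw_lines = [
    '{"sensor_id": "a1", "reading": 21.5}',    # well formed
    '{"sensor_id": "a2"}',                     # missing field
    'not json at all',                         # unparseable
    '{"sensor_id": "a3", "reading": "high"}',  # wrong type
]
valid, rejected = split_valid_invalid(raw_lines)
print(len(valid), "valid;", len(rejected), "rejected")
```

Keeping the rejected lines instead of silently dropping them makes it possible to measure how noisy a source is before it reaches the processing stage.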
