
Big Data 101: Some Big Data technologies you should know about

  
  
  

Big Data is the new buzz phrase in the field of database computing. It refers to very large data sets that contain tens of millions of records or more and are generally measured in petabytes. However, Big Data is not defined solely by the size of the data set. The term is usually applied to data that is unstructured, or to sets that mix structured and unstructured data. The rise of Big Data has spurred advances in analytics, cluster management, and methods of storage and access. Listed here are a few of the key Big Data technologies. While this list is by no means exhaustive, it may help you get started if you are new to Big Data.

Hadoop

This open source framework from the Apache Software Foundation is based on the MapReduce and distributed file system designs published by Google. Hadoop distributes massive amounts of data across large clusters of commodity servers that both store and process it, and a cluster scales out by adding more servers. With Hadoop, you can run applications on thousands of nodes, processing petabytes of data. Hadoop uses a distributed file system, HDFS, which spreads data across the cluster so that it can be read in parallel at high throughput.

MapReduce

This software framework enables developers to write programs that process and generate enormous amounts of data in parallel across a large number of processors and individual computers. MapReduce was developed by Google to increase the efficiency of indexing web pages. It excels at performing calculations on very large data sets by splitting a job into pieces, distributing the pieces across a number of computers (or nodes) for processing, and then combining the partial results.
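To make the split-and-combine idea concrete, here is a minimal single-machine sketch of the MapReduce pattern applied to word counting. It only imitates the three stages (map, shuffle, reduce) in plain Python; the function names are chosen for illustration and are not part of Hadoop or any Google API, and a real framework would run the map and reduce calls on different nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three stages in sequence over a list of documents."""
    pairs = []
    for document in documents:  # each document stands in for one split
        pairs.extend(map_phase(document))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big clusters"]))
```

Because each call to `map_phase` looks at only one document and each call to `reduce_phase` looks at only one key, both stages can be spread over many machines without coordination, which is what makes the pattern scale.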

HDFS

The Hadoop Distributed File System (HDFS) is designed to run on low-cost commodity hardware and offers fault-tolerance features to increase reliability. It was designed for, and is particularly suited to, providing high-throughput access to very large data sets.
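One common way a distributed file system tolerates the failure of cheap hardware is to cut each file into blocks and keep copies of every block on more than one node. The sketch below illustrates that idea only; the round-robin placement, the tiny block size, and all function names are assumptions made for the example and do not reproduce the actual HDFS placement policy or API.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    This placement rule is a simplification for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement, failed_node):
    """Return True if every block still has a replica on a surviving node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"], replication=2)
    print(placement)
    print(readable_after_failure(placement, "n2"))
```

With two copies of every block, losing any single node leaves the whole file readable; with only one copy, the same failure would lose data.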

Hive

Originally developed by Facebook, Apache Hive is built on top of Hadoop and works in conjunction with it. Hive allows business intelligence (BI) applications to run queries against Hadoop clusters through a “SQL-like” bridge. Hive is now open source, so anyone familiar with SQL can query data sets stored in Hadoop clusters. Because users can access the Hadoop cluster as if it were a traditional data store, Hive makes Hadoop considerably easier to use.

NoSQL

Non-relational, or “NoSQL”, describes a class of databases that is better suited to processing unstructured data than the decades-old relational database management system model. A NoSQL database is typically not built from tables and does not rely on the SQL query language for data control. NoSQL databases can be heavily optimized for quick retrieval of data. They are highly useful in Big Data applications where large amounts of data must be stored and recalled but the relationships between the data are not well defined. This database structure also allows for elastic scaling, which takes advantage of advances in cloud-based storage systems: as new nodes are added, data can be stored across them.
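As a rough illustration of these two properties, schema-free records and storage that spreads across nodes as they are added, here is a toy key-value store. The class name `ShardedStore`, the node names, and the hash-modulo placement are all hypothetical choices for this sketch and do not correspond to any particular NoSQL product.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads documents across named nodes."""

    def __init__(self, node_names):
        # Each node is just an in-memory dict in this sketch.
        self.nodes = {name: {} for name in node_names}

    def _node_for(self, key):
        """Pick a node from a stable hash of the key."""
        names = sorted(self.nodes)
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return names[int(digest, 16) % len(names)]

    def put(self, key, document):
        """Store any dict under a key; no fixed schema is enforced."""
        self.nodes[self._node_for(key)][key] = document

    def get(self, key):
        """Return the stored document, or None if the key is unknown."""
        return self.nodes[self._node_for(key)].get(key)

    def add_node(self, name):
        """Add a node and redistribute existing documents across all nodes."""
        items = [(k, v) for shard in self.nodes.values() for k, v in shard.items()]
        for shard in self.nodes.values():
            shard.clear()
        self.nodes[name] = {}
        for key, document in items:
            self.put(key, document)


if __name__ == "__main__":
    store = ShardedStore(["node-a", "node-b"])
    store.put("user:1", {"name": "Ada", "tags": ["admin"]})
    store.put("event:9", {"type": "click", "x": 10, "y": 20})
    store.add_node("node-c")
    print(store.get("user:1"), store.get("event:9"))
```

The two stored documents have entirely different fields, which a fixed table schema would not allow without changes. Note that rehashing every key when a node joins is the simplest possible approach; production systems generally use schemes such as consistent hashing so that only a fraction of the data has to move.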

Cloud-based Storage

Cloud-based solutions have driven rapid advances in storage technology across the computing world, and these advances are most evident in the ability to handle Big Data. One example is supply chain data gathering and sharing. Before cloud solutions, linking thousands of suppliers together was laborious and consumed substantial IT resources, much of it spent resolving an endless stream of compatibility issues. With a cloud solution, the cloud provider maintains the shared data pool for anyone with authorized access to the network. This has greatly simplified supply chain information management systems and increased their efficiency.

 

photo credit: Big_Data cc