Enormous data volumes and the demand for rapid data availability are a technical challenge for data-driven companies. The larger the amount of data, the more expensive the storage and the slower the processing. Hadoop is designed to address this Big Data problem: compute-intensive jobs over huge amounts of data are distributed across computer clusters so that they run faster and more reliably.

Distributed File System

Distributing data with the Hadoop Distributed File System (HDFS) enables high data throughput.

YARN Resource Management

Yet Another Resource Negotiator (YARN) handles job scheduling and cluster resource management.

Parallel Data Processing

MapReduce processes data in parallel by splitting large amounts of data and distributing the pieces across the nodes of a cluster.

Hadoop is a free framework from Apache. It offers companies a way to process huge amounts of data quickly. It grew out of Douglas Cutting's plan to develop an Open Source search engine that would be available to everyone, and it builds on the MapReduce algorithm. Hadoop addresses the challenges posed by the “Big Three”: Variety, Volume and Velocity.

The Open Source framework Hadoop can be customized to individual requirements. Basically, it is used to group numerous computers into clusters in order to store enormous amounts of data and process them at high velocity. The speed comes from using the nodes of the cluster in parallel, with capacity allocated to each task as required. In the basic version, Hadoop consists of four components: Hadoop Common, the Hadoop Distributed File System (HDFS), MapReduce and YARN. The core of the framework is the MapReduce algorithm. It splits a computing task into parts that are sent to different nodes for processing (the map phase). In an intermediate step, the partial results are grouped, and in the reduce phase they are combined into the final result.
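The map, intermediate and reduce steps described above can be illustrated with a small word count. This is a minimal single-process sketch in plain Python, not Hadoop's own API: the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical, and on a real cluster each input split would be processed on a different node.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Intermediate step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse the values of each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits: List[str]) -> Dict[str, int]:
    # Assumption: the splits are mapped one after another in this process.
    # A cluster would run the map calls in parallel on separate nodes.
    emitted = [pair for split in splits for pair in map_phase(split)]
    return reduce_phase(shuffle_phase(emitted))


if __name__ == "__main__":
    splits = ["big data needs big clusters", "clusters process data"]
    print(word_count(splits))
```

Because each call to `map_phase` depends only on its own split, the map step can be spread across as many nodes as there are splits; only the grouping step needs to bring values for the same key together.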

Benefits of Hadoop

Uncomplicated Data Storage

Large amounts of data can easily be stored and distributed across clusters.

Fast Data Processing

Data processing is simplified and accelerated by distributing the data across large clusters.

Efficient Data Analysis

Thanks to parallel data processing, companies save a lot of time on evaluations.

Big Data

The combination of the MapReduce algorithm and YARN enables data processing at the petabyte level (more than 1 million gigabytes).

Data Compression

Compressing data into different formats makes efficient use of storage space and resources.

Transparent Support for Various File Formats

Hadoop supports structured and unstructured file formats, such as text formats (.csv or .json) and tabular formats (ORC and Parquet).
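To make the points on file formats and compression concrete, the following sketch writes the same records as CSV and as line-delimited JSON and compares their raw and gzip-compressed sizes. It uses only the Python standard library and does not touch Hadoop itself. The helper names and the sample records are assumptions made for illustration, and ORC and Parquet are left out because they require third-party libraries.

```python
import csv
import gzip
import io
import json
from typing import Dict, List

Record = Dict[str, str]


def to_csv(records: List[Record]) -> str:
    """Serialize records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_csv(text: str) -> List[Record]:
    """Parse CSV text back into a list of records."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def to_json_lines(records: List[Record]) -> str:
    """Serialize records as one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


def compressed_size(text: str) -> int:
    """Return the size in bytes of the gzip-compressed UTF-8 text."""
    return len(gzip.compress(text.encode("utf-8")))


if __name__ == "__main__":
    # Hypothetical sample data: 1,000 repetitive click events.
    records = [{"user": f"user{i % 10}", "event": "click"} for i in range(1000)]
    for name, text in [("csv", to_csv(records)), ("json", to_json_lines(records))]:
        raw = len(text.encode("utf-8"))
        print(f"{name}: {raw} bytes raw, {compressed_size(text)} bytes gzip")
```

Repetitive data like this compresses well, which is why the choice of format and codec has a direct effect on storage cost.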
