Five notable books every serious programmer should read.
After the first wave of IaaS, the cloud market is turning to PaaS as the primary model for service delivery.
The most popular articles of the first half of the year, sprinkled with a few pieces carefully chosen by the editors.
Installed as a layer above Hadoop, the open-source Pydoop package enables Python scripts to do big data work easily.
The revelation of secret government eavesdropping is likely to substantially reshape companies’ understanding of their data’s safety.
Data analysis is only half the battle; getting the data into a Hadoop cluster is the first step in any big data deployment. Apache Flume uses an elegant design to make data loading easy and efficient.
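Flume's design centers on a simple source → channel → sink pipeline, which a minimal agent configuration can illustrate. The agent and component names below, and the netcat-to-HDFS pairing, are illustrative assumptions rather than anything from the article:

```properties
# Hypothetical Flume agent: reads lines from a netcat source,
# buffers them in a memory channel, writes them to HDFS.
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

agent1.sources.src1.type = netcat
agent1.sources.src1.bind = localhost
agent1.sources.src1.port = 44444
agent1.sources.src1.channels = ch1

agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1000

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode/flume/events
agent1.sinks.sink1.channel = ch1
```

Swapping the source or sink type reconfigures the pipeline without touching the rest of the agent, which is the elegance the teaser alludes to.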
MySQL’s latest update to the Performance Schema brings the ability to profile a statement’s activity, low-level wait events, and I/O impact. It is the easiest and most detailed way to identify which statements to tune and how.
The Hadoop ecosystem relies on composability — the ability to use output from one tool as input to the next — to efficiently process data at scale, from simple projects to processing streams of real-time data to building data warehouses.
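That composability can be sketched in plain Python: two hypothetical stages (stand-ins for Hadoop tools, not real ones) that each read and write lines of text, so one stage's output feeds directly into the next:

```python
from collections import Counter

def word_count(lines):
    """Stage 1: count words, emitting tab-separated 'word<TAB>count' lines."""
    counts = Counter(w for line in lines for w in line.split())
    return [f"{word}\t{n}" for word, n in sorted(counts.items())]

def frequent_words(lines, threshold=2):
    """Stage 2: consume stage 1's lines, keep words meeting the threshold."""
    kept = []
    for line in lines:
        word, n = line.split("\t")
        if int(n) >= threshold:
            kept.append(word)
    return kept

corpus = ["big data big deal", "big plans"]
stage1 = word_count(corpus)      # ['big\t3', 'data\t1', 'deal\t1', 'plans\t1']
stage2 = frequent_words(stage1)  # ['big']
```

Because both stages agree on a line-oriented format, either one can be replaced or extended without changing the other — the same contract that lets Hadoop tools be chained.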
MapReduce on small datasets can be run easily and without much coding or fiddling — provided you know what to do. Here’s how.
The core MapReduce framework for big data consists of several interlocking technologies. This first installment of our tutorial explains what Hadoop does and how the pieces fit together.
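How those pieces fit together can be shown in miniature: a toy, single-process Python sketch of the map, shuffle/sort, and reduce phases, using word count as an assumed example (real Hadoop distributes each phase across a cluster):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records):
    """Map: turn each input record into (key, value) pairs."""
    for line in records:
        for word in line.split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle/sort: group all values by key, as Hadoop does between phases."""
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (key, [value for _, value in group])

def reduce_phase(grouped):
    """Reduce: combine each key's values into a final result."""
    for key, values in grouped:
        yield (key, sum(values))

lines = ["hello hadoop", "hello world"]
result = dict(reduce_phase(shuffle_phase(map_phase(lines))))
# result == {'hadoop': 1, 'hello': 2, 'world': 1}
```

The framework supplies the shuffle/sort plumbing and the distribution; a Hadoop job only has to provide the map and reduce functions.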