Powered by GitBook

Tessera

Note Figure source with adaptation.

HDFS ＋ MapReduce
- HDFS as distributed data storage
  - Read data from HDFS to multiple workers on different nodes (keep data close)
- Hadoop YARN/MRv2 as the computing backend
  - Map: Process data on each worker (parallel)
  - Reduce: Shuffle data (map output) and do additional processing on each worker
R as the front end
Rhipe as the connector between R and Hadoop