Tessera


Note Figure source with adaptation.

  • HDFS + MapReduce

    • HDFS as distributed data storage

      • Read data from HDFS to multiple workers on different nodes (keep data close)
    • Hadoop YARN/MRv2 as the computing backend

      • Map: Process data on each worker (parallel)
      • Reduce: Shuffle data (map output) and do additional processing on each worker
  • R as the front end

  • Rhipe as the connector between R and Hadoop