
Hadoop Programming


MapReduce - Programming

Processing

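  1. select: parse the input data directly and extract only the fields you need
  2. where: also applied while the input data is being processed; decide for each record whether it is needed
  3. aggregation: min, max, sum
  4. group by: implemented by the Reducer
  5. sort
  6. join: map join, reduce join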
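
The mapping of select, where, group by, and aggregation onto the map and reduce phases can be sketched in plain Java. This is only an in-memory imitation under stated assumptions: it uses no Hadoop classes, and the class name `MiniMapReduce`, the `region,product,amount` record layout, and the `minAmount` filter are all made up for illustration. Sorting appears only as the sorted key order of the `TreeMap`; joins are not covered.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of map -> shuffle -> reduce; no Hadoop classes are used.
final class MiniMapReduce {

    // Map phase: "select" the region and amount columns and apply the
    // "where" filter. Returns null when the record is dropped.
    static Map.Entry<String, Integer> map(String line, int minAmount) {
        String[] fields = line.split(",");
        String region = fields[0].trim();                 // select: region
        int amount = Integer.parseInt(fields[2].trim());  // select: amount
        if (amount < minAmount) {
            return null;                                  // where: record not needed
        }
        return new AbstractMap.SimpleEntry<>(region, amount);
    }

    // Shuffle phase: group values by key. The TreeMap keeps keys sorted,
    // imitating the sorted key order a reducer sees.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: aggregate one group into {min, max, sum}.
    static int[] reduce(List<Integer> values) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        int sum = 0;
        for (int v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        return new int[] {min, max, sum};
    }

    // Runs the three phases in sequence over the input lines.
    static Map<String, int[]> run(List<String> lines, int minAmount) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            Map.Entry<String, Integer> pair = map(line, minAmount);
            if (pair != null) {
                pairs.add(pair);
            }
        }
        Map<String, int[]> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "east,apple,10", "west,pear,5", "east,plum,20", "west,fig,1");
        // Prints "east [10, 20, 30]" and then "west [5, 5, 5]".
        for (Map.Entry<String, int[]> e : run(lines, 2).entrySet()) {
            System.out.println(e.getKey() + " " + Arrays.toString(e.getValue()));
        }
    }
}
```

In a real job the `map` and `reduce` bodies would live in Mapper and Reducer classes, and the grouping step would be done by the framework's shuffle rather than by user code.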

Third-Party Libraries

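Option 1: pass the dependent libraries with -libjars (multiple JARs are comma-separated). -libjars is a generic option, so it takes effect only when the driver parses generic options, for example by running through ToolRunner:

    export LIBJARS=$MYLIB/commons-lang-2.3.jar
    hadoop jar prohadoop-0.0.1-SNAPSHOT.jar org.aspress.prohadoop.c3.WordCountUsingToolRunner -libjars $LIBJARS

Option 2: build one JAR that bundles its dependencies. The dependent libraries are now included inside the application JAR file:

    hadoop jar prohadoop-0.0.1-SNAPSHOT-jar-with-dependencies.jar org.aspress.prohadoop.c3.WordCountUsingToolRunner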

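The first approach, specifying the dependencies with -libjars, is generally the better one: dependencies specified this way can make use of the Public Cache, whereas dependencies bundled into the application JAR have to be copied every time.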

Reference Books

MapReduce Design Patterns


