Symptom: one particular map task fails on every run, hanging until it times out; the task attempt is retried four times, after which the whole task fails.
The JobTracker shows that it is the same task every time. Locate the node that task ran on, open the TaskTracker log there, and grep for the attempt id, e.g.:
cat hadoop-hadoop-tasktracker-DB1221.log.2012-06-26 | grep attempt_201206081842_0456_m_000392_0
2012-06-26 17:44:23,543 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201206081842_0456_m_-1061492923 given task: attempt_201206081842_0456_m_000392_0
2012-06-26 17:44:30,385 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201206081842_0456_m_000392_0 0.5560105%
2012-06-26 17:44:33,387 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201206081842_0456_m_000392_0 0.5560105%
2012-06-26 17:54:35,277 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201206081842_0456_m_000392_0: Task attempt_201206081842_0456_m_000392_0 failed to report status for 601 seconds. Killing!
2012-06-26 17:54:35,300 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201206081842_0456_m_000392_0
Each attempt progresses to the same fixed percentage and then stops reporting progress until the timeout kills it.
Troubleshooting:
First, a try/catch was added inside map() in an attempt to print an error log; this failed to reveal anything.
Next, the following was added to map():
InputSplit inputSplit = (InputSplit) context.getInputSplit();
String filename = ((FileSplit) inputSplit).getPath().getName();
and the filename printed. Re-run the job, then read stdout on the failing node to get the name of the offending file; that file can then be analyzed. A hedged sketch of such a debug mapper follows below.
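As a rough illustration only (the class name DebugMapper and the key/value types are assumptions, not from the original job; this uses the new org.apache.hadoop.mapreduce API implied by the context.getInputSplit() call above):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class DebugMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Print the split's file name once per task attempt; it ends up in
        // the attempt's stdout log on the TaskTracker node.
        InputSplit inputSplit = context.getInputSplit();
        String filename = ((FileSplit) inputSplit).getPath().getName();
        System.out.println("processing file: " + filename);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... original map logic goes here ...
    }
}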
Once the file was retrieved, it was checked for over-long lines:

cat data | awk -F "\t" '{if(length($0)>100000000) print $0}'
cat data | awk -F "\t" '{d[length($0)]++} END {for(i in d) print i"\t"d[i]}' | sort -k 1,1 -nr | less

The first command prints any line longer than 100,000,000 bytes; the second prints a histogram of line lengths, sorted longest first.
This revealed a single record over 200 MB long.
Solution 1:
cat data1 | awk -F "\t" '{if(length($0)<100000000) print $0}' > data2

Strip out the over-long record and re-run the job on the cleaned data.
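Since the input normally lives on HDFS, one hedged way to apply the same filter in place (the paths here are hypothetical; hadoop fs -put reads from stdin when given "-"):

hadoop fs -cat /path/data1 | awk -F "\t" '{if(length($0)<100000000) print $0}' | hadoop fs -put - /path/data2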
Solution 2:
Hadoop: The Definitive Guide writes:
If you are using TextInputFormat ("TextInputFormat" on page 244), then you can set a maximum expected line length to safeguard against corrupted files. Corruption in a file can manifest itself as a very long line, which can cause out of memory errors and then task failure. By setting mapred.linerecordreader.maxlength to a value in bytes that fits in memory (and is comfortably greater than the length of lines in your input data), the record reader will skip the (long) corrupt lines without the task failing.
That is, skip the bad records by setting mapred.linerecordreader.maxlength, either in the job or as a cluster-wide parameter:
Configuration conf = new Configuration();
conf.setInt("mapred.linerecordreader.maxlength", 32768);
See Hadoop: The Definitive Guide, p. 218, for details.
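In context, a minimal driver sketch might look like this (the class name and argument handling are assumptions, not from the original job; this assumes the Hadoop 0.20/1.x API, where new Job(conf, name) is the usual constructor -- note the parameter must be set on the Configuration before the Job copies it):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SkipLongLinesJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cap stored line length at 32 KB; longer lines are treated as corrupt.
        conf.setInt("mapred.linerecordreader.maxlength", 32 * 1024);

        Job job = new Job(conf, "skip-long-lines"); // conf must be populated before this call
        job.setJarByClass(SkipLongLinesJob.class);
        job.setInputFormatClass(TextInputFormat.class);
        // job.setMapperClass(...); job.setReducerClass(...); as in the original job
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}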
One caveat: HDG says that over-long records are skipped, but the code shows that what happens at the LineReader level is that the content after the limit in an over-long line is silently discarded:
public class TextInputFormat extends FileInputFormat<LongWritable, Text> {

  @Override
  public RecordReader<LongWritable, Text>
    createRecordReader(InputSplit split, TaskAttemptContext context) {
    return new LineRecordReader();
  }

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    CompressionCodec codec =
      new CompressionCodecFactory(context.getConfiguration()).getCodec(file);
    return codec == null;
  }
}
// In LineRecordReader.initialize():
this.maxLineLength = job.getInt("mapred.linerecordreader.maxlength", Integer.MAX_VALUE);
in = new LineReader(codec.createInputStream(fileIn), job);
/**
 * Read one line from the InputStream into the given Text. A line can be
 * terminated by one of the following: '\n' (LF), '\r' (CR), or '\r\n'
 * (CR+LF). EOF also terminates an otherwise unterminated line.
 *
 * @param str the object to store the given line (without newline)
 * @param maxLineLength the maximum number of bytes to store into str;
 *   the rest of the line is silently discarded.
 * @param maxBytesToConsume the maximum number of bytes to consume in this
 *   call. This is only a hint, because if the line cross this threshold,
 *   we allow it to happen. It can overshoot potentially by as much as one
 *   buffer length.
 * @return the number of bytes read including the (longest) newline found.
 * @throws java.io.IOException if the underlying stream throws
 */
public int readLine(Text str, int maxLineLength,
                    int maxBytesToConsume) throws IOException {
  /* We're reading data from in, but the head of the stream may be
   * already buffered in buffer, so we have several cases:
   * 1. No newline characters are in the buffer, so we need to copy
   *    everything and read another buffer from the stream.
   * 2. An unambiguously terminated line is in buffer, so we just
   *    copy to str.
   * 3. Ambiguously terminated line is in buffer, i.e. buffer ends
   *    in CR. In this case we copy everything up to CR to str, but
   *    we also need to see what follows CR: if it's LF, then we
   *    need consume LF as well, so next call to readLine will read
   *    from after that.
   * We use a flag prevCharCR to signal if previous character was CR
   * and, if it happens to be at the end of the buffer, delay
   * consuming it until we have a chance to look at the char that
   * follows.
   */
  str.clear();
  int txtLength = 0; //tracks str.getLength(), as an optimization
  int newlineLength = 0; //length of terminating newline
  boolean prevCharCR = false; //true of prev char was CR
  long bytesConsumed = 0;
  do {
    int startPosn = bufferPosn; //starting from where we left off the last time
    if (bufferPosn >= bufferLength) {
      startPosn = bufferPosn = 0;
      if (prevCharCR)
        ++bytesConsumed; //account for CR from previous read
      bufferLength = in.read(buffer);
      if (bufferLength <= 0)
        break; // EOF
    }
    for (; bufferPosn < bufferLength; ++bufferPosn) { //search for newline
      if (buffer[bufferPosn] == LF) {
        newlineLength = (prevCharCR) ? 2 : 1;
        ++bufferPosn; // at next invocation proceed from following byte
        break;
      }
      if (prevCharCR) { //CR + notLF, we are at notLF
        newlineLength = 1;
        break;
      }
      prevCharCR = (buffer[bufferPosn] == CR);
    }
    int readLength = bufferPosn - startPosn;
    if (prevCharCR && newlineLength == 0)
      --readLength; //CR at the end of the buffer
    bytesConsumed += readLength;
    int appendLength = readLength - newlineLength;
    if (appendLength > maxLineLength - txtLength) {
      appendLength = maxLineLength - txtLength;
    }
    if (appendLength > 0) {
      str.append(buffer, startPosn, appendLength);
      txtLength += appendLength;
    }
  } while (newlineLength == 0 && bytesConsumed < maxBytesToConsume);

  if (bytesConsumed > (long)Integer.MAX_VALUE)
    throw new IOException("Too many bytes before newline: " + bytesConsumed);
  return (int)bytesConsumed;
}
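To make the truncation behavior concrete, here is a small hedged demo that drives org.apache.hadoop.util.LineReader directly (the class is public in the 0.20/1.x line; the demo class name is made up). With maxLineLength smaller than the line, readLine consumes the whole line but stores only the first maxLineLength bytes; the remainder is dropped, and the next call continues at the following line:

import java.io.ByteArrayInputStream;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.util.LineReader;

public class LineReaderTruncationDemo {
    public static void main(String[] args) throws Exception {
        // A 100-byte line followed by a short line.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) sb.append('x');
        sb.append("\nshort\n");

        LineReader reader =
            new LineReader(new ByteArrayInputStream(sb.toString().getBytes("UTF-8")));
        Text line = new Text();

        // Store at most 10 bytes; all 101 bytes (line + '\n') are still consumed.
        int consumed = reader.readLine(line, 10, Integer.MAX_VALUE);
        System.out.println("consumed=" + consumed + ", stored=" + line.getLength());
        // -> consumed=101, stored=10: the tail of the long line is gone.

        reader.readLine(line, 10, Integer.MAX_VALUE);
        System.out.println("next line: " + line);  // -> "short"
    }
}

So readLine itself only truncates; if I read the 0.20-era LineRecordReader correctly, its read loop then drops any line whose returned size reaches maxLineLength instead of emitting it, which is how HDG's "skip" and the truncation seen here reconcile.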