Symptom: one particular map task fails on every run, hanging until it times out; the task attempt is retried four times, after which the whole task fails.
The JobTracker shows that it is the same task every time. Locate the node that task ran on, open the TaskTracker log there, and grep for the attempt id, e.g.:
cat hadoop-hadoop-tasktracker-DB1221.log.2012-06-26 | grep attempt_201206081842_0456_m_000392_0
2012-06-26 17:44:23,543 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201206081842_0456_m_-1061492923 given task: attempt_201206081842_0456_m_000392_0
2012-06-26 17:44:30,385 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201206081842_0456_m_000392_0 0.5560105%
2012-06-26 17:44:33,387 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201206081842_0456_m_000392_0 0.5560105%
2012-06-26 17:54:35,277 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201206081842_0456_m_000392_0: Task attempt_201206081842_0456_m_000392_0 failed to report status for 601 seconds. Killing!
2012-06-26 17:54:35,300 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201206081842_0456_m_000392_0
Each attempt progresses to the same fixed percentage and then stops reporting progress until the timeout kills it.
Troubleshooting:
First, a try/catch was added inside map() in an attempt to print an error log; this failed to reveal anything.
Next, the following was added to map():
InputSplit inputSplit = (InputSplit) context.getInputSplit();
String filename = ((FileSplit) inputSplit).getPath().getName();
and the filename printed. Re-run the job, then read stdout on the failing node to get the name of the offending file; that file can then be analyzed. A hedged sketch of such a debug mapper follows below.
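As a rough illustration only (the class name DebugMapper and the key/value types are assumptions, not from the original job; this uses the new org.apache.hadoop.mapreduce API implied by the context.getInputSplit() call above):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class DebugMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Print the split's file name once per task attempt; it ends up in
        // the attempt's stdout log on the TaskTracker node.
        InputSplit inputSplit = context.getInputSplit();
        String filename = ((FileSplit) inputSplit).getPath().getName();
        System.out.println("processing file: " + filename);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... original map logic goes here ...
    }
}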
Once the file was retrieved, it was checked for over-long lines:

cat data | awk -F "\t" '{if(length($0)>100000000) print $0}'
cat data | awk -F "\t" '{d[length($0)]++} END {for(i in d) print i"\t"d[i]}' | sort -k 1,1 -nr | less

The first command prints any line longer than 100,000,000 bytes; the second prints a histogram of line lengths, sorted longest first.
This revealed a single record over 200 MB long.
Solution 1:
cat data1 | awk -F "\t" '{if(length($0)<100000000) print $0}' > data2

Strip out the over-long record and re-run the job on the cleaned data.
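Since the input normally lives on HDFS, one hedged way to apply the same filter in place (the paths here are hypothetical; hadoop fs -put reads from stdin when given "-"):

hadoop fs -cat /path/data1 | awk -F "\t" '{if(length($0)<100000000) print $0}' | hadoop fs -put - /path/data2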
Solution 2:
Hadoop: The Definitive Guide writes:
If you are using TextInputFormat ("TextInputFormat" on page 244), then you can set a maximum expected line length to safeguard against corrupted files. Corruption in a file can manifest itself as a very long line, which can cause out of memory errors and then task failure. By setting mapred.linerecordreader.maxlength to a value in bytes that fits in memory (and is comfortably greater than the length of lines in your input data), the record reader will skip the (long) corrupt lines without the task failing.
That is, skip the bad records by setting mapred.linerecordreader.maxlength, either in the job or as a cluster-wide parameter:
Configuration conf = new Configuration();
conf.setInt("mapred.linerecordreader.maxlength", 32768);
See Hadoop: The Definitive Guide, p. 218, for details.
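In context, a minimal driver sketch might look like this (the class name and argument handling are assumptions, not from the original job; this assumes the Hadoop 0.20/1.x API, where new Job(conf, name) is the usual constructor -- note the parameter must be set on the Configuration before the Job copies it):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SkipLongLinesJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cap stored line length at 32 KB; longer lines are treated as corrupt.
        conf.setInt("mapred.linerecordreader.maxlength", 32 * 1024);

        Job job = new Job(conf, "skip-long-lines"); // conf must be populated before this call
        job.setJarByClass(SkipLongLinesJob.class);
        job.setInputFormatClass(TextInputFormat.class);
        // job.setMapperClass(...); job.setReducerClass(...); as in the original job
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}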
One caveat: HDG says that over-long records are skipped, but the code shows that what happens at the LineReader level is that the content after the limit in an over-long line is silently discarded:
public class TextInputFormat extends FileInputFormat<LongWritable, Text> {

  @Override
  public RecordReader<LongWritable, Text>
    createRecordReader(InputSplit split, TaskAttemptContext context) {
    return new LineRecordReader();
  }

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    CompressionCodec codec =
      new CompressionCodecFactory(context.getConfiguration()).getCodec(file);
    return codec == null;
  }
}
// In LineRecordReader.initialize():
this.maxLineLength = job.getInt("mapred.linerecordreader.maxlength", Integer.MAX_VALUE);
in = new LineReader(codec.createInputStream(fileIn), job);
/**
 * Read one line from the InputStream into the given Text. A line can be
 * terminated by one of the following: '\n' (LF), '\r' (CR), or '\r\n'
 * (CR+LF). EOF also terminates an otherwise unterminated line.
 *
 * @param str the object to store the given line (without newline)
 * @param maxLineLength the maximum number of bytes to store into str;
 *   the rest of the line is silently discarded.
 * @param maxBytesToConsume the maximum number of bytes to consume in this
 *   call. This is only a hint, because if the line cross this threshold,
 *   we allow it to happen. It can overshoot potentially by as much as one
 *   buffer length.
 * @return the number of bytes read including the (longest) newline found.
 * @throws java.io.IOException if the underlying stream throws
 */
public int readLine(Text str, int maxLineLength,
                    int maxBytesToConsume) throws IOException {
  /* We're reading data from in, but the head of the stream may be
   * already buffered in buffer, so we have several cases:
   * 1. No newline characters are in the buffer, so we need to copy
   *    everything and read another buffer from the stream.
   * 2. An unambiguously terminated line is in buffer, so we just
   *    copy to str.
   * 3. Ambiguously terminated line is in buffer, i.e. buffer ends
   *    in CR. In this case we copy everything up to CR to str, but
   *    we also need to see what follows CR: if it's LF, then we
   *    need consume LF as well, so next call to readLine will read
   *    from after that.
   * We use a flag prevCharCR to signal if previous character was CR
   * and, if it happens to be at the end of the buffer, delay
   * consuming it until we have a chance to look at the char that
   * follows.
   */
  str.clear();
  int txtLength = 0; //tracks str.getLength(), as an optimization
  int newlineLength = 0; //length of terminating newline
  boolean prevCharCR = false; //true of prev char was CR
  long bytesConsumed = 0;
  do {
    int startPosn = bufferPosn; //starting from where we left off the last time
    if (bufferPosn >= bufferLength) {
      startPosn = bufferPosn = 0;
      if (prevCharCR)
        ++bytesConsumed; //account for CR from previous read
      bufferLength = in.read(buffer);
      if (bufferLength <= 0)
        break; // EOF
    }
    for (; bufferPosn < bufferLength; ++bufferPosn) { //search for newline
      if (buffer[bufferPosn] == LF) {
        newlineLength = (prevCharCR) ? 2 : 1;
        ++bufferPosn; // at next invocation proceed from following byte
        break;
      }
      if (prevCharCR) { //CR + notLF, we are at notLF
        newlineLength = 1;
        break;
      }
      prevCharCR = (buffer[bufferPosn] == CR);
    }
    int readLength = bufferPosn - startPosn;
    if (prevCharCR && newlineLength == 0)
      --readLength; //CR at the end of the buffer
    bytesConsumed += readLength;
    int appendLength = readLength - newlineLength;
    if (appendLength > maxLineLength - txtLength) {
      appendLength = maxLineLength - txtLength;
    }
    if (appendLength > 0) {
      str.append(buffer, startPosn, appendLength);
      txtLength += appendLength;
    }
  } while (newlineLength == 0 && bytesConsumed < maxBytesToConsume);

  if (bytesConsumed > (long)Integer.MAX_VALUE)
    throw new IOException("Too many bytes before newline: " + bytesConsumed);
  return (int)bytesConsumed;
}
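To make the truncation behavior concrete, here is a small hedged demo that drives org.apache.hadoop.util.LineReader directly (the class is public in the 0.20/1.x line; the demo class name is made up). With maxLineLength smaller than the line, readLine consumes the whole line but stores only the first maxLineLength bytes; the remainder is dropped, and the next call continues at the following line:

import java.io.ByteArrayInputStream;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.util.LineReader;

public class LineReaderTruncationDemo {
    public static void main(String[] args) throws Exception {
        // A 100-byte line followed by a short line.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) sb.append('x');
        sb.append("\nshort\n");

        LineReader reader =
            new LineReader(new ByteArrayInputStream(sb.toString().getBytes("UTF-8")));
        Text line = new Text();

        // Store at most 10 bytes; all 101 bytes (line + '\n') are still consumed.
        int consumed = reader.readLine(line, 10, Integer.MAX_VALUE);
        System.out.println("consumed=" + consumed + ", stored=" + line.getLength());
        // -> consumed=101, stored=10: the tail of the long line is gone.

        reader.readLine(line, 10, Integer.MAX_VALUE);
        System.out.println("next line: " + line);  // -> "short"
    }
}

So readLine itself only truncates; if I read the 0.20-era LineRecordReader correctly, its read loop then drops any line whose returned size reaches maxLineLength instead of emitting it, which is how HDG's "skip" and the truncation seen here reconcile.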