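While running a Spark job, the error shown in the stack trace below can occur.
The trace fails while a ShuffleMapTask is reading a broadcast block (TorrentBroadcast.readBroadcastBlock), so the cause is presumably a shortage of memory during shuffle processing.
In that case, raising spark.sql.shuffle.partitions (default 200), so that each shuffle task handles less data, can help.
I got the job to complete by adding the following settings.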
spark.sql.shuffle.partitions=300
spark.default.parallelism=300
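One common way to apply the two settings above is to pass each as a `--conf key=value` pair to `spark-submit`. The sketch below only builds that command line. The helper name `build_spark_submit` and the application file `my_job.py` are hypothetical, and 300 is simply the value that worked for this job, not a general recommendation.

```python
import shlex

# Settings taken from this post. 300 is the value used here, not a tuned default.
SHUFFLE_CONF = {
    "spark.sql.shuffle.partitions": "300",  # partitions for DataFrame/SQL shuffles
    "spark.default.parallelism": "300",     # default partitions for RDD shuffles
}


def build_spark_submit(app, conf, extra_args=()):
    """Return a spark-submit argv list with one --conf key=value pair per setting.

    `app` is the application file; `extra_args` are passed to the application.
    """
    argv = ["spark-submit"]
    for key, value in conf.items():
        argv += ["--conf", f"{key}={value}"]
    argv.append(app)
    argv.extend(extra_args)
    return argv


if __name__ == "__main__":
    # "my_job.py" is a hypothetical application file used only for illustration.
    print(shlex.join(build_spark_submit("my_job.py", SHUFFLE_CONF)))
```

The same key/value pairs can instead be placed in `spark-defaults.conf` if they should apply to every job submitted from that client.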
diagnostics: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2.0 failed 4 times, most recent failure: Lost task 1.3 in stage 2.0 (TID 436: java.io.IOException: java.util.NoSuchElementException
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1283)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:174)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:65)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:65)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:89)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:72)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.NoSuchElementException
at org.apache.spark.util.collection.PrimitiveVector$$anon$1.next(PrimitiveVector.scala:58)
at org.apache.spark.storage.memory.PartiallyUnrolledIterator.next(MemoryStore.scala:697)
at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$2.apply(TorrentBroadcast.scala:178)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$2.apply(TorrentBroadcast.scala:178)
at scala.Option.map(Option.scala:146)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:178)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1276)