[hive] Performance degradation caused by the hive.exec.dynamic.partition.mode and hive.optimize.sort.dynamic.partition settings

hs_seo 2018. 3. 8. 17:31

Hive's hive.exec.dynamic.partition.mode option must be set to nonstrict when every partition column of an insert is dynamic, as the documentation describes:

hive.exec.dynamic.partition.mode
Default Value: strict
Added In: Hive 0.6.0
In strict mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions. In nonstrict mode all partitions are allowed to be dynamic.
Set to nonstrict to support INSERT ... VALUES, UPDATE, and DELETE transactions (Hive 0.14.0 and later). For a complete list of parameters required for turning on Hive transactions, see hive.txn.manager.

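While testing an INSERT ... SELECT statement with one static partition and one dynamic partition, which should run as a map-only job on the MR engine (the Tez engine behaves the same way), I found that the processing time changed depending on this setting.

Because the statement had a static partition, there was no need to set the value to nonstrict. The problem arose because I had reused a configuration that was already in place.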
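
The strict-mode rule quoted above (at least one static partition) can be modelled in a few lines. This is a minimal sketch, not Hive code: the class `PartitionSpecCheck`, the method `requiresNonStrict`, and the column names `dt` and `hour` are hypothetical, and a partition spec is represented as a map in which a null value marks a dynamic partition column.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model only; these names are hypothetical and not part of Hive.
final class PartitionSpecCheck {

    // A partition spec maps each partition column to its static value,
    // or to null when the column is dynamic, e.g. PARTITION (dt='20180308', hour).
    static boolean requiresNonStrict(Map<String, String> partitionSpec) {
        if (partitionSpec.isEmpty()) {
            return false; // unpartitioned insert: the mode is irrelevant
        }
        for (String staticValue : partitionSpec.values()) {
            if (staticValue != null) {
                return false; // one static partition is enough for strict mode
            }
        }
        return true; // every partition column is dynamic
    }

    public static void main(String[] args) {
        Map<String, String> mixed = new LinkedHashMap<>();
        mixed.put("dt", "20180308"); // static partition
        mixed.put("hour", null);     // dynamic partition
        System.out.println(requiresNonStrict(mixed)); // false

        Map<String, String> allDynamic = new LinkedHashMap<>();
        allDynamic.put("dt", null);
        allDynamic.put("hour", null);
        System.out.println(requiresNonStrict(allDynamic)); // true
    }
}
```

The statement in this test corresponds to the first case, so strict mode would have been sufficient.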

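With nonstrict, the job is no longer map-only but is planned as a map and reduce job,

and a sort is added at the reduce stage, as the plan excerpt below shows, which sharply increased the processing time.

It is therefore better to leave hive.exec.dynamic.partition.mode at its default of strict and switch to nonstrict only when it is really needed.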

Reduce Output Operator

key expressions: _col26 (type: string)

sort order: +
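
One way to notice this before running a job is to look for the operator in the output of EXPLAIN. The following is a minimal sketch that assumes the plan text has already been captured as a string; `PlanCheck` and `hasReduceOutputOperator` are hypothetical names, and the sample text is modelled on the excerpt above rather than taken from a real run.

```java
// Illustrative helper; PlanCheck and hasReduceOutputOperator are hypothetical names.
final class PlanCheck {

    // Returns true when any line of the EXPLAIN text names a Reduce Output Operator,
    // which indicates that the plan sends rows to a reduce stage.
    static boolean hasReduceOutputOperator(String explainText) {
        for (String line : explainText.split("\n")) {
            if (line.trim().startsWith("Reduce Output Operator")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample text modelled on the plan excerpt, not real captured output.
        String plan = String.join("\n",
            "Reduce Output Operator",
            "  key expressions: _col26 (type: string)",
            "  sort order: +");
        System.out.println(hasReduceOutputOperator(plan)); // true
    }
}
```

The check is only meaningful for a statement that is expected to be map-only. A query with a join or GROUP BY has a reduce stage regardless of these settings.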

The cause is the following condition in Hive's Optimizer.java, which adds SortedDynPartitionOptimizer to the plan transformations.

if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.DYNAMICPARTITIONING) &&
    HiveConf.getVar(hiveConf, HiveConf.ConfVars.DYNAMICPARTITIONINGMODE).equals("nonstrict") &&
    HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVEOPTSORTDYNAMICPARTITION) &&
    !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVEOPTLISTBUCKETING)) {
  transformations.add(new SortedDynPartitionOptimizer());
}
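
To see which combinations of settings satisfy this condition, it can be modelled without any Hive dependency. This is a minimal sketch, not Hive code: the class `SortedDynPartitionCondition` is hypothetical, the property names are the ones I understand the four ConfVars constants to map to, and the fallback defaults in the code are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of the condition above; not Hive code.
final class SortedDynPartitionCondition {

    // Mirrors the four checks in the if statement. The fallback defaults
    // are assumptions for illustration, not read from a real HiveConf.
    static boolean applies(Map<String, String> conf) {
        boolean dynamicPartitioning =
            Boolean.parseBoolean(conf.getOrDefault("hive.exec.dynamic.partition", "true"));
        String mode = conf.getOrDefault("hive.exec.dynamic.partition.mode", "strict");
        boolean sortDynamicPartition =
            Boolean.parseBoolean(conf.getOrDefault("hive.optimize.sort.dynamic.partition", "false"));
        boolean listBucketing =
            Boolean.parseBoolean(conf.getOrDefault("hive.optimize.listbucketing", "false"));
        return dynamicPartitioning
            && mode.equals("nonstrict")
            && sortDynamicPartition
            && !listBucketing;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hive.exec.dynamic.partition.mode", "nonstrict");
        conf.put("hive.optimize.sort.dynamic.partition", "true");
        System.out.println(applies(conf)); // true

        conf.put("hive.exec.dynamic.partition.mode", "strict");
        System.out.println(applies(conf)); // false
    }
}
```

Because the checks are joined with &&, changing either the mode or the sort setting alone is enough to keep the optimizer out of the plan.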

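Because of this condition, the sort runs on a single reducer, and the job slows down.

More precisely, it is the hive.optimize.sort.dynamic.partition setting, described below, that causes only one reducer to be used.

So hive.exec.dynamic.partition.mode and hive.optimize.sort.dynamic.partition should each be set only when they are needed.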

hive.optimize.sort.dynamic.partition

Default Value: true in Hive 0.13.0 and 0.13.1; false in Hive 0.14.0 and later (HIVE-8151)
Added In: Hive 0.13.0 with HIVE-6455

When enabled, dynamic partitioning column will be globally sorted. This way we can keep only one record writer open for each partition value in the reducer thereby reducing the memory pressure on reducers.

So it seems best to run jobs with the following settings.

set hive.exec.dynamic.partition.mode=strict;

set hive.optimize.sort.dynamic.partition=false;
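
When the job is submitted from a program rather than the CLI, the same two settings can be applied per session before the INSERT runs. The following is a minimal sketch that assumes a JDBC connection to HiveServer2 and that the server accepts SET commands through `Statement.execute`; the class `SessionSettings` and its methods are hypothetical names. Note the spelling of the property name, hive.optimize.sort.dynamic.partition.

```java
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

// Illustrative helper; SessionSettings is a hypothetical name.
final class SessionSettings {

    // The two session-level settings recommended above.
    static List<String> statements() {
        return Arrays.asList(
            "SET hive.exec.dynamic.partition.mode=strict",
            "SET hive.optimize.sort.dynamic.partition=false");
    }

    // Runs each SET command on the given statement. Assumes the statement
    // belongs to the same session that will later run the INSERT.
    static void apply(Statement stmt) throws SQLException {
        for (String sql : statements()) {
            stmt.execute(sql);
        }
    }
}
```

Setting the values per session keeps the change local to the job instead of altering a shared configuration that other jobs inherit.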
