[hive] __HIVE_DEFAULT_PARTITION__ in dynamic partitions


hs_seo 2017. 2. 2. 17:48

When Hive writes data with dynamic partitions and the value for a dynamic partition column is NULL or an empty string, Hive creates the partition under the name specified by hive.exec.default.partition.name.

The default name, set in hive-default.xml, is __HIVE_DEFAULT_PARTITION__.
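To see which name your installation actually uses, you can print the parameter from the Hive CLI or Beeline. Assigning a value with `set` changes it for the current session only. The override value `__UNKNOWN_PARTITION__` below is a hypothetical example, not something Hive defines.

```sql
-- Print the current value; with the stock hive-default.xml this should be
-- __HIVE_DEFAULT_PARTITION__
set hive.exec.default.partition.name;

-- Hypothetical session-level override: new NULL/empty partition values are
-- written to <column>=__UNKNOWN_PARTITION__, while partitions created earlier
-- keep the directory name they were written with
set hive.exec.default.partition.name=__UNKNOWN_PARTITION__;
```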

In the code below, if the country_code column contains NULL or empty-string values, a partition with this default name is created.

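```sql
-- Enable dynamic partitioning; nonstrict mode is required because every
-- partition column in this insert is dynamic
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert into table partition_sample partition (country_code)
select country,
       country_code  -- the dynamic partition column must come last
from world_name;

show partitions partition_sample;
-- ...
-- country_code=__HIVE_DEFAULT_PARTITION__
```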
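Because the default partition only records that a row's partition value was NULL or empty, it can help to count those rows in the source table before running the insert. A minimal sketch against the `world_name` table from the example:

```sql
-- Rows whose country_code is NULL or an empty string will be written to
-- country_code=__HIVE_DEFAULT_PARTITION__ by the dynamic partition insert
select sum(case when country_code is null then 1 else 0 end) as null_codes,
       sum(case when country_code = ''    then 1 else 0 end) as empty_codes
from world_name;
```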

The Hive Tutorial describes this behavior as follows:

If the input column value is NULL or an empty string, the row will be put into a special partition whose name is controlled by the Hive parameter hive.exec.default.partition.name. The default value is __HIVE_DEFAULT_PARTITION__. Basically, this partition will contain all "bad" rows whose values are not valid partition names. The caveat of this approach is that the bad value will be lost and is replaced by __HIVE_DEFAULT_PARTITION__ if you select such rows in Hive. JIRA HIVE-1309 is a solution that lets users specify a "bad file" to retain the input partition column values as well.
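One way to work around the lost-value caveat is to replace NULL and empty strings with explicit labels before they reach the partition column, so those rows never fall into the default partition. This is a sketch that reuses the tables from the example above and assumes dynamic partitioning is enabled in nonstrict mode; the labels `NULL_CODE` and `EMPTY_CODE` are hypothetical, so pick values that cannot collide with real country codes.

```sql
insert into table partition_sample partition (country_code)
select country,
       -- hypothetical labels keep NULL and '' distinguishable instead of
       -- merging both into __HIVE_DEFAULT_PARTITION__
       case when country_code is null then 'NULL_CODE'
            when country_code = ''    then 'EMPTY_CODE'
            else country_code
       end as country_code
from world_name;
```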
A dynamic partition insert can be a resource hog, because it may generate a large number of partitions in a short time. To guard against this, three parameters are defined:
hive.exec.max.dynamic.partitions.pernode (default 100) is the maximum number of dynamic partitions that each mapper or reducer can create. If one mapper or reducer creates more than this threshold, a fatal error is raised from the mapper/reducer (through a counter) and the whole job is killed.
hive.exec.max.dynamic.partitions (default 1000) is the total number of dynamic partitions that one DML statement can create. If no single mapper/reducer exceeds its limit but the total number of dynamic partitions does, an exception is raised at the end of the job, before the intermediate data are moved to the final destination.
hive.exec.max.created.files (default 100000) is the maximum total number of files created by all mappers and reducers. It is implemented by having each mapper/reducer update a Hadoop counter whenever it creates a new file. If the total exceeds hive.exec.max.created.files, a fatal error is thrown and the job is killed.
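Before a large dynamic partition insert, you can estimate how many partitions it will create and, if needed, raise the limits for the session. The limit values below are illustrative assumptions, not recommendations. `distribute by country_code` is a common way to send all rows for one partition value to a single reducer, so each reducer writes fewer partitions and is less likely to hit hive.exec.max.dynamic.partitions.pernode. The insert assumes the dynamic partition settings from the example above.

```sql
-- Number of partitions the insert will create: NULL and '' both map to the
-- default partition, so fold them into one value before counting
select count(distinct case when country_code is null or country_code = ''
                           then '__HIVE_DEFAULT_PARTITION__'
                           else country_code end) as partitions_to_create
from world_name;

-- Illustrative session-level limits (the values are assumptions)
set hive.exec.max.dynamic.partitions.pernode=500;
set hive.exec.max.dynamic.partitions=2000;
set hive.exec.max.created.files=200000;

-- Route each country_code to one reducer so no reducer writes every partition
insert into table partition_sample partition (country_code)
select country, country_code
from world_name
distribute by country_code;
```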

Hive configuration (hive-default.xml) - https://svn.apache.org/repos/asf/hive/tags/release-0.6.0/conf/hive-default.xml

Hive Tutorial - https://cwiki.apache.org/confluence/display/Hive/Tutorial

License: Attribution-NonCommercial (CC BY-NC)
