[hive][Scrap] Hive ORC example

hs_seo 2018. 4. 20. 17:40

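ORC is a column-oriented file storage format; using it for Hive tables can improve performance.

To use it, declare STORED AS ORC in the table definition, as shown below.

ORC settings can also be supplied through TBLPROPERTIES.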

CREATE TABLE table1
(
  col1 string,
  col2 string,
  col3 string,
  col4 string
)
STORED AS ORC
TBLPROPERTIES (
  "orc.compress"="ZLIB",
  "orc.compress.size"="262144",
  "orc.create.index"="true",
  "orc.stripe.size"="268435456",
  "orc.row.index.stride"="3000",
  "orc.bloom.filter.columns"="col1,col2");

Key	Default	Notes
orc.bloom.filter.columns	""	Columns to build Bloom filters for, as a comma-separated list
orc.bloom.filter.fpp	0.05	False positive probability (fpp) of the Bloom filter (must be > 0.0 and < 1.0)
orc.compress	ZLIB	Compression codec (one of NONE, ZLIB, SNAPPY)
orc.compress.size	262,144	Size in bytes of each compression chunk (256 * 1024 = 262,144)
orc.create.index	true	Whether to create row indexes
orc.row.index.stride	10,000	Number of rows between index entries (must be >= 1000)
orc.stripe.size	67,108,864	Stripe size in bytes (64 * 1024 * 1024 = 67,108,864); one stripe is created for each block of data of this size
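The constraints in the table above can be checked before a table is created. The following is a minimal Python sketch, not part of Hive: `validate_orc_properties` and `render_tblproperties` are hypothetical helper names, and the allowed values are taken only from the table above (a given Hive version may accept more). It checks the settings from the CREATE TABLE example and renders them as a TBLPROPERTIES clause.

```python
# Hypothetical helpers; the rules below mirror the Notes column of the table.
ALLOWED_COMPRESSION = {"NONE", "ZLIB", "SNAPPY"}


def validate_orc_properties(props):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    compress = props.get("orc.compress", "ZLIB")
    if compress not in ALLOWED_COMPRESSION:
        problems.append(f"orc.compress={compress!r} is not one of {sorted(ALLOWED_COMPRESSION)}")
    stride = int(props.get("orc.row.index.stride", 10000))
    if stride < 1000:
        problems.append(f"orc.row.index.stride={stride} must be >= 1000")
    fpp = float(props.get("orc.bloom.filter.fpp", 0.05))
    if not 0.0 < fpp < 1.0:
        problems.append(f"orc.bloom.filter.fpp={fpp} must be > 0.0 and < 1.0")
    return problems


def render_tblproperties(props):
    """Render the dict as a TBLPROPERTIES clause in the style used above."""
    body = ",\n".join(f'  "{key}"="{value}"' for key, value in props.items())
    return f"TBLPROPERTIES (\n{body}\n)"


# The settings from the CREATE TABLE example, with the byte sizes spelled out.
props = {
    "orc.compress": "ZLIB",
    "orc.compress.size": str(256 * 1024),        # 256 KiB compression chunk
    "orc.create.index": "true",
    "orc.stripe.size": str(256 * 1024 * 1024),   # 256 MiB stripe, 4x the default
    "orc.row.index.stride": "3000",
    "orc.bloom.filter.columns": "col1,col2",
}

problems = validate_orc_properties(props)
clause = render_tblproperties(props)
```

Writing the sizes as products makes it easier to see that the example raises the stripe size from the 64 MiB default to 256 MiB while keeping the default compression chunk size.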

Hive and Apache Tez: Benchmarked at Yahoo! Scale from DataWorks Summit

[ORC information]

https://community.hortonworks.com/articles/75501/orc-creation-best-practices.html

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-SerializationandCompression

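[Bloom filter]

A Bloom filter is a probabilistic data structure used to test whether an element is a member of a set.

It can produce false positives: the filter may report that an element is in the set when it actually is not. It never produces false negatives: if the filter reports that an element is not in the set, the element is definitely not in it.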

https://ko.wikipedia.org/wiki/%EB%B8%94%EB%A3%B8_%ED%95%84%ED%84%B0
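The behavior described above can be illustrated with a toy Bloom filter. This is a minimal Python sketch for illustration only, not the implementation ORC uses: the class name `BloomFilter`, the SHA-256 based double hashing, and the sample keys are all assumptions. The `fpp` argument plays the same role as orc.bloom.filter.fpp, and the bit array is sized with the standard formulas m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2.

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter: a bit array plus k derived hash positions per value."""

    def __init__(self, expected_items, fpp=0.05):
        if expected_items < 1:
            raise ValueError("expected_items must be >= 1")
        if not 0.0 < fpp < 1.0:
            raise ValueError("fpp must be > 0.0 and < 1.0")
        # Standard sizing: m bits and k hash functions for n items at rate p.
        self.num_bits = math.ceil(-expected_items * math.log(fpp) / (math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value):
        # Double hashing: derive k positions from two halves of one digest.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


bf = BloomFilter(expected_items=1000, fpp=0.05)
for i in range(1000):
    bf.add(f"key-{i}")

# Every inserted key is reported as possibly present (no false negatives).
all_found = all(bf.might_contain(f"key-{i}") for i in range(1000))

# Keys that were never inserted are occasionally reported as present.
absent = [f"other-{i}" for i in range(10000)]
false_positive_rate = sum(bf.might_contain(k) for k in absent) / len(absent)
```

A reader that consults such a filter can skip a block of data whenever `might_contain` returns False, which is why a lower fpp trades a larger filter for fewer unnecessary reads.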

License: Attribution, NonCommercial
