[xml] Parsing XML

hs_seo 2016. 2. 16. 17:55

To parse XML in Python,

use xml.etree.ElementTree from the standard library.

The main properties of an element are tag, attrib, and text,

and the main methods are iter() and findall().

Usage is as follows.

# -*- coding:utf-8 -*-
import xml.etree.ElementTree as ET

country_data_as_string = '''<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
    <city name="NewYork">
        <rank>1</rank>
        <year>2015</year>
        <neighbor name="boston" time="1993"/>
    </city>
    <city name="Boston">
        <rank>4</rank>
        <year>2015</year>
        <neighbor name="newyork" time="1992"/>
    </city>
</data>
'''

tree = ET.parse('test.xml')  # parse from a file (test.xml is assumed to contain the XML above)
root = tree.getroot()

root = ET.fromstring(country_data_as_string)  # parse from an XML string

print(root.tag, root.attrib)  # root now points to the <data> element
'''
Output (Python 3)
data {}
'''

# Iterating over root yields each of its direct child elements
for child in root:
    print(child.tag, child.attrib)
'''
Output (Python 3)
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}
city {'name': 'NewYork'}
city {'name': 'Boston'}
'''

# iter() walks the whole tree below the element and yields every element with the given tag
for neighbor in root.iter('neighbor'):
    print(neighbor.tag, neighbor.attrib)
'''
Output (Python 3)
neighbor {'name': 'Austria', 'direction': 'E'}
neighbor {'name': 'Switzerland', 'direction': 'W'}
neighbor {'name': 'Malaysia', 'direction': 'N'}
neighbor {'name': 'Costa Rica', 'direction': 'W'}
neighbor {'name': 'Colombia', 'direction': 'E'}
neighbor {'name': 'boston', 'time': '1993'}
neighbor {'name': 'newyork', 'time': '1992'}
'''

# findall() returns only the direct children of the current element that have the given tag
for neighbor in root.findall('neighbor'):
    print(neighbor.tag)
'''
Output
(nothing: <neighbor> is not a direct child of <data>)
'''
for country in root.findall('country'):
    print(country.tag, country.attrib)
'''
Output (Python 3)
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}
'''

# findall() also accepts a path expression (ElementTree supports a limited subset of XPath)
for ele in root.findall("./country/year"):
    print(ele.tag, ele.text)
'''
Output (Python 3)
year 2008
year 2011
year 2011
'''


# Elements can also be reached by index: root[2] is the third child (Panama), [1] is its <year>
print(root[2][1].tag, root[2][1].text)
'''
Output (Python 3)
year 2011
'''
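
The pieces above can be combined to turn each <country> into a plain dictionary. The following is a minimal sketch, not part of the original walkthrough: it relies on find() and get(), two further ElementTree methods, and the shortened SAMPLE_XML string and the helper name country_records are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened sample document (an assumption; same shape as the data above).
SAMPLE_XML = '''<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>'''


def country_records(xml_text):
    """Hypothetical helper: return one dict per <country> element."""
    root = ET.fromstring(xml_text)
    records = []
    for country in root.findall('country'):
        records.append({
            # get() reads an attribute; find() returns the first matching child
            'name': country.get('name'),
            'rank': int(country.find('rank').text),
            'year': int(country.find('year').text),
            'neighbors': [n.get('name') for n in country.findall('neighbor')],
        })
    return records


records = country_records(SAMPLE_XML)
for record in records:
    print(record)
```

Note that find() returns None when no child matches, so real input with optional elements would need a check before reading .text.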

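The path expressions accepted by find() and findall() can also filter on attributes. The sketch below assumes a shortened sample document (SAMPLE_XML is illustrative) and uses two forms from the XPath subset that ElementTree supports: an [@attr='value'] predicate and the .// descendant search.

```python
import xml.etree.ElementTree as ET

# Shortened sample document (an assumption; same shape as the data above).
SAMPLE_XML = '''<data>
    <country name="Singapore">
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <year>2011</year>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>'''

root = ET.fromstring(SAMPLE_XML)

# [@name='Panama'] keeps only the <country> whose name attribute matches.
panama = root.find("./country[@name='Panama']")
panama_year = panama.find('year').text

# .// searches all descendants, so nested <neighbor> elements are found from root.
east_neighbors = [n.get('name') for n in root.findall(".//neighbor[@direction='E']")]

print(panama_year)
print(east_neighbors)
```

This is why the earlier root.findall('neighbor') call printed nothing while a .// path does find the nested elements.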