ȸ¿ø°¡ÀԡžÆÀ̵ð/ºñ¹øã±â
ȨÀ¸·Î


ÆÄÀ̽㿡¼­ À¯´ÏÄÚµå ½ºÆ®¸² ´Ù·ç±â
8³â Àü
ÆÄÀ̽㿡¼­ À¯´ÏÄڵ带 ´Ù·ê ¶§´Â ÀϹÝÀûÀ¸·Î str.decode()¿Í unicode.encode() ¸Þ¼­µå¸¦ »ç¿ëÇÏ¿© unicode ŸÀÔ°ú str ŸÀÔÀ» »óÈ£ º¯È¯ÇÑ´Ù.

¾Æ·¡ ¿¹½Ã¿¡¼­´Â 'utf-16'À¸·Î ÀÛ¼ºµÈ ÆÄÀÏÀ» ¿­¾î, ¼öÁ÷ ÅÇ(vertical tab) ÄÚµåÆ÷ÀÎÆ®¸¦ Áö¿î ´ÙÀ½, 'utf-8'·Î ÀúÀåÇÑ´Ù. (±úÁø XMLÀ» ´Ù·ê ¶§ ÀÌ ¹æ½ÄÀÌ ¸Å¿ì Áß¿äÇÏ´Ù.)

# ÆÄÀÏ ³»¿ëÀ» Àд´Ù
with open("input.txt", "rb") as input:  
    data = input.read()

# ¹ÙÀ̳ʸ® µ¥ÀÌÅ͸¦ utf-16À¸·Î µðÄÚµùÇÑ´Ù
data = data.decode("utf-16")

# ¼öÁ÷ ÅÇÀ» »èÁ¦ÇÑ´Ù
data = data.replace(u"\u000B", u"")

# À¯´ÏÄÚµå µ¥ÀÌÅ͸¦ utf-8·Î ÀÎÄÚµùÇÑ´Ù
data = data.encode("utf-8")

# µ¥ÀÌÅ͸¦ utf-8·Î ÀúÀåÇÑ´Ù
with open("output.txt", "wb") as output:  
    output.write(data)
¾öû³ª°Ô Å« ÆÄÀÏÀ» ´Ù·ê ¶§°¡ ¾Æ´Ï¶ó¸é ÀÌ Á¤µµ·Îµµ ÃæºÐÇÏ´Ù. ÇÏÁö¸¸ Å« ÆÄÀÏÀ» ´Ù·ê ¶© ¸ðµç µ¥ÀÌÅÍ°¡ ¸Þ¸ð¸®¿¡ ¿Ã¶ó°£´Ù´Â »ç½ÇÀÌ ¹®Á¦°¡ µÈ´Ù.

½ºÆ®¸®¹Ö ÀÎÄÚ´õ/µðÄÚ´õ »ç¿ëÇϱâ
ÆÄÀ̽㠱⺻ ¶óÀ̺귯¸®¿¡´Â codecs ¸ðµâÀÌ Æ÷ÇԵǾî ÀÖ´Ù. ÀÌ ¸ðµâÀ» »ç¿ëÇϸé ÆÄÀÏÀ» Á¶±Ý¾¿ ÀÐÀ» ¼ö ÀÖ°í, ¸Þ¸ð¸®¿¡µµ ¾à°£ÀÇ À¯´ÏÄÚµå µ¥ÀÌÅ͸¸ ¿Ã¶ó°¡°Ô µÈ´Ù.

codecs.open() ÇïÆÛ ¸Þ¼­µå¸¦ »ç¿ëÇÏ¿© À§ÀÇ ¿¹½Ã¸¦ ÃÖ¼ÒÇѸ¸ °íÃĺ¸ÀÚ.

import codecs

# ÀÔ·Â ½ºÆ®¸²°ú Ãâ·Â ½ºÆ®¸²À» ¿¬´Ù
input = codecs.open("input.txt", "rb", encoding="utf-16")  
output = codecs.open("output.txt", "wb", encoding="utf-8")

# À¯´ÏÄÚµå µ¥ÀÌÅÍ Á¶°¢µéÀ» ½ºÆ®¸®¹ÖÇÑ´Ù
with input, output:  
    while True:
        # µ¥ÀÌÅÍ Á¶°¢À» Àаí
        chunk = input.read(4096)
        if not chunk:
            break
        # ¼öÁ÷ ÅÇÀ» »èÁ¦ÇÑ´Ù
        chunk = chunk.replace(u"\u000B", u"")
        # µ¥ÀÌÅÍ Á¶°¢À» ¾´´Ù
        output.write(chunk)
ÆÄÀÏÀº ²ûÂïÇØ! ÀÌÅÍ·¹ÀÌÅÍ »ç¿ëÇϱâ
ÆÄÀÏÀº ´Ù·ç±â°¡ Á» Áö·çÇÏ´Ù. º¹ÀâÇÑ Ã³¸® °úÁ¤¿¡´Â À¯´ÏÄÚµå µ¥ÀÌÅÍÀÇ ÀÌÅÍ·¹ÀÌÅ͸¦ ´Ù·ç´Â ÆíÀÌ ±ò²ûÇÒ °ÍÀÌ´Ù.

¾Æ·¡´Â iterdecode()¸¦ »ç¿ëÇÏ¿©, ÆÄÀÏÀ» À¯´ÏÄÚµå µ¥ÀÌÅÍ Á¶°¢ÀÇ ÀÌÅÍ·¹ÀÌÅÍ·Î Àд ȿ°úÀûÀÎ ¹æ¹ýÀÌ´Ù.

from functools import partial  
from codecs import iterdecode

# ƯÁ¤ pathÀÇ ÆÄÀÏÀ» À¯´ÏÄÚµå Á¶°¢ÀÇ ÀÌÅÍ·¹ÀÌÅÍ·Î ¸®ÅÏÇÑ´Ù
def iter_unicode_chunks(path, encoding):  
    # ÀÐÀ» ÆÄÀÏÀ» ¿¬´Ù
    with open(path, "rb") as input:
        # ¹ÙÀ̳ʸ® ÆÄÀÏÀ» ¹ÙÀ̳ʸ® Á¶°¢À¸·Î º¯È¯ÇÑ´Ù
        binary_chunks = iter(partial(input.read, 1), "")
        # ¹ÙÀ̳ʸ® Á¶°¢À» À¯´ÏÄÚµå Á¶°¢À¸·Î º¯È¯ÇÑ´Ù
        for unicode_chunk in iterdecode(binary_chunks, encoding):
            yield unicode_chunk
ÀÌÁ¦ iterencode() ¸Þ¼­µå¸¦ »ç¿ëÇÏ¿©, À¯´ÏÄÚµå Á¶°¢ÀÇ ÀÌÅÍ·¹ÀÌÅ͸¦ ÆÄÀÏ¿¡ ½áº¸ÀÚ.

from codecs import iterencode

# À¯´ÏÄÚµå Á¶°¢ÀÇ ÀÌÅÍ·¹ÀÌÅ͸¦ ƯÁ¤ pathÀÇ ÆÄÀÏ¿¡ ¾´´Ù
def write_unicode_chunks(path, unicode_chunks, encoding):  
    # ¾µ ÆÄÀÏÀ» ¿¬´Ù
    with open(path, "wb") as output:
        # À¯´ÏÄÚµå Á¶°¢À» ¹ÙÀ̳ʸ®·Î º¯È¯ÇÑ´Ù
        for binary_chunk in iterencode(unicode_chunks, encoding):
            output.write(binary_chunk)
ÀÌ µÎ ÇÔ¼ö¿Í ÇÔ²² À¯´ÏÄÚµå µ¥ÀÌÅÍÀÇ ½ºÆ®¸²¿¡¼­ ¼öÁ÷ ÅÇÀ» ¾ø¾Ö´Â ÀÏÀÌ ¸¶¹ý °°ÀÌ ³¡³­´Ù(just becomes a case of plumbing everything together).

# ÆÄÀÏÀ» À¯´ÏÄÚµå Á¶°¢ ÇüÅ·ΠÀд´Ù
unicode_chunks = iter_unicode_chunks("input.txt", encoding="utf-16")

# À¯´ÏÄÚµå Á¶°¢À» ¼öÁ¤ÇÑ´Ù
unicode_chunks = (  
    chunk.replace(u"\u000B", u"")
    for chunk
    in unicode_chunks
)

# À¯´ÏÄÚµå Á¶°¢À» ÆÄÀÏ¿¡ ÀúÀåÇÑ´Ù
write_unicode_chunks("output.txt", unicode_chunks, encoding="utf-8")  
°ÅâÇÏ°Ô codecs ¸ðµâÀ» »ç¿ëÇØ¾ß ÇÒ±î?
¾óÇÍ ±×³É, str.decode()¿Í unicode.encode() ¸Þ¼­µå¸¦ »ç¿ëÇÏ¿© Å« file °´Ã¼¸¦ ¹ÙÀ̳ʸ® Á¶°¢À¸·Î Àаí, ÀÎÄÚµùÇÏ°í µðÄÚµùÇÏ´Â ÆíÀÌ °£´ÜÇÏ´Ù°í »ý°¢ÇÒ ¼öµµ ÀÖ°Ú´Ù.

# ³ª»Û ¿¹½Ã. ÀÌ·¸°Ô ÇÏÁö ¸¶½Ã¿À!

# ÀÔ·Â ½ºÆ®¸²°ú Ãâ·Â ½ºÆ®¸²À» ¿¬´Ù
with open("input.txt", "rb") as input, open("output.txt", "wb") as output:  
    # ¹ÙÀ̳ʸ® µ¥ÀÌÅÍ Á¶°¢µéÀ» ¼øȸÇÑ´Ù
    while True:
        # µ¥ÀÌÅÍ Á¶°¢À» Àд´Ù
        chunk = input.read(4096)
        if not chunk:
            break
        # À§Çè: ¹ÙÀ̳ʸ® µ¥ÀÌÅ͸¦ utf-16À¸·Î µðÄÚµùÇÑ´Ù
        chunk = chunk.decode("utf-16")
        # ¼öÁ÷ ÅÇÀ» »èÁ¦ÇÑ´Ù
        chunk = chunk.replace(u"\u000B", u"")
        # À¯´ÏÄÚµå µ¥ÀÌÅ͸¦ utf-8·Î ÀÎÄÚµùÇÑ´Ù
        chunk = chunk.encode("utf-8")
        # µ¥ÀÌÅÍ Á¶°¢À» ¾´´Ù
        output.write(chunk)
ºÒÇàÈ÷µµ ¸î¸î À¯´ÏÄÚµå ÄÚµåÆ÷ÀÎÆ®´Â ¹ÙÀ̳ʸ® µ¥ÀÌÅÍÀÇ ÇÑ ¹ÙÀÌÆ® ÀÌ»óÀ¸·Î ÀÎÄÚµùµÈ´Ù. µû¶ó¼­ ´Ü¼øÈ÷ ÆÄÀÏ¿¡¼­ ¹ÙÀÌÆ® Á¶°¢µéÀ» Àо decode() ¸Þ¼­µå¸¦ Àû¿ëÇÏ¸é ¿¹±âÄ¡ ¾Ê°Ô UnicodeDecodeError°¡ ¹ß»ýÇÒ ¼öµµ ÀÖ´Ù. ÀÌ´Â ¹ÙÀÌÆ® ÇÑ Á¶°¢ÀÌ ¿©·¯ ¹ÙÀÌÆ®ÀÇ ÄÚµåÆ÷ÀÎÆ®·Î ºÐ¸®µÇ¾ú±â ¶§¹®ÀÌ´Ù.

codecs ¸ðµâÀÇ µµ±¸µéÀ» »ç¿ëÇϸé ÀÌ·¯ÇÑ ¿¹±âÄ¡ ¾ÊÀº Ãæµ¹À» ¿¹¹æÇÒ ¼ö ÀÖ´Ù.

ÆÄÀ̽ã 3¿¡¼­´Â?
ÆÄÀ̽ã 3¿¡¼­´Â ÈξÀ ´Ü¼øÇÏ°Ô À¯´ÏÄÚµå ÆÄÀÏÀ» ´Ù·ê ¼ö ÀÖ´Ù. ºôÆ®ÀÎ ¸Þ¼­µåÀÎ open()Àº À¯´ÏÄÚµå µ¥ÀÌÅ͸¦ ¼öÁ¤Çϰųª ÀÎÄÚµùÀ» º¯°æÇÏ´Â µ¥ ÇÊ¿äÇÑ ±â´ÉÀ» Æ÷ÇÔÇÏ°í ÀÖ´Ù.

# ÀÔ·Â ½ºÆ®¸²°ú Ãâ·Â ½ºÆ®¸²À» ¿¬´Ù
input = open("input.txt", "rt", encoding="utf-16")  
output = open("output.txt", "wt", encoding="utf-8")

# À¯´ÏÄÚµå µ¥ÀÌÅÍ Á¶°¢µéÀ» ½ºÆ®¸®¹ÖÇÑ´Ù
with input, output:  
    while True:
        # µ¥ÀÌÅÍ Á¶°¢À» Àаí
        chunk = input.read(4096)
        if not chunk:
            break
        # ¼öÁ÷ ÅÇÀ» »èÁ¦ÇÑ´Ù
        chunk = chunk.replace("\u000B", "")
        # µ¥ÀÌÅÍ Á¶°¢À» ¾´´Ù
        output.write(chunk)
ÆÄÀ̽ã 3ÀÇ ½Ã´ë´Ù! Áñ°Ì°Ô ÄÚµùÇϱæ!
ÃßõÃßõ : 335 Ãßõ ¸ñ·Ï
¹øÈ£ Á¦¸ñ
2,705
jQuery.getJSON, JSON µ¥ÀÌÅ͸¦ ·Îµå
2,704
jQuery length¿Í slideToggle ¿¹Á¦
2,703
[CSS] before¿Í after ¼±ÅÃÀÚ¸¦ ÀÌ¿ëÇÏ¿© Float ¼Ó¼º ÇØÁ¦
2,702
jQuery toggleClass() Methods
2,701
jQuery stop(),Callback,Chaining
2,700
jQuery Effects - Animation
2,699
jQuery Effects - Sliding
2,698
jQuery fadeIn(), fadeOut(), fadeToggle(), fadeTo()
2,697
jQuery hide(), show(), toggle()
2,696
jquery trigger(), ÇÔ¼ö ½ÇÇà½ÃÅ°±â
2,695
ºñ¹Ð¹øÈ£(Æнº¿öµå) À¯È¿¼º üũ (¹®ÀÚ, ¼ýÀÚ, Ư¼ö¹®ÀÚÀÇ Á¶ÇÕÀ¸·Î 6~16ÀÚ¸®)
2,694
PHP ¹®ÀÚ¿­¿¡ ÇѱÛÀÌ Æ÷ÇԵǾî ÀÖ´ÂÁö üũ
2,693
ÇѱÛüũ(preg match)
2,692
javascript escape/unescape -> php
2,691
Æú´õ¾ÈÀÇ ÆÄÀÏ °¡Áö°í ³î±â
2,690
Á¤±Ô Ç¥Çö½ÄÀ¸·Î ÇÑ±Û ¹®ÀÚ Æ÷ÇÔÇÏ´ÂÁö È®ÀÎÇϱâ
2,689
À¯´ÏÄڵ带 ÀÌ¿ë ÇÑ±Û Ãʼº, Áß¼º, Á¾¼ºÀ¸·Î ÂÉ°³±â - ÇѱÛ/¿µ¹® º¯È¯
2,688
ÇÑ±Û Ãʼº,Áß¼º,Á¾¼º ÀÚ¸£±â ÇÔ¼ö
ÆÄÀ̽㿡¼­ À¯´ÏÄÚµå ½ºÆ®¸² ´Ù·ç±â
2,686
Python 2.x ÇÑ±Û ÀÎÄÚµù °ü·Ã Á¤¸®
2,685
%uABCD%u1234 °°Àº urlÀÇ À¯´ÏÄÚµå µðÄÚµù
2,684
ÇѱÛ, ÇÑÀÚ¸¦ À¯´ÏÄÚµå·Î º¯È¯Çϱâ
2,683
°æ·Î ºÐ¸®Çϱâ
2,682
ÇÑ±Û ASCII Äڵ带 À¥ URL¿¡¼­ »ç¿ëµÇ´Â UTF8 ÄÚµå·Î º¯È¯
2,681
Unicode (UTF-8) ÀÎÄÚµù¿¡¼­ ÇÑ±Û ±ÛÀÚ¼ö Ãâ·Â ¹× iconv substr
2,680
PHP¿¡¼­ À¯´ÏÄÚµå·Î ¹®ÀÚ¿­ ÀÎÄÚµùÇϱâ
2,679
[JavaScript] µÚ·Î°¡±â È÷½ºÅ丮°¡ ¾ø´Â °ÍÀ» ¾î¶»°Ô ¾Ë ¼ö ÀÖÀ»±î?
2,678
htmlspecialchars, entity decode
2,677
Php : Finding Chrome and Safari Browsers
2,676
[Shoutcast] jPlayer and Shoutcast Configuration
¸ñ·Ï
¹ÂÁ÷Æ®·ÎÆ® ºÎ»ê±¤¿ª½Ã ºÎ»êÁø±¸ °¡¾ßµ¿ ¤Ó °³ÀÎÁ¤º¸Ãë±Þ¹æħ
Copyright ¨Ï musictrot All rights reserved.