This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: commands.mkarg() buggy in East Asian locales
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 2.7

process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: glyph, iritkatriel, jwilk, mirabilos, r.david.murray
Priority: low
Keywords:

Created on 2014-08-12 18:13 by jwilk, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
test-mkargs.py (uploaded by jwilk, 2014-08-12 18:13)
test.sh (uploaded by jwilk, 2014-09-05 18:34)

Messages (9)
msg225235 - Author: Jakub Wilk (jwilk) Date: 2014-08-12 18:13
This is how shell quoting in commands.mkarg() is implemented:

def mkarg(x):
    if '\'' not in x:
        return ' \'' + x + '\''
    s = ' "'
    for c in x:
        if c in '\\$"`':
            s = s + '\\'
        s = s + c
    s = s + '"'
    return s

This is unfortunately not compatible with the way bash splits arguments in some locales.
The problem is that in a few East Asian encodings (at least BIG5, BIG5-HKSCS, GB18030, GBK), the 0x5C byte (backslash in ASCII) could be the second byte of a two-byte character; and bash apparently decodes the strings before splitting.
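The mechanism can be traced at the byte level. The following is a minimal sketch, not part of the original report: `mkarg_bytes` is a hypothetical Python 3 port of the function quoted above (operating on `bytes`, as Python 2 strings did), and `payload` is an illustrative argument, not the contents of test-mkargs.py. Decoding with `latin-1` is used only as a stand-in for a reader that treats every byte as its own character.

```python
def mkarg_bytes(x: bytes) -> bytes:
    """Byte-for-byte port of the Python 2 commands.mkarg() quoted above."""
    if b"'" not in x:
        return b" '" + x + b"'"
    s = b' "'
    for i in range(len(x)):
        c = x[i:i + 1]  # one byte at a time, as Python 2 str iteration did
        if c in b'\\$"`':
            s += b'\\'  # escape the byte with a preceding backslash
        s += c
    return s + b'"'


# Hypothetical argument: a single quote (forces the double-quote branch)
# followed by the two-byte GBK sequence 0x81 0x5C.
payload = b"'\x81\x5c"
quoted = mkarg_bytes(payload)

as_bytes = quoted.decode("latin-1")  # byte-oriented view: one character per byte
as_gbk = quoted.decode("gbk")        # GBK-aware view: 0x81 0x5C is one character

# Number of backslash characters each reader sees in the quoted argument.
print(as_bytes.count("\\"))
print(as_gbk.count("\\"))
```

In the byte-oriented view the two 0x5C bytes form an escaped backslash, so the quoting is balanced. In the GBK view the inserted backslash is absorbed into the two-byte character, leaving a single backslash directly before the closing double quote. If the shell decodes before splitting, as the report says bash does, that backslash escapes the closing quote and the argument is no longer terminated where mkarg() intended.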

PoC:

$ sh --version | head -n1
GNU bash, version 4.3.22(1)-release (i486-pc-linux-gnu)

$ LC_ALL=C python test-mkargs.py
crw-rw-rw- 1 root root 1, 3 Aug 12 16:00 /dev/null
ls: cannot access " ; python -c 'import this' | grep . | shuf | head -n1 | cowsay -y ; ": No such file or directory

$ LC_ALL=zh_CN.GBK python test-mkargs.py
crw-rw-rw- 1 root root 1, 3 8月  12 16:00 /dev/null
ls: 无法访问乗: No such file or directory
 ________________________________
< Simple is better than complex. >
 --------------------------------
        \   ^__^
         \  (..)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
sh: 乗: 未找到命令

msg225237 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-08-12 18:34
For the record, neither this module nor this routine exists in Python 3, so this is a Python 2-only issue.

I'm not sure I fully understand the problem, but one possible strategy is to apply to Python 2's pipes.quote the fixes that were applied in Python 3 (where the function was later moved to shlex), and to use that instead of mkarg.

msg226423 - Author: mirabilos (mirabilos) Date: 2014-09-05 13:45
Just for the record, please do not assume all shells behave like GNU bash.

msg226426 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-09-05 15:49
That is true, but sh-alikes (POSIX standard) are the only shells we support in commands. subprocess (which commands was folded into in Python 3) also supports Windows cmd to the extent we've managed, but that's all we are committed to supporting.

msg226439 - Author: Jakub Wilk (jwilk) Date: 2014-09-05 18:34
I think what mirabilos meant (and what I should have mentioned in my initial message) is that even sh-alikes don't necessarily behave the same way as bash:

$ bash test.sh 
乗

$ ksh test.sh 
乗

$ dash test.sh 
test.sh: 2: test.sh: Syntax error: Unterminated quoted string

$ mksh test.sh 
test.sh[2]: no closing quote

$ posh test.sh 
test.sh:2: no closing quote
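
The split between the two groups of shells is consistent with one group scanning decoded characters and the other scanning raw bytes. The sketch below illustrates that idea only; the contents of test.sh are not shown in this thread, so `line` is a guess at a script line that would behave this way, `double_quote_closed` is a hypothetical helper rather than any shell's real parser, and `latin-1` again stands in for byte-oriented scanning.

```python
def double_quote_closed(script: bytes, encoding: str) -> bool:
    """Return True if the first double-quoted string in `script` is terminated
    when the script is scanned as characters decoded with `encoding`."""
    text = script.decode(encoding)
    i = text.index('"') + 1  # position just after the opening quote
    while i < len(text):
        if text[i] == "\\":
            i += 2  # a backslash escapes the following character
            continue
        if text[i] == '"':
            return True  # found the closing quote
        i += 1
    return False


# Assumed script line: echo, then a double-quoted 0x81 0x5C sequence.
line = b'echo "\x81\x5c"'

print(double_quote_closed(line, "gbk"))      # character-oriented scan
print(double_quote_closed(line, "latin-1"))  # byte-oriented scan
```

Under the GBK reading the two bytes are a single character and the quote closes normally. Under the byte-oriented reading the 0x5C byte is a backslash that escapes the closing quote, which matches the "unterminated quoted string" and "no closing quote" errors above.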

msg228095 - Author: Glyph Lefkowitz (glyph) (Python triager) Date: 2014-10-01 17:20
Would simply replacing this function with pipes.quote resolve the issue?

msg228665 - Author: Jakub Wilk (jwilk) Date: 2014-10-06 12:03
Something like this should be safe:

def mkarg(x):
    ' ' + pipes.quote(x)

msg228667 - Author: Jakub Wilk (jwilk) Date: 2014-10-06 12:04
Err, with return of course. :-)
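
For readers on Python 3, the same idea can be sketched with shlex.quote, the function's later home mentioned earlier in the thread. This is an illustration under that assumption, not a patch that was applied to Python 2; the name `mkarg` is reused here only for comparison with the original.

```python
import shlex


def mkarg(x: str) -> str:
    """Return x quoted for a POSIX shell, with the leading space mkarg() used."""
    return " " + shlex.quote(x)


print(mkarg("/dev/null"))  # safe characters only, so no quoting is needed
print(mkarg("a'b"))        # an embedded single quote is closed, quoted, reopened
print(mkarg('x\\"y'))      # backslash and double quote stay literal in single quotes
```

shlex.quote only ever wraps the argument in single quotes and never inserts a backslash of its own, so the quoting does not depend on how the shell pairs a 0x5C byte with its neighbours.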

msg382233 - Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2020-12-01 11:00
Python 2 only issue.

History
Date, User, Action, Args
2022-04-11 14:58:06, admin, set, github: 66383
2020-12-01 11:00:24, iritkatriel, set, status: open -> closed; nosy: + iritkatriel; messages: + msg382233; resolution: out of date; stage: needs patch -> resolved
2016-09-24 19:07:29, christian.heimes, set, priority: normal -> low; type: security -> behavior; stage: needs patch
2014-10-06 12:04:34, jwilk, set, messages: + msg228667
2014-10-06 12:03:45, jwilk, set, messages: + msg228665
2014-10-01 17:20:07, glyph, set, nosy: + glyph; messages: + msg228095
2014-09-05 18:34:16, jwilk, set, files: + test.sh; messages: + msg226439
2014-09-05 15:49:50, r.david.murray, set, messages: + msg226426
2014-09-05 13:45:12, mirabilos, set, nosy: + mirabilos; messages: + msg226423
2014-08-12 18:34:07, r.david.murray, set, nosy: + r.david.murray; messages: + msg225237
2014-08-12 18:13:05, jwilk, create