Wikipedia: In computer science, edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other.
Given two strings a and b on an alphabet Σ (e.g. the set of ASCII characters, the set of bytes [0..255], etc.), the edit distance d(a, b) is the minimum-weight series of edit operations that transforms a into b. One of the simplest sets of edit operations is that defined by Levenshtein in 1966:[2]
Insertion of a single symbol. If a = uv, then inserting the symbol x produces uxv. This can also be denoted ε→x, using ε to denote the empty string.
Deletion of a single symbol changes uxv to uv (x→ε).
Substitution of a single symbol x for a symbol y ≠ x changes uxv to uyv (x→y).
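Baidu: Edit distance, also known as Levenshtein distance, is the minimum number of edit operations needed to turn one string into another. The permitted operations are replacing one character with another, inserting a character, and deleting a character. In general, the smaller the edit distance, the more similar the two strings are. For example, turning kitten into sitting: kitten -> sitten (k→s), sitten -> sittin (e→i), sittin -> sitting (insert g). The Russian scientist Vladimir Levenshtein proposed the concept in 1965.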
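The kitten to sitting example can be checked with a few lines of pure Python. This is only a minimal sketch of the textbook dynamic-programming approach with unit costs for insertion, deletion, and substitution; the function name levenshtein is made up for illustration, and this is not the code inside the editdistance package.

```python
def levenshtein(a, b):
    """Return the unit-cost edit distance between sequences a and b."""
    # previous[j] is the distance between the prefix of a processed so far
    # and b[:j]; for the empty prefix that is j insertions.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute x -> y (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

It keeps only two rows of the table, so memory grows with the length of b rather than with the product of both lengths.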
In Python, we usually just import editdistance and call it directly. On Windows, however, a plain pip install editdistance fails like this:
C:\Users\Work>pip install editdistance
Collecting editdistance
Using cached editdistance-0.3.1.tar.gz
Building wheels for collected packages: editdistance
Running setup.py bdist_wheel for editdistance ... error
Complete output from command C:\ProgramData\Anaconda3\envs\python2.7\python.exe -u -c "import setuptools, tokenize;__file__='c:\\users\\work\\appdata\\local\\temp\\pip-build-w4myie\\editdistance\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d c:\users\work\appdata\local\temp\tmpnrvewtpip-wheel- --python-tag cp27:
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-2.7
creating build\lib.win-amd64-2.7\editdistance
copying editdistance\__init__.py -> build\lib.win-amd64-2.7\editdistance
copying editdistance\_editdistance.h -> build\lib.win-amd64-2.7\editdistance
copying editdistance\def.h -> build\lib.win-amd64-2.7\editdistance
running build_ext
building 'editdistance.bycython' extension
creating build\temp.win-amd64-2.7
creating build\temp.win-amd64-2.7\Release
creating build\temp.win-amd64-2.7\Release\editdistance
C:\Users\Work\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -I./editdistance -IC:\ProgramData\Anaconda3\envs\python2.7\include -IC:\ProgramData\Anaconda3\envs\python2.7\PC /Tpeditdistance/_editdistance.cpp /Fobuild\temp.win-amd64-2.7\Release\editdistance/_editdistance.obj
_editdistance.cpp
editdistance/_editdistance.cpp : warning C4819: The file contains a character that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss
C:\Users\Work\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Include\xlocale(342) : warning C4530: C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc
editdistance/_editdistance.cpp(117) : error C2059: syntax error : 'if'
editdistance/_editdistance.cpp(118) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(119) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(120) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(121) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(122) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(123) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(124) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(125) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(126) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(127) : error C2059: syntax error : 'return'
editdistance/_editdistance.cpp(128) : error C2059: syntax error : '}'
editdistance/_editdistance.cpp(128) : error C2143: syntax error : missing ';' before '}'
editdistance/_editdistance.cpp(128) : error C2059: syntax error : '}'
error: command 'C:\\Users\\Work\\AppData\\Local\\Programs\\Common\\Microsoft\\Visual C++ for Python\\9.0\\VC\\Bin\\amd64\\cl.exe' failed with exit status 2
----------------------------------------
Failed building wheel for editdistance
Running setup.py clean for editdistance
Failed to build editdistance
Installing collected packages: editdistance
Running setup.py install for editdistance ... error
Complete output from command C:\ProgramData\Anaconda3\envs\python2.7\python.exe -u -c "import setuptools, tokenize;__file__='c:\\users\\work\\appdata\\local\\temp\\pip-build-w4myie\\editdistance\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record c:\users\work\appdata\local\temp\pip-xfiuou-record\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build\lib.win-amd64-2.7
creating build\lib.win-amd64-2.7\editdistance
copying editdistance\__init__.py -> build\lib.win-amd64-2.7\editdistance
copying editdistance\_editdistance.h -> build\lib.win-amd64-2.7\editdistance
copying editdistance\def.h -> build\lib.win-amd64-2.7\editdistance
running build_ext
building 'editdistance.bycython' extension
creating build\temp.win-amd64-2.7
creating build\temp.win-amd64-2.7\Release
creating build\temp.win-amd64-2.7\Release\editdistance
C:\Users\Work\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -I./editdistance -IC:\ProgramData\Anaconda3\envs\python2.7\include -IC:\ProgramData\Anaconda3\envs\python2.7\PC /Tpeditdistance/_editdistance.cpp /Fobuild\temp.win-amd64-2.7\Release\editdistance/_editdistance.obj
_editdistance.cpp
editdistance/_editdistance.cpp : warning C4819: The file contains a character that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss
C:\Users\Work\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Include\xlocale(342) : warning C4530: C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc
editdistance/_editdistance.cpp(117) : error C2059: syntax error : 'if'
editdistance/_editdistance.cpp(118) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(119) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(120) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(121) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(122) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(123) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(124) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(125) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(126) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(127) : error C2059: syntax error : 'return'
editdistance/_editdistance.cpp(128) : error C2059: syntax error : '}'
editdistance/_editdistance.cpp(128) : error C2143: syntax error : missing ';' before '}'
editdistance/_editdistance.cpp(128) : error C2059: syntax error : '}'
error: command 'C:\\Users\\Work\\AppData\\Local\\Programs\\Common\\Microsoft\\Visual C++ for Python\\9.0\\VC\\Bin\\amd64\\cl.exe' failed with exit status 2
----------------------------------------
Command "C:\ProgramData\Anaconda3\envs\python2.7\python.exe -u -c "import setuptools, tokenize;__file__='c:\\users\\work\\appdata\\local\\temp\\pip-build-w4myie\\editdistance\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record c:\users\work\appdata\local\temp\pip-xfiuou-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in c:\users\work\appdata\local\temp\pip-build-w4myie\editdistance\
So I went to PyPI, downloaded editdistance-0.3.1.tar.gz, and tried building it myself:
$ python setup.py install
editdistance/_editdistance.cpp(117) : error C2059: syntax error
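OK, now I get it. The C4819 warning was the hint: the compiler was reading the source under code page 936 and hit characters it could not represent, and those characters were the Japanese comments. I deleted every Japanese comment in the file, went back to the directory containing setup.py, and ran: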
$ python setup.py install
$ python -c "import editdistance"
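I removed the comments by hand, but finding them can be scripted. The following is a rough sketch, not part of editdistance or pip: the helper name non_ascii_lines is invented here, and it simply assumes that any byte at or above 0x80 marks a line worth inspecting. It only reports lines and changes nothing in the file.

```python
import io


def non_ascii_lines(path):
    """Return (line_number, raw_line) pairs for lines with bytes >= 0x80."""
    hits = []
    with io.open(path, "rb") as f:  # binary mode: no decoding, so no codec errors
        for number, raw in enumerate(f, start=1):
            # bytearray yields integers on both Python 2 and Python 3
            if any(byte >= 0x80 for byte in bytearray(raw)):
                hits.append((number, raw))
    return hits


# Example call, using the file named in the compiler output above:
# for number, raw in non_ascii_lines("editdistance/_editdistance.cpp"):
#     print(number, repr(raw))
```

Reading in binary mode avoids having to guess the file's encoding, which is the very thing the compiler got wrong.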