For historical reasons our country does not have
a standard encoding, so sometimes it is impossible to read
the content of a file. In these cases you should use a special program
that converts encodings. We hope you will choose rusconv.
Rusconv is a program with a lot of flags. If you
forget some of them, you can get help from rusconv itself.
To do this, simply run rusconv without any arguments or
give the flag '-h'.
Suppose you work in Windows, have found an old DOS program and
want to recall how to run it. But "Notepad" shows unreadable
text instead of the documentation. To read it, you should first
convert it from the DOS encoding to the Windows one:
Like any other UNIX utility, the UNIX version of rusconv is
not verbose. By default it prints only warnings and error
messages. To print all messages, use the flag '-v'.
Even if a text is written in latinica, there is no
guarantee that it can be read on any
operating system. In DOS and Windows the end of a line is coded
by two characters, in UNIX by one.
Every operating system uses its own method of creating Russian files.
When converting, rusconv creates new files. But sometimes
you do not need them, or you only want to replace the encoding
of a given file. In this case use the flag '-o'. Rusconv then
first creates a temporary file to hold the results of recoding
and afterwards moves this temporary file into the place of the source.
If any error occurs, the source file stays unchanged and
the temporary file contains the text converted before the error.
Sometimes, especially when you create a web site, a file should
be converted to several encodings. For example, you write
HTML pages in DOS, but your home page is in the Windows and KOI encodings.
You can run rusconv twice, but it is better to do it this way:
When you want to convert a group of files from one encoding
to another, it is convenient to convert the whole group at once.
To do this, just write all the file names after the flags. You can use
metacharacters; rusconv will find all matching files. In the UNIX
version use metacharacters with caution; see "Specifying output directory"
for more information.
By default, result files are created in the current
directory. Usually this is the directory from which you run rusconv.
But if the last argument is a directory name, the files are
created in that directory.
Version 3.0 of rusconv was released for DOS and UNIX.
To support long file names, version 3.11 was released for Windows.
Because it uses new operating system functions,
this version cannot be run on a computer without Windows 95/98.
In return, you can now use long file names and network files.
Now it is time to describe the flags '--', '-s', '-v', '-close' and
'-noclose'. They are usually used in command scripts.
To recognize the encoding of a file, use the program whatrus.
This program is distributed with rusconv.
If you are going to use rusconv in command scripts,
consider the following advice.
In the past the most popular encoding was the alternative one (or codepage 866),
which was used in MS-DOS. Now the leader is the Windows encoding (codepage 1251).
But the KOI-8 encoding is still important. It was very useful in
the early stages of the growth of the Russian Internet. Very rarely you can
find text written on a Macintosh. There were a lot of other
encodings, but they have died out. It is impossible to use one encoding
to read text written in another one. To avoid this, some
people write texts in latinica (transliteracija, volapjuk):
Russian text spelled with Latin letters. Moreover, different operating
systems use different methods to code the end of a line. DOS and
Windows code it by two characters, UNIX by one. So a text written
in UNIX appears as one long line in DOS/Windows.
All these troubles can be solved by rusconv.
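Why a text becomes unreadable in the wrong encoding can be shown with a short Python sketch. It is only an illustration and not how rusconv itself works: Python's built-in codecs stand in for the encodings named above, the name `recode` is made up for the example, and KOI-8 is assumed to mean the KOI8-R variant.

```python
# Illustrative only: Python codecs stand in for the encodings in the text.
CODECS = {
    "alt": "cp866",         # alternative (DOS) encoding
    "win": "cp1251",        # Windows encoding
    "koi": "koi8_r",        # KOI-8 (assumed to be the KOI8-R variant)
    "mac": "mac_cyrillic",  # Macintosh encoding
}


def recode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target."""
    return data.decode(CODECS[source]).encode(CODECS[target])


text = "Проверка"
dos_bytes = text.encode(CODECS["alt"])

# Reading DOS bytes as if they were Windows text gives unreadable characters.
misread = dos_bytes.decode(CODECS["win"], errors="replace")

# After conversion the bytes are readable under the Windows encoding.
fixed = recode(dos_bytes, "alt", "win").decode(CODECS["win"])
```

Here `misread` differs from the original text, while `fixed` is equal to it.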
Examples:
DOS:
C:\UTIL>RUSCONV
C:\UTIL>rusconv /h
UNIX:
$rusconv -h
$rusconv
Using rusconv in Windows is more difficult than
in other operating systems because rusconv is a command-line
oriented program. It is recommended to use a file manager
like Norton Commander (we recommend Windows Commander).
Then using rusconv in Windows becomes simpler.
Anyway, you can run the program from the "Start" menu. In this menu
choose the item "Run...", write the full path to rusconv (better use
the "Browse..." button), add flags and files and press the "OK" button.
Like any other UNIX utility, the UNIX version is not verbose.
It prints only a short list of flags. To get more help, try
$man rusconv
The best way is to use the HTML documentation.
C:\GAMES\WARCRAFT>rusconv -alt +win read.me
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\read.me -> .\read.win: ok.
1 file(s) converted.
After conversion, a file with the same name but a different
extension is created in the current directory (usually the
directory from which you run rusconv, here c:\games\warcraft).
The extension shows which encoding the new file is in.
Here is a list of extensions for the encodings:
.alt - alternative encoding, used in DOS
.koi - KOI-8 encoding, used in UNIX
.lat - latinica, Russian text spelled with Latin letters
.mac - Macintosh encoding
.win - Windows encoding
To specify the encoding to convert from, use one of the source-encoding flags (for example '-alt', '-koi', '-mac' or '-win').
In the next example a file 'test.file' is created which
contains the text "Проверка флага '-v'." (testing the '-v' flag).
This file is in the KOI-8 encoding. First we convert this
file to the Windows encoding without the flag '-v'. The file 'test.win'
is created, but you get no message from rusconv.
Then we convert the file to latinica using the flag '-v'.
To finish the example, we check the content of the file 'test.lat'.
$echo "Проверка флага '-v'." >test.file
$rusconv -koi +win test.file
$rusconv -v -koi +lat test.file
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
./test.file -> ./test.lat: ok.
1 file(s) converted.
$cat test.lat
Proverka flaga '-v'.
Suppose you work in DOS or Windows and have downloaded
a text file created in UNIX from the Internet. Notepad shows this text
as one long line with funny characters where the line breaks should be.
To convert the text to its normal form, use the command
C:\NEW>rusconv -cr2crlf readme.txt
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\readme.txt -> .\readme.crlf: ok.
1 file(s) converted.
The result of the conversion is placed in a file with the same name
and the extension .crlf (UNIX and Windows) or
.crl (DOS); in this example, in 'readme.crlf'.
If you are in UNIX, the incorrect format leads to another
problem: text editors show an additional character at the end of
each line, and some programs cannot be compiled. To solve this problem use
the flag '-crlf2cr':
$rusconv -crlf2cr -v files.bbs
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
./files.bbs -> ./files.cr: ok.
1 file(s) converted.
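What the flags '-cr2crlf' and '-crlf2cr' do to the bytes of a file can be sketched in Python. This is only an illustration, not rusconv's code: the function names are made up for the example, and it assumes that the single UNIX end-of-line character is LF (0x0A) and that the DOS/Windows pair is CR LF (0x0D 0x0A).

```python
def lf_to_crlf(data: bytes) -> bytes:
    """UNIX -> DOS/Windows: end every line with the two bytes CR LF."""
    # Normalize first so that existing CR LF pairs are not doubled.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def crlf_to_lf(data: bytes) -> bytes:
    """DOS/Windows -> UNIX: end every line with the single byte LF."""
    return data.replace(b"\r\n", b"\n")


unix_text = b"first line\nsecond line\n"
dos_text = lf_to_crlf(unix_text)
```

Applying `crlf_to_lf` to `dos_text` gives back `unix_text`, so the two conversions undo each other.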
For a correct conversion you should consider both the encoding
and the type of line ends. Here is the minimal set of flags to do it:
From DOS to UNIX : -alt +koi -crlf2cr
From UNIX to DOS : -koi +alt -cr2crlf
From windows to UNIX : -win +koi -crlf2cr
From UNIX to windows : -koi +win -cr2crlf
From DOS to windows : -alt +win
From windows to DOS : -win +alt
Probably the most frequent tasks are converting text from UNIX
to DOS, from UNIX to Windows, and back. Converting between
the DOS and Windows styles is usually unnecessary: for Windows texts
you can use Notepad, for DOS texts you can use old DOS
file managers like Norton Commander.
It is not a good idea to type these sets of flags every time.
So you can use abbreviations:
-dos2unix - the same as '-alt +koi -crlf2cr'
-unix2dos - the same as '-koi +alt -cr2crlf'
-win2unix - the same as '-win +koi -crlf2cr'
-unix2win - the same as '-koi +win -cr2crlf'
These abbreviations are useful, but they are still rather long.
So you can shorten them:
-d2u - the same as '-dos2unix'
-u2d - the same as '-unix2dos'
-w2u - the same as '-win2unix'
-u2w - the same as '-unix2win'
rusconv -w2u index.html
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\index.html -> .\index.koi: ok.
1 file(s) converted.
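The two tables of abbreviations can be read as a simple lookup. The Python sketch below expands a short flag into the full set of flags listed above; the names `PRESETS`, `SHORT` and `expand` are made up for the illustration and say nothing about how rusconv stores its flags internally.

```python
# Long abbreviations and the flag sets they stand for (from the tables above).
PRESETS = {
    "-dos2unix": ["-alt", "+koi", "-crlf2cr"],
    "-unix2dos": ["-koi", "+alt", "-cr2crlf"],
    "-win2unix": ["-win", "+koi", "-crlf2cr"],
    "-unix2win": ["-koi", "+win", "-cr2crlf"],
}

# Short forms of the same abbreviations.
SHORT = {
    "-d2u": "-dos2unix",
    "-u2d": "-unix2dos",
    "-w2u": "-win2unix",
    "-u2w": "-unix2win",
}


def expand(args):
    """Replace every abbreviation in args with the flags it stands for."""
    result = []
    for arg in args:
        arg = SHORT.get(arg, arg)               # short form -> long form
        result.extend(PRESETS.get(arg, [arg]))  # long form -> flag set
    return result


expanded = expand(["-w2u", "index.html"])
```

With this table, '-w2u index.html' expands to '-win +koi -crlf2cr index.html', which matches the result 'index.koi' in the example above.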
Example:
D:\HTML>rusconv -o -w2u index.html
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\index.html -> D:\HTML\rcA290.TMP -> .\index.html: ok.
1 file(s) converted.
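The "temporary file, then move" scheme used by the flag '-o' can be sketched in Python. This is a simplified illustration under stated assumptions, not rusconv's implementation: `convert_in_place` and its `convert` argument are hypothetical names, and the whole file is converted in memory, whereas the text above describes a temporary file that holds whatever was converted before an error.

```python
import os
import tempfile


def convert_in_place(path, convert):
    """Rewrite path with convert(bytes) -> bytes, going through a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as source:
        data = source.read()
    # Keep the temporary file next to the source so that the final move
    # stays on the same disk.
    handle, tmp_path = tempfile.mkstemp(prefix="rc", suffix=".TMP", dir=directory)
    with os.fdopen(handle, "wb") as tmp:
        # If convert() raises, the source file has not been touched yet.
        tmp.write(convert(data))
    # Only at this point is the source replaced by the result.
    os.replace(tmp_path, path)
    return path
```

Because the source is replaced only in the last step, a failure during conversion leaves the original file as it was.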
C:\HTML>rusconv -alt +koi +win index-pre.html
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\index-pre.html -> .\index-pre.koi, .\index-pre.win: ok.
1 file(s) converted.
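Converting to several encodings in one run means that the source is read once and written several times. A minimal Python sketch of this idea follows; it uses Python codec names instead of rusconv flags, and `convert_to_many` is a name invented for the example.

```python
def convert_to_many(data: bytes, source: str, targets: list) -> dict:
    """Decode the source bytes once and encode them for every target codec."""
    text = data.decode(source)
    return {target: text.encode(target) for target in targets}


# One DOS (cp866) source, two results: KOI-8 (koi8_r assumed) and Windows.
page = "Домашняя страница".encode("cp866")
results = convert_to_many(page, "cp866", ["koi8_r", "cp1251"])
```

The dictionary `results` then holds one byte string per target, in the same way that rusconv writes one file per target encoding.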
Results are placed in files with the same names and with
default extensions:
.alt - for alternative encoding
.koi - for KOI-8 encoding
.lat - for latinica
.mac - for Macintosh encoding
.win - for Windows encoding
Often the default extensions are not convenient. Then you
can define your own extension. To do this use the commands:
-aext extension - for alternative encoding
-kext extension - for KOI-8 encoding
-lext extension - for latinica
-mext extension - for Macintosh encoding
-wext extension - for Windows encoding
For example, you typed the Russian alphabet in the Windows encoding
and want to know how it looks in all the other encodings. Moreover,
you want the results to be in text files. No problem:
E:\EX>dir
folder E:\EX
. <FOLDER> 24.10.98 15:27 .
.. <FOLDER> 24.10.98 15:27 ..
ALPHABET TXT 66 02.10.98 13:15 alphabet.txt
E:\EX>rusconv -win +alt -aext alt.txt +koi -kext koi.txt
+lat -lext lat.txt +mac -mext mac.txt +win -wext win.txt alphabet.txt
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\alphabet.txt -> .\alphabet.alt.txt, .\alphabet.koi.txt,
.\alphabet.mac.txt, .\alphabet.lat.txt, .\alphabet.win.txt: ok.
1 file(s) converted.
E:\EX>dir
folder E:\EX
. <FOLDER> 24.10.98 15:27 .
.. <FOLDER> 24.10.98 15:27 ..
ALPHAB~1 TXT 66 24.10.98 16:24 alphabet.alt.txt
ALPHAB~2 TXT 66 24.10.98 16:24 alphabet.koi.txt
ALPHAB~3 TXT 66 24.10.98 16:24 alphabet.mac.txt
ALPHAB~4 TXT 82 24.10.98 16:24 alphabet.lat.txt
ALPHAB~5 TXT 66 24.10.98 16:24 alphabet.win.txt
ALPHABET TXT 66 02.10.98 13:15 alphabet.txt
If you want to change the extension, you can use one of the commands
'-aext', '-kext', '-lext', '-mext' or '-wext'.
But if you convert to only one encoding, it is better
to use the command
-ext extension
Depending on the target encoding, this command is interpreted
as one of the commands '-aext extension',
'-kext extension', '-lext extension',
'-mext extension' or '-wext extension'.
Using the command '-ext' you can also redefine the default extensions
'.cr' and '.crlf' when you change only the type of line ends:
E:\EX>rusconv -cr2crlf unixtext
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\unixtext -> .\unixtext.crlf: ok.
1 file(s) converted.
E:\EX>rusconv -cr2crlf -ext txt unixtext
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\unixtext -> .\unixtext.txt: ok.
1 file(s) converted.
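The naming rule visible in the examples above (drop the last extension of the source name, then append the default or the user-defined extension) can be sketched in Python. The function `output_name` and the table `DEFAULT_EXT` are assumptions made for this illustration; the sketch ignores the shorter '.crl' extension of the DOS version.

```python
import os

# Default extension for every kind of result, as listed in the text.
DEFAULT_EXT = {
    "alt": "alt", "koi": "koi", "lat": "lat", "mac": "mac", "win": "win",
    "cr": "cr", "crlf": "crlf",  # results of changing only the line ends
}


def output_name(source, target, custom_ext=None):
    """Build the result file name for one source file and one target."""
    stem, _old_ext = os.path.splitext(source)
    return stem + "." + (custom_ext or DEFAULT_EXT[target])


default_name = output_name("read.me", "win")
custom_name = output_name("alphabet.txt", "alt", "alt.txt")
```

Under this rule 'read.me' becomes 'read.win', and 'alphabet.txt' with '-aext alt.txt' becomes 'alphabet.alt.txt', as in the transcripts.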
C:\HTML>rusconv -alt +koi +win -kext koi.html -wext win.html *.txt
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\rusconv.txt -> .\rusconv.koi.html, .\rusconv.win.html: ok.
.\readme.txt -> .\readme.koi.html, .\readme.win.html: ok.
.\index.txt -> .\index.koi.html, .\index.win.html: ok.
3 file(s) converted.
In UNIX use metacharacters with caution. There, expanding
metacharacters is the job of the operating system (the command
shell), and the program gets a ready list of arguments. There is
no guarantee that the last argument is not a directory. So do not
forget to specify the output directory:
Content of the current directory:
$ls -l
-rwxr-xr-x 1 w_re w_re 21394 Oct 25 02:27 file1.html
-rwxr-xr-x 1 w_re w_re 21394 Oct 25 02:27 file2.html
drwxr-xr-x 2 w_re w_re 1024 Oct 25 02:27 res
A possible error:
$rusconv -v -w2u *
// After expansion:
// rusconv -v -w2u file1.html file2.html res
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
./file1.html -> res/file1.koi: ok.
./file2.html -> res/file2.koi: ok.
2 file(s) converted.
To create files in the current directory:
$rusconv -v -w2u * .
// After expansion:
// rusconv -v -w2u file1.html file2.html res .
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
warning: 'res' is a directory, skipping.
./file1.html -> ./file1.koi: ok.
./file2.html -> ./file2.koi: ok.
2 file(s) converted.
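The behaviour in the two transcripts above follows from one rule: a directory in the last position is the output directory, and a directory anywhere else is skipped with a warning. A Python sketch of this rule is given below. The function name `split_output_dir` is hypothetical, and the `is_dir` parameter exists only so that the example can run without real files.

```python
import os


def split_output_dir(names, is_dir=os.path.isdir):
    """Return (files, output_dir, skipped) for an expanded argument list."""
    names = list(names)
    output_dir = "."  # default: the current directory
    if names and is_dir(names[-1]):
        output_dir = names.pop()  # a trailing directory receives the results
    files = [name for name in names if not is_dir(name)]
    skipped = [name for name in names if is_dir(name)]  # warned about, not converted
    return files, output_dir, skipped


def fake_is_dir(name):
    """Stand-in for the directory listing shown above."""
    return name in ("res", ".")


by_mistake = split_output_dir(["file1.html", "file2.html", "res"], fake_is_dir)
intended = split_output_dir(["file1.html", "file2.html", "res", "."], fake_is_dir)
```

In the first call the results go to 'res'; in the second they stay in the current directory and 'res' is only reported as skipped.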
To convert a file with a space in its name, use quotes:
C:\HTML>rusconv -win +alt "long file name.txt"
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\long file name.txt -> .\long file name.alt: ok.
1 file(s) converted.
Most file managers for Windows, like Norton Commander,
add the file name to the command line when you press Ctrl+Enter.
If the file name contains spaces, they automatically surround
it with quotes. If yours does not, change your file manager.
We recommend Windows Commander.
When working in a local Windows network you can (if you have the rights)
convert files on other computers without drive mapping.
To do this use universal file names (\\server\resource\file):
rusconv -w2u -ext html \\comp\c\html\*.html "\\comp\c\html\koi version"
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
\\comp\c\html\tutorial.html -> \\comp\c\html\koi version\tutorial.html: ok.
\\comp\c\html\index.html -> \\comp\c\html\koi version\index.html: ok.
\\comp\c\html\errors.html -> \\comp\c\html\koi version\errors.html: ok.
3 file(s) converted.
-- end of flags
Rusconv scans the command line from left to right. The first argument
that is not a flag starts the list of files. Rusconv considers
an argument to be a flag if its first character is '-' or
'+' (or '/' in the DOS and Windows versions).
Sometimes you need to stop flag parsing. To do this use
the characters '--'. Everything after them is the file list.
Suppose a file named '-file.txt' should be converted
from the Windows encoding to the KOI-8 encoding:
Error:
E:\EX>rusconv -win +koi -file.txt
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
error: unrecognized flag '-file.txt'.
try 'rusconv -h' or read the manual for help.
Success:
rusconv -win +koi -- -file.txt
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\-file.txt -> .\-file.koi: ok.
1 file(s) converted.
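The scanning rule described above can be sketched in Python. This is an illustration under assumptions, not rusconv's parser: `split_command_line` is a made-up name, and the set `VALUE_FLAGS` lists only the extension commands from this tutorial that take a following argument.

```python
# Flags that are followed by a value (the extension commands from this tutorial).
VALUE_FLAGS = {"-ext", "-aext", "-kext", "-lext", "-mext", "-wext"}


def split_command_line(args, prefixes="-+"):
    """Split arguments into (flags, files), scanning from left to right."""
    flags = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "--":
            # Explicit end of flags: everything after it is a file name.
            return flags, list(args[i + 1:])
        if not arg or arg[0] not in prefixes:
            break  # the first non-flag argument starts the file list
        flags.append(arg)
        if arg in VALUE_FLAGS and i + 1 < len(args):
            i += 1
            flags.append(args[i])  # the value belongs to the flag
        i += 1
    return flags, list(args[i:])


without_marker = split_command_line(["-win", "+koi", "-file.txt"])
with_marker = split_command_line(["-win", "+koi", "--", "-file.txt"])
```

Without '--' the name '-file.txt' ends up among the flags, which corresponds to the "unrecognized flag" error above; with '--' it is treated as a file. For the DOS and Windows versions the call would pass `prefixes="-+/"`.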
-s silent mode, no messages are printed
-v verbose mode, all messages are printed
The flag '-s' suppresses the printing of messages. Conversely,
the flag '-v' makes rusconv talkative. If you specify both flags '-s'
and '-v', an error message is printed. The DOS and Windows
versions are talkative by default. The UNIX version by default prints
only warnings and error messages.
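The rules for '-s' and '-v' can be summarized in a few lines of Python. The function `message_level`, its level names and the platform strings are assumptions made for this sketch only.

```python
def message_level(flags, platform):
    """Return 'silent', 'quiet' (warnings and errors only) or 'verbose'."""
    if "-s" in flags and "-v" in flags:
        # The text says that combining both flags is an error.
        raise ValueError("flags '-s' and '-v' cannot be used together")
    if "-s" in flags:
        return "silent"
    if "-v" in flags:
        return "verbose"
    # Defaults: the UNIX version is quiet, DOS and Windows are talkative.
    return "quiet" if platform == "unix" else "verbose"


unix_default = message_level([], "unix")
windows_default = message_level([], "windows")
```

A script that must behave the same on every system can therefore always pass '-s' or '-v' explicitly instead of relying on the default.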
-close close rusconv's window after the program finishes
-noclose do not close the window
The flags '-close' and '-noclose' are used only in the Windows
version. The DOS and UNIX versions ignore them. The Windows operating
system runs rusconv in a separate window, which is closed
when the program finishes. To avoid this and to let the user see
the report, rusconv waits for a key press after all files are converted
('-noclose', the default). This behavior can be changed by the flag '-close'.
With this flag rusconv finishes as soon as all files are converted.
If you specify both flags '-close' and '-noclose', an
error message is printed.
C:\UTIL>whatrus \\comp\c\html\index.html
WIN detected.
You cannot specify several file names. If you do not need
the message and only want the return code, use the flag '-s'.
The Windows operating system runs whatrus in a separate
window, which is closed when the program finishes. To keep the window
open and let the user see the result, whatrus waits for a key
press after recognition. To avoid this use the flag '-s'.
Using the return codes of rusconv and whatrus
makes your script more intelligent.
The characters '--' stop flag parsing and prevent an error
if your script gets a file name that can be interpreted as a rusconv flag.
Specifying the output directory is very important when you are working
in UNIX because the user can use metacharacters. In UNIX expanding
metacharacters is the job of the operating system, and the program gets
a ready list of arguments. There is no guarantee that the last argument
is not a directory.
To keep the window on the desktop and let the user see
the report, rusconv waits for a key press after all files are converted.
In command scripts you probably do not need such behaviour.
So use the flag '-close', and rusconv will finish immediately after
converting.
In the Windows version of whatrus the flag '-s'
is the same as the flag '-close' in rusconv.
Here is an example of a command script. It takes any file
and converts it to the file index.html in the Windows encoding.
Windows version, makeindex.bat:
@ECHO OFF
REM Copy source file to file with name 'index'.
ECHO COPY %1 index
copy %1 index
IF EXIST index GOTO TAKEENC
ECHO copy failed
EXIT
REM Guess encoding
:TAKEENC
ECHO WHATRUS -s %1
whatrus -s %1
REM Branching starts from the largest number because
REM 'IF ERRORLEVEL = N' actually means
REM 'IF ERRORLEVEL >= N'.
IF ERRORLEVEL = 255 GOTO WRERR
IF ERRORLEVEL = 14 GOTO MACENC
IF ERRORLEVEL = 13 GOTO WINENC
IF ERRORLEVEL = 12 GOTO KOIENC
IF ERRORLEVEL = 11 GOTO ALTENC
ECHO encoding not recognized
EXIT
:WRERR
ECHO whatrus failed
EXIT
REM convert file 'index' to 'index.html'.
:ALTENC
ECHO RUSCONV -close -alt +win -ext html index
rusconv -close -alt +win -ext html index
EXIT
:KOIENC
ECHO RUSCONV -close -koi +win -ext html index
rusconv -close -koi +win -ext html index
EXIT
:MACENC
ECHO RUSCONV -close -mac +win -ext html index
rusconv -close -mac +win -ext html index
EXIT
:WINENC
ECHO RUSCONV -close -win +win -ext html index
rusconv -close -win +win -ext html index
EXIT
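The reason for testing the largest return code first can be checked with a small Python model of the batch file's branching. It assumes only what the script's comment states, namely that an ERRORLEVEL test succeeds when the return code is greater than or equal to N; the function name `first_matching_label` is made up for the illustration.

```python
def first_matching_label(errorlevel, branches):
    """Return the label of the first test that succeeds, or None.

    branches is a list of (n, label) pairs in script order; each test
    succeeds when errorlevel >= n, like an ERRORLEVEL test in a batch file.
    """
    for n, label in branches:
        if errorlevel >= n:
            return label
    return None


# Order used in makeindex.bat: largest number first.
DESCENDING = [(255, "WRERR"), (14, "MACENC"), (13, "WINENC"),
              (12, "KOIENC"), (11, "ALTENC")]
# The same tests in the wrong (ascending) order.
ASCENDING = list(reversed(DESCENDING))

right = first_matching_label(13, DESCENDING)
wrong = first_matching_label(13, ASCENDING)
```

For return code 13 the descending order selects WINENC, while the ascending order stops at the first test and wrongly selects ALTENC.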
UNIX version for bash, makeindex.sh:
# Copy source file to file with name 'index'.
rm -f index
cp "$1" index
if [ ! -f index ]
then
echo copy failed
exit
fi
# Guess encoding and convert file to 'index.html'
whatrus "$1"
case $? in
255) echo error executing whatrus;;
0) echo "can't detect encoding";;
11) rusconv -alt +win -ext html index;;
12) rusconv -koi +win -ext html index;;
13) rusconv -win +win -ext html index;;
14) rusconv -mac +win -ext html index;;
esac
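Both scripts use the same mapping from the return code of whatrus to the source flag of rusconv. The Python sketch below writes this mapping as a table. The codes 11 to 14, 0 and 255 are taken from the scripts above; the names `SOURCE_FLAG` and `rusconv_command` are invented for the example, and the sketch only builds the command line without running anything.

```python
# Return codes of whatrus as used in the scripts above.
SOURCE_FLAG = {11: "-alt", 12: "-koi", 13: "-win", 14: "-mac"}


def rusconv_command(code, name="index"):
    """Build the rusconv command line for a whatrus return code.

    Returns None when the encoding was not recognized and raises
    RuntimeError when whatrus itself failed (code 255).
    """
    if code == 255:
        raise RuntimeError("error executing whatrus")
    flag = SOURCE_FLAG.get(code)
    if flag is None:
        return None
    return ["rusconv", flag, "+win", "-ext", "html", name]


koi_command = rusconv_command(12)
```

For return code 12 this gives the same command as the KOI branch of the scripts: rusconv -koi +win -ext html index.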
Document created by Oleg A. Paraschenko
Last changes - 15 November 1998