
rusconv v.3.11 tutorial.

 For historical reasons our country has no single standard encoding, so sometimes the contents of a file cannot be read. In these cases you need a special program that converts between encodings. We hope you choose rusconv.
 In the past the most popular encoding was the alternative encoding (code page 866), which was used in MS-DOS. Now the leader is the Windows encoding (code page 1251). The KOI-8 encoding is still important: it was very useful in the early stages of the growth of the Russian Internet. Very rarely you may find text written in the Macintosh encoding. There were many other encodings, but they have died out. Text written in one encoding cannot be read using another. To avoid this, some people write texts in latinica (transliteracija, volapjuk): Russian text spelled with Latin letters. Moreover, different operating systems code the end of a line differently. DOS and Windows code it with two characters, UNIX with one. So text written in UNIX appears as one long line in DOS or Windows.
 All these troubles can be solved by rusconv.

 


Printing help.

 Rusconv is a program with a lot of flags. If you forget some of them, rusconv itself can help. Simply run rusconv without any arguments or give the flag '-h'.
 Examples:

  DOS:
C:\UTIL>RUSCONV
C:\UTIL>rusconv /h
  UNIX:
$rusconv -h
$rusconv
 Using rusconv in Windows is more difficult than in other operating systems because rusconv is a command-line program. We recommend using a file manager such as Norton Commander (we prefer Windows Commander), which makes rusconv simpler to use in Windows. Alternatively, you can run the program from the "Start" menu: choose the item "Run...", type the full path to rusconv (better, use the "Browse..." button), add the flags and files, and press "OK".
 Like any other UNIX utility, the UNIX version is not verbose. It prints only a short list of flags. For more help try
$man rusconv
 The best way is to use the HTML documentation.

 


Converting a file from one encoding to another.

 Suppose you work in Windows, have found an old DOS program and want to recall how to run it, but "Notepad" shows unreadable text instead of the documentation. To read it, first convert it from the DOS encoding to the Windows one:

C:\GAMES\WARCRAFT>rusconv -alt +win read.me
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko  http://beta.math.spbu.ru/~prof/w_re/
.\read.me -> .\read.win: ok.
1 file(s) converted.
 After the conversion, a file with the same name but a different extension is created in the current directory (usually the directory from which you run rusconv; here, c:\games\warcraft). The extension shows which encoding the new file uses.
 Here is the list of extensions for the encodings:
 .alt - alternative encoding, used in DOS
 .koi - KOI-8 encoding, used in UNIX
 .lat - latinica, Russian text spelled with Latin letters
 .mac - Macintosh encoding
 .win - Windows encoding
 To specify the source encoding, use one of the flags
-alt, -koi, -mac or -win.
 Rusconv can't convert from latinica. To specify the target encoding, use one or more of the flags
+alt, +koi, +lat, +mac or +win.
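
 What a conversion between a source and a target encoding amounts to can be sketched in Python. This is only an illustration under stated assumptions, not rusconv's implementation: the codec names ('cp866', 'koi8_r', 'mac_cyrillic', 'cp1251') and the function name recode are choices of this sketch, and KOI-8 is assumed to mean the KOI8-R variant.

```python
# Assumed mapping from the tutorial's encoding names to Python codec names.
CODECS = {
    "alt": "cp866",         # alternative (DOS) encoding
    "koi": "koi8_r",        # assumption: the KOI8-R variant of KOI-8
    "mac": "mac_cyrillic",  # Macintosh encoding
    "win": "cp1251",        # Windows encoding
}


def recode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target one."""
    text = data.decode(CODECS[source])
    return text.encode(CODECS[target])


# A DOS-encoded text becomes readable as Windows text after recoding.
dos_data = "Читай меня".encode("cp866")
win_data = recode(dos_data, "alt", "win")
assert win_data.decode("cp1251") == "Читай меня"

# Without recoding, the same bytes read as Windows text are garbled.
assert dos_data.decode("cp1251") != "Читай меня"
```

 Each of these encodings stores one byte per letter, so the file size does not change; only the byte values do.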

 Like any other UNIX utility, the UNIX version of rusconv is not verbose. By default it prints only warnings and error messages. To print all messages, use the flag '-v'.
 The next example creates the file 'test.file', which contains the text "Проверка флага '-v'." ("Testing the '-v' flag."). This file is in the KOI-8 encoding. First we convert it to the Windows encoding without the flag '-v': the file 'test.win' is created, but rusconv prints no message. Then we convert the file to latinica using the flag '-v'. Finally, we check the contents of the file 'test.lat'.

$echo "Проверка флага '-v'." >test.file
$rusconv -koi +win test.file
$rusconv -v -koi +lat test.file
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko  http://beta.math.spbu.ru/~prof/w_re/
./test.file -> ./test.lat: ok.
1 file(s) converted.
$cat test.lat
Proverka flaga '-v'.
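
 Latinica can be pictured as a letter-by-letter substitution. The Python sketch below uses a hypothetical transliteration table; the table rusconv really uses is not given in this tutorial and may differ for some letters. The names TABLE and to_latinica are likewise inventions of this sketch.

```python
# Hypothetical transliteration table; rusconv's real table may differ.
TABLE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "jo",
    "ж": "zh", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "ch", "ш": "sh", "щ": "shh",
    "ъ": "", "ы": "y", "ь": "'", "э": "e", "ю": "ju", "я": "ja",
}


def to_latinica(text: str) -> str:
    """Spell Russian text with Latin letters, leaving other characters alone."""
    out = []
    for ch in text:
        latin = TABLE.get(ch.lower())
        if latin is None:
            out.append(ch)  # not a Russian letter: pass through unchanged
        elif ch.isupper():
            out.append(latin.capitalize())  # keep the capital letter
        else:
            out.append(latin)
    return "".join(out)


print(to_latinica("Проверка флага '-v'."))  # prints: Proverka flaga '-v'.
```

 Because several Russian letters need more than one Latin letter, the latinica file is usually longer than the source, and the conversion cannot be reversed reliably. This is consistent with rusconv not converting from latinica.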

 


Changing the type of line ends.

 Even if a text is written in latinica, there is no guarantee that it can be read on every operating system. In DOS and Windows the end of a line is coded with two characters, in UNIX with one.
 Suppose you work in DOS or Windows and have downloaded from the Internet a text file created in UNIX. Notepad shows this text as one long line with odd characters where the line breaks should be. To bring the text to its normal form, use the command

C:\NEW>rusconv -cr2crlf readme.txt
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko  http://beta.math.spbu.ru/~prof/w_re/
.\readme.txt -> .\readme.crlf: ok.
1 file(s) converted.
 The result of the conversion is placed in a file with the same name and the extension .crlf (UNIX and Windows) or .crl (DOS). In this example it is 'readme.crlf'.
 In UNIX the wrong format leads to another problem: text editors show an extra character at the end of each line, and some programs cannot be compiled. To solve this problem, use the flag '-crlf2cr':
$rusconv -crlf2cr -v files.bbs
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko  http://beta.math.spbu.ru/~prof/w_re/
./files.bbs -> ./files.cr: ok.
1 file(s) converted.
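
 The two line-end conversions can be sketched in Python as plain byte replacements. The sketch assumes that the single-character UNIX line end is the line feed byte (0x0A) and that the DOS/Windows line end is the pair carriage return plus line feed (0x0D 0x0A). The function names are hypothetical, and whether rusconv normalizes already converted lines the same way is not stated in this tutorial.

```python
def unix_to_dos(data: bytes) -> bytes:
    """Make every line end two bytes (CR LF), as '-cr2crlf' is described to do."""
    # Normalize first so that lines already ending in CR LF are not doubled.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def dos_to_unix(data: bytes) -> bytes:
    """Make every line end one byte (LF), as '-crlf2cr' is described to do."""
    return data.replace(b"\r\n", b"\n")


unix_text = b"first line\nsecond line\n"
dos_text = unix_to_dos(unix_text)
assert dos_text == b"first line\r\nsecond line\r\n"
assert dos_to_unix(dos_text) == unix_text
```

 The extra character that UNIX editors show at the end of each line of a DOS file is the leftover carriage return.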

 


Flag abbreviations for the most common tasks.

 Every operating system has its own way of storing Russian text files.

 For a correct conversion you should take into account both the encoding and the type of line ends. Here is the minimal set of flags for each case:
  From  DOS      to  UNIX     : -alt +koi -crlf2cr
  From  UNIX     to  DOS      : -koi +alt -cr2crlf
  From  Windows  to  UNIX     : -win +koi -crlf2cr
  From  UNIX     to  Windows  : -koi +win -cr2crlf
  From  DOS      to  Windows  : -alt +win
  From  Windows  to  DOS      : -win +alt
 Probably the most common tasks are converting text from UNIX to DOS, from UNIX to Windows, and back. Converting between the DOS and Windows styles is usually unnecessary: for Windows texts you can use Notepad, and for DOS texts you can use old DOS file managers like Norton Commander.
 Typing these sets of flags every time is inconvenient, so you can use abbreviations:
  -dos2unix  -  the same as '-alt +koi -crlf2cr'
  -unix2dos  -  the same as '-koi +alt -cr2crlf'
  -win2unix  -  the same as '-win +koi -crlf2cr'
  -unix2win  -  the same as '-koi +win -cr2crlf'
 These abbreviations are useful, but they are still rather long, so you can shorten them:
  -d2u  -  the same as '-dos2unix'
  -u2d  -  the same as '-unix2dos'
  -w2u  -  the same as '-win2unix'
  -u2w  -  the same as '-unix2win'
rusconv -w2u index.html
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko  http://beta.math.spbu.ru/~prof/w_re/
.\index.html -> .\index.koi: ok.
1 file(s) converted.
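
 The two levels of abbreviation can be read as a simple lookup. The Python sketch below expands a short flag to its long form and then to the minimal flag set, using only the equivalences listed in this section. The names LONG_FORMS, SHORT_FORMS and expand are hypothetical, and the sketch says nothing about how rusconv itself parses flags.

```python
# Long abbreviations and the flag sets they stand for (from the lists above).
LONG_FORMS = {
    "-dos2unix": ["-alt", "+koi", "-crlf2cr"],
    "-unix2dos": ["-koi", "+alt", "-cr2crlf"],
    "-win2unix": ["-win", "+koi", "-crlf2cr"],
    "-unix2win": ["-koi", "+win", "-cr2crlf"],
}

# Short abbreviations and the long ones they stand for.
SHORT_FORMS = {
    "-d2u": "-dos2unix",
    "-u2d": "-unix2dos",
    "-w2u": "-win2unix",
    "-u2w": "-unix2win",
}


def expand(args):
    """Replace every abbreviation in an argument list by its full flag set."""
    expanded = []
    for arg in args:
        arg = SHORT_FORMS.get(arg, arg)            # -w2u -> -win2unix
        expanded.extend(LONG_FORMS.get(arg, [arg]))  # -win2unix -> three flags
    return expanded


print(expand(["-w2u", "index.html"]))
# prints: ['-win', '+koi', '-crlf2cr', 'index.html']
```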

 


File overwriting.

 When converting, rusconv creates new files. But sometimes you don't need them, or you only want to change the encoding of the given file itself. In this case use the flag '-o'. Rusconv then first creates a temporary file that receives the results of the recoding and afterwards moves this temporary file into the place of the source. If an error occurs, the source file is left unchanged and the temporary file contains the text converted before the error.
 Example:

D:\HTML>rusconv -o -w2u index.html
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko  http://beta.math.spbu.ru/~prof/w_re/
.\index.html -> D:\HTML\rcA290.TMP -> .\index.html: ok.
1 file(s) converted.
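
 The pattern described above (write to a temporary file, then move it over the source) can be sketched in Python. This is an illustration of the general technique, not rusconv's code. The function name recode_in_place and the codec names are assumptions of the sketch, and the "rc" prefix and ".TMP" suffix only imitate the temporary name seen in the transcript.

```python
import os
import tempfile


def recode_in_place(path, source_codec, target_codec):
    """Recode a file via a temporary file in the same directory."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives next to the source so the final move stays
    # on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="rc", suffix=".TMP")
    with os.fdopen(fd, "wb") as tmp, open(path, "rb") as src:
        text = src.read().decode(source_codec)
        tmp.write(text.encode(target_codec))
    # Reached only if no error occurred above, so a failed conversion
    # leaves the source file untouched and the temporary file in place.
    os.replace(tmp_path, path)
```

 A call such as recode_in_place("index.html", "cp1251", "koi8_r") would correspond to the encoding part of the example above; the line-end change that '-w2u' also implies is left out of this sketch.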

 


Converting to several encodings and specifying your own file extensions.

 Sometimes, especially when you create a web site, a file must be converted to several encodings. For example, you write HTML pages in DOS, but your home page is published in the Windows and KOI encodings. You can run rusconv twice, but it is better to do this:

C:\HTML>rusconv -alt +koi +win index-pre.html
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko  http://beta.math.spbu.ru/~prof/w_re/
.\index-pre.html -> .\index-pre.koi, .\index-pre.win: ok.
1 file(s) converted.
 The results are placed in files with the same names and the default extensions:
 .alt - for alternative encoding
 .koi - for KOI-8 encoding
 .lat - for latinica
 .mac - for Macintosh encoding
 .win - for Windows encoding
 The default extensions are often inconvenient. In that case you can define your own extension with these commands:
 -aext extension - for alternative encoding
 -kext extension - for KOI-8 encoding
 -lext extension - for latinica
 -mext extension - for Macintosh encoding
 -wext extension - for Windows encoding
 For example, you have typed the Russian alphabet in the Windows encoding and want to see how it looks in all the other encodings. Moreover, you want the results to be in text files. No problem:
E:\EX>dir

 folder E:\EX
.              <FOLDER>      24.10.98  15:27 .
..             <FOLDER>      24.10.98  15:27 ..
ALPHABET TXT            66  02.10.98  13:15 alphabet.txt

E:\EX>rusconv -win +alt -aext alt.txt +koi -kext koi.txt
 +lat -lext lat.txt +mac -mext mac.txt +win -wext win.txt alphabet.txt

** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko  http://beta.math.spbu.ru/~prof/w_re/
.\alphabet.txt -> .\alphabet.alt.txt, .\alphabet.koi.txt,
.\alphabet.mac.txt, .\alphabet.lat.txt, .\alphabet.win.txt: ok.
1 file(s) converted.

E:\EX>dir

 folder E:\EX
.              <FOLDER>      24.10.98  15:27 .
..             <FOLDER>      24.10.98  15:27 ..
ALPHAB~1 TXT            66  24.10.98  16:24 alphabet.alt.txt
ALPHAB~2 TXT            66  24.10.98  16:24 alphabet.koi.txt
ALPHAB~3 TXT            66  24.10.98  16:24 alphabet.mac.txt
ALPHAB~4 TXT            82  24.10.98  16:24 alphabet.lat.txt
ALPHAB~5 TXT            66  24.10.98  16:24 alphabet.win.txt
ALPHABET TXT            66  02.10.98  13:15 alphabet.txt
 To change an extension you can use one of the commands '-aext', '-kext', '-lext', '-mext' or '-wext'. But if you convert to only one encoding, it is better to use the command
 -ext extension
 Depending on the target encoding, this command is interpreted as one of the commands '-aext extension', '-kext extension', '-lext extension', '-mext extension' or '-wext extension'.
 With the command '-ext' you can also redefine the default extensions '.cr' and '.crlf' when you change only the type of line ends:
E:\EX>rusconv -cr2crlf unixtext
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko  http://beta.math.spbu.ru/~prof/w_re/
.\unixtext -> .\unixtext.crlf: ok.
1 file(s) converted.

E:\EX>rusconv -cr2crlf -ext txt unixtext
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko  http://beta.math.spbu.ru/~prof/w_re/
.\unixtext -> .\unixtext.txt: ok.
1 file(s) converted.
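
 The naming rule seen in the transcripts of this section (drop the last extension of the source name, then append the default or the user-defined extension) can be sketched in Python. The rule is inferred from the examples only, and the names DEFAULT_EXT and output_name are hypothetical.

```python
import os

# Default extensions named in this tutorial, keyed by target.
DEFAULT_EXT = {
    "alt": "alt", "koi": "koi", "lat": "lat", "mac": "mac", "win": "win",
    "cr": "cr", "crlf": "crlf",
}


def output_name(source, target, custom_ext=None):
    """Build the result file name from the source name and the extension."""
    stem, _old_ext = os.path.splitext(source)  # 'read.me' -> 'read'
    ext = custom_ext if custom_ext is not None else DEFAULT_EXT[target]
    return stem + "." + ext


assert output_name("read.me", "win") == "read.win"
assert output_name("alphabet.txt", "alt", "alt.txt") == "alphabet.alt.txt"
assert output_name("unixtext", "crlf") == "unixtext.crlf"
assert output_name("unixtext", "crlf", "txt") == "unixtext.txt"
```

 The sketch ignores the DOS limit that shortens '.crlf' to '.crl'.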

 


Converting several files simultaneously.

 When you want to convert a group of files from one encoding to another, it is convenient to convert the whole group at once. To do this, simply write all the file names after the flags. You can use metacharacters (wildcards): rusconv will find all matching files. In the UNIX version use metacharacters with caution; see "Specifying output directory" for more information.

C:\HTML>rusconv -alt +koi +win -kext koi.html -wext win.html *.txt
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko  http://beta.math.spbu.ru/~prof/w_re/
.\rusconv.txt -> .\rusconv.koi.html, .\rusconv.win.html: ok.
.\readme.txt -> .\readme.koi.html, .\readme.win.html: ok.
.\index.txt -> .\index.koi.html, .\index.win.html: ok.
3 file(s) converted.

 


Specifying output directory.

 By default, the result files are created in the current directory. Usually this is the directory from which you run rusconv. But if the last argument is a directory name, the files are created in that directory.
 In UNIX use metacharacters with caution. There, expanding metacharacters is the job of the shell, and the program receives a ready-made list of arguments. There is no guarantee that the last argument is not a directory. So do not forget to specify the output directory:

 Contents of the current directory:
$ls -l
-rwxr-xr-x   1 w_re     w_re        21394 Oct 25 02:27 file1.html
-rwxr-xr-x   1 w_re     w_re        21394 Oct 25 02:27 file2.html
drwxr-xr-x   2 w_re     w_re         1024 Oct 25 02:27 res

 A possible error:
$rusconv -v -w2u *
// After expansion:
// rusconv -v -w2u file1.html file2.html res
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko  http://beta.math.spbu.ru/~prof/w_re/
./file1.html -> res/file1.koi: ok.
./file2.html -> res/file2.koi: ok.
2 file(s) converted.

 To create the files in the current directory:
$rusconv -v -w2u * .
// After expansion:
// rusconv -v -w2u file1.html file2.html res .
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko  http://beta.math.spbu.ru/~prof/w_re/
warning: 'res' is a directory, skipping.
./file1.html -> ./file1.koi: ok.
./file2.html -> ./file2.koi: ok.
2 file(s) converted.
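
 The rule "a trailing directory name is the output directory" and the resulting pitfall can be sketched in Python. The sketch follows the behavior shown in the two transcripts above; the function name split_output_dir is hypothetical and this is not rusconv's own logic.

```python
import os


def split_output_dir(paths):
    """Split a list of path arguments into (input files, output directory, skipped).

    If the last argument is a directory, it is taken as the output
    directory; otherwise the current directory is used. Directories
    among the remaining arguments are skipped.
    """
    if paths and os.path.isdir(paths[-1]):
        out_dir, candidates = paths[-1], paths[:-1]
    else:
        out_dir, candidates = ".", paths
    files = [p for p in candidates if not os.path.isdir(p)]
    skipped = [p for p in candidates if os.path.isdir(p)]
    return files, out_dir, skipped
```

 With the directory listing above, the arguments file1.html, file2.html and res give the output directory 'res', while adding a final '.' gives the current directory and reports 'res' as skipped.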

 


Using long file names and network files.

 Version 3.0 of rusconv was released for DOS and UNIX. To support long file names, version 3.11 adds a release for Windows. Because it uses new operating system functions, this version cannot run on a computer without Windows 95/98. In return, you can use long file names and network files.
 To convert a file with a space in its name, use quotes:

C:\HTML>rusconv -win +alt "long file name.txt"
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko  http://beta.math.spbu.ru/~prof/w_re/
.\long file name.txt -> .\long file name.alt: ok.
1 file(s) converted.
 In most file managers for Windows, such as Norton Commander, pressing Ctrl+Enter adds the file name to the command line. If the file name contains spaces, they automatically surround it with quotes. If yours does not, change your file manager. We recommend Windows Commander.
 Working in a local Windows network you can (if you have the rights) convert files on other computers without mapping a drive. To do so, use universal file names (\\server\resource\file):
rusconv -w2u -ext html \\comp\c\html\*.html "\\comp\c\html\koi version"
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko  http://beta.math.spbu.ru/~prof/w_re/
\\comp\c\html\tutorial.html -> \\comp\c\html\koi version\tutorial.html: ok.
\\comp\c\html\index.html -> \\comp\c\html\koi version\index.html: ok.
\\comp\c\html\errors.html -> \\comp\c\html\koi version\errors.html: ok.
3 file(s) converted.

 


Other flags.

 Now it is time to describe the flags '--', '-s', '-v', '-close' and '-noclose'. They are usually used in command scripts.

  --     end of flags
 Rusconv scans the command line from left to right. The first argument that is not a flag starts the list of files. Rusconv treats an argument as a flag if its first character is '-' or '+' (or '/' in the DOS and Windows versions). Sometimes you need to stop the flag parsing. To do this, use the characters '--'. Everything after them is the file list.
 Suppose a file named '-file.txt' must be converted from the Windows encoding to the KOI-8 encoding:
 Error:
E:\EX>rusconv -win +koi -file.txt
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko  http://beta.math.spbu.ru/~prof/w_re/
error: unrecognized flag '-file.txt'.
try 'rusconv -h' or read the manual for help.

 Success:
rusconv -win +koi -- -file.txt
** rusconv -- convertor of Russian codepages, v.3.11.
** (c)w_re -- Oleg A. Paraschenko  http://beta.math.spbu.ru/~prof/w_re/
.\-file.txt -> .\-file.koi: ok.
1 file(s) converted.
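
 The scanning rule described for '--' can be sketched in Python. This is an illustration only: the function name split_args and the set VALUE_FLAGS are hypothetical, and treating the '-ext' family as flags that consume the following argument is an assumption made so that the earlier examples (such as '-ext txt unixtext') split as expected.

```python
# Assumption: these flags take the next argument as their value.
VALUE_FLAGS = {"-ext", "-aext", "-kext", "-lext", "-mext", "-wext"}


def split_args(args, slash_is_flag=False):
    """Split command-line arguments into (flags, files)."""
    # '/' starts a flag only in the DOS and Windows versions.
    prefixes = ("-", "+", "/") if slash_is_flag else ("-", "+")
    flags = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "--":
            return flags, args[i + 1:]  # everything after '--' is a file
        if not arg.startswith(prefixes):
            return flags, args[i:]      # first non-flag starts the file list
        flags.append(arg)
        if arg in VALUE_FLAGS and i + 1 < len(args):
            flags.append(args[i + 1])   # keep the flag's value with it
            i += 1
        i += 1
    return flags, []


# Without '--' the file name is taken for a flag and no files remain.
assert split_args(["-win", "+koi", "-file.txt"]) == (["-win", "+koi", "-file.txt"], [])
# With '--' the same name lands in the file list.
assert split_args(["-win", "+koi", "--", "-file.txt"]) == (["-win", "+koi"], ["-file.txt"])
```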
  -s     silent mode, no messages are printed
  -v     verbose mode, all messages are printed
 The flag '-s' suppresses the printing of messages. Conversely, the flag '-v' makes rusconv talkative. If you specify both '-s' and '-v', an error message is printed. The DOS and Windows versions are talkative by default. The UNIX version by default prints only warnings and error messages.
  -close     close rusconv's window after the program finishes
  -noclose   do not close the window
 The flags '-close' and '-noclose' are used only in the Windows version; the DOS and UNIX versions ignore them. Windows runs rusconv in a separate window, which is closed when the program finishes. To prevent this and let the user see the report, rusconv waits for a key press after all files are converted (this is '-noclose', the default). The flag '-close' changes this behavior: with it, rusconv exits as soon as all files are converted. If you specify both '-close' and '-noclose', an error message is printed.

 


How to recognize a file's encoding.

 To recognize the encoding of a file, use the program whatrus, which is distributed with rusconv.

C:\UTIL>whatrus \\comp\c\html\index.html
WIN detected.
 You can't specify several file names. If you don't need the message and only want the return code, use the flag '-s'.
 Windows runs whatrus in a separate window, which is closed when the program finishes. To keep the window open and let the user see the result, whatrus waits for a key press after recognition. To avoid this, use the flag '-s'.

 


Using rusconv in command scripts.

 If you are going to use rusconv in command scripts, consider this advice.

 Using the return codes of rusconv and whatrus makes your script more intelligent. Here is an example of a command script. It takes any file and converts it to the file index.html in the Windows encoding.
Windows version, makeindex.bat:

@ECHO OFF

REM  Copy source file to file with name 'index'.
ECHO COPY %1 index
copy %1 index
IF EXIST index GOTO TAKEENC
ECHO copy failed
EXIT

REM  Guess encoding
:TAKEENC
ECHO WHATRUS -s %1
whatrus -s %1

REM  Branching starts from the big numbers because
REM  'IF ERRORLEVEL = N' in fact means
REM  'IF ERRORLEVEL >= N'.
IF ERRORLEVEL = 255 GOTO WRERR
IF ERRORLEVEL =  14 GOTO MACENC
IF ERRORLEVEL =  13 GOTO WINENC
IF ERRORLEVEL =  12 GOTO KOIENC
IF ERRORLEVEL =  11 GOTO ALTENC
ECHO encoding not recognized
EXIT

:WRERR
ECHO whatrus failed
EXIT

REM  Convert file 'index' to 'index.html'.
:ALTENC
ECHO RUSCONV -close -alt +win -ext html index
rusconv -close -alt +win -ext html index
EXIT
:KOIENC
ECHO RUSCONV -close -koi +win -ext html index
rusconv -close -koi +win -ext html index
EXIT
:MACENC
ECHO RUSCONV -close -mac +win -ext html index
rusconv -close -mac +win -ext html index
EXIT
:WINENC
ECHO RUSCONV -close -win +win -ext html index
rusconv -close -win +win -ext html index
EXIT


UNIX version for bash, makeindex.sh:

# Copy the source file to a file named 'index'.
rm -f index
cp "$1" index
if [ ! -f index ]
then
	echo copy failed
	exit 1
fi

# Guess the encoding and convert the file to 'index.html'.
whatrus "$1"
case $? in
	255) echo error executing whatrus;;
	  0) echo "can't detect encoding";;
	 11) rusconv -alt +win -ext html index;;
	 12) rusconv -koi +win -ext html index;;
	 13) rusconv -win +win -ext html index;;
	 14) rusconv -mac +win -ext html index;;
esac
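
 The same dispatch on the whatrus return code can also be sketched in Python. The code values (11, 12, 13, 14, 255) are taken from the two scripts above. The function names are hypothetical, and make_index assumes that whatrus and rusconv are installed and found on the PATH and that the file 'index' has already been copied. The return codes of rusconv itself are not listed in this tutorial, so the sketch passes that value back without interpreting it.

```python
import subprocess

# whatrus return codes and the matching source-encoding flags (from the scripts above).
SOURCE_FLAG = {11: "-alt", 12: "-koi", 13: "-win", 14: "-mac"}


def conversion_command(code, filename="index"):
    """Return the rusconv command for a whatrus return code.

    Returns None if the encoding was not recognized and raises
    RuntimeError if whatrus itself failed (code 255).
    """
    if code == 255:
        raise RuntimeError("whatrus failed")
    flag = SOURCE_FLAG.get(code)
    if flag is None:
        return None
    return ["rusconv", flag, "+win", "-ext", "html", filename]


def make_index(source):
    """Detect the encoding of 'source' and convert the copy named 'index'."""
    code = subprocess.run(["whatrus", "-s", source]).returncode
    command = conversion_command(code)
    if command is None:
        print("encoding not recognized")
        return None
    # rusconv's own return code is handed back uninterpreted.
    return subprocess.run(command).returncode
```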

 


Enjoy your work!


Document created by Oleg A. Paraschenko
Last changes - 15 November 1998