User Commands fsexam(1)
NAME
fsexam - examine the encoding of file names or contents and
convert them to UTF-8
SYNOPSIS
fsexamc [-a] [-b] [-d dry-run-result-file] [-E module-name]
[-e encoding-list] [-F] [-f 'expression'] [-g history-length]
[-H] [-k] [-L log-file] [-l] [-n] [-P] [-p] [-R] [-r] [-S]
[-s] [-t] [-w] [pathname...]
fsexamc [-V]
fsexamc [-?]
fsexam [-a] [-b] [-E module-name] [-e encoding-list] [-F]
[-f 'expression'] [-g history-length] [-H] [-k] [-L log-file]
[-l] [-n] [-P] [-p] [-R] [-r] [-S] [-s] [-t] [-w] [pathname...]
fsexam [-V]
fsexam [-?]
DESCRIPTION
The fsexam graphical user interface utility examines file
names or file contents and tries to convert them from legacy
encodings to UTF-8 using a given encoding list, the system
default encoding list, or both.
The fsexamc command is invoked the same way as fsexam, except
that it is a command-line interface utility.
When converting file names, fsexam processes regular file
names, directory names, and symbolic links by default. When
converting file contents, it handles only regular plain text
files by default. Use "-E module-name" to enable special file
handling.
fsexam ignores most files that are not plain text, such as
binary files, office documents, and image files. Forcing the
conversion of such files with the -F option might produce
unexpected results. Internally, fsexam uses the file(1)
utility to determine whether a file is a plain text file.
By default, fsexam converts file names. To convert file
contents instead, specify the -t option.
To help find the best encoding, fsexam has an encoding list
for each supported language. Each list includes the most
popular codesets or encodings of the corresponding language.
For example, fsexam specifies GB18030, BIG5, EUC-TW, and so on
for Simplified Chinese. The list is used to generate
conversion candidates. You can use the "-e encoding-list"
option to supply encodings other than the system pre-defined
ones. If the -a option is specified, additional encodings
suggested by the encoding auto-detection library are added
to the encoding list for possible use. Encodings specified
with the -e option have higher priority than automatically
detected encodings.
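How an ordered encoding list yields conversion candidates can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not fsexam's implementation: the helpers `conversion_candidates` and `batch_choice` are hypothetical, and Python codec names such as `gb18030` and `big5` stand in for the system encoding names. Each encoding is tried in list order, only those that decode the raw bytes without error become candidates, and non-interactive mode takes the first one.

```python
from typing import List, Optional, Tuple


def conversion_candidates(raw: bytes, encodings: List[str]) -> List[Tuple[str, str]]:
    """Return (encoding, text) for every encoding that decodes `raw` cleanly.

    Hypothetical helper. The result keeps the order of `encodings`, so
    higher-priority encodings (those earlier in the list) come first.
    """
    candidates = []
    for enc in encodings:
        try:
            text = raw.decode(enc)  # strict: any invalid byte sequence fails
        except (UnicodeDecodeError, LookupError):
            continue  # undecodable bytes or an unknown codec name: no candidate
        candidates.append((enc, text))
    return candidates


def batch_choice(raw: bytes, encodings: List[str]) -> Optional[str]:
    """Non-interactive mode: take the first successful candidate, if any."""
    found = conversion_candidates(raw, encodings)
    return found[0][1] if found else None
```

For a name stored as `"中文.txt".encode("gb18030")`, the list `["ascii", "gb18030", "big5"]` yields no `ascii` candidate, a `gb18030` candidate that restores the original text, and possibly a meaningless `big5` reading, which is why interactive mode shows every candidate for the user to choose from.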
OPTIONS
The following options are supported:
-a
--auto-detect
Enable encoding auto-detection. fsexam guesses
the encodings of file names or file contents
with the help of the encoding auto-detection
library interfaces. Use this option when you
do not know the encodings of the files. Note
that in file name conversions, auto-detection
may be unreliable because it is based on
statistics and file names contain only a small
number of characters.
-b
--batch
Batch mode, also known as non-interactive
mode. In this mode, fsexam does not display
candidates or wait for the user's selection
or confirmation.
Make sure your terminal can display UTF-8
characters properly when using this option.
Otherwise, illegible or garbled characters
may be displayed.
-d dry-run-result-file
Specifies the dry run result file. When used
with the -n option, fsexam stores the dry run
result in this file. When used without the -n
option, fsexam performs conversions according
to the scenario in the supplied dry run
result file.
The dry run result file is created if it does
not exist. If it exists as a regular file, it
is truncated to zero length and overwritten.
When fsexam has created a dry run result
file, you can edit the file and then feed it
back to fsexam to perform conversions based
on its edited content. Edit the file
carefully and preserve its format. The
recommended edit is to delete any wrong or
inappropriate candidates and to move the
correct one to the first position. For more
information, refer to fsexam(4).
If the edited file does not conform to the
file format described in fsexam(4), fsexam
prints a warning message and quits without
doing anything.
-E module-name
--enable-module module-name
Enable special file handling. Currently, the
only available module is "COMPRESS". "ALL"
can be used to enable all available modules.
The COMPRESS module supports several popular
compressed and archive file formats.
Currently, the module supports the .tar,
.tar.gz, .tar.bz2, .zip, and .tar.Z formats.
With the -t option, fsexam converts the
contents of files inside archived files,
compressed files, or both. Without -t, fsexam
converts file names instead.
Note that the COMPRESS module ignores
symbolic links inside archived or compressed
files. It also ignores the -n option. The
COMPRESS module handles compressed or
archived files only if the -R option is
specified. If the system has no suitable
ISO8859-1 codeset locale, this option is not
supported, as described in the NOTES section.
-e encoding-list
--encoding-list encoding-list
Specifies one or more colon- or comma-
separated encodings to be used during
conversion.
If neither this option nor the -a option is
specified, fsexam uses the system pre-defined
encoding list for the current locale.
If this option is specified without the -a,
-p, or -P options, the list of encodings
supplied with -e replaces the system
pre-defined encoding list for this session.
Use -p to prepend the list to (place it
before) the system pre-defined encoding list.
Use -P to append it to (place it after) the
pre-defined encoding list. To make the
encoding list permanent instead of applying
it only to the current session, use the -S
option.
When used with the -a option, fsexam merges
the supplied encoding list with the
auto-detected encoding list. The supplied
encoding list has higher priority than the
auto-detected encodings.
In non-interactive mode, fsexam uses the
first encoding that successfully converts the
file name or file content to UTF-8.
In interactive mode, fsexam displays every
candidate that was successfully converted to
UTF-8 from an encoding in the list. Encodings
from which fsexam cannot convert successfully
are not displayed in the list of candidates.
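The interplay of -e, -p, -P, and -a described above can be modeled with a minimal Python sketch. The function `effective_encoding_list` is hypothetical, not fsexam's code. Two points this page leaves open are assumptions here: that -e still replaces the pre-defined list when -a is also given, and that auto-detected encodings are placed after the pre-defined ones. Duplicate names are dropped case-insensitively, keeping the higher-priority occurrence.

```python
from typing import Iterable, List, Optional


def _dedupe(names: Iterable[str]) -> List[str]:
    """Drop repeated names, keeping the first (highest-priority) one."""
    seen = set()
    out = []
    for name in names:
        key = name.upper()  # treat "BIG5" and "big5" as the same encoding
        if key not in seen:
            seen.add(key)
            out.append(name)
    return out


def effective_encoding_list(
    predefined: List[str],
    supplied: Optional[List[str]] = None,   # argument of -e, if given
    mode: str = "replace",                  # "replace" (default), "prepend" (-p), "append" (-P)
    detected: Optional[List[str]] = None,   # encodings suggested by -a
) -> List[str]:
    """Hypothetical model of how the encoding-list options combine."""
    if supplied is None:
        base = list(predefined)             # no -e: system pre-defined list
    elif mode == "replace":
        base = list(supplied)               # -e alone replaces the list
    elif mode == "prepend":
        base = supplied + predefined        # -p: supplied list goes first
    elif mode == "append":
        base = predefined + supplied        # -P: supplied list goes last
    else:
        raise ValueError(f"unknown mode: {mode}")
    # -a: auto-detected encodings come last, so supplied ones keep priority.
    return _dedupe(base + (detected or []))
```

With a pre-defined list of GB18030, BIG5, and EUC-TW, `-e SJIS -p` gives SJIS first and `-e SJIS -P` gives SJIS last, matching the option descriptions.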
-F
--force-convert
Forced conversion mode. By default, fsexam
determines whether a file name or file
content is already in UTF-8 and, if it is,
does not convert it. However, since fsexam
has no completely accurate way to determine
whether a string is in UTF-8, a byte sequence
in a legacy encoding is sometimes treated as
a valid UTF-8 string. For example, three
Simplified Chinese characters in GB2312 (two
bytes per character) could be treated as two
valid UTF-8 characters (three bytes per
character). Use this option to bypass the
verification step and force the conversion.
Use this option with caution, and avoid
combining it with -R whenever possible. It
may convert file names or file contents that
are genuinely UTF-8 encoded into unintended
characters.
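The GB2312 example above can be reproduced with a short Python sketch that searches for six bytes which decode both as three GB2312 characters and as two three-byte UTF-8 characters. It uses only Python's built-in `gb2312` and `utf-8` codecs and illustrates why a UTF-8 validity check alone cannot rule out a legacy encoding; the function name `find_gb2312_utf8_ambiguity` is hypothetical, and this is not how fsexam itself performs the check.

```python
from itertools import product
from typing import Optional, Tuple


def find_gb2312_utf8_ambiguity() -> Optional[Tuple[bytes, str, str]]:
    """Find 6 bytes that are 3 GB2312 characters AND 2 UTF-8 characters.

    Byte ranges are chosen so each byte fits both patterns:
      byte 1: UTF-8 3-byte lead (0xE1-0xEF), also a GB2312 lead byte
      byte 2: UTF-8 continuation (0xA1-0xBF), also a GB2312 trail byte
      byte 3: continuation that is also a GB2312 lead byte (0xB0-0xBF)
      byte 4: UTF-8 lead that is also a GB2312 trail byte (0xE1-0xEF)
      bytes 5-6: continuations that also form one GB2312 character
    """
    for combo in product(range(0xE1, 0xF0), range(0xA1, 0xC0),
                         range(0xB0, 0xC0), range(0xE1, 0xF0),
                         range(0xB0, 0xC0), range(0xA1, 0xC0)):
        raw = bytes(combo)
        try:
            as_gb2312 = raw.decode("gb2312")
            as_utf8 = raw.decode("utf-8")
        except UnicodeDecodeError:
            continue  # not valid in one of the two encodings; keep searching
        if len(as_gb2312) == 3 and len(as_utf8) == 2:
            return raw, as_gb2312, as_utf8
    return None


if __name__ == "__main__":
    found = find_gb2312_utf8_ambiguity()
    if found:
        raw, gb, u8 = found
        print(raw.hex(), "GB2312:", ascii(gb), "UTF-8:", ascii(u8))
```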
-f 'expression'
--find-expression 'expression'
Search for files according to 'expression'.
The 'expression' here is a subset of the
expression used in find(1). Unlike with
find(1), however, the 'expression' must
include, as its first item, the path name of
the starting point of the directory hierarchy
to search. The path name can be followed by
the following primaries and their
combinations: -name, -amin, -atime, -cmin,
-ctime, -group, -mmin, -mtime, and -user.
Refer to find(1) for more information.
Internally, fsexam uses find(1) to perform
the search.
Enclose the whole expression in single
quotes. If you use double quotes, the shell
may expand special characters in it.
When this option is used, all operands are
ignored.
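As an illustration of the restricted expression syntax, the following Python sketch validates an -f expression and turns it into an argument vector for find(1). The helper `build_find_argv` and the default `/usr/bin/find` path are assumptions, and so is the reading of "combinations" as find(1)'s `!`, `-o`, and parenthesis operators; fsexam's own parser may accept or reject other forms.

```python
import shlex
from typing import List

# Primaries listed on this page; each takes exactly one argument.
ALLOWED_PRIMARIES = {"-name", "-amin", "-atime", "-cmin", "-ctime",
                     "-group", "-mmin", "-mtime", "-user"}
# Assumption: "combinations" means find(1)'s negation, OR, and grouping.
OPERATORS = {"!", "-o", "(", ")"}


def build_find_argv(expression: str, find_path: str = "/usr/bin/find") -> List[str]:
    """Validate an -f expression and return a find(1) argument vector."""
    tokens = shlex.split(expression)  # removes the inner double quotes
    if not tokens or tokens[0].startswith("-") or tokens[0] in OPERATORS:
        raise ValueError("expression must start with a path name")
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in ALLOWED_PRIMARIES:
            if i + 1 >= len(tokens):
                raise ValueError(f"{tok} requires an argument")
            i += 2  # skip the primary and its argument
        elif tok in OPERATORS:
            i += 1
        else:
            raise ValueError(f"unsupported item: {tok}")
    return [find_path] + tokens
```

Because the resulting list can be passed to `subprocess.run` without a shell, a pattern such as `*.txt` reaches find(1) unexpanded, which is the same concern the quoting advice above addresses.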
-g history-length
Set the history length. fsexam saves
information about what it has done and uses
that information to handle restore
operations.
By default, fsexam saves history information
for 100 fsexam executions as long as disk
space permits. A single batch conversion
counts as one execution. Use this option to
change the default value.
If you lower the length, older history
information is purged.
When the number of history entries reaches
the limit, fsexam discards the oldest history
information in order to record new history
information.
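The bounded history behaves like a ring buffer. The hypothetical `History` class below models it with `collections.deque(maxlen=...)`; it says nothing about how fsexam actually stores history on disk. Only the default of 100 entries, one entry per execution, and the purge-on-shrink and drop-oldest rules come from this page.

```python
from collections import deque
from typing import Deque, List


class History:
    """Hypothetical model of fsexam's bounded restore history."""

    def __init__(self, length: int = 100) -> None:  # 100 is the documented default
        self._entries: Deque[str] = deque(maxlen=length)

    def record(self, execution: str) -> None:
        # One fsexam execution (a whole batch run included) is one entry.
        # When the deque is full, appending drops the oldest entry.
        self._entries.append(execution)

    def set_length(self, length: int) -> None:
        # Rebuilding with a smaller maxlen keeps only the newest entries,
        # which purges the older history information.
        self._entries = deque(self._entries, maxlen=length)

    def entries(self) -> List[str]:
        return list(self._entries)
```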
-H
--hidden
Handles hidden files. Unless this option is
specified, hidden files, whose names start
with a dot (.), are ignored.
-k
--no-check-symlink-content
By default, during file name conversions, if
both a symbolic link and its source are among
the files supplied as operands, or lie under
a starting-point directory given as an
operand, fsexam tries to keep them
consistent. In other words, if a source name
is converted, fsexam converts not only the
name of the symbolic link itself, when
applicable, but also the content of the
symbolic link. If a source name is not
converted for some reason, the contents of
the corresponding symbolic links are not
converted either, and warning messages are
issued. If either the link or its source is
not among the operands, fsexam may break the
symbolic link.
This default symbolic link processing
requires more resources and computation time.
If you have no symbolic links, use the -k
option to bypass it.
During content conversions and dry run
conversions, fsexam does not process symbolic
link contents.
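A minimal Python sketch of the default link-consistency rule, assuming a POSIX system where `os.readlink` and `os.symlink` are available. The helper `rename_keeping_links` is hypothetical: it renames a source and rewrites the contents of the given links that pointed to it, keeping relative link contents relative. It does not reproduce fsexam's warnings or its handling of links outside the operands.

```python
import os
from typing import Iterable


def rename_keeping_links(src: str, new_name: str, links: Iterable[str]) -> None:
    """Rename `src` and retarget the listed symbolic links that point to it.

    `links` plays the role of the symbolic links found among the operands.
    Relative link contents are resolved against the link's own directory.
    """
    src_abs = os.path.abspath(src)
    new_abs = os.path.join(os.path.dirname(src_abs), new_name)
    # Find the affected links before anything is renamed.
    affected = []
    for link in links:
        link_dir = os.path.dirname(os.path.abspath(link))
        resolved = os.path.normpath(os.path.join(link_dir, os.readlink(link)))
        if resolved == src_abs:
            affected.append(link)
    os.rename(src_abs, new_abs)
    for link in affected:
        old_target = os.readlink(link)
        if os.path.isabs(old_target):
            new_target = new_abs
        else:
            link_dir = os.path.dirname(os.path.abspath(link))
            new_target = os.path.relpath(new_abs, link_dir)
        os.remove(link)            # removes the link itself, not its source
        os.symlink(new_target, link)
```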
-l
--list-encoding
List all available encodings supported by
fsexam.
-L log-file
--log-file log-file
If specified, fsexam writes a log to
log-file. By default, no log file is written.
The basic log file format is:
(category) fullpath: message
The possible "category" values are "ERROR",
"WARNING", and "INFO". "fullpath" is the full
path of the file being handled. "message"
briefly describes the result of the
operation.
If "fullpath" or "message" contains non-UTF-8
characters, fsexam writes their hexadecimal
byte values, each prefixed with "\x", such as
"\xAE\x89", into the file.
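The log line format and the "\x" escaping can be sketched in Python with a custom codec error handler. This assumes that only bytes that are not valid UTF-8 are escaped, while valid UTF-8 text is written as-is; the "\xAE\x89" example is consistent with that reading but does not settle it. The handler name `fsexam_hex` and the function `log_line` are hypothetical.

```python
import codecs
from typing import Union


def _hex_escape(err: UnicodeDecodeError):
    """Replace each undecodable byte with an uppercase \\xHH escape."""
    bad = err.object[err.start:err.end]
    return "".join("\\x%02X" % b for b in bad), err.end


codecs.register_error("fsexam_hex", _hex_escape)  # handler name is arbitrary


def log_line(category: str, fullpath: bytes, message: Union[bytes, str]) -> str:
    """Format one entry as "(category) fullpath: message"."""
    if category not in ("ERROR", "WARNING", "INFO"):
        raise ValueError(f"unknown category: {category}")
    path_text = fullpath.decode("utf-8", errors="fsexam_hex")
    if isinstance(message, bytes):
        message = message.decode("utf-8", errors="fsexam_hex")
    return f"({category}) {path_text}: {message}"
```

Here `log_line("WARNING", b"/tmp/\xae\x89", "not converted")` produces `(WARNING) /tmp/\xAE\x89: not converted`.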
-n
--dry-run
Dry run mode. In this mode, fsexam writes
conversion information into the
dry-run-result-file specified with the -d
option instead of actually converting the
file names or contents.
If used with the -a option, the
dry-run-result-file may contain more
candidates.
Note that compressed or archived files are
not supported in this mode, and consistency
between symbolic links and their sources is
not maintained.
-P
--append-encoding-list
When used with -e option, fsexam appends the
encoding-list to the system pre-defined
encoding list. Otherwise, it has no effect.
-p
--prepend-encoding-list
When used with -e option, fsexam prepends
the encoding-list to the system pre-defined
encoding list. Otherwise, it has no effect.
-R
--recursive
Recursive mode. In this mode, fsexam
recursively converts all applicable files and
subdirectories under the directories
specified as operands.
-r
--remote
With this option, fsexam also handles files
on NFS and other remote file systems.
Without this option, fsexam handles files on
local disks only.
While fsexam is running, do not mount or
unmount file systems within a directory
hierarchy that is being examined.
-S
--save-encoding-list
By default, the encoding-list argument of the
-e option is used only for the current
session. If this option is specified,
however, fsexam makes the encoding-list
argument permanent. This option may override
the default, system pre-defined encoding
list. If you do not want that to happen, use
it with -p or -P to prepend or append the
list, respectively.
-s
--restore
Restores file names to their original names.
To restore file contents, use this option
with the -t option.
This option is useful for restoring files to
their previous states after wrong conversions
have been made.
When this option is used on a file, fsexam
restores its name or content. When it is used
on a directory together with the -R option,
fsexam restores the directory and all files
and subdirectories under it to their original
names or contents.
-t
--conv-content
Converts file contents rather than file
names. fsexam handles mainly plain text
files.
Internally, fsexam uses file(1) to determine
whether a file is a plain text file.
If any files or directories have multibyte
characters in their names, convert the file
names before converting the contents.
Otherwise, you may get illegible characters
in your log-file or dry-run-result-file.
-w
--follow
If specified with -R, fsexam follows symbolic
links to directories as if they were normal
directories. If -R is not specified, fsexam
tries to convert only the symbolic links and
their sources. If a source is itself a
symbolic link, fsexam continues with that
link's source, and so on. By default, fsexam
does not follow symbolic links.
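Following a chain of links, as described above for -w without -R, can be sketched in Python. The hypothetical `link_chain` helper returns the link, its source, that source's source, and so on, resolving relative link contents against the directory that contains each link. The loop check and the depth limit of 40 are assumptions added so that a circular chain cannot run forever.

```python
import os
from typing import List


def link_chain(path: str, max_depth: int = 40) -> List[str]:
    """Return [path, its source, the source's source, ...] up to a non-link."""
    chain = [path]
    seen = {os.path.abspath(path)}
    current = path
    while os.path.islink(current):
        target = os.readlink(current)
        # An absolute target replaces the directory part entirely.
        current = os.path.normpath(os.path.join(os.path.dirname(current), target))
        key = os.path.abspath(current)
        if key in seen or len(chain) > max_depth:
            raise RuntimeError(f"symbolic link loop at {current}")
        seen.add(key)
        chain.append(current)
    return chain
```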
-V
--version
Print the version number of fsexam and halt.
-?
--help
Print usage information and halt.
OPERANDS
The following operand is supported:
pathname The pathname of a file or a directory to be
converted. All arguments after "--" are
treated as operands, even if they begin with
a '-' character. If fsexam encounters '-' as
an operand, or no operand at all, it reads
pathnames from the standard input.
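The operand rules can be sketched with Python's `getopt` module, which follows POSIX conventions: parsing stops at the first operand, and everything after "--" is an operand. The short-option string is taken from the fsexamc SYNOPSIS; `parse_command_line` is hypothetical, and whether fsexam permutes options and operands, or expects one pathname per line on standard input, are assumptions this page does not settle.

```python
import getopt
import sys
from typing import List, TextIO, Tuple

# Short options from the fsexamc SYNOPSIS; ":" marks options taking an argument.
SHORT_OPTS = "abd:E:e:Ff:g:HkL:lnPpRrSstwV"


def parse_command_line(argv: List[str],
                       stdin: TextIO = sys.stdin) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (options, pathnames) following the OPERANDS rules."""
    # getopt stops at the first operand and consumes a "--" separator, so
    # words after "--" come back as operands even if they start with "-".
    opts, operands = getopt.getopt(argv, SHORT_OPTS)
    if not operands or "-" in operands:
        # Assumption: standard input holds one pathname per line.
        from_stdin = [line.rstrip("\n") for line in stdin if line.strip()]
        operands = [p for p in operands if p != "-"] + from_stdin
    return opts, operands
```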
EXAMPLES
Example 1: Convert the name of a file
The following will convert the name of file "myfile" using
the system pre-defined encoding list:
example% fsexam myfile
If there is no pre-defined encoding list for the current
locale, fsexam exits with error messages.
Example 2: Recursively convert the names of files and sub-
directories under the directory "mydir" with the given
encoding list
example% fsexam -e GB18030:BIG5:EUC-TW --recursive mydir
Example 3: Dry run fsexam with auto-detected encoding
The following will scan the directory "mydir" and try to
convert file and directory names under the directory with
the system pre-defined plus auto-detected encodings to UTF-8
and store the result into the file, "mydryrunresult" without
actually changing the names:
example% fsexam --auto-detect --dry-run -d mydryrunresult \
--recursive mydir
Example 4: Perform scenario based conversions using a dry
run result file
The following will perform scenario based conversions by
using the "mydryrunresult." The first candidate for each
file name is used. If there is no candidate, no action will
be taken on the file:
example% fsexam -d mydryrunresult
Example 5: Forcibly convert a file name
The following converts the name of the file "myfile" using
the system pre-defined encodings even if fsexam judges the
name to be in UTF-8 already. This option should be used with
caution, as it may corrupt file names and contents that are
already in UTF-8:
example% fsexam --force-convert myfile
Example 6: Convert files generated by other utility
The following two examples have the same effect: each
converts the files found by the find(1) command, using the
system pre-defined and auto-detected encodings:
example% /usr/bin/find . -name "*.txt" | fsexam --auto-detect
example% fsexam --auto-detect `/usr/bin/find . -name "*.txt"`
The following is similar to the above two examples, except
that it uses the system pre-defined encodings only and takes
the files listed by the ls(1) utility:
example% /usr/bin/ls *.txt | fsexam
The following searches for all files whose names end in
'.txt' under the current directory and converts them using
the system pre-defined encoding list:
example% fsexam -f '. -name "*.txt"'
Example 7: Batch mode conversion
The following uses GB18030 and BIG5 to recursively convert
file names under the directory "mydir", using the first
candidate for each file name:
example% fsexam --batch -e GB18030:BIG5 --recursive mydir
Example 8: Follow symbolic links and handle hidden files
The following follows all symbolic links in the directory
"mydir", as well as symbolic links in the directories that
those links point to. Hidden files under the directory are
also converted:
example% fsexam --follow --hidden --recursive mydir
Example 9: Convert file contents recursively using specified
encoding list
The following recursively scans files under the directory
"mydir". For each plain text file, it automatically detects
the possible encodings, combines them with GB18030 and BIG5,
and tries to convert the file using the resulting encodings
one by one. Once a conversion succeeds, fsexam is done with
the file, and the remaining encodings are not tried. If a
file is a compressed or archived file, fsexam first
uncompresses and unarchives it into a temporary directory,
performs the above operation, compresses and archives the
result again, and replaces the original file:
example% fsexam --conv-content --recursive -e GB18030:BIG5 \
--auto-detect --enable-module COMPRESS mydir
Example 10: Restore a file name or a file content
The following restores the name of the file "myfile" to its
original name:
example% fsexam --restore myfile
The following restores the content of "myfile" to its
original content:
example% fsexam --conv-content --restore myfile
EXIT STATUS
The following exit values are returned:
0 File names or contents are converted successfully
or corresponding information is written to a dry
run result file successfully.
>0 An error occurred. More information can be
retrieved from the log file if the "-L log-file"
option is supplied.
ATTRIBUTES
See attributes(5) for descriptions of the following attributes:
ATTRIBUTE TYPE          ATTRIBUTE VALUE
Availability            SUNWfsexam
Interface stability     Committed
SEE ALSO
file(1), find(1), locale(1), tar(1), libautoef(3LIB),
fsexam(4)
NOTES
When you want to convert the names of many files, do not
convert them one by one in a loop. Instead, construct a list
of files and give the whole list to fsexam. For example, the
following is not recommended:
for file in *
do
fsexamc -b "$file"
done
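The recommended alternative, one fsexamc invocation for the whole list, is sketched below as a Python driver script (an assumption; in a shell the equivalent is to pass all names to a single fsexamc command). `build_command`, `convert_all`, and `convert_via_stdin` are hypothetical helpers; the "--" and "-" operands follow the OPERANDS section, and newline-separated names on standard input are an assumption.

```python
import subprocess
from typing import List, Sequence


def build_command(paths: Sequence[str], batch: bool = True) -> List[str]:
    """One fsexamc command line for all paths."""
    cmd = ["fsexamc"]
    if batch:
        cmd.append("-b")
    # "--" keeps names that begin with "-" from being parsed as options.
    return cmd + ["--"] + list(paths)


def convert_all(paths: Sequence[str]) -> int:
    """Run fsexamc once for every path; returns its exit status."""
    return subprocess.run(build_command(paths)).returncode


def convert_via_stdin(paths: Sequence[str]) -> int:
    """For very long lists: pass "-" and feed the names on standard input.

    Assumption: fsexamc reads one pathname per line from standard input.
    """
    names = "".join(p + "\n" for p in paths)
    return subprocess.run(["fsexamc", "-b", "-"], input=names, text=True).returncode
```

For example, `convert_all(sorted(glob.glob("*")))` replaces the loop above with a single run; `glob.glob("*")` skips dot files, as the shell's `*` does.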
It is highly recommended to run this utility in a UTF-8
locale. Otherwise, you may see illegible or garbled
characters. Because fsexam has system pre-defined lists of
the most popular encodings for every language, and for the
best multiscript capability, it works more smoothly if you
run it in the UTF-8 locale of your language.
As noted in the NOTES section of the tar(1) man page, if an
archive contains files whose names were created by processes
running in multiple or different locales, a locale that uses
a full 8-bit coding space (0x00 to 0xff), such as
en_US.ISO8859-1, should be used both to create the archive
and to extract files from it.
For that reason, when you specify the COMPRESS module with
the -E option, fsexam(1) tries to use the en_US.ISO8859-1,
fr_FR.ISO8859-1, de_DE.ISO8859-1, es_ES.ISO8859-1,
it_IT.ISO8859-1, or sv_SE.ISO8859-1 locale. If no such locale
exists on the current system, the -E option is ignored and a
warning message is issued.
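The following Python sketch illustrates both points: why an ISO8859-1 locale is safe for arbitrary file-name bytes (every value from 0x00 to 0xff decodes and re-encodes unchanged, which UTF-8 cannot guarantee), and how a program might pick the first available locale from the list above. `pick_archive_locale` and `locale_available` are hypothetical; probing with `locale.setlocale` is an assumption about how availability could be checked, not a description of fsexam.

```python
import locale
from typing import Callable, Iterable, Optional

# Locales named on this page, in the order listed.
ISO8859_1_LOCALES = ["en_US.ISO8859-1", "fr_FR.ISO8859-1", "de_DE.ISO8859-1",
                     "es_ES.ISO8859-1", "it_IT.ISO8859-1", "sv_SE.ISO8859-1"]


def full_8bit_round_trip() -> bool:
    """Every byte 0x00-0xFF survives an ISO8859-1 decode/encode round trip."""
    every_byte = bytes(range(256))
    return every_byte.decode("iso8859-1").encode("iso8859-1") == every_byte


def locale_available(name: str) -> bool:
    """True if setlocale() accepts `name`; the previous setting is restored."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


def pick_archive_locale(candidates: Iterable[str] = ISO8859_1_LOCALES,
                        is_available: Callable[[str], bool] = locale_available
                        ) -> Optional[str]:
    """First usable ISO8859-1 locale, or None (the case where -E is ignored)."""
    for name in candidates:
        if is_available(name):
            return name
    return None
```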
SunOS 5.11 Last change: 16 Apr 2007