Updating README.md

Alex Yatskov 2017-02-12 11:13:10 -08:00
parent f122223809
commit 992a845631

@@ -31,25 +31,28 @@ take this intermediate data and store it in a reasonable, industry standard repr
 The instructions below are for building on Linux, Mac OS X, and Windows, provided your environment is already configured
 for development:
 
-1. Clone the `https://github.com/FooSoft/zero-epwing` repository.
-2. Initialize (`git submodule init`) and update (`git submodule update`) the project submodules.
-3. Switch to the `eb` directory and execute `./configure --disable-shared --disable-ebnet --disable-nls`.
-4. While still in the `eb` directory, build the modified library by executing `make`.
-5. Switch to the `jansson` directory then execute `cmake .` and `make`.
-6. From the project root directory, execute `cmake .` and `make`.
+1. Clone the repository by executing `git clone https://github.com/FooSoft/zero-epwing`.
+2. Initialize the git submodules by executing `git submodule init`.
+3. Pull down the submodule data by executing `git submodule update`.
+4. Switch to the `eb` directory and execute `./configure --disable-shared --disable-ebnet --disable-nls`.
+5. While still in the `eb` directory, build the modified library by executing `make`.
+6. Switch to the `jansson` directory then execute `cmake .` and `make`.
+7. From the project root directory, execute `cmake .` and `make`.
 
 ## Usage ##
 
 Zero-EPWING takes a single parameter, the directory of the EPWING dictionary to dump. It also supports the following
 optional flags:
 
-* `--pretty` (`-p`): output pretty-printed JSON (useful for debugging).
+* `--entries` (`-e`): output dictionary entry data (most common option).
+* `--fonts` (`-f`): output output font bitmap data (useful for OCR).
 * `--markup` (`-m`): markup the output with as much metadata as possible.
 * `--positions` (`-s`): output *page* and *offset* data for each entry.
+* `--pretty` (`-p`): output pretty-printed JSON (useful for debugging).
 
-Upon loading and processing the provided EPWING data, Zero-EPWING will output a UTF-8 encoded JSON file to `stdout`.
-Information about errors will be printed to `stderr`; serious errors will result in this application returning a
-non-zero exit code. A sample of the JSON data output is pretty-printed below for reference:
+Upon loading and processing the requested EPWING data, Zero-EPWING will output a UTF-8 encoded JSON file to `stdout`.
+Diagnostic information about errors will be printed to `stderr`. Serious errors will result in this application
+returning a non-zero exit code. A sample of the JSON dictionary entry data output is pretty-printed below for reference.
 
 ```
 {
@@ -71,14 +74,15 @@ non-zero exit code. A sample of the JSON data output is pretty-printed below for
 ...
 ```
 
-You may have noticed the unusual-looking double curly bracket markers (such as `{{w_50035}}`). Remember what I mentioned
-about certain characters being represented by image files? There are two graphical fonts in each dictionary, narrow and
-wide. Whenever a character cannot be encoded as text, a glyph is used in its place. These font indices cannot be
-converted directly to characters, differ from one dictionary to another, and short of performing OCR, you must build
-tables to map these font indices to Unicode characters yourself. Zero-EWPING has no means to map these font glyphs to
-Unicode by itself, and instead places inline markers in the form of `{{w_xxxx}}` and `{{n_xxxx}}` in the output,
-specifying the referenced indices of the wide or narrow fonts respectively. The font glyphs can be dumped with the
-`ebfont` sample application distributed with `libeb`, making the creation of these mappings tedious but possible.
+You may have noticed the unusual-looking double curly bracket markers such as `{{w_50035}}`. Remember what I mentioned
+about certain characters being represented by image files? There are two graphical fonts sets in each dictionary: narrow
+and wide. Both of these font sets are available in four sizes: 24, 30, 36, and 48 pixels. Whenever a character cannot be
+encoded as text, a glyph is used in its place. These font indices cannot be converted directly to characters, differ
+from one dictionary to another, and must be manually mapped to Unicode character tables. Zero-EWPING has no facility to
+map these font glyphs to Unicode by itself, and instead places inline markers in the form of `{{w_xxxx}}` and
+`{{n_xxxx}}` in the output, specifying the referenced indices of the wide or narrow fonts respectively.
+
+The bitmaps for these font glyphs can be dumped by executing this application with the `--fonts` command line argument.
 
 ## License ##
 
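
Note on the glyph markers described in the second hunk: the README says the `{{w_xxxx}}` and `{{n_xxxx}}` indices must be mapped to Unicode by hand. Below is a minimal Python sketch of what applying such a hand-built table to the dumped text might look like. It is not part of Zero-EPWING. The function name `replace_glyph_markers`, the table layout (index string to character), and the sample mapping are all assumptions made for illustration, and the index is treated as an opaque alphanumeric string because the README does not specify its encoding.

```python
import re

# Matches markers such as {{w_50035}} or {{n_1234}}: "w" = wide font, "n" = narrow font.
# Assumption: the index is a run of alphanumeric characters, kept as an opaque string.
MARKER = re.compile(r"\{\{([wn])_([0-9A-Za-z]+)\}\}")


def replace_glyph_markers(text, wide_table, narrow_table):
    """Replace glyph markers with characters from hand-built tables.

    Markers without a table entry are left unchanged so they can be found later.
    """
    tables = {"w": wide_table, "n": narrow_table}

    def substitute(match):
        kind, index = match.group(1), match.group(2)
        # Fall back to the original marker text when no mapping exists.
        return tables[kind].get(index, match.group(0))

    return MARKER.sub(substitute, text)


# Hypothetical mapping for one dictionary; real tables differ per dictionary.
wide = {"50035": "\u00e9"}
print(replace_glyph_markers("caf{{w_50035}} {{n_1}}", wide, {}))
```

Leaving unmapped markers in place is a deliberate choice in this sketch: the remaining `{{...}}` strings show which glyphs still need a table entry.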