bpupgrade

bpupgrade — Poliqarp binary corpus converter

Synopsis

bpupgrade { -h | --help | -v | --version }

bpupgrade [option...] corpus-base-name

Background

Poliqarp 1.3 lifts the earlier limits on corpus size: it should be possible to build and process any reasonable corpus of up to 2G segments. Unfortunately, this required a change to the binary corpus format.

You can check the version of your corpus by inspecting the *.cdf file:

  • the absence of a *.cdf file indicates the old format;

  • a version = 1 string indicates the old format;

  • a version = 2 string indicates the new one.
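
The check above can be scripted. The following is only a sketch in Python, not part of Poliqarp: it assumes the descriptor is a plain-text file named corpus-base-name.cdf and that the version appears on a line of its own as version = N. The function name corpus_format is hypothetical.

```python
import re
from pathlib import Path

# Assumption: the descriptor lives at "<corpus-base-name>.cdf" and holds a
# plain-text "version = N" entry on its own line; this page does not
# confirm either detail.
VERSION_RE = re.compile(r"^\s*version\s*=\s*(\d+)\s*$", re.MULTILINE)


def corpus_format(base_name):
    """Return 'old' or 'new' for the corpus with the given base name."""
    cdf = Path(str(base_name) + ".cdf")
    if not cdf.is_file():
        return "old"  # no *.cdf file at all: old format
    text = cdf.read_text(encoding="utf-8", errors="replace")
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no version entry found in {cdf}")
    version = int(match.group(1))
    if version == 1:
        return "old"
    if version == 2:
        return "new"
    raise ValueError(f"unexpected version {version} in {cdf}")
```

A corpus for which corpus_format returns 'old' is a candidate for conversion with bpupgrade.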

Sakura, the underlying library, no longer supports the old format. However, a conversion utility, bpupgrade, is provided.

Description

bpupgrade converts a legacy (version 1) binary Poliqarp corpus to the new format (version 2).

Note that bpupgrade modifies corpora in place. Please back up your data before running it!
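
Because the conversion is done in place, it may help to copy the corpus files aside first. The sketch below, in Python, assumes that every file of a corpus shares the corpus-base-name. prefix; this page does not state that layout. The helper name backup_corpus and the backup directory are hypothetical.

```python
import glob
import shutil
from pathlib import Path


def backup_corpus(base_name, backup_dir):
    """Copy every regular file named '<base_name>.*' into backup_dir.

    Returns the list of destination paths that were written.
    """
    # Assumption: all files of a corpus share the "<corpus-base-name>."
    # prefix; adjust the pattern if your corpus is laid out differently.
    target = Path(backup_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    pattern = glob.escape(str(base_name)) + ".*"
    for src in sorted(glob.glob(pattern)):
        if Path(src).is_file():
            # copy2 also preserves timestamps and permission bits
            copied.append(str(shutil.copy2(src, target)))
    return copied
```

After checking that the returned list contains the files you expect, run bpupgrade on the original corpus-base-name.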

Options

-h, --help

Display help and exit.

-v, --version

Output version information and exit.

-q, --quiet

Be quiet; suppress progress information.