Four distinct steps are needed for ezPAARSE to be able to analyse a new content provider's platform. These steps can be performed by different people:
Information specific to the platform (test data, knowledge base) can be stored in an Excel spreadsheet based on this model.
This Excel file will be used to create csv test and knowledge base files (
This test file is intended to validate the proper functioning of the parser.
It is a CSV file formatted as follows: columns prefixed with in- contain the data to be sent to the parser, and those prefixed with out- contain the data that the parser is supposed to identify.
Note that parsers are independent from the knowledge bases. The test file must only contain data that is present in the URL or any other materials provided to the parser.
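The in-/out- convention can be illustrated with a minimal Python sketch. This is not ezPAARSE's actual test runner. The function names (`split_row`, `run_test_file`, `toy_parser`), the column names, the semicolon delimiter, the URL layout and the sample rows are all assumptions made for illustration, as is the choice to ignore empty out- cells.

```python
import csv
import io


def split_row(row):
    """Split one CSV row into parser input (in-*) and expected output (out-*)."""
    given = {k[len("in-"):]: v for k, v in row.items() if k.startswith("in-")}
    # Assumption: an empty out- cell means "nothing expected for this field".
    expected = {k[len("out-"):]: v for k, v in row.items()
                if k.startswith("out-") and v != ""}
    return given, expected


def run_test_file(handle, parser, delimiter=";"):
    """Return (line_number, expected, actual) for every row the parser gets wrong."""
    failures = []
    reader = csv.DictReader(handle, delimiter=delimiter)
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        given, expected = split_row(row)
        actual = parser(given)
        if actual != expected:
            failures.append((line_number, expected, actual))
    return failures


def toy_parser(given):
    """Hypothetical parser that only looks at the URL, never at a knowledge base."""
    url = given["url"]
    result = {}
    if url.endswith(".pdf"):
        result["rtype"] = "ARTICLE"
        result["mime"] = "PDF"
    # Hypothetical URL layout: /journal/<title_id>/...
    parts = url.split("/")
    if "journal" in parts:
        result["title_id"] = parts[parts.index("journal") + 1]
    return result


# Made-up test data: one out-* column per expected field, one in-* column per input.
SAMPLE = (
    "out-title_id;out-rtype;out-mime;in-url\n"
    "abc;ARTICLE;PDF;http://example.org/journal/abc/article1.pdf\n"
    "xyz;;;http://example.org/journal/xyz/toc\n"
)

failures = run_test_file(io.StringIO(SAMPLE), toy_parser)
print(failures)
```

Every expected value in the sample can be derived from the URL alone, which matches the rule that the test file must not rely on knowledge base data.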
plateform.version.csv.) that contains one line per analyzed resource and save it in the csv format
in-) and those of the recognized elements (
Every parser must be automatically testable.
The plateform.version.csv file is used for that purpose.
If you wish to launch the test manually, you can use the make test command.
Here is a schema of how the test works with the csv file:
Example of a test file:
ezPAARSE uses files called knowledge bases, named after this pattern:
These are text files formatted according to the KBART standard.
There is often one (or more) for each platform, but they are not needed when the parser is able to extract a normalized identifier (like an ISSN) directly from the URL.
You can find those KBART files in a specific folder structure, ezpaarse/platforms-kb/platform, following the same semantics as parsers.
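As a rough illustration of how a KBART file could complement a parser that only extracts a vendor-specific identifier, here is a minimal Python sketch. It is not ezPAARSE's own implementation. The helper names (`load_kbart`, `enrich`), the event dictionary and the sample row (including its identifiers) are hypothetical. The column names are standard KBART fields, and KBART files are tab-separated.

```python
import csv
import io


def load_kbart(handle, key_field="title_id"):
    """Index the rows of a KBART file (tab-separated, with a header) by one field."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[key_field]: row for row in reader if row.get(key_field)}


def enrich(event, knowledge_base):
    """Return a copy of the event with identifiers from the knowledge base, if any."""
    entry = knowledge_base.get(event.get("title_id"))
    if entry is None:
        return event  # unknown title: leave the event untouched
    enriched = dict(event)
    for field in ("publication_title", "print_identifier", "online_identifier"):
        enriched[field] = entry[field]
    return enriched


# Made-up KBART content, reduced to four of the standard columns.
SAMPLE_KBART = (
    "publication_title\tprint_identifier\tonline_identifier\ttitle_id\n"
    "Journal of Examples\t1234-5679\t2345-6789\tabc\n"
)

kb = load_kbart(io.StringIO(SAMPLE_KBART))
known = enrich({"title_id": "abc", "rtype": "ARTICLE"}, kb)
unknown = enrich({"title_id": "zzz", "rtype": "ARTICLE"}, kb)
print(known)
print(unknown)
```

When the parser already returns a normalized identifier such as an ISSN, this lookup step is unnecessary.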
The Publisher Knowledge Bases are useful for: