The Configuration Manual

Introduction

The Pincette configuration resides in Pincette itself, in the folder /configuration/. It is loaded via the file /configuration/maps.xml. The configuration is under version control just like anything else in Pincette. The only difference is that the system always loads it through the configuration view, which is at /view/configuration. Whenever something changes under the configuration folder, at any level, as seen through the configuration view, the entire configuration is reloaded. When this fails for some reason the running configuration remains in force.

The configuration contains declarations of various modules, global environment settings and metadata declarations. Modules take care of tasks such as transforming documents from one format to another, extracting text for full-text search, preview extraction, metadata exchange, etc.

Using the Configuration View

By default the configuration view just selects the latest version of any resource. So whenever you check in some modification the configuration is reloaded. There are situations, however, where circular dependencies exist between certain resources, which make it impossible to reach a consistent state that allows a proper reload. Changing individual resources in the configuration also makes it very hard to return to a previous consistent situation.

A small modification of the configuration view opens up new possibilities. Views contain a sequence of version selection rules that are evaluated in the given order. View rules always pertain to a path in the repository. For example, a rule for the path "/" is valid for the entire repository. Typically rules with more specific paths are placed before those with less specific ones. Version selection can then "fall through" the rules.

If we apply this to the configuration view we can precede the standard rule that selects "latest" for everything under "/" with a rule for the path "/configuration/". For this rule we can use a label, e.g. "CONFIG_20130213". If a resource under "/configuration/" has that label on one of its versions, then that version will be selected for it. Otherwise the selection falls through to the next rule and the latest version will be selected.

The advised way to modify the configuration is to always go via the work view, which is at /view/work. After having checked in all modifications you go to the /configuration/ folder and set a new label on it, e.g. "CONFIG_<date>". You have to set the label recursively, which causes the latest version of every resource, folder or document, to receive this new label. Then you open the configuration view and replace the label with the new one. Only when you save the modified configuration view will the configuration be reloaded. This means that moving to the new configuration occurs in one transaction. If something turns out to be wrong with the new configuration, even though it was loaded successfully, you can revert to the previous one by simply putting back the old label in the configuration view.

With this method you keep track of your entire configuration history and you can quickly switch between configurations.

The maps.xml File

This is the central file in the /configuration/ folder that defines the Pincette configuration. It can be made modular by using XML-entities. The elements can appear in any order. Later elements overwrite earlier ones. For example, if an element occurs somewhere that defines the text extraction for PDF, then it will be superseded by a subsequent element that also defines the text extraction for PDF.
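
As a sketch of this overriding rule, assume the following two declarations appear in maps.xml in this order. The "text-extract" element is described in the section Text Extraction. Both class names are hypothetical and only serve as illustration. Since the second element comes later, it would be the one in force for PDF.

<!-- Earlier declaration; the class name is hypothetical. -->
<text-extract>
  <mime-type>application/pdf</mime-type>
  <class><name>com.example.FirstPDFTextExtract</name></class>
</text-extract>

<!-- Later declaration for the same MIME type; it supersedes the one above. -->
<text-extract>
  <mime-type>application/pdf</mime-type>
  <class><name>com.example.SecondPDFTextExtract</name></class>
</text-extract>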

A lot of elements can be restricted to specific paths and/or MIME types. That is how specific definitions are made for certain areas in the repository. One or more "path" elements declare where an element has an effect. With the "except" element certain paths can be excluded from this. The following example says that the element has an impact on paths "/path1/" and "/path2/". However, everything under "/path2/a/b/" is excluded.

<element>
  <path>/path1/</path>
  <path>/path2/</path>
  <except>/path2/a/b/</except>
  ...
  <mime-type>application/pdf</mime-type>
  ...
</element>

Note that maps.xml is an XML-file. Pincette won't save XML-files if they are not well-formed.

Environment Settings

The environment settings are global parameters. They are declared as name/value pairs in "env" elements within the "environment" element, like this:

<environment>
  <env>
    <name>name</name>
    <value>value</value>
  </env>
  ...
</environment>

The rest of this section defines the parameters.

checkin-delay-seconds

Pincette has several versioning modes, which can be set on folders and documents. They range from manual check-out and check-in to fully automatic versioning. A special mode is "filesystem". This is meant for cases where users save documents all the time without being aware of the fact that there is a versioning system underneath. Since this would yield a lot of unnecessary versions, the filesystem mode automatically checks out a document when it is updated, but delays the check-in by the number of seconds given in this parameter. For example, if you set it to 3600 seconds you will have at most eight or nine versions for a document someone works on all day.
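
The one-hour delay from the example above could be declared as follows. This is only a sketch of a single entry; other "env" elements would sit next to it in the same "environment" element.

<environment>
  <env>
    <!-- Delay the automatic check-in in "filesystem" mode by one hour. -->
    <name>checkin-delay-seconds</name>
    <value>3600</value>
  </env>
</environment>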

create-time-views

Defines whether or not a time view is created when a new user is added to the system. The default is true.

default-home-folder

This is the folder in which the users' home folders will be created by default.

default-language

This is the default language used for documents when none was given in the request.

error-page-<status code>

The full path in Pincette of an XHTML-file that should be displayed for the given HTTP status code.
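
As an illustration, a page for the status code 404 could be configured like this. The path of the XHTML-file is hypothetical.

<environment>
  <env>
    <!-- The status code is part of the parameter name. -->
    <name>error-page-404</name>
    <!-- Hypothetical path of an XHTML-file in Pincette. -->
    <value>/errors/not_found.xhtml</value>
  </env>
</environment>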

full-text-score-limit

Using this parameter it is possible to cut off results produced by a full-text search that are below a certain value. Its value should be between 0.0 and 1.0. The default value is 0.0. Set it very low if you have few documents in the system and increase it gradually as the document set grows.

full-text-oracle-metadata-boost

It is possible to configure metadata that has to be indexed along in the full-text index. The weight in the score can be boosted for metadata with this environment entry. It should be an integer that is larger than zero. The default is 5.

full-text-queue-max-time

This is the maximum time in milliseconds a module on the full-text queue is allowed to run before it is interrupted. The default is 120000.

hide-unreadable

By default a WebDAV-collection also contains the resources for which the user doesn't have any access privileges. Setting this parameter to true will hide them.

isolate-users

This parameter, which is false by default, lets you host independent users on one Pincette-system. The users won't see each other, won't find each other's documents in search results and won't see each other's metadata properties.

lock-timeout

In order to avoid eternal locks this parameter can be set to indicate after how many seconds a lock should be automatically released. The default is 3600.

mail-bcc

When the system sends an e-mail the Bcc header will be set to the value of this parameter.

mail-from

When the system sends an e-mail the From header will be set to the value of this parameter.

maximum-file-size

The maximum number of bytes a file may have. Going beyond it will result in a 507 status code.

max-search-results

The maximum number of results returned by a full-text or metadata search is defined with this parameter. The default value is 50.

no-trash

Setting this to true will prevent trash bins from being created in the users' home folders. By default it is false.

password-mail-template

A URL to the XHTML template used to send password change notification e-mails. It is merged with fields, which are written as placeholders. A placeholder has the form "$[name]". The following fields are supported:

password: The new password.
url: The URL of the system. This will open its web-interface if it is configured like that.
username: The username.
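
A minimal template that uses the three fields might look like the sketch below. The wording and the layout are made up for illustration. It is an assumption that the placeholders are replaced wherever they occur in the text of the page.

<!-- Hypothetical password mail template. -->
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Your password has changed</title>
  </head>
  <body>
    <p>Hello $[username],</p>
    <p>Your new password is $[password].</p>
    <p>You can log in at $[url].</p>
  </body>
</html>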

pass-secret

The password to sign pass-URLs with.

quota

The total number of bytes the domain may contain. The default is -1, which means there is no limit.

quota-mail-template

A URL to the XHTML template used to send quota warning e-mails. It is merged with fields, which are written as placeholders. A placeholder has the form "$[name]". The following fields are supported:

level: The used percentage of the provided quota.

quota-warning-levels

When a user performs an action that causes her used space to pass beyond a certain percentage of her quota, the system sends a warning e-mail. This parameter specifies the percentages at which this should happen. It is a comma-separated list. The default is 80,90,95.
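
The quota parameters can be combined. The following sketch sets a quota of 10 GiB and asks for warnings at 75 and 90 percent. The values are arbitrary examples.

<environment>
  <env>
    <!-- 10 * 1024 * 1024 * 1024 bytes. -->
    <name>quota</name>
    <value>10737418240</value>
  </env>
  <env>
    <!-- Warn at 75 and 90 percent instead of the default 80,90,95. -->
    <name>quota-warning-levels</name>
    <value>75,90</value>
  </env>
</environment>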

read-version-history-under-all

By default the privilege read-version-history is implied by the read privilege. By setting this environment entry to true you can make the privilege stand on its own.

rss-days

Any folder in Pincette is also an RSS-feed. You only have to append "?type=application%2frss%2bxml" to the folder-URL. This parameter defines how many days the feed will go back in the past. The default value is 30.

rss-items

The number of items in an RSS-feed can be limited with this parameter. The default value is 1000.

rss-logo

This is a URL for the logo that will be included in the RSS-feeds. It may be relative.

rss-stylesheet

This is a URL for the stylesheet that will be used in the RSS-feeds. It may be relative.

system-users

The "system" user has all privileges on all resources in the repository. There are, however, situations where the authentication method doesn't allow one to log in with that username. When certificates are used, for example, the username is the distinguished name in the certificate. It is also possible that more than one username should be considered as "system". With this parameter a comma-separated list of usernames can be specified that are mapped to "system". When such a user logs in, every operation is executed as "system" instead of the real username.
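
For example, the following entry would map two users to "system". Both usernames are hypothetical.

<environment>
  <env>
    <name>system-users</name>
    <!-- Hypothetical usernames, separated by a comma. -->
    <value>admin,operator</value>
  </env>
</environment>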

workflow-mail-template

A URL to the XHTML template used to send workflow notification e-mails. It is merged with fields, which are written as placeholders. A placeholder has the form "$[name]". The following fields are supported:

collection-path: The path of the parent folder.
collection-url: The URL of the parent folder.
comment: The comment the previously assigned user made.
due-date: The due date for the current step in the workflow.
path: The path of the document.
status: The status of the document.
url: The URL of the document.
version-url: The URL of the specific version of the document.

Access Control List Settings

When a new document or folder is created it doesn't have an access control list. This means only the owner can do anything with it. There are two ways to give it an initial ACL. The first is to specify a default ACL with the element "default-acl". This element should contain access control entries as defined in RFC 3744. Optionally the element may also contain "path" elements with absolute paths below which the default ACL will be used. If no path has been specified "/" is assumed, which means the declaration is used in the entire repository. This is an example:

<default-acl xmlns:dav="DAV:">
  <path>/</path>
  <dav:ace>
    <dav:principal>
      <dav:authenticated />
    </dav:principal>
    <dav:grant>
      <dav:privilege>
        <dav:read-current-user-privilege-set />
      </dav:privilege>
    </dav:grant>
  </dav:ace>
</default-acl>
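
A default ACL can also be limited to a part of the repository. The following sketch grants the "read" privilege from RFC 3744 to all authenticated users for new resources below one folder. The folder "/Public/" is only an example path.

<default-acl xmlns:dav="DAV:">
  <!-- Example path; the ACL is only used for new resources below it. -->
  <path>/Public/</path>
  <dav:ace>
    <dav:principal>
      <dav:authenticated />
    </dav:principal>
    <dav:grant>
      <dav:privilege>
        <dav:read />
      </dav:privilege>
    </dav:grant>
  </dav:ace>
</default-acl>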

The second way of initializing a new resource with an ACL is through ACL-inheritance. When inheritance is set for a path it is active at any depth below this path. A new resource will receive the ACL from its parent folder. Note that subsequent changes to this parent folder have no effect on existing resources in it. Here is an example:

<inherit-acl>
  <path>/Public/</path>
  <path>/Shared/</path>
</inherit-acl>

Default Index Pages

When a browser performs a GET on a folder some HTML-representation of that folder will be generated and returned, unless the folder contains a document called "index.html", "index.xhtml" or "index.htm". It is possible to assign a special page to take over the index page role for all folders or just some through the usual "path" elements. The special page may be an application such as PincetteWeb. This is an example:

<default-index-page>
  <page>/web/index.xhtml</page>
</default-index-page>
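
To assign the page to only some folders a "path" element can be added. This is a sketch in which "/Public/" is an example path. The position of the "path" element relative to the "page" element is an assumption.

<default-index-page>
  <!-- Example path; the page takes the index role only below this folder. -->
  <path>/Public/</path>
  <page>/web/index.xhtml</page>
</default-index-page>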

Metadata Settings

Pincette supports metadata in the form of name/value pairs. The DublinCore elements and terms are supported, as well as a few built-in properties. These standard properties can have several locales. Users can add additional properties with chosen names.

Type Declarations

The type of the metadata properties can be declared in the configuration. There are four types: "number", "boolean", "string" and "time". When a property is not declared it is assumed to have the type "string". All declarations should occur within the "metadata-types" element like this:

<metadata-types>
  <property>
    <name>dcterms:date</name>
    <type>time</type>
  </property>
</metadata-types>
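
The other types are declared in the same way. In the following sketch the property names "PageCount" and "Approved" are hypothetical properties added by users.

<metadata-types>
  <property>
    <!-- Hypothetical user-defined property. -->
    <name>PageCount</name>
    <type>number</type>
  </property>
  <property>
    <!-- Hypothetical user-defined property. -->
    <name>Approved</name>
    <type>boolean</type>
  </property>
</metadata-types>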

Full-text Indexing

It is also possible to have certain metadata properties indexed in the full-text index. This can be useful for properties like "dc:title" or "dcterms:abstract". For each property that has to be indexed a prefix should be specified. This prefix can then be used in full-text queries in the form <prefix>:<term>. Here is an example of a declaration:

<metadata-index>
  <property>
    <name>dc:title</name>
    <prefix>title</prefix>
  </property>
  <property>
    <name>dcterms:abstract</name>
    <prefix>abstract</prefix>
  </property>
</metadata-index>

Default Values

Default metadata can be configured that is set on newly created documents. For metadata properties that represent times the value should be an ISO-8601 timestamp. It is also allowed to use the symbolic value "now" or to combine it with an offset like "now+<days>" or "now+<hours>:<minutes>:<seconds>". If a default value is provided for the property "dc:identifier" Pincette will generate a suffix of the form -YYYYMMDD-N, where N is a unique number per day. Optional "path" elements can limit the configuration to certain folders and everything below them. This is an example:

<default-metadata>
  <path>/home/werner/test/1996/</path>
  <property>
    <name>dc:creator</name>
    <value>Me Myself and I</value>
  </property>
  <property>
    <name>re:company</name>
    <value>Pincette</value>
  </property>
  <property>
    <name>Division</name>
    <value>Development</value>
  </property>
  <property>
    <name>dc:date</name>
    <value>now</value>
  </property>
  <property>
    <name>dcterms:valid</name>
    <value>now+365</value>
  </property>
  <property>
    <name>dcterms:available</name>
    <value>now+12:12:12</value>
  </property>
</default-metadata>
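
The generated identifier suffix can be sketched as follows. The folder "/invoices/" and the value "INV" are hypothetical. With this declaration the first document created below the folder on 13 February 2013 would presumably get an identifier like "INV-20130213-1".

<default-metadata>
  <!-- Hypothetical folder. -->
  <path>/invoices/</path>
  <property>
    <name>dc:identifier</name>
    <!-- Pincette appends a suffix of the form -YYYYMMDD-N to this value. -->
    <value>INV</value>
  </property>
</default-metadata>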

Steering Metadata Exchange

It is possible to limit the activity of metadata exchange modules with the elements "exchange-properties-off", "exchange-properties-from-only" and "exchange-properties-to-only". They all contain "path" elements under which the setting will be valid. The first variant turns off the exchange completely. The second one only extracts properties from updated documents, but never updates them in the document when they are updated in Pincette. The last option updates the properties in the document when they are updated in Pincette, but never extracts any properties from documents. An example:

<exchange-properties-off>
  <path>/home/werner/test/</path>
</exchange-properties-off>
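
The other two variants are declared in the same way. The following sketch, with the hypothetical folder "/archive/", would extract properties from uploaded documents but never write them back into the documents.

<exchange-properties-from-only>
  <!-- Hypothetical folder. -->
  <path>/archive/</path>
</exchange-properties-from-only>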

XML Catalogs

Pincette doesn't store XML-files like other files. An XML-file is always parsed, so it should be well-formed. XML-files can be composed of multiple entities and catalogs are used to resolve identifiers to actual entities. Pincette supports catalogs as defined by the SGML Open Technical Resolution TR9401:1997. Only PUBLIC and SYSTEM statements are supported at this time. The catalog that should be used by Pincette can be declared in the configuration like this:

<catalog>
  <file>resource:res/dtd/catalog</file>
</catalog>

The "file" element can contain a path in Pincette or a URL. The special case in the previous example takes the catalog from the built-in resources of Pincette.

The "part-of" Feature

Certain XML-languages such as XHTML can use external parts that are always rendered together with the document. Images are an obvious example. In order for web-caching to work properly a document should be outdated whenever one of its parts is modified. Otherwise the editor has to perform some fake modification of the document itself. The part-of relationships can be declared in the configuration like this:

<part-of>
  <part>
    <namespace>http://www.w3.org/1999/xhtml</namespace>
    <local-name>body</local-name>
    <attribute>background</attribute>
  </part>
  <part>
    <namespace>http://www.w3.org/1999/xhtml</namespace>
    <local-name>frame</local-name>
    <attribute>src</attribute>
  </part>
  <part>
    <namespace>http://www.w3.org/1999/xhtml</namespace>
    <local-name>iframe</local-name>
    <attribute>src</attribute>
  </part>
  <part>
    <namespace>http://www.w3.org/1999/xhtml</namespace>
    <local-name>img</local-name>
    <attribute>src</attribute>
  </part>
  <part>
    <namespace>http://www.w3.org/1999/xhtml</namespace>
    <local-name>link</local-name>
    <attribute>href</attribute>
  </part>
  <part>
    <namespace>http://www.w3.org/1999/xhtml</namespace>
    <local-name>script</local-name>
    <attribute>src</attribute>
  </part>
</part-of>

Each part declares an attribute of an element in a namespace that refers to an external entity that should be considered as being an integral part of the document.
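
Other elements can be added in the same way. As a sketch, the following declaration would treat the resource an XHTML "object" element refers to with its "data" attribute as a part of the document. Whether this is desirable depends on the kind of embedded objects that are used.

<part-of>
  <part>
    <namespace>http://www.w3.org/1999/xhtml</namespace>
    <!-- The "object" element refers to its content with the "data" attribute. -->
    <local-name>object</local-name>
    <attribute>data</attribute>
  </part>
</part-of>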

User Agent Based Dispatch

You can make Pincette automatically redirect to a variant of a page based on the HTTP-header "User-Agent". The "user-agent-map" element should contain a list of "map" elements, which in turn should have one "match" element and one or more "name" elements. The former contains a regular expression and the latter the name of a subfolder. The "map" elements are always evaluated in the order of appearance and the "name" elements are tried for variants in the order of appearance.

So if there is a page "/a/b/page.xhtml" and a variant in "/a/b/tablet/page.xhtml", then the browser is redirected to the latter when a tablet is being used. This is an example:

<user-agent-map>
  <map>
    <match>.*iPhone;.*</match>
    <name>mobile</name>
  </map>
  <map>
    <match>.*iPod;.*</match>
    <name>mobile</name>
  </map>
  <map>
    <match>.*iPad;.*</match>
    <name>tablet</name>
  </map>
  <map>
    <match>.*Android.*Mobile.*</match>
    <name>mobile</name>
  </map>
  <map>
    <match>.*Android.*</match>
    <name>tablet</name>
  </map>
  <map>
    <match>.*BlackBerry.*Mobile.*</match>
    <name>mobile</name>
  </map>
  <map>
    <match>.*BlackBerry.*</match>
    <name>tablet</name>
  </map>
  <map>
    <match>.*Opera Mobi.*</match>
    <name>mobile</name>
  </map>
  <map>
    <match>.*Opera Mini.*</match>
    <name>mobile</name>
  </map>
  <map>
    <match>.*Mobile.*</match>
    <name>mobile</name>
  </map>
</user-agent-map>

The "user-agent-map" element may contain "path" elements to make the declaration more specific and there may be several "user-agent-map" elements, which will all be merged.
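
The following sketch combines a "path" element with several "name" elements. The folder "/site/" is hypothetical. For a page "/site/page.xhtml" an iPad would be redirected to "/site/tablet/page.xhtml" if that variant exists and otherwise, presumably, to "/site/mobile/page.xhtml".

<user-agent-map>
  <!-- Hypothetical folder; limits the declaration to one site. -->
  <path>/site/</path>
  <map>
    <match>.*iPad;.*</match>
    <!-- The variants are tried in this order. -->
    <name>tablet</name>
    <name>mobile</name>
  </map>
</user-agent-map>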

No Version Control

By default every document and folder in Pincette is under version control. There may be areas in the repository for which no versions are needed; for example, a folder where files are shared temporarily. This can be achieved with the "no-version-control" element, which should contain "path" elements. For every path version control is turned off for the corresponding folder and everything in it at any level. Note that if a folder already exists when this declaration is added to the configuration, then it will remain under version control, but anything that is created in it will not have versions. This is an example:

<no-version-control>
  <path>/tmp/</path>
</no-version-control>

Modules

Pincette is extensible through modules. This section discusses the kind of modules that are supported and how they are configured. All modules are loaded in their own class loader in order to isolate them from each other. Their code is stored in the configuration area of Pincette, which means it is under version control and loaded via the configuration view. Modules enter Pincette through the front door. They can also be made more specific using "path" elements.

In module declarations the "class" element is used to make the link to the code. It should always have a "name" element with the class name. Optionally it can have several "jar" elements containing an absolute or relative path in Pincette. The latter is more probable. You should always specify a "Class-Path" field in the manifest of a JAR-file. The manifests are used recursively to construct a proper class path for the module. The order of the "jar" elements drives the class path construction.

Modules implement one of the interfaces described below. On top of that a module class may have two additional static methods. They are called when the class is loaded. The first call is to "initialize", which should have the following signature:

public static void initialize(be.re.webdav.Context context)

The second call is to "setParameters" with the following signature:

public static void setParameters(java.util.Properties properties)

The properties are taken from all the "param" elements that occur in the "class" element within the element that declares the module class. A "param" element has two subelements, "name" and "value", each with text as their contents.
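
Put together, a module declaration with a JAR-file and parameters could look like the sketch below. It uses the "text-extract" element, which is described in the section Text Extraction. The MIME type, the class name, the JAR-file and the parameter are all hypothetical. The "setParameters" method of the class would receive one property called "max-length" with the value "100000".

<text-extract>
  <!-- Hypothetical MIME type. -->
  <mime-type>application/x-example</mime-type>
  <class>
    <!-- Hypothetical class and JAR-file. -->
    <name>com.example.ExampleTextExtract</name>
    <jar>modules/example_extract.jar</jar>
    <!-- Hypothetical parameter, passed to "setParameters". -->
    <param>
      <name>max-length</name>
      <value>100000</value>
    </param>
  </class>
</text-extract>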

Storage

A storage module deals with the actual stream of a document. Pincette simply stores the produced stream in the database without further interpretation.

There are two kinds of storage modules, one for plain byte streams and another for XML-streams. The latter are always parsed for well-formedness before being serialized into the database. The first kind should implement the be.re.repo.Storage interface, while the second should implement be.re.repo.XMLStorage.

A storage module is allowed to store its content outside of the database. In that case it would produce an empty stream for Pincette and manage a folder structure elsewhere. This could be useful for multi-media material. Note that in this case storage will not be transactional.

The modules are declared with the "storage" element, which may contain several "mime-type" elements and one "class" element. The MIME types may contain wild cards. Here are a few examples:

<storage>
  <mime-type>*/*</mime-type>
  <class><name>be.re.repo.mod.DefaultStorage</name></class>
</storage>

<storage>
  <mime-type>application/ecmascript</mime-type>
  <mime-type>application/javascript</mime-type>
  <mime-type>application/x-javascript</mime-type>
  <mime-type>text/*</mime-type>
  <class><name>be.re.repo.mod.TextStorage</name></class>
</storage>

<storage>
  <mime-type>text/xml</mime-type>
  <mime-type>application/xml</mime-type>
  <mime-type>*/*+xml</mime-type>
  <class><name>be.re.repo.mod.DefaultXMLStorage</name></class>
</storage>

Transformers

Transformers are modules that are placed in the input and output paths of Pincette. They can transform a stream from one MIME type to another. Say, for example, you have a transformer from HTML to PDF and put it in the output path. When a user then asks to have an HTML-document in PDF, it will be converted on-the-fly. There are two ways this transformation can be triggered. Firstly, through HTTP content negotiation, which occurs using the "Accept" HTTP-header. Secondly, the user can add the query string "?type=application%2fpdf" to the URL, where the slash is URI-encoded.

The same is possible in the other direction. If there is a transformer from Microsoft Word to PDF, for example, which is placed in the input path, then a user can save a Word-document over a PDF-document that already resides in Pincette.

In the case of XML there is always a chain of two transformers. The innermost one is a transformer from XML to XML, not necessarily the same vocabulary. The outermost one is a transformer from XML to some other type, which may also be XML or something else entirely. For the output path the conversion is from XML and for the input path it is to XML. For example, the transformation from XHTML to PDF could be done in two steps. The innermost transformer could transform XHTML to XSL-FO, which is an XML-vocabulary for formatting documents. The outermost transformer would then have to be an XSL-FO processor, which produces PDF.

Pincette will always look for the most precise transformer chain using the source and target MIME types. Normally there is always a very generic transformer that doesn't modify the stream.

For the innermost XML-to-XML transformers the "sequence" element can contain two kinds of entries. The first is an "xslt" subelement, which in turn has a "uri" subelement that denotes an XSLT-stylesheet. The other possibility is a "class" element. All sequence entries will be wired together to form one big filter. The class should be an instance of one of the following:

The following example shows the generic input and output transformers that simply let the stream pass unmodified:

<in-transformer>
  <from>
    <mime-type>*/*</mime-type>
  </from>
  <sequence>
    <class><name>be.re.repo.mod.DefaultInTransformerFactory</name></class>
  </sequence>
</in-transformer>

<out-transformer>
  <from>
    <mime-type>*/*</mime-type>
  </from>
  <sequence>
    <class><name>be.re.repo.mod.DefaultOutTransformerFactory</name></class>
  </sequence>
</out-transformer>

This example shows how JNLP-files are transformed when they are requested by a client. The first transformer serializes the plain XML-text. The second one performs the typical macro-expansion for a JNLP-file. Both transformers are necessary, because without the first one there would be no way to edit the unexpanded JNLP-file.

<out-transformer>
  <from>
    <mime-type>application/x-java-jnlp-file</mime-type>
  </from>
  <to>
    <mime-type>application/xml</mime-type>
  </to>
  <sequence>
    <class><name>be.re.repo.mod.DefaultOutTransformerFactory</name></class>
  </sequence>
</out-transformer>

<out-transformer>
  <from>
    <mime-type>application/x-java-jnlp-file</mime-type>
  </from>
  <to>
    <mime-type>application/x-java-jnlp-file</mime-type>
  </to>
  <sequence>
    <class><name>be.re.repo.mod.JNLPOutTransformerFactory</name></class>
  </sequence>
</out-transformer>
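
The input path can be configured along the same lines. The following sketch declares the transformer from Microsoft Word to PDF that was mentioned earlier. It assumes that "in-transformer" accepts "from" and "to" elements just like "out-transformer". The class name and the JAR-file are hypothetical.

<in-transformer>
  <from>
    <mime-type>application/msword</mime-type>
  </from>
  <to>
    <mime-type>application/pdf</mime-type>
  </to>
  <sequence>
    <class>
      <!-- Hypothetical class and JAR-file. -->
      <name>com.example.WordToPDFInTransformerFactory</name>
      <jar>modules/word_to_pdf.jar</jar>
    </class>
  </sequence>
</in-transformer>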

Let's discuss a more complicated XML-example. We are going to configure what is needed for the above-mentioned transformation from XHTML to PDF. The first transformer is an innermost XML-to-XML transformer. More specifically, it transforms XHTML to XSL-FO. The transformer is itself a sequence of two modules, one to generate the XSL-FO and another to do some post-processing for links. The attribute "expand=true" means that all entity references and XInclude elements will be expanded prior to the transformation.

The second transformer is like the first one, but for some folders it adds an XSLT preprocessing step to the sequence. It also expands its input.

The two "xml-out" elements are outermost transformers that convert XSL-FO to PDF and PostScript respectively.

<xml-out-transformer expand="true">
  <from>
    <mime-type>application/xhtml+xml</mime-type>
    <namespace>http://www.w3.org/1999/xhtml</namespace>
    <local-name>html</local-name>
  </from>
  <to>
    <mime-type>text/xml</mime-type>
    <namespace>http://www.w3.org/1999/XSL/Format</namespace>
    <local-name>root</local-name>
  </to>
  <sequence>
    <class>
      <name>be.re.css.CSSToXSLFOFilter</name>
      <jar>modules/css2xslfo.jar</jar>
    </class>
    <class>
      <name>be.re.repo.mod.XSLFOBasicLinkFilter</name>
    </class>
  </sequence>
</xml-out-transformer>

<xml-out-transformer expand="true">
  <path>/reports</path>
  <path>/java/applications/css2xslfo/doc</path>
  <path>/java/applications/pincette/doc</path>
  <from>
    <mime-type>application/xhtml+xml</mime-type>
    <namespace>http://www.w3.org/1999/xhtml</namespace>
    <local-name>html</local-name>
  </from>
  <to>
    <mime-type>text/xml</mime-type>
    <namespace>http://www.w3.org/1999/XSL/Format</namespace>
    <local-name>root</local-name>
  </to>
  <sequence>
    <xslt><uri>/doc/style/preprocess_xhtml.xsl</uri></xslt>
    <class>
      <name>be.re.css.CSSToXSLFOFilter</name>
      <jar>modules/css2xslfo.jar</jar>
    </class>
    <class>
      <name>be.re.repo.mod.XSLFOBasicLinkFilter</name>
    </class>
  </sequence>
</xml-out-transformer>

<xml-out>
  <from>
    <mime-type>text/xml</mime-type>
    <namespace>http://www.w3.org/1999/XSL/Format</namespace>
    <local-name>root</local-name>
  </from>
  <to>
    <mime-type>application/pdf</mime-type>
  </to>
  <class>
    <name>be.re.tools.PincetteXEP</name>
    <jar>XEP/lib/pincette_xep.jar</jar>
  </class>
</xml-out>

<xml-out>
  <from>
    <mime-type>text/xml</mime-type>
    <namespace>http://www.w3.org/1999/XSL/Format</namespace>
    <local-name>root</local-name>
  </from>
  <to>
    <mime-type>application/postscript</mime-type>
  </to>
  <class>
    <name>be.re.tools.PincetteXEP</name>
    <jar>XEP/lib/pincette_xep.jar</jar>
  </class>
</xml-out>

Metadata Exchange

Pincette can exchange metadata with documents. When you upload a document the metadata is extracted from it and made available in Pincette for search and navigation via the "/meta/browser/" folder. Likewise, when the metadata is updated in Pincette the document will be modified as well.

Since extracting and manipulating metadata is different for each document format, modules should be added for each of them. For most popular formats a built-in module is already pre-configured. You can add more with the element "exchange-properties", which contains the subelements "mime-type" and "class". The class in there should implement one of the interfaces be.re.repo.ExchangeProperties or be.re.repo.XMLExchangeProperties. This is an example of such an element:

<exchange-properties>
  <mime-type>application/pdf</mime-type>
  <class><name>be.re.repo.mod.ExchangePropertiesPDF</name></class>
</exchange-properties>

Text Extraction

Pincette supports full-text search. In order to do that it needs the text that is inside the documents that are uploaded. It extracts it in the background using text extraction modules. For each format there is a specific one and most popular formats have a pre-configured built-in module.

Additional modules can be configured through the "text-extract" element, which has the subelements "mime-type" and "class". The class in the latter should implement either be.re.repo.TextExtract or be.re.repo.XMLTextExtract. This is an example:

<text-extract>
  <mime-type>text/xml</mime-type>
  <mime-type>application/xml</mime-type>
  <mime-type>*/*+xml</mime-type>
  <class><name>be.re.repo.mod.TextExtractXML</name></class>
</text-extract>

Preview Extraction

When documents are uploaded Pincette tries to create a preview for them in the background. This is done with preview extraction modules. The modules provide the image and Pincette produces three standard sizes of it: small, medium and large. Those can be retrieved by appending the "preview" parameter to the URL of the document like "?preview=large".

Since creating a preview from a document is different for each format, there will be a specific module per format. For most popular formats there is a pre-configured built-in module. You can add modules with the "preview-extract" element, which has the subelements "mime-type" and "class". The class should be an instance of be.re.repo.PreviewExtract or be.re.repo.XMLPreviewExtract. This is an example:

<preview-extract>
  <mime-type>application/epub+zip</mime-type>
  <class><name>be.re.repo.mod.PreviewExtractEPub</name></class>
</preview-extract>

Difference Testing

For certain operations Pincette wants to know if the contents of two document versions are different. In most cases a binary comparison up to the first different byte is just fine. However, if you have a format that requires more analysis you can add a module for it with the "test-difference" element, which has the subelements "mime-type" and "class". The class should be an instance of be.re.repo.TestDifference or be.re.repo.XMLTestDifference. This is an example of such an element:

<test-difference>
  <mime-type>*/xml</mime-type>
  <mime-type>*/*+xml</mime-type>
  <class><name>be.re.repo.mod.XMLTestDifference</name></class>
</test-difference>

Merge

Pincette supports advanced versioning features such as branching and merging. The latter involves the construction of a new version based on a common version up in the version tree, a source version on one branch and a target version on the branch where the result should end up. This kind of calculation depends on the document format. You can create specific modules for this and configure them with the "merge" element, which has the subelements "mime-type" and "class". The class should implement either be.re.repo.MergeDocument or be.re.repo.XMLMergeDocument. Here is an example:

<merge>
  <mime-type>text/*</mime-type>
  <class><name>be.re.repo.mod.MergeText</name></class>
</merge>

Comparison

When executing the HTTP-method GET on a document in Pincette you can add the extension-header "X-be.re.Compare-With". Its value should be a version URL of the document. Depending on the format some module actually does the comparison and writes out the result, which ends up on the response stream. Such a module can be added with the "compare" element, which has the subelements "mime-type" and "class". The class should be an instance of be.re.repo.Compare or be.re.repo.XMLCompare. This is an example:

<compare>
  <mime-type>application/vnd.oasis.opendocument.text</mime-type>
  <mime-type>application/vnd.oasis.opendocument.presentation</mime-type>
  <mime-type>application/vnd.oasis.opendocument.spreadsheet</mime-type>
  <class><name>be.re.repo.mod.CompareODF</name></class>
</compare>

Web Modules

With this kind of module simple active pages can be created such as a form processor or a page that creates content dynamically. The interface is just a static method with the following signature:

public static Map<String,String> f
(
  java.util.Map<String,String> parameters,
  java.lang.String             vcr,
  java.util.ResourceBundle     bundle,
  be.re.webdav.Context         context
) throws java.lang.Exception

The method should return a map of URL parameter names to resource bundle keys, which are used for the error messages. One can also use the empty string as the parameter name. In that case a general message can be produced. The returned map may also be empty, but not null.

Such code is made available with the "module" element in which you place one or more "jar" elements that constitute the classpath. Here is how:

<module>
  <path>/oauth/</path>
  <jar>modules/oauth_endpoint.jar</jar>
  <jar>modules/pincette_hash.jar</jar>
</module>

The following is an example of a page that uses the code. The "param" elements are transformed into the parameters argument of the authenticate method.

<?xml version='1.0' encoding='UTF-8'?>
<form xmlns="urn:be-re:forms">
  <action>be.re.app.oauth.Endpoint.authenticate</action>
  <reply>login.xhtml</reply>
  <error>login.xhtml</error>
  <param>
    <name>login-uri</name>
    <value>login.xhtml</value>
  </param>
  <param>
    <name>config-user</name>
    <value>system</value>
  </param>
  <param>
    <name>config-path</name>
    <value>config.xml</value>
  </param>
</form>