Daisy documentation

1 Documentation Home

These pages contain the documentation of the Daisy 2.1 release.

See also:

The documentation is also available published as a Daisy-book.

For an end-user introduction to Daisy, have a look at the video tutorials.

2 Installation

2.1 Downloading Daisy

Packaged versions of Daisy can be found in the distribution area (Sourceforge). These include everything required to run Daisy, except for a Java Virtual Machine and MySQL.

If you don't have these already, the installation of these will be covered further on.

Consider subscribing to the Daisy mailing list to ask questions and talk with fellow Daisy users and developers.

There is also information available about the source code.

2.2 Installation Overview

Daisy is a multi-tier application, consisting of a repository server and a publication layer. Next to those, a database server (MySQL) is required. All together, this means three processes, which can run on the same server or on different servers.

The Daisy binary distribution packs most of the needed software together; the only additional things you'll need are a Java Virtual Machine for your platform and MySQL. All libraries and applications shipped with Daisy are the original, unmodified distributions, which will be configured as part of the installation. We've only grouped them in one download for your convenience.

If you follow the instructions in this document, you can have Daisy up and running in less than an hour.

The diagram below gives an overview of the setup. All port numbers shown are configurable.

2.2.1 Platform Requirements

We have tested the Daisy installation on Windows 2000/XP, GNU/Linux and MacOSX. Other unixes like Solaris should also work, though we don't test that ourselves.

2.2.2 Memory Requirements

By default, the Daisy Wiki and Daisy Repository Server are each started with a maximum heap size of 128 MB (256 MB in total). To this you need to add some overhead for the JVMs themselves, and then some memory for MySQL, the OS and its (filesystem) caches. This doesn't mean all this memory will be used; that depends on usage intensity.

2.2.3 Required knowledge

These installation instructions assume you're comfortable with installing software, editing configuration (XML) files, running applications from the command line, setting environment variables, and that sort of stuff.

2.2.4 Can I use Oracle, PostgreSQL, MS-SQL, ... instead of MySQL? Websphere, Weblogic, Tomcat, ... instead of Jetty?

Daisy contains the necessary abstractions to support different database engines, though we currently only support MySQL. Users are welcome to contribute and maintain different databases (ask on the mailing list how to get started).

The Daisy Wiki webapp should be able to run in any servlet container (at least one that can run unpacked webapps, and as far as there aren't any Cocoon-specific issues), but we ship Jetty by default. For example, using Tomcat instead of Jetty is very simple and is described on this page.

2.3 Installing a Java Virtual Machine

Daisy requires the Java JDK or JRE 1.5 or 1.6 (these versions are also known as 5 and 6). You can download it from the Sun site (preferably take the JDK, not the JRE). Install it now if you don't have it already.

After installation, make sure the JAVA_HOME environment variable is defined and points to the correct location (i.e., the directory where Java is installed). To verify this, open a command prompt or shell and enter:

For Windows:
"%JAVA_HOME%\bin\java" -version

For Linux:
$JAVA_HOME/bin/java -version

This should print out something like:

java version "1.5.0"

or

java version "1.6.0"
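
If you want to script this check, the version line can be parsed. The following Python sketch is only an illustration: the helper names parse_java_version and is_supported are hypothetical, and it assumes the first output line has the form shown above.

```python
import re

# Java versions this Daisy release is documented to run on (1.5 and 1.6).
SUPPORTED = {(1, 5), (1, 6)}

def parse_java_version(line):
    """Return (major, minor) from a line such as: java version "1.5.0"."""
    match = re.search(r'version "(\d+)\.(\d+)', line)
    if match is None:
        raise ValueError("unrecognised version line: %r" % line)
    return int(match.group(1)), int(match.group(2))

def is_supported(line):
    """True if the reported version is one of the supported ones."""
    return parse_java_version(line) in SUPPORTED
```

Note that java -version writes its output to standard error, so a script needs to capture that stream rather than standard output.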

2.3.1 Installing JAI (Java Advanced Imaging) -- optional

If you want images (especially PNG) to appear in PDFs, it is highly advisable to install JAI, which you can download from the JAI project on java.net. Take the JDK (or JRE) package; this will make JAI support globally available.

2.4 Installing MySQL

Daisy requires one of the following MySQL versions:

MySQL can be downloaded from mysql.com. Install it now, and start it (often done automatically by the install).

Windows users can take the "Windows Essentials" package. During installation and the configuration wizard, you can leave most things to their defaults. In particular, be sure to leave the "Database Usage" to "Multifunctional Database", and leave the TCP/IP Networking enabled (on port 3306). When it asks for the default character set, select "Best Support For Multilingualism" (this will use UTF-8). When it asks for Windows options, check the option "Include Bin Directory In Windows Path".

Linux users: install the "MySQL server" and "MySQL client" packages. Installing the MySQL server RPM will automatically initialize and start the MySQL server.

2.4.1 Creating MySQL databases and users

MySQL is used by both the Daisy Repository Server and JMS (ActiveMQ). Therefore, we are now going to create two databases and two users.

Open a command prompt, and start the MySQL client as root user:

mysql -uroot -pYourRootPassword

On some systems, the root user has no password, in which case you can drop the -p parameter.

Now create the necessary databases, users and access rights by entering (or copy-pasting) the commands below in the mysql client. The string following IDENTIFIED BY is the password for the user, which you can change if you wish. The daisy@localhost entries are necessary because otherwise the default access rights for anonymous users @localhost will take precedence. If you run MySQL on the same machine as the Daisy Repository Server, you only need the @localhost entries.

CREATE DATABASE daisyrepository CHARACTER SET 'utf8';
GRANT ALL ON daisyrepository.* TO daisy@'%' IDENTIFIED BY 'daisy';
GRANT ALL ON daisyrepository.* TO daisy@localhost IDENTIFIED BY 'daisy';
CREATE DATABASE activemq CHARACTER SET 'utf8';
GRANT ALL ON activemq.* TO activemq@'%' IDENTIFIED BY 'activemq';
GRANT ALL ON activemq.* TO activemq@localhost IDENTIFIED BY 'activemq';

2.5 Extract the Daisy download

Extract the Daisy download. On Linux/Unix you can extract the .tar.gz file as follows:

tar xvzf daisy-<version>.tar.gz

On non-Linux unixes (Solaris notably), use the GNU tar version if you experience problems extracting.

On Windows, use the .zip download, which you can extract using a tool like WinZip.

After extraction, you will get a directory called daisy-<version>. This directory is what we will call from now on the DAISY_HOME directory. You may set a global environment variable pointing to that location, or you can do it each time in the command prompt when needed.

2.6 Daisy Repository Server

2.6.1 Initialising and configuring the Daisy Repository

Open a command prompt or shell and set an environment variable DAISY_HOME, pointing to the directory where Daisy is installed.

Windows:
set DAISY_HOME=c:\daisy-2.1

Linux:
export DAISY_HOME=/home/daisy_user/daisy-2.1

Then go to the directory <DAISY_HOME>/install, and execute:

daisy-repository-init

Follow the instructions on screen. The installation will (1) initialize the database tables for the repository server and (2) create a Daisy data directory containing customized configuration files.

2.6.2 Starting the Daisy Repository Server

Still in the same command prompt (or in a new one, but make sure DAISY_HOME is set), go to the directory <DAISY_HOME>/repository-server/bin, and execute:

daisy-repository-server <location-of-daisy-data-dir>

In which you replace <location-of-daisy-data-dir> with the location of the daisy data directory created in the previous step.

Starting the repository server usually takes only a few seconds; the first time, however, it will take a bit longer because the workflow database tables are created during startup. When the server has finished starting, it will print a line like this:

Daisy repository server started [timestamp]

Wait for this line to appear (the prompt will not return).

2.7 Daisy Wiki

2.7.1 Initializing the Daisy Wiki

Before you can run the Daisy Wiki, the repository needs to be initialised with some document types, a "guest" user, a default ACL configuration, etc.

Open a command prompt or shell, make sure DAISY_HOME is set, go to the directory <DAISY_HOME>/install, and execute:

daisy-wiki-init

The program will start by asking for a login and password. Enter the user created during the execution of daisy-repository-init (the default was testuser/testuser). It will also ask for the URL where the repository is listening; you can simply press enter here.

If everything goes according to plan, the program will now print out some informational messages and end with "Finished.".

2.7.2 Creating a "wikidata" directory

Similar to the data directory of the Daisy repository server, the Daisy Wiki also has its own data directory (which we call the "wikidata directory").

To set up this directory, open a command prompt or shell, make sure DAISY_HOME is set, go to the directory <DAISY_HOME>/install, and execute:

daisy-wikidata-init

and follow the instructions on-screen.

Since the Daisy Wiki and the Daisy repository server are two separate applications (which might be deployed on different servers), each has its own data directory.

2.7.3 Creating a Daisy Wiki Site

The Daisy Wiki has the concept of multiple sites: these are multiple views on top of the same repository. You need at least one site to do something useful with the Daisy Wiki, so we are now going to create one.

Open a command prompt or shell, make sure DAISY_HOME is set, go to the directory <DAISY_HOME>/install, and execute:

daisy-wiki-add-site <location of wikidata directory>

The application starts by asking for the same parameters as daisy-wiki-init.

Then it will ask for a name for the site. This should be a name without spaces. If you're inspirationless, enter something like "test" or "main".

Then it will ask for the sites directory location, for which the presented default should be OK, so just press enter.

2.7.4 Starting the Daisy Wiki

Open a command prompt or shell and make sure DAISY_HOME is set.

Go to the directory <DAISY_HOME>/daisywiki/bin, and execute:

daisy-wiki <location of wikidata directory>

Background info: this will start Jetty (a servlet container) with the webapp found in <DAISY_HOME>/daisywiki/webapp.

2.8 Finished!

Now you can point your web browser to:

http://localhost:8888/

To be able to create or edit documents, you will have to log in as a different user. You can use the user you created for yourself while running daisy-repository-init (the default was testuser/testuser).

To start the Daisy repository server and Daisy Wiki after the initial installation, see the summary here, or even better, set up service (init) scripts to easily/automatically start and stop Daisy.

2.9 2.0(.x) to 2.1 changes

2.10 2.0(.x) to 2.1 compatibility

2.10.1 Skin compatibility

2.10.1.1 XSL-FO (stylesheets for PDF)

Daisy 2.1 ships with a major new release of the XSL-FO processor, FOP 0.93. If you have custom XSL-FO stylesheets, you may run into minor compatibility issues.

2.10.2 Repository extensions, authentication schemes, etc

2.10.2.1 New Runtime

Since we moved from Avalon Merlin to the new Daisy Runtime, you will need to adjust your repository extensions, authentication schemes, etc. to be compatible with the new infrastructure.

Some pointers to more information:

If you have trouble adjusting your extensions or understanding the new system, you can ask questions on the Daisy mailing list.

2.10.2.2 Package move

Most of the SPI classes have been moved to different packages. For example:

org.outerj.daisy.authentication => org.outerj.daisy.authentication.spi

It should be easy to adjust your classes, which you'll need to do anyhow for the new Daisy Runtime.
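
For larger code bases, the import statements can be adjusted mechanically. The Python sketch below is only an illustration under stated assumptions: the rewrite_imports helper is hypothetical, the mapping table contains just the one move given above, and it blindly assumes that every class directly in the old package was moved, which is not guaranteed (only "most" classes moved). Review the result by hand.

```python
import re

# The one package move listed in the text; other moved packages would need
# their own entries (this table is deliberately incomplete).
PACKAGE_MOVES = {
    "org.outerj.daisy.authentication": "org.outerj.daisy.authentication.spi",
}

def rewrite_imports(source):
    """Rewrite 'import <old package>.<Class>;' statements to the new package."""
    for old, new in PACKAGE_MOVES.items():
        # Match only direct members of the old package (class names start
        # with an upper-case letter), so sub-packages such as '.spi' are
        # left untouched and the rewrite can safely be run twice.
        pattern = re.compile(
            r"\bimport\s+" + re.escape(old) + r"\.([A-Z]\w*)\s*;"
        )
        source = pattern.sub(
            lambda m, new=new: "import %s.%s;" % (new, m.group(1)), source
        )
    return source
```

In the usage we have in mind, the function is applied to the text of each Java source file of your extension; ExampleScheme in a line such as "import org.outerj.daisy.authentication.ExampleScheme;" would be a made-up class name.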

2.10.2.3 AbstractAuthenticationFactory

This class is deprecated (and non-functional). If you used this, see the updated NTLM and LDAP authentication schemes for how to update your code.

2.10.3 Publisher wraps exception

The Publisher now wraps any exception occurring in the publisher with a GlobalPublisherException, containing information on the execution stack of the publisher. This might have effects on how you handle exceptions coming from the publisher. For example you might do a catch for GlobalPublisherException and then do getCause() on it to get the actual exception.

The error.xsl has also been changed to hide the GlobalPublisherException.

2.10.4 Book publisher

2.10.4.1 If you're using custom book publication types

The shiftHeaders task has been deprecated; heading shifting is now performed as part of the assembleBook task. This change had to be made in order to implement the new heading shifting for document includes.

Normally, you don't need to adjust anything: the shiftHeaders task still exists but now does nothing at all. To avoid future confusion, it is recommended you remove the shiftHeaders task from any custom book publication types you might have.

2.10.5 Changes to non-public things

The following are changes to Daisy internals that might be relevant for some users.

2.10.5.1 Constants.DAISY_LINK_PATTERN

This is not really a part of the public API, but if you would happen to use the regex pattern defined in Constants.DAISY_LINK_PATTERN, you might have to adjust your code because the structure of this pattern has changed a bit: meaningless groups have been changed into non-capturing groups. See the javadoc of that constant for the exact matching groups.
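
To see why such a change can break existing code, consider the following Python sketch. The two patterns are hypothetical and much simpler than the real Constants.DAISY_LINK_PATTERN (whose exact structure is only documented in its javadoc); they merely show how turning a capturing group into a non-capturing one shifts the group numbers.

```python
import re

# Hypothetical link patterns, NOT the real Constants.DAISY_LINK_PATTERN.
# OLD wraps the scheme in a capturing group; NEW makes it non-capturing.
OLD = re.compile(r"^(daisy:)(\d+)$")
NEW = re.compile(r"^(?:daisy:)(\d+)$")

old_match = OLD.match("daisy:123")
new_match = NEW.match("daisy:123")

# With the capturing scheme group, the document number is group 2 ...
old_id = old_match.group(2)
# ... once the scheme group is non-capturing, it becomes group 1.
new_id = new_match.group(1)
```

Code that still asks the new pattern for group 2 would fail, which is the kind of adjustment meant above.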

2.10.5.2 Change to htmlcleaner.xml

The pre element now allows a daisy-shift-headings attribute for the new heading shifting for document includes feature.

2.10.6 Automated installation

When specifying a property file to the repository-server-init script, two new properties are now required: dbName and jmsDbName, containing the names of the databases (the same names as those in the JDBC URLs).
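
If you generate such a property file from a script, the database name can be derived from the JDBC URL. The Python sketch below is an illustration only: the database_name helper is hypothetical, and it assumes URLs of the common jdbc:mysql://host[:port]/name[?options] form.

```python
def database_name(jdbc_url):
    """Return the database name from a URL like jdbc:mysql://host:3306/name?opts."""
    # Drop any connection options after '?'.
    without_query = jdbc_url.split("?", 1)[0]
    # Split off the 'jdbc:mysql' prefix, then separate host[:port] from the name.
    _scheme, sep, rest = without_query.partition("://")
    _host, slash, name = rest.partition("/")
    if not sep or not slash or not name:
        raise ValueError("no database name in %r" % jdbc_url)
    return name
```

With the databases created earlier in this document, the values for dbName and jmsDbName would come out as daisyrepository and activemq.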

2.11 2.0(.x) to 2.1 upgrade

These are the upgrade instructions for when you currently have Daisy 2.0 or 2.0.1 installed.

If you have 2.1-RC installed, see the 2.1-RC to 2.1 upgrade instructions instead.

2.11.1 Upgrading

2.11.1.1 Daisy installation review

In case you're not very familiar with Daisy, it is helpful to identify the main parts involved. The following picture illustrates these.

There is the application directory, which is simply the extracted Daisy download, and doesn't contain any data (to be safe don't remove it yet though).

Next to this, there are 3 locations where data (and configuration) is stored: the relational database (MySQL), the repository data directory, and the wiki data directory. The Daisy repository and the Daisy Wiki are two independent applications, therefore each has its own data directory.

The text between the angle brackets (< and >) is the way we will refer to these directories further on in this document. Note that <DAISY_HOME> is the new extracted download (see later on), not the old one.

2.11.1.2 Stop your existing Daisy

Stop your existing Daisy, both the repository server and the Daisy Wiki.

2.11.1.3 Download and extract Daisy 2.1

If not done already, download Daisy 2.1 from the distribution area (Sourceforge). For Windows, download the zip or autoextract.exe (not the installer!). For Unix-based systems, the .tar.gz is recommended. The difference is that the .zip contains text files with DOS line endings, while the .tar.gz contains text files with unix line endings. When using non-Linux unixes such as Solaris, be sure to use GNU tar to extract the archive.

Extract the download at a location of your choice. Extract it next to your existing Daisy installation; do not copy it over your existing installation.

2.11.1.4 Update environment variables

Make sure the DAISY_HOME environment variable points to the just-extracted Daisy 2.1 directory.

Note that when you start/stop Daisy using the wrapper scripts, you don't need to set DAISY_HOME, though you do need to update or re-generate the service wrapper configuration (see next section).

How this is done depends a bit on your system and personal preferences:

2.11.1.5 Creating log configuration

Daisy now uses log4j for logging, which needs a new configuration file in the repository data directory.

Therefore copy the file

<DAISY_HOME>/repository-server/conf/repository-log4j.properties

to

<REPO DATA DIR>/conf/

2.11.1.6 Updating the repository SQL database

Execute the database upgrade script:

cd <DAISY_HOME>/misc
mysql -Ddaisyrepository -udaisy -ppassword < daisy-2_0-to-2_1.sql

On many MySQL installations you can use "root" as user (thus specify -uroot instead of -udaisy) without password, thus without the -p option.

2.11.1.7 Adjusting the daisy.xconf file

Open the following file in a text editor:

<wiki data dir>/daisy.xconf

At the end of this file, before the closing </cocoon> tag, add these lines:

<component
    class="org.outerj.daisy.frontend.GuestRepositoryProviderImpl"
    role="org.outerj.daisy.frontend.GuestRepositoryProvider"
    logger="daisy">
    <guestUser login="guest" password="guest"/>
</component>

2.11.1.8 ActiveMQ configuration

The repository-server-init script of Daisy 2.0(.1) made an error in the ActiveMQ configuration. If you upgraded your 2.0 from earlier releases, the configuration should be OK, but there's no harm in checking it anyhow.

Open the following file in a text editor:

<daisydata dir>/conf/activemq-conf.xml

If you find the following line in that file, remove it:

<property name="poolPreparedStatements" value="true"/>

2.11.1.9 Jetty configuration (only when using a custom jetty-daisywiki.xml)

If you have a custom jetty-daisywiki.xml in your wikidata directory, it will need updating because Daisy 2.1 contains a major new Jetty version (6.1.3).

The easiest is probably to start from the new default jetty-daisywiki.xml found at

<DAISY_HOME>/daisywiki/conf/jetty-daisywiki.xml

and change what you want to change (usually just the HTTP port number).

The new default jetty-daisywiki.xml enables request logging by default. You might want to disable this if you have a webserver in front which also does request logging.

2.11.1.10 Wrapper scripts

This section is only applicable if you are using the wrapper scripts.

Various updates have been done to the wrapper scripts.

An important difference is that the wrapper scripts now require DAISY_HOME to be set.

Please see the wrapper documentation on how to regenerate the service wrapper scripts.

2.11.1.11 Start the servers

Make sure the DAISY_HOME environment variable points to the new Daisy 2.1 directory (you might want to rename the old directory so that it is not used by accident), then start the repository server and the Daisy Wiki.

2.11.1.12 Update the default repository schema

There are some new schema types, so update the repository schema by running the daisy-wiki-init script:

[Windows]
cd <DAISY_HOME>\install
daisy-wiki-init

[Linux]
cd <DAISY_HOME>/install
./daisy-wiki-init

2.12 2.1-RC to 2.1 upgrade

2.12.1 Changes since 2.1-RC

2.12.2 Upgrade instructions

These are the upgrade instructions for when you currently have Daisy 2.1-RC installed.

This release requires no special upgrade steps, besides putting the new Daisy distribution in place.

In case you have problems during the upgrade or notice errors or shortcomings in the instructions below, please let us know on the Daisy mailing list.

2.12.2.1 Daisy installation review

In case you're not very familiar with Daisy, it is helpful to identify the main parts involved. The following picture illustrates these.

There is the application directory, which is simply the extracted Daisy download, and doesn't contain any data (to be safe don't remove it yet though).

Next to this, there are 3 locations where data (and configuration) is stored: the relational database (MySQL), the repository data directory, and the wiki data directory. The Daisy repository and the Daisy Wiki are two independent applications, therefore each has its own data directory.

The text between the angle brackets (< and >) is the way we will refer to these directories further on in this document. Note that <DAISY_HOME> is the new extracted download (see later on), not the old one.

2.12.2.2 Stop your existing Daisy

Stop your existing Daisy, both the repository server and the Daisy Wiki.

2.12.2.3 Download and extract Daisy 2.1

If not done already, download Daisy 2.1 from the distribution area (Sourceforge). For Windows, download the zip or autoextract.exe (not the installer!). For Unix-based systems, the .tar.gz is recommended. The difference is that the .zip contains text files with DOS line endings, while the .tar.gz contains text files with unix line endings. When using non-Linux unixes such as Solaris, be sure to use GNU tar to extract the archive.

Extract the download at a location of your choice. Extract it next to your existing Daisy installation; do not copy it over your existing installation.

2.12.2.4 Update environment variables

Make sure the DAISY_HOME environment variable points to the just-extracted Daisy 2.1 directory.

Note that when you start/stop Daisy using the wrapper scripts, you don't need to set DAISY_HOME, though you do need to update or re-generate the service wrapper configuration (see next section).

How this is done depends a bit on your system and personal preferences:

2.12.2.5 Start Daisy

Start Daisy using the normal scripts or the wrapper scripts.

3 Source Code

Sources can be obtained through SVN. Instructions for setting up a development environment with Daisy (which is slightly different from using the packaged version) are included in the README.txt files in the source tree. For anonymous, read-only access to Daisy SVN, use the following command:

svn co http://svn.cocoondev.org/repos/daisy/trunk/daisy

This will give the latest development code (the "trunk"). To get the source code of a specific release, use a command like this:

svn co http://svn.cocoondev.org/repos/daisy/tags/RELEASE_1_3_1 daisy

See also the existing tags.

No authentication is required for anonymous access. If you're behind a (transparent) proxy, you might want to verify whether your proxy supports the extended HTTP WebDAV methods.

3.1 Daisy Build System

We should consider removing this document, Maven is common enough these days.

The build system used by Daisy is Maven, an Apache project.

3.1.1 Maven intro

What follows is the very-very-quick Maven intro, for those not familiar with Maven.

Unlike Ant, where you tell how your code should be built, in Maven you simply tell which directory contains your code and what the dependencies are (i.e. which other jars it depends on), and it will build your code. This information is stored in the project.xml files that you'll see across the Daisy source tree. There are a lot of them, since Daisy is actually composed of a whole lot of mini-projects, some of which depend on one or more of the others.

An important concept of Maven is the repository, which is a repository of so-called artifacts, usually jar files. An artifact in the repository is identified uniquely by a group id and an id (both are simply descriptive names). Declaring the dependencies of a project is done by specifying repository references, thus for each dependency you specify the group id and id of the dependency. An example dependency declaration, as defined in a project.xml file:

    <dependency>
      <groupId>lucene</groupId>
      <artifactId>lucene</artifactId>
      <version>1.3</version>
    </dependency>

So where does the repository physically exist? Well, there can be many repositories. The most important public one is on ibiblio:

http://www.ibiblio.org/maven/

The repository is simply accessed using HTTP, so you can take your browser and surf to that URL. A repository like the one on ibiblio is called a remote repository. After initially downloading an artifact from the remote repository, it is installed in your local repository, which is by default located in ~/.maven/repository.

When you build a project, the result of the build is usually a jar file. Maven will install this jar file in your local repository, so that when you build another project that depends on this jar file, it can be found over there. When searching a dependency, Maven always checks the local repository first, and then goes off checking remote repositories. Which remote repositories are searched is of course configurable.
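
The lookup order just described can be sketched as follows. This is a toy model in Python, not Maven's actual code: the resolve function is hypothetical, and plain dictionaries stand in for the local and remote repositories.

```python
def resolve(dependency, local_repo, remote_repos):
    """Look up a dependency locally first, then in each remote repository.

    dependency   -- (group_id, artifact_id, version) tuple
    local_repo   -- dict mapping dependency tuples to jar contents
    remote_repos -- list of such dicts, searched in the configured order
    """
    if dependency in local_repo:
        return local_repo[dependency], "local"
    for index, remote in enumerate(remote_repos):
        if dependency in remote:
            # Install the download locally so the next build finds it there.
            local_repo[dependency] = remote[dependency]
            return remote[dependency], "remote %d" % index
    raise LookupError("unresolved dependency: %s:%s:%s" % dependency)
```

Resolving the lucene dependency from the example above twice would hit a remote repository the first time and the local repository the second time.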

I should also tell you something about the build.properties and project.properties files. Both files contain properties for the build and configuration for Maven. The difference is that the project.properties files are committed to the source repository (SVN in Daisy's case), while the build.properties files are intended for local customisations (thus on your computer). So if you see something in a project.properties file that you'd like to change, don't change it over there (as this will otherwise show up as a modified file when doing svn status), but do it in the build.properties file. The build.properties file thus has a higher precedence than the project.properties file.
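
The precedence rule boils down to "load project.properties first, then let build.properties overwrite it". A minimal Python sketch of that idea follows; it is not how Maven itself is implemented, both function names are hypothetical, and the parser only understands simple key=value lines.

```python
def parse_properties(text):
    """Parse simple 'key=value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def effective_properties(project_text, build_text):
    """Merge both files; build.properties wins on conflicting keys."""
    merged = parse_properties(project_text)
    merged.update(parse_properties(build_text))
    return merged
```

A key that appears only in project.properties keeps its committed value, while a key repeated in build.properties takes the local value.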

There is a lot more to tell about Maven, such as that it is actually composed of a whole lot of plugins, that there is something like "goals" to execute, that there is the possibility to have a maven.xml file to define custom goals with custom build instructions, and that all artifacts are also versioned. But I'll let you explore the Maven documentation to learn about that.

3.1.2 Extra dependencies

Daisy has some dependencies on artifacts (remember, jar files) that are not available in the public ibiblio repository. We make these available in our own repository on http://cocoondev.org/repository/.

3.1.3 Building Daisy

Instructions for building Daisy can be found in the README.txt file in the root of the Daisy source tree. At some point it will tell you to execute maven in the root of the source tree, which will actually build all the little mini-projects of which Daisy consists, in the correct sequence so that all dependencies are satisfied.

4 Repository server

The repository server is the core of Daisy. It provides the pure content management functionality without GUI (graphical user interface).

The main purpose of the repository is managing documents.

The repository server consists of a core and some non-essential extension components that add additional functionality. The repository can be accessed by a variety of client applications (such as web applications, command-line tools, desktop applications, ...) through its programming interfaces.

4.1 Documents

4.1.1 Introduction

The purpose of the Daisy Repository Server is managing documents. The main content of a document is contained in its so-called parts and fields. Parts contain arbitrary binary data (e.g. an XML document, a PDF file, an image). Fields contain simple information of a certain data type (string, date, decimal, ...).

The diagram below gives an overview of the document structure, this is explained in more detail below.

4.1.2 No hierarchy

Daisy has no folders or directories like a filesystem: all documents are stored in one big bag. When saving a document, you only have to choose a name for it (which in fact acts as the title of the document), and this name is not even required to be unique (see below). Documents are retrieved by searching or browsing. Front-end applications like the Daisy Wiki allow you to define multiple hierarchical views on the same set of repository documents.

4.1.3 Documents & document variants

A document can exist in multiple variants, e.g. in multiple languages. A document in itself does not consist of much, most of the data is contained in the document variants. From another point of view (which closer matches the implementation), one could say that the repository server actually manages document variants, which happen to share a few properties (most notably their identity) through the concept of a document.

A document always has at least one document variant; a document cannot exist by itself without variants.

A document is identified uniquely by its ID; a document variant is identified by the triple {document ID, branch, language}.
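
As a way to picture this, the triple can be modelled as a composite key. The Python sketch below is a toy model and not the Daisy API: the VariantKey tuple, the variants_of helper, the branch name "main" and the language codes are all assumptions made for illustration.

```python
from collections import namedtuple

# A variant is addressed by the triple {document ID, branch, language}.
VariantKey = namedtuple("VariantKey", ["document_id", "branch", "language"])

def variants_of(store, document_id):
    """Return the keys of all variants stored for one document, sorted."""
    return sorted(key for key in store if key.document_id == document_id)

# Toy store: one document in two languages, another with a single variant.
store = {
    VariantKey("1-FOO", "main", "en"): "Hello",
    VariantKey("1-FOO", "main", "fr"): "Bonjour",
    VariantKey("2-FOO", "main", "en"): "Other document",
}
```

The document ID alone selects a group of variants; only the full triple selects actual content.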

If you are not interested in using variants, you can mostly ignore them. In that case each document will always be associated with exactly one document variant. Therefore, often when we speak about a document in Daisy, we implicitly mean "a certain variant of a document" (a "document variant"). In a practical working environment like the Daisy Wiki, the branch and language which identify the particular variant of the document are usually a given (Daisy Wiki: configured per site), and you'll only work with document IDs, so it is as if the existence of variants is transparent.

Refer to the diagram above to see if a certain aspect applies to a document, a document variant, or a version of a document variant.

For more details on this topic, see variants.

4.1.4 Document properties

4.1.4.1 ID

When a document is saved for the first time, it is assigned a unique ID. The ID is the combination of a sequence counter and the repository namespace. If the repository namespace is FOO, then the first document will get ID 1-FOO, the second 2-FOO, and so on. The ID of a document never changes.
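
The ID scheme can be sketched in a few lines of Python. This is an illustration only; the IdGenerator class is hypothetical and says nothing about how the repository server stores its counter.

```python
import itertools

class IdGenerator:
    """Hands out document IDs of the form '<sequence>-<namespace>'."""

    def __init__(self, namespace):
        self.namespace = namespace
        self._counter = itertools.count(1)  # the first document gets number 1

    def next_id(self):
        """Return the next ID; IDs are never reused or changed afterwards."""
        return "%d-%s" % (next(self._counter), self.namespace)
```

For the namespace FOO, successive calls to next_id() give 1-FOO, 2-FOO, and so on.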

4.1.4.2 Owner

The owner of a document is a person who is always able to access (read/write) the document, regardless of what the ACL specifies. The owner is initially the creator of the document, but can be changed afterwards.

4.1.4.3 Created

The date and time when the document was created. This value never changes.

4.1.4.4 Last Modified and Last Modifier

Each time a document is saved, the user performing the save operation is stored as the last modifier, and the date and time of the save operation as the "last modified" timestamp.

Note that each document variant has its own last modified and last modifier properties, which are usually more interesting: the last modified and modifier of the document are only updated when some of the shared document properties change.

4.1.5 Document variant properties

4.1.5.1 Versions

A document consists of versioned and non-versioned data. Versioned data means that each time the document is saved (and some of the versioned aspects of the document changed), a new version will be stored, so that the older state of the data can still be viewed afterwards.

It hence provides a history of who made what changes at what time. It also allows you to work on newer versions of a document while an older version stays the live version, as explained in version state.

4.1.5.2 Versioned Content

The versioned content of a document consists of the following:

So if any changes are made to any of these, and the document is stored, a new version is created.

4.1.5.2.1 Version ID

Each version has an ID, which is simply a numeric sequence number: the first version has number 1, the next number 2, and so on.
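
The versioning behaviour described above can be pictured with a small Python model. This is a toy sketch, not the repository implementation: the DocumentVariant class is hypothetical, only the name and parts are treated as versioned data, and the "retired" flag is a made-up example of a non-versioned property.

```python
class DocumentVariant:
    """Toy model: versioned data produces versions, other data does not."""

    def __init__(self, name, parts):
        self.versions = []      # list of (version_id, name, parts) tuples
        self.retired = False    # hypothetical non-versioned property
        self.save(name, parts)

    def save(self, name, parts, retired=None):
        """Store the variant and return the ID of its latest version."""
        if retired is not None:
            self.retired = retired  # stored, but never creates a version
        snapshot = (name, dict(parts))
        if self.versions and self.versions[-1][1:] == snapshot:
            return self.versions[-1][0]      # nothing versioned changed
        version_id = len(self.versions) + 1  # 1, 2, 3, ...
        self.versions.append((version_id,) + snapshot)
        return version_id
```

Saving with unchanged versioned data keeps the current version number, while any change to the name or a part produces the next number in the sequence.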

4.1.5.2.2 Document Name

The name of a document is required (it cannot be empty). The name is not required to be unique. Thus there can be multiple documents with the same name. The ID of the document is its unique identification.

The name is usually rendered as the title of the document.

4.1.5.2.3 Parts

A part contains arbitrary binary data. "Binary data" simply means that it can be any sort of information, such as plain text, XML or HTML, an image, a PDF or OpenOffice document.

In contrast with many repositories or file systems, a Daisy document can contain multiple parts. This allows you to store different types of data in one document (e.g. text and an image), and makes these parts separately retrievable.

For example, one could have a document with a part containing an abstract and a part containing the main text. It is then very easy and efficient to show a page with the abstracts of a set of documents.

As another example, a document for an image could contain a part with the rendered image (e.g. as PNG), a part with a thumbnail image and a part with the source image file (e.g. a PhotoShop or SVG file).

The parts that can be added to a document are controlled by its document type.

Each part:

4.1.5.2.4 Fields

Fields contain simple information of a certain data type (string, date, decimal, ...). Depending on how you look at it, fields could be metadata about the data stored in the parts, or can be data by themselves.

One of the data types supported for fields is link, which allows the field to contain a link to another Daisy document. Link-type fields are useful for defining structured links (associations) between documents. For example, you could have documents describing wines, and other documents describing regions. Using a link-type field you can connect a wine to a region. With this association stored in a field, it is easy to perform searches such as "all wines associated with a certain region". Through the Publisher, the Daisy Wiki can aggregate data from linked documents when displaying a document, which, combined with some custom styling, makes quite sophisticated pages possible.

Fields can be multi-valued. The order of the values in a multi-value field is maintained. The same value can appear more than once.

A field can be hierarchical, meaning that its value represents a hierarchical path. A field can be multi-value and hierarchical at the same time.

The fields that can be added to a certain document are specified by its document type.

Each field:

4.1.5.2.5 Links

A document can contain links in the content of parts (for example, an <a> element in HTML) or in link-type fields. Next to this, a document can have a number of so-called out-of-line links. These are links stored separately from the content. Each link consists of a title and a target (some URL). These links are usually rendered at the bottom of a page as a bulleted list.

Out-of-line links are useful in case you want to link to related documents (or any URL) and either don't want or can't (e.g. in case of non-HTML content) link to them from the content of a part.

4.1.5.2.6 Version state & the live version

Each version has a state indicating whether it is a draft version (i.e. you started editing the document but are not finished yet, so the changes should not be published yet) or a publishable version. The most recent version having the state 'publish' becomes the live version. The live version is the version that is typically shown by default to the user. It is also the version whose data is indexed in the full-text index, and whose properties are used by default when querying. The ACL makes it possible to restrict users' access to only the live versions of documents.
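The rule for selecting the live version can be sketched in a few lines of Java. This is only an illustration, not Daisy's implementation: the VersionSketch class and the state strings "draft" and "publish" are hypothetical stand-ins chosen for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a document version; not part of the Daisy API.
final class VersionSketch {
    final long id;        // sequence number: 1, 2, 3, ...
    final String state;   // "draft" or "publish" (names assumed for illustration)

    VersionSketch(long id, String state) {
        this.id = id;
        this.state = state;
    }

    // Returns the most recent version in state "publish", or null if there is none.
    static VersionSketch liveVersion(List<VersionSketch> versions) {
        VersionSketch live = null;
        for (VersionSketch v : versions) {
            if ("publish".equals(v.state) && (live == null || v.id > live.id)) {
                live = v;
            }
        }
        return live;
    }

    public static void main(String[] args) {
        List<VersionSketch> versions = Arrays.asList(
                new VersionSketch(1, "publish"),
                new VersionSketch(2, "publish"),
                new VersionSketch(3, "draft"));
        // Version 3 is newer but still a draft, so version 2 stays live.
        System.out.println(liveVersion(versions).id); // prints 2
    }
}
```

A document whose versions are all drafts has no live version in this sketch, which is why liveVersion can return null.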

4.1.5.3 Non-versioned properties

4.1.5.3.1 Document type

Each document is associated with a document type, describing the parts and fields the document can contain. See repository schema for more information on document types.

4.1.5.3.2 Collections and collection membership

Collections are sets of documents. A document can belong to zero, one or more collections, thus collections can overlap. A collection is simply a way to combine some documents in order to do something with them or treat them in some special way. In other words, they are a sort of built-in (always present) metadata to identify a set of documents.

Collections themselves can be created or deleted only by Administrators (in the Daisy Wiki, this is done in the administration interface). Deleting a collection does not delete the documents in it. You can limit who can put documents in a collection by ACL rules.

4.1.5.3.3 Custom fields

Custom fields are arbitrary name-value pairs assigned to a document. The name and value are both strings. In contrast with the earlier-mentioned fields that are part of the document type, these fields are non-versioned. This makes it possible to stick tags to documents without causing a new version to be created, and without formally defining a field type.

4.1.5.3.4 Private

A document marked as private can only be read (and written) by its owner.

While the global access control system of Daisy makes it easy to centrally handle access control for sets of documents, sometimes it can be useful to simply say "I want nobody else to see this (for now)". This can be done by enabling the private flag. The document will then not be accessible to others, and will not turn up in searches performed by others. The private flag can be set on or off at any time, by the owner or by an Administrator.

There is however one big exception: Administrators can always access all documents, and thus will be able to read your "private" documents. The content is not encrypted.

4.1.5.3.5 Retired

If a document variant is no longer needed, because its content is outdated, has been replaced by other documents, or for any other reason, you can mark the document variant as retired. This makes the document variant virtually deleted. It won't show up in search results anymore.

The retired flag can be set on or off at any time; retiring is not a one-time operation.

4.1.5.3.6 Lock

A lock can be taken on a document variant to make sure nobody else edits the document variant while you're working on it.

Daisy automatically performs so-called optimistic locking. Suppose person A starts editing a document, then person B starts editing the same document, then person A saves it. When person B subsequently tries to save, the operation fails because the document has changed since person B loaded it. This mechanism is always enabled; no explicit lock is needed for it.
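The scenario above can be reproduced with a small, self-contained Java sketch. It assumes the repository detects stale copies by comparing an update counter; the source does not say how Daisy detects the change, so the OptimisticDoc class and its counter are purely illustrative.

```java
// Hypothetical in-memory stand-in for a stored document; not the Daisy API.
final class OptimisticDoc {
    private String content = "";
    private long updateCount = 0;   // incremented on every successful save

    // What an editor loads: the content plus the update count seen at load time.
    static final class Loaded {
        String content;
        final long seenUpdateCount;

        Loaded(String content, long seenUpdateCount) {
            this.content = content;
            this.seenUpdateCount = seenUpdateCount;
        }
    }

    synchronized Loaded load() {
        return new Loaded(content, updateCount);
    }

    // Succeeds only if nobody else saved since this copy was loaded.
    synchronized boolean save(Loaded copy) {
        if (copy.seenUpdateCount != updateCount) {
            return false;   // stale copy: the document changed in the meantime
        }
        content = copy.content;
        updateCount++;
        return true;
    }

    public static void main(String[] args) {
        OptimisticDoc doc = new OptimisticDoc();
        Loaded a = doc.load();           // person A starts editing
        Loaded b = doc.load();           // person B starts editing
        a.content = "edit by A";
        b.content = "edit by B";
        System.out.println(doc.save(a)); // true: A saves first
        System.out.println(doc.save(b)); // false: B's copy is stale
    }
}
```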

A lock can then be taken to make others aware that you are editing the document. A lock can be of two types: an exclusive lock or a warn lock. An exclusive lock is pretty much what its name implies: it belongs exclusively to the user who requested it, and prevents anyone else from saving the document until that user releases the lock. A warn lock isn't really a lock; it is just an informational mechanism to let others know that someone has started editing the document, but it doesn't enforce anything. Anyone else can still save the document at any time or replace the lock with their own.

A lock can optionally have a certain duration. Once the duration has expired, the lock is automatically removed.

For example, the Daisy Wiki application by default uses exclusive locks with a duration of 15 minutes, and automatically extends them as long as the user continues editing.

A lock can be removed either by the person who created it, or by an Administrator.
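The lock rules described in this section can be summarized in code. The following is a minimal sketch under stated assumptions: LockSketch, its Type enum and the use of millisecond timestamps are all hypothetical and do not correspond to the Daisy API.

```java
// Hypothetical model of a document lock; names are illustrative, not the Daisy API.
final class LockSketch {
    enum Type { EXCLUSIVE, WARN }

    final String owner;
    final Type type;
    final long expiresAtMillis;   // Long.MAX_VALUE stands for "no duration set"

    LockSketch(String owner, Type type, long expiresAtMillis) {
        this.owner = owner;
        this.type = type;
        this.expiresAtMillis = expiresAtMillis;
    }

    boolean isExpired(long nowMillis) {
        return nowMillis >= expiresAtMillis;
    }

    // An unexpired exclusive lock blocks saves by everyone except its owner.
    // A warn lock never blocks anything; it only informs.
    static boolean maySave(LockSketch lock, String user, long nowMillis) {
        if (lock == null || lock.isExpired(nowMillis)) return true;
        if (lock.type == Type.WARN) return true;
        return lock.owner.equals(user);
    }

    // A lock can be removed by the person who created it or by an Administrator.
    static boolean mayRemove(LockSketch lock, String user, boolean isAdministrator) {
        return isAdministrator || lock.owner.equals(user);
    }

    public static void main(String[] args) {
        long now = 0L;
        long fifteenMinutes = 15L * 60L * 1000L;
        LockSketch lock = new LockSketch("alice", Type.EXCLUSIVE, now + fifteenMinutes);
        System.out.println(maySave(lock, "bob", now));                  // false
        System.out.println(maySave(lock, "alice", now));                // true
        System.out.println(maySave(lock, "bob", now + fifteenMinutes)); // true: lock expired
    }
}
```

Even when maySave returns true, the optimistic locking check still applies to the save itself.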

4.1.5.3.7 Last Modified and Last Modifier

Each time a document is saved, the user performing the save operation is stored as the last modifier, and the date and time of the save operation as the "last modified" timestamp. These will often coincide with the Created/Creator fields of the last version, but not necessarily: if only non-versioned properties are changed, no new version is created.
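The interplay between versions and the last modified properties can be illustrated as follows. This is a simplified sketch, not Daisy code: SaveSketch is a hypothetical class in which the document name stands in for all versioned data and the retired flag for all non-versioned data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, heavily simplified document variant; not the Daisy API.
final class SaveSketch {
    String name = "";          // versioned data
    boolean retired = false;   // non-versioned data

    final List<String> versions = new ArrayList<>();  // stored name of each version
    String lastModifier;
    long lastModified;

    void save(String user, long nowMillis) {
        String latest = versions.isEmpty() ? null : versions.get(versions.size() - 1);
        if (!Objects.equals(latest, name)) {
            versions.add(name);   // versioned data changed: store a new version
        }
        // Updated on every save, whether or not a version was created.
        lastModifier = user;
        lastModified = nowMillis;
    }

    public static void main(String[] args) {
        SaveSketch doc = new SaveSketch();
        doc.name = "Wine list";
        doc.save("alice", 1000L);
        doc.retired = true;                       // only non-versioned data changes
        doc.save("bob", 2000L);
        System.out.println(doc.versions.size());  // prints 1
        System.out.println(doc.lastModifier);     // prints bob
    }
}
```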

4.2 Repository schema

4.2.1 Overview

The repository schema controls the structure of documents.

The repository schema defines part types, field types and document types. A document type is a combination of zero or more part types and zero or more field types. Part and field types are defined as independent entities, meaning that the same part and field types can be reused across different document types. The diagram below shows the structure and relation of all these entities.

4.2.1.1 Common aspects of document, part and field types

Let us first look at the things document, part and field types have in common. Their primary, unchangeable identifier is a numeric ID, though they also have a unique name (which can be changed after creation), which you will likely prefer to use.

Next to the name, they can optionally be assigned a localized label and a description. Localized means that a different label and description can be given for different locales. A locale can be a language, language-country, or language-country-variant specification. For example, a label entered for the locale "fr-BE" is in French, and specifically for Belgium. The labels and descriptions are retrieved using a fallback system. For example, if the user's locale is "fr-BE", the system will first check whether a label is available for "fr-BE"; if none is found it will check "fr", and finally the empty locale "". Thus if you want to provide labels and descriptions but are not interested in localization, you can simply enter them for the empty locale.
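The fallback lookup can be sketched as follows. This is an illustration of the described behavior rather than Daisy's own code; the LabelFallback class and its method names are hypothetical, and labels are assumed to be held in a plain map keyed by locale string.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the locale fallback; not the Daisy API.
final class LabelFallback {
    // "fr-BE" yields ["fr-BE", "fr", ""]; the empty locale is always tried last.
    static List<String> fallbackChain(String locale) {
        List<String> chain = new ArrayList<>();
        String current = locale;
        while (!current.isEmpty()) {
            chain.add(current);
            int dash = current.lastIndexOf('-');
            current = dash < 0 ? "" : current.substring(0, dash);
        }
        chain.add("");
        return chain;
    }

    // Returns the first label found along the fallback chain, or null if none exists.
    static String lookup(Map<String, String> labels, String locale) {
        for (String candidate : fallbackChain(locale)) {
            String label = labels.get(candidate);
            if (label != null) {
                return label;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> labels = new HashMap<>();
        labels.put("", "Title");
        labels.put("fr", "Titre");
        System.out.println(lookup(labels, "fr-BE")); // prints Titre
        System.out.println(lookup(labels, "nl"));    // prints Title
    }
}
```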

Document, part and field types cannot be deleted as long as they are still in use in the repository. Once a document has been created that uses one of these types, the type can no longer be deleted (unless the documents using it are deleted). However, it is possible to mark a type as deprecated to indicate it should not be used anymore. This deprecation flag is purely informational; the system simply stores it.

4.2.1.2 Document types

A document type combines a number of part types and field types. The associations with the part and field types, shown in the diagram as the "Part Type Use" and "Field Type Use", are not stand-alone entities but part of the document type.

The associations have a property to indicate whether or not the parts and fields are required to have a value.

The associations also have a property called 'editable'. Setting this property to false is a hint to the document editing GUI that the part or field should not be editable. This is just a GUI hint, not an access control restriction. It can for example be useful if the values of certain fields or parts are assigned by an automated process.
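As an illustration of how these associations fit together, the sketch below models a document type with its field type uses and checks a document against the 'required' property. All class, field and method names are hypothetical and chosen for this example only; part type uses would follow the same pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a document type; names are illustrative, not the Daisy API.
final class DocTypeSketch {
    // One "Field Type Use": part of the document type, not a stand-alone entity.
    static final class FieldTypeUse {
        final String fieldTypeName;
        final boolean required;
        final boolean editable;   // hint for the editing GUI only; not enforced here

        FieldTypeUse(String fieldTypeName, boolean required, boolean editable) {
            this.fieldTypeName = fieldTypeName;
            this.required = required;
            this.editable = editable;
        }
    }

    final String name;
    final List<FieldTypeUse> fieldTypeUses = new ArrayList<>();

    DocTypeSketch(String name) {
        this.name = name;
    }

    // Returns the names of required fields that the document does not provide.
    List<String> missingRequiredFields(Set<String> fieldsInDocument) {
        List<String> missing = new ArrayList<>();
        for (FieldTypeUse use : fieldTypeUses) {
            if (use.required && !fieldsInDocument.contains(use.fieldTypeName)) {
                missing.add(use.fieldTypeName);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        DocTypeSketch wine = new DocTypeSketch("Wine");
        wine.fieldTypeUses.add(new FieldTypeUse("Region", true, true));
        wine.fieldTypeUses.add(new FieldTypeUse("ImportedOn", false, false));
        Set<String> present = new HashSet<>(Arrays.asList("ImportedOn"));
        System.out.println(wine.missingRequiredFields(present)); // prints [Region]
    }
}
```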

4.2.1.3 Part types

A part type defines a part that can be added to a document.

4.2.1.3.1 Mime-type

A part type can restrict which types of data (that is, which mime-types) may be stored in the part, but this is not required. The restriction is expressed as a list of allowed mime-types.

4.2.1.3.2 The Daisy HTML flag

A part type has a flag indicating whether the part contains "Daisy HTML". Daisy HTML is basically HTML formatted as well-formed XML (with element and attribute names lowercased). It is not the same as XHTML, because the elements are not in the XHTML namespace. If the "Daisy HTML" flag is set to true, the mime-type should be limited to text/xml. For the repository server, the Daisy HTML flag on the part type has little meaning. Currently it serves only to enable the creation of document summaries (which might even be replaced with a more flexible mechanism in the future). The Daisy Wiki front-end application shows a WYSIWYG editor for Daisy HTML parts, and displays the content of such parts inline.

4.2.1.3.3 Link extraction

For each part type a link extractor can be defined to extract links from the content contained in the part. The most common link extractor is the "daisy-html" one, which will extract links from the href attribute of the <a> element, the src attribute of the <img> element, and the character content of <p class="include">. The format of the links is:

daisy:<document id>
or
daisy:<document id>@<branch id or name>:<language id or name>:<version id>#fragment_id

Links that don't conform to this format are ignored. The <version id> can take the special value "LAST" (case insensitive). A link without a version specification denotes a link to the live version of the document. The branch, language, version and fragment ID parts are all optional. For example, daisy:15@:nl is a link to the Dutch version of document 15.
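To make the link format concrete, here is a hedged sketch of a parser for it. It is not the repository's actual link extractor: the DaisyLinkSketch class is hypothetical, and the regular expression assumes that the document ID, branch, language and version contain no whitespace and none of the separator characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the link format described above; not Daisy's link extractor.
final class DaisyLinkSketch {
    // daisy:<document id>[@<branch>[:<language>[:<version>]]][#<fragment>]
    private static final Pattern LINK = Pattern.compile(
            "daisy:([^@#:\\s]+)"
          + "(?:@([^:#\\s]*)(?::([^:#\\s]*)(?::([^:#\\s]*))?)?)?"
          + "(?:#(\\S*))?");

    // A null value means the part was not specified in the link.
    final String documentId, branch, language, version, fragment;

    private DaisyLinkSketch(Matcher m) {
        documentId = m.group(1);
        branch = emptyToNull(m.group(2));
        language = emptyToNull(m.group(3));
        version = emptyToNull(m.group(4));
        fragment = emptyToNull(m.group(5));
    }

    private static String emptyToNull(String s) {
        return (s == null || s.isEmpty()) ? null : s;
    }

    // Returns null for anything that does not conform, so callers can ignore it.
    static DaisyLinkSketch parse(String link) {
        Matcher m = LINK.matcher(link);
        return m.matches() ? new DaisyLinkSketch(m) : null;
    }

    // No version means the live version; "LAST" is compared case-insensitively.
    String versionMode() {
        if (version == null) return "live";
        if (version.equalsIgnoreCase("LAST")) return "last";
        return "version " + version;
    }

    public static void main(String[] args) {
        DaisyLinkSketch link = parse("daisy:15@:nl");
        System.out.println(link.documentId);    // prints 15
        System.out.println(link.branch);        // prints null
        System.out.println(link.language);      // prints nl
        System.out.println(link.versionMode()); // prints live
    }
}
```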

The repository server also has link extractors for extracting links from navigation and book definition documents.

4.2.1.4 Field types

A field type defines a field that can be added to a document.

4.2.1.4.1 Value Type

The most important thing a field type tells about a field is its value type. A value type identifies the kind of data that can be stored in a field. The available value types are listed in the table below, together with their matching Java class.

Value type name    Corresponding Java class
---------------    ------------------------
string             java.lang.String
date               java.util.Date
datetime           java.util.Date
long               java.lang.Long
double             java.lang.Double
decimal            java.math.BigDecimal
boolean            java.lang.Boolean
link               org.outerj.daisy.repository.VariantKey

The link type is somewhat special: it defines a link to another document variant. Its value is thus a triple (document ID, branch ID, language ID). The branch ID and language ID are optional (value -1 in the VariantKey object) to denote they should default to the same as the containing document (in other words, the branch and language are relative to the document). The branch and language will usually be unspecified, since this allows copying content between the variants while the links stay relative to the actual variant.
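The way relative branch and language values resolve can be sketched as below. LinkValueSketch is a hypothetical stand-in written for this example and deliberately does not use the real VariantKey class; only the convention that -1 means "same as the containing document" is taken from the text above.

```java
// Hypothetical stand-in for a link value; this is not org.outerj.daisy.repository.VariantKey.
final class LinkValueSketch {
    final String documentId;
    final long branchId;     // -1 means: same branch as the containing document
    final long languageId;   // -1 means: same language as the containing document

    LinkValueSketch(String documentId, long branchId, long languageId) {
        this.documentId = documentId;
        this.branchId = branchId;
        this.languageId = languageId;
    }

    // Fills in unspecified branch and language from the containing document variant.
    LinkValueSketch resolve(long containingBranchId, long containingLanguageId) {
        return new LinkValueSketch(
                documentId,
                branchId == -1 ? containingBranchId : branchId,
                languageId == -1 ? containingLanguageId : languageId);
    }

    public static void main(String[] args) {
        LinkValueSketch relative = new LinkValueSketch("15", -1, -1);
        // The same stored value points to a different variant in each containing variant.
        LinkValueSketch inFirst = relative.resolve(1, 1);
        LinkValueSketch inSecond = relative.resolve(1, 2);
        System.out.println(inFirst.languageId);  // prints 1
        System.out.println(inSecond.languageId); // prints 2
    }
}
```

This is why leaving branch and language unspecified keeps links meaningful when content is copied between variants.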

4.2.1.4.2 Multi-value

The multi-value property of a field type indicates whether the fields of that type can have multiple values. All the values of a multi-value field should be of the same value type.

A multi-value field can contain the same value more than once, and the order of its values is maintained. Thus the values of a multi-value field form an ordered list.

In the Java API, a multi-value value is represented as an Object[] array, in which the entries are objects of the type corresponding to the field's value type (e.g. an array of String objects, or an array of Long objects).
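A short example shows what such an array looks like in practice. The MultiValueSketch class and its allOfType helper are hypothetical and only illustrate the constraints described above (same value type, order kept, duplicates allowed); they are not part of the Daisy API.

```java
import java.util.Arrays;

// Hypothetical helper illustrating multi-value field values; not the Daisy API.
final class MultiValueSketch {
    // Checks that every entry is an instance of the class matching the value type.
    static boolean allOfType(Object[] values, Class<?> javaClass) {
        for (Object value : values) {
            if (!javaClass.isInstance(value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A multi-value string field: order is kept and "red" may appear twice.
        Object[] tags = new Object[] { "red", "dry", "red" };
        System.out.println(Arrays.toString(tags));         // prints [red, dry, red]
        System.out.println(allOfType(tags, String.class)); // prints true
        System.out.println(allOfType(tags, Long.class));   // prints false
    }
}
```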

4.2.1.4.3 Hierarchical

The hierarchical property of a field type indicates that the value of the fields of that type is a hierarchical path (a path in some hierarchy). A path is often represented as a slash-separated string, e.g. Animals/Four-legged/Dogs.

Hierarchical fields are technically quite similar to multi-value fields, because a hierarchical path is also an ordered set of values. It is however possible for a field type to be both hierarchical and multi-value at the same time.

In the Java API, a hierarchical value is represented by a HierarchyPath object:

org.outerj.daisy.repository.HierarchyPath
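To illustrate the idea of a hierarchical value as an ordered list of path elements, the sketch below uses a hypothetical PathSketch class. It is a stand-in written for this example and makes no claim about the constructors or methods of the real HierarchyPath class.

```java
// Hypothetical stand-in for a hierarchical value; this is not the real HierarchyPath class.
final class PathSketch {
    private final Object[] elements;

    PathSketch(Object[] elements) {
        this.elements = elements.clone();
    }

    Object[] getElements() {
        return elements.clone();
    }

    // Builds a path from the slash-separated notation, e.g. "Animals/Four-legged/Dogs".
    static PathSketch fromSlashSeparated(String text) {
        return new PathSketch(text.split("/"));
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < elements.length; i++) {
            if (i > 0) {
                sb.append('/');
            }
            sb.append(elements[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PathSketch path = PathSketch.fromSlashSeparated("Animals/Four-legged/Dogs");
        System.out.println(path.getElements().length); // prints 3
        System.out.println(path);                      // prints Animals/Four-legged/Dogs
    }
}
```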

A multi-value hierarchical value is an array (Object[]) of HierarchyPath o