Daisy documentation
Book Index

Daisy documentation

Table of Contents

1 Documentation Home

These pages contain the documentation of the Daisy 2.2 release - browse the navigation on the left.

See also:

The documentation is also available published as a Daisy-book.

For an end-user introduction to Daisy, have a look at the video tutorials.

2 Installation

2.1 Downloading Daisy

Packaged versions of Daisy can be found in the distribution area (Sourceforge). This includes everything required to run Daisy, except for:

If you don't have these already, the installation of these will be covered further on.

Consider subscribing to the Daisy mailing list to ask questions and talk with fellow Daisy users and developers.

There is also information available about the source code.

2.2 Installation Overview

Daisy is a multi-tier application, consisting of a repository server and a publication layer. Next to those, a database server (MySQL) is required. All together, this means three processes, which can run on the same server or on different servers.

The Daisy binary distribution packs most of the needed software together, the only additional things you'll need is a Java Virtual Machine for your platform, and MySQL. All libraries and applications shipped with Daisy are the original, unmodified distributions that will be configured as part of the installation. We've only grouped them in one download for your convenience.

If you follow the instructions in this document, you can have Daisy up and running in less than an hour.

The diagram below gives an overview of the the setup. All shown port numbers are configurable of course.

2.2.1 Platform Requirements

We have tested the Daisy installation on Windows 2000/XP, GNU/Linux and MacOSX. Other unixes like Solaris should also work, though we don't test that ourselves.

2.2.2 Memory Requirements

By default, the Daisy Wiki and Daisy Repository Server are started with a maximum heap size of 128 MB each. To this you need to add some overhead of the JVMs themselves, and then some memory for MySQL, the OS and its (filesystem) caches. This doesn't mean all this memory will be used, that will depend on usage intensity.

2.2.3 Required knowledge

These installation instructions assume you're comfortable with installing software, editing configuration (XML) files, running applications from the command line, setting environment variables, and that sort of stuff.

2.2.4 Can I use Oracle, PostgreSQL, MS-SQL, ... instead of MySQL? Websphere, Weblogic, Tomcat, ... instead of Jetty?

Daisy contains the necessary abstractions to support different database engines, though we currently only support MySQL. Users are welcome to contribute and maintain different databases (ask on the mailing list how to get started).

The Daisy Wiki webapp should be able to run in any servlet container (at least one that can run unpacked webapps, and as far as there aren't any Cocoon-specific issues), but we ship Jetty by default. For example, using Tomcat instead of Jetty is very simple and is described on this page.

2.3 Installing a Java Virtual Machine

Daisy requires the Java JDK or JRE 1.5 or 1.6 (the versions are also know as 5 or 6). You can download it from here on the Sun site (take by preference the JDK, not the JRE). Install it now if you don't have it already.

After installation, make sure the JAVA_HOME environment variable is defined and points to the correct location (i.e., the directory where Java is installed). To verify this, open a command prompt or shell and enter:

For Windows:
%JAVA_HOME%/bin/java -version

For Linux:
$JAVA_HOME/bin/java -version

This should print out something like:

java version "1.5.0"

or

java version "1.6.0"

2.3.1 Installing JAI (Java Advanced Imaging) -- optional

If you want images (especially PNG) to appear in PDFs, it is highly advisable to install JAI, which you can download from the JAI project on java.net. Take the JDK (or JRE) package, this will make JAI support globally available.

2.4 Installing MySQL

Daisy requires one of the following MySQL versions:

MySQL can be downloaded from mysql.com. Install it now, and start it (often done automatically by the install).

Windows users can take the "Windows Essentials" package. During installation and the configuration wizard, you can leave most things to their defaults. In particular, be sure to leave the "Database Usage" to "Multifunctional Database", and leave the TCP/IP Networking enabled (on port 3306). When it asks for the default character set, select "Best Support For Multilingualism" (this will use UTF-8). When it asks for Windows options, check the option "Include Bin Directory In Windows Path".

Linux users: install the "MySQL server" and "MySQL client" packages. Installing the MySQL server RPM will automatically initialize and start the MySQL server.

2.4.1 Creating MySQL databases and users

MySQL is used by both the Daisy Repository Server and JMS (ActiveMQ). Therefore, we are now going to create two databases and two users.

Open a command prompt, and start the MySQL client as root user:

mysql -uroot -pYourRootPassword

On some systems, the root user has no password, in which case you can drop the -p parameter.

Now create the necessary databases, users and access rights by entering (or copy-paste) the commands below in the mysql client. What follows behind the IDENTIFIED BY is the password for the user, which you can change if you wish. The daisy@localhost entries are necessary because otherwise the default access rights for anonymous users @localhost will take precedence. If you'll run MySQL on the same machine as the Daisy Repository Server, you only need the @localhost entries.

CREATE DATABASE daisyrepository CHARACTER SET 'utf8';
GRANT ALL ON daisyrepository.* TO daisy@'%' IDENTIFIED BY 'daisy';
GRANT ALL ON daisyrepository.* TO daisy@localhost IDENTIFIED BY 'daisy';
CREATE DATABASE activemq CHARACTER SET 'utf8';
GRANT ALL ON activemq.* TO activemq@'%' IDENTIFIED BY 'activemq';
GRANT ALL ON activemq.* TO activemq@localhost IDENTIFIED BY 'activemq';

2.5 Extract the Daisy download

Extract the Daisy download. On Linux/Unix you can extract the .tar.gz file as follows:

tar xvzf daisy-<version>.tar.gz

On non-Linux unixes (Solaris notably), use the GNU tar version if you experience problems extracting.

On Windows, use the .zip download, which you can extract using a tool like WinZip.

After extraction, you will get a directory called daisy-<version>. This directory is what we will call from now on the DAISY_HOME directory. You may set a global environment variable pointing to that location, or you can do it each time in the command prompt when needed.

2.6 Daisy Repository Server

2.6.1 Initialising and configuring the Daisy Repository

Open a command prompt or shell and set an environment variable DAISY_HOME, pointing to the directory where Daisy is installed.

Windows:
set DAISY_HOME=c:\daisy-2.0

Linux:
export DAISY_HOME=/home/daisy_user/daisy-2.0

Then go to the directory <DAISY_HOME>/install, and execute:

daisy-repository-init

Follow the instructions on screen. The installation will (1) initialize the database tables for the repository server and (2) create a Daisy data directory containing customized configuration files.

2.6.2 Starting the Daisy Repository Server

Still in the same command prompt (or in a new one, but make sure DAISY_HOME is set), go to the directory <DAISY_HOME>/repository-server/bin, and execute:

daisy-repository-server <location-of-daisy-data-dir>

In which you replace <location-of-daisy-data-dir> with the location of the daisy data directory created in the previous step.

Starting the repository server usually only takes a few seconds, however the first time it will take a bit longer because the workflow database tables are created during startup. When the server finished starting it will print a line like this:

Daisy repository server started [timestamp]

Wait for this line to appear (the prompt will not return).

2.7 Daisy Wiki

2.7.1 Initializing the Daisy Wiki

Before you can run the Daisy Wiki, the repository needs to be initialised with some document types, a "guest" user, a default ACL configuration, etc.

Open a command prompt or shell, make sure DAISY_HOME is set, go to the directory <DAISY_HOME>/install, and execute:

daisy-wiki-init

The program will start by asking a login and password, enter here the user created during the execution of daisy-repository-init (the default was testuser/testuser). It will also ask for the URL where the repository is listening, you can simply press enter here.

If everything goes according to plan, the program will now print out some informational messages and end with "Finished.".

2.7.2 Creating a "wikidata" directory

Similar to the data directory of the Daisy repository server, the Daisy Wiki also has its own data directory (which we call the "wikidata directory").

To set up this directory, open a command prompt or shell, make sure DAISY_HOME is set, go to the directory <DAISY_HOME>/install, and execute:

daisy-wikidata-init

and follow the instructions on-screen.

Since the Daisy Wiki and the Daisy repository server are two separate applications (which might be deployed on different servers), each has its own data directory.

2.7.3 Creating a Daisy Wiki Site

The Daisy Wiki has the concept of multiple sites, these are multiple views on top of the same repository. You need at least one site to do something useful with the Daisy Wiki, so we are now going to create one.

Open a command prompt or shell, make sure DAISY_HOME is set, go to the directory <DAISY_HOME>/install, and execute:

daisy-wiki-add-site <location of wikidata directory>

The application starts by asking the same parameters as for daisy-wiki-init.

Then it will ask a name for the site. This should be a name without spaces. If you're inspirationless, enter something like "test" or "main".

Then it will ask for the sites directory location, for which the presented default should be OK, so just press enter.

2.7.4 Starting the Daisy Wiki

Open a command prompt or shell and make sure DAISY_HOME is set.

Go to the directory <DAISY_HOME>/daisywiki/bin, and execute:

daisy-wiki <location of wikidata directory>

Background info: this will start Jetty (a servlet container) with the webapp found in <DAISY_HOME>/daisywiki/webapp.

2.8 Finished!

Now you can point your web browser to:

http://localhost:8888/

To be able to create or edit documents, you will have to change the login, you can use the user you created for yourself while running daisy-repository-init (the default was testuser/testuser).

To start the Daisy repository server and Daisy Wiki after the initial installation, see the summary here, or even better, set up service (init) scripts to easily/automatically start and stop Daisy.

2.9 2.1 to 2.2 upgrade

2.9.1 Changes

2.9.1.1 Translation management + translation import/export

We've added a bunch of translation management functionality. See also the documentation.

2.9.1.2 Finegrained (partial) read access

The ACL model has been extended with 'read access details' that allow to give partial read access to documents. For example, this way you can allow a user to see a document exists, but deny access to its fields and parts.

2.9.1.3 Other

2.10 2.1 to 2.2 compatibility

2.10.1 Acess control (ACL) changes

2.10.1.1 Read access details added, read-live permission removed

IMPORTANT CHANGE! Anyone making use of access control should read this.

The ACL model has been extended to allow to define detailed permissions when granting the read permission. We call this read access details.

As part of this change, the 'read live' permission has been removed, and has been replaced by a read access detail. Instead of granting the 'read live' permission, now you grant the read permission, and in the read access details, you specify that non-live versions cannot be read.

The database upgrade script will adjust the ACL configuration in the database to the new format, but due to the nature of the ACL structure, this upgrade can't be done perfectly (in all cases). Therefore if you rely on the read-live permission, after upgrading you should check that the ACL still behaves as intended.

As an example, a situation like this cannot be automatically upgraded correctly:

   Read-live       Read
   grant           grant
   don't change    deny

2.10.1.2 ACL API changes

As a consequence of these changes, the enumeration value AclPermission.READ_LIVE has been removed, programs making use of this will need to be updated.

To check if a user has read access to a document, before you would have done:

aclResultInfo.isAllowed(AclPermission.READ);
or
aclResultInfo.getActionType(AclPermission.READ) == AclActionType.GRANT

Now, this doesn't guarantee anymore that the user has full read access (i.e. to non-live versions, to all fields and parts, ...). If you want to verify the user has full read access, you can use this convenience method:

aclResultInfo.isFullyAllowed(AclPermission.READ);

2.10.1.3 ACL XML format changes

The XML representation of the ACL has also changed a bit. Permissions as part of an d:aclEntry are now specified in the same way as in d:aclResultInfo, with an extensible set of elements rather than attributes. This makes it possible to add the accessDetails as a child element. Besides, it also adds more consistency and makes it easier to add new types of permissions.

2.10.2 Version.setState() method -- IMPORTANT!

Version.setState(VersionState) behaviour changed: before, the change was instantly sent to the backend, now the change is deferred until you call version.save().

2.10.3 Other repository changes

2.10.4 Import/export tool

The option fullStackTracesOfFailures was mistakenly spelled fullStrackTracesOfFailures. This has been corrected now, the wrong option name will not work anymore.

2.10.5 Skinning

(revision r4547) The layoutType 'mini' has been removed as it is no longer used.  Accordingly, the files mini.css and layout-mini.css were removed.
Additionally, plain.css now imports doceditor.css.

2.10.6 Relaxation of variant creation right

Previously, adding a new variant to a document required write access to the start variant, but now it only required read access.

2.11 2.1 to 2.2 upgrade

These are the upgrade instructions for when you have currently Daisy 2.1 installed.

In case you have problems during the upgrade or notice errors or shortcomings in the instructions below, please let us know on the Daisy mailing list.

2.11.1 Review compatibility notes

Review the compatibility notes.

As described in the above notes, this release contains a change to the ACL model which makes a perfect automated upgrade impossible (in certain cases) in case you rely on the read-live permission.

2.11.2 Upgrading

2.11.2.1 Daisy installation review

In case you're not very familiar with Daisy, it is helpful to identify the main parts involved. The following picture illustrates these.

There is the application directory, which is simply the extracted Daisy download, and doesn't contain any data (to be safe don't remove it yet though).

Next to this, there are 3 locations where data (and configuration) is stored: the relational database (MySQL), the repository data directory, and the wiki data directory. The Daisy repository and the Daisy Wiki are two independent applications, therefore each has its own data directory.

The text between the angle brackets (< and >) is the way we will refer to these directories further on in this document. Note that <DAISY_HOME> is the new extracted download (see later on), not the old one.

2.11.2.2 Make a backup

(blah blah)

2.11.2.3 Stop your existing Daisy

Stop your existing Daisy, both the repository server and the Daisy Wiki.

2.11.2.4 Download and extract Daisy 2.2

If not done already, download Daisy 2.2 from the distribution area (Sourceforge). For Windows, download the zip or autoextract.exe (not the installer!). For Unix-based systems, the .tar.gz is recommended. The difference is that the .zip contains text files with DOS line endings, while the .tar.gz contains text files with unix line endings. When using non-Linux unixes such as Solaris, be sure to use GNU tar to extract the archive.

Extract the download at a location of your choice. Extract it next to your existing Daisy installation, do not copy it over your existing installation.

2.11.2.5 Update environment variables

Make sure the DAISY_HOME environment variable points to the just-extracted Daisy 2.2 directory.

Note that when you start/stop Daisy using the wrapper scripts, you don't need to set DAISY_HOME, though you do need to update or re-generate the service wrapper configuration (see next section).

How this is done depends a bit on your system and personal preferences:

2.11.2.6 Updating the repository SQL database

Execute the database upgrade script:

cd <DAISY_HOME>/misc
mysql -Ddaisyrepository -udaisy -ppassword < daisy-2_1-to-2_2.sql

On many MySQL installations you can use "root" as user (thus specify -uroot instead of -udaisy) without password, thus without the -p option.

2.11.2.7 Start Daisy

Start Daisy as usual, using the normal scripts or the wrapper scripts.

2.11.2.8 Reinstall review workflow

As the result of an incompatible API change in Daisy, you need to reinstall the sample workflows. Of course, if you have not used workflow and don't ever plan to use it, you can skip this.

In case you're wondering, the problem is more specifically situated in the use of Version.setState() in the review workflow.

Reinstalling the sample workflows is simple:

Log in to Daisy in the role Administrator, go to the Administration console, select "Manage process definitions" (Below workflow) and select "Reinstall sample workflows".

2.12 2.2-RC to 2.2 upgrade

2.12.1 Changes since 2.2-RC

Misc fixes and translation updates.

2.12.2 Upgrade instructions

These are the upgrade instructions for when you have currently Daisy 2.2-RC installed.

This release requires no special upgrade steps, besides putting the new Daisy distribution in place.

In case you have problems during the upgrade or notice errors or shortcomings in the instructions below, please let us know on the Daisy mailing list.

2.12.2.1 Daisy installation review

In case you're not very familiar with Daisy, it is helpful to identify the main parts involved. The following picture illustrates these.

There is the application directory, which is simply the extracted Daisy download, and doesn't contain any data (to be safe don't remove it yet though).

Next to this, there are 3 locations where data (and configuration) is stored: the relational database (MySQL), the repository data directory, and the wiki data directory. The Daisy repository and the Daisy Wiki are two independent applications, therefore each has its own data directory.

The text between the angle brackets (< and >) is the way we will refer to these directories further on in this document. Note that <DAISY_HOME> is the new extracted download (see later on), not the old one.

2.12.2.2 Stop your existing Daisy

Stop your existing Daisy, both the repository server and the Daisy Wiki.

2.12.2.3 Download and extract Daisy 2.2

If not done already, download Daisy 2.2 from the distribution area (Sourceforge). For Windows, download the zip or autoextract.exe (not the installer!). For Unix-based systems, the .tar.gz is recommended. The difference is that the .zip contains text files with DOS line endings, while the .tar.gz contains text files with unix line endings. When using non-Linux unixes such as Solaris, be sure to use GNU tar to extract the archive.

Extract the download at a location of your choice. Extract it next to your existing Daisy installation, do not copy it over your existing installation.

2.12.2.4 Update environment variables

Make sure the DAISY_HOME environment variable points to the just-extracted Daisy 2.2 directory.

Note that when you start/stop Daisy using the wrapper scripts, you don't need to set DAISY_HOME, though you do need to update or re-generate the service wrapper configuration (see next section).

How this is done depends a bit on your system and personal preferences:

2.12.2.5 Start Daisy

Start Daisy using the normal scripts or the wrapper scripts.

3 Source Code

Sources can be obtained through SVN. Instructions for setting up a development environment with Daisy (which is slightly different from using the packaged version) are included in the README.txt's in the source tree. For anonymous, read-only access to Daisy SVN, use the following command:

svn co http://svn.cocoondev.org/repos/daisy/trunk/daisy

This will give the latest development code (the "trunk"). To get the source code of a specific release, use a command like this:

svn co http://svn.cocoondev.org/repos/daisy/tags/RELEASE_2_2_0 daisy

See also the existing tags.

No authentication is required for anonymous access. If you're behind a (transparent) proxy, you might want to verify whether your proxy supports the extended HTTP WebDAV methods.

3.1 Daisy Build System

We should consider removing this document, Maven is common enough these days.

The build system used by Daisy is Maven, an Apache project.

3.1.1 Maven intro

What follows is the very-very-quick Maven intro, for those not familiar with Maven.

Unlike Ant, where you tell how your code should be build, in Maven you simply tell what directory contains your code, and what the dependencies are (i.e. what other jars it depends on), and it will build your code. This information is stored in the project.xml files that you'll see across the Daisy source tree. There are a lot of them, since Daisy is actually composed of a whole lot of mini-projects, whereby some of these projects depend on one or more of the others.

An important concept of Maven is the repository, which is a repository of so-called artifacts, usually jar files. An artifact in the repository is identified uniquely by a group id and an id (both are simply descriptive names). Declaring the dependencies of a project is done by specifying repository references, thus for each dependency you specify the group id and id of the dependency. An example dependency declaration, as defined in a project.xml file:

    <dependency>
      <groupId>lucene</groupId>
      <artifactId>lucene</artifactId>
      <version>1.3</version>
    </dependency>

So where does the repository physically exist? Well, there can be many repositories. The most important public one is on ibiblio:

http://www.ibiblio.org/maven/

The repository is simply accessed using HTTP, so you can take your browser and surf to that URL. A repository like the one on ibiblio is called a remote repository. After initially downloading an artifact from the remote repository, it is installed in your local repository, which is by default located in ~/.maven/repository.

When you build a project, the result of the build is usually a jar file. Maven will install this jar file in your local repository, so that when you build another project that depends on this jar file, it can be found over there. When searching a dependency, Maven always checks the local repository first, and then goes off checking remote repositories. Which remote repositories are searched is of course configurable.

I should also tell you something about the build.properties and project.properties files. Both files contain properties for the build and configuration for Maven. The difference is that the project.properties files are committed to the source repository (SVN in Daisy's case), while the build.properties files are intended for local customisations (thus on your computer). So if you see something in a project.properties file that you'd like to change, don't change it over there (as this will otherwise show up as a modified file when doing svn status), but do it in the build.properties file. The build.properties file thus has a higher precedence than the project.properties file.

There is a lot more to tell about Maven, such as that it is actually composed of a whole lot of plugins, that there is something like "goals" to execute, that there is the possibility to have a maven.xml file to define custom goals with custom build instructions, and that all artifacts are also versioned. But I'll let you explore the Maven documentation to learn about that.

3.1.2 Extra dependencies

Daisy has some dependencies on artifacts (remember, jar files) that are not available in the public ibiblio repository. We make these available in our own repository on http://cocoondev.org/repository/.

3.1.3 Building Daisy

Instructions for building Daisy can be found in the README.txt file in the root of the Daisy source tree. At some point it will tell you to execute maven in the root of the source tree, which will actually build all the little mini-projects of which Daisy consists, in the correct sequence so that all dependencies are satisfied.

4 Repository server

The repository server is the core of Daisy. It provides the pure content management functionality without GUI (graphical user interface).

The main purpose of the repository is managing documents.

The repository server consists of a core and some non-essential extension components that add additional functionality. The repository can be accessed by a variety of client applications (such as web applications, command-line tools, desktop applications, ...) through its programming interfaces.

4.1 Documents

4.1.1 Introduction

The purpose of the Daisy Repository Server is managing documents. The main content of a document is contained in its so-called parts and fields. Parts contain arbitrary binary data (e.g. an XML document, a PDF file, an image). Fields contain simple information of a certain data type (string, date, decimal, ...).

The diagram below gives an overview of the document structure, this is explained in more detail below.

4.1.2 No hierarchy

Daisy has no folders or directories like a filesystem, all documents are stored in one big bag. When saving a document, you only have to choose a name for it (which acts in fact as the title of the document), and this name is not even required to be unique (see below). Documents are retrieved by searching or browsing. Front-end applications like the Daisy Wiki allow to define multiple hierarchical views on the same set of repository documents.

4.1.3 Documents & document variants

A document can exist in multiple variants, e.g. in multiple languages. A document in itself does not consist of much, most of the data is contained in the document variants. From another point of view (which closer matches the implementation), one could say that the repository server actually manages document variants, which happen to share a few properties (most notably their identity) through the concept of a document.

A document has always at least one document variant, a document cannot exist by itself without variants.

A document is identified uniquely by its ID, a document variant is identified by the triple {document ID, branch, language}.

If you are not interested in using variants, you can mostly ignore them. In that case each document will always be associated with exactly one document variant. Therefore, often when we speak about a document in Daisy, we implicitly mean "a certain variant of a document" (a "document variant"). In a practical working environment like the Daisy Wiki, the branch and language which identify the particular variant of the document are usually a given (Daisy Wiki: configured per site), and you'll only work with document IDs, so it is as if the existence of variants is transparent.

Refer to the diagram above to see if a certain aspect applies to a document, a document variant, or a version of a document variant.

For more details on this topic, see variants.

4.1.4 Document properties

4.1.4.1 ID

When a document is saved for the first time, it is assigned a unique ID. The ID is the combination of a sequence counter and the repository namespace. If the repository namespace is FOO, then the first document will get ID 1-FOO, the second 2-FOO, and so on. The ID of a document never changes.

4.1.4.2 Owner

The owner of a document is a person who is always able to access (read/write) the document, regardless of what the ACL specifies. The owner is initially the creator of the document, but can be changed afterwards.

4.1.4.3 Created

The date and time when the document was created. This value never changes.

4.1.4.4 Last Modified and Last Modifier

Each time a document is saved, the user performing the save operation is stored as the last modifier, and the date and time of the save operation as the "last modified" timestamp.

Note that each document variant has their own last modified and last modifier properties, which are usually more interesting: the last modified and modifier of the document are only updated when some of the shared document properties change.

4.1.4.5 Reference language

This field is used for translation management purposes.

It specifies which language variant is the reference language, that is the language on which translations are based. For example, if you first write all your content in English and than translate it to other language, English is the reference language.

See  Translation management for more information on this topic.

4.1.5 Document variant properties

4.1.5.1 Versions

A document consists of versioned and non-versioned data. Versioned data means that each time the document is saved (and some of the versioned aspects of the document changed), a new version will be stored, so that the older state of the data can still be viewed afterwards.

It hence provides a history of who made what changes at what time. It also allows to work on newer versions of a document while an older version stays the live version, as explained in version state.

4.1.5.2 Versioned Content

The versioned content of a document consists of the following:

So if any changes are made to any of these, and the document is stored, a new version is created.

4.1.5.2.1 Version ID

Each version has an ID, which is simply a numeric sequence number: the first version has number 1, the next number 2, and so on.

4.1.5.2.2 Document Name

The name of a document is required (it cannot be empty). The name is not required to be unique. Thus there can be multiple documents with the same name. The ID of the document is its unique identification.

The name is usually rendered as the title of the document.

4.1.5.2.3 Parts

A part contains arbitrary binary data. "Binary data" simply means that it can be any sort of information, such as plain text, XML or HTML, an image, a PDF or OpenOffice document.

In contrast with many repositories or file systems, a Daisy document can contain multiple parts. This allows to store different types of data in one document (e.g. text and an image), and makes these parts separately retrievable.

For example, one could have a document with a part containing an abstract and a part containing the main text. It is then very easy and efficient to show a page with the abstracts of a set of document.

As another example, a document for an image could contain a part with the rendered image (e.g. as PNG), a part with a thumbnail image and a part with the source image file (e.g. a PhotoShop or SVG file).

The parts that can be added to a document are controlled by its document type.

Each part:

4.1.5.2.4 Fields

Fields contain simple information of a certain data type (string, date, decimal, ...). Depending on how you look at it, fields could be metadata about the data stored in the parts, or can be data by themselves.

One of the data types supported for fields is link, which allows the field to contain a link to another Daisy document. Link-type fields are useful for defining structured links (associations) between documents. For example, you could have documents describing wines, and other documents describing regions. Using a link-type field you can connect a wine to a region. By having this association in a field, it is easy to perform searches such as all wines associated with a certain region. The Daisy Wiki allows, by means of the Publisher, to aggregate data from linked documents when displaying a document, which combined with some custom styling allows to do very interesting things.

Fields can be multi-valued. The order of the values in a multi-value field is maintained. The same value can appear more than once.

A field can be hierarchical, meaning that its value represents a hierarchical path. A field can be multi-value and hierarchical at the same time.

The fields that can be added to a certain document are specified by its document type.

Each field:

A document can contain links in the content of parts (for example, an <a> element in HTML) or in link-type fields. Next to this a document can have a number of so-called out-of-line links. These are links stored separately from the content. Each link consists of a title and a target (some URL). These links are usually rendered at the bottom of a page in as a bulleted list.

Out-of-line links are useful in case you want to link to related documents (or any URL) and either don't want or can't (e.g. in case of non-HTML content) link to them from the content of a part.

4.1.5.3 Version metadata

The content of versions is immutable after creation, but versions also have some metadata which can be updated at any time.

4.1.5.3.1 Version state & the live version

Each version can have a state indicating whether it is a draft version (i.e. you started editing the document but are not finished yet, in other words the changes should not yet be published), or a publishable version. The most recent version having the state 'publish' becomes the live version. The live version is the version that is typically shown by default to the user. It is also the version whose data is indexed in the full-text index, and whose properties are used by default when querying. The ACL enables to restrict access for users to only the live versions of documents.

4.1.5.3.2 Change type

Indicates if the version contains a major change or minor change.

In the context of translation management, this field is important because it indicates whether the changes in this version invalidate the translations which are based on this content. See  Translation management for more information on this topic.

If you don't use translation management, you can use this field for your own appreciation of what is a major or minor change.

4.1.5.3.3 Synced-with link

This field is used for translation management purposes.

It is a couple {language, version ID} which specifies that the content of this version is brought up to date with the content of the specified language variant version (within the same branch variant).

See  Translation management for more information on this topic.

4.1.5.3.4 Comment

A flexible comment field (up to 1000 characters), usually used to describe in a few words what was changed with respect to the previous version.

4.1.5.4 Non-versioned properties

4.1.5.4.1 Document type

Each document is associated with a document type, describing the parts and fields the document can contain. See repository schema for more information on document types.

4.1.5.4.2 Collections and collection membership

Collections are sets of documents. A document can belong to zero, one or more collections, thus collections can overlap. A collection is simply a way to combine some documents in order to do something with them or treat them in some special way. In other words, they are a sort of built-in (always present) metadata to identify a set of documents.

Collections themselves can be created or deleted only by Administrators (in the Daisy Wiki, this is done in the administration interface). Deleting a collection does not delete the documents in it. You can limit who can put documents in a collection by ACL rules.

4.1.5.4.3 Custom fields

Custom fields are arbitrary name-value pairs assigned to a document. The name and value are both strings. In contrast with the earlier-mentioned fields that are part of the document type, these fields are non-versioned. This makes it possible to stick tags to documents without causing a new version to be created, and without formally defining a field type.

4.1.5.4.4 Private

A document marked as private can only be read (and written) by its owner.

While the global access control system of Daisy makes it easy to centrally handle access control for sets of documents, sometimes it could be useful to simply say "I want nobody else to see this (for now)". This can be done by enabling the private flag. The document will then not be accessible for others, and also won't turn up in search results done by others. The private flag can be set on or off at any time, by the owner or by an Administrator.

There is however one big exception: Administrators can always access all documents, and thus will be able to read your "private" documents. The content is not encrypted.

4.1.5.4.5 Retired

If a document variant is no longer needed, because its content is outdated, replaced by others, or whatever, you can mark the document variant as retired. This makes the document variant virtually deleted. It won't show up in search results anymore.

The retired flag can be set on or off at any time, retiring is not a one-time operation.

4.1.5.4.6 Lock

A lock can be taken on a document variant to make sure nobody else edits the document variant while you're working on it.

Daisy automatically performs so-called optimistic locking, this means that if person A starts editing the document, and then person B starts editing the document, and then person A saves the document, and then person B tries to save the document, this last operation will fail because the document has changed since the time person B loaded it. This mechanism is always enabled, it is not needed to take an explicit lock.

A lock can then be taken to make others aware that you are editing the document. A lock can be of two types: an exclusive lock or a warn lock. An exclusive lock is pretty much as its name implies: it is a lock exclusively for the user who requested it, and avoids that any one else will be able to save the document until you release the lock. A warn lock isn't really a lock, it is just an informational mechanism to let others know that someone else also started to edit the document, but it doesn't enforce anything. Anyone else can still at any time save the document or replace the lock with their own.

A lock can optionally have a certain duration, if the duration is expired, the lock is automatically removed.

For example, the Daisy Wiki application by default uses exclusive locks with a duration of 15 minutes, and automatically extends them as longs as the user continues editing.

A lock can be removed either by the person who created it, or by an Administrator.

4.1.5.4.7 Last Modified and Last Modifier

Each time a document is saved, the user performing the save operation is stored as the last modifier, and the date and time of the save operation as the "last modified" timestamp. This will often fall together with the Created/Creator fields of the last version, but not necessarily so: if only non-versioned properties are changed, no new version will be created.

4.1.5.5 Calculated properties

4.1.5.5.1 last / live major change version ID

These are automatically calculated properties kept in the repository and available through the API and in the query language. They are useful for translation management purposes. See also the LangInSync() and LangNotInSync() conditions of the query language.

The last major change version is the last version to which a major change happened.

The live major change version is last version, up to the live version, to which a major change happened.

The following table illustrates this.

Version

State

Change type

Last major change

Live major change

1

Draft

Major

1

NULL

2

Draft

minor

1

NULL

3

Live

minor

1

1

4

Live

Major

4

4

4.2 Repository schema

4.2.1 Overview

The repository schema controls the structure of documents.

The repository schema defines part types, field types and document types. A document type is a combination of zero or more part types and zero or more field types. Part and field types are defined as independent entities, meaning that the same part and field types can be reused across different document types. The diagram below shows the structure and relation of all these entities.

4.2.1.1 Common aspects of document, part and field types

Let us first look at the things document, part and field types have in common. Their primary, unchangeable identifier is a numeric ID, though they also have a unique name (which can be changed after creation), which you will likely prefer to use.

Next to the name, they can be optionally assigned a localized label and a description. Localized means that a different label and description can be given for different locales. A locale can be a language, language-country, or language-country-variant specification. For example, a label entered for the locale "fr-BE " would mean it is in French, and specifically for Belgium. The labels and descriptions are retrieved using a fallback system. For example, if the user's locale is "fr-BE", the system will first check if a label is available for "fr-BE", if not found it will check for "fr", and finally for the empty locale "". Thus if you want to provide labels and descriptions but are not interested in localization, you can simply enter them for the empty locale.

Document, part and field types cannot be deleted as long as they are still in use in the repository. Once a document has been created that uses one of these types, the type can thus not be deleted anymore (unless the documents using them are deleted). However, it is possible to mark a type as deprecated to indicate it should not be used anymore. This deprecation flag is purely informational, the system simply stores it.

4.2.1.2 Document types

A document type combines a number of part types and field types. The association with the part and field types, in the diagram shown as the "Part Type Use" and "Field Type Use", are not stand-alone entities but part of the document type.

The associations have a property to indicate whether or not the parts and fields are required to have a value.

The associations also have a property called 'editable'. This property is a hint towards the document editing GUI that the part or field should not be editable. This is just a GUI hint, not an access control restriction. This can for example be useful if the values of certain fields or parts are assigned by an automated process.

4.2.1.3 Part types

A part type defines a part that can be added to a document.

4.2.1.3.1 Mime-type

A part type allows to restrict which types of data (thus which mime-types) are stored in the part, but this is not required. This restriction is done by specifying a list of allowed mime types.

4.2.1.3.2 The Daisy HTML flag

A part type has a flag indicating whether the part contains "Daisy HTML". Daisy HTML is basically HTML formatted as well-formed XML (with element and attribute names lowercased). It is not the same as XHTML, because the elements are not in the XHTML namespace. If the "Daisy HTML" flag is set to true, the mime-type should be limited to text/xml. For the repository server, the Daisy-HTML flag on the part type has little meaning. Currently it serves only to enable the creation of document summaries (which might even be replaced with a more flexible mechanism in the future). The Daisy Wiki front end application will show a wysiwyg editor for Daisy HTML parts, and display the content of such parts inline.

4.2.1.3.3 Link extraction

For each part type a link extractor can be defined to extract links from the content contained in the part. The most common link extractor is the "daisy-html" one, which will extract links from the href attribute of the <a> element, the src attribute of the <img> element, and the character content of <p class="include">. The format of the links is:

daisy:<document id>
or
daisy:<document id>@<branch id or name>:<language id or name>:<version id>#fragment_id

Links that don't conform to this form will be ignored. The <version id> can take the special value "LAST" (case insensitive). A link without a version specification denotes a link to the live version of the document. The branch, language and version and fragment ID parts are all optional. For example, daisy:15@:nl is a link to the Dutch version of document 15.

The repository server also has link extractors for extracting links from navigation and book definition documents.

4.2.1.4 Field types

A field type defines a field that can be added to a document.

4.2.1.4.1 Value Type

The most important thing a field type tells about a field is its value type. A value type identifies the kind of data that can be stored in a field, the available value types are listed in the table below, together with their matching Java class.

Value type name

Corresponding Java class

string

java.lang.String

date

java.util.Date

datetime

java.util.Date

long

java.lang.Long

double

java.lang.Double

decimal

java.math.BigDecimal

boolean

java.lang.Boolean

link

org.outerj.daisy.repository.VariantKey

The link type is somewhat special: it defines a link to another document variant. Its value is thus a triple (document ID, branch ID, language ID). The branch ID and language ID are optional (value -1 in the VariantKey object) to denote they should default to the same as the containing document (in other words, the branch and language are relative to the document). The branch and language will usually be unspecified, since this allows copying content between the variants while the links stay relative to the actual variant.

4.2.1.4.2 Multi-value

The multi-value property of a field type indicates whether the fields of that type can have multiple values. All the values of a multi-value field should be of the same value type.

A multi-value field can have more than once the same value, and the order of values of a multi-value field is maintained. Thus the values of a multi-value field form an ordered list.

In the Java API, a multi-value value is represented as an Object[] array, in which the entries are objects of the type corresponding to the field's value type (e.g. an array of String's, or an array of Long's).

4.2.1.4.3 Hierarchical

The hierarchical property of a field type indicates that the value of the fields of that type is a hierarchical path (a path in some hierarchy). A path is often represented as a slash-separated string, e.g. Animals/Four-legged/Dogs.

Hierarchical fields are technically quite similar to multi-value fields, because a hierarchical path is also an ordered set of values. It is however possible for a field type to be both hierarchical and multi-value at the same time.

In the Java API, a hierarchical value is represented by a HierarchyPath object:

org.outerj.daisy.repository.HierarchyPath

A multi-value hierarchical value is an array (Object[]) of HierarchyPath objects.

4.2.1.4.4 Selection Lists

It is