WARNING: you are not looking at the live version but at an older version.

Feature overview (technical)

The Daisy project encompasses two major parts: a feature-rich document repository and a web-based, wiki-like frontend. If your frontend needs differ from those covered by the standard Daisy frontend, you can still benefit hugely from building upon its repository part.

Daisy is a Java-based application, and is based on the work of many valuable open source packages, without which Daisy would not have been possible. All third-party libraries or products we redistribute are unmodified (unforked) copies.

The document repository

Some of the main features of the document repository are:

  • Storage and retrieval of documents.
  • Documents can consist of multiple content parts and fields; document types define which parts and fields a document should have. Fields can be of different data types (string, date, decimal, boolean, ...) and can have a list of values to choose from. Parts can contain arbitrary binary data, but the document type can limit the allowed MIME types. So a document (or, more correctly, a part of a document) could contain XML, an image, a PDF document, ... Parts cannot contain arbitrarily large amounts of data (e.g. 3 GB), since they are loaded fully into memory when reading or writing (streaming support might be added in the future).
  • Versioning of the content parts and fields. Each version has a state of either 'published' or 'draft'. The most recent version in the 'published' state is the 'live' version, i.e. the version that is displayed by default (though this depends on the behaviour of the frontend application, of course).
  • Documents can be marked as 'retired', which makes them appear as deleted: they won't show up unless explicitly requested.
  • The repository doesn't care much what kind of data is stored in its parts, but if it is "HTML-as-well-formed-XML", some additional features are provided:
    • link extraction is performed, which makes it possible to search for the referrers of a document.
    • a summary (first 300 characters) is extracted to display in search results
    • (these features could potentially be supported for other formats also)
  • all documents are stored in one "big bag"; there are no directories. Each document is identified by a unique ID (an ever-increasing sequence number starting at 1), and has a name (which does not need to be unique). Hierarchical structure is provided by the frontend, which lets you create hierarchical navigation trees.
  • Documents can be combined in so-called "collections". Collections are sets of documents. One document can belong to multiple collections; in other words, collections can overlap.
  • possibility to take exclusive locks on documents for a limited or unlimited time. Checking for concurrent modifications (optimistic locking) happens automatically.
  • documents are automatically full-text indexed (Jakarta Lucene based). Currently supports plain text, XML, PDF (through PDFBox) and MS-Word (through Jakarta POI).
  • repository data is stored in a relational database. Currently only MySQL/InnoDB is targeted; support for other databases is planned for later (the serializable transaction isolation level and row-level locking are required database features). The part content is stored in normal files on the file system (to offload the database). The use of these familiar, open technologies, combined with the fact that the Daisywiki frontend stores plain HTML, means that your valuable content is easily accessible with minimal "vendor" lock-in.
  • a high-level, SQL-like query language provides flexible querying without knowing the details of the underlying SQL database schema. The query language also allows full-text (Lucene) and metadata (SQL) searches to be combined. Search results are filtered to contain only documents the user is allowed to access (see also access control). The content of parts (if HTML-as-well-formed-XML) can also be selected as part of a query, which is useful to retrieve, e.g., the content of an "abstract" part of a set of documents.
  • Access control: instead of attaching an ACL to each individual document, there is a global ACL which specifies the access rules for sets of documents, selecting those documents based on expressions. This makes it possible, for example, to define access control rules for all documents of a certain type, or for all documents in a certain collection.
  • The full functionality of the repository is available via an HTTP+XML protocol, thus providing language and platform independent access.
  • A high-level, easy-to-use Java API is available both as an "in-JVM" implementation, for embedded scenarios or services running in the Daisy server VM, and as an implementation that communicates transparently using the HTTP+XML protocol.
  • For various repository events, such as document creation and update, events are broadcast via JMS (currently we include OpenJMS). The events are XML messages. Internally, this is used for updating the full-text index, sending notification mails and clearing remote caches. Logging all JMS events gives a full audit log of all updates that happened to the repository.
  • Repository extensions can provide additional services, included are:
    • a notification email sender (which also includes the management of the subscriptions)
    • a navigation tree management component and a publisher component, which work hand-in-hand with our frontend (see further on)
  • A JMX console allows some monitoring and maintenance operations, such as optimization or rebuilding of the fulltext index, monitoring memory usage, document cache size, or database connection pool status.
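
The versioning rule from the list above (the 'live' version is the most recent version whose state is 'published') can be illustrated with a minimal Java sketch. This is not Daisy's actual API: the names LiveVersionSketch, Version, State and liveVersion are hypothetical, and the sketch assumes that a higher version ID means a more recent version.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; these types are not part of the Daisy API.
class LiveVersionSketch {
    enum State { DRAFT, PUBLISHED }

    /** A document version; assumption: a higher id means a more recent version. */
    static final class Version {
        final long id;
        final State state;

        Version(long id, State state) {
            this.id = id;
            this.state = state;
        }
    }

    /** Returns the most recent PUBLISHED version, or empty if none is published. */
    static Optional<Version> liveVersion(List<Version> versions) {
        Version live = null;
        for (Version v : versions) {
            if (v.state == State.PUBLISHED && (live == null || v.id > live.id)) {
                live = v;
            }
        }
        return Optional.ofNullable(live);
    }

    public static void main(String[] args) {
        List<Version> history = Arrays.asList(
            new Version(1, State.PUBLISHED),
            new Version(2, State.PUBLISHED),
            new Version(3, State.DRAFT));   // newest version is still a draft

        // Prints "live version: 2"
        System.out.println(liveVersion(history)
            .map(v -> "live version: " + v.id)
            .orElse("no live version"));
    }
}
```

In this sketch a document with only draft versions has no live version at all, which is one plausible reading of the rule; how a frontend presents such a document is up to that frontend.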

The "Daisywiki" frontend

The frontend is called the "Daisywiki" because, just like wikis, it provides a mixed browsing/editing environment with a low entry barrier. However, it also differs hugely from the original wikis, in that it uses wysiwyg editing, has a powerful navigation component, and inherits all the features of the underlying daisy repository such as different document types and powerful querying.

Here are some of the main features and differentiators:

  • wysiwyg HTML editing
    • supports recent Internet Explorer and Mozilla/Firefox (gecko) browsers, with fallback to a textarea on other browsers. The editor is a customized version of HTMLArea (through plugins, not a fork).
    • We don't allow arbitrary HTML, but limit it to a small, structural subset of HTML, so that it's future-safe, output-medium independent, secure and easily transformable. It is possible to have special paragraph types such as 'note' or 'warning'. The stored HTML is always well-formed XML, and nicely laid out. Thanks to a powerful (server-side) cleanup engine, the stored HTML is exactly the same whether edited with IE or Mozilla, which makes source-based diffs possible.
    • insertion of images by browsing the repository or upload of new images (images are also stored as documents in the repository, so can also be versioned, have metadata, access control, etc)
    • easy insertion of document links by searching for a document
    • a heartbeat keeps the session alive while editing
    • an exclusive lock is automatically taken on the document, with an expiry time of 15 minutes, and the lock is automatically refreshed by the heartbeat
    • editing screens are built dynamically for the document type of the document being edited.
  • Version overview page, from which the state of versions can be changed (between published and draft), and diffs can be requested.
  • Nice version diffs, including highlighting of actual changes in changed lines (ignoring re-wrapping).
  • Support for includes, ie the inclusion of one document in the other.
  • Support for embedding queries in pages.
  • A hierarchical navigation tree manager. As many navigation trees as you want can be created. Navigation trees are defined as XML and stored in the repository as documents, so access control (for authoring them; read access is public), versioning, etc. apply. One navigation tree can import another one. The nodes in the navigation tree can be listed explicitly, but can also be inserted dynamically using queries. When a navigation tree is generated, the nodes are filtered according to the access control rules for the requesting user. Navigation trees can be requested in "full" or "contextualized" form, the latter meaning that only the nodes leading to a certain document are expanded. The navigation tree manager produces XML; the visual rendering is up to XSL stylesheets.
  • Powerful document-publishing engine, supporting:
    • processing of includes (works recursively, with detection of recursive includes)
    • processing of embedded queries
    • document type specific styling (XSLT-based), also works nicely combined with includes, ie each included document will be styled with its own stylesheet depending on its document type.
  • PDF publishing (using Apache FOP), with all the same features as the HTML publishing, thus also document type specific styling.
  • search pages:
    • fulltext search
    • searching using Daisy's query language
    • display of referrers ("incoming links")
  • Multiple-site support, allows to have multiple perspectives on top of the same daisy repository. Each site can have a different navigation tree, and is associated with a default collection. Newly created documents are automatically added to this default collection, and searches are limited to this default collection (unless requested otherwise).
  • XSLT-based skinning, with reusable 'common' stylesheets (in most cases you'll only need to adjust one 'layout' XSLT, unless you want to customise heavily). Skins are configurable on a per-site basis.
  • Management pages for managing:
    • the repository schema (the document types)
    • the users
    • the collections
    • access control
  • The frontend currently doesn't perform any caching: all pages are published dynamically, since the result also depends on the access rights of the current user. For publishing high-traffic, public (i.e. all public access happens as the same user), read-only sites, it is probably best to develop a custom publishing application.
  • Built on top of Apache Cocoon (an XML-oriented web publishing and application framework), using Cocoon Forms, Apples (for stateful flow scenarios), and the repository client API.
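
The difference between a "full" and a "contextualized" navigation tree, as described in the list above, can be illustrated with a small Java sketch. It is only an illustration under stated assumptions: NavTreeSketch, Node, contains, contextualize and render are hypothetical names, not the navigation tree manager's real API (which produces XML), and the sketch assumes that a node is "expanded" when its subtree contains the requested document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Daisy navigation tree manager API.
class NavTreeSketch {
    /** A navigation node: a document ID, a label, and child nodes. */
    static final class Node {
        final long documentId;
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(long documentId, String label, Node... kids) {
            this.documentId = documentId;
            this.label = label;
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    /** True if the subtree rooted at node contains the target document. */
    static boolean contains(Node node, long targetId) {
        if (node.documentId == targetId) {
            return true;
        }
        for (Node c : node.children) {
            if (contains(c, targetId)) {
                return true;
            }
        }
        return false;
    }

    /** Copies the tree, keeping children only for nodes that lead to targetId. */
    static Node contextualize(Node node, long targetId) {
        Node copy = new Node(node.documentId, node.label);
        if (contains(node, targetId)) {
            for (Node c : node.children) {
                copy.children.add(contextualize(c, targetId));
            }
        }
        return copy; // nodes off the path stay collapsed (no children)
    }

    /** Renders the tree as indented text, one label per line. */
    static String render(Node node, String indent) {
        StringBuilder sb = new StringBuilder(indent).append(node.label).append('\n');
        for (Node c : node.children) {
            sb.append(render(c, indent + "  "));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node full = new Node(1, "Home",
            new Node(2, "Guide", new Node(3, "Install"), new Node(4, "Config")),
            new Node(5, "API", new Node(6, "Java"), new Node(7, "HTTP")));

        // Contextualized for document 3: "Guide" is expanded, "API" stays collapsed.
        System.out.print(render(contextualize(full, 3), ""));
    }
}
```

Calling contains for every node keeps the sketch short at the cost of repeated subtree walks; a single pass that computes the path to the document first would scale better for large trees.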