Documentation updates

This commit is contained in:
Roberto Rosario
2012-02-01 14:41:45 -04:00
parent a90896e774
commit eff399612d
7 changed files with 238 additions and 20 deletions

@@ -0,0 +1,21 @@
======================
Document visualization
======================
Mayan EDMS tries to spare users from having to download a document and
leave Mayan EDMS just to view it, which in essence makes Mayan EDMS a
visualization tool too. The conversion backend is a stack of functions.
First the mimetype is evaluated; if the file is an office document, it is
passed via unoconv to LibreOffice, working in headless mode (and managed
by supervisor), for conversion to PDF. The PDF is stored in a temporary
cache alongside all the other files that were not office documents. From
there the files are inspected to determine their page count, and the
corresponding blank database entries are created. After the database
update they all go to the conversion driver specified by the user
(``python``, ``graphicsmagick`` or ``imagemagick``), and a high resolution
master preview of each file is generated and stored in the persistent
cache. From the master previews in the persistent cache, volatile
previews are then created on demand in the different sizes requested
(thumbnail, page preview, full preview) and rotated interactively
in the details view.
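The stages described above can be sketched as a small planning function. This is only an illustration of the ordering, not the actual conversion backend: the function names, the contents of ``OFFICE_MIMETYPES`` and the stage descriptions are assumptions made for this example.

```python
# Hypothetical subset of office mimetypes; the real list is not given here.
OFFICE_MIMETYPES = {
    "application/msword",
    "application/vnd.oasis.opendocument.text",
}

# Conversion drivers and preview sizes named in the text above.
DRIVERS = ("python", "graphicsmagick", "imagemagick")
VOLATILE_SIZES = ("thumbnail", "page preview", "full preview")


def plan_conversion(mimetype, driver="python"):
    """Return the ordered stages a file with this mimetype would go through."""
    if driver not in DRIVERS:
        raise ValueError("unknown conversion driver: %s" % driver)
    stages = ["evaluate mimetype"]
    if mimetype in OFFICE_MIMETYPES:
        # Only office documents take the detour through unoconv.
        stages.append("convert to PDF via unoconv (headless LibreOffice)")
    stages.extend([
        "store in temporary cache",
        "count pages and create blank database entries",
        "render master previews with %s into persistent cache" % driver,
    ])
    return stages


def describe_volatile_preview(size, rotation=0):
    """Describe an on-demand preview derived from a master preview."""
    if size not in VOLATILE_SIZES:
        raise ValueError("unknown preview size: %s" % size)
    return "derive %s from master preview, rotated %d degrees" % (
        size, rotation % 360)


if __name__ == "__main__":
    for stage in plan_conversion("application/msword", "graphicsmagick"):
        print(stage)
    print(describe_volatile_preview("thumbnail", rotation=90))
```

The point of the sketch is that only the first branch depends on the file type; every file follows the same path once it is in the temporary cache.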

@@ -0,0 +1,25 @@
============
File storage
============
The files are stored and placed under Mayan EDMS "control" to avoid
filename clashes (each file gets renamed to its UUID and with an extension)
and are kept in a simple flat arrangement in a directory. This doesn't
stop direct access to the files, but such access is not recommended because
moving, renaming or updating the files directly would throw the database
out of sync. The recommended way to access the files is to create an
index, which creates a directory tree like structure in the database,
and then turn on the index filesystem mirror option, which creates
an actual directory tree with links to the actual stored files but using
the filenames of the documents as stored in the database. This
filesystem mirror of the index can then be shared with Samba across the
network. This access would be read-only, and new versions of the files
would have to be uploaded from the web GUI using the new document
versioning support.

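As a rough illustration of the flat UUID storage and the filesystem mirror described above, the following sketch uses only the Python standard library. The function names and the shape of the ``index`` mapping are hypothetical, and the use of symbolic links for the mirror is an assumption; the text only says "links".

```python
import os
import shutil
import tempfile
import uuid


def store_file(source_path, storage_dir):
    """Copy a file into the flat storage directory under a UUID-based name."""
    extension = os.path.splitext(source_path)[1]
    stored_name = "%s%s" % (uuid.uuid4(), extension)
    shutil.copyfile(source_path, os.path.join(storage_dir, stored_name))
    return stored_name


def mirror_index(index, storage_dir, mirror_dir):
    """Build a directory tree of links named after the database filenames.

    ``index`` maps a branch path such as "invoices/2012" to a list of
    (database_filename, stored_name) pairs (a made-up structure).
    """
    for branch, entries in index.items():
        branch_dir = os.path.join(mirror_dir, branch)
        os.makedirs(branch_dir, exist_ok=True)
        for filename, stored_name in entries:
            link = os.path.join(branch_dir, filename)
            if not os.path.lexists(link):
                # Assumption: the mirror uses symbolic links.
                os.symlink(os.path.join(storage_dir, stored_name), link)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    storage_dir = os.path.join(root, "storage")
    mirror_dir = os.path.join(root, "mirror")
    os.mkdir(storage_dir)
    source = os.path.join(root, "report.pdf")
    with open(source, "w") as handle:
        handle.write("example")
    stored = store_file(source, storage_dir)
    mirror_index({"invoices/2012": [("report.pdf", stored)]},
                 storage_dir, mirror_dir)
    print(os.listdir(os.path.join(mirror_dir, "invoices", "2012")))
    shutil.rmtree(root)
```

Because the mirror contains only links, deleting or rebuilding it never touches the stored files, which fits the read-only access described above.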
The components of Mayan EDMS are as decoupled from each other as possible.
Storage in particular is very decoupled: its behavior is controlled
not by the project but by the Storage programming class. Why this design?
None of the other parts make any assumptions about the actual file
storage, so Mayan EDMS can save files locally, over the
network or even across the internet and still operate exactly the same.
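The decoupling argument can be made concrete with a minimal sketch. The ``Storage`` interface below is hypothetical and much smaller than any real storage class; it is not the actual API used by Mayan EDMS. It only shows how calling code can stay identical while the backend changes.

```python
import hashlib
import os
import tempfile
from abc import ABC, abstractmethod


class Storage(ABC):
    """Hypothetical minimal storage interface for this example."""

    @abstractmethod
    def save(self, name, content):
        """Persist ``content`` (bytes) under ``name``."""

    @abstractmethod
    def load(self, name):
        """Return the bytes stored under ``name``."""


class LocalStorage(Storage):
    """Keeps files in a local directory."""

    def __init__(self, directory=None):
        self.directory = directory or tempfile.mkdtemp()

    def save(self, name, content):
        with open(os.path.join(self.directory, name), "wb") as handle:
            handle.write(content)

    def load(self, name):
        with open(os.path.join(self.directory, name), "rb") as handle:
            return handle.read()


class InMemoryStorage(Storage):
    """Stand-in for a network or remote backend."""

    def __init__(self):
        self._blobs = {}

    def save(self, name, content):
        self._blobs[name] = bytes(content)

    def load(self, name):
        return self._blobs[name]


def checksum(storage, name):
    """Calling code relies only on the interface, never on the backend."""
    return hashlib.sha256(storage.load(name)).hexdigest()


if __name__ == "__main__":
    for backend in (LocalStorage(), InMemoryStorage()):
        backend.save("document", b"payload")
        print(type(backend).__name__, checksum(backend, "document"))
```

Swapping ``LocalStorage`` for another backend requires no change to ``checksum``, which is the property the paragraph above describes.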

docs/topics/indexes.rst Normal file

@@ -0,0 +1,12 @@
=======
Indexes
=======
Administrators first define the template of the index. An instance
of the index is then auto-populated with links to the documents, depending
on the rules of each branch of the index as evaluated against the metadata
of the documents. The index cannot be edited manually; only changing
the rules or the metadata of the documents causes the index to be
regenerated. For manual organization of documents there are folders;
their structure, however, is flat, and they have to be manually updated and
curated.
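A minimal sketch of the template and instance idea follows. It assumes, purely for illustration, that each level of the template is a format string evaluated against a document's metadata; the actual rule syntax is not described here, and ``build_index`` is a hypothetical helper.

```python
def build_index(template, documents):
    """Populate an index instance from a template and document metadata.

    ``template`` is a list of format strings, one per tree level.
    ``documents`` maps a document name to its metadata dictionary.
    Returns a mapping of branch path (a tuple) to linked document names.
    """
    index = {}
    for name, metadata in documents.items():
        try:
            path = tuple(level.format(**metadata) for level in template)
        except KeyError:
            # The document lacks metadata a rule needs, so it is not linked.
            continue
        index.setdefault(path, []).append(name)
    return index


if __name__ == "__main__":
    documents = {
        "a.pdf": {"client": "ACME", "year": "2011"},
        "b.pdf": {"client": "ACME", "year": "2012"},
        "c.pdf": {"client": "Initech"},
    }
    template = ["{client}", "{year}"]
    print(build_index(template, documents))

    # Changing metadata and rebuilding is the only way the index changes.
    documents["c.pdf"]["year"] = "2012"
    print(build_index(template, documents))
```

Nothing in the sketch edits the index in place: it is always derived again from the rules and the metadata, mirroring the behavior described above.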

docs/topics/ocr.rst Normal file

@@ -0,0 +1,19 @@
===
OCR
===
Because OCR is an intensive operation, documents are queued for OCR and
handled later. The number of documents processed in parallel is
controlled by the ``OCR_NODE_CONCURRENT_EXECUTION`` configuration
option. Ideally the machine serving **Mayan EDMS** should disable OCR
processing by setting this option to 0, with other machines or cloud
instances connected to the same database doing the OCR processing.

Each document is first checked to see if a text parser is available for
it. If no parser is available for that file type, the document is passed
to tesseract page by page and the results are stored per page, to
keep the page image in sync with the transcribed text. However, when
viewing the document in the details tab, the text of all the pages is
concatenated and shown to the user. Setting the ``OCR_AUTOMATIC_OCR``
option to ``True`` causes all newly uploaded documents to be
queued automatically for OCR.
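The queueing and fallback logic above can be sketched as follows. Only the two option names come from the text; the values shown are examples, and ``TEXT_PARSERS``, ``fake_ocr`` and the document dictionaries are hypothetical stand-ins (the sketch does not call tesseract).

```python
from collections import deque

# Option names from the text above; the values are examples only.
OCR_NODE_CONCURRENT_EXECUTION = 2
OCR_AUTOMATIC_OCR = True

# Hypothetical registry of text parsers keyed by mimetype.
TEXT_PARSERS = {"text/plain": lambda document: document["pages"]}


def fake_ocr(page_image):
    """Stand-in for a per-page call to tesseract."""
    return "ocr(%s)" % page_image


def extract_pages(document, ocr=fake_ocr):
    """Return one text string per page, preferring a parser over OCR."""
    parser = TEXT_PARSERS.get(document["mimetype"])
    if parser is not None:
        return list(parser(document))
    # No parser for this file type: OCR each page separately so the
    # text stays in sync with its page image.
    return [ocr(page) for page in document["pages"]]


def next_batch(queue, concurrency=OCR_NODE_CONCURRENT_EXECUTION):
    """Take up to ``concurrency`` documents; 0 means this node does no OCR."""
    batch = []
    while queue and len(batch) < concurrency:
        batch.append(queue.popleft())
    return batch


def details_text(pages):
    """Concatenate the per-page text for display."""
    return "\n".join(pages)


if __name__ == "__main__":
    queue = deque([
        {"mimetype": "image/tiff", "pages": ["scan-1", "scan-2"]},
        {"mimetype": "text/plain", "pages": ["already text"]},
    ])
    for document in next_batch(queue):
        print(details_text(extract_pages(document)))
```

With the concurrency set to 0, ``next_batch`` always returns an empty list, which corresponds to a web-serving node that leaves OCR to other machines.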

@@ -0,0 +1,9 @@
===========
Smart links
===========
Smart links are useful for navigation between documents. They are rule
based but don't create any organizational structure; they just show the
documents that match the rules as evaluated against the metadata of the
currently displayed document. The index is global, whereas smart links
depend on the document the user is currently viewing.
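To illustrate the difference from an index, the sketch below resolves a smart link relative to one document. It assumes a very simple rule form, a list of metadata keys whose values must equal those of the current document; the actual rule syntax may differ, and ``resolve_smart_link`` is a hypothetical helper.

```python
def resolve_smart_link(rule_keys, current_name, documents):
    """Return the documents matching the current document on every rule key.

    ``documents`` maps a document name to its metadata dictionary.
    """
    current = documents[current_name]
    if any(key not in current for key in rule_keys):
        # The rules cannot be evaluated against this document.
        return []
    return sorted(
        name
        for name, metadata in documents.items()
        if name != current_name
        and all(metadata.get(key) == current[key] for key in rule_keys)
    )


if __name__ == "__main__":
    documents = {
        "invoice-1.pdf": {"client": "ACME", "year": "2012"},
        "invoice-2.pdf": {"client": "ACME", "year": "2012"},
        "contract.pdf": {"client": "ACME", "year": "2010"},
        "memo.pdf": {"client": "Initech", "year": "2012"},
    }
    print(resolve_smart_link(["client"], "invoice-1.pdf", documents))
    print(resolve_smart_link(["client", "year"], "invoice-1.pdf", documents))
```

The result is computed per request and nothing is stored, so the same rules give different results depending on which document is being viewed.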