HTML Office Library: bridge between desktop and web
The first Delphi/Lazarus library for reading all office formats (including PDF) and converting to HTML on the fly.
The HTML Office Library is designed to work with the most popular document formats and convert documents from any source (file, database, etc.) to HTML. The converted document contains only plain HTML/CSS/SVG and can be displayed with the HTML Component Library or in any browser.
The library provides uniform access to the entire document and its parts, its resources (fonts, images, etc.) and its properties (title, table of contents, etc.).
The HTML Office Library does not depend on any external components (DLLs, OLE, ActiveX, etc.) and is cross-platform. It is written entirely in Delphi and comes with full source code.
The following document formats are supported:
- Rich Text Format (RTF)
- MS Word 6-2007 binary format (DOC)
- MS Word XML document (DOCX)
- MS PowerPoint binary format (PPT)
- MS PowerPoint XML format (PPTX)
- MS Excel binary format (XLS)
- MS Excel XML format (XLSX)
- MS Excel binary workbook format (XLSB)
- Adobe PDF format (PDF)
- StarOffice/OpenOffice.org Calc format (SXC)
- EPUB (electronic books)
- FB2 (electronic books)
- Markdown
- Outlook Message (MSG)
- MIME message (EML)
- Outlook databases (OST, PST)
- The Bat! database (TBB)
- RAR archives
- MBOX files (mailboxes of Thunderbird and other mail applications)
- CHM (compiled help) files
Besides the document conversion classes, the library also includes:
- EMF/WMF to SVG conversion
- TTF to WOFF conversion
- TTF normalization
- TTF to SVG conversion
- CFF to TTF conversion
- JPX (JPEG2000) to PNG conversion
- PICT to PNG conversion
- Adobe PostScript to TTF conversion
- ODTTF to TTF conversion
The library also contains a fully functional, database-independent full-text search engine that can search across tens of thousands of documents in different formats.
Search engine features:
- Indexing of documents in any format supported by the Office Library
- Fast indexing: up to 200 documents/sec.
- Compact index: around 5% of the document size
- Search for word sequences (phrases)
- Google-like search query language
- Search within document parts, e.g. only in the title
- Snippets
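Word-sequence search, search within document parts and snippets are commonly built on a positional inverted index, which records where each word occurs so that consecutive positions can be checked. The following is a minimal, generic Python sketch of that technique, not the Office Library's implementation or API: the class and method names (`PositionalIndex`, `add`, `search_phrase`, `snippet`) are hypothetical, tokenization is simplified to lowercase `\w+` runs, and field names such as `"title"` merely stand in for document parts.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Split text into lowercase word tokens (simplified tokenizer, an assumption)."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


class PositionalIndex:
    """Hypothetical positional index: word -> {(doc_id, field): [positions]}."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))
        self.texts = {}  # (doc_id, field) -> token list, kept for snippets

    def add(self, doc_id, field, text):
        """Index one document part (e.g. its title or body)."""
        tokens = tokenize(text)
        key = (doc_id, field)
        self.texts[key] = tokens
        for pos, word in enumerate(tokens):
            self.postings[word][key].append(pos)

    def _first_match(self, key, words):
        """Start position of the first consecutive occurrence of words, or None.
        Assumes every word already has postings for this key."""
        later = [set(self.postings[w][key]) for w in words[1:]]
        for start in self.postings[words[0]][key]:
            if all(start + i + 1 in later[i] for i in range(len(later))):
                return start
        return None

    def search_phrase(self, phrase, field=None):
        """Sorted doc ids whose text contains the phrase words consecutively.
        If field is given, only that document part is searched."""
        words = tokenize(phrase)
        if not words:
            return []
        # Candidates must contain every word of the phrase somewhere.
        keys = set(self.postings.get(words[0], {}))
        for w in words[1:]:
            keys &= set(self.postings.get(w, {}))
        hits = set()
        for key in keys:
            if field is not None and key[1] != field:
                continue
            if self._first_match(key, words) is not None:
                hits.add(key[0])
        return sorted(hits)

    def snippet(self, doc_id, field, phrase, context=3):
        """Phrase match plus up to `context` words on each side, or None."""
        words = tokenize(phrase)
        key = (doc_id, field)
        if key not in self.texts or not words:
            return None
        if any(key not in self.postings.get(w, {}) for w in words):
            return None
        start = self._first_match(key, words)
        if start is None:
            return None
        tokens = self.texts[key]
        lo = max(0, start - context)
        hi = min(len(tokens), start + len(words) + context)
        return " ".join(tokens[lo:hi])


idx = PositionalIndex()
idx.add(1, "title", "Quarterly sales report")
idx.add(1, "body", "The sales report for the third quarter shows growth in all regions")
idx.add(2, "title", "Meeting notes")
idx.add(2, "body", "We discussed the report on sales before lunch")

print(idx.search_phrase("sales report"))                  # [1]
print(idx.search_phrase("report on sales"))               # [2]
print(idx.search_phrase("report on sales", field="title"))  # []
print(idx.snippet(1, "body", "sales report", context=2))  # the sales report for the
```

Document 2 contains both "sales" and "report" but not as the sequence "sales report", so only document 1 matches; checking adjacent positions is what distinguishes phrase search from a plain all-words search.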
Supported Delphi versions: Delphi 7 to Delphi 11.3, and Lazarus 2+.
Supported platforms: Windows 32/64-bit (VCL and FMX), macOS, Linux, Android, iOS.
In Delphi 7 to 2007, Unicode is fully supported via WideString.
How fast is it? Some measurements (each cell gives the conversion time and the output size):
Document | Convert to HTML with embedded images | Convert to HTML with referenced images | Convert to text |
---|---|---|---|
DOC, 838 pages, 17 MB | 437 ms, 20 MB | 290 ms, 3.4 MB | 40 ms, 1.6 MB |
DOCX, 41 pages, 1 MB | 40 ms, 1.6 MB | 40 ms, 306 KB | 10 ms, 76 KB |
PDF, 182 pages, 31 MB | 3500 ms, 75 MB | 312 ms, 2.7 MB | 200 ms, 380 KB |
PPT, 16 slides, 4.8 MB | 218 ms, 6.8 MB | 140 ms, 104 KB | 170 ms, 98 KB |
XLS, 9000 rows, 7 columns, 2 MB | 94 ms, 3.5 MB | 62 ms, 1.2 MB | |
XLSX, 115000 rows, 95 columns, 44 MB (320 MB uncompressed) | 9200 ms, 275 MB | 6900 ms, 40 MB | |
Search engine test:
Parameter | Value |
---|---|
Sample database | 50,000 documents |
Indexing time | 10 min |
Index size | 120 MB |
Search time | 50-300 ms (depending on the number of documents found) |
Dictionary size | 230,000 words |
Total words in documents | 21 million |
There are two compiled demos available:
- Simple document viewer: lets you view any document on disk, with a file tree on the left and an HtPanel on the right.
https://delphihtmlcomponents.com/FileBrowser.zip
To view the final HTML, press the View in browser button. No installation required.
- Code search application: built with the Office Library, it creates a full-text search index for documents in the selected folders and finds any document from the application or via the Web.
https://delphihtmlcomponents.com/codefinder.html
Available licenses: Site License and Single Developer License.