Packages required for cattaSearch


cattaSearch uses a number of components from other sources, called packages. With the exception of the optional pdftotext add-on, they are all included in the cattaSearch download file, in the packages subfolder. This page is therefore for information only.

 

ADOdb is a database abstraction layer for PHP that hides the differences between PHP's database access functions, which unfortunately are not standardised. Using ADOdb will make it easier to add cattaSearch support for other databases in the future.

Furthermore, ADOdb is fast and mature (in development since 2000).

cattaSearch 1.0 is based on ADOdb version 5.20.17.

 

Bootstrap is the framework that enables the responsive design in cattaSearch. With Bootstrap, cattaSearch automatically adapts to devices with different screen sizes, such as smartphones, tablets and PCs.

cattaSearch is designed for tablets such as the Apple iPad with a screen resolution of 1024 x 768. A larger screen shows more content at the same time, which is an advantage. Small devices such as smartphones can be used, but they are not well suited to a complex app like cattaSearch.

Bootstrap is a collection of JavaScript (JS) and style sheet (CSS) components.

cattaSearch 1.0 is based on Bootstrap version 4.5.0.

Note: cattaSearch uses the Glyphicons component from Bootstrap 3, which has been integrated into Bootstrap 4.

 

Bootstrap is based on jQuery and therefore jQuery is also required for running cattaSearch. jQuery is a fast and feature-rich JavaScript library.

Together, jQuery and Bootstrap (almost) eliminate the need for browser-specific adaptations in cattaSearch, which is another big advantage.

cattaSearch 1.0 is based on jQuery version 3.5.1.

 

In order to enable full-text search, the plain text elements of documents are extracted and stored in the database. This requires the use of different utilities.

Encoding

The package Encoding - ref. github.com/neitanod/forceutf8 - is used to encode plain text from HTML files and ordinary text files into UTF-8 irrespective of the original encoding.

Encoding is required for full-text indexing of HTML and text files.
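
The idea of converting text to UTF-8 regardless of its original encoding can be sketched as follows. cattaSearch itself uses the PHP package named above; this Python snippet is only an illustration, and the function name to_utf8 and the choice of Windows-1252 as the fallback encoding are assumptions of the sketch, not details taken from the package.

```python
def to_utf8(raw: bytes, fallback: str = "cp1252") -> str:
    """Return the text in raw, whatever its (assumed) original encoding.

    Hypothetical helper: valid UTF-8 is accepted as-is, anything else is
    assumed to be in a Western legacy encoding.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to the legacy encoding and replace
        # bytes that have no mapping instead of failing.
        return raw.decode(fallback, errors="replace")


# The resulting string can then be stored as UTF-8 bytes.
utf8_bytes = to_utf8(b"caf\xe9").encode("utf-8")
```

A simple try-then-fall-back approach like this cannot detect every encoding; it only covers the common case of files that are either UTF-8 or a single known legacy encoding.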

Strip out (X)HTML tags and invisible content

The PHP package strip_html_tags - ref. nadeausoftware.com/articles/2007/09/php_tip_how_strip_html_tags_web_page - removes tags and invisible content from HTML files so that only the plain text elements are left behind.

strip_html_tags is required for full-text indexing of HTML files.
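
What "removing tags and invisible content" means in practice can be shown with a small sketch. This is not the strip_html_tags code (which is PHP); it is a Python illustration using the standard library, and the class name, the function name strip_html and the list of elements treated as invisible are all assumptions made for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips the content of invisible elements."""

    # Elements whose content is never shown to the reader (assumed list).
    INVISIBLE = {"script", "style", "head", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside an invisible element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INVISIBLE:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.INVISIBLE and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Join the chunks and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_html(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Because the chunks are joined with spaces, text split by inline tags inside a single word would be separated; for full-text indexing that is usually acceptable, but it is a simplification.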

Filetotext

The PHP class Filetotext - ref. www.phpclasses.org/ - is used to extract plain text from newer Microsoft Word documents (.docx).

Filetotext is required for full-text indexing of .docx documents.

cattaSearch 1.0 is based on Filetotext version 2015-03-01.
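
For readers curious how text can be pulled out of a .docx file at all: a .docx document is a ZIP archive whose main text is stored in the XML file word/document.xml. The following Python sketch illustrates that principle only. It is not the Filetotext class, the function name docx_to_text is hypothetical, and it ignores headers, footers, footnotes and nested content such as text boxes.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace of WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source) -> str:
    """Extract paragraph text from a .docx file (path or file-like object)."""
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph consists of runs; each <w:t> holds a piece of text.
        pieces = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)
```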

pdftotext

pdftotext is an optional add-on for cattaSearch.

pdftotext is an operating system program that cattaSearch calls from PHP to extract plain text from PDF documents, which enables full-text search in them.

pdftotext is part of the Xpdf software suite, which has also been ported to Windows. Poppler, which is derived from Xpdf, includes its own implementation of pdftotext. On most Linux distributions, pdftotext is part of the poppler-utils package, which many of them install by default.

www.foolabs.com/xpdf/home.html is the official home page for Xpdf, from which the Windows version can be downloaded.

Note: pdftotext is operating system-specific and therefore not included in the cattaSearch download file.
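
Because pdftotext is an external program, an application has to check whether it is installed and then run it as a separate process. The sketch below shows one way to do that. It is written in Python for illustration only and does not reproduce how cattaSearch calls pdftotext from PHP; the function names are hypothetical. It relies on two pdftotext options: "-enc UTF-8" to select UTF-8 output and "-" as the output file name to write the text to standard output.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pdftotext_command(pdf_path: str, binary: str = "pdftotext") -> List[str]:
    """Build the command line: UTF-8 output, written to stdout ("-")."""
    return [binary, "-enc", "UTF-8", pdf_path, "-"]


def extract_pdf_text(pdf_path: str, binary: str = "pdftotext") -> Optional[str]:
    """Return the plain text of a PDF, or None if extraction is not possible."""
    if shutil.which(binary) is None:
        # The optional add-on is not installed on this system.
        return None
    result = subprocess.run(
        build_pdftotext_command(pdf_path, binary),
        capture_output=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.decode("utf-8", errors="replace")
```

Returning None when the program is missing mirrors the optional nature of the add-on: PDF files are then simply not full-text indexed.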

 

Revised: 2020-05-24