More useful info from Trend Micro...
How Does Content Filtering Work?
The Web filter uses two main methods to determine whether or not to block a given URL request based on the Web site content:
- URL matching: Web requests are checked against the Web Reputation database. The filter resolves the URL, its category, the requesting user, and the highest-priority policy that contains that user, and then takes the appropriate action.
- Real-time rating: Web pages not found in the Web Reputation database are assessed on the fly (in real time) to determine a content probability rating. The page rating and category are then evaluated against the highest-priority policy that contains the requesting user.
A Web page is categorized based on the aggregate weighting of all words on that page, which helps prevent over-blocking or incorrectly blocking a specific page. Once a URL/domain has been categorized and verified in another layer of the filter, it is placed in the local cached database, which can be periodically uploaded to join the aggregate URL list.
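The lookup order described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Trend Micro's implementation: the names `Policy`, `WebFilter`, `rate_page`, and `local_cache` are hypothetical, a lower `priority` number is assumed to mean higher priority, a user with no matching policy is assumed to be allowed, and the separate verification layer that precedes caching is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Policy:
    name: str
    priority: int                 # assumption: lower number = higher priority
    users: Set[str]
    blocked_categories: Set[str]


@dataclass
class WebFilter:
    reputation_db: Dict[str, str]        # URL -> category (stand-in for the Web Reputation database)
    policies: List[Policy]
    rate_page: Callable[[str], str]      # hypothetical real-time rater: URL -> category
    local_cache: Dict[str, str] = field(default_factory=dict)

    def categorize(self, url: str) -> str:
        # 1. URL matching: look the URL up in the reputation database.
        category: Optional[str] = self.reputation_db.get(url)
        # 2. Fall back to results cached from earlier real-time ratings.
        if category is None:
            category = self.local_cache.get(url)
        # 3. Real-time rating: rate the page on the fly and cache the result.
        #    (The verification step mentioned in the text is not modelled here.)
        if category is None:
            category = self.rate_page(url)
            self.local_cache[url] = category
        return category

    def decide(self, user: str, url: str) -> str:
        category = self.categorize(url)
        matching = [p for p in self.policies if user in p.users]
        if not matching:
            return "allow"               # assumption: no policy means no restriction
        # Evaluate against the highest-priority policy that contains the user.
        policy = min(matching, key=lambda p: p.priority)
        return "block" if category in policy.blocked_categories else "allow"
```

For instance, with a strict policy that blocks the gambling category for one user and a lenient policy for everyone else, the same URL is blocked for the first user and allowed for the rest, because only the highest-priority policy containing each user is consulted.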
More information
Each word is assigned a probability rating for each of the 80-plus categories. For example, the English word “casino” may appear most frequently in the “gambling” category, where it receives its highest probability rating, but it may also have a statistical rating in every other category. So the fact that “casino” is associated with gambling does not mean that a page containing it is necessarily a gambling page.
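To make the idea of aggregate word weighting concrete, here is a minimal sketch. The source does not describe the actual scoring formula, so everything here is assumed: the `WORD_RATINGS` table and its numbers are invented for illustration, and summing per-word ratings and normalizing is just one simple way to aggregate them.

```python
from collections import defaultdict
from typing import Dict, Optional

# Hypothetical per-word probability ratings; the real filter covers 80-plus categories.
WORD_RATINGS: Dict[str, Dict[str, float]] = {
    "casino":  {"gambling": 0.7, "travel": 0.2, "news": 0.1},
    "hotel":   {"travel": 0.8, "gambling": 0.1, "news": 0.1},
    "flights": {"travel": 0.9, "news": 0.1},
    "resort":  {"travel": 0.7, "gambling": 0.2, "news": 0.1},
}


def score_page(text: str) -> Dict[str, float]:
    """Sum each word's category ratings over the page, then normalize to 1."""
    totals: Dict[str, float] = defaultdict(float)
    for word in text.lower().split():
        for category, rating in WORD_RATINGS.get(word, {}).items():
            totals[category] += rating
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}                      # no known words on the page
    return {category: total / grand_total for category, total in totals.items()}


def top_category(text: str) -> Optional[str]:
    """Return the category with the highest aggregate score, if any."""
    scores = score_page(text)
    return max(scores, key=scores.get) if scores else None


# A page that mentions "casino" once among travel words still rates as travel.
print(top_category("casino hotel flights resort"))
```

With these made-up ratings, a single occurrence of “casino” is outweighed by the surrounding travel vocabulary, which is the over-blocking protection the text describes.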
Currently, real-time ratings focus on the most common languages that appear on the Web. As the need increases, full or partial coverage in other languages is being added.
Pornographic Content
Because of the exposure, liability, and risk that organizations face when users access pornographic or adult material, the URL Filter is very good at recognizing and accurately rating this type of content. It is also very good at recognizing the “potentially offensive” categories listed.