Explore the Seajas way to better data extraction and analysis.

Seajas Search

Up-to-date results from continuously updated internet information, blazing fast indexing and user-friendly maintenance.
The Seajas Search Profiler gives you easy control of your external information sources.

Control your information.

By making effective use of existing innovations in structured internet information, the Search Profiler is the go-to platform for controlling your external information sources. Because the internet information from your sources is indexed continuously, you are always up to date.

Supported existing infrastructure: Apache Solr, Autonomy IDOL

What kind of sources?

Seajas Search's acquisition component supports a broad range of information sources, from simple RSS feeds to the archives of renowned news organizations.

  • RSS: 0.90, 0.91 Netscape, 0.91 Userland, 0.92, 0.93, 0.94, 1.0 and 2.0
  • Atom: 0.3 and 1.0
  • Free-form internet news content (such as OverheidsVideo)
  • News archives via FTP (such as ANP and Kluwer)
  • News archives via web services (such as Basis Wetten Bestand)
In addition to traditionally structured sources, the Search Profiler also lets you add semi-structured content as a source. OverheidsVideo is an example of such a source.
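
Before a feed can be processed, its format has to be known. This page does not say how Seajas Search determines it, so the following is only a minimal sketch, assuming the feed has already been downloaded as text. It guesses the format from the root element and its namespace using Python's standard `xml.etree.ElementTree`; the function name `detect_feed_format` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the RDF-based RSS versions and of the two Atom versions.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_090_NS = "http://my.netscape.com/rdf/simple/0.9/"
RSS_10_NS = "http://purl.org/rss/1.0/"
ATOM_03_NS = "http://purl.org/atom/ns#"
ATOM_10_NS = "http://www.w3.org/2005/Atom"


def detect_feed_format(xml_text: str) -> str:
    """Guess the feed format from the root element and its namespace (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        # RSS 0.91 through 2.0 declare their version on the <rss> element.
        # The Netscape and Userland flavours of 0.91 are not told apart here.
        return "RSS " + root.get("version", "unknown")
    if root.tag == f"{{{RDF_NS}}}RDF":
        # RSS 0.90 and 1.0 are RDF documents; the channel's namespace decides.
        if root.find(f"{{{RSS_10_NS}}}channel") is not None:
            return "RSS 1.0"
        if root.find(f"{{{RSS_090_NS}}}channel") is not None:
            return "RSS 0.90"
    if root.tag == f"{{{ATOM_10_NS}}}feed":
        return "Atom 1.0"
    if root.tag == f"{{{ATOM_03_NS}}}feed":
        return "Atom 0.3"
    return "unknown"
```

Distinguishing the two RSS 0.91 variants would additionally require looking at the DOCTYPE, which this sketch does not do.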

Easily maintainable.

Information sources can easily be maintained through our user interface.

Integrates with existing infrastructure.

Do you have an existing search infrastructure? In addition to its own search interface based on Lucene/Solr, the Search Profiler supports, among others, Autonomy IDOL. Features such as classification are available in both environments. Its open enricher component provides a layer of abstraction between the Search Profiler and the search engine, which gives you a lot of freedom in how the two are connected.

The Search Enricher

The Search Enricher is an enricher based on Apache Tika. It helps make your content and documents more relevant and easier to search.

  • ACI support. With support for Autonomy's open ACI exchange protocol the Search Enricher can seamlessly connect to your Autonomy IDOL search engine.
  • Solr support. Aside from support for Autonomy ACI, Lucene/Solr can also be connected seamlessly through the use of its binary exchange protocol.
  • Sectioning. To improve search relevance the Search Enricher provides configurable content-splitting functionality.
  • Key Content Extraction. Using Key Content Extraction (KCE) the Search Enricher can heuristically determine the effective content of a page. This processing step removes unnecessary information, such as menus and advertisements.
  • Open Source. Proven Technology. The Search Enricher is an open source project developed by the central government of the Netherlands. It has already proven itself in production environments, one of which is the central government's Government Reform Programme.
When connecting to Autonomy IDOL the FileSystemFetch component is not necessary. The Search Enricher, using its ACI support, forms the bridge between the application and the search engine.
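
The sectioning options of the Search Enricher are not documented on this page. Purely as an illustration of content splitting, here is a minimal sketch that assumes sections are formed by grouping blank-line-separated paragraphs up to a size limit; the function `split_into_sections` and its `max_chars` parameter are hypothetical.

```python
from typing import List


def split_into_sections(text: str, max_chars: int = 1000) -> List[str]:
    """Group blank-line-separated paragraphs into sections of at most max_chars.

    A single paragraph longer than max_chars becomes a section of its own.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    sections: List[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the limit: close the section.
            sections.append(current)
            current = para
        else:
            current = candidate
    if current:
        sections.append(current)
    return sections
```

Each section could then be indexed as a separate unit, so that a match in one part of a long document is not diluted by the rest of it.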

Extraction

Information extraction.

The Seajas Search Profiler distinguishes itself by fully indexing external information, rather than only using short summaries (meta information) from the external source. The Seajas Search Profiler extracts the most relevant information from the actual page using Key Content Extraction.
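
The heuristics behind Key Content Extraction are not described here. A common, simpler technique is a link-density heuristic: menus and navigation consist mostly of short link text, while article text is long and contains few links. The sketch below uses that swapped-in technique with Python's standard `html.parser`; the class `BlockCollector`, the function `extract_key_content` and both thresholds are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Elements that start a new text block, and elements whose text is ignored.
BLOCK_TAGS = {"p", "div", "li", "td", "article", "section", "nav",
              "header", "footer", "aside", "h1", "h2", "h3", "h4"}
SKIP_TAGS = {"script", "style"}


class BlockCollector(HTMLParser):
    """Collects (text, link_chars) pairs, one per block-level element."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[Tuple[str, int]] = []
        self._parts: List[str] = []
        self._link_chars = 0
        self._link_depth = 0
        self._skip_depth = 0

    def flush(self) -> None:
        # Normalise whitespace and store the finished block, if any.
        text = " ".join("".join(self._parts).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._parts, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            self._link_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a":
            self._link_depth = max(0, self._link_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip_depth:
            return
        self._parts.append(data)
        if self._link_depth:
            self._link_chars += len(data.strip())


def extract_key_content(html: str, min_chars: int = 80,
                        max_link_density: float = 0.3) -> str:
    """Keep long text blocks that consist mostly of non-link text."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    collector.flush()  # pick up trailing text outside any block element
    kept = [text for text, link_chars in collector.blocks
            if len(text) >= min_chars and link_chars / len(text) <= max_link_density]
    return "\n\n".join(kept)
```

Short blocks (such as a copyright footer) and link-heavy blocks (such as a menu) are dropped, which mirrors the kind of result described above without claiming to be the product's algorithm.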

Once all information has been extracted from the original web page, more extensive search results and more advanced features become possible. Page metadata is extracted along with the relevant page content.

Search functionality

By enriching documents with metadata, the Search Profiler makes complex and extensive search features possible, including the following:

  • Full-text searching. The entire document can be searched: its content as well as its title and other metadata.
  • Faceted search. Several metadata fields lend themselves well to faceted search, for example for searching by calendar date or by classification.
  • Classification search. With a classification tree available, searches can be narrowed down to a particular (sub-)collection. A search can, for example, be narrowed to only Economic News, or only National Newspapers (through ANP).
  • Possibilities for alert services. Because both the indexing date and the publication date are available, the Search Profiler can be connected to search-alert and attender-based systems, for example through integration with Autonomy Federator.
  • Highlighting and spellchecking. After the query has been split into separate search terms, these terms can be highlighted in the search result summary by changing their background color. User input can also be checked for spelling errors.
  • Automatic completion. Autocomplete suggestions can be derived from document titles as well as from the entire document content.
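
The highlighting step described above can be sketched as follows. This is a minimal illustration, assuming the summary is plain text that is rendered as HTML and that terms are matched as whole words, case-insensitively; the function `highlight` and the `css_class` name are hypothetical.

```python
import html
import re


def highlight(summary: str, query: str, css_class: str = "highlight") -> str:
    """Wrap whole-word matches of the query terms in a <span> for styling."""
    escaped = html.escape(summary)
    # Split the query into terms; longest first so longer terms win in the alternation.
    terms = sorted(set(query.split()), key=len, reverse=True)
    if not terms:
        return escaped
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(html.escape(t)) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: f'<span class="{css_class}">{m.group(0)}</span>', escaped)
```

The background color itself would then come from the front end's stylesheet, for example `.highlight { background-color: yellow; }`. Escaping the summary before inserting the `<span>` tags keeps user content from being interpreted as markup.
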
Aside from serving as a back-end to existing search infrastructures, the Search Profiler also offers its own user interface which contains some of these features.

Simple and flexible.

In addition to automatic extraction, external information is also processed through a configurable modification pipeline. This pipeline takes the content of the web page or feed, and where the effective content starts and ends can be configured from the maintenance interface. This can be done in a simple manner, but for additional flexibility regular expressions and scripts in various programming languages can also be entered.

Content Modifiers

The Search Profiler can use so-called modifiers to alter your content dynamically, making the information it contains even more relevant. Modifiers range from simple to highly flexible; three options are offered:

  • Text matching. The easiest option: you enter a simple start text and end text from the source document. For example, your effective content may start at <div id="news"> and end at the corresponding </div>.
  • Regular expressions. With this option you enter a start expression and an end expression. This is useful when an element that marks the boundaries of your content contains a dynamic value, for example /<div id="news-[0-9]+">/.
  • Scripting. For even more complex cases, and for semi-structured sources such as free-form internet content, scripting support is offered. The entire document content is passed to JavaScript, Groovy, Ruby or an XSL template, which can then modify the output in any way you see fit. The caching, cleaning and retrieval features of the Search Profiler are also exposed to these scripts through a set of utility functions.
These modifiers can be applied both to index documents, which list the available information, and to the result pages those documents point to. An example of an index document is the RSS feed of a news website; an example of a result page is an individual news article on that website.
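
The first two modifier types can be sketched in a few lines. This is a minimal illustration, not the Search Profiler's implementation: it assumes a modifier returns the text between the start and end markers (markers excluded) and simply stops at the first end marker after the start. That is not nesting-aware, so a nested `<div>` inside the content would end it early; cases like that are what the scripting option is for. The function names are hypothetical.

```python
import re
from typing import Optional


def extract_by_text(document: str, start: str, end: str) -> Optional[str]:
    """Text matching: return what lies between literal start and end markers."""
    i = document.find(start)
    if i == -1:
        return None
    content_start = i + len(start)
    j = document.find(end, content_start)
    if j == -1:
        return None
    return document[content_start:j]


def extract_by_regex(document: str, start_pattern: str, end_pattern: str) -> Optional[str]:
    """Regular expressions: like extract_by_text, but the markers are patterns."""
    start = re.search(start_pattern, document)
    if start is None:
        return None
    # Search for the end marker only after the start marker.
    end = re.compile(end_pattern).search(document, start.end())
    if end is None:
        return None
    return document[start.end():end.start()]
```

For example, `extract_by_regex(page, r'<div id="news-[0-9]+">', r'</div>')` matches the dynamic `id` from the regular-expression example above.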

Automatic document detection.

The Seajas Search Profiler can automatically detect which types of documents your external source provides, without requiring any additional information. It can also detect the character set of those documents automatically, so that your search results are free of garbled characters.
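
How the Search Profiler performs this detection is not stated. As a rough sketch of the general idea only, the code below checks a few well-known leading "magic" bytes to identify binary formats, and detects the character set from a byte order mark, falling back to UTF-8 and then to Windows-1252. The fallback order, the small set of recognized types and the function names are all assumptions made for illustration.

```python
import codecs

# Leading "magic" bytes of a few binary formats.
MAGIC_NUMBERS = [
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),  # also the container of DOCX, XLSX and ODT
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "application/x-ole-storage"),  # legacy DOC/XLS
]

# Byte order marks and the Python codec that decodes (and strips) them.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_charset(data: bytes) -> str:
    for bom, codec in BOMS:
        if data.startswith(bom):
            return codec
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback for legacy Western European pages.
        return "cp1252"


def detect_type(data: bytes) -> str:
    for magic, mime in MAGIC_NUMBERS:
        if data.startswith(magic):
            return mime
    # Text formats: decode the first bytes and look at how the document starts.
    head = data[:1024].decode(detect_charset(data), errors="replace").lstrip().lower()
    if head.startswith(("<!doctype html", "<html")):
        return "text/html"
    if head.startswith(("<?xml", "<rss", "<feed", "<rdf:rdf")):
        return "application/xml"
    return "text/plain"
```

A production system would recognize far more formats and also honor charset declarations in HTTP headers and `<meta>` tags; this sketch only shows the shape of the problem.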

All document-types

The search interface makes documents of all types searchable.

Classify your information.

With classification, your content can be categorized using an easy-to-navigate visual category tree. A component is also available for navigating this tree while composing a query on the front end, so that users can create even more specific searches. In addition to built-in support for categorization, integration with external systems is available, so you can quickly connect to your existing infrastructure.
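
Narrowing a search with the category tree amounts to expanding the chosen category into itself plus all of its sub-categories and filtering on that set. The sketch below assumes documents carry a single `category` field and that category names are unique; the `Category` class and `narrow_to_category` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Category:
    name: str
    children: List["Category"] = field(default_factory=list)

    def find(self, name: str) -> Optional["Category"]:
        """Depth-first search for a category by name."""
        if self.name == name:
            return self
        for child in self.children:
            found = child.find(name)
            if found is not None:
                return found
        return None

    def names(self) -> Iterator[str]:
        """Yield this category's name and those of all its sub-categories."""
        yield self.name
        for child in self.children:
            yield from child.names()


def narrow_to_category(documents: List[dict], tree: Category, name: str) -> List[dict]:
    """Keep only documents classified under `name` or one of its sub-categories."""
    node = tree.find(name)
    if node is None:
        return []
    allowed = set(node.names())
    return [doc for doc in documents if doc["category"] in allowed]
```

In a real search engine the expanded set of category names would typically become a filter clause in the query rather than a filter over documents in memory.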

How to classify

Using the handy maintenance interface, you can easily classify your sources.

Information freedom.

We offer our product for local installation within your own environment. This gives you full access to, and full control over, both the indexed and the raw data. That is useful, but it is also strategically smart not to depend on a third party for storing your potentially sensitive or strategic data. In addition to application maintenance, we also offer the option of extending your license to include access to the application source code, for absolute certainty.

Licensing

Licensing is on a per-server basis. Our licensing model places no restrictions on the number of results or the number of sources.

  • Single server. The Search Profiler will be available on a single machine within your organization. Email and phone support is available. Cost: contact us.
  • Single server + source code. The Search Profiler will be available on a single machine within your organization. In addition to email and phone support, the source code may also be viewed. Cost: contact us.

For more information regarding licensing you may contact us.

Blazing fast processing.

Thanks to advanced parallel processing, the Seajas Search Profiler can communicate with all of your external sources at the same time, effortlessly and efficiently. You can configure a maximum number of documents to be retrieved simultaneously, as well as a maximum number of simultaneous retrievals per source or server. This ensures maximum throughput while keeping the load on your external sources to a minimum.
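
The two limits described above can be modelled with two layers of semaphores: one shared by all retrievals and one per server. This is a minimal asyncio sketch under that assumption, not the product's implementation; `FetchLimiter`, `crawl` and the injected `fetch` coroutine are hypothetical, and actual HTTP retrieval is left to the caller.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, List, TypeVar
from urllib.parse import urlsplit

T = TypeVar("T")


class FetchLimiter:
    """Caps concurrent fetches globally and per host (hypothetical helper)."""

    def __init__(self, max_total: int, max_per_host: int) -> None:
        self._total = asyncio.Semaphore(max_total)
        # One semaphore per host, created lazily on first use.
        self._per_host = defaultdict(lambda: asyncio.Semaphore(max_per_host))

    async def run(self, url: str, fetch: Callable[[str], Awaitable[T]]) -> T:
        host = urlsplit(url).hostname
        # Wait for a per-host slot first, so a busy host does not tie up global slots.
        async with self._per_host[host]:
            async with self._total:
                return await fetch(url)


async def crawl(urls: List[str], fetch: Callable[[str], Awaitable[T]],
                max_total: int = 10, max_per_host: int = 2) -> List[T]:
    # Create the limiter inside the running event loop.
    limiter = FetchLimiter(max_total, max_per_host)
    return await asyncio.gather(*(limiter.run(url, fetch) for url in urls))
```

Acquiring the per-host slot before the global one means a source that is already at its limit never holds a global slot while it waits, which keeps overall throughput high without overloading any single server.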

Whenever and however often you'd like.

Your information can be retrieved at a configurable interval, and indexed either together with retrieval or separately. This way, information can be retrieved throughout the day and then indexed at a given time or interval, so as not to interfere with your production processes. In addition to dynamic background scheduling, you can always start the retrieval and indexing tasks manually from the maintenance interface. If you want to be continuously up to date, very short intervals ensure that the latest documents can be found right away.
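
Running retrieval and indexing on independent intervals can be sketched as two tasks that a background loop checks periodically. This is an illustration under assumptions, not the Search Profiler's scheduler: `ScheduledTask` and `run_due_tasks` are hypothetical, an interval of zero makes a task due on every check, and a manual start can be modelled by resetting `last_run` to `None`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ScheduledTask:
    name: str
    interval: timedelta                # timedelta(0) means "due on every check"
    last_run: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        return self.last_run is None or now - self.last_run >= self.interval


def run_due_tasks(tasks: List[ScheduledTask], now: datetime) -> List[str]:
    """Mark every due task as run at `now` and return the names of those tasks."""
    due = [task for task in tasks if task.is_due(now)]
    for task in due:
        task.last_run = now
    return [task.name for task in due]
```

For example, with retrieval every 15 minutes and indexing every 24 hours, both tasks run on the first check, after which retrieval keeps running several times an hour while indexing waits until the next day.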

We still need to finish this.
We've been working hard on our product launch over the last couple of days, so please bear with us as we complete the last parts of the website. In the meantime—if you're Dutch—take a look at this factsheet containing a concise overview of everything that Seajas Search offers.

Check out Seajas Search and make it your own.

It’s time to start searching smarter.