Products

Here is a short overview of IILLC's products.

Business address and POI databases

IILLC's business address and POI data is a comprehensive, accurate, up-to-date and extremely rich source of POI data in more than 30 markets worldwide. It is the result of a unique combination of several web-extraction approaches that make it unrivaled in depth and freshness. It consists of the following three segments:

Free-crawled records

This segment consists of records extracted from business homepages. A proprietary crawling process locates business homepages, so that a constant feed of new businesses is added to the database. IILLC also offers homepage appending services for client databases that have little or no website information for their business listings.

Content extraction is performed by rule-based information extraction systems. Repeated automated and manual (sample-based and frequency-based) quality checks ensure high-quality data extraction.
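To illustrate what rule-based extraction means in general, here is a minimal sketch in Python. It is not IILLC's extraction system: the rule set, the field names and the sample text are assumptions made for this example, and production rules would be far more numerous and locale-specific.

```python
import re

# Hypothetical rules: each field name maps to one regular expression.
RULES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{6,}\d"),
}


def extract_fields(text):
    """Return the first match of each rule, or None when a rule does not fire."""
    result = {}
    for field, pattern in RULES.items():
        match = pattern.search(text)
        result[field] = match.group(0) if match else None
    return result


# Invented sample text standing in for the visible text of a business homepage.
sample = "Joe's Diner, 12 Main St. Call +1 (555) 010-2030 or mail info@joesdiner.example"
record = extract_fields(sample)
```

A rule that does not fire yields None, which makes missing fields easy to count in the kind of sample-based quality checks mentioned above.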

All records have a URL, business name, address and telephone number. All records are also classified according to IILLC's own category system, which can be mapped to common category systems such as SIC or NAICS.
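A mapping between category systems could look like the following Python sketch. The in-house category identifiers are hypothetical placeholders, since the actual category system is not described here; only the SIC and NAICS codes are real codes from those standards.

```python
from typing import Optional

# Hypothetical in-house category IDs mapped to real SIC and NAICS codes.
CATEGORY_MAP = {
    "restaurant.full_service": {"sic": "5812", "naics": "722511"},
    "retail.supermarket": {"sic": "5411", "naics": "445110"},
}


def map_category(category_id: str, system: str) -> Optional[str]:
    """Look up the code of an in-house category in the given target system.

    Returns None when the category or the target system is unknown.
    """
    return CATEGORY_MAP.get(category_id, {}).get(system.lower())


naics_code = map_category("restaurant.full_service", "NAICS")
sic_code = map_category("retail.supermarket", "sic")
```

Returning None for unknown categories lets a client decide whether to skip the record or fall back to a broader category.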

Full re-crawls and extractions are conducted each month. Shorter update intervals are possible.

Enriched records

This segment contains many additional rich content fields per business record. We extract a variety of additional fields from our own URLs or URLs provided to us. Some of the additional fields that we are able to extract are as follows (the percentages represent the portion of our database that contains this field):


  • Secondary trade name: 55%
  • Fax: 40%
  • E-mail address: 78% (45% also contain e-mails for individuals)
  • Keywords (10 and more per record): 92%
  • Opening hours: 20%
  • "About us" statements and tag lines: 75%
  • Payment methods/credit cards accepted: 8%
  • Multimedia (e.g. images, videos): 35%
  • Parking space: 6%
  • Facebook/Twitter URLs/IDs: 4%
  • Management personnel with titles: 35%
  • Wi-Fi availability: 3.5%
  • Website statistics: Number of pages, number of incoming links, quality of incoming links

Many other general and category-specific fields (such as smoking policy, wheelchair accessibility or restaurant menus) are available or can be developed upon request.


Chain records

Finally, a third segment of the IILLC POI database is built by harvesting the websites of large chains (e.g. franchise operations or retail chains, normally the top 50-200 chains per country). This covers, for example, supermarkets, restaurant chains and gas stations, where information on individual locations is only available by querying web forms. Typically, these chain location records contain details such as opening hours, parking availability and/or accepted payment methods.

Chain extractions are conducted each month. Shorter update intervals are possible.


Local search term banks

The IILLC local search term bank is a rich and flexible terminology system tailored for the local search domain. It is available in English, Spanish, French and German. Other languages can be added on request.

Suggested usage and implementation


The following non-exhaustive list of use cases illustrates what can be achieved with the IILLC local search term bank:


  1. Did-you-mean dictionary: Correcting misspelled user entries

  2. Suggest dictionary: Populating a suggest / search-as-you-type box

  3. Related searches: Displaying relevant similar searches to users

  4. Website classification: Determining local search categories for business websites based on term matches

  5. Business name classification: Determining local search categories for business names based on (partial) term matches

  6. Query analysis on-the-fly: Recognizing category / keyword parts of user queries and resolving them to standardized categories

  7. Query-log analysis: Improving understanding of the user base by segmenting the query log into standardized categories
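Three of the use cases above (did-you-mean, suggest and on-the-fly query analysis) can be sketched in Python as follows. This is an illustration only: the tiny term bank, its categories and the function names are assumptions invented for this example and do not reflect the actual format or content of the IILLC term bank. The did-you-mean step uses the fuzzy matching from Python's standard difflib module.

```python
import difflib

# Hypothetical miniature term bank: term -> standardized category.
TERM_BANK = {
    "pizza": "Restaurants",
    "pizzeria": "Restaurants",
    "plumber": "Home Services",
    "dentist": "Health",
    "dental clinic": "Health",
}


def did_you_mean(entry, cutoff=0.75):
    """Return the closest known term for a possibly misspelled entry, or None."""
    matches = difflib.get_close_matches(
        entry.lower(), list(TERM_BANK), n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


def suggest(prefix, limit=5):
    """Return known terms starting with the typed prefix, in alphabetical order."""
    prefix = prefix.lower()
    return sorted(term for term in TERM_BANK if term.startswith(prefix))[:limit]


def categorize_query(query):
    """Resolve a query to the category of the longest term it contains, or None."""
    # Normalize whitespace and pad with spaces so only whole words match.
    padded = " " + " ".join(query.lower().split()) + " "
    for term in sorted(TERM_BANK, key=len, reverse=True):
        if " " + term + " " in padded:
            return TERM_BANK[term]
    return None


correction = did_you_mean("plumer")
suggestions = suggest("pi")
category = categorize_query("cheap pizza near me")
```

Checking longer terms first means a multi-word term such as "dental clinic" is preferred over any shorter term contained in the same query.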

Sources and editing process

The local search term bank was built using text mining and computational semantics methods, based on the following resources:


  1. Extensive crawls of more than 12 million business homepages worldwide

  2. Multiple query-log analyses conducted over the last 8 years

  3. Text mining of user-generated content (especially review sites)


Each candidate term has to be approved manually by in-house editors before entering the system.

The local search term bank is regularly updated. Standard export cycles are once per quarter, but more frequent updates are possible.