Here is a short overview of IILLC's products
Business address and POI databases
IILLC's business address and POI data is a comprehensive,
accurate, up-to-date and extremely rich source of POI data in more
than 30 markets worldwide. It is the result of a unique combination of
several web-extraction approaches that make it unrivaled in depth and
freshness. It consists of the following three segments:
Free-crawled records
This segment consists of records extracted from business homepages.
Through a proprietary crawling process for locating business
homepages, a constant feed of new businesses is added to the
database. IILLC also offers homepage appending services for client
databases that have little or no website information for their
business listings.
Content extraction is conducted through rule-based information
extraction systems. Repeated automated and manual (sample-based and
frequency-based) quality checks ensure high-quality data extraction.
All records have a URL, business name, address and telephone number.
All records are also classified according to IILLC's own category
system, which can be mapped to common category systems such as SIC or
NAICS.
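
To illustrate how a mapping from an internal category system to SIC or
NAICS might look in practice, here is a minimal sketch. The internal
category IDs, the `CATEGORY_MAP` table and the `map_category` helper
are assumptions made up for this example and do not reflect IILLC's
actual schema; the SIC and NAICS codes shown should be verified
against the official code tables before use.

```python
from typing import Optional

# Hypothetical internal category IDs mapped to external codes.
# The internal IDs are invented for illustration; the SIC/NAICS codes
# should be checked against the official tables before use.
CATEGORY_MAP = {
    "restaurant": {"SIC": "5812", "NAICS": "722511"},
    "supermarket": {"SIC": "5411", "NAICS": "445110"},
}


def map_category(internal_id: str, system: str) -> Optional[str]:
    """Return the external code for an internal category, or None if unmapped."""
    return CATEGORY_MAP.get(internal_id, {}).get(system.upper())


# Example record carrying only the internal category.
record = {"name": "Example Bistro", "category": "restaurant"}
print(map_category(record["category"], "naics"))
```

Keeping the mapping in a separate table means a record only stores the
internal category, and any number of external systems can be added
without touching the records themselves.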
Full re-crawls
and extractions are conducted each month. Quicker update intervals
are possible.
Enriched records
This segment
contains many additional rich content fields per business record. We
extract a variety of additional fields from our own URLs or URLs
provided to us. Some of the additional fields that we are able to
extract are as follows (the percentages represent the portion of our
database that contains this field):
- Secondary trade name: 55%
- Fax: 40%
- E-mail address: 78% (45% also contain e-mails for individuals)
- Keywords (10 or more per record): 92%
- Opening hours: 20%
- "About us" statements and tag lines: 75%
- Payment methods/credit cards accepted: 8%
- Multimedia (e.g. images, videos): 35%
- Parking space: 6%
- Facebook/Twitter URLs/IDs: 4%
- Management persons with titles: 35%
- Wi-Fi availability: 3.5%
- Website statistics: number of pages, number of incoming links,
  quality of incoming links
Many other
general and category-specific fields (such as smoking policy,
wheelchair accessibility or restaurant menus) are available or can be
developed upon request.
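
The coverage percentages quoted for the enriched fields can be read as
field fill rates, i.e. the share of records in which a field is
present and non-empty. A minimal sketch of such a calculation follows;
the record layout, the field names and the `fill_rates` helper are
assumptions for illustration only, not the actual database format.

```python
def fill_rates(records, fields):
    """Return the percentage of records with a non-empty value per field."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: 100.0 * sum(1 for r in records if r.get(field)) / total
        for field in fields
    }


# Hypothetical sample records; missing, empty and None values all count
# as "not filled".
sample = [
    {"fax": "+1 555 0100", "opening_hours": "Mo-Fr 9-17"},
    {"fax": "", "opening_hours": None},
    {"fax": "+1 555 0101"},
    {},
]
rates = fill_rates(sample, ["fax", "opening_hours"])
print(rates)
```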
Chain records
Finally, a third segment of the IILLC POI database is built by
harvesting large chain websites (e.g. franchise operations or retail
chains; normally this covers the top 50-200 chains per country). This
includes, for example, supermarkets, restaurant chains and gas
stations, where information on individual locations is only available
by querying web forms. Typically, these chain location records contain
details such as opening hours, parking availability and/or payment
methods accepted.
Chain extractions
are conducted each month. Quicker update intervals are possible.
Local search term banks
The IILLC
local search term bank is a rich and flexible terminology system
tailored for the local search domain. It is available in English,
Spanish, French and German. Other languages can be added on request.
Suggested usage and implementation
The following non-exhaustive list of use cases illustrates what can be
achieved with the IILLC local search term bank:
- Did-you-mean dictionary: Correcting misspelled user entries
- Suggest dictionary: Populating a suggest / search-as-you-type box
- Related searches: Displaying relevant similar searches to users
- Website classification: Determining local search categories for
  business websites based on term matches
- Business name classification: Determining local search categories
  for business names based on (partial) term matches
- Query analysis on-the-fly: Recognizing category / keyword parts of
  user queries and resolving them to standardized categories
- Query-log analysis: Improving understanding of the user base by
  segmenting the query-log into standardized categories
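
As a rough illustration of three of the use cases above (did-you-mean,
suggest, and on-the-fly query analysis), the following minimal sketch
uses a tiny in-memory term list. The `TERM_BANK` contents, the
category names and all function names are assumptions invented for
this example; the actual term bank format and delivery interface are
not described here. Fuzzy matching uses Python's standard `difflib`
module as a stand-in for whatever matching technique a production
system would use.

```python
import difflib

# Hypothetical term bank excerpt: term -> standardized category.
TERM_BANK = {
    "pizza": "Restaurants",
    "pizzeria": "Restaurants",
    "plumber": "Plumbing",
    "plumbing": "Plumbing",
    "dentist": "Dentists",
}


def did_you_mean(entry, cutoff=0.8):
    """Suggest the closest known term for a possibly misspelled entry."""
    matches = difflib.get_close_matches(
        entry.lower(), list(TERM_BANK), n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


def suggest(prefix, limit=5):
    """Return known terms starting with the typed prefix (search-as-you-type)."""
    prefix = prefix.lower()
    return sorted(t for t in TERM_BANK if t.startswith(prefix))[:limit]


def categorize_query(query):
    """Resolve known terms in a query to standardized categories."""
    tokens = query.lower().split()
    return sorted({TERM_BANK[tok] for tok in tokens if tok in TERM_BANK})


print(did_you_mean("plumer"))
print(suggest("pi"))
print(categorize_query("emergency plumber near me"))
```

The same `categorize_query` helper could be applied line by line to a
query log to segment it into standardized categories; a real
implementation would also need multi-word terms and partial matches,
which this sketch leaves out.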
Sources and editing process
The local search term bank was built using text mining and
computational semantics methods, based on the following resources:
- Extensive crawls of more than 12 million business homepages
  worldwide
- Multiple query-log analyses conducted over the last 8 years
- Text mining of user-generated content (especially review sites)
Each
candidate term has to be approved manually by in-house editors before
entering the system.
The local search term bank is regularly updated. Standard export
cycles are once per quarter, but more rapid updates are possible.