Extractiv

Semantic Web Crawling

Semantic Web Crawling combines web crawling and natural language processing into an entity extraction service for web content: Extractiv crawls the pages you point it at and extracts the entities, such as organizations, mentioned on them. If you want to classify web content, run advanced searches, or perform related tasks, Semantic Web Crawling is for you.

How It Works

For example, suppose I want to find web pages that discuss business or financial news. More specifically, I want to discover which companies are discussed in articles that mention Apple Inc. To do this with Extractiv, I would first select the entity types I want to extract. Here I might choose commercial_org, electronics_org, entertainment_org, financial_org, media_org, and organization.

Next, I would tell Extractiv which URLs to crawl. Starting from a few business websites is a reasonable approach here, so I'll enter http://www.bloomberg.com, http://www.motleyfool.com, and http://www.forbes.com.

To make sure that only web pages mentioning Apple are included in my results, I'll add Apple as a keyword filter.
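To make the three settings concrete, here is a minimal sketch of a job as a plain data structure, with a local check for the keyword filter. The `CrawlJob` class, its field names, and the matching rule (keep a page if it contains any keyword as a whole word, ignoring case) are hypothetical assumptions for illustration, not Extractiv's actual job format or API.

```python
import re
from dataclasses import dataclass, field


# Hypothetical model of a crawl job. The field names are illustrative
# assumptions, not Extractiv's actual job format or API.
@dataclass
class CrawlJob:
    entity_types: list[str]
    seed_urls: list[str]
    keyword_filters: list[str] = field(default_factory=list)

    def page_passes(self, page_text: str) -> bool:
        """Keep a page if it mentions any keyword as a whole word (case-insensitive).

        The "any keyword, whole word" rule is an assumption about how keyword
        filters behave; with no filters configured, every page is kept.
        """
        if not self.keyword_filters:
            return True
        return any(
            re.search(rf"\b{re.escape(kw)}\b", page_text, re.IGNORECASE)
            for kw in self.keyword_filters
        )


job = CrawlJob(
    entity_types=["commercial_org", "electronics_org", "entertainment_org",
                  "financial_org", "media_org", "organization"],
    seed_urls=["http://www.bloomberg.com", "http://www.motleyfool.com",
               "http://www.forbes.com"],
    keyword_filters=["Apple"],
)

print(job.page_passes("Apple reported quarterly earnings."))  # True
print(job.page_passes("Pineapple prices rose this week."))    # False
```

Whole-word matching is used here so that a filter for "Apple" does not also admit pages that only mention "pineapple"; a plain substring test would.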

With the job configured, I'll submit it and wait for the results. When the job completes, I'll have a list of documents that mention Apple, along with the other organizations mentioned in each of them.
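Once results come back, answering "which companies appear alongside Apple?" is a counting step. The sketch below assumes a hypothetical result shape (one record per document, with entities as `(name, type)` pairs) and uses made-up sample data; it is not Extractiv's actual output format. It keeps only the selected entity types and counts, per organization, how many documents mention it together with Apple.

```python
from collections import Counter

# Made-up results for illustration: one record per document that passed the
# keyword filter, with extracted entities as (name, type) pairs. The record
# shape is an assumption, not Extractiv's actual output format.
results = [
    {"url": "http://example.com/story-1",
     "entities": [("Apple", "electronics_org"), ("Samsung", "electronics_org"),
                  ("Tim Cook", "person")]},
    {"url": "http://example.com/story-2",
     "entities": [("Apple", "electronics_org"), ("Goldman Sachs", "financial_org"),
                  ("Samsung", "electronics_org")]},
]

SELECTED_TYPES = {"commercial_org", "electronics_org", "entertainment_org",
                  "financial_org", "media_org", "organization"}


def co_mentioned(results, target, allowed_types):
    """Count how many documents mention each organization alongside `target`."""
    counts = Counter()
    for doc in results:
        # Keep only selected types and de-duplicate within a document,
        # so each document counts at most once per organization.
        names = {name for name, etype in doc["entities"] if etype in allowed_types}
        if target in names:
            counts.update(names - {target})
    return counts


print(co_mentioned(results, "Apple", SELECTED_TYPES).most_common())
# [('Samsung', 2), ('Goldman Sachs', 1)]
```

The person entity is dropped because its type was not among the selected ones, mirroring the entity-type choice made when the job was set up.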

Use Cases

Online Advertising

Online ad networks constantly try to improve in two areas: broadening their inventory of ad channels and targeting ads more precisely. Extractiv's Semantic Web Crawling can help with both. By running large semantic crawls over blogs, news sites, and other media, an Extractiv user can quickly classify many websites based on the entities found on them.

For example, a crawl across several sports blogs could quickly determine which teams each blog covers. The ad network could then add these blogs to its distribution channel while also knowing which sports ads to send to each one.
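One simple way to turn crawl output into a per-blog label is to take the most-mentioned entity, but only when it clearly dominates. The sketch below is an illustrative assumption, not how Extractiv classifies sites: the blog names and mention lists are made up, and the 50% share threshold (`min_share`) is a hypothetical parameter.

```python
from collections import Counter

# Made-up per-blog entity mentions, e.g. sports teams extracted from each
# blog's pages during a crawl. Blog names and mentions are illustrative only.
blog_entities = {
    "blog-a.example.com": ["Red Sox", "Red Sox", "Yankees", "Red Sox"],
    "blog-b.example.com": ["Lakers", "Celtics", "Lakers"],
    "blog-c.example.com": [],
}


def dominant_entity(mentions, min_share=0.5):
    """Return the most-mentioned entity if it accounts for at least
    `min_share` of all mentions; otherwise return None (no clear focus)."""
    if not mentions:
        return None
    name, count = Counter(mentions).most_common(1)[0]
    return name if count / len(mentions) >= min_share else None


for blog, mentions in blog_entities.items():
    print(blog, "->", dominant_entity(mentions))
```

Returning `None` for blogs without a clear focus lets the ad network fall back to general sports ads rather than guessing a team.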

News and Publishing

A semantic web crawl across a news site can determine which articles cover similar topics and are related to each other. For example, if several articles concern political parties, a semantic web crawl would highlight the organizations mentioned in each article and show which articles focus most closely on particular parties.
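As a rough illustration of "related articles," each article can be reduced to the set of organizations extracted from it and pairs compared by set overlap. This sketch uses Jaccard similarity (shared entities divided by all entities in either article) as a simple stand-in; it is not necessarily how Extractiv measures relatedness, and the article names and entity sets are made up.

```python
from itertools import combinations

# Made-up organization sets extracted from each article (illustrative only).
articles = {
    "article-1": {"Democratic Party", "Republican Party"},
    "article-2": {"Democratic Party", "Republican Party",
                  "Federal Election Commission"},
    "article-3": {"Green Party"},
}


def jaccard(a, b):
    """Set overlap: len(a & b) / len(a | b), or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Score every pair of articles and list the most closely related pairs first.
pairs = sorted(
    ((jaccard(articles[x], articles[y]), x, y)
     for x, y in combinations(articles, 2)),
    reverse=True,
)
for score, x, y in pairs:
    print(f"{x} <-> {y}: {score:.2f}")
```

Here articles 1 and 2 share two of their three distinct organizations, so they rank as the most closely related pair, while article 3 shares nothing with either.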

More Information

If you'd like more information, check out our Semantic Web Crawling documentation.