PromptCloud expands DaaS offering with large-scale document-level crawl service

14 Jan, 2013

PromptCloud, which deals with large-scale data crawl and extraction and offers Big Data solutions, has launched a document-level crawl service, or 'custom search engine', that will enable enterprises to get near real-time alerts on topics of interest without having to search manually, a top executive of the company told Techcircle.

Run by Bangalore-based SDF Technologies Pvt Ltd, PromptCloud offers customised large-scale data crawl and extraction by leveraging cloud computing solutions, and follows a data-as-a-service model.

"Document-level crawls are valid if one is interested in discovering blogs/articles/news of interest based on a custom set of keywords, geographies or categories. To enable this, we crawl thousands of sites and use special techniques to extract only relevant documents specific to each client's requirements," said Prashant Kumar, founder, PromptCloud. "Earlier, we mostly did site-based crawls where we categorised pages to be crawled on each site before running the crawls. The delivered data provided record-level details like product name, price, specifications, reviews, author, etc. However, document-level crawls are done on a very large scale, and so the data being extracted contains only document-level details, such as the URL of the page, title of the article, keywords matched, etc."
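
As a rough illustration of the document-level output Kumar describes (URL, title and keywords matched), the following is a minimal Python sketch. It is not PromptCloud's implementation; the `Document` class, the function names and the whole-word matching rule are assumptions made for this example only, and fetching the pages is left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    """A crawled page (hypothetical structure for this sketch)."""
    url: str
    title: str
    text: str


def match_keywords(doc, keywords):
    """Return the keywords that appear as whole words or phrases in the document."""
    haystack = f"{doc.title}\n{doc.text}"
    matched = []
    for kw in keywords:
        # Lookarounds keep "tax" from matching inside "taxation".
        pattern = r"(?<!\w)" + re.escape(kw) + r"(?!\w)"
        if re.search(pattern, haystack, flags=re.IGNORECASE):
            matched.append(kw)
    return matched


def document_level_records(docs, keywords):
    """Keep only relevant documents and emit document-level details for each."""
    records = []
    for doc in docs:
        matched = match_keywords(doc, keywords)
        if matched:
            records.append(
                {"url": doc.url, "title": doc.title, "keywords_matched": matched}
            )
    return records
```

A caller would pass in already-crawled pages together with a client's keyword list; pages that match no keyword are dropped, which mirrors the idea of delivering only relevant documents.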

According to Kumar, this is not exactly a search engine but an on-demand data-as-a-service platform that takes in a client's requirements and customises the solution so that users get only relevant data. More importantly, users do not have to skim through the results to check their accuracy, since the programmes do that job for them, he added.

"For instance," Kumar continued, "one of our clients had a team of editors who, on a daily basis, manually looked for interesting gossip on the web across almost 400 sites for their list of celebrities. The company also had the celebrities' Twitter handles that it wanted to closely watch and write about for its gossip column. We partnered with this company to provide our mass crawling service. Now, the editors just grab the article URLs/tweets of their interest, which is essentially the accurate result set. This data reaches them via our application programming interface (API) on a daily basis, making the entire process easy and viable. Whenever there's a new story on any of their relevant topics, the system also alerts them with links and other details in a structured format. The entire pipeline is automated."

PromptCloud claims that the offering is vertical-agnostic and that the platform can process around a million sources for a client on a daily basis.

Founded in 2009 by IIT Kanpur alumnus Kumar, PromptCloud works with clients in sectors including travel, finance, healthcare, marketing and analytics, among others. It aggregates data from multiple sources of interest, extracts the relevant data and structures it as per a pre-defined schema given by its clients.

(Edited by Prem Udayabhanu)