DocParser is a new take on age-old templates, first built in the 1990s, for automating data extraction from static forms. It makes it easy to build templates and provides ready-made templates for a few document types, covering a subset of the fields in those documents. If you need to process a few hundred documents unique to your own organization, DocParser can be a good solution. However, for high-volume documents or for commonly processed documents (such as invoices and receipts), Hypatos offers a templateless solution that delivers more accurate results.
For more details, please read on:
What are templates?
Templates are created by users to tell software where to look for a specific data entity. This works well for a company’s internal documents, as long as their layout does not change over time.
Why do we not need them anymore?
When dealing with documents from outside the company, as in the case of invoices, templates are quite inadequate because:
- Enterprises have tens of thousands of suppliers, each relying on its own invoice format. Creating a template for each new supplier is an incredibly manual job: it can take 30 minutes to set up a single template. Assuming a large company is served by 100k suppliers, building enough templates in 6 months would require 200 employees working full time on creating templates. No company is rich and foolish enough to dedicate 200 employees to a document extraction task when there is significantly more work to be done after the data is extracted.
- Enterprises change suppliers. As many as 10% of suppliers can be replaced every year.
- Companies change their invoice formats, sometimes multiple times a year, to reflect address, logo, or name changes, acquisitions, or changing aesthetic preferences.
- Templates are brittle even when dealing with a single company’s documents. For example, adding more line items changes the document structure, breaking the templates.
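The brittleness described in the last point can be illustrated with a small sketch. It assumes a deliberately simplified, hypothetical template that reads each field from a fixed line of the document text; real template tools typically use page regions or anchors rather than line numbers, and the field names and sample invoices here are made up for illustration. The failure mode is the same: once an extra line item shifts the layout, the template reads the wrong place.

```python
# Hypothetical position-based template: each field is read from a fixed
# line index of the document text. Names and layout are illustrative only.
TEMPLATE = {"invoice_number": 0, "total": 3}

def extract(lines, template):
    """Return whatever text sits at each templated line index."""
    return {
        field: lines[idx] if idx < len(lines) else None
        for field, idx in template.items()
    }

one_item = [
    "Invoice 1001",
    "Items:",
    "Widget 10.00",
    "Total 10.00",
]

# Same supplier, same format, but one more line item pushes the total down.
two_items = [
    "Invoice 1002",
    "Items:",
    "Widget 10.00",
    "Gadget 5.00",
    "Total 15.00",
]

ok = extract(one_item, TEMPLATE)
broken = extract(two_items, TEMPLATE)

print(ok["total"])      # Total 10.00
print(broken["total"])  # Gadget 5.00  (the template now reads a line item)
```

Note that the second extraction does not fail loudly: it returns a plausible-looking but wrong value, which is why this kind of breakage tends to surface only in downstream checks.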
This lack of resilience makes templates ill-suited for modern data extraction tasks.
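The staffing estimate in the first point above depends heavily on the assumptions behind it, so it is worth making them explicit. The sketch below uses a hypothetical helper, `template_staffing`, and assumes that six months correspond to 26 working weeks of 40 hours each (our assumption, not a figure stated above). With 30 minutes of pure setup time per template, the result is roughly 48 full-time employees; a figure close to 200 corresponds to about two hours of total effort per template, which may be closer to reality if testing and later corrections are counted.

```python
def template_staffing(suppliers, minutes_per_template, weeks, hours_per_week=40):
    """Full-time employees needed to build one template per supplier within `weeks`.

    All inputs are assumptions supplied by the caller.
    """
    total_hours = suppliers * minutes_per_template / 60
    hours_per_employee = weeks * hours_per_week
    return total_hours / hours_per_employee

# 100k suppliers, 6 months taken as 26 weeks of 40 hours (assumed).
fte_setup_only = template_staffing(100_000, 30, 26)    # 30 min per template
fte_two_hours = template_staffing(100_000, 120, 26)    # 2 h per template (assumed)

print(round(fte_setup_only))  # 48
print(round(fte_two_hours))   # 192
```

Either way, the headcount scales linearly with the number of suppliers and with the effort per template, and it does not include the ongoing work caused by supplier turnover and format changes.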
What features does an enterprise data extraction solution need?
An enterprise-ready data extraction solution should enable companies to:
- run the solution on premises, in a private cloud, or in a public cloud, depending on their technical requirements or data privacy guidelines
- automate downstream processes; otherwise, the high-quality extracted data still has to be processed manually, which limits the level of automation
- enable continuous learning with an intuitive human-in-the-loop interface
With this in mind, we have built Hypatos and are working to improve it. Feel free to