How Many Samples Are Needed to Add a Document to the Database?

Why a Single Sample Is Not Always Enough to Add a Document to the Database

We’re always happy to add a new document to the database, even if you can provide only one sample. However, keep in mind that relying on a single sample greatly increases the risk of errors during later processing. The reason is simple: one example gives a very limited view of how the document actually appears in real-life conditions.

Why Multiple Document Samples Are Needed

The system doesn’t store the document as a static image. Instead, it learns to recognize the document under different conditions: issued to different holders, with varying wear and tear, lighting, series variations, and changes in printed elements.

One sample represents just a single scenario. Several samples, on the other hand, help the algorithms confidently distinguish consistent template elements from those that may vary and correctly handle different fonts, shades, backgrounds, stamp variations, and other features. As a result, the Document Reader SDK becomes more accurate and reliable.

Images taken in ultraviolet and infrared light allow the system to detect things that aren’t visible under normal lighting, such as security elements, paper structure, and marker placement. This additional information significantly improves recognition quality and helps the document be processed correctly across different environments.

How Many Document Samples Are Optimal for Successful Addition?

Standardized Documents (ID1/ID2/ID3):

To add an ID, passport, or any other standardized ID1, ID2, or ID3 document, we typically need 3–5 samples. This is because such documents may include optional fields; personal data may appear in one or multiple lines; and different print batches may have slight variations in color or design.

Non-Standard Document Sizes:

Adding documents that don’t match common standards is more challenging for our SDK. For this reason, we need 3–5 samples, and it is essential to include the document’s physical size in millimeters. Without this information, we can’t guarantee proper cropping, document type classification, or accurate extraction of text and graphic fields.

A4-Format Documents:

For A4-format documents that don’t follow a fixed template, a larger number of samples is needed. These documents often vary from sample to sample in terms of layout, barcodes, fonts, and font sizes. More examples allow us to establish a consistent layout of information fields, train OCR algorithms to recognize different fonts, and ensure smooth processing.
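
If you prepare submissions in bulk, the guidance above can be turned into a simple pre-submission check. The following is only a minimal sketch and is not part of the Document Reader SDK: the category labels, the `Submission` class, and the `check_submission` function are hypothetical names introduced for illustration. It also assumes that "a larger number" for A4-format documents means more than the 3–5 range, since no exact figure is given here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical category labels for illustration; not SDK identifiers.
STANDARD = "standard"          # ID1/ID2/ID3 documents
NON_STANDARD = "non_standard"  # sizes outside the common standards
A4 = "a4"                      # A4-format documents without a fixed template

MIN_TYPICAL = 3  # lower bound of the 3-5 range
MAX_TYPICAL = 5  # upper bound of the 3-5 range


@dataclass
class Submission:
    category: str
    sample_count: int
    # (width, height) in millimeters; needed for non-standard sizes
    size_mm: Optional[Tuple[float, float]] = None


def check_submission(sub: Submission) -> List[str]:
    """Return warnings; an empty list means the submission matches the guidance."""
    if sub.category not in (STANDARD, NON_STANDARD, A4):
        raise ValueError(f"unknown category: {sub.category!r}")

    warnings: List[str] = []
    if sub.sample_count < 1:
        warnings.append("at least one sample is required")
    elif sub.category == A4:
        # Assumption: "a larger number" is read as more than the 3-5 range.
        if sub.sample_count <= MAX_TYPICAL:
            warnings.append("A4-format documents need more samples than the 3-5 range")
    elif sub.sample_count < MIN_TYPICAL:
        warnings.append("fewer than 3 samples increases the risk of processing errors")

    if sub.category == NON_STANDARD and sub.size_mm is None:
        warnings.append("physical size in millimeters is missing")
    return warnings
```

For example, `check_submission(Submission(NON_STANDARD, 1))` returns two warnings (too few samples and a missing physical size), while `check_submission(Submission(STANDARD, 4))` returns an empty list.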

Sample Requirements:

To achieve a high success rate in document processing, all samples must meet the image quality requirements described in the corresponding article.
