Building a Hindi Document Scanner with Python and an OCR API

BharatOCR Team · 6 min read

A Hindi document scanner written in Python does not need to be complicated. If you can send an HTTP request, you can extract Devanagari text from a document image. No ML libraries to install, no model weights to download, no GPU required.

We will build a complete working script that reads a Hindi document image, sends it to BharatOCR's OCR API, and prints the extracted text. The whole thing fits in under 20 lines of code.

What You Will Need

Before we start, make sure you have:

  • Python 3.8+ installed
  • The requests library (pip install requests)
  • A BharatOCR API key (sign up at bharatocr.com — you get 3 free pages)
  • A Hindi document image (JPEG, PNG, PDF, TIFF, or BMP)

That is it. No TensorFlow, no PyTorch, no OpenCV. The OCR processing happens on BharatOCR's servers, so your local machine just needs to send the image and receive the results.

The Complete Hindi Document Scanner Script

Here is the full working script:

import base64
import requests

API_URL = "https://api.bharatocr.com/api/v1/ocr"
API_KEY = "boc_your_api_key_here"

# Read and encode the document image
with open("hindi_document.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

# Send to BharatOCR API
response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"image": image_data, "language": "hi"},
    timeout=30,
)
response.raise_for_status()

# Extract and print the text
result = response.json()
for block in result["data"]["text_blocks"]:
    print(f"[{block['confidence']:.2f}] {block['text']}")

That is 16 lines of code, not counting comments and blank lines. Let us break down what each part does.

Step-by-Step Breakdown

Reading the Image File

with open("hindi_document.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

We read the image file in binary mode and encode it as a base64 string. Base64 encoding converts binary data into ASCII text, which is safe to include in a JSON request body. The .decode("utf-8") converts the bytes object to a regular string.
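
To make the encoding step concrete, here is a small sketch that uses only the standard library. The sample bytes are a stand-in for a real image file and are purely illustrative. It shows that the encoded string is plain ASCII, that decoding restores the original bytes, and that the payload grows by roughly a third.

```python
import base64

# Stand-in for image bytes; a real script would read these from a file.
raw = bytes(range(256)) * 12  # 3072 bytes of arbitrary binary data

encoded = base64.b64encode(raw).decode("utf-8")

# The encoded payload is pure ASCII, so it is safe inside a JSON string.
is_ascii = encoded.isascii()

# Decoding restores the original bytes exactly.
round_trip_ok = base64.b64decode(encoded) == raw

# Base64 emits 4 characters for every 3 input bytes.
overhead = len(encoded) / len(raw)
print(f"{len(raw)} bytes -> {len(encoded)} characters ({overhead:.2f}x)")
```

The size increase is worth keeping in mind for large scans, since the request body will be about a third bigger than the file on disk.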

This works with any image format BharatOCR supports: JPEG, PNG, TIFF, BMP, or even PDF files.

Sending the API Request

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"image": image_data, "language": "hi"},
    timeout=30,
)
response.raise_for_status()

We send a POST request to /api/v1/ocr with the API key in the Authorization header. The request body contains the base64-encoded image and the language code ("hi" for Hindi).

The timeout=30 argument stops the script from hanging indefinitely on a network problem. raise_for_status() raises requests.exceptions.HTTPError when the API returns a 4xx or 5xx status code, so a failed request stops the script instead of passing an error body on to the parsing step.
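
The script hard-codes the API key, which is fine for a quick test but easy to leak into version control. One possible alternative is to read the key from an environment variable. The sketch below assumes a variable named BHARATOCR_API_KEY; that name and both helper functions are hypothetical and not part of the BharatOCR API.

```python
import os


def load_api_key(env_var: str = "BHARATOCR_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing.

    The default variable name is an assumption made for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


def auth_headers(api_key: str) -> dict:
    """Build the Authorization header in the Bearer format shown above."""
    return {"Authorization": f"Bearer {api_key}"}
```

With this in place, the request could pass headers=auth_headers(load_api_key()) instead of building the header from a constant.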

Parsing the Response

result = response.json()
for block in result["data"]["text_blocks"]:
    print(f"[{block['confidence']:.2f}] {block['text']}")

The API returns JSON with the extracted text organized into blocks. Each block has the recognized text, a confidence score (0.0 to 1.0), and the bounding box coordinates on the image. We print each block with its confidence score.
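
Because every block carries a confidence score, one natural use is to separate text you can trust from text that deserves a second look. The sketch below works on the "text" and "confidence" fields described above. The sample blocks are invented for illustration and are not real API output, and the 0.80 cutoff is an arbitrary assumption you would tune for your own documents.

```python
def split_by_confidence(blocks, threshold=0.80):
    """Split blocks into (confident, uncertain) lists by confidence score."""
    confident = [b for b in blocks if b["confidence"] >= threshold]
    uncertain = [b for b in blocks if b["confidence"] < threshold]
    return confident, uncertain


# Illustrative data only; a real run would use result["data"]["text_blocks"].
sample_blocks = [
    {"text": "भारत सरकार", "confidence": 0.97},
    {"text": "आय प्रमाण पत्र", "confidence": 0.91},
    {"text": "क्र. सं. ४२", "confidence": 0.62},
]

confident, uncertain = split_by_confidence(sample_blocks)
print("\n".join(b["text"] for b in confident))
print(f"{len(uncertain)} block(s) flagged for manual review")
```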

Adding Error Handling

The basic script works, but a production-ready scanner needs proper error handling:

from __future__ import annotations  # lets the list[dict] hint work on Python 3.8

import base64
import sys
import requests

API_URL = "https://api.bharatocr.com/api/v1/ocr"
API_KEY = "boc_your_api_key_here"


def extract_text(image_path: str) -> list[dict]:
    """Extract Hindi text from a document image."""
    try:
        with open(image_path, "rb") as f:
            image_data = base64.b64encode(f.read()).decode("utf-8")
    except FileNotFoundError:
        print(f"Error: File not found — {image_path}")
        sys.exit(1)

    try:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"image": image_data, "language": "hi"},
            timeout=30,
        )
        response.raise_for_status()
    except requests.exceptions.Timeout:
        print("Error: API request timed out. Try again.")
        sys.exit(1)
    except requests.exceptions.HTTPError as e:
        print(f"Error: API returned {e.response.status_code}")
        if e.response.status_code == 401:
            print("Check your API key.")
        elif e.response.status_code == 429:
            print("Rate limit exceeded. Wait and retry.")
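        sys.exit(1)
    except requests.exceptions.RequestException as e:
        # Connection failures, DNS errors, and anything else requests raises
        print(f"Error: request failed: {e}")
        sys.exit(1)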

    result = response.json()
    return result["data"]["text_blocks"]


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python scanner.py <image_path>")
        sys.exit(1)

    blocks = extract_text(sys.argv[1])
    full_text = "\n".join(b["text"] for b in blocks)
    print(full_text)

Now you can run it from the command line:

python scanner.py hindi_document.jpg

This version handles missing files, timeouts, authentication errors, and rate limits gracefully.
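
The rate-limit message suggests waiting and retrying, and that step can be automated. Below is a minimal sketch of retrying with exponential backoff. The helper name, its parameters, and the delay values are assumptions made for illustration; nothing here comes from the BharatOCR documentation.

```python
import time


def retry_with_backoff(func, max_attempts=4, base_delay=1.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is reached.

    Waits base_delay, then 2x, then 4x (and so on) between attempts.
    The sleep parameter exists so tests can avoid real waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
```

Because extract_text exits the process on failure, it would need to raise an exception instead of calling sys.exit(1) before a wrapper like this could retry it. In practice you would also narrow retry_on to transient errors such as timeouts, since retrying a 401 will never succeed.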

Tips for Better OCR Accuracy

The quality of your input image directly affects OCR accuracy. Here are some practical tips for preparing images before you send them.

Resolution Matters

Aim for 300 DPI when scanning documents. Phone cameras usually produce sufficient resolution, but if you are processing old or faded documents, higher resolution helps. Anything below 150 DPI will noticeably hurt accuracy.
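
Photos do not carry a meaningful DPI value, but you can estimate one from the pixel dimensions and the physical page size. The sketch below assumes an A4 page that fills the entire frame, so treat the result as an upper bound. The function names are hypothetical, and the cutoffs simply mirror the 300 and 150 DPI figures above.

```python
A4_INCHES = (8.27, 11.69)  # width, height of an A4 page in inches


def effective_dpi(width_px, height_px, page_inches=A4_INCHES):
    """Estimate DPI, assuming the page fills the whole image.

    Orientation is ignored: the shorter pixel side is matched to the
    shorter page side, and the lower of the two ratios is returned.
    """
    short_px, long_px = sorted((width_px, height_px))
    short_in, long_in = sorted(page_inches)
    return min(short_px / short_in, long_px / long_in)


def dpi_verdict(dpi):
    """Classify an estimate against the 300 and 150 DPI guidelines."""
    if dpi >= 300:
        return "meets the 300 DPI target"
    if dpi >= 150:
        return "usable, but below the 300 DPI target"
    return "below 150 DPI; expect reduced accuracy"


for size in [(3000, 4000), (1600, 2200), (1000, 1400)]:
    dpi = effective_dpi(*size)
    print(f"{size[0]}x{size[1]}: about {dpi:.0f} DPI, {dpi_verdict(dpi)}")
```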

Straighten the Image

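Skewed documents reduce accuracy. If you are capturing documents with a phone camera, keep the camera perpendicular to the page. For programmatic deskewing, OpenCV (an optional extra dependency: pip install opencv-python numpy) can estimate the skew angle with minAreaRect and correct it with getRotationMatrix2D and warpAffine: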

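import cv2
import numpy as np

def deskew(image_path: str) -> np.ndarray:
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    # Invert and binarize so dark text becomes the non-zero foreground;
    # on a white page, img > 0 would select almost every pixel.
    _, text_mask = cv2.threshold(img, 0, 255,
                                 cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    coords = np.column_stack(np.where(text_mask > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    # This correction assumes minAreaRect reports angles in [-90, 0), as
    # older OpenCV releases do. Newer releases use a different range, so
    # check the rotation direction on a test image before relying on it.
    if angle < -45:
        angle = -(90 + angle)
    else:
        angle = -angle
    h, w = img.shape
    center = (w // 2, h // 2)
    matrix = cv2.getRotationMatrix2D(center, angle, 1.0)
    return cv2.warpAffine(img, matrix, (w, h),
                          flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)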

Improve Contrast

Faded or low-contrast documents benefit from simple contrast enhancement before sending to OCR. A basic threshold or adaptive threshold in OpenCV can make a significant difference:

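import cv2

img = cv2.imread("faded_document.jpg", cv2.IMREAD_GRAYSCALE)
# Compare each pixel with the Gaussian-weighted mean of its 11x11
# neighborhood, minus a constant of 2
enhanced = cv2.adaptiveThreshold(
    img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
    cv2.THRESH_BINARY, 11, 2
)
cv2.imwrite("enhanced.jpg", enhanced)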

Remove Noise

Stamps, stains, and background patterns confuse OCR engines. A light Gaussian blur followed by thresholding can clean up noisy documents.

Processing Multiple Documents

For batch processing, loop through a directory of images:

from pathlib import Path

input_dir = Path("documents")
for image_file in input_dir.glob("*.jpg"):
    blocks = extract_text(str(image_file))
    text = "\n".join(b["text"] for b in blocks)

    output_file = image_file.with_suffix(".txt")
    output_file.write_text(text, encoding="utf-8")
    print(f"Processed: {image_file.name}")

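This creates a .txt file alongside each image with the extracted Hindi text. Because extract_text calls sys.exit(1) on any error, a single unreadable file or failed request stops the whole batch.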
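
The loop above only matches files ending in .jpg. A possible extension is to collect every supported format in one pass, as sketched below. The extension set is an assumption derived from the formats listed earlier (JPEG, PNG, PDF, TIFF, BMP), and find_documents is a hypothetical helper name.

```python
from pathlib import Path

# Assumed file extensions for the formats the article lists as supported
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf", ".tif", ".tiff", ".bmp"}


def find_documents(directory):
    """Return supported files directly inside directory, sorted by path.

    Extensions are compared case-insensitively, so SCAN.JPG is included.
    Subdirectories are not searched.
    """
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

You could then iterate over find_documents("documents") instead of input_dir.glob("*.jpg"). One caveat: if a folder holds both scan.jpg and scan.png, with_suffix(".txt") maps them to the same output file, so the second result would overwrite the first.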

Checking Your API Usage

Keep track of how many pages you have processed with the usage endpoint:

response = requests.get(
    "https://api.bharatocr.com/api/v1/usage",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,  # same safeguard as the OCR request
)
response.raise_for_status()
usage = response.json()
print(f"Pages used: {usage['data']['pages_used']}")

How BharatOCR Helps

Building a Hindi document scanner in Python is straightforward with BharatOCR because the hard part — accurate Devanagari text recognition — is handled by our API.

BharatOCR uses PaddleOCR PP-OCRv5, optimized for Hindi and Devanagari script. You get 95%+ accuracy on printed Hindi text with sub-2-second processing per page. The API accepts JPEG, PNG, PDF, TIFF, and BMP, so you do not need to worry about format conversion.

For developers, the integration is minimal: one HTTP POST, one JSON response. No SDKs to install, no dependencies to manage, no model updates to track. Your code stays simple while the OCR engine handles the complexity.

Start with 3 free pages to test your integration. Pay-as-you-go is Rs 5 per page, or choose a monthly plan from Rs 999 to Rs 9,999 for production workloads. BharatOCR is built by Meridian Intelligence Pvt. Ltd.
