Azure AI Tutorial 7 - OCR and Form Recognizer


Optical character recognition (OCR) is one of the challenges of computer vision. It is now widely used in many fields: transcribing text from images or video, digitizing forms and invoices, scanning printed documents, recognizing handwriting, and more.

Azure Cognitive Services offers OCR through two APIs: the OCR API (mainly for short pieces of text that appear in images) and the Read API (optimized for documents).

In this post we will:

  1. Create a Cognitive Services resource
  2. Create a notebook and install the Cognitive Services library
  3. Use the OCR API on an image
  4. Use the Read API
  5. Form Recognizer

Let's start

1. Create a Cognitive Services resource

2. Create a notebook and install the library

  • Install the Cognitive Services Computer Vision library

! pip install azure-cognitiveservices-vision-computervision

  • Enter the key and endpoint of your resource

cog_key = 'YOUR_COGNITIVE_SERVICES_KEY'  # paste your own key here; never publish a real key
cog_endpoint = 'https://mvcognitiveservice.cognitiveservices.azure.com/'
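Hardcoding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of reading the same two values from environment variables instead; the variable names `COG_KEY` and `COG_ENDPOINT` are assumptions, so use whatever names you actually export in your shell.

```python
import os


def load_cognitive_config(key_var="COG_KEY", endpoint_var="COG_ENDPOINT"):
    """Return (key, endpoint) read from environment variables.

    The default variable names are hypothetical; pass your own if they differ.
    Raises KeyError naming every variable that is missing or empty.
    """
    missing = [name for name in (key_var, endpoint_var) if not os.environ.get(name)]
    if missing:
        raise KeyError("Missing environment variable(s): " + ", ".join(missing))
    key = os.environ[key_var]
    # Normalize the endpoint so it always ends with exactly one slash
    endpoint = os.environ[endpoint_var].rstrip("/") + "/"
    return key, endpoint


# Usage in the notebook (assumes the variables were exported before starting Jupyter):
# cog_key, cog_endpoint = load_cognitive_config()
```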

3. Use the OCR API on an image

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials
import matplotlib.pyplot as plt
from PIL import Image, ImageDraw
import os
%matplotlib inline

# Get a client for the computer vision service
computervision_client = ComputerVisionClient(cog_endpoint, CognitiveServicesCredentials(cog_key))

# Read the image file
image_path = os.path.join('data', 'ocr', 'anh01.jpg')
image_stream = open(image_path, "rb")

# Use the Computer Vision service to find text in the image
read_results = computervision_client.recognize_printed_text_in_stream(image_stream)

# Process the text line by line
for region in read_results.regions:
    for line in region.lines:

        # Read the words in the line of text
        line_text = ''
        for word in line.words:
            line_text += word.text + ' '
        print(line_text.rstrip())

# Open image to display it.
fig = plt.figure(figsize=(7, 7))
img = Image.open(image_path)
draw = ImageDraw.Draw(img)
plt.axis('off')
plt.imshow(img)

Input:

Output:

Microsoft Certified Azure 
Al Engineer Associate 
CAO MINH VINH 
Has successfully completed the requirements to be recognized as a Microsoft Certified: Azure Al Engineer Associate.
Date Of achievement: November 02, 2020 
Valid until: November 02, 2022 
Microsoft
Microsoft
CERTIFIED
ASSOCIATE 
Satya Nadella Chief Executive Officer
Certification number. H559-6562
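The OCR API result is nested as regions → lines → words, and the loop in the code rebuilds each line by joining its words. A sketch that factors this into a function returning a list of strings. It is tested offline with `types.SimpleNamespace` stand-ins shaped like that hierarchy (an assumption for testing); since it only touches `.regions`, `.lines`, `.words` and `.text`, it should accept the real `read_results` object the same way.

```python
from types import SimpleNamespace


def ocr_result_to_lines(ocr_result):
    """Join the words of every line in an OCR API result into plain strings."""
    lines = []
    for region in ocr_result.regions:
        for line in region.lines:
            # ' '.join avoids the trailing space that the += loop had to rstrip()
            lines.append(" ".join(word.text for word in line.words))
    return lines


# Offline example with a fake result that mimics regions -> lines -> words
fake = SimpleNamespace(regions=[
    SimpleNamespace(lines=[
        SimpleNamespace(words=[SimpleNamespace(text="Microsoft"),
                               SimpleNamespace(text="Certified")]),
        SimpleNamespace(words=[SimpleNamespace(text="Azure")]),
    ])
])
print(ocr_result_to_lines(fake))  # ['Microsoft Certified', 'Azure']
```

With the real service: `for text in ocr_result_to_lines(read_results): print(text)`.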

  • Draw a bounding box around each line of text:

fig = plt.figure(figsize=(7, 7))
img = Image.open(image_path)
draw = ImageDraw.Draw(img)

# Process the text line by line
for region in read_results.regions:
    for line in region.lines:

        # Show the position of the line of text
        l,t,w,h = list(map(int, line.bounding_box.split(',')))
        draw.rectangle(((l,t), (l+w, t+h)), outline='magenta', width=5)

        # Read the words in the line of text
        line_text = ''
        for word in line.words:
            line_text += word.text + ' '
        print(line_text.rstrip())

# Show the image with the text locations highlighted
plt.axis('off')
plt.imshow(img)
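In the OCR API response, `bounding_box` is a comma-separated string `"left,top,width,height"`, which is why the code splits it and adds the width and height to get the bottom-right corner. A small sketch of that conversion as a standalone function, easier to test and reuse:

```python
def bbox_string_to_corners(bbox):
    """Convert an OCR API bounding box 'left,top,width,height' into two corners.

    Returns ((left, top), (right, bottom)), the same form passed to
    ImageDraw.rectangle in the code above.
    """
    left, top, width, height = (int(v) for v in bbox.split(","))
    return (left, top), (left + width, top + height)


print(bbox_string_to_corners("10,20,300,40"))  # ((10, 20), (310, 60))
```

In the loop above this would become `draw.rectangle(bbox_string_to_corners(line.bounding_box), outline='magenta', width=5)`.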

4. Use the Read API

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials
import matplotlib.pyplot as plt
from PIL import Image
import time
import os
%matplotlib inline

# Read the image file
image_path = os.path.join('data', 'ocr', 'letter.jpg')
image_stream = open(image_path, "rb")

# Get a client for the computer vision service
computervision_client = ComputerVisionClient(cog_endpoint, CognitiveServicesCredentials(cog_key))

# Submit a request to read printed text in the image and get the operation ID
read_operation = computervision_client.read_in_stream(image_stream,
                                                      raw=True)
operation_location = read_operation.headers["Operation-Location"]
operation_id = operation_location.split("/")[-1]

# Wait for the asynchronous operation to complete
while True:
    read_results = computervision_client.get_read_result(operation_id)
    if read_results.status not in [OperationStatusCodes.not_started, OperationStatusCodes.running]:
        break
    time.sleep(1)

# If the operation was successful, process the text line by line
if read_results.status == OperationStatusCodes.succeeded:
    for result in read_results.analyze_result.read_results:
        for line in result.lines:
            print(line.text)

# Open image and display it.
print('\n')
fig = plt.figure(figsize=(12,12))
img = Image.open(image_path)
plt.axis('off')
plt.imshow(img)

The input is a letter:

Output:

January 23rd 2020 
For the attention of: 
The manager
Northwind Traders 
123 Any Street Bellevue, WA
Dear Sir or Madam, 
I am writing to thank you for the fantastic service I received at 
your store on January 20th. The store assistant who helped me was 
extremely pleasant and attentive; and took the time to find all of 
the fresh produce I needed. 
I've always found the quality of the produce in your store to be
high, and the prices to be competitive; and the helpfulness of your employees is another reason I will continue to remain a loyal 
Northwind Traders customer. 
Sincerely, 
A customer 
A. Customer
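The `while True` loop in the Read API code polls until the operation leaves the `notStarted`/`running` states, but it never gives up if the service stalls. A generic sketch of the same polling pattern with a timeout; the helper name and parameters are hypothetical, and the pending-state check is passed in so the helper itself does not depend on the SDK.

```python
import time


def poll_until_done(get_result, is_pending, interval=1.0, timeout=30.0):
    """Call get_result() repeatedly until is_pending(result) is False.

    Sleeps `interval` seconds between calls and raises TimeoutError if the
    operation is still pending after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = get_result()
        if not is_pending(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("Operation still pending after %.1f s" % timeout)
        time.sleep(interval)


# Offline example: a fake operation that is pending twice, then succeeds
statuses = iter(["notStarted", "running", "succeeded"])
final = poll_until_done(lambda: next(statuses),
                        is_pending=lambda s: s in ("notStarted", "running"),
                        interval=0.01)
print(final)  # succeeded

# With the SDK (assumes the client, operation_id and OperationStatusCodes from above):
# read_results = poll_until_done(
#     lambda: computervision_client.get_read_result(operation_id),
#     is_pending=lambda r: r.status in (OperationStatusCodes.not_started,
#                                       OperationStatusCodes.running))
```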

5. Form Recognizer

  • Reuse the Cognitive Services key and endpoint
  • Install the Form Recognizer library

! pip install azure_ai_formrecognizer

  • Use Form Recognizer to analyze this receipt

import os
from azure.ai.formrecognizer import FormRecognizerClient
from azure.core.credentials import AzureKeyCredential

# Create a client for the Form Recognizer service, reusing the Cognitive Services key and endpoint
form_recognizer_client = FormRecognizerClient(endpoint=cog_endpoint, credential=AzureKeyCredential(cog_key))

try:
    print("Analyzing receipt...")
    # Get the receipt image file
    image_path = os.path.join('data', 'form-receipt', 'receipt.jpg')

    # Submit the file data to form recognizer
    with open(image_path, "rb") as f:
        analyze_receipt = form_recognizer_client.begin_recognize_receipts(receipt=f)
    
    # Get the results
    receipt_data = analyze_receipt.result()

    # Print the extracted data for the first (and only) receipt
    receipt = receipt_data[0]
    receipt_type = receipt.fields.get("ReceiptType")
    if receipt_type:
        print("Receipt Type: {}".format(receipt_type.value))
    merchant_address = receipt.fields.get("MerchantAddress")
    if merchant_address:
        print("Merchant Address: {}".format(merchant_address.value))
    merchant_phone = receipt.fields.get("MerchantPhoneNumber")
    if merchant_phone:
        print("Merchant Phone: {}".format(merchant_phone.value))
    transaction_date = receipt.fields.get("TransactionDate")
    if transaction_date:
        print("Transaction Date: {}".format(transaction_date.value))
    print("Receipt items:")
    items = receipt.fields.get("Items")
    if items:
        for idx, item in enumerate(receipt.fields.get("Items").value):
            print("\tItem #{}".format(idx+1))
            item_name = item.value.get("Name")
            if item_name:
                print("\t - Name: {}".format(item_name.value))
            item_total_price = item.value.get("TotalPrice")
            if item_total_price:
                print("\t - Price: {}".format(item_total_price.value))
    subtotal = receipt.fields.get("Subtotal")
    if subtotal:
        print("Subtotal: {} ".format(subtotal.value))
    tax = receipt.fields.get("Tax")
    if tax:
        print("Tax: {}".format(tax.value))
    total = receipt.fields.get("Total")
    if total:
        print("Total: {}".format(total.value))

except Exception as ex:
    print('Error:', ex)

Input:

Output:

Analyzing receipt…
 Receipt Type: Itemized
 Merchant Address: 123 Main Street
 Merchant Phone: +15551234567
 Transaction Date: 2020-02-17
 Receipt items:
     Item #1
      - Name: Apple
      - Price: 0.9
     Item #2
      - Name: Orange
      - Price: 0.8
 Subtotal: 1.7 
 Tax: 0.17
 Total: 1.87
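Every field in the receipt code follows the same pattern: `receipt.fields.get(name)` may return `None`, otherwise the value is in `.value`. A sketch that collects those fields into plain Python data in one pass; the field names are the ones used above, the helper names are hypothetical, and the example uses `SimpleNamespace` stand-ins for the SDK's field objects (an assumption for offline testing).

```python
from types import SimpleNamespace

SUMMARY_FIELDS = ["ReceiptType", "MerchantAddress", "MerchantPhoneNumber",
                  "TransactionDate", "Subtotal", "Tax", "Total"]


def summarize_receipt(fields, names=SUMMARY_FIELDS):
    """Return {name: value} for every requested field present in `fields`."""
    summary = {}
    for name in names:
        field = fields.get(name)
        if field is not None:  # missing fields are skipped, like the if-checks above
            summary[name] = field.value
    return summary


def summarize_items(fields):
    """Return a list of (name, total_price) tuples from the 'Items' field."""
    items = fields.get("Items")
    if items is None:
        return []
    rows = []
    for item in items.value:
        name = item.value.get("Name")
        price = item.value.get("TotalPrice")
        rows.append((name.value if name is not None else None,
                     price.value if price is not None else None))
    return rows


# Offline example with fake fields shaped like receipt.fields
fields = {
    "Tax": SimpleNamespace(value=0.17),
    "Total": SimpleNamespace(value=1.87),
    "Items": SimpleNamespace(value=[
        SimpleNamespace(value={"Name": SimpleNamespace(value="Apple"),
                               "TotalPrice": SimpleNamespace(value=0.9)}),
    ]),
}
print(summarize_receipt(fields))  # {'Tax': 0.17, 'Total': 1.87}
print(summarize_items(fields))    # [('Apple', 0.9)]
```

With the real result this would be called as `summarize_receipt(receipt.fields)` and `summarize_items(receipt.fields)`.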