Azure AI Tutorial 7 - OCR and Form Recognizer
Optical character recognition (OCR) is one of the classic challenges of computer vision. It is now widely used in many fields, for example capturing document notes from images or video; digitizing forms and invoices; scanning printed documents; and recognizing handwriting.
Azure Cognitive Services offers OCR through two APIs: the OCR API (intended mainly for small amounts of text that appear in images) and the Read API (optimized for documents).
In this tutorial we will:
- Create a Cognitive Services resource
- Create a notebook and install the Cognitive Services library
- Use the OCR API on an image
- Use the Read API
- Use Form Recognizer
Let's start.
1. Create a Cognitive Services resource
- See Azure AI Tutorial 6.
2. Create a notebook and install the library
- Install the Cognitive Services Computer Vision library
! pip install azure-cognitiveservices-vision-computervision
- Enter your key and endpoint
# Replace the placeholder with the key of your own Cognitive Services resource
cog_key = 'YOUR_COGNITIVE_SERVICES_KEY'
cog_endpoint = 'https://mvcognitiveservice.cognitiveservices.azure.com/'
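Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative follows, assuming the key and endpoint are stored in environment variables named COG_KEY and COG_ENDPOINT. Both variable names and the helper load_cognitive_config are hypothetical and not part of the original tutorial.

```python
import os


def load_cognitive_config(env=None):
    """Return (key, endpoint) read from an environment-style mapping.

    COG_KEY and COG_ENDPOINT are hypothetical names chosen for this sketch;
    use whatever names your own environment defines.
    """
    env = os.environ if env is None else env
    key = env.get("COG_KEY")
    endpoint = env.get("COG_ENDPOINT")
    if not key or not endpoint:
        raise ValueError("Set COG_KEY and COG_ENDPOINT before running the notebook")
    # Normalize the endpoint so it always ends with exactly one slash
    return key, endpoint.rstrip("/") + "/"
```

In the notebook this could be used as `cog_key, cog_endpoint = load_cognitive_config()`, so the key never appears in a saved cell.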
3. Use the OCR API on an image
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials
import matplotlib.pyplot as plt
from PIL import Image, ImageDraw
import os
%matplotlib inline
# Get a client for the computer vision service
computervision_client = ComputerVisionClient(cog_endpoint, CognitiveServicesCredentials(cog_key))
# Read the image file
image_path = os.path.join('data', 'ocr', 'anh01.jpg')
image_stream = open(image_path, "rb")
# Use the Computer Vision service to find text in the image
read_results = computervision_client.recognize_printed_text_in_stream(image_stream)
# Process the text line by line
for region in read_results.regions:
    for line in region.lines:
        # Read the words in the line of text
        line_text = ''
        for word in line.words:
            line_text += word.text + ' '
        print(line_text.rstrip())
# Open image to display it.
fig = plt.figure(figsize=(7, 7))
img = Image.open(image_path)
draw = ImageDraw.Draw(img)
plt.axis('off')
plt.imshow(img)
Input:
Output:
Microsoft Certified Azure Al Engineer Associate CAO MINH VINH Has successfully completed the requirements to be recognized as a Microsoft Certified: Azure Al Engineer Associate. Date Of achievement: November 02, 2020 Valid until: November 02, 2022 Microsoft Microsoft CERTIFIED ASSOCIATE Satya Nadella Chief Executive Officer Certification number. H559-6562
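The nested loop over regions, lines, and words can be wrapped in a small helper so the recognized text is available as a list rather than only printed. This is a sketch, not part of the original notebook: ocr_result_to_lines is a hypothetical helper, and fake_result is a stand-in object that only mimics the regions/lines/words shape used by the code above.

```python
from types import SimpleNamespace


def ocr_result_to_lines(ocr_result):
    """Flatten regions -> lines -> words into a list of text lines."""
    lines = []
    for region in ocr_result.regions:
        for line in region.lines:
            # Join the words of one line with single spaces
            lines.append(" ".join(word.text for word in line.words))
    return lines


# Stand-in object with the same attribute layout as the OCR response
fake_result = SimpleNamespace(regions=[
    SimpleNamespace(lines=[
        SimpleNamespace(words=[SimpleNamespace(text="Microsoft"),
                               SimpleNamespace(text="Certified")]),
        SimpleNamespace(words=[SimpleNamespace(text="Azure")]),
    ])
])
print(ocr_result_to_lines(fake_result))  # ['Microsoft Certified', 'Azure']
```

With the real response, `ocr_result_to_lines(read_results)` should return the same lines that the loop above prints.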
- Draw a bounding box around each line of text with the following code
fig = plt.figure(figsize=(7, 7))
img = Image.open(image_path)
draw = ImageDraw.Draw(img)
# Process the text line by line
for region in read_results.regions:
    for line in region.lines:
        # Show the position of the line of text
        # (bounding_box is a "left,top,width,height" string)
        l,t,w,h = list(map(int, line.bounding_box.split(',')))
        draw.rectangle(((l,t), (l+w, t+h)), outline='magenta', width=5)
        # Read the words in the line of text
        line_text = ''
        for word in line.words:
            line_text += word.text + ' '
        print(line_text.rstrip())
# Show the image with the text locations highlighted
plt.axis('off')
plt.imshow(img)
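The bounding box arithmetic above converts a "left,top,width,height" string into the two corner points that ImageDraw's rectangle method expects. A minimal sketch of that conversion in isolation (bbox_to_corners is a hypothetical helper name, not an SDK function):

```python
def bbox_to_corners(bounding_box):
    """Convert 'left,top,width,height' into ((x0, y0), (x1, y1)) corners."""
    left, top, width, height = (int(v) for v in bounding_box.split(","))
    # The bottom-right corner is the top-left corner shifted by width and height
    return (left, top), (left + width, top + height)


print(bbox_to_corners("10,20,30,40"))  # ((10, 20), (40, 60))
```

The result can be passed straight to `draw.rectangle(bbox_to_corners(line.bounding_box), outline='magenta', width=5)`.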
4. Use the Read API
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials
import matplotlib.pyplot as plt
from PIL import Image
import time
import os
%matplotlib inline
# Read the image file
image_path = os.path.join('data', 'ocr', 'letter.jpg')
image_stream = open(image_path, "rb")
# Get a client for the computer vision service
computervision_client = ComputerVisionClient(cog_endpoint, CognitiveServicesCredentials(cog_key))
# Submit a request to read printed text in the image and get the operation ID
read_operation = computervision_client.read_in_stream(image_stream, raw=True)
operation_location = read_operation.headers["Operation-Location"]
operation_id = operation_location.split("/")[-1]
# Wait for the asynchronous operation to complete
while True:
    read_results = computervision_client.get_read_result(operation_id)
    # Keep polling while the operation is queued or still running
    if read_results.status not in [OperationStatusCodes.not_started, OperationStatusCodes.running]:
        break
    time.sleep(1)
# If the operation was successful, process the text line by line
if read_results.status == OperationStatusCodes.succeeded:
    for result in read_results.analyze_result.read_results:
        for line in result.lines:
            print(line.text)
# Open image and display it.
print('\n')
fig = plt.figure(figsize=(12,12))
img = Image.open(image_path)
plt.axis('off')
plt.imshow(img)
Input (a letter):
Output:
January 23rd 2020 For the attention of: The manager Northwind Traders 123 Any Street Bellevue, WA Dear Sir or Madam, I am writing to thank you for the fantastic service I received at your store on January 20th. The store assistant who helped me was extremely pleasant and attentive; and took the time to find all of the fresh produce I needed. I've always found the quality of the produce in your store to be high, and the prices to be competitive; and the helpfulness of your employees is another reason I will continue to remain a loyal Northwind Traders customer. Sincerely, A customer A. Customer
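The Read API is asynchronous: the request returns an Operation-Location URL, and the client polls until the status is no longer pending. The loop in the code above waits indefinitely, so a sketch of the same pattern with a timeout may be useful. The helper names operation_id_from_location and poll_until_done are hypothetical, and the default pending status strings "notStarted" and "running" are an assumption about the service's status values; pass your own tuple if they differ.

```python
import time


def operation_id_from_location(operation_location):
    """Return the last path segment of an Operation-Location URL."""
    return operation_location.rstrip("/").split("/")[-1]


def poll_until_done(fetch_status, pending=("notStarted", "running"),
                    interval=1.0, timeout=30.0, sleep=time.sleep):
    """Call fetch_status() until it returns a non-pending status.

    Raises TimeoutError once roughly `timeout` seconds have been spent
    sleeping between polls.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status not in pending:
            return status
        if waited >= timeout:
            raise TimeoutError("operation still pending after %.0f seconds" % timeout)
        sleep(interval)
        waited += interval
```

With the client from this section, a call might look like `poll_until_done(lambda: computervision_client.get_read_result(operation_id).status, pending=(OperationStatusCodes.not_started, OperationStatusCodes.running))`.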
5. Form Recognizer
- Reuse the Cognitive Services key and endpoint
- Install the Form Recognizer library
! pip install azure-ai-formrecognizer
- Use Form Recognizer to analyze the receipt below
import os
from azure.ai.formrecognizer import FormRecognizerClient
from azure.core.credentials import AzureKeyCredential
# Reuse the Cognitive Services key and endpoint from step 2
form_endpoint = cog_endpoint
form_key = cog_key
# Create a client for the form recognizer service
form_recognizer_client = FormRecognizerClient(endpoint=form_endpoint, credential=AzureKeyCredential(form_key))
try:
    print("Analyzing receipt...")
    # Get the receipt image file
    image_path = os.path.join('data', 'form-receipt', 'receipt.jpg')
    # Submit the file data to form recognizer
    with open(image_path, "rb") as f:
        analyze_receipt = form_recognizer_client.begin_recognize_receipts(receipt=f)
    # Get the results
    receipt_data = analyze_receipt.result()
    # Print the extracted data for the first (and only) receipt
    receipt = receipt_data[0]
    receipt_type = receipt.fields.get("ReceiptType")
    if receipt_type:
        print("Receipt Type: {}".format(receipt_type.value))
    merchant_address = receipt.fields.get("MerchantAddress")
    if merchant_address:
        print("Merchant Address: {}".format(merchant_address.value))
    merchant_phone = receipt.fields.get("MerchantPhoneNumber")
    if merchant_phone:
        print("Merchant Phone: {}".format(merchant_phone.value))
    transaction_date = receipt.fields.get("TransactionDate")
    if transaction_date:
        print("Transaction Date: {}".format(transaction_date.value))
    print("Receipt items:")
    items = receipt.fields.get("Items")
    if items:
        for idx, item in enumerate(items.value):
            print("\tItem #{}".format(idx+1))
            item_name = item.value.get("Name")
            if item_name:
                print("\t - Name: {}".format(item_name.value))
            item_total_price = item.value.get("TotalPrice")
            if item_total_price:
                print("\t - Price: {}".format(item_total_price.value))
    subtotal = receipt.fields.get("Subtotal")
    if subtotal:
        print("Subtotal: {} ".format(subtotal.value))
    tax = receipt.fields.get("Tax")
    if tax:
        print("Tax: {}".format(tax.value))
    total = receipt.fields.get("Total")
    if total:
        print("Total: {}".format(total.value))
except Exception as ex:
    print('Error:', ex)
Input:
Output:
Analyzing receipt...
Receipt Type: Itemized
Merchant Address: 123 Main Street
Merchant Phone: +15551234567
Transaction Date: 2020-02-17
Receipt items:
	Item #1
	 - Name: Apple
	 - Price: 0.9
	Item #2
	 - Name: Orange
	 - Price: 0.8
Subtotal: 1.7
Tax: 0.17
Total: 1.87
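The receipt code repeats the same "get the field, check it, read .value" pattern for every field. A sketch of how that pattern could be factored out, together with a simple sanity check that subtotal plus tax matches the total, is shown below. Both field_value and totals_consistent are hypothetical helpers, and the fields dictionary is a stand-in built from the output above rather than a real Form Recognizer response.

```python
import math
from types import SimpleNamespace


def field_value(fields, name, default=None):
    """Return fields[name].value, or default when the field is missing."""
    field = fields.get(name)
    return field.value if field is not None else default


def totals_consistent(subtotal, tax, total, tolerance=0.005):
    """Check that subtotal + tax equals total within a small tolerance."""
    if None in (subtotal, tax, total):
        return False
    # Compare with an absolute tolerance to absorb floating-point rounding
    return math.isclose(subtotal + tax, total, abs_tol=tolerance)


# Stand-in for receipt.fields, using the values printed in the output above
fields = {
    "Subtotal": SimpleNamespace(value=1.7),
    "Tax": SimpleNamespace(value=0.17),
    "Total": SimpleNamespace(value=1.87),
}
print(totals_consistent(field_value(fields, "Subtotal"),
                        field_value(fields, "Tax"),
                        field_value(fields, "Total")))  # True
```

A check like this can flag receipts where a field was misread before the extracted values are stored anywhere.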