Intelligently Extract Text & Data from Document with OCR NER

  • Development
  • Mar 10, 2025
Intelligently Extract Text & Data from Document with OCR NER is available at $69.99. It has an average rating of 4.65 based on 374 reviews, 92 lectures, and 3,408 subscribers.

Summary

Title: Intelligently Extract Text & Data from Document with OCR NER

Price: $69.99

Average Rating: 4.65

Number of Lectures: 92

Number of Published Lectures: 89

Number of Curriculum Items: 92

Number of Published Curriculum Objects: 89

Original Price: $129.99

Quality Status: approved

Status: Live

What You Will Learn

  • Develop and Train a Named Entity Recognition Model
  • Not Only Extract Text from the Image but Also Extract Entities from the Business Card
  • Develop a Business Card Scanner like ABBYY from Scratch
  • High-Level Data Preprocessing Techniques for Natural Language Problems
  • Real-Time NER Apps

Who Should Attend

  • Anyone who wants to develop a business card reader app
  • Data scientists, analysts, and Python developers who want to enhance their skills in NLP

    Welcome to the course “Intelligently Extract Text & Data from Document with OCR NER”!

    In this course you will learn how to develop a customized Named Entity Recognizer. The main idea of this course is to extract entities from scanned documents such as invoices, business cards, shipping bills, and bills of lading. For the sake of data privacy, we restrict our examples to business cards, but you can apply the framework explained here to all kinds of financial documents. The curriculum we follow to develop the project is given below.

    To develop this project we will use two main technologies in data science:

    1. Computer Vision

    2. Natural Language Processing

    In the Computer Vision module, we will scan the document, identify the location of the text, and finally extract the text from the image. Then, in the Natural Language Processing module, we will extract the entities from the text, do the necessary text cleaning, and parse the entities from the text.

    Python libraries used in the Computer Vision module:

  • OpenCV

  • NumPy

  • Pytesseract

    Python libraries used in the Natural Language Processing module:

  • spaCy

  • Pandas

  • Regular Expressions

  • String

    As we are combining two major technologies to develop the project, we divide the course into several stages of development to make it easier to follow.

    Stage 1: We will set up the project by doing the necessary installations and requirements.

  • Install Python

  • Install Dependencies

    Stage 2: We will do data preparation. That is, we will extract text from images using Pytesseract and also do the necessary cleaning.

  • Gather Images

  • Overview of Pytesseract

  • Extract Text from All Images

  • Clean and Prepare text
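
The cleaning step in Stage 2 can be sketched with the standard `string` module. This is only an illustration, not the course's own code: the sample tokens are made up, and the choice of which punctuation to keep (`KEEP`) is an assumption based on what emails, URLs and phone numbers usually need.

```python
import string

# Punctuation that carries meaning on a business card (assumption):
# kept so that emails, URLs and phone numbers survive cleaning.
KEEP = "@.-/:+&"
NOISE = "".join(ch for ch in string.punctuation if ch not in KEEP)

# Translation table that deletes noisy punctuation and all whitespace.
_DELETE = str.maketrans("", "", NOISE + string.whitespace)


def clean_token(token: str) -> str:
    """Remove whitespace and noisy punctuation from one OCR token."""
    return token.translate(_DELETE)


def clean_tokens(tokens):
    """Clean every token and drop the ones that end up empty."""
    cleaned = (clean_token(str(t)) for t in tokens)
    return [t for t in cleaned if t]


# Hypothetical OCR output for one card.
raw = [" John,", "Doe!", "", "  ", "john.doe@example.com", "(+91)", "|"]
print(clean_tokens(raw))
```

In a real run the tokens would come from the OCR step rather than a hand-written list.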

    Stage 3: We will see how to label NER data using BIO tagging.

  • Manual Labeling with the BIO Technique

  • B – Beginning

  • I  –  Inside

  • O – Outside
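
To make the B/I/O scheme concrete, here is a small sketch that groups tagged tokens back into entities. The label names (`NAME`, `DES`, `ORG`, `EMAIL`) and the sample card are hypothetical; the course's actual label set is not listed here.

```python
def bio_to_entities(tokens, tags):
    """Group BIO-tagged tokens into (label, text) pairs.

    B-X starts a new entity of type X, I-X continues the current one,
    and O (or an I-X that does not match the open entity) closes it.
    """
    entities = []
    label, words = None, []

    def flush():
        if words:
            entities.append((label, " ".join(words)))

    for token, tag in zip(tokens, tags):
        prefix, _, tag_label = tag.partition("-")
        if prefix == "B":
            flush()
            label, words = tag_label, [token]
        elif prefix == "I" and tag_label == label:
            words.append(token)
        else:
            flush()
            label, words = None, []
    flush()
    return entities


tokens = ["John", "Doe", "CEO", "|", "Acme", "Corp", "john@acme.com"]
tags = ["B-NAME", "I-NAME", "B-DES", "O", "B-ORG", "I-ORG", "B-EMAIL"]
print(bio_to_entities(tokens, tags))
```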

    Stage 4: We will further clean the text and preprocess the data to train the machine learning model.

  • Prepare Training Data for spaCy

  • Convert Data into spaCy Format
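
One common way to express NER training examples for spaCy is a `(text, {"entities": [(start, end, label)]})` tuple with character offsets. The sketch below builds that layout from BIO-tagged tokens. It is an assumption that the course uses this exact layout; spaCy v3 trains from binary `.spacy` files, so tuples like these would normally be converted further. The labels and sample tokens are hypothetical.

```python
def to_spacy_format(tokens, tags):
    """Turn BIO-tagged tokens into (text, {"entities": [...]}).

    Tokens are joined with single spaces, and each entity is reported
    as (start_char, end_char, label) within the joined text.
    """
    text = " ".join(tokens)
    entities = []
    current = None  # [start, end, label] of the entity being built
    offset = 0
    for token, tag in zip(tokens, tags):
        start, end = offset, offset + len(token)
        offset = end + 1  # skip the joining space
        prefix, _, label = tag.partition("-")
        if prefix == "I" and current and current[2] == label:
            current[1] = end  # extend the open entity
            continue
        if current:
            entities.append(tuple(current))
            current = None
        if prefix == "B":
            current = [start, end, label]
    if current:
        entities.append(tuple(current))
    return (text, {"entities": entities})


example = to_spacy_format(
    ["John", "Doe", "CEO", "john@acme.com"],
    ["B-NAME", "I-NAME", "B-DES", "B-EMAIL"],
)
print(example)
```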

    Stage 5: With the preprocessed data, we will train the Named Entity Recognition model.

  • Configuring NER Model

  • Train the model

    Stage 6: We will predict the entities using the NER model and create a data pipeline for parsing text.

  • Load Model

  • Render and Serve with displaCy

  • Draw Bounding Box on Image

  • Parse Entities from Text
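
Drawing one box per entity requires combining the word-level boxes of all tokens that belong to it. The sketch below assumes each row carries `left`, `top`, `width` and `height` (the column names Pytesseract's `image_to_data` output uses) plus a predicted BIO `tag`; the row layout as a list of dicts, the labels and the numbers are hypothetical.

```python
def merge_entity_boxes(rows):
    """Merge word-level boxes into one box per predicted entity.

    Each row is a dict with keys: text, tag, left, top, width, height.
    Returns dicts with label, text and box = (x1, y1, x2, y2).
    """
    merged = []
    current = None
    for row in rows:
        prefix, _, label = row["tag"].partition("-")
        x1, y1 = row["left"], row["top"]
        x2, y2 = x1 + row["width"], y1 + row["height"]
        if prefix == "I" and current and current["label"] == label:
            bx1, by1, bx2, by2 = current["box"]
            current["text"] += " " + row["text"]
            # Union of the open entity's box and this word's box.
            current["box"] = (min(bx1, x1), min(by1, y1), max(bx2, x2), max(by2, y2))
            continue
        if current:
            merged.append(current)
            current = None
        if prefix == "B":
            current = {"label": label, "text": row["text"], "box": (x1, y1, x2, y2)}
    if current:
        merged.append(current)
    return merged


rows = [
    {"text": "John", "tag": "B-NAME", "left": 10, "top": 20, "width": 40, "height": 12},
    {"text": "Doe", "tag": "I-NAME", "left": 55, "top": 21, "width": 35, "height": 12},
    {"text": "|", "tag": "O", "left": 95, "top": 20, "width": 3, "height": 12},
    {"text": "CEO", "tag": "B-DES", "left": 10, "top": 40, "width": 30, "height": 10},
]
for entity in merge_entity_boxes(rows):
    print(entity)
```

Each merged `box` can then be passed to a drawing call such as OpenCV's `cv2.rectangle` as its two corner points.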

    Finally, we will put it all together and create a document scanner app.

    Are you ready?

    Let's start developing the Artificial Intelligence project.

Course Curriculum

    Chapter 1: Introduction

    Lecture 1: Introduction

    Lecture 2: Project Plan

    Lecture 3: Project Document

    Lecture 4: Download the Resources

    Lecture 5: Facing any Issue with the Course ? Here is the solution

    Chapter 2: Project Setup

    Lecture 1: Install Python

    Lecture 2: Install Virtual Environment

    Lecture 3: Install Packages into Virtual Environment

    Lecture 4: Install Tesseract OCR & Pytesseract

    Lecture 5: Install spaCy

    Lecture 6: Test That the Packages Are Installed

    Chapter 3: Data Preparation

    Lecture 1: Load Business Card using OpenCV & PIL

    Lecture 2: Pytesseract: Extract text from Image

    Lecture 3: Pytesseract: Tesseract Error

    Lecture 4: Pytesseract: How Does Pytesseract Work?

    Lecture 5: Pytesseract: Image to text to dataframe

    Lecture 6: Pytesseract: Clean Text in Dataframe

    Lecture 7: Pytesseract: Draw Bounding Box around each word

    Lecture 8: Extract Text and Data from all Business Card

    Lecture 9: Save data in csv

    Lecture 10: Labeling

    Chapter 4: Data Preprocessing and Cleaning

    Lecture 1: Spacy Training Data Format

    Lecture 2: Load Data and convert into Pandas DataFrame

    Lecture 3: Updated Code.

    Lecture 4: Cleaning Text

    Lecture 5: Convert Data into spacy format

    Lecture 6: Testing Entities

    Lecture 7: Convert data into spacy format for all Business card text

    Lecture 8: Splitting Data into Training and Testing Set

    Chapter 5: Train Named Entity Recognition (NER) model

    Lecture 1: Spacy: Fill the Configuration

    Lecture 2: Spacy: Prepare Data

    Lecture 3: Spacy: Train NER pipeline model

    Lecture 4: Spacy: Save NER Model

    Chapter 6: Predictions

    Lecture 1: Import Required Libraries

    Lecture 2: Clean Text Function

    Lecture 3: Load Spacy NER Model

    Lecture 4: Extract Text from Image and Convert into Data Frame

    Lecture 5: Convert Data Frame into Content

    Lecture 6: Get Named Entities from model

    Lecture 7: Displacy render

    Lecture 8: Tagging Each Word

    Lecture 9: Join Label to tokens dataframe

    Lecture 10: Join token dataframe with Pytesseract data

    Lecture 11: Bounding Box and Tagging Predicted Entities

    Lecture 12: Combine the BIO information

    Lecture 13: Bounding Box

    Lecture 14: Parsing Function

    Lecture 15: Testing

    Lecture 16: Parse Entities

    Lecture 17: Predictions Function

    Lecture 18: Final Prediction Pipeline

    Chapter 7: Improve Model Performance

    Lecture 1: Ideas to Improve model accuracy

    Lecture 2: Version-2 model framework: Data Preprocessing

    Lecture 3: Train Version 2 model

    Lecture 4: Get Predictions from the model

    Chapter 8: Document Scanner

    Lecture 1: Download the Resources

    Lecture 2: What and Why Document Scanner in OpenCV ?

    Lecture 3: Setup and Read Image

    Lecture 4: Resize Image with same aspect ratio

    Lecture 5: Edge Detection (Enhance, Blur and Canny) to Document

    Lecture 6: Dilate Edges with morphological transform

    Lecture 7: Find Four Point Contours (Identify Location of document)

    Lecture 8: Apply Warp transform and crop only document

    Lecture 9: Document Scanner Function: Putting All together

    Lecture 10: Magic Color to Image

    Lecture 11: Integrate NER Predictions
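
Two small pieces of geometry sit behind the document scanner lectures above: resizing while keeping the aspect ratio, and putting the four detected corners into a fixed order before a perspective transform. The sketch below shows both in plain Python. It is not the course's code; the sum/difference ordering trick is a common approach and is assumed here, and the sample numbers are made up.

```python
def resized_dims(width, height, target_width):
    """Return (new_width, new_height) that keeps the aspect ratio."""
    scale = target_width / width
    return target_width, round(height * scale)


def order_corners(points):
    """Order four (x, y) corners as top-left, top-right, bottom-right, bottom-left.

    With the image origin at the top-left:
      x + y is smallest at the top-left and largest at the bottom-right;
      y - x is smallest at the top-right and largest at the bottom-left.
    """
    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    top_right = min(points, key=lambda p: p[1] - p[0])
    bottom_left = max(points, key=lambda p: p[1] - p[0])
    return [top_left, top_right, bottom_right, bottom_left]


print(resized_dims(1200, 800, 640))
print(order_corners([(200, 10), (10, 12), (205, 300), (8, 295)]))
```

The ordered corners are what a perspective transform (for example OpenCV's `cv2.getPerspectiveTransform`) expects as its source points, paired with the corners of the output rectangle in the same order.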

    Chapter 9: Document Scanner Web App

    Lecture 1: What will you Develop ?

    Lecture 2: Download Web App

    Lecture 3: Setting Up Web App Project

    Lecture 4: Install VS Code

    Lecture 5: Install Flask

    Lecture 6: First Flask App

    Lecture 7: Run HTML file with Flask server

    Lecture 8: Our Web App design steps

    Lecture 9: Step-1: Design Page: Create Navigation Bar in HTML

    Lecture 10: Step-1: Create About Page

    Lecture 11: Step-2: Create HTML form to Upload Image or File in HTML

    Lecture 12: Step-3: How to Predict document coordinates with Python in Flask

    Lecture 13: Step-2: Upload and save image Backend : create settings.py

    Lecture 14: Step-2: Upload and save image Backend: save image from HTML form

    Lecture 15: Step-3: Document Scanning

    Lecture 16: Adjust coordinates of document using JavaScript

    Lecture 17: Warp and Crop the document and save the image

    Lecture 18: Get Predictions

    Lecture 19: Design Predictions page

    Lecture 20: Display results in table

    Lecture 21: Final

    Chapter 10: Appendix

    Lecture 1: Limitations of Pytesseract

    Chapter 11: BONUS

    Lecture 1: Bonus Lecture: Next Steps

Instructors

  • G Sudheer
    Instructor
  • datascience Anywhere
    Team of Engineers
  • Brightshine Learn
    Instructor Team

Rating Distribution

  • 1 star: 2 votes
  • 2 stars: 5 votes
  • 3 stars: 32 votes
  • 4 stars: 115 votes
  • 5 stars: 220 votes

Frequently Asked Questions

    How long do I have access to the course materials?

    You can view and review the lecture materials indefinitely, like an on-demand channel.

    Can I take my courses with me wherever I go?

    Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don’t have an internet connection, some instructors also let their students download course lectures. That’s up to the instructor though, so make sure you get on their good side!