How to extract tables from an image?

asked 2015-06-10 15:58:06 -0600

I luv OpenCV

updated 2020-11-30 03:23:50 -0600

I've converted some PDF pages into images that contain tables. I want to crop those tables from the images and save them as separate images. I'm new to OpenCV, and any guidance would be helpful. I'd like to know which algorithms I should use and how to apply them. If there are any tutorials, please post the links. I'm using OpenCV 3.0.0 and Visual Studio 2013.

In this sample image, I want to extract all the tables. Thank you.

Comments

In most cases a table consists of many vertical and horizontal lines, so you could use Sobel filters or a Hough transform to find the areas in which the tables are located.

FooBar ( 2015-06-11 02:50:46 -0600 )

Would it be possible for you to provide a sample of an actual image in which you want to detect the tables?

theodore ( 2015-06-13 20:06:02 -0600 )

Okay, I've uploaded a real image. I want to extract those 2 tables (or, if there are more than 2 tables, all of them).

I luv OpenCV ( 2015-06-14 02:21:00 -0600 )

Can anyone help me? How do I read the data in the tables?

vanitha410 ( 2019-12-25 04:25:08 -0600 )

2 answers

answered 2015-06-14 07:12:53 -0600

theodore

As the others proposed, finding the horizontal and vertical lines seems to be a good way to go. Below you can find such a solution. If you have any questions feel free to ask, though I have added comments throughout the code, so it should not be hard to follow.

#include <iostream>
#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;

int main()
{
    // Load source image
    string filename = "table.jpg";
    Mat src = imread(filename);

    // Check that the image was loaded; stop here if it was not
    if (src.empty())
    {
        cerr << "Problem loading image!!!" << endl;
        return -1;
    }

//    // Show source image
//    imshow("src", src);

    // resizing for practical reasons
    Mat rsz;
    Size size(800, 900);
    resize(src, rsz, size);

    imshow("rsz", rsz);

    // Transform source image to gray if it is not
    Mat gray;

    if (rsz.channels() == 3)
    {
        cvtColor(rsz, gray, COLOR_BGR2GRAY);
    }
    else
    {
        gray = rsz;
    }

    // Show gray image
    imshow("gray", gray);

    // Apply adaptiveThreshold at the bitwise_not of gray, notice the ~ symbol
    Mat bw;
    adaptiveThreshold(~gray, bw, 255, ADAPTIVE_THRESH_MEAN_C, THRESH_BINARY, 15, -2);

    // Show binary image
    imshow("binary", bw);

    // Create the images that will be used to extract the horizontal and vertical lines
    Mat horizontal = bw.clone();
    Mat vertical = bw.clone();

    int scale = 15; // play with this variable in order to increase/decrease the amount of lines to be detected

    // Specify size on horizontal axis
    int horizontalsize = horizontal.cols / scale;

    // Create structure element for extracting horizontal lines through morphology operations
    Mat horizontalStructure = getStructuringElement(MORPH_RECT, Size(horizontalsize,1));

    // Apply morphology operations
    erode(horizontal, horizontal, horizontalStructure, Point(-1, -1));
    dilate(horizontal, horizontal, horizontalStructure, Point(-1, -1));
//    dilate(horizontal, horizontal, horizontalStructure, Point(-1, -1)); // expand horizontal lines

    // Show extracted horizontal lines
    imshow("horizontal", horizontal);

    // Specify size on vertical axis
    int verticalsize = vertical.rows / scale;

    // Create structure element for extracting vertical lines through morphology operations
    Mat verticalStructure = getStructuringElement(MORPH_RECT, Size( 1,verticalsize));

    // Apply morphology operations
    erode(vertical, vertical, verticalStructure, Point(-1, -1));
    dilate(vertical, vertical, verticalStructure, Point(-1, -1));
//    dilate(vertical, vertical, verticalStructure, Point(-1, -1)); // expand vertical lines

    // Show extracted vertical lines
    imshow("vertical", vertical);

    // create a mask which includes the tables
    Mat mask = horizontal + vertical;
    imshow("mask", mask);

    // Find the joints between the lines of the tables. We will use this information to discriminate tables from pictures: a table will contain more than 4 joints, while a picture will contain only 4 (i.e. at the corners).
    Mat joints;
    bitwise_and(horizontal, vertical, joints);
    imshow("joints", joints);

    // Find external contours from the mask, which most probably will belong to tables or to images
    vector<Vec4i> hierarchy;
    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(mask, contours, hierarchy, RETR_EXTERNAL, CHAIN_APPROX_SIMPLE, Point(0, 0));

    vector<vector<Point> > contours_poly( contours.size() );
    vector<Rect> boundRect( contours.size() );
    vector<Mat> rois;

    for (size_t i = 0; i < contours.size(); i++)
    {
        // find the area of each contour
        double area = contourArea(contours[i]);

        // Filter out individual lines or blobs that do not represent a table
        if (area < 100) // arbitrary threshold; tune it by trial and error
            continue;

        approxPolyDP( Mat(contours[i]), contours_poly ...
(more)
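
The erode-then-dilate step with a long, 1-pixel-high structuring element can be pictured without OpenCV. The following is a minimal library-free sketch, not the code from the answer: it assumes the binary image is stored as rows of 0/255 bytes, and the names `Binary` and `keepLongHorizontalRuns` are made up for illustration. It keeps only horizontal foreground runs that are at least `minRun` pixels long, which is roughly what the morphological opening leaves behind; OpenCV's border handling may differ at the image edges.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a binary cv::Mat: rows of 0 (background) / 255 (foreground).
using Binary = std::vector<std::vector<std::uint8_t>>;

// Keep only horizontal foreground runs that are at least minRun pixels long.
// Shorter runs (text strokes, noise) are cleared, long runs (table rules) survive.
Binary keepLongHorizontalRuns(const Binary& bw, int minRun)
{
    assert(minRun > 0);
    Binary out(bw.size());
    for (std::size_t y = 0; y < bw.size(); ++y)
    {
        const std::vector<std::uint8_t>& row = bw[y];
        out[y].assign(row.size(), 0);

        std::size_t x = 0;
        while (x < row.size())
        {
            if (row[x] == 0) { ++x; continue; }

            // Measure the current foreground run [start, x)
            const std::size_t start = x;
            while (x < row.size() && row[x] != 0) ++x;

            // Copy the run to the output only if it is long enough
            if (x - start >= static_cast<std::size_t>(minRun))
                for (std::size_t i = start; i < x; ++i) out[y][i] = 255;
        }
    }
    return out;
}
```

Applying the same idea to columns gives the vertical lines. With `scale = 15` and the 800-pixel-wide resized image from the answer, `minRun` would correspond to 800 / 15 = 53 pixels.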

Comments

Wow!! A straight answer, and you explained it very well. Amazing. Thank you very much. You are a life saver.

I luv OpenCV ( 2015-06-14 11:04:37 -0600 )

Again a neat answer from @theodore!

StevenPuttemans ( 2015-06-15 09:31:32 -0600 )

@HeranBuddhika you're welcome, I am glad that I helped. @StevenPuttemans merci once again ;-)

theodore ( 2015-06-15 11:23:13 -0600 )

Nice solution for removing lines. In many forms there is a problem: the letters touch the horizontal lines. How can I remove the horizontal lines but preserve the letters undamaged, as in the original?

If the letters are damaged, OCR engines will fail.

ell ( 2015-06-28 04:04:45 -0600 )

@ell in that case, what you can do is find the joints of the letters with the horizontal lines, apply a bitwise operation, and keep that part. Another approach would be, after you have found the ROI(s) that correspond to the table(s), to apply the morphological operation again with a different scale in order to extract the vertical lines; that way you can extract the letters as well, which you can then use individually for further processing through OCR. Anyway, you can open another thread with some examples and discuss it there. ;-)

theodore ( 2015-06-28 06:46:47 -0600 )

@ell for example, look here at how I extract the music notes from a music sheet.

theodore ( 2015-06-28 07:15:51 -0600 )

I tested the music notes. Following your example, I did it with Emgu CV and it works OK, even though I did not understand exactly what each step means. I'll try the same steps on my letters and open a thread about it.

ell ( 2015-06-28 13:23:08 -0600 )

Hi theodore, I opened a new question: extract numbers and bound rectangle on each number

ell ( 2015-06-28 16:41:50 -0600 )

It looks like you find contours in the matrix of joints (inside the loop). I'm wondering what this means intuitively?

rasterdetect ( 2015-11-04 15:23:30 -0600 )

@rasterdetect you need that because we find the joint contours for each table separately.

theodore ( 2015-11-04 16:17:59 -0600 )
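
The joint test discussed above can be sketched as follows. This is only an assumption about how such a check might look, since the end of the posted code is cut off: the types `Pt` and `Box` and the functions `contains`, `countJoints` and `looksLikeTable` are hypothetical, and the joints are assumed to be already reduced to one point per line crossing. The rule itself (more than 4 joints suggests a table, only the 4 corners suggests a picture) is the one stated in the code comments of the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Pt  { int x; int y; };                        // hypothetical joint location
struct Box { int x; int y; int width; int height; }; // hypothetical bounding rectangle

// True if p lies inside b (right and bottom edges are exclusive).
bool contains(const Box& b, const Pt& p)
{
    return p.x >= b.x && p.x < b.x + b.width &&
           p.y >= b.y && p.y < b.y + b.height;
}

// Count the joints that fall inside one candidate region.
std::size_t countJoints(const Box& b, const std::vector<Pt>& joints)
{
    assert(b.width >= 0 && b.height >= 0);
    std::size_t n = 0;
    for (const Pt& p : joints)
        if (contains(b, p)) ++n;
    return n;
}

// A framed picture has joints only at its 4 corners; a table has more.
bool looksLikeTable(const Box& b, const std::vector<Pt>& joints)
{
    return countJoints(b, joints) > 4;
}
```

Counting per region is what makes the test work when a page holds several tables and pictures: each bounding rectangle is judged only by the joints it contains.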

answered 2015-06-11 02:54:50 -0600

pklab

updated 2015-06-11 04:54:31 -0600

I assume you have already registered the image of the page, i.e. corrected the page alignment.

If the tables don't have borders it's a bit hard. You could try searching for text laid out as standard paragraphs and save as separate images everything that fails your check.

If the table has a border, it should have lines at 0 or 90 degrees +/- a small tolerance; then you could try:

  • template matching, using an image of crossing lines as the template to detect cells, and 4 templates for the 4 boundary corners.
  • Hough lines to detect horizontal and vertical lines, then check/group which lines are compatible with a table: vertical lines of one table should share the same Y range in Cartesian space, while horizontal lines should share the same X range.

    Once you have the coordinates of the table boundaries, you can easily define a rect ROI on your page image and save it to a new file.
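
As a rough illustration of the last step, assuming the line detector has already returned segments as end-point pairs, the sketch below keeps the segments that are horizontal or vertical within a tolerance and takes their bounding rectangle as the table ROI. `Segment`, `Roi`, `isHorizontal`, `isVertical` and `tableRoi` are hypothetical names, not OpenCV API, and the sketch handles a single table only; with several tables the segments would first have to be grouped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

struct Segment { int x1; int y1; int x2; int y2; };  // hypothetical detected line segment
struct Roi     { int x; int y; int width; int height; };

// A segment counts as horizontal/vertical if its end points differ by at most tol pixels.
bool isHorizontal(const Segment& s, int tol) { return std::abs(s.y1 - s.y2) <= tol; }
bool isVertical(const Segment& s, int tol)   { return std::abs(s.x1 - s.x2) <= tol; }

// Bounding rectangle of all near-axis-aligned segments; slanted segments are ignored.
// Returns an empty Roi (all zeros) if no segment qualifies.
Roi tableRoi(const std::vector<Segment>& segments, int tol)
{
    assert(tol >= 0);
    bool found = false;
    int minX = 0, minY = 0, maxX = 0, maxY = 0;

    for (const Segment& s : segments)
    {
        if (!isHorizontal(s, tol) && !isVertical(s, tol))
            continue;

        const int lx = std::min(s.x1, s.x2), hx = std::max(s.x1, s.x2);
        const int ly = std::min(s.y1, s.y2), hy = std::max(s.y1, s.y2);

        if (!found)
        {
            minX = lx; maxX = hx; minY = ly; maxY = hy;
            found = true;
        }
        else
        {
            minX = std::min(minX, lx); maxX = std::max(maxX, hx);
            minY = std::min(minY, ly); maxY = std::max(maxY, hy);
        }
    }

    if (!found)
        return Roi{0, 0, 0, 0};
    return Roi{minX, minY, maxX - minX, maxY - minY};
}
```

The resulting rectangle plays the role of the rect ROI mentioned above: it is the region that would be cropped from the page image and saved to a new file.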

Comments

At least I have an idea of how to do it. Template matching, Hough lines, rect ROI: I can find all of those in the OpenCV documentation, right?

I luv OpenCV ( 2015-06-12 22:05:59 -0600 )

What if there is more than one table in one image?

I luv OpenCV ( 2015-06-13 12:09:36 -0600 )

How do I read the data in a table?

vanitha410 ( 2019-12-25 04:25:57 -0600 )
