How to extract tables from an image?

answered 2015-06-14 07:12:53 -0600


As the others proposed, finding the horizontal and vertical lines seems to be a good way to go. Below you can find such a solution. If you have any questions feel free to ask, though I have added comments throughout the code, so it should not be hard to follow.

#include <iostream>
#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;

int main()
{
    // Load source image
    string filename = "table.jpg";
    Mat src = imread(filename);

    // Check that the image was loaded; stop here if it was not
    if (src.empty())
    {
        cerr << "Problem loading image!!!" << endl;
        return -1;
    }

//    // Show source image
//    imshow("src", src);

    // resizing for practical reasons
    Mat rsz;
    Size size(800, 900);
    resize(src, rsz, size);

    imshow("rsz", rsz);

    // Convert the image to grayscale if it is not already
    Mat gray;

    if (rsz.channels() == 3)
    {
        cvtColor(rsz, gray, COLOR_BGR2GRAY);
    }
    else
    {
        gray = rsz;
    }

    // Show gray image
    imshow("gray", gray);

    // Apply adaptiveThreshold to the bitwise_not of gray (note the ~ operator), so that dark ink ends up white in the binary image
    Mat bw;
    adaptiveThreshold(~gray, bw, 255, ADAPTIVE_THRESH_MEAN_C, THRESH_BINARY, 15, -2);

    // Show binary image
    imshow("binary", bw);


    // Create the images that will be used to extract the horizontal and vertical lines
    Mat horizontal = bw.clone();
    Mat vertical = bw.clone();

    int scale = 15; // tune this value: a larger scale gives a shorter kernel, so more (shorter) lines are detected

    // Specify size on horizontal axis
    int horizontalsize = horizontal.cols / scale;

    // Create the structuring element for extracting horizontal lines through morphology operations
    Mat horizontalStructure = getStructuringElement(MORPH_RECT, Size(horizontalsize,1));

    // Apply morphology operations
    erode(horizontal, horizontal, horizontalStructure, Point(-1, -1));
    dilate(horizontal, horizontal, horizontalStructure, Point(-1, -1));
//    dilate(horizontal, horizontal, horizontalStructure, Point(-1, -1)); // expand horizontal lines

    // Show extracted horizontal lines
    imshow("horizontal", horizontal);


    // Specify size on vertical axis
    int verticalsize = vertical.rows / scale;

    // Create the structuring element for extracting vertical lines through morphology operations
    Mat verticalStructure = getStructuringElement(MORPH_RECT, Size( 1,verticalsize));

    // Apply morphology operations
    erode(vertical, vertical, verticalStructure, Point(-1, -1));
    dilate(vertical, vertical, verticalStructure, Point(-1, -1));
//    dilate(vertical, vertical, verticalStructure, Point(-1, -1)); // expand vertical lines

    // Show extracted vertical lines
    imshow("vertical", vertical);


    // create a mask which includes the tables
    Mat mask = horizontal + vertical;
    imshow("mask", mask);


    // Find the joints between the lines of the tables. We will use this information to discriminate tables from pictures: a table contains more than 4 joints, while a picture contains only 4 (i.e. at the corners)
    Mat joints;
    bitwise_and(horizontal, vertical, joints);
    imshow("joints", joints);


    // Find external contours in the mask; these most probably belong to tables or to pictures
    vector<Vec4i> hierarchy;
    vector<vector<Point> > contours;
    findContours(mask, contours, hierarchy, RETR_EXTERNAL, CHAIN_APPROX_SIMPLE, Point(0, 0));

    vector<vector<Point> > contours_poly( contours.size() );
    vector<Rect> boundRect( contours.size() );
    vector<Mat> rois;

    for (size_t i = 0; i < contours.size(); i++)
    {
        // find the area of each contour
        double area = contourArea(contours[i]);

        // filter out individual lines or blobs that might exist and do not represent a table
        if(area < 100) // value is arbitrary; you will need to tune it by trial and error
            continue;

        approxPolyDP( Mat(contours[i]), contours_poly ...
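
The erode-then-dilate pair used above for line extraction is a morphological opening with a long, one-pixel-thick kernel: erosion wipes out every run of white pixels shorter than the kernel, and dilation grows the surviving runs back to their original length. Below is a minimal sketch of that idea on a single image row, in plain C++ without OpenCV. The names `Row`, `erodeRow`, `dilateRow` and `openRow` are hypothetical helpers for illustration only, and ignoring out-of-range pixels is an assumption of this sketch.

```cpp
#include <cassert>
#include <vector>

// One image row of 0/255 pixels (hypothetical helper type, not an OpenCV type).
using Row = std::vector<unsigned char>;

// Erosion with a 1 x len kernel anchored at its centre: a pixel stays white
// only if every in-range pixel under the kernel is white.
Row erodeRow(const Row& in, int len) {
    assert(len >= 1);
    const int n = static_cast<int>(in.size());
    const int left = len / 2, right = len - 1 - left;
    Row out(in.size(), 0);
    for (int x = 0; x < n; ++x) {
        bool all = true;
        for (int k = -left; k <= right; ++k) {
            const int xx = x + k;
            if (xx >= 0 && xx < n && in[xx] == 0) { all = false; break; }
        }
        out[x] = all ? 255 : 0;
    }
    return out;
}

// Dilation with the reflected kernel: a pixel becomes white if any in-range
// pixel under the kernel is white.
Row dilateRow(const Row& in, int len) {
    assert(len >= 1);
    const int n = static_cast<int>(in.size());
    const int left = len / 2, right = len - 1 - left;
    Row out(in.size(), 0);
    for (int x = 0; x < n; ++x) {
        bool any = false;
        for (int k = -left; k <= right; ++k) {
            const int xx = x - k;
            if (xx >= 0 && xx < n && in[xx] != 0) { any = true; break; }
        }
        out[x] = any ? 255 : 0;
    }
    return out;
}

// Opening = erosion followed by dilation with the same kernel length.
Row openRow(const Row& in, int len) {
    return dilateRow(erodeRow(in, len), len);
}
```

With a kernel length of 5, a 6-pixel run away from the row border survives the opening unchanged, while a 2-pixel run (think of a character stroke) disappears. That is why text vanishes from the `horizontal` image while the table rules remain.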


Comments

Wow!! A straight answer, and you explained it very well. Amazing. Thank you very much. You are a life saver. Thank you.

I luv OpenCV (2015-06-14 11:04:37 -0600)

Again a neat answer from @theodore!

StevenPuttemans (2015-06-15 09:31:32 -0600)

@HeranBuddhika you're welcome, I am glad that I helped. @StevenPuttemans merci once again ;-)

theodore (2015-06-15 11:23:13 -0600)

Nice solution for removing lines. In many forms there is a problem: the letters touch the horizontal lines. How can I remove the horizontal lines but preserve the letters as they were, undamaged?

If the letters are damaged, OCR engines will fail.

ell (2015-06-28 04:04:45 -0600)

@ell in that case, what you can do is find the joints of the letters with the horizontal lines, apply a bitwise operation, and keep that part. Another approach would be, after you have found the ROI(s) that correspond to the table(s), to apply the morphological operation again to extract the vertical lines with a different scale. That way you can extract the letters as well, which you can then use individually for further processing through OCR. Anyway, you can open another thread with some examples and discuss it there. ;-)

theodore (2015-06-28 06:46:47 -0600)
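
One possible reading of the first suggestion above, sketched in plain C++ without OpenCV: erase every pixel that belongs to a detected line unless it is also marked as a joint with a letter stroke. The function name `removeLinesKeepJoints` and the flattened `Mask` type are hypothetical, and how the `joints` mask is obtained is left open here, so treat this as an illustration of the bitwise step only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flattened binary image, one byte per pixel, 0 or 255 (hypothetical type).
using Mask = std::vector<unsigned char>;

// Keep a pixel of `bw` unless it lies on a detected line and is NOT a joint.
// Per pixel this is: out = bw AND NOT (lines AND NOT joints).
Mask removeLinesKeepJoints(const Mask& bw, const Mask& lines, const Mask& joints) {
    assert(bw.size() == lines.size() && bw.size() == joints.size());
    Mask out(bw.size());
    for (std::size_t i = 0; i < bw.size(); ++i) {
        const bool erase = lines[i] != 0 && joints[i] == 0;
        out[i] = erase ? 0 : bw[i];
    }
    return out;
}
```

Pixels where a stroke crosses the line stay intact, so the letters are not cut in two when the line is removed.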

@ell look for example here how I extract the music notes from a music sheet

theodore (2015-06-28 07:15:51 -0600)

I tested the music notes example. Following your example, I did it with Emgu CV and it works OK, even though I did not understand exactly what each step means. I'll try the same steps on my letters and open a thread about it.

ell (2015-06-28 13:23:08 -0600)

Hi theodore, I opened a new question: extract numbers and bound rectangle on each number

ell (2015-06-28 16:41:50 -0600)

It looks like you find contours in the matrix of joints (inside the loop). I'm wondering what this means intuitively?

rasterdetect (2015-11-04 15:23:30 -0600)

@rasterdetect you need that because we find the joint contours for each table separately.

theodore (2015-11-04 16:17:59 -0600)
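
To make the per-table joint check more concrete, here is a minimal sketch in plain C++ without OpenCV. It counts the separate blobs of joint pixels inside one candidate region and applies the "more than 4 joints" rule from the answer. The names `Grid`, `Box`, `countJoints` and `looksLikeTable` are hypothetical, and the flood fill stands in for the contour search on the joints image; it is not the code from the truncated part of the answer.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Binary joints image: 0 = background, non-zero = joint pixel (hypothetical type).
using Grid = std::vector<std::vector<unsigned char> >;

// Hypothetical stand-in for a bounding rectangle such as cv::Rect.
struct Box { int x, y, width, height; };

// Count the 4-connected blobs of joint pixels that lie inside `box`.
int countJoints(const Grid& joints, const Box& box) {
    assert(box.x >= 0 && box.y >= 0 && box.width >= 0 && box.height >= 0);
    assert(box.y + box.height <= static_cast<int>(joints.size()));
    std::vector<std::vector<bool> > seen(box.height, std::vector<bool>(box.width, false));
    const int dr[4] = {1, -1, 0, 0};
    const int dc[4] = {0, 0, 1, -1};
    int blobs = 0;
    for (int r = 0; r < box.height; ++r) {
        for (int c = 0; c < box.width; ++c) {
            if (seen[r][c] || joints[box.y + r][box.x + c] == 0) continue;
            ++blobs;  // found an unvisited joint pixel: start a new blob
            std::vector<std::pair<int, int> > stack;
            stack.push_back(std::make_pair(r, c));
            seen[r][c] = true;
            while (!stack.empty()) {
                const std::pair<int, int> cur = stack.back();
                stack.pop_back();
                for (int k = 0; k < 4; ++k) {
                    const int nr = cur.first + dr[k], nc = cur.second + dc[k];
                    if (nr < 0 || nr >= box.height || nc < 0 || nc >= box.width) continue;
                    if (seen[nr][nc] || joints[box.y + nr][box.x + nc] == 0) continue;
                    seen[nr][nc] = true;
                    stack.push_back(std::make_pair(nr, nc));
                }
            }
        }
    }
    return blobs;
}

// A picture frame yields only its 4 corner joints; a table yields more.
bool looksLikeTable(const Grid& joints, const Box& box) {
    return countJoints(joints, box) > 4;
}
```

Counting inside each bounding box separately is what keeps the joints of one table from being attributed to a neighbouring table or picture.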


answered 2015-06-11 02:54:50 -0600

pklab


updated 2015-06-11 04:54:31 -0600

I suppose you have already registered the image of the page, I mean aligned the page.

If the tables don't have borders it's a bit hard. You could try searching for text laid out as standard paragraphs, and save as separate images everything that fails your check.

If the table has a border, it should have lines at 0 or 90 degrees, within a small tolerance, so you could try:

Template matching, using an image of crossing lines as the template to detect cells, plus 4 templates for the 4 boundary corners.
Hough lines to detect horizontal and vertical lines, then check/group which lines are compatible with a table: vertical lines should share the same Y coordinates in Cartesian space, while horizontal lines should share the same X coordinates.

Once you have the table boundary coordinates, you can easily define a rectangular ROI on your page image and save it to a new file.
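
The orientation check in the Hough-line suggestion can be sketched as follows, in plain C++ without OpenCV. Each detected segment is classified as horizontal, vertical or other within an angle tolerance, and the axis-aligned ones are merged into one bounding box that could serve as the ROI. The names `Segment`, `Orientation`, `classify`, `Bounds` and `tableBounds` are hypothetical, and the sketch assumes the page contains a single table; grouping lines into several tables is not covered.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// One detected line segment with end points (x1, y1) and (x2, y2); hypothetical type.
struct Segment { double x1, y1, x2, y2; };

enum class Orientation { Horizontal, Vertical, Other };

// Classify a segment by its angle to the x axis, folded into [0, 90] degrees.
Orientation classify(const Segment& s, double toleranceDeg) {
    assert(toleranceDeg >= 0.0 && toleranceDeg < 45.0);
    const double pi = std::acos(-1.0);
    double angle = std::fabs(std::atan2(s.y2 - s.y1, s.x2 - s.x1)) * 180.0 / pi;
    if (angle > 90.0) angle = 180.0 - angle;
    if (angle <= toleranceDeg) return Orientation::Horizontal;
    if (angle >= 90.0 - toleranceDeg) return Orientation::Vertical;
    return Orientation::Other;
}

struct Bounds { double minX, minY, maxX, maxY; };

// Merge all near-horizontal and near-vertical segments into one bounding box.
// Returns false if no such segment exists.
bool tableBounds(const std::vector<Segment>& segs, double toleranceDeg, Bounds& out) {
    bool found = false;
    for (const Segment& s : segs) {
        if (classify(s, toleranceDeg) == Orientation::Other) continue;
        const double lx = std::min(s.x1, s.x2), hx = std::max(s.x1, s.x2);
        const double ly = std::min(s.y1, s.y2), hy = std::max(s.y1, s.y2);
        if (!found) {
            out = Bounds{lx, ly, hx, hy};
            found = true;
            continue;
        }
        out.minX = std::min(out.minX, lx);
        out.minY = std::min(out.minY, ly);
        out.maxX = std::max(out.maxX, hx);
        out.maxY = std::max(out.maxY, hy);
    }
    return found;
}
```

The resulting `Bounds` would then be rounded to integer pixel coordinates before being used to crop the page image.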


Comments

At least I have an idea of how to do it. Template matching, Hough lines, rect ROI: I can find all of those in the OpenCV documentation, right?

I luv OpenCV (2015-06-12 22:05:59 -0600)

What if there is more than one table in one image?

I luv OpenCV (2015-06-13 12:09:36 -0600)

How do I read the data in a table?

vanitha410 (2019-12-25 04:25:57 -0600)


A table in most cases consists of many vertical and horizontal lines, so you could use Sobel filters or a Hough transform to find the areas in which the tables are located.

FooBar (2015-06-11 02:50:46 -0600)

Would it be easy/allowed to provide a sample with the actual image in which you want to detect the tables?

theodore (2015-06-13 20:06:02 -0600)

Okay, I've uploaded a real image. I want to extract those 2 tables (or, if there are more than 2 tables, all of them).

I luv OpenCV (2015-06-14 02:21:00 -0600)

Can anyone help me? How do I read the data in tables?

vanitha410 (2019-12-25 04:25:08 -0600)
