forum related question

asked 2018-09-14 02:48:51 -0600

holger

Hello,

For one of my toy projects, I want to train a question-and-answer model (text classification) on posts from the OpenCV forum. Is there any way to get a copy of the OpenCV forum database, or another way to get the posting data without writing a crawler?

Any information is welcome. And a little joke, since it's Friday and I feel like it: I want to "fight natural dumbness with artificial intelligence".


1 answer

answered 2018-09-14 03:18:45 -0600

berak

updated 2018-09-14 03:31:26 -0600

That's a cute idea!

Comments

Thank you, berak, again! I was thinking about RSS too, but was too lazy to search :-( The API approach looks like what I am looking for.

Well, picking the right answer is the labeling part a human needs to do. It's like that one guy who asked for "software to draw bounding boxes automatically for xxx": if you have such software, you already have the thing you want to train :-) I found this very funny (I have a strange sense of humor).

I will try to semi-automate this (pick upvoted comments or answers and review them manually). If this works, I will let you know.

holger (2018-09-14 03:37:35 -0600)
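A minimal sketch of the semi-automatic labeling idea, assuming the posts have already been exported into simple records with a question id, the text and the net vote score. The `Post` class and the `candidate_labels` helper are hypothetical, not part of any forum API. It picks the top-voted answer or comment per question as a *candidate* label; a human still accepts or rejects each one.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical record for one exported answer or comment
    question_id: int
    text: str
    score: int  # net upvotes


def candidate_labels(posts, min_score=1):
    """Return {question_id: highest-voted post with score >= min_score}.

    These are only candidates; a human reviewer makes the final call.
    """
    best = {}
    for post in posts:
        if post.score < min_score:
            continue  # not upvoted enough to be worth reviewing
        current = best.get(post.question_id)
        if current is None or post.score > current.score:
            best[post.question_id] = post
    return best


# Made-up example data
posts = [
    Post(1, "Use cv::resize with INTER_AREA.", 3),
    Post(1, "No idea.", 0),
    Post(2, "Check your include paths.", 0),
]
review_queue = candidate_labels(posts)
for qid, post in sorted(review_queue.items()):
    print(qid, post.text)  # shown to a human for manual review
```

With the default `min_score=1`, question 2 gets no candidate at all, so it simply stays unlabeled until someone looks at it.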

I remember trying something similar (spam/no-spam) for the moderation posts; unfortunately, I lost it.

It's also a good way to play with word2vec, GloVe, etc. (IMHO).

berak (2018-09-14 03:41:40 -0600)
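For the spam/no-spam idea, one classic baseline is multinomial naive Bayes over word counts. That is my choice of technique here, not necessarily what was used for the moderation posts. Below is a minimal pure-Python sketch with add-one (Laplace) smoothing; the four training posts are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lower-case whitespace tokenizer; a real pipeline would also strip punctuation
    return text.lower().split()


def train_nb(docs, labels):
    """Multinomial naive Bayes with add-one smoothing, stored as log-probabilities."""
    classes = set(labels)
    word_counts = {c: Counter() for c in classes}
    doc_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = set().union(*word_counts.values())
    model = {}
    for c in classes:
        total = sum(word_counts[c].values())
        log_prior = math.log(doc_counts[c] / len(docs))
        log_likelihood = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab))) for w in vocab
        }
        model[c] = (log_prior, log_likelihood)
    return model


def predict_nb(model, text):
    """Return the class with the highest log-posterior; unseen words are ignored."""
    scores = {}
    for c, (log_prior, log_likelihood) in model.items():
        scores[c] = log_prior + sum(
            log_likelihood[w] for w in tokenize(text) if w in log_likelihood
        )
    return max(scores, key=scores.get)


# Hypothetical training data
docs = [
    "buy cheap watches now",
    "cheap pills buy now",
    "cv::imread returns empty mat",
    "how to compile opencv with cuda",
]
labels = ["spam", "spam", "nospam", "nospam"]
model = train_nb(docs, labels)
print(predict_nb(model, "cheap watches"))
print(predict_nb(model, "opencv returns empty mat"))
```

Working in log space avoids underflow when a post has many words; with real forum data you would of course want far more than four examples per class.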

Yes, word vectorization / bag of words / sequence of words is exactly what I am trying to learn right now, and I always need some real-life examples to try out. The funny thing is that they also use a 128-dimensional floating-point representation called a "word embedding".

This is very similar to FaceNet, so representing data this way seems to be a common approach. I still don't find machine learning easy, but it is starting to be fun after 4 months.

holger (2018-09-14 03:53:43 -0600)
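To make the difference between two of those representations concrete: a bag of words is a fixed-length count vector that throws away word order, while a sequence of words keeps the order as a list of word indices (which is what an embedding layer usually consumes). A minimal sketch, assuming a hypothetical two-sentence corpus and naive whitespace tokenization; index 0 is reserved for unknown words.

```python
def build_vocab(texts):
    """Map each distinct lower-cased word to an index >= 1 (0 means 'unknown')."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def bag_of_words(text, vocab):
    """Fixed-length count vector: word order is discarded."""
    vec = [0] * (len(vocab) + 1)
    for word in text.lower().split():
        vec[vocab.get(word, 0)] += 1
    return vec


def sequence_of_words(text, vocab):
    """Variable-length list of word indices: word order is kept."""
    return [vocab.get(word, 0) for word in text.lower().split()]


vocab = build_vocab(["how to read an image", "read the image file"])
print(bag_of_words("read image read", vocab))
print(bag_of_words("image read read", vocab))
print(sequence_of_words("read image read", vocab))
print(sequence_of_words("image read read", vocab))
```

Both bag-of-words calls print `[0, 0, 0, 2, 0, 1, 0, 0]`, while the sequences are `[3, 5, 3]` and `[5, 3, 3]`, so only the sequence representation can tell the two sentences apart.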

Yeah, a similar embedding (though you can use any length you want with word2vec).

berak (2018-09-14 04:02:33 -0600)
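On the length point: the embedding dimension is just a hyperparameter. Below is a minimal sketch with a random (untrained) lookup table whose vector length `dim` is free, plus cosine similarity for comparing two embeddings. FaceNet compares L2-normalized embeddings by Euclidean distance, which for unit vectors ranks pairs the same way as cosine similarity, since ||a - b||² = 2 - 2·cos(a, b). In a real word2vec model the vectors are learned from text; `make_embedding_table` and `cosine_similarity` here are hypothetical helpers, not a library API.

```python
import math
import random


def make_embedding_table(vocab, dim, seed=0):
    """One random `dim`-length float vector per word (placeholder for learned vectors)."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


for dim in (50, 128, 300):
    table = make_embedding_table(["image", "picture", "compile"], dim)
    # With random vectors this similarity is meaningless; training is what would
    # pull related words such as "image" and "picture" closer together.
    print(dim, len(table["image"]), round(cosine_similarity(table["image"], table["picture"]), 3))
```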

Stats

Asked: 2018-09-14 02:48:51 -0600

Seen: 561 times

Last updated: Sep 14 '18