
OCR on videos


Thread replies: 20
Thread images: 1

File: ocr.jpg (31KB, 307x273px)
/g/, why is this so difficult? Why doesn't a program that does it well exist?
Isn't a video a succession of images?
I need a program that scans a video frame by frame for text, and that outputs what it finds in an SRT file.
I don't know of any program other than SubRip and AviSubDetector, and both of them are pretty much shit.
I don't necessarily want an already trained program; I can input letters myself for a while if necessary, but I need one that fucking works.
Is this too much to ask, /g/?
>>
>>60705892
It is not practical.

Usually, you apply OCR to still images where you know there is text, like on documents.
Finding the place where there is text is a huge problem.
This is why forms have, for so long, had people fill in a single letter per box rather than write on a blank piece of paper.

Finding text in a video is more complicated as you have a lot of images and the text is moving.
>>
>>60705956
the real problem is you need to train a neural net to recognise text and the text in movies is all different fonts, angles, upside down, back to front, colors, pixelated, distorted etc etc etc
if you want a net to recognise that text you have to train it on every possible way the text could appear

now if you were just looking to scrape hard coded subtitles you might have a chance
>>
>>60705956
>>60706141

My bad, I didn't specify that my goal is to rip hardsubs.
You can define the area where they could appear, and they don't move (special cases aside), so shouldn't it be simpler now?

AviSubDetector detects the presence of subs well, and with a bit of user input it can OCR them as well.
The problem with this program is that it's too buggy, I just can't use it.

SubRip is less buggy but can't distinguish subs from random dots well.

Excuse my ignorance, but I just can't understand why, given the font, the position and the colors of the text, there isn't something that searches for lines of letters and puts them in an .srt with the times at which they appear.
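
For what it's worth, the "known position and colors" part can be sketched in a few lines. This is only a rough illustration, not what SubRip or AviSubDetector actually do: it assumes a frame is a list of rows of (r, g, b) tuples, that subs sit in the bottom quarter, and that they are roughly white. The function names, the tolerance and the min_fraction threshold are all made up for the example.

```python
# Sketch: decide whether a frame's subtitle band contains text-coloured pixels.
# A frame here is a list of rows, each row a list of (r, g, b) tuples;
# a real script would get these from an image library instead.

def subtitle_band(frame, top_fraction=0.75):
    """Return only the rows below top_fraction of the frame height."""
    start = int(len(frame) * top_fraction)
    return frame[start:]

def text_mask(band, text_rgb=(255, 255, 255), tolerance=40):
    """Mark pixels whose colour is within `tolerance` of the subtitle colour."""
    def close(pixel):
        return all(abs(c - t) <= tolerance for c, t in zip(pixel, text_rgb))
    return [[close(p) for p in row] for row in band]

def has_subtitle(frame, min_fraction=0.02, **kwargs):
    """True if enough of the band matches the subtitle colour.

    min_fraction is a guessed threshold meant to reject a few stray dots.
    """
    mask = text_mask(subtitle_band(frame), **kwargs)
    total = sum(len(row) for row in mask)
    hits = sum(sum(row) for row in mask)
    return total > 0 and hits / total >= min_fraction
```

Raising min_fraction is the crude knob for the "random dots" problem: a couple of bright pixels won't reach it, a full line of text usually will.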
>>
>>60706221
because it is too intensive to look at that many frames
>>
>>60706221
why do you want to rip subs anyway
>>
>>60706221
Here's an impractical approach:

>take the video
>generate image frames (5-15fps should be sufficient)
>use those images and apply OCR
>glue OCR + frame number together to generate a .srt
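
The last step is the easy one. A minimal sketch, assuming you already have cues as (start_seconds, end_seconds, text) tuples; the function names are made up for the example, only the .srt layout itself (counter, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is the real format.

```python
def frame_to_seconds(frame_number, fps):
    """Convert a frame index to seconds, given the rate frames were sampled at."""
    return frame_number / fps

def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm form used in .srt files."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(cues):
    """Build .srt text from a list of (start_seconds, end_seconds, text)."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Blocks are separated by one blank line.
    return "\n".join(blocks)
```

Write the returned string to a file ending in .srt and any player should pick it up.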
>>
OCR is really fucking slow. Doing it on every frame would be impractical.
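
One way around that, sketched under assumptions: hardsubs stay put for a second or two, so you only need to OCR a frame when the subtitle area actually changed. Here each frame is reduced to a boolean grid (True where a pixel looks like subtitle text); how you get that grid is up to you, and the names and the 5% threshold are invented for the example.

```python
def changed_fraction(prev_mask, curr_mask):
    """Fraction of positions whose boolean value differs between two masks."""
    total = 0
    diff = 0
    for prev_row, curr_row in zip(prev_mask, curr_mask):
        for a, b in zip(prev_row, curr_row):
            total += 1
            diff += a != b
    return diff / total if total else 0.0

def frames_worth_ocr(masks, threshold=0.05):
    """Yield indices of frames that differ enough from the last frame kept.

    Comparing against the last kept frame (not simply the previous one)
    stops slow drift from slipping under the threshold forever.
    """
    last_kept = None
    for index, mask in enumerate(masks):
        if last_kept is None or changed_fraction(last_kept, mask) > threshold:
            last_kept = mask
            yield index
```

The pixel comparison is cheap next to OCR, so most frames get skipped after that one check.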
>>
>>60706255
Because in Italy nobody releases anime or films in MKV format. Subs are hardcoded 90% of the time.

>>60706251
Well, the above-cited programs can do it, and they are pretty old now.
The problem with them is that they are badly coded, but the OCR engine is decent.

>>60706268
Already had that idea, but I couldn't get past step 4. How do I do it while knowing close to nothing about coding?

>>60706292
as long as it works, I wouldn't mind it taking even days
>>
>>60705956
>Finding text in a video is more complicated as you have a lot of images and the text is moving.
Lol /g/ already has an OCR encoder/decoder for whole files to video.
>>
>>60706347
>as long as it works, I wouldn't mind it taking even days

then make a brute version yourself
vlc can export frames to images and matlab can do decent batch ocr and cropping as well as handling overlaps
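
If you'd rather script the frame export, here is a sketch that swaps in ffmpeg for VLC (so it assumes ffmpeg is installed and on your PATH). The `fps` video filter does the frame-rate reduction; the function names and the `frame_%06d.png` naming pattern are just choices made for the example.

```python
import os
import subprocess

def ffmpeg_frame_command(video_path, out_dir, fps=5):
    """Build the ffmpeg argument list that dumps `fps` frames per second as PNGs."""
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", f"fps={fps}",
        os.path.join(out_dir, "frame_%06d.png"),
    ]

def export_frames(video_path, out_dir, fps=5):
    """Create out_dir if needed and run ffmpeg; raises if ffmpeg fails."""
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(ffmpeg_frame_command(video_path, out_dir, fps), check=True)
```

Frame N (counting from 1) then sits at roughly (N - 1) / fps seconds into the video, which is what you need later for the timings.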
>>
>>60706347
Just use open source OCR software, and write a script that takes the text, writes down the times when it appears and disappears, and stores all the data in a text file (which is what an .srt is).
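
The "writes down when it appears and disappears" part could look roughly like this. It's a sketch that assumes you already have one OCR string per sampled frame (empty string when nothing was found); the function name and input layout are made up, and it compares text by exact equality, which is optimistic since real OCR output tends to flicker between frames.

```python
def frames_to_cues(frame_texts, fps):
    """Collapse per-frame OCR text into (start_seconds, end_seconds, text) cues.

    frame_texts[i] is the OCR result for sampled frame i ('' if no text).
    A cue starts at the first frame showing a text and ends at the first
    frame that shows something else.
    """
    cues = []
    current_text = ""
    start_frame = 0
    for index, text in enumerate(frame_texts):
        text = text.strip()
        if text != current_text:
            if current_text:
                cues.append((start_frame / fps, index / fps, current_text))
            current_text = text
            start_frame = index
    # Close a cue that is still open when the video ends.
    if current_text:
        cues.append((start_frame / fps, len(frame_texts) / fps, current_text))
    return cues
```

From there each tuple maps directly onto one numbered entry of the .srt.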
>>
>>60706439
>exporting frames to images
summer on /g/

Just look at /g/'s youtube cloud "software", and use that for subtitles.
>>
>>60706459
I'm just saying there are obvious options available when he's acting like they don't exist
>>
>>60706347
>Because in Italy nobody releases anime or films in MKV format. Subs are hardcoded 90% of the time.

I'm still not understanding what you want to extract the subs for
>>
>>60706473
Because he's an Italian and he's new here.
Don't expect any sense from southern europeans.
>>
>>60706485
To remux them with better video streams of my choice.

>>60706439
>>60706448
So I need to learn to code, no other options?
>>
>>60706536
>So I need to learn to code, no other options?

you literally got told the other options
>>
>>60706459
Can you explain more?
>>
>>60706547
As far as I know every option requires a bit of scripting
This is a 4chan archive - all of the content originated from that site.
This means that RandomArchive shows their content, archived.