
/d/ tools

This is a red board which means that it's strictly for adults (Not Safe For Work content only). If you see any illegal content, please report it.

Thread replies: 8
Thread images: 4

File: 1478301959567.jpg (106KB, 450x703px)
Hey guys! random /d/eveloper here

I've been wondering if any of you use tools that make lurking, storing, and dumping less of a chore. Personally, I've been looking for a tool that would automatically archive /d/ for me in the background and give me a digital library with advanced tagging and automatic encryption. Finding nothing that matched my expectations, I decided to code my own thing on the .NET Framework, and so far I've finished 4chan browsing, image saving, and some other stuff. Since I recently discovered the pixiv and Tumblr APIs, I'm thinking of making this a /d/eviant Swiss Army toolset. I wonder if it'd be worth releasing the source code on GitHub, and perhaps even getting people to help out to bring a true faplord program into existence!

>traps
>>
>>7111423
Old pic, but it can handle pretty much any chan I throw at it. It's complex as fuck, though, so I'm not sure if releasing it would be a good idea. Currently requires a crawler, the web software (HHVM), and a maintenance/thumbnailing worker. Also requires TokuDB for efficient indexing.

It automatically archives and tags posts, images, SWFs, and webms based on a complex matching language I've come up with that runs on the crawler.


>'(r"\banal\b" win titlebody) and (r"\bfilth\b" nwin titlebody) and (r"\bdump\b" nwin titlebody)':
> addtags: ['anal']

This basically finds any thread with the WORD "anal" in the title or body, but ignores any thread with "filth" or "dump" in the title or body. win also checks for negation words like "no", so "no filth" does not count as a match for "filth". nwin means "not win": it is the logical not of the win result. If the rule matches, the thread is archived with the tag "anal".

Project's five years old now.
>>
>>7111584
Holy shit that's impressive. I'm nowhere near that yet and haven't even committed a month to the project. Initially I just had a web extension that sent pictures I right-clicked to a listening socket; later I moved on to reading APIs and analyzing them. You gave me good ideas though, thanks.
>>
File: ChanMan Tag Relationships.png (13KB, 441x342px)
>>7111641
One thing I would suggest is to store all the images in a pool, based on MD5 hash. This reduces a SHITLOAD of duplication, and you can store the tags in a database for quick lookup. The SQL queries are hairy and slow, currently, but my current schema looks something like this.

For those who don't know how to read crow's-foot diagrams:
|| means exactly 1 at that end of the relationship
O< means zero to many at that end.
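
A minimal Python sketch of the hash-pool idea from this post, under stated assumptions: the two-table layout and the names `images`, `image_tags`, `open_db`, and `add_to_pool` are hypothetical and are not the schema from the attached diagram, and SQLite stands in for whatever database is actually used.

```python
import hashlib
import shutil
import sqlite3
from pathlib import Path


def md5_of(path):
    """Hash the file in chunks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_db(db_path):
    # Hypothetical layout: one row per unique image, many tags per image.
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS images (md5 TEXT PRIMARY KEY, ext TEXT NOT NULL);
        CREATE TABLE IF NOT EXISTS image_tags (
            md5 TEXT NOT NULL REFERENCES images(md5),
            tag TEXT NOT NULL,
            PRIMARY KEY (md5, tag)
        );
    """)
    return conn


def add_to_pool(conn, pool, src, tags):
    """Copy src into the pool under its hash; a duplicate only gains new tags."""
    md5 = md5_of(src)
    dest = pool / md5[:2] / md5  # original extension is kept in the database
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
    with conn:  # commits on success
        conn.execute("INSERT OR IGNORE INTO images VALUES (?, ?)",
                     (md5, src.suffix.lower()))
        conn.executemany("INSERT OR IGNORE INTO image_tags VALUES (?, ?)",
                         [(md5, tag) for tag in tags])
    return dest
```

Saving the same image from two threads then leaves one file on disk and just merges the tags, which is where the duplication savings come from.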
>>
I have a pretty shitty system going. It's all in Python (unfortunately a horrible mishmash of both 2 and 3) but mostly automated.

>use a custom 4chan downloader to download any interesting threads on /d/ (it updates once daily)
>once a thread 404's it gets auto moved to the dead-thread bin
>de-duplication script deletes dupes in the dead-thread bin (I run this whenever I feel like it)

>use a pixiv downloader to download various pixiv artists (it also checks for new content once a month)
>other custom script searches the dead-thread bin and deletes any images that already exist in the pixiv folder structure
>when I feel like it, I go through some of the remaining images, reverse search and/or delete them, then add the artist id's to the pixiv downloader artist backlog (I could probably automate this, but I haven't bothered since I rarely do this)

I started downloading shit from the chans in 2013 and had 350 GB of images until I wrote the de-duplicator script, which reduced it to 250ish. Currently I have ~100 artists downloaded from pixiv; the backlog is around 2-3 thousand (I don't run the downloader very often). Also, I've got meta info, tags, etc. for every image from pixiv, so that'll be useful for when I eventually set up a personal booru, I guess.
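
The two clean-up steps described in this post (dupes inside the dead-thread bin, and bin images that already exist under the pixiv folders) could be sketched in Python roughly like this. It is not the poster's script: `dedupe_bin` and its parameters are made-up names, and it assumes the bin and the pixiv tree are separate directories.

```python
import hashlib
from pathlib import Path


def file_hash(path):
    # Reads the whole file at once; fine for images, wasteful for huge files.
    return hashlib.md5(path.read_bytes()).hexdigest()


def dedupe_bin(bin_dir, reference_dir, dry_run=True):
    """Return bin files that duplicate a reference file or an earlier bin file.

    With dry_run=False the duplicates are also deleted from the bin.
    Files under reference_dir are never touched.
    """
    seen = {file_hash(p) for p in reference_dir.rglob("*") if p.is_file()}
    duplicates = []
    for path in sorted(p for p in bin_dir.rglob("*") if p.is_file()):
        digest = file_hash(path)
        if digest in seen:
            duplicates.append(path)
            if not dry_run:
                path.unlink()
        else:
            seen.add(digest)
    return duplicates
```

Running it with the default `dry_run=True` first only lists what would be deleted. Note that this only catches byte-identical files; a resized or re-encoded copy hashes differently and would survive.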

The whole thing started off as "I want to save all the images from a certain thread" and "another pixiv user deleted their account? shit", and it kinda evolved from there. I'm planning to rewrite everything from the ground up at some point. I started without any programming knowledge, so it's very, very messy. [spoiler]Unfortunately some medical stuff got in the way so I'm not doing much at the moment.[/spoiler]
[spoiler]also, screw deviantart and tumblr (and patreon, but only a little)[/spoiler]
>>
File: firefox_2016-11-11_15-52-12.png (111KB, 902x376px)
>>7111778
That's sort of how my project started, although it had some roots in a more nefarious project that I stole the guts out of. First it was in Java, but Java had a lot of overhead and was a pain to update (see: Maven), so I switched to C#. C# worked OK but had really shitty HTTP support and even worse XML parsing, so I switched to Python. Next job will be converting the worker script from PHP to Python as well, and dropping the Qt MySQL connector.

Keep it up, you'll get there.
>>
I use hydrus client. It's got a lot of features I don't make use of, but new ones are still being added, and there's an update every Wednesday.
>>
>>7112044
Interesting software! It's kinda what I was aiming for, although it looks terribly overcomplicated. I want to focus on making my application more user-friendly.


This is a 4chan archive - all of the content originated from that site.
This means that RandomArchive shows their content, archived.