Say I have a big .txt file with information structured like this:
"name":"Hitler","Stallman likes to eat dead skin off of his feet";"name":" "Dick_Butt","I like turles";"name":"Ahmed_Muhammed":"I made a clock!";"name":"anon","Install gentoo.";"name":"12345","435464523"
I want to make a script that looks for the word "clock", and if it finds it, it should save the name to the left of it to a text file, in this case "Ahmed_Muhammed", then continue looking for more instances of "clock" until it reaches the end of the .txt file.
Can this be done with a batch file?
Pic unrelated
>>51546770
Your inconsistent use of delimiters indicates that you are a huge faggot.
Somehow
I won't tell
>>51546802
Yeah? Is it possible to save the output to "output.txt" with a batch file?
>>51546827
You sure? Anon above says it's possible.
cat input | tr ';' '\n' | grep -o "\"name\":\"\([[:alpha:]]\|_\)*\":\".*clock.*\"" | cut -d '"' -f 4 > output
Very ad hoc.
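For anyone without coreutils, roughly the same pipeline can be sketched in Python. This is only a sketch against the example line in the OP: names_mentioning and NAME_RE are made-up names, and it assumes records are separated by ';' and that no ';' shows up inside the text itself.

```python
import re

# Hypothetical pattern: each record is assumed to start with "name":"<user>".
NAME_RE = re.compile(r'"name":"([^"]*)"')

def names_mentioning(text, word):
    """Return the name of every ';'-separated record whose remaining text contains word."""
    names = []
    for record in text.split(';'):          # same job as tr ';' '\n'
        match = NAME_RE.search(record)
        if match is None:
            continue                        # record has no recognisable name field
        rest = record[match.end():]         # everything after the name
        if word in rest:
            names.append(match.group(1))
    return names

sample = ('"name":"anon","Install gentoo.";'
          '"name":"Ahmed_Muhammed":"I made a clock!";'
          '"name":"12345","435464523"')
print(names_mentioning(sample, "clock"))    # prints ['Ahmed_Muhammed']
```

Unlike the grep version, this does not care whether the name is followed by ',' or ':', so the inconsistent delimiters in the example do not matter.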
>>51546865
> yeah
No
> you sure
It's a secret
>>51546845
>Your inconsistent use of delimiters indicates that you are a huge faggot.
I didn't make the data set, the above is just an example based on the same structure.
Also, a follow-up question: I have roughly 1TB of this data. Do you think a batch file will be able to handle that amount?
>>51546894
Have you tried zsh?
I guess you could explode on : and then strip "" from your entries in the array or something.
>Go fuck yourself though.
>>51546914
>Have you tried zsh?
I'm on windows, so I can't do shell scripting.
>>51546950
Cygwin/MSYS/Babun. Yes you can.
>>51546950
http://gnuwin32.sourceforge.net/packages/coreutils.htm
Also, just get a VM. Or powershell. And get some motivation. Nobody will do your work for you, hopefully.
>>51546950
Hahahahahahahahahaha
>>51546881
>unix
I'm on Windows 7.
Curried troll thread
>>51546950
Yes you can.
>>51546992
Trust me, I'm trying my best.
>>51547049
Windows batch scripts are useless compared to bash + coreutils.
>>51546992
You're not tricking me into installing Linux again /g/. But I've installed coreutils. What now? How do I start this thing?
>>51547020
Install gentoo.
(use mingw32 or something)
>>51547114
>use mingw32 or something
Actually, I have Eclipse installed. Maybe I could do it in Java? I took a Java class back in the day, with a little help I think I could pull it off.
>>51547153
>java
R u meming me again
From my understanding, text files, and any files for that matter, are 100% immutable and all you can do is turn them on and off
>>51547020
Just boot a livecd with ntfs-3g installed, mount your partition and use the script faggot
>>51547108
Install Perl or Python.
>>51546942
I don't know what that means. I was thinking of doing it like this:
>Detect word "clock"
>read six characters to the left of the word
>check if the string == "name"
>If it does not, jump one character to the left, read six characters, check if the string == "name"
>Do this until the string == "name" is found
>Jump two characters to the right
>Read 1 character
>Check if the character == "
>If it's not, check if the character == one of n numbers of ascii characters
>When the character == an ascii character, save this character to a file then move one character to the right, check if the character == ", if it isn't do the previous step
>Do this until the character == "
>Continue searching for more "clock" after the "
I know it's probably a shit way to do it, but in my head it seems like it could work??
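The steps above can be sketched in Python without going character by character, since strings already have find and rfind. This is a sketch only: names_before and the marker default are made up for illustration, it assumes the text fits in memory (so a 1TB file would have to be fed through in pieces), and a name is reported once per hit, so duplicates are possible.

```python
def names_before(text, word, marker='"name":"'):
    """For each occurrence of word, scan left to the nearest marker and read the name after it."""
    names = []
    pos = text.find(word)
    while pos != -1:
        start = text.rfind(marker, 0, pos)          # nearest marker left of the hit
        if start != -1:
            name_start = start + len(marker)
            name_end = text.find('"', name_start)   # the name runs up to the closing quote
            if name_end != -1:
                names.append(text[name_start:name_end])
        pos = text.find(word, pos + len(word))      # continue searching after this hit
    return names

data = ('"name":"anon","Install gentoo.";'
        '"name":"Ahmed_Muhammed":"I made a clock!";'
        '"name":"12345","435464523"')
print(names_before(data, "clock"))                  # prints ['Ahmed_Muhammed']
```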
>>51547369
>all that shit
Bruh.
>>51547387
Do you know a better way, anon? Because I don't, and I'm trying very hard here.
>>51547369
>using set numbers for a variable
Damn son, I failed the shit out of intro to programming but even I'm not that dumb
>>51547425
Yeah
>>51547425
Also I'm sure you can just blindly cut on delimiters (unless they can be present if quoted/escaped, then you need to be smart about it). And I know nothing about Java but it probably can read a line at a time and do regular expressions.
>>51547369
>>51547468
Meant to quote >>51546881 m8.
Here is some of the real data I'm working with. It's roughly 1.7 billion reddit comments formatted like this:
{"score_hidden":false,"name":"t1_cnas8zv","link_id":"t3_2qyr1a","body":"Most of us have some family members like this. *Most* of my family is like this. ","downs":0,"created_utc":"1420070400","score":14,"author":"YoungModern","distinguished":null,"id":"cnas8zv","archived":false,"parent_id":"t3_2qyr1a","subreddit":"exmormon","author_flair_css_class":null,"author_flair_text":null,"gilded":0,"retrieved_on":1425124282,"ups":14,"controversiality":0,"subreddit_id":"t5_2r0gj","edited":false}
{"distinguished":null,"id":"cnas8zw","archived":false,"author":"RedCoatsForever","score":3,"created_utc":"1420070400","downs":0,"body":"But Mill's career was way better. Bentham is like, the Joseph Smith to Mill's Brigham Young.","link_id":"t3_2qv6c6","name":"t1_cnas8zw","score_hidden":false,"controversiality":0,"subreddit_id":"t5_2s4gt","edited":false,"retrieved_on":1425124282,"ups":3,"author_flair_css_class":"on","gilded":0,"author_flair_text":"Ontario","subreddit":"CanadaPolitics","parent_id":"t1_cnas2b6"}
{"score_hidden":false,"link_id":"t3_2qxefp","name":"t1_cnas8zx","created_utc":"1420070400","downs":0,"body":"Mine uses a strait razor, and as much as i love the clippers i love the razor so much more. Then he follows it up with a warm towel. \nI think i might go get a hair cut this week.","distinguished":null,"id":"cnas8zx","archived":false,"author":"vhisic","score":1,"subreddit":"AdviceAnimals","parent_id":"t3_2qxefp","retrieved_on":1425124282,
I'm trying to extract the user name for posts that contain certain words I'm interested in.
>>51546770
>I have roughly 1TB of this data,
You're gonna be busy a looooooong time if you insist on doing this blindly in Windows.
>>51547566
Then help me anon, you're my only hope!
I should mention that if I get this up and running, I intend to use it to PM anyone who has posted about a specific rare disease on reddit. It'll help a lot of people.
>>51547613
Debian, man. Get moving.
>>51547566
Also I have a server I can run it on 24/7, if it takes a year to complete that's fine. I just need to get it working.
>>51547636
I have CentOS in a VM, can I use that? If so I'll boot it right up.
>>51547656
Yes, or I think there is sed for Windows. If there is:
cat file | tr ';' '\n' | sed '/clock/!d;s/^[^:]*:"\([^"]*\)".*/\1/'
would work.
>>51547656
You can, but it's gonna be slower than running it natively. Doesn't matter what distro you use; I would use Debian for something like this, or Xubuntu.
>>51547613
I posted this in another thread but this has enough to get you started.
https://www.youtube.com/watch?v=smbeKPDVs2I
The topic isn't what is important, it's the commands you want to take note of.
>>51547701
I've tried to install the GnuWin port of sed on Windows, but I can't get it to work.
I'm booting into CentOS now.
You could do it within a blink using R
But you would need to sort that data properly (i.e. ":" doesn't belong in CSV files)
i could do that in 2 minutes
get out
>>51547715
I'm running Windows on my server, that's why I was hoping for a Win solution. If a VM turns out to be too slow, I'll dualboot linux on it no problem.
>>51547875
I wish I were as pro as you brah
>>51547701
I tried, what am I doing wrong here?
>>51547978
oh that was for the file in the op, for
>>51547554
you'd need something different.
if all of the '{}'s are on different lines, just do
cat sample | sed '/body":.*searchterm[&"]*/!d;s/.*author":"\([^"]*\).*/\1/'
otherwise add tr '{}' '\n\n'
>>51548073
All the '{}''s are on different lines.
In the pic I posted, the same error comes regardless of the content of the text file.
Am I correct in assuming I should substitute the "sample" for the path to the file containing the data?
Pic related is how the data is stored.
it's called json ya fag.
make a simple python script
>>51548161
try putting single quotes around or a \ before all !
>>51546770
>batch
fuck off
grep/regex should do what you want
>>51548073
How do you know all that regex crap but not know that you can just do
sed s/example// file.txt
rather than
cat file.txt | sed s/example//
>>51547554
m8, that's JSON
>>51547554
>Damn son, I failed the shit out of intro to programming but even I'm not that dumb
thats json buddy
just use python's json module and you're done
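A minimal sketch of what that looks like for one record, assuming the keys seen in the sample lines ("body" for the comment text, "author" for the user name). author_if_mentions and the record below are made up for illustration.

```python
import json

def author_if_mentions(line, word):
    """Parse one JSON record and return its author if the body contains word, else None."""
    record = json.loads(line)
    if word in record["body"]:
        return record["author"]
    return None

# Made-up record using the same keys as the sample data.
line = '{"author":"example_user","body":"I made a clock!","score":1}'
print(author_if_mentions(line, "clock"))    # prints example_user
```

Because the parser handles quoting and escapes, a '"' or ':' inside a comment cannot throw off the match the way it can with sed or cut.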
>>51548205
I somehow managed to reference an empty data file. Now I get this error though (see pic).
>>51548179
this
should be relatively simple
pseudo-code, don't take it literally
there's no way this'll work since I don't know the exact format of the file
import json
with open('your_file', 'rb') as infile:
    file_as_text = infile.read().decode('utf-8')
    j = json.loads(file_as_text)
    if 'clock in j':
        print("%s mentioned clock' % j['author']")
>>51548425
>if 'clock in j':
correction:
if 'clock' in j['body']:
>>51548425
>print("%s mentioned clock' % j['author']")
another correction:
print("%s mentioned clock" % j['author'])
either way, it doesn't matter since pseudo-code
the point was to show that this is ridiculously simple
>>51548425
>>51548440
>>51548494
The files are a series of JSON blocks delimited by new lines (\n). The files themselves have no ending, but they open fine in UltraEdit.
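With one JSON block per line, the file can be read a line at a time instead of all at once, which is what makes the data size manageable. A rough sketch, not a tested tool: write_authors is a made-up name, the "body" and "author" keys are taken from the sample lines, and lines that fail to parse (for example a last line that was cut off) are assumed to be safe to skip.

```python
import json

def write_authors(in_path, out_path, word):
    """Stream a newline-delimited JSON file; write the author of each matching comment."""
    hits = 0
    with open(in_path, encoding="utf-8", errors="replace") as infile, \
         open(out_path, "w", encoding="utf-8") as outfile:
        for line in infile:                  # the file object yields one line at a time
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue                     # skip blank or cut-off lines instead of crashing
            if not isinstance(record, dict):
                continue
            body = record.get("body")
            author = record.get("author")
            if isinstance(body, str) and isinstance(author, str) and word in body:
                outfile.write(author + "\n")
                hits += 1
    return hits
```

Calling write_authors with the data file, "output.txt" and "clock" would leave one author per line in output.txt and return the number of hits. Memory use stays at roughly one line regardless of file size.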
If it's as simple as you say, can I pay you some bitcoin to slap a python script together for me?
bump
>reinventing a json parser
>not just import json and continue with life
And you guys make fun of python where you just
>import program
>>51548939
How is this done exactly? Say I want to find all the users who have posted the word "orange" in the data set?
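One way it could go, as a sketch under assumptions: users_who_said is a made-up name, the "body" and "author" keys come from the sample lines, and the match is treated as a case-insensitive whole word so that "oranges" does not count. The sample records are invented.

```python
import json
import re

def users_who_said(lines, word):
    """Return the set of authors whose comment body contains word as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    users = set()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue                         # ignore lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        body = record.get("body")
        if isinstance(body, str) and pattern.search(body):
            users.add(record.get("author"))
    return users

sample_lines = [
    '{"author":"user_a","body":"I ate an Orange today"}',
    '{"author":"user_b","body":"oranges are fine"}',
    '{"author":"user_a","body":"orange again"}',
]
print(users_who_said(sample_lines, "orange"))    # prints {'user_a'}
```

An open file object can be passed as lines, since iterating over it yields one line at a time.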
>>51548997
Google it. Or just fuck off back to reddit already.