
Writing fast python code


Thread replies: 23
Thread images: 1

File: python.jpg (160KB, 1024x768px)
I made the mistake of writing time-critical software in Python.

The script does the following:

listen on a named pipe
get the relevant information with json[keyname]
get specific parts of the information with a regex
check the result against a whitelist,
e.g. if stringTest in listOfStringsTest:
fire off an HTTP request
parse JSON again
fire off another HTTP request
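The steps above, minus the pipe and network I/O, can be sketched roughly like this (all names, patterns, and whitelist entries below are made up for illustration):

```python
import json
import re

# Hypothetical message, standing in for one line read from the named pipe.
raw = '{"event": "ship", "detail": "order id=12345 ok"}'

WHITELIST = {"12345", "67890"}        # hypothetical whitelist entries
ID_PATTERN = re.compile(r"id=(\d+)")  # hypothetical extraction pattern

record = json.loads(raw)              # the json[keyname] step
match = ID_PATTERN.search(record["detail"])
if match and match.group(1) in WHITELIST:
    print("would fire the HTTP request for", match.group(1))
```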


A question for the experienced Python programmers:
which parts of the script waste the most time?
>>
the two http requests will absolutely dwarf the rest of the code
>>
>>62414831
the last HTTP request is the 'goal' of the script.
I was able to eliminate the first one while writing this thread.
How long does an "in" check take?
150 entries in the list, each ~8 bytes.

Would I profit from rewriting it in C?
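For scale, a quick timeit sketch with a hypothetical 150-entry whitelist shows the check is cheap either way, and a set makes it far cheaper still:

```python
import timeit

# Hypothetical whitelist: 150 short strings, as described above.
whitelist_list = [f"entry{i:03d}" for i in range(150)]
whitelist_set = set(whitelist_list)

probe = "missing"  # worst case: not present, so the list is scanned fully

list_time = timeit.timeit(lambda: probe in whitelist_list, number=100_000)
set_time = timeit.timeit(lambda: probe in whitelist_set, number=100_000)

print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

Even the list version is microseconds per lookup, so this step is unlikely to be the bottleneck.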
>>
>>62414893
>Would I profit from rewriting it in C?
only if you're running this program on a Pentium Pro from 1997
>>
>>62414767
Cython
>>
>>62414767
use pypy
>>
Program is I/O bound. Choice of language won't help you much.
>>
>>62415112
>>62414910
Thanks, at least I won't need to write it again.

>>62414923
>>62414978
I'll still try this; I know it won't help, but I'll try it.
>>
>>62414767
>get relevant information with json[keyname]
>get specific parts of the information with regex
>check the result against a whitelist
>e.g. if stringTest in listOfStringsTest:

These could be really naive and slow. Regex can be very slow if you're abusing it for something that you should be doing with a JSON parsing library.

If your program compares a string to a list of strings one by one, that is stupid and slow.
Use whatever Python calls a hash set or dictionary or similar, and it's an order of magnitude faster. That's assuming you only care about unique strings in your list and don't want duplicates for some reason. Even then, maintaining a dictionary mapping each string to a count of occurrences would be faster than a naive list.
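A minimal sketch of that suggestion, assuming the whitelist is static and built once at startup (entries here are made up):

```python
# Build the whitelist once as a set; each membership check then hashes
# the key instead of scanning every element.
whitelist = {"alpha", "beta", "gamma"}  # hypothetical entries

def is_allowed(value: str) -> bool:
    return value in whitelist  # average O(1), vs O(n) for a list

print(is_allowed("beta"), is_allowed("delta"))  # → True False
```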
>>
>>62414767
Check them all first, then open one TCP connection and fire all the needed HTTP requests over that single connection. It also depends on the server's speed and on any rate limiting it applies.
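With the standard library alone, reusing one connection looks like this; the local test server below is only a stand-in so the example is self-contained, not part of the thread's actual setup:

```python
import http.client
import http.server
import json
import threading

# Stand-in server; the real target would be the destination server.
class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive, so one connection is reused

    def do_GET(self):
        body = json.dumps({"ok": True}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One TCP connection, several HTTP requests over it.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
results = []
for _ in range(2):
    conn.request("GET", "/")
    results.append(json.loads(conn.getresponse().read()))
conn.close()
server.shutdown()
print(results)
```

With the third-party requests library, a requests.Session gives the same connection reuse with less code.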
>>
>>62414767
Use requests/lxml. I didn't read the full post until now, but I make a living off web scraping. There's no real way to make it 'fast'; it's entirely dependent on your download speed. If you're looking for something you want to run forever, I'd recommend Perl.
>>
>>62415659
Better yet, put it on AWS or another cloud.
>>
>>62415208
Use XPath and NoSQL to query/store your data.
>>
>>62415687
It is already; I have a 0.7ms ping to the destination server.
Does that mean it's in the same data center?
>>
>>62414767

>which parts of the script waste the most time?

Profile your program so you actually know which parts are slow. Don't guess based on what anonymous strangers on the internet say.
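With the standard library's cProfile that takes only a few lines; the extract function here is just a made-up stand-in for one stage of the script:

```python
import cProfile
import io
import pstats
import re

def extract(text: str) -> list[str]:
    # stand-in for the script's regex step
    return re.findall(r"\d+", text)

profiler = cProfile.Profile()
profiler.enable()
for _ in range(1000):
    extract("order 42 shipped to box 7")
profiler.disable()

# Print the five most expensive calls by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```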
>>
>>62415746
yeah, post a paste or something so we can actually help you.
>>
>>62415760
I don't have permission to post it.
I do it for a customer and the code is not generic.
>>62415208
I think that could help!
The regex is still needed (I don't use it for the JSON itself), but I will definitely try that hashmap or dictionary thing.
>>62415659
Do you mean the requests library?
>>62415746
I should do this, but I'm too lazy.
I'll probably do it anyway, because I have never done it and maybe I'll learn something. Thx!
>>
>>62415745
Still interested in whether my AWS instance is in the same data center as the destination server.
How good is 0.7ms?
>>
>>62415977
I need to say I tried multiple AWS data centers, so it wasn't luck.
>>
>>62414767
Definitely the HTTP requests
>>
>>62415169
>I'll still try this; I know it won't help, but I'll try it

Then why try it? You are not going to notice much difference. There might be a millisecond shaved off, or at most a fraction of one. But it's the HTTP requests that are the killer, unless you manually parse the JSON instead of using the standard library.

Here are some tips for checking how much time each line of code takes:

http://www.marinamele.com/7-tips-to-time-python-scripts-and-control-memory-and-cpu-usage

Just remember to remove it from the production code. You can get Python to be very high performance, because a lot of the standard library and other libraries are built on C code written by very smart people. Python is often used in high-performance environments like supercomputers.
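Even without the linked tools, time.perf_counter from the standard library is enough for coarse per-stage timing; the stage names and workloads below are made up:

```python
import time

timings = {}

class timed:
    # minimal context manager that records how long a named stage took
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        self.start = time.perf_counter()

    def __exit__(self, *exc):
        timings[self.name] = time.perf_counter() - self.start

with timed("parse"):
    sum(i * i for i in range(10_000))  # stand-in for the JSON/regex work
with timed("lookup"):
    "x" in {"a", "b", "c"}             # stand-in for the whitelist check

print(timings)
```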
>>
>>62414767
Rewrite it in Nim
Python's standard library is now ported to Nim

https://nim-lang.org/
>>
>>62414767
If you'll be making frequent checks whether an item exists in a collection, don't use a list. Your program has to traverse the entire list looking for the item. Use a set instead, since checking whether an item exists in a set takes effectively constant time.

Like others mentioned, though, your http requests are probably the bottleneck here.

