Throttle memory/net I/O
Bug #931211 reported by Drew Smathers
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Bafload | New | Undecided | Unassigned |
Bug Description
Currently no throttling is done for uploads, which is very bad for memory usage. Implement a pluggable throttling strategy to prevent too much file data from being held in memory at the same time.
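A minimal sketch of one such pluggable strategy (all names here are hypothetical, not the actual Bafload/txaws API): a part reader that blocks once a configurable number of parts are outstanding, so total buffered data never exceeds `max_inflight * part_size` bytes regardless of file size.

```python
import io
import threading


class ThrottledPartReader:
    """Yield fixed-size parts of a file, blocking once `max_inflight`
    parts are outstanding. Callers signal completion via part_done(),
    bounding resident data to max_inflight * part_size bytes."""

    def __init__(self, fileobj, part_size=5 * 1024 * 1024, max_inflight=4):
        self._file = fileobj
        self._part_size = part_size
        self._slots = threading.BoundedSemaphore(max_inflight)

    def parts(self):
        number = 1
        while True:
            self._slots.acquire()      # blocks if too many parts are in memory
            data = self._file.read(self._part_size)
            if not data:
                self._slots.release()  # nothing was read; return the slot
                return
            yield number, data
            number += 1

    def part_done(self):
        """Call when a part has been uploaded and its buffer can be freed."""
        self._slots.release()
```

In a Twisted setting the uploader would call `part_done()` from each part's upload callback; here a synchronous loop works the same way:

```python
reader = ThrottledPartReader(io.BytesIO(b"x" * 11), part_size=4, max_inflight=2)
for number, data in reader.parts():
    # upload part `number` here, then release its memory slot
    reader.part_done()
```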
Related conversation from Twisted IRC:
13:18 djfroofy: anyone have chops with unix mmap? writing some twisted (txaws) code and want to do some optimizations for multipart upload where i *think* mmap might be of use
13:19 lifeless: sure. uhm, don't do it.
13:19 ivan: funny, I was playing with mmap today trying to make an optimized line reader and it was several times slower than the simplest Python
13:19 ! Vertlet [~<email address hidden>] has joined #twisted
13:20 lifeless: unlike read(), mmap blocks the process -> can't be deferred to a thread sensibly
13:20 lifeless: your local IO will be about a billion times faster than your network
13:20 ivan: interesting
13:20 lifeless: so whatever you save on memory, you'll pay for with seek latency if you get any serious load at all (and the seek latency turning into total process halts)
13:20 djfroofy: lifeless: hehe ... yeah, ok
13:21 lifeless: f.read() -> allowThreads, read(), stopThread()
13:21 lifeless: mmap -> pagefault, stop, wait :)
13:21 djfroofy: lifeless: thanks for the sanity check. the idea was to do optimization on mp upload, mmap the different parts of the file, rather than reading in 5MB chunks at a time
13:22 djfroofy: any other ideas that don't involve using mmap?
13:22 lifeless: well, what are you trying to optimise?
13:22 lifeless: not 'what it does', but 'what part of the thing'
13:22 ! Vertel [~<email address hidden>] has quit [Ping timeout: 245 seconds]
13:22 lifeless: speed, memory, cpu, robustness, ...
13:22 djfroofy: so for uploading a 20GB file for example, not loading that into memory buffers
13:22 djfroofy: memory + speed
13:23 lifeless: probably you want to contribute support for multipart upload
13:23 djfroofy: lifeless: yes
13:23 lifeless: then pick a memory size allowance and work on that size chunks
13:23 djfroofy: lifeless: i have
13:23 djfroofy: contributed support for mp that is
13:23 lifeless: cool
13:23 lifeless: yeah, I saw something go by
13:24 djfroofy: inchoate as it is: https://code.launchpad.net/txaws
13:24 lifeless: from there, just pick a decent size - e.g. 1MB, and work in that size chunks
13:24 djfroofy: lifeless: from my understanding the minimum for upload_part is 5MB except for the last part
13:25 ! antihero_ is now known as antihero
13:25 lifeless: your system page cache will behave approximately the same as if you mmapped, and your process with however many chunks you allow to be inflight at once held in memory, will be your total memory pressure
13:25 lifeless: (oh, and the tcp socket)
13:26 djfroofy: lifeless: right, so without mmap (which is bad for the good reasons you gave me), i assume the right strategy is to throttle how many parts are held in memory
13:26 lifeless: right (you'd have to do that with mmap too BTW)
13:27 lifeless: because otherwise you're basically /asking/ for your process to be arbitrarily swapped out by the VM subsystem, and that works terribly for python programs
13:28 lifeless: for python to stay fast you want your total ...
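The core of the advice above, stripped of any txaws specifics: never materialise the whole file; pick a memory allowance and stream parts of that size. A plain-Python sketch (here `upload_part` is a hypothetical stand-in for the real S3 call, and `PART_SIZE` reflects the 5MB minimum mentioned in the transcript):

```python
import io

PART_SIZE = 5 * 1024 * 1024  # S3 minimum part size, except for the last part


def upload_in_parts(fileobj, upload_part, part_size=PART_SIZE):
    """Stream `fileobj` to `upload_part(part_number, data)` one part at a
    time, so resident file data never exceeds `part_size` bytes."""
    part_number = 1
    while True:
        chunk = fileobj.read(part_size)
        if not chunk:
            break
        upload_part(part_number, chunk)  # only this one chunk is in memory
        part_number += 1
    return part_number - 1  # number of parts sent
```

With a throttle such as the one the bug asks for, several of these parts could be in flight concurrently while the total stays bounded; this serial version is the degenerate case of a one-part allowance.

```python
sent = []
total = upload_in_parts(io.BytesIO(b"a" * 10),
                        lambda num, data: sent.append((num, len(data))),
                        part_size=4)
```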