[haXe] Working with very large strings and remoting messages

Nicolas Cannasse ncannasse at motion-twin.com
Thu Dec 6 10:06:04 CET 2007


James W. Hofmann wrote:
> Yesterday I worked on two very important pieces of my game and found
> some quirks of haXe and Neko that might interest everyone.
[...]
>     static public function main()
>     {
>         var f = neko.io.File.read("sortednumbers",false);
>         var rawstr = f.readAll();
>         f.close();
>         var data = rawstr.split(" ");
>         var newdata : Array<Int> = new Array();
>         for (n in data)
>             newdata.push(Std.parseInt(n));
>         trace(newdata[getval]);
>     }

"readAll" is not a good choice here, because it is meant to read from any
stream of unknown size. If you want the full content of a big file, the
correct way is simply to use:

var data = neko.io.File.getContent("sortednumbers");

That function is properly optimized for this case.

Also, to get an optimized Array.push (which the String.split
implementation uses as well), you'll need haXe 1.16, which was
optimized for that usage.

> There's probably a way to make Neko faster too, but I decided that for
> my server I could load the whole array into memory at startup. Neko
> takes 26 MB to store this 6 MB file (and whatever overhead for the
> interpreter code and standard libraries), since, as Nicolas has noted,
> it's trading memory for speed. This wouldn't work if I had to deal with
> very large datasets.

The problem here is the GC: in your code, for example, at one point you
have three large values alive on the stack at the same time:

a) the complete file string (6 MB)
b) the split string array (6 MB of string data + 4 MB of array storage for 1M entries)
c) the array converted to integers (4 MB of array storage for 1M entries)

That's a total of around 20 MB for this part alone.

To optimize the whole thing, you should consider storing your file
directly in binary form (using f.writeInt32 / f.readInt32). That removes
the need to split the string and parse integers at all, which uses a lot
of CPU on large files, whatever the programming language.
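A minimal sketch of that binary layout, assuming the current Haxe standard library (haxe.io.BytesOutput / haxe.io.BytesInput) rather than the haXe 1.x neko.io API discussed above, so names may differ on old versions. The count header and the class name BinaryInts are hypothetical choices for illustration. It encodes to an in-memory Bytes buffer so it runs anywhere; for a real file, sys.io.File.write(path, true) and sys.io.File.read(path, true) return streams with the same writeInt32 / readInt32 methods.

```haxe
import haxe.io.Bytes;
import haxe.io.BytesInput;
import haxe.io.BytesOutput;

// Hypothetical helper: illustrates a binary int layout, not an existing API.
class BinaryInts {
    // Writes a 32-bit count header, then one 32-bit integer per value.
    public static function encode(values:Array<Int>):Bytes {
        var out = new BytesOutput();
        out.writeInt32(values.length);
        for (v in values)
            out.writeInt32(v);
        return out.getBytes();
    }

    // Reads the count header, then exactly that many integers:
    // no string splitting and no Std.parseInt calls.
    public static function decode(bytes:Bytes):Array<Int> {
        var input = new BytesInput(bytes);
        var count = input.readInt32();
        var result = new Array<Int>();
        for (i in 0...count)
            result.push(input.readInt32());
        return result;
    }

    static function main() {
        var bytes = encode([1, 5, 42, 1000000]);
        trace(bytes.length); // 20: a 4-byte header plus 4 values of 4 bytes each
        trace(decode(bytes));
    }
}
```

Each value costs a fixed 4 bytes, and loading is a tight readInt32 loop with no intermediate file string or split array kept alive.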

> The second quirk, which has to do with remoting, comes in transferring my
> serialized game recordings between the server and clients. (I was using
> raw sockets initially but they have a lot of unreliability and quirks
> that I decided I didn't want to have to deal with.) I discovered that I
> couldn't just dump the entire file in one message; the remoting code
> wasn't made to deal with this and would throw errors when it saw that it
> had to deal with 30kb+ of data.
> 
> The obvious solution is to split it into many messages, and so I did
> that. But again I ran into a problem, as I initially had the code that
> was uploading fire off new messages as fast as possible. This tested OK
> going from Flash -> Neko, but when I did it the other way Flash would
> mysteriously disconnect in the middle. Very frustrating.

> The final implementation has both sides send messages: the receiving
> side sends back a "continueupload" message each time it gets new data.
> This seems to work pretty well. I haven't figured out yet the optimal
> size for my messages, though.

This is a common problem in network programming: you have an outgoing
buffer, and once it is completely full you can't send any more messages.
A good solution is to send a batch of messages (for example 100 of
them), then wait for the client to ask for the next 100 once it has read
enough of them (for example 50). Doing it one message at a time would be
much too slow.
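The scheme above is described only in prose, so here is a hypothetical in-memory simulation in Haxe, with no real remoting or sockets; the names simulate, batchSize and lowWater are illustrative. It reads "ask for the next 100 once it has read 50" as a low-water mark: the receiver requests one new batch when exactly lowWater unread messages remain, and the sender is assumed to answer instantly.

```haxe
// Hypothetical simulation of batched flow control; not a remoting API.
class BatchedTransfer {
    // `sent` counts messages pushed onto the simulated connection,
    // `read` counts messages the receiver has consumed.
    // Assumes 0 <= lowWater < batchSize.
    public static function simulate(total:Int, batchSize:Int, lowWater:Int) {
        var sent = 0;
        var read = 0;
        var requests = 0;
        var maxInFlight = 0;

        // Sender side: push up to batchSize more messages.
        function sendBatch() {
            sent += batchSize;
            if (sent > total) sent = total;
            if (sent - read > maxInFlight) maxInFlight = sent - read;
        }

        sendBatch(); // the first batch goes out without being asked for
        while (read < sent) {
            read++; // the receiver consumes one message
            // Receiver side: when only lowWater unread messages remain,
            // ask for the next batch (one request per batch, not per message).
            if (sent - read == lowWater && sent < total) {
                requests++;
                sendBatch();
            }
        }
        return {read: read, requests: requests, maxInFlight: maxInFlight};
    }

    static function main() {
        var r = simulate(1000, 100, 50);
        trace(r.read);        // 1000
        trace(r.requests);    // 9
        trace(r.maxInFlight); // 150
    }
}
```

With 1000 messages, batches of 100 and a low-water mark of 50, the receiver sends 9 requests instead of one acknowledgement per message, and at most lowWater + batchSize = 150 messages are ever unread, so the outgoing buffer stays bounded.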

Hope that helps,

Nicolas