Discussion:
Corrupted Packet Nano-X
Greg Haerr
2007-09-12 06:18:39 UTC
Also, are you running a multithreaded application?
The Problem appears, when I am trying to do a lot of "focus in" and "focus
out". In fact, when I am switching two bitmaps, after a couple of times, it
is getting very slow until the "out of memory"/ "corrupted packet" message
appears...

The only easy fix for this issue is to restrict every thread
other than the original main thread to void GrXXX functions
(typically the draw functions), and to allow ONLY
the main thread to execute non-void functions or any function that
could require a wait and/or a read from the server. In this way,
the THREADSAFE option protects the multiple threads doing
write-only client->server operations from stepping on each
other in the middle of a request, but the server->client
communication is read and processed only by a single
thread, the main thread.
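
One way to enforce this rule inside an application is a small guard that remembers which thread is the main thread and refuses round-trip calls from any other. The following is only a minimal sketch using POSIX threads; remember_main_thread, may_wait_for_reply and check_from_worker are hypothetical helper names and are not part of the Nano-X API.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical helpers; nothing here belongs to the Nano-X API. */
static pthread_t main_thread_id;
static int main_thread_known = 0;

/* Call once from the main thread, before any worker thread is started. */
void remember_main_thread(void)
{
    main_thread_id = pthread_self();
    main_thread_known = 1;
}

/* Returns 1 if the calling thread is the recorded main thread, that is,
 * if it may issue a call that waits for a reply from the server. */
int may_wait_for_reply(void)
{
    assert(main_thread_known);
    return pthread_equal(pthread_self(), main_thread_id) != 0;
}

/* Thread entry point that stores the guard's answer for a worker thread. */
static void *probe_worker(void *result)
{
    *(int *)result = may_wait_for_reply();
    return NULL;
}

/* Runs the guard from a freshly created worker thread.
 * Returns the worker's answer, or -1 if the thread could not be created. */
int check_from_worker(void)
{
    pthread_t tid;
    int result = -1;

    if (pthread_create(&tid, NULL, probe_worker, &result) != 0)
        return -1;
    pthread_join(tid, NULL);
    return result;
}
```

An application would call may_wait_for_reply() before any non-void GrXXX call and hand the work over to the main thread when it returns 0.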

BTW, the reason this can't be fixed given the current
protocol specification is that there isn't a standard-length
reply from the server, and there's only a basic queuing
mechanism in the client library. This means that any thread
reading the server pipe can't know how many bytes to
read, and thus may get interrupted and task switched
while in the middle of reading data from the server.
The next thread wakes up, does a read, and gets
unexpected crap from the middle of the previous
thread's response packet.
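
To illustrate what a standard-length reply header would buy, here is a minimal sketch of length-prefixed framing over a pipe. The 4-byte length header and the names send_reply, recv_reply and framing_demo are assumptions made for this example; this is not the Nano-X wire format. With a known length, the reader can loop until the whole reply has arrived instead of guessing how many bytes to read.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical framing (NOT the Nano-X wire format): a 4-byte length in
 * host byte order, followed by that many payload bytes. */

/* Read exactly len bytes, retrying after short reads and EINTR.
 * Returns 0 on success, -1 on EOF or error. */
static int read_exact(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Write exactly len bytes. Returns 0 on success, -1 on error. */
static int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send one framed reply: the length header first, then the payload. */
int send_reply(int fd, const void *payload, uint32_t len)
{
    if (write_all(fd, &len, sizeof len) != 0)
        return -1;
    return write_all(fd, payload, len);
}

/* Receive one framed reply into buf (capacity cap).
 * Returns the payload length, or -1 on error or oversized reply. */
long recv_reply(int fd, void *buf, size_t cap)
{
    uint32_t len;
    if (read_exact(fd, &len, sizeof len) != 0 || len > cap)
        return -1;
    if (read_exact(fd, buf, len) != 0)
        return -1;
    return (long)len;
}

/* Push two replies of different sizes through a pipe and read them back.
 * Returns 1 if both arrive intact, 0 otherwise. */
int framing_demo(void)
{
    int fds[2], ok;
    char first[32], second[32];

    if (pipe(fds) != 0)
        return 0;
    ok = send_reply(fds[1], "ok", 2) == 0
      && send_reply(fds[1], "window-info", 11) == 0
      && recv_reply(fds[0], first, sizeof first) == 2
      && memcmp(first, "ok", 2) == 0
      && recv_reply(fds[0], second, sizeof second) == 11
      && memcmp(second, "window-info", 11) == 0;
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

Framing alone does not make the reads thread-safe: a reader would still need to hold a lock from the header read through the end of the payload so that another thread cannot consume part of the reply.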

Regards,

Greg
Detzner, Peter
2007-09-12 06:50:59 UTC
-----Original Message-----
From: Greg Haerr [mailto:greg-o/g02q+***@public.gmane.org]
Sent: Wednesday, 12 September 2007 08:19
To: Detzner, Peter
Cc: Nanogui List
Subject: Re: [nanogui] Corrupted Packet Nano-X
Also, are you running a multithreaded application?
The Problem appears, when I am trying to do a lot of "focus in" and "focus out". In fact, when I am switching two bitmaps, after a couple of times, it is getting very slow until the "out of memory"/ "corrupted packet" message appears...

The only easy fix to this issue will be to only allow additional threads other than the main original thread to execute void GrXXX functions (that is, typically draw functions) only, and allow ONLY the main thread to execute non-void functions or any function that could require a wait and/or a read from the server. In this way, the THREADSAFE option protects the multiple threads doing write-only client->server operations from stepping on each other in the middle of a request, but the server->client communication is read and processed only by a single thread, the main thread.

BTW, the reason this can't be fixed given the current protocol specification is that there isn't a standard-length reply from the server, and there's only a basic queuing mechanism in the client library. This means that any thread reading the server pipe can't know how many bytes to read, and thus may get interrupted and task switched while in the middle of reading data from the server.
The next thread wakes up, does a read, and gets unexpected crap from the middle of the previous thread's response packet.

Regards,

Greg
Greg Haerr
2007-09-12 20:24:25 UTC
: I'm wondering if a simpler more appropriate fix would be to put a lock
: on the client read side for server responses. I'll cook something up
: and see how it works. otherwise I'll dust off our big lock code. (if
: anybody's interested, I'll post it to the list. it is GNU-specific.)

Yes, I'd like to see that code. (actually you may have sent
it some time ago, this sounds familiar). However, since
THREADSAFE wraps all Gr functions, what's the difference
with your approach?

The current THREADSAFE implementation uses the same
lock around all Gr calls, including client server read calls,
so the above shouldn't be an issue, right?

Regards,

Greg
Aaron J. Grier
2007-09-18 18:09:24 UTC
Post by Greg Haerr
I'm wondering if a simpler more appropriate fix would be to put a
lock on the client read side for server responses. I'll cook
something up and see how it works. otherwise I'll dust off our big
lock code. (if anybody's interested, I'll post it to the list. it
is GNU-specific.)
Yes, I'd like to see that code. (actually you may have sent it some
time ago, this sounds familiar). However, since THREADSAFE wraps all
Gr functions, what's the difference with your approach?
The current THREADSAFE implementation uses the same lock around all Gr
calls, including client server read calls, so the above shouldn't be
an issue, right?
after studying the code for a couple days, you're correct... my old
big-lock code (attached) should be equivalent to the use of SERVER_LOCK
with LINK_APP_INTO_SERVER.
Post by Greg Haerr
The corrupted packet has to do with a sync issue that arises when a
non-void GrXXX call reads the pipe to get its return data, but gets an
out-of-sync event from the server instead.
clients have to hold a lock to send a request to the server, and don't
release it until they read back any necessary response. the server is
single-threaded, and only handles one request at a time. what am I
missing?
--
Aaron J. Grier | Frye Electronics, Tigard, OR | aaron-***@public.gmane.org
Detzner, Peter
2007-09-12 06:50:44 UTC
Hey,

Nope, the patch doesn't fix the problem. Yes, it is a multithreaded system. The UPnP library creates a thread pool (2 <= threadspool <= 12).

So maybe the multithreading is the problem?

Regards,

Pete

-----Original Message-----
From: Greg Haerr [mailto:greg-o/g02q+***@public.gmane.org]
Sent: Monday, 10 September 2007 19:09
To: Detzner, Peter
Subject: Re: [nanogui] Corrupted Packet Nano-X

Peter -

Try adding this patch to nanox/client.c, and let me know whether this fixes the problem.

Also, are you running a multithreaded application?

Regards,

Greg


----- Original Message -----
From: "Detzner, Peter" <P.Detzner-***@public.gmane.org>
To: "Greg Haerr" <greg-o/g02q+***@public.gmane.org>
Sent: Monday, September 10, 2007 6:51 AM
Subject: AW: [nanogui] Corrupted Packet Nano-X



Hey,

I still have this problem... I've already changed MAXREQST in srvnet.c, but the problem is still there. I've attached 3 files; I guess that is enough to understand my program...

The problem appears when I try to do a lot of "focus in" and "focus out". In fact, when I switch between two bitmaps, after a couple of times it gets very slow until the "out of memory" / "corrupted packet" message appears...

Please help me; this is for my final dissertation and I have run out of ideas...




-----Original Message-----
From: Greg Haerr [mailto:greg-o/g02q+***@public.gmane.org]
Sent: Monday, 3 September 2007 19:02
To: Detzner, Peter; nanogui-***@public.gmane.org
Subject: Re: [nanogui] Corrupted Packet Nano-X
nxclient 548: Corrupted packet
Do you have an idea, why this happenes? After loading an Image from Buffer, I take care, that freeImage is also executed as the next step - of course after "drawImageToFit(...)"...

This seems to be something to do with the maximum request size packet overflowing a server buffer.
Grep the headers for a MAXREQSZ define or something like that and increase it. The system is supposed to break down images into smaller pieces but may not be for the GrDrawImageToFit function you're using.

The overflow buffer is in nanox/srvnet.c::GsHandleClient() IIRC.

Regards,

Greg
Detzner, Peter
2007-09-12 06:50:56 UTC
-----Original Message-----
From: Greg Haerr [mailto:greg-o/g02q+***@public.gmane.org]
Sent: Wednesday, 12 September 2007 08:11
To: Detzner, Peter
Cc: Nanogui List
Subject: Re: [nanogui] Corrupted Packet Nano-X
Post by Detzner, Peter
So maybe the multithreading is the problem?
I should have asked this in the beginning; it's definitely the problem.
Despite having THREADSAFE=Y, if more than one thread makes a request with a non-void GrXXX function (that is, one that requires a response from the server), then the client/server interaction on the single pipe to the application gets out of sync, and the "corrupted packet" message is generated.
This is because two threads have attempted to read or write the pipe at the same time, and junk gets written in the middle of a packet.

The THREADSAFE option uses mutexes to protect against a task switch between two writers, but can't protect against a thread trying to read a response while another, usually the main thread, is in GrGetNextEvent.

Regards,

Greg

PS: please post responses to the list
Aaron J. Grier
2007-09-12 18:44:37 UTC
Post by Greg Haerr
The only easy fix to this issue will be to only allow additional
threads other than the main original thread to execute void GrXXX
functions (that is, typically draw functions) only, and allow ONLY the
main thread to execute non-void functions or any function that could
require a wait and/or a read from the server. In this way, the
THREADSAFE option protects the multiple threads doing write-only
client->server operations from stepping on each other in the middle of
a request, but the server->client communication is read and processed
only by a single thread, the main thread.
BTW, the reason this can't be fixed given the current protocol
specification is that there isn't a standard-length reply from the
server, and there's only a basic queuing mechanism in the client
library. This means that any thread reading the server pipe can't
know how many bytes to read, and thus may get interrupted and task
switched while in the middle of reading data from the server. The
next thread wakes up, does a read, and gets unexpected crap from the
middle of the previous thread's response packet.
I've also run into this problem since trying our app with client/server.

I have previously been using a "big lock" approach with our
multithreaded application, replacing _all_ nano-X calls with mutex
wrappers via link-time magic. (it was implemented a couple years before
the THREADSAFE option appeared in nano-X.) the big lock has proven
reliable (we have shipped hundreds of instruments since late 2003 and
never run into this problem) but it does mean there is some risk of
denial-of-service / priority inversion since a lower priority thread
could potentially starve out a higher one by making repeated graphics
calls.
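
As a rough illustration of the big-lock idea, the sketch below funnels every call through one process-wide mutex. It calls the wrapper explicitly rather than substituting it at link time, and it is not the code referred to above: unsafe_draw is a stand-in for an unprotected graphics call, and locked_draw and run_workers are hypothetical names.

```c
#include <pthread.h>

/* One process-wide mutex that serializes every wrapped call. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for an unprotected graphics call: it updates shared state
 * without any locking of its own, so concurrent callers would race. */
static long calls_seen = 0;
static void unsafe_draw(void)
{
    calls_seen++;
}

/* Big-lock wrapper: take the lock, make the call, release the lock. */
void locked_draw(void)
{
    pthread_mutex_lock(&big_lock);
    unsafe_draw();
    pthread_mutex_unlock(&big_lock);
}

#define MAX_THREADS 16
#define CALLS_PER_THREAD 10000

/* Worker thread body: make a fixed number of wrapped calls. */
static void *draw_worker(void *unused)
{
    (void)unused;
    for (int i = 0; i < CALLS_PER_THREAD; i++)
        locked_draw();
    return NULL;
}

/* Start nthreads workers, wait for them, and return the number of calls
 * recorded, or -1 if the threads could not all be started. */
long run_workers(int nthreads)
{
    pthread_t tids[MAX_THREADS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    calls_seen = 0;
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, draw_worker, NULL) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started == nthreads ? calls_seen : -1;
}
```

Because every caller queues on the same mutex, a low-priority thread that calls locked_draw() in a tight loop can hold up a high-priority one, which is the starvation risk described above.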

I'm wondering if a simpler more appropriate fix would be to put a lock
on the client read side for server responses. I'll cook something up
and see how it works. otherwise I'll dust off our big lock code. (if
anybody's interested, I'll post it to the list. it is GNU-specific.)
--
Aaron J. Grier | Frye Electronics, Tigard, OR | aaron-***@public.gmane.org
Greg Haerr
2007-09-12 06:11:23 UTC
Post by Detzner, Peter
So maybe the multithreading is the problem?
I should have asked this in the beginning; it's definitely the problem.
Despite having THREADSAFE=Y, if more than one thread makes
a request with a non-void GrXXX function (that is, one that
requires a response from the server), then the client/server
interaction on the single pipe to the application gets out of
sync, and the "corrupted packet" message is generated.
This is because two threads have attempted to read or
write the pipe at the same time, and junk gets written
in the middle of a packet.

The THREADSAFE option uses mutexes to protect
against a task switch between two writers, but
can't protect against a thread trying to read a response
while another, usually the main thread, is in GrGetNextEvent.

Regards,

Greg

PS: please post responses to the list
Detzner, Peter
2007-09-03 09:26:41 UTC
Hey,

I am using MicroWindows/Nano-X version 0.91. When I do some
work with bitmaps, my application is killed with this message:

nxclient: bad readblock -1, errno 104
nxclient 548: Corrupted packet
[1] + Killed nano-X

Do you have any idea why this happens? After loading an image from a
buffer, I make sure that freeImage is executed as the next step,
after "drawImageToFit(...)" of course...

Please help me...

Thanks,

pete
Greg Haerr
2007-09-03 17:02:01 UTC
nxclient 548: Corrupted packet
Do you have an idea, why this happenes? After loading an Image from
Buffer, I take care, that freeImage is also executed as the next step -
of course after "drawImageToFit(...)"...

This seems to have something to do with a maximum-size
request packet overflowing a server buffer.
Grep the headers for a MAXREQSZ define or
something like that and increase it. The system
is supposed to break down images into smaller
pieces but may not be doing so for the GrDrawImageToFit function
you're using.

The overflow buffer is in nanox/srvnet.c::GsHandleClient()
IIRC.

Regards,

Greg
nanogui-G/ASJRsgvFgvW+L+
2007-09-14 12:15:57 UTC
Hi Greg,
Post by Detzner, Peter
So maybe the multithreading is the problem?
I should have asked this in the beginning, its definitely the problem.
Despite having THREADSAFE=Y, if more than one makes
a request with a non-void GrXXX function (that is, one that
requires a response from the server), then the client/server
interaction on the single pipe to the application gets out of
sync, and the "corrupted packet" message is generated.
This is because two threads have attempted to read or
write the pipe at the same time, and junk gets written
in the middle of a packet.
Am I missing something?
How can two threads read or write the pipe at the same time? All GrXXX
functions are protected by the nxGlobalLock mutex, which would mean
that only one thread at a time can access the pipe: the one holding
the lock.
Post by Detzner, Peter
The THREADSAFE option puts mutex's to protect
against a task switch between two writers, but
can't protect against a thread trying to read a response
while another, usually the main thread, is in GrGetNextEvent.
I believe that only with my patch, which unlocks before select and
locks after select in _GrGetNextEventTimeout you can have a situation
like the one you describe.

Besides that if you apply my patch and call non void functions from
different threads you can have a deadlock because the read inside
ReadBlock is blocking. (You can make it non blocking, which might lead
to starvation;)
The following is the scenario:
1. GrGetNextEvent unlocks and calls select.
2. No event arrives.
3. Another thread can run and, since we are unlocked, does something
   that leads to a call to read.
4. That GrXXX call now holds the lock, and GrGetNextEvent would need the
   lock to process the event, so we are deadlocked.

With LINK_APP_INTO_SERVER everything seems to work, which suggests
that there is something wrong with the client server communication.

Do you think it would help to have different read and write file descriptors?

Regards,

Robert
Greg Haerr
2007-09-14 17:20:08 UTC
Post by nanogui-G/ASJRsgvFgvW+L+
How can two threads read or write the pipe at the same time? All GrXXX
functions are protected by the nxGlobalLock mutex, which would mean,
that only one thread at the time can access the pipe. The one holding
the lock.
That's correct, but it's more complicated than that. The corrupted
packet has to do with a sync issue that arises when a non-void
GrXXX call reads the pipe to get its return data, but gets an
out-of-sync event from the server instead.

The "threading" issue is probably more of a client/server
synchronisation issue, in that the current (and only) nano-X
protocol requires that client and server stay completely
in sync. The protocol was never designed
to allow for asynchronous event delivery etc., which requires
a lot more tricky code and queuing on both the client and
server sides.
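
To give an idea of the kind of client-side queuing this would involve, here is a minimal sketch in which a caller waiting for a reply sets aside any events that arrive first. The fixed-size tagged packet and the names wait_for_reply, next_queued_event and demux_demo are assumptions made for this example; the real nano-X packets are not laid out this way.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical fixed-size packet with a type tag (not the nano-X format). */
enum { PKT_EVENT = 1, PKT_REPLY = 2 };
struct packet {
    int32_t kind;   /* PKT_EVENT or PKT_REPLY */
    int32_t value;  /* payload */
};

#define QUEUE_MAX 16
static struct packet event_queue[QUEUE_MAX];
static int event_count = 0;

/* Read one whole packet, looping over short reads. Returns 0 or -1. */
static int read_packet(int fd, struct packet *p)
{
    unsigned char *dst = (unsigned char *)p;
    size_t left = sizeof *p;
    while (left > 0) {
        ssize_t n = read(fd, dst, left);
        if (n <= 0)
            return -1;
        dst += n;
        left -= (size_t)n;
    }
    return 0;
}

/* Wait for a reply, queuing any events that arrive ahead of it.
 * Returns 0 on success, -1 on read error, bad tag or full queue. */
int wait_for_reply(int fd, struct packet *reply)
{
    for (;;) {
        struct packet p;
        if (read_packet(fd, &p) != 0)
            return -1;
        if (p.kind == PKT_REPLY) {
            *reply = p;
            return 0;
        }
        if (p.kind != PKT_EVENT || event_count == QUEUE_MAX)
            return -1;
        event_queue[event_count++] = p;
    }
}

/* Hand out queued events in arrival order. Returns 1 if *ev was filled. */
int next_queued_event(struct packet *ev)
{
    if (event_count == 0)
        return 0;
    *ev = event_queue[0];
    event_count--;
    memmove(event_queue, event_queue + 1, event_count * sizeof event_queue[0]);
    return 1;
}

/* Two events arrive ahead of a reply; check that nothing is lost.
 * Returns 1 on success, 0 otherwise. */
int demux_demo(void)
{
    struct packet out[3] = { {PKT_EVENT, 7}, {PKT_EVENT, 8}, {PKT_REPLY, 42} };
    struct packet reply, ev;
    int fds[2], ok;

    if (pipe(fds) != 0)
        return 0;
    ok = write(fds[1], out, sizeof out) == (ssize_t)sizeof out
      && wait_for_reply(fds[0], &reply) == 0 && reply.value == 42
      && next_queued_event(&ev) == 1 && ev.value == 7
      && next_queued_event(&ev) == 1 && ev.value == 8
      && next_queued_event(&ev) == 0;
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

The event loop would then drain next_queued_event() before reading from the pipe again, so events set aside during a round trip are still delivered in order.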
Post by nanogui-G/ASJRsgvFgvW+L+
The THREADSAFE option puts mutex's to protect
against a task switch between two writers, but
can't protect against a thread trying to read a response
while another, usually the main thread, is in GrGetNextEvent.
I believe that only with my patch, which unlocks before select and
locks after select in _GrGetNextEventTimeout you can have a situation
like the one you describe.

I hadn't known you were running your patched code for threading
and now getting corrupted packet errors. I can't comment on
that without heavily studying your patch.
Post by nanogui-G/ASJRsgvFgvW+L+
With LINK_APP_INTO_SERVER everything seems to work, which looks like
there is something wrong with the client server communication.

Agreed.
Post by nanogui-G/ASJRsgvFgvW+L+
Do you think it would help to have different read and write file descriptors?
No

Regards,

Greg
