Discussion:
GUI hangs inside main event loop - I stop getting events!
nick aschberger
15 years ago
Hi Folks,

Details are:

WX : 2.8.10
OS : Linux Centos-5 (Red hat 5)
COMPILER: gcc 4.1.2


I have a multithreaded application: there is the main thread, and a simple
wxThread whose job is to block on a socket, take data off the socket,
and buffer it.
Periodically, the worker thread posts events (wxPostEvent) to the main thread
to tell it what is going on, then waits on a condition variable until it is
notified that it may proceed. All standard threading behaviour.
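
For readers following along, the post-then-wait handshake described above can be sketched with the standard library alone. This is only an illustration under assumptions: `EventQueue`, `ResumeGate`, and `run_handshake` are hypothetical names, `EventQueue::post` merely stands in for wxPostEvent, and none of it is the poster's actual code.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the GUI event queue that wxPostEvent feeds.
class EventQueue {
public:
    void post(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(msg));
        }
        cv_.notify_one();
    }
    // Blocks until an event is available, then removes and returns it.
    std::string wait_and_pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        std::string msg = std::move(q_.front());
        q_.pop();
        return msg;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
};

// Hypothetical gate the worker blocks on until the main thread releases it.
class ResumeGate {
public:
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return open_; });  // predicate guards against lost wakeups
        open_ = false;
    }
    void release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            open_ = true;
        }
        cv_.notify_one();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool open_ = false;
};

// Runs `rounds` post/wait cycles; returns the events the "main thread" saw, in order.
std::vector<std::string> run_handshake(int rounds) {
    assert(rounds >= 0);
    EventQueue events;
    ResumeGate gate;
    std::thread worker([&] {
        for (int i = 0; i < rounds; ++i) {
            events.post("chunk " + std::to_string(i));  // tell main what happened
            gate.wait();                                // wait for permission to go on
        }
    });
    std::vector<std::string> seen;
    for (int i = 0; i < rounds; ++i) {
        seen.push_back(events.wait_and_pop());
        gate.release();
    }
    worker.join();
    return seen;
}
```

The flag-plus-predicate form of the gate matters: if the main thread releases before the worker starts waiting, a bare notify would be lost and the worker would block forever.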

This works swimmingly well in my Windows build of the application, and works
99% of the time on my Linux build. However, on Linux the main thread seems
to occasionally just get "lost".
I stop getting events: there is no UpdateUI event, no refreshing, and when
the worker thread sends its event to the main GUI thread, this is not
responded to either.

The application becomes unresponsive: the menus are "blanked out" and
can't be clicked on, etc. The main thread is stuck!
Oddly, a "CloseEvent" does fire (on "x" click), but the "exit - yes/no"
dialog I pop up is also locked up/blank.


Here is a backtrace of where the app is:

0x003da402 in __kernel_vsyscall ()
(gdb) bt
#0 0x003da402 in __kernel_vsyscall ()
#1 0x00e0ffeb in poll () from /lib/libc.so.6
#2 0x081486db in wxapp_poll_func (ufds=0x9cdc928, nfds=2, timeout=-1)
at ./src/gtk/app.cpp:259
#3 0x00a1b1c9 in ?? () from /lib/libglib-2.0.so.0
#4 0x00a1b557 in g_main_loop_run () from /lib/libglib-2.0.so.0
#5 0x02d9db84 in gtk_main () from /usr/lib/libgtk-x11-2.0.so.0
#6 0x08218127 in wxEventLoop::Run (this=0x9e19058)
at ./src/gtk/evtloop.cpp:76
#7 0x08191fe6 in wxAppBase::MainLoop (this=0x9cb5f88)
at ./src/common/appcmn.cpp:312
#8 0x08191a23 in wxAppBase::OnRun (this=0x9cb5f88)
at ./src/common/appcmn.cpp:367
#9 0x082bc0a8 in wxEntry (argc=@0xbfcb2c00, argv=0xbfcb2c84)
at ./src/common/init.cpp:460
#10 0x08083380 in main (argc=Cannot access memory at address 0x2
) at ../../src/MyAppMain.cpp:42


If I step it, I can never get it to return from "g_main_loop_run()".
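
A note on reading the backtrace: frame #2 shows poll() called with timeout=-1 on two descriptors, which means the call blocks until one of those descriptors becomes ready. The sketch below shows that behaviour in isolation on a POSIX system. `poll_wakes_on_write` is a hypothetical helper and the pipe is only a stand-in for whatever descriptors the GTK main loop watches; it does not diagnose the hang itself.

```cpp
#include <cassert>
#include <chrono>
#include <poll.h>
#include <thread>
#include <unistd.h>

// Returns true if poll() with an infinite timeout woke up because another
// thread wrote a byte to the pipe being watched.
bool poll_wakes_on_write() {
    int fds[2];
    if (pipe(fds) != 0) return false;
    assert(fds[0] >= 0 && fds[1] >= 0);

    std::thread waker([&] {
        // Delay so the main thread is (very likely) already blocked in poll().
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        const char byte = 'x';
        ssize_t n = write(fds[1], &byte, 1);
        (void)n;
    });

    pollfd pfd{};
    pfd.fd = fds[0];
    pfd.events = POLLIN;
    int ready = poll(&pfd, 1, -1);  // -1: no timeout, wait until readable

    waker.join();
    bool woke = (ready == 1) && (pfd.revents & POLLIN);
    close(fds[0]);
    close(fds[1]);
    return woke;
}
```

If nothing ever writes to a watched descriptor, a call like this never returns, which matches being unable to step out of g_main_loop_run().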

Has anyone got any suggestions at all for this? I'm stumped, and it's a
showstopper for me.

cheers

Nick Aschberger
Andreas Mohr
15 years ago
Hi,
Post by nick aschberger
Hi Folks,
WX : 2.8.10
OS : Linux Centos-5 (Red hat 5)
COMPILER: gcc 4.1.2
Pretty similar setup here (RHEL 5, 2.8.10, AFAIK 4.1.2). Works (at least
sort of) - I don't know of any such main loop lockup issues here.
...
strace -f -p ... might help by comparing poll() / select() behaviour
before/after it gets stuck (number of descriptors it's waiting on, etc.).
Also see /proc/[PID]/fd[info] for details of which descriptors it's
waiting on. Or maybe lsof, fuser etc. could help here too.

Also, definitely use valgrind/helgrind to pinpoint issues.

Note that the "gtk file chooser" is completely broken from a thread-safety
POV (as seen in Helgrind), you need to switch the backend from (IIRC) gtk+
to something else to avoid race conditions, see internet.
(if you use wxFileDialog, this might be a problem)

And in general my opinion of RHEL5-based systems is not very high,
IMHO that version is even rougher/more disappointing than previous versions.
(Debian user here)

HTH,

Andreas Mohr
Nick Aschberger
15 years ago
Thanks Andreas,

Some more information:
- I can confirm that the problem shows up on a CentOS 4 (Red Hat 4)
machine as well
- Also, the app still gets move/paint events; it's just that the GUI
is locked up, menus unresponsive, etc.

I notice this in the install-gtk.txt file:

"wxWidgets/Gtk requires a thread library and X libraries known to work with
threads. This is the case on all commercial Unix-Variants and all
Linux-Versions that are based on glibc 2 except RedHat 5.0 which is broken in
many aspects. As of writing this, virtually all Linux distributions have
correct glibc 2 support."

So... that's my exact OS. That's not good news for my threaded app,
where the worker thread posts events to the main GUI thread, is it?

Can anyone confirm what the symptoms of this "GTK broken-ness" are?

cheers

Nick
...
Andreas Mohr
15 years ago
Post by Nick Aschberger
Thanks Andreas,
- I can confirm that the problem shows up on a centos 4 (redhat4)
machine as well
Good data point.
Post by Nick Aschberger
- Also, the app still gets move/paint events; it's just that the GUI
is locked up, menus unresponsive, etc.
You don't do any GUI in secondary threads, right?
(oh wait, yes perhaps you do; this sounds pretty similar to symptoms
I had in such cases after all)
If you do, then there should definitely be ::wxMutexGuiEnter() /
::wxMutexGuiLeave() calls around your secondary-thread GUI use.
Note that _even_ a simple wxBell() call can be considered "GUI"
since it also uses these mechanisms (X11 access?).
(guess why I'm mentioning this...)
Post by Nick Aschberger
So... that's my exact OS. That's not good news for my threaded app,
when the worker thread posts events to the main GUI thread is it?
OK, so this seems to indicate that you restricted GUI operations to
primary-thread-only (as is much healthier after all).
Still, watch out for other calls (wxBell() and similar) in your
secondary thread which straddle GUI fences.
A good way might be to watch a backtrace as to what characterizes a
"GUI access" and try to trap your secondary thread in the debugger
on such an access (not sure how exactly to do that though).
Post by Nick Aschberger
Can anyone confirm what the symptoms of this "GTK broken-ness" are?
That "gtk file chooser" issue I mentioned was a pretty specific issue
only; still it did cause recognizable damage here which went away
once disabled - but AFAIR it did not cause full lockups of the entire
event loop, but simply MALLOC_CHECK_ asserts (or maybe since it always
trapped on memory corruption asserts very early for me, it perhaps simply
never managed to go all the way to causing event loop lockups).

BTW, my related code/doc (unfinished) is:

#warning "WARNING!! gnome-vfs GTK file chooser backend causes heavy memory corruption (thread-safety issues), need to add code to deselect it on runtime, actively"
#if 0 // DO IT LATER!!
// See http://library.gnome.org/devel/gtk/2.15/GtkFileChooser.html
// and "Application Segfaults on wxFileDialog",
// http://trac.wxwidgets.org/ticket/2787
// Do something like this:
// - add "fetch GTK version" helper
// - check for affected GTK version range
// - call g_object_get_property()
// - if gnome-vfs detected, replace with gtk+
// - be happy
#ifdef __WXGTK__
#include <gtk/gtk.h>
// Untested sketch: ask the chooser to use the plain "gtk+" backend.
// NOTE: "file-system-backend" may be a construct-only property, in which
// case it cannot be changed on an already-created dialog like this.
static void gtk_file_chooser_sanitize(const wxFileDialog *pWxDlg)
{
    GValue gvalue = { 0, };
    g_value_init( &gvalue, G_TYPE_STRING );
    // Select "gtk+" rather than the problematic "gnome-vfs" backend.
    g_value_set_string( &gvalue, "gtk+" );
    g_object_set_property( G_OBJECT(pWxDlg->GetHandle()),
                           "file-system-backend", &gvalue );
    g_value_unset( &gvalue );
}
#endif // __WXGTK__
#endif

Andreas Mohr
Nick Aschberger
15 years ago
Thanks Again,
...
I don't... I have tried to ensure I safely "wxPostEvent" events from
the worker thread for any meaningful communication.
I don't make any direct GUI fn calls, but - the worker thread does
update some global data that the main thread uses in "updateUI events"
to update the window title, status bar, etc.
Does that qualify?

Helgrind complained about that - I will eliminate it later, but I
don't expect it's a problem.

Maybe it is a problem?
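
On the shared-data question: a worker writing data that the main thread reads without any synchronization is a data race, which is presumably what helgrind was flagging. A minimal sketch of one way to guard it follows, using the standard library so it stands alone. `SharedStatus` and `run_status_demo` are hypothetical names; in a wx 2.8 application wxMutex and wxMutexLocker would play the roles of std::mutex and std::lock_guard.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical holder for data the worker writes and the UI handler reads.
class SharedStatus {
public:
    struct Snapshot {
        std::string title;
        long bytes;
    };

    // Called from the worker thread.
    void set(const std::string& title, long bytes) {
        assert(bytes >= 0);
        std::lock_guard<std::mutex> lock(m_);
        title_ = title;
        bytes_ = bytes;
    }

    // Called from the main thread; returns a private copy so the caller
    // never touches the shared fields without holding the lock.
    Snapshot snapshot() const {
        std::lock_guard<std::mutex> lock(m_);
        return Snapshot{title_, bytes_};
    }

private:
    mutable std::mutex m_;
    std::string title_;
    long bytes_ = 0;
};

// Worker updates the status while the main thread reads it concurrently;
// returns the final state once the worker has finished.
SharedStatus::Snapshot run_status_demo() {
    SharedStatus status;
    std::thread worker([&] {
        for (long i = 1; i <= 1000; ++i) status.set("received", i);
    });
    for (int i = 0; i < 100; ++i) {
        SharedStatus::Snapshot s = status.snapshot();  // e.g. used to set a title
        (void)s;
    }
    worker.join();
    return status.snapshot();
}
```

Handing out a copy rather than a reference keeps the locked region short and means the UI code cannot accidentally read a half-updated value.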
...
Unfortunately I'm not using that particular widget at all. :)
Also, I'm not sure things are completely locked up. I don't get the
custom events that I have posted back from the worker thread,
and the GUI is locked up (menus blank and grey, etc.), but move events
still come through; I can see it in the debugger.
Very strange.

cheers

Nick
Andreas Mohr
15 years ago
...
At this point I'm now merely able to say "perhaps".
But due to wxString objects in events, one should now use wxQueueEvent
(see previous discussions or maybe source).
Post by Nick Aschberger
Helgrind complained about that - I will eliminate it later, but I
don't expect it's a problem.
Maybe it is a problem?
It might still be corruption via secondary-thread activity,
but maybe the primary thread itself has erroneous overflows
which cause corruption of display-related internals. Who knows...
Post by Nick Aschberger
Unfortunately I'm not using that particular widget at all. :)
Any file open/save dialog activity would do that - but I'm not entirely
sure whether only wxFileDialog pops up such dialogs.
Post by Nick Aschberger
Also, I'm not sure things are completely locked up - I don't get my
custom events firing that I have posted back from the worker thread,
and the GUI is locked up (menus blank and grey, etc) but move events
still come through, I can see it in the debugger.
Very strange.
If your socket communication is wxSocket-based, then ponder using
CURL instead. wxSocket was/is iffy (on 2 of 3 platforms).
(see previous discussions)

Andreas Mohr
Nick Aschberger
15 years ago
Hi Again,
...
Ok, I will mutex all that, and see if it helps.
...
I am using wx sockets - what was iffy about wxSocket? I have currently
had no socket problems, unless it's causing this error. :)

cheers

Nick
Andreas Mohr
15 years ago
Hi,
Post by Nick Aschberger
Hi Again,
Post by Andreas Mohr
It might still be corruption via secondary-thread activity,
but maybe the primary thread itself has erroneous overflows
which cause corruption of display-related internals. Who knows...
Ok, I will mutex all that, and see if it helps.
And avoid inter-thread object data sharing
(assign objects newly-created via their character buffers,
not direct assignment with internal data objects).
...locking isn't all that counts, unfortunately ;)
(otherwise our daily work would be awfully boring :-P)
Post by Nick Aschberger
I am using wx sockets - what was iffy about wxSocket? I have currently
had no socket problems, unless it's causing this error. :)
Both Linux and Mac had/ve(?) issues such as thread-safety.

Andreas Mohr
Nick Aschberger
15 years ago
Hi Again,
Post by Andreas Mohr
Post by Nick Aschberger
Ok, I will mutex all that, and see if it helps.
And avoid inter-thread object data sharing
(assign objects newly-created via their character buffers,
not direct assignment with internal data objects).
...locking isn't all that counts, unfortunately ;)
(otherwise our daily work would be awfully boring :-P)
What do you mean by this?

Don't use:
wxString mystring = someotherstring;

use:

wxString mystring = someotherstring.c_str();

or:

wxString mystring = wxString(someotherstring.c_str());

or something else?
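
Some background that may help with this question: as far as I understand, wxString in the 2.8 series is reference-counted, so a plain copy shares the source's buffer, while constructing from c_str() allocates a fresh one. The toy class below only models that sharing so the difference is visible. `CowString` is a hypothetical stand-in and not wxString; its std::shared_ptr count is thread-safe, whereas the concern raised in the thread is precisely that the real reference count may not be.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical toy string whose copies share one buffer (NOT wxString itself).
class CowString {
public:
    // Building from a character buffer always allocates a new buffer.
    CowString(const char* s) : buf_(std::make_shared<std::string>(s)) {
        assert(s != nullptr);
    }
    // The implicit copy constructor copies only the pointer, so copies share buf_.

    const char* c_str() const { return buf_->c_str(); }

    bool shares_buffer_with(const CowString& other) const {
        return buf_ == other.buf_;
    }

private:
    std::shared_ptr<std::string> buf_;
};

// Plain assignment-style copy: both objects point at the same buffer.
bool plain_copy_shares() {
    CowString a("status");
    CowString b = a;
    return a.shares_buffer_with(b);
}

// Copy through the character buffer: the new object owns its own buffer.
bool deep_copy_shares() {
    CowString a("status");
    CowString b = a.c_str();
    return a.shares_buffer_with(b);
}
```

In this toy model `plain_copy_shares()` is true and `deep_copy_shares()` is false. Under that reading, both c_str() forms in the question give the receiving thread a string that shares nothing with the sender's.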

cheers

Nick
Nick Aschberger
15 years ago
Post by Andreas Mohr
Post by Nick Aschberger
I am using wx sockets - what was iffy about wxSocket? I have currently
had no socket problems, unless it's causing this error. :)
Both Linux and Mac had/ve(?) issues such as thread-safety.
I notice that when using helgrind, helgrind complains quite a lot
about the main thread accessing the socket, it seems to be responding
to socket events! I have the socket events disabled, and I use the
socket exclusively in the worker thread.
What is the deal? Why is the main thread doing anything with the
socket?

I am considering ripping out wxsocket and using boost::asio, but I'm
not sure this will fix anything...

cheers

Nick
Andreas Mohr
15 years ago
...
Note that there are TWO different ways to disable socket notifications
on wxSocket, AFAIR depending on which platform one is on.
I failed to work out the correct one to use at the beginning,
but setting it up properly didn't help either.

And yes, wxSocket usually does inter-thread activity,
and it was(?) not safe (mainly in the lower layers, not user-facing parts).

Probably socket notifications are simply shoveled to the main thread
to "have them" in case a handler is there to take them into account.


BTW, "partially safe" usually means it's just as good as "entirely unsafe" ;)
(since "partially safe" has the major disadvantage that in the
"entirely unsafe" case an enduser would realize to back off immediately
from using an application in any serious way)


Helgrind does log some false positives, though (especially in case of
- properly protected - access via atomic ops instead of a full mutex).
Post by Nick Aschberger
I am considering ripping out wxsocket and using boost::asio, but I'm
not sure this will fix anything...
In the case of CURL, it can be used within 2 productive hours.
(although my use is main-thread-only currently, and conversion to
secondary-thread has not been done yet since that code area would need
a general overhaul).

Andreas Mohr
Nick Aschberger
15 years ago
Hi All,

Andreas: Big thanks for your help, I have the problem resolved.

In the end, I have replaced wxsockets with boost::asio sockets, and
the problem no longer occurs.

I think you are right in suggesting there is something thread-unsafe
that causes problems when using wxsockets.
Even though I had socket events disabled, helgrind was showing the
main thread waking up and doing... something... with the socket
events. Occasionally it would mess up completely.

My app is running well!

cheers

Nick
...
--
Please read http://www.wxwidgets.org/support/mlhowto.htm before posting.

To unsubscribe, send email to wx-users+***@googlegroups.com
or visit http://groups.google.com/group/wx-users
Andreas Mohr
15 years ago
Post by Nick Aschberger
Hi All,
Andreas: Big thanks for your help, I have the problem resolved.
Nice!
Post by Nick Aschberger
In the end, I have replaced wxsockets with boost::asio sockets, and
the problem no longer occurs.
...which isn't absolute proof that the problem no longer exists ;))
(but it's hopefully pretty safe to assume that it's gone)
Post by Nick Aschberger
I think you are right in suggesting there is something thread-unsafe
that causes problems when using wxsockets.
That should be investigated or at least documented, I think.
At the time that I was using wxSockets I was having trouble trying to find
a way to fix it (after all we're talking about several platforms and
several layers, and I was pretty much a C++ freshman),
but a correction would be nice.
Alternatively, one could simply adopt the opinion that a GUI toolkit
shouldn't necessarily be used for non-GUI purposes :-P
Post by Nick Aschberger
My app is running well!
Let me guess, it isn't using a couple dozen sufficiently unstable
threads in a single address space? *smirk*

Andreas Mohr