<HTML>
<HEAD>
<TITLE>LinuxThreads Frequently Asked Questions</TITLE>
</HEAD>
<BODY>
<H1 ALIGN=center>LinuxThreads Frequently Asked Questions <BR>
                 (with answers)</H1>
<H2 ALIGN=center>[For LinuxThreads version 0.8]</H2>

<HR><P>

<A HREF="#A">A. The big picture</A><BR>
<A HREF="#B">B. Getting more information</A><BR>
<A HREF="#C">C. Issues related to the C library</A><BR>
<A HREF="#D">D. Problems, weird behaviors, potential bugs</A><BR>
<A HREF="#E">E. Missing functions, wrong types, etc</A><BR>
<A HREF="#F">F. C++ issues</A><BR>
<A HREF="#G">G. Debugging LinuxThreads programs</A><BR>
<A HREF="#H">H. Compiling multithreaded code; errno madness</A><BR>
<A HREF="#I">I. X-Windows and other libraries</A><BR>
<A HREF="#J">J. Signals and threads</A><BR>
<A HREF="#K">K. Internals of LinuxThreads</A>

<P>
<HR>
<P>

<H2><A NAME="A">A. 
The big picture</A></H2>

<H4><A NAME="A.1">A.1: What is LinuxThreads?</A></H4>

LinuxThreads is a Linux library for multi-threaded programming.
It implements the POSIX 1003.1c API (Application Programming
Interface) for threads.  It runs on any Linux system with kernel 2.0.0
or more recent, and a suitable C library (see section <A HREF="#C">C</A>).
<P>

<H4><A NAME="A.2">A.2: What are threads?</A></H4>

A thread is a sequential flow of control through a program.
Multi-threaded programming is, thus, a form of parallel programming
where several threads of control are executing concurrently in the
program.  All threads execute in the same memory space, and can
therefore work concurrently on shared data.<P>

Multi-threaded programming differs from Unix-style multi-processing in
that all threads share the same memory space (and a few other system
resources, such as file descriptors), instead of running in their own
memory space as is the case with Unix processes.<P>

Threads are useful for two reasons.  First, they allow a program to
exploit multi-processor machines: the threads can run in parallel on
several processors, allowing a single program to divide its work
between several processors, thus running faster than a single-threaded
program, which runs on only one processor at a time.  Second, some
programs are best expressed as several threads of control that
communicate together, rather than as one big monolithic sequential
program.  Examples include server programs, overlapping asynchronous
I/O, and graphical user interfaces.<P>

<H4><A NAME="A.3">A.3: What is POSIX 1003.1c?</A></H4>

It's an API for multi-threaded programming standardized by IEEE as
part of the POSIX standards.  Most Unix vendors have endorsed the
POSIX 1003.1c standard.  Implementations of the 1003.1c API are
already available under Sun Solaris 2.5, Digital Unix 4.0, and
Silicon Graphics IRIX 6, and should soon be available from other
vendors such as IBM and HP.  
More generally, the 1003.1c API is
quickly replacing the proprietary thread libraries that were
developed previously under Unix, such as Mach cthreads, Solaris
threads, and IRIX sprocs.  Thus, multithreaded programs using the
1003.1c API are likely to run unchanged on a wide variety of Unix
platforms.<P>

<H4><A NAME="A.4">A.4: What is the status of LinuxThreads?</A></H4>

LinuxThreads implements almost all of POSIX 1003.1c, as well as a few
extensions.  The only part of LinuxThreads that does not yet conform
to POSIX is signal handling (see section <A HREF="#J">J</A>).  Apart
from the signal stuff, all the POSIX 1003.1c base functionality,
as well as a number of optional extensions, is provided and conforms
to the standard (to the best of my knowledge).
The signal stuff is hard to get right, at least without special kernel
support, and while I'm definitely looking at ways to implement the
POSIX behavior for signals, it might take a long time to
complete.<P>

<H4><A NAME="A.5">A.5: How stable is LinuxThreads?</A></H4>

The basic functionality (thread creation and termination, mutexes,
conditions, semaphores) is very stable.  Several industrial-strength
programs, such as the AOL multithreaded Web server, use LinuxThreads
and seem quite happy about it.  There used to be some rough edges in
the LinuxThreads / C library interface with libc 5, but glibc 2
fixes all of those problems and is now the standard C library on major
Linux distributions (see section <A HREF="#C">C</A>). <P>

<HR>
<P>

<H2><A NAME="B">B.  Getting more information</A></H2>

<H4><A NAME="B.1">B.1: What are good books and other sources of
information on POSIX threads?</A></H4>

The FAQ for comp.programming.threads lists several books:
<A HREF="http://www.serpentine.com/~bos/threads-faq/">http://www.serpentine.com/~bos/threads-faq/</A>.<P>

There are also some online tutorials. 
Follow the links from the
LinuxThreads web page:
<A HREF="http://pauillac.inria.fr/~xleroy/linuxthreads">http://pauillac.inria.fr/~xleroy/linuxthreads</A>.<P>

<H4><A NAME="B.2">B.2: I'd like to be informed of future developments on
LinuxThreads. Is there a mailing list for this purpose?</A></H4>

I post LinuxThreads-related announcements on the newsgroup
<A HREF="news:comp.os.linux.announce">comp.os.linux.announce</A>,
and also on the mailing list
<code>linux-threads@magenet.com</code>.
You can subscribe to the latter by writing to
<A HREF="mailto:majordomo@magenet.com">majordomo@magenet.com</A>.<P>

<H4><A NAME="B.3">B.3: What are good places for discussing
LinuxThreads?</A></H4>

For questions about programming with POSIX threads in general, use
the newsgroup
<A HREF="news:comp.programming.threads">comp.programming.threads</A>.
Be sure you read the
<A HREF="http://www.serpentine.com/~bos/threads-faq/">FAQ</A>
for this group before you post.<P>

For Linux-specific questions, use
<A HREF="news:comp.os.linux.development.apps">comp.os.linux.development.apps</A>
and <A HREF="news:comp.os.linux.development.kernel">comp.os.linux.development.kernel</A>.
The latter is especially appropriate for questions related to the
interface between the kernel and LinuxThreads.<P>

<H4><A NAME="B.4">B.4: How should I report a possible bug in
LinuxThreads?</A></H4>

If you're using glibc 2, the best way by far is to use the
<code>glibcbug</code> script to mail a bug report to the glibc
maintainers. <P>

If you're using an older libc, or don't have the <code>glibcbug</code>
script on your machine, then e-mail me directly
(<code>Xavier.Leroy@inria.fr</code>).  <P>

In both cases, before sending the bug report, make sure that it is not
already addressed in this FAQ.  Also, try to send a short program that
reproduces the weird behavior you observed. <P>

<H4><A NAME="B.5">B.5: I'd like to read the POSIX 1003.1c standard. Is
it available online?</A></H4>

Unfortunately, no.  
POSIX standards are copyrighted by IEEE, and
IEEE does not distribute them freely.  You can buy paper copies from
IEEE, but the price is fairly high ($120 or so). If you disagree with
this policy and you're an IEEE member, be sure to let them know.<P>

On the other hand, you probably don't want to read the standard.  It's
very hard to read, written in standard-ese, and targeted at
implementors who already know threads inside-out.  A good book on
POSIX threads provides the same information in a much more readable form.
I can personally recommend Dave Butenhof's book, <CITE>Programming
with POSIX Threads</CITE> (Addison-Wesley). Butenhof was part of the
POSIX committee and also designed the Digital Unix implementations of
POSIX threads, and it shows.<P>

Another good source of information is the X/Open Group Single Unix
Specification, which is available both
<A HREF="http://www.rdg.opengroup.org/onlinepubs/7908799/index.html">on-line</A>
and as a
<A HREF="http://www.UNIX-systems.org/gosolo2/">book and CD/ROM</A>.
That specification includes pretty much all the POSIX standards,
including 1003.1c, with some extensions and clarifications.<P>

<HR>
<P>

<H2><A NAME="C">C.  Issues related to the C library</A></H2>

<H4><A NAME="C.1">C.1: Which version of the C library should I use
with LinuxThreads?</A></H4>

The best choice by far is glibc 2, a.k.a. libc 6.  It offers very good
support for multi-threading, and LinuxThreads has been closely
integrated with glibc 2.  The glibc 2 distribution contains the
sources of a specially adapted version of LinuxThreads.<P>

glibc 2 comes preinstalled as the default C library on several Linux
distributions, such as RedHat 5 and up, and Debian 2.
Those distributions include the version of LinuxThreads matching
glibc 2.<P>

<H4><A NAME="C.2">C.2: My system has libc 5 preinstalled, not glibc 2.  
Can I still use LinuxThreads?</A></H4>

Yes, but you're likely to run into some problems, as libc 5 only
offers minimal support for threads and contains some bugs that affect
multithreaded programs. <P>

The versions of libc 5 that work best with LinuxThreads are
libc 5.2.18 on the one hand, and libc 5.4.12 or later on the other hand.
Avoid 5.3.12 and 5.4.7: these have problems with the per-thread errno
variable. <P>

<H4><A NAME="C.3">C.3: So, should I switch to glibc 2, or stay with a
recent libc 5?</A></H4>

I'd recommend you switch to glibc 2.  Even for single-threaded
programs, glibc 2 is more solid and more standard-conformant than libc
5.  And the shortcomings of libc 5 almost preclude any serious
multi-threaded programming.<P>

Switching an already installed
system from libc 5 to glibc 2 is not completely straightforward.
See the <A HREF="http://sunsite.unc.edu/LDP/HOWTO/Glibc2-HOWTO.html">Glibc2
HOWTO</A> for more information.  Much easier is (re-)installing a
Linux distribution based on glibc 2, such as RedHat 6.<P>

<H4><A NAME="C.4">C.4: Where can I find glibc 2 and the version of
LinuxThreads that goes with it?</A></H4>

On <code>prep.ai.mit.edu</code> and its many, many mirrors around the world.
See <A HREF="http://www.gnu.org/order/ftp.html">http://www.gnu.org/order/ftp.html</A>
for a list of mirrors.<P>

<H4><A NAME="C.5">C.5: Where can I find libc 5 and the version of
LinuxThreads that goes with it?</A></H4>

For libc 5, see <A HREF="ftp://sunsite.unc.edu/pub/Linux/devel/GCC/"><code>ftp://sunsite.unc.edu/pub/Linux/devel/GCC/</code></A>.<P>

For the libc 5 version of LinuxThreads, see
<A HREF="ftp://ftp.inria.fr/INRIA/Projects/cristal/Xavier.Leroy/linuxthreads/">ftp://ftp.inria.fr/INRIA/Projects/cristal/Xavier.Leroy/linuxthreads/</A>.<P>

<H4><A NAME="C.6">C.6: How can I recompile the glibc 2 version of the
LinuxThreads sources?</A></H4>

You must transfer the whole glibc sources, then drop the LinuxThreads
sources in the <code>linuxthreads/</code> subdirectory, then recompile
glibc as a whole.  
There are now too many inter-dependencies between
LinuxThreads and glibc 2 to allow separate re-compilation of LinuxThreads.<P>

<H4><A NAME="C.7">C.7: What is the correspondence between LinuxThreads
version numbers, libc version numbers, and RedHat version
numbers?</A></H4>

Here is a summary. (Information on Linux distributions other than
RedHat is welcome.)<P>

<TABLE>
<TR><TD>LinuxThreads </TD> <TD>C library</TD> <TD>RedHat</TD></TR>
<TR><TD>0.7, 0.71 (for libc 5)</TD> <TD>libc 5.x</TD> <TD>RH 4.2</TD></TR>
<TR><TD>0.7, 0.71 (for glibc 2)</TD> <TD>glibc 2.0.x</TD> <TD>RH 5.x</TD></TR>
<TR><TD>0.8</TD> <TD>glibc 2.1.1</TD> <TD>RH 6.0</TD></TR>
<TR><TD>0.8</TD> <TD>glibc 2.1.2</TD> <TD>not yet released</TD></TR>
</TABLE>
<P>

<HR>
<P>

<H2><A NAME="D">D. Problems, weird behaviors, potential bugs</A></H2>

<H4><A NAME="D.1">D.1: When I compile LinuxThreads, I run into problems in
file <code>libc_r/dirent.c</code></A></H4>

You probably mean:
<PRE>
        libc_r/dirent.c:94: structure has no member named `dd_lock'
</PRE>
I haven't actually seen this problem, but several users reported it.
My understanding is that something is wrong in the include files of
your Linux installation (<code>/usr/include/*</code>). Make sure
you're using a supported version of the libc 5 library. (See question <A
HREF="#C.2">C.2</A>.)<P>

<H4><A NAME="D.2">D.2: When I compile LinuxThreads, I run into problems with
<CODE>/usr/include/sched.h</CODE>: there are several occurrences of
<CODE>_p</CODE> that the C compiler does not understand</A></H4>

Yes, the <CODE>/usr/include/sched.h</CODE> that comes with libc 5.3.12 is broken.
Replace it with the <code>sched.h</code> file contained in the
LinuxThreads distribution.  But really you should not be using libc
5.3.12 with LinuxThreads! (See question <A HREF="#C.2">C.2</A>.)<P>

<H4><A NAME="D.3">D.3: My program does <CODE>fdopen()</CODE> on a file
descriptor opened on a pipe.  
When I link it with LinuxThreads,
<CODE>fdopen()</CODE> always returns NULL!</A></H4>

You're using one of the buggy versions of libc (5.3.12, 5.4.7, etc.).
See question <A HREF="#C.2">C.2</A> above.<P>

<H4><A NAME="D.4">D.4: My program creates a lot of threads, and after
a while <CODE>pthread_create()</CODE> no longer returns!</A></H4>

This is a known bug in the version of LinuxThreads that comes with glibc
2.1.1.  An upgrade to 2.1.2 is recommended. <P>

<H4><A NAME="D.5">D.5: When I'm running a program that creates N
threads, <code>top</code> or <code>ps</code>
displays N+2 processes that are running my program. What do all these
processes correspond to?</A></H4>

Due to the general "one process per thread" model, there's one process
for the initial thread and N processes for the threads it created
using <CODE>pthread_create</CODE>.  That leaves one process
unaccounted for.  That extra process corresponds to the "thread
manager" thread, a thread created internally by LinuxThreads to handle
thread creation and thread termination.  This extra thread is asleep
most of the time.<P>

<H4><A NAME="D.6">D.6: Scheduling seems to be very unfair when there
is strong contention on a mutex: instead of giving the mutex to each
thread in turn, it seems that it's almost always the same thread that
gets the mutex. Isn't this completely broken behavior?</A></H4>

That behavior has mostly disappeared in recent releases of
LinuxThreads (version 0.8 and up).  It was fairly common in older
releases, though.
What happens in LinuxThreads 0.7 and before is the following: when a
thread unlocks a mutex, all other threads that were waiting on the
mutex are sent a signal which makes them runnable.  However, the
kernel scheduler may or may not restart them immediately.  If the
thread that unlocked the mutex tries to lock it again immediately
afterwards, it is likely that it will succeed, because the other threads
haven't yet restarted.  
This results in apparently very unfair
behavior, where the same thread repeatedly locks and unlocks the mutex,
while other threads can't lock the mutex.<P>

In LinuxThreads 0.8 and up, <code>pthread_mutex_unlock</code> restarts only
one waiting thread, and pre-assigns the mutex to that thread.  Hence,
if the thread that unlocked the mutex tries to lock it again
immediately, it will block until other waiting threads have had a
chance to lock and unlock the mutex.  This results in much fairer
scheduling.<P>

Notice however that even the old "unfair" behavior is perfectly
acceptable with respect to the POSIX standard: for the default
scheduling policy, POSIX makes no guarantees of fairness, such as "the
thread waiting for the mutex for the longest time always acquires it
first".  Properly written multithreaded code avoids that kind of heavy
contention on mutexes, and does not run into fairness problems.  If
you need scheduling guarantees, you should consider using the
real-time scheduling policies <code>SCHED_RR</code> and
<code>SCHED_FIFO</code>, which have precisely defined scheduling
behaviors. <P>

<H4><A NAME="D.7">D.7: I have a simple test program with two threads
that do nothing but <CODE>printf()</CODE> in tight loops, and from the
printout it seems that only one thread is running, the other doesn't
print anything!</A></H4>

Again, this behavior is characteristic of old releases of LinuxThreads
(0.7 and before); more recent versions (0.8 and up) should not exhibit
this behavior.<P>

The reason for this behavior is explained in
question <A HREF="#D.6">D.6</A> above: <CODE>printf()</CODE> performs
locking on <CODE>stdout</CODE>, and thus your two threads contend very
heavily for the mutex associated with <CODE>stdout</CODE>.  
But if you
do some real work between two calls to <CODE>printf()</CODE>, you'll
see that scheduling becomes much smoother.<P>

<H4><A NAME="D.8">D.8: I've looked at <code>&lt;pthread.h&gt;</code>
and there seems to be a gross error in the <code>pthread_cleanup_push</code>
macro: it opens a block with <code>{</code> but does not close it!
Surely you forgot a <code>}</code> at the end of the macro, right?</A></H4>

Nope.  That's the way it should be.  The closing brace is provided by
the <code>pthread_cleanup_pop</code> macro.  The POSIX standard
requires <code>pthread_cleanup_push</code> and
<code>pthread_cleanup_pop</code> to be used in matching pairs, at the
same level of brace nesting.  This allows
<code>pthread_cleanup_push</code> to open a block in order to
stack-allocate some data structure, and
<code>pthread_cleanup_pop</code> to close that block.  It's ugly, but
it's the standard way of implementing cleanup handlers.<P>

<H4><A NAME="D.9">D.9: I tried to use real-time threads and my program
loops like crazy and freezes the whole machine!</A></H4>

Versions of LinuxThreads prior to 0.8 are susceptible to "livelocks"
(one thread loops, consuming 100% of the CPU time) in conjunction with
real-time scheduling.  Since real-time threads and processes have
higher priority than normal Linux processes, all other processes on
the machine, including the shell, the X server, etc., cannot run and
the machine appears frozen.<P>

The problem is fixed in LinuxThreads 0.8.<P>

<H4><A NAME="D.10">D.10: My application needs to create thousands of
threads, or maybe even more.  Can I do this with
LinuxThreads?</A></H4>

No.  You're going to run into several hard limits:
<UL>
<LI>Each thread, from the kernel's standpoint, is one process.  Stock
Linux kernels are limited to at most 512 processes for the super-user,
and half this number for regular users.  This can be changed by
changing <code>NR_TASKS</code> in <code>include/linux/tasks.h</code>
and recompiling the kernel.  
On the x86 processors at least,
architectural constraints seem to limit <code>NR_TASKS</code> to 4090
at most.
<LI>LinuxThreads contains a table of all active threads.  This table
has room for 1024 threads at most.  To increase this limit, you must
change <code>PTHREAD_THREADS_MAX</code> in the LinuxThreads sources
and recompile.
<LI>By default, each thread reserves 2M of virtual memory space for
its stack.  This space is just reserved; actual memory is allocated
for the stack on demand.  But still, on a 32-bit processor, the total
virtual memory space available for the stacks is on the order of 1G,
meaning that more than 500 threads will have a hard time fitting in.
You can overcome this limitation by moving to a 64-bit platform, or by
allocating smaller stacks yourself using the <code>setstackaddr</code>
attribute.
<LI>Finally, the Linux kernel contains many algorithms that run in
time proportional to the number of process table entries.  Increasing
this number drastically will slow down the kernel operations
noticeably.
</UL>
(Other POSIX threads libraries have similar limitations, by the way.)
For all those reasons, you'd better restructure your application so
that it doesn't need more than, say, 100 threads.  For instance,
in the case of a multithreaded server, instead of creating a new
thread for each connection, maintain a fixed-size pool of worker
threads that pick incoming connection requests from a queue.<P>

<HR>
<P>

<H2><A NAME="E">E. Missing functions, wrong types, etc</A></H2>

<H4><A NAME="E.1">E.1: Where is <CODE>pthread_yield()</CODE>? How
come LinuxThreads does not implement it?</A></H4>

Because it's not part of the (final) POSIX 1003.1c standard.
Several drafts of the standard contained <CODE>pthread_yield()</CODE>,
but then the POSIX guys discovered it was redundant with
<CODE>sched_yield()</CODE> and dropped it.  
So, just use
<CODE>sched_yield()</CODE> instead.<P>

<H4><A NAME="E.2">E.2: I've found some type errors in
<code>&lt;pthread.h&gt;</code>.
For instance, the second argument to <CODE>pthread_create()</CODE>
should be a <CODE>pthread_attr_t</CODE>, not a
<CODE>pthread_attr_t *</CODE>. Also, didn't you forget to declare
<CODE>pthread_attr_default</CODE>?</A></H4>

No, I didn't.  What you're describing is draft 4 of the POSIX
standard, which is used in OSF DCE threads.  LinuxThreads conforms to the
final standard.  Even though the functions have the same names as in
draft 4 and DCE, their calling conventions are slightly different.  In
particular, attributes are passed by reference, not by value, and
default attributes are denoted by the NULL pointer.  Since draft 4/DCE
will eventually disappear, you'd better port your program to use the
standard interface.<P>

<H4><A NAME="E.3">E.3: I'm porting an application from Solaris and I
have to rename all thread functions from <code>thr_blah</code> to
<CODE>pthread_blah</CODE>.  This is very annoying.  Why did you change
all the function names?</A></H4>

POSIX did it.  The <code>thr_*</code> functions correspond to Solaris
threads, an older thread interface that you'll find only under
Solaris.  The <CODE>pthread_*</CODE> functions correspond to POSIX
threads, an international standard available for many, many platforms.
Even Solaris 2.5 and later support the POSIX threads interface.  So,
do yourself a favor and rewrite your code to use POSIX threads: this
way, it will run unchanged under Linux, Solaris, and quite a lot of
other platforms.<P>

<H4><A NAME="E.4">E.4: How can I suspend and resume a thread from
another thread? 
Solaris has the <CODE>thr_suspend()</CODE> and
<CODE>thr_resume()</CODE> functions to do that; why don't you?</A></H4>

The POSIX standard provides <B>no</B> mechanism by which a thread A can
suspend the execution of another thread B, without cooperation from B.
The only way to implement a suspend/restart mechanism is to have B
periodically check some global variable for a suspend request
and then suspend itself on a condition variable, which another thread
can signal later to restart B.<P>

Notice that <CODE>thr_suspend()</CODE> is inherently dangerous and
prone to race conditions.  For one thing, there is no control over where
the target thread stops: it can very well be stopped in the middle of
a critical section, while holding mutexes.  Also, there is no
guarantee as to when the target thread will actually stop.  For these
reasons, you'd be much better off using mutexes and conditions
instead.  The only situations that really require the ability to
suspend a thread are debuggers and some kinds of garbage collectors.<P>

If you really must suspend a thread in LinuxThreads, you can send it a
<CODE>SIGSTOP</CODE> signal with <CODE>pthread_kill</CODE>. Send
<CODE>SIGCONT</CODE> to restart it.
Beware, this is specific to LinuxThreads and entirely non-portable.
Indeed, a truly conforming POSIX threads implementation will stop all
threads when one thread receives the <CODE>SIGSTOP</CODE> signal!
One day, LinuxThreads will implement that behavior, and the
non-portable hack with <CODE>SIGSTOP</CODE> won't work anymore.<P>

<H4><A NAME="E.5">E.5: Does LinuxThreads implement
<CODE>pthread_attr_setstacksize()</CODE> and
<CODE>pthread_attr_setstackaddr()</CODE>?</A></H4>

These optional functions are provided in recent versions of
LinuxThreads (0.8 and up).  
Earlier releases did not provide these
optional components of the POSIX standard.<P>

Even if <CODE>pthread_attr_setstacksize()</CODE> and
<CODE>pthread_attr_setstackaddr()</CODE> are now provided, we still
recommend that you do not use them unless you really have strong
reasons for doing so.  The default stack allocation strategy for
LinuxThreads is nearly optimal: stacks start small (4k) and
automatically grow on demand to a fairly large limit (2M).
Moreover, there is no portable way to estimate the stack requirements
of a thread, so setting the stack size yourself makes your program
less reliable and non-portable.<P>

<H4><A NAME="E.6">E.6: LinuxThreads does not support the
<CODE>PTHREAD_SCOPE_PROCESS</CODE> value of the "contention
scope" attribute.  Why? </A></H4>

With a "one-to-one" model, as in LinuxThreads (one kernel execution
context per thread), there is only one scheduler for all processes and
all threads on the system.  So, there is no way to obtain the behavior of
<CODE>PTHREAD_SCOPE_PROCESS</CODE>.<P>

<H4><A NAME="E.7">E.7: LinuxThreads does not implement process-shared
mutexes, conditions, and semaphores. Why?</A></H4>

This is another optional component of the POSIX standard.  Portable
applications should test <CODE>_POSIX_THREAD_PROCESS_SHARED</CODE>
before using this facility.<P>

The goal of this extension is to allow different processes (with
different address spaces) to synchronize through mutexes, conditions
or semaphores allocated in shared memory (either SVR4 shared memory
segments or <CODE>mmap()</CODE>ed files).<P>

The reason why this does not work in LinuxThreads is that mutexes,
conditions, and semaphores are not self-contained: their waiting
queues contain pointers to linked lists of thread descriptors, and
these pointers are meaningful only in one address space.<P>

Matt Messier and I spent a significant amount of time trying to design a
suitable mechanism for sharing waiting queues between processes.  
We
came up with several solutions that combined two of the following
three desirable features, but none that combines all three:
<UL>
<LI>allows sharing between processes having different UIDs
<LI>supports cancellation
<LI>supports <CODE>pthread_cond_timedwait</CODE>
</UL>
We concluded that kernel support is required to share mutexes,
conditions and semaphores between processes.  That's one place where
Linus Torvalds's intuition that "all we need in the kernel is
<CODE>clone()</CODE>" fails.<P>

Until suitable kernel support is available, you'd better use
traditional interprocess communications to synchronize different
processes: System V semaphores and message queues, or pipes, or sockets.<P>

<HR>
<P>

<H2><A NAME="F">F. C++ issues</A></H2>

<H4><A NAME="F.1">F.1: Are there C++ wrappers for LinuxThreads?</A></H4>

Douglas Schmidt's ACE library contains, among a lot of other
things, C++ wrappers for LinuxThreads and quite a number of other
thread libraries.  Check out
<A HREF="http://www.cs.wustl.edu/~schmidt/ACE.html">http://www.cs.wustl.edu/~schmidt/ACE.html</A>.<P>

<H4><A NAME="F.2">F.2: I'm trying to use LinuxThreads from a C++
program, and the compiler complains about the third argument to
<CODE>pthread_create()</CODE>!</A></H4>

You're probably trying to pass a class member function or some
other C++ thing as third argument to <CODE>pthread_create()</CODE>.
Recall that <CODE>pthread_create()</CODE> is a C function, and it must
be passed a C function as third argument.<P>

<H4><A NAME="F.3">F.3: I'm trying to use LinuxThreads in conjunction
with libg++, and I'm having all sorts of trouble.</A></H4>

From what I understand, thread support in libg++ is completely broken,
especially with respect to locking of iostreams.  H.J. Lu wrote:
<BLOCKQUOTE>
If you want to use thread, I can only suggest egcs and glibc. You
can find egcs at
<A HREF="http://www.cygnus.com/egcs">http://www.cygnus.com/egcs</A>.
egcs has libstdc++, which is MT safe under glibc 2. 
If you really
want to use the libg++, I have a libg++ add-on for egcs.
</BLOCKQUOTE>

<HR>
<P>

<H2><A NAME="G">G. Debugging LinuxThreads programs</A></H2>

<H4><A NAME="G.1">G.1: Can I debug LinuxThreads programs using gdb?</A></H4>

Yes, but not with the stock gdb 4.17.  You need a specially patched
version of gdb 4.17 developed by Eric Paire and colleagues at The Open
Group, Grenoble.  The patches against gdb 4.17 are available at
<A HREF="http://www.gr.opengroup.org/java/jdk/linux/debug.htm"><code>http://www.gr.opengroup.org/java/jdk/linux/debug.htm</code></A>.
Precompiled binaries of the patched gdb are available in RedHat's RPM
format at <A HREF="http://odin.appliedtheory.com/"><code>http://odin.appliedtheory.com/</code></A>.<P>

Some Linux distributions provide an already-patched version of gdb;
others don't.  For instance, the gdb in RedHat 5.2 is thread-aware,
but apparently not the one in RedHat 6.0.  Just ask (politely) the
makers of your Linux distribution to please make sure that they apply
the correct patches to gdb.<P>

<H4><A NAME="G.2">G.2: Does it work with post-mortem debugging?</A></H4>

Not very well.  Generally, the core file does not correspond to the
thread that crashed.  The reason is that the kernel will not dump core
for a process that shares its memory with other processes, such as the
other threads of your program.  So, the thread that crashes silently
disappears without generating a core file.  Then, all other threads of
your program die on the same signal that killed the crashing thread.
(This is required behavior according to the POSIX standard.)  The last
one that dies is no longer sharing its memory with anyone else, so the
kernel generates a core file for that thread.  Unfortunately, that's
not the thread you are interested in.<P>

<H4><A NAME="G.3">G.3: Any other ways to debug multithreaded programs, then?</A></H4>

Assertions and <CODE>printf()</CODE> are your best friends.  Try to debug
sequential parts in a single-threaded program first.  
Then, put
<CODE>printf()</CODE> statements all over the place to get execution traces.
Also, check invariants often with the <CODE>assert()</CODE> macro.  In truth,
there is no other effective way (save for a full formal proof of your
program) to track down concurrency bugs.  Debuggers are not really
effective for subtle concurrency problems, because they disrupt
program execution too much.<P>

<HR>
<P>

<H2><A NAME="H">H. Compiling multithreaded code; errno madness</A></H2>

<H4><A NAME="H.1">H.1: You say all multithreaded code must be compiled
with <CODE>_REENTRANT</CODE> defined. What difference does it make?</A></H4>

It affects include files in three ways:
<UL>
<LI> The include files define prototypes for the reentrant variants of
some of the standard library functions,
e.g. <CODE>gethostbyname_r()</CODE> as a reentrant equivalent to
<CODE>gethostbyname()</CODE>.<P>

<LI> If <CODE>_REENTRANT</CODE> is defined, some
<code>&lt;stdio.h&gt;</code> functions are no longer defined as macros,
e.g. <CODE>getc()</CODE> and <CODE>putc()</CODE>. In a multithreaded
program, stdio functions require additional locking, which the macros
don't perform, so we must call functions instead.<P>

<LI> More importantly, <code>&lt;errno.h&gt;</code> redefines errno when
<CODE>_REENTRANT</CODE> is
defined, so that errno refers to the thread-specific errno location
rather than the global errno variable.  This is achieved by the
following <code>#define</code> in <code>&lt;errno.h&gt;</code>:
<PRE>
        #define errno (*(__errno_location()))
</PRE>
which causes each reference to errno to call the
<CODE>__errno_location()</CODE> function for obtaining the location
where error codes are stored.  libc provides a default definition of
<CODE>__errno_location()</CODE> that always returns
<code>&amp;errno</code> (the address of the global errno variable). Thus,
for programs not linked with LinuxThreads, defining
<CODE>_REENTRANT</CODE> makes no difference w.r.t. 
errno processing.
But LinuxThreads redefines <CODE>__errno_location()</CODE> to return a
location in the thread descriptor reserved for holding the current
value of errno for the calling thread.  Thus, each thread operates on
a different errno location.
</UL>
<P>

<H4><A NAME="H.2">H.2: Why is it so important that each thread has its
own errno variable? </A></H4>

If all threads were to store error codes in the same, global errno
variable, then the value of errno after a system call or library
function returns would be unpredictable:  between the time a system
call stores its error code in the global errno and your code inspects
errno to see which error occurred, another thread might have stored
another error code in the same errno location. <P>

<H4><A NAME="H.3">H.3: What happens if I link LinuxThreads with code
not compiled with <CODE>-D_REENTRANT</CODE>?</A></H4>

Lots of trouble.  If the code uses <CODE>getc()</CODE> or
<CODE>putc()</CODE>, it will perform I/O without proper interlocking
of the stdio buffers; this can cause lost output, duplicate output, or
just crash other stdio functions.  If the code consults errno, it will
get back the wrong error code.  The following code fragment is a
typical example:
<PRE>
        do {
          r = read(fd, buf, n);
          if (r == -1) {
            if (errno == EINTR)   /* an error we can handle */
              continue;
            else {                /* other errors are fatal */
              perror("read failed");
              exit(100);
            }
          }
        } while (...);
</PRE>
Assume this code is not compiled with <CODE>-D_REENTRANT</CODE>, and
linked with LinuxThreads.  At run-time, <CODE>read()</CODE> is
interrupted.  Since the C library was compiled with
<CODE>-D_REENTRANT</CODE>, <CODE>read()</CODE> stores its error code
in the location pointed to by <CODE>__errno_location()</CODE>, which
is the thread-local errno variable.  Then, the code above sees that
<CODE>read()</CODE> returns -1 and looks up errno.
Since <CODE>_REENTRANT</CODE> is not defined, the reference to errno
accesses the global errno variable, which is most likely 0.  Hence the
code concludes that it cannot handle the error and stops.<P>

<H4><A NAME="H.4">H.4: With LinuxThreads, I can no longer use the signals
<code>SIGUSR1</code> and <code>SIGUSR2</code> in my programs! Why? </A></H4>

The short answer is: because the Linux kernel you're using does not
support realtime signals.  <P>

LinuxThreads needs two signals for its internal operation.
One is used to suspend and restart threads blocked on mutex, condition
or semaphore operations.  The other is used for thread
cancellation.<P>

On ``old'' kernels (2.0 and early 2.1 kernels), there are only 32
signals available and the kernel reserves all of them but two:
<code>SIGUSR1</code> and <code>SIGUSR2</code>.  So, LinuxThreads has
no choice but to use those two signals.<P>

On recent kernels (2.2 and up), more than 32 signals are provided in
the form of realtime signals. When run on one of those kernels,
LinuxThreads uses two reserved realtime signals for its internal
operation, thus leaving <code>SIGUSR1</code> and <code>SIGUSR2</code>
free for user code.  (This works only with glibc, not with libc 5.) <P>

<H4><A NAME="H.5">H.5: Is the stack of one thread visible from the
other threads?  Can I pass a pointer into my stack to other threads?</A></H4>

Yes, you can -- if you're very careful.  The stacks are indeed visible
from all threads in the system.  Some non-POSIX thread libraries seem
to map the stacks for all threads at the same virtual addresses and
change the memory mapping when they switch from one thread to
another.  But this is not the case for LinuxThreads, as it would make
context switching between threads more expensive, and at any rate
might not conform to the POSIX standard.<P>

So, you can take the address of an "auto" variable and pass it to
other threads via shared data structures.
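<P>
As a minimal sketch of this pattern (the names <CODE>sum_job</CODE>,
<CODE>sum_worker()</CODE> and <CODE>sum_on_other_thread()</CODE> are
hypothetical, chosen only for illustration), the function below passes
the address of an "auto" structure to a second thread and calls
<CODE>pthread_join()</CODE> before returning, so the structure is still
live whenever the other thread touches it:
<PRE>
```c
/* Illustration only. Build with something like:
 *   gcc -D_REENTRANT example.c -lpthread
 */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical job descriptor; it lives on the creator's stack. */
struct sum_job {
    const int *values;  /* input array, owned by the caller */
    size_t count;       /* number of elements in values */
    long total;         /* result, written by the worker thread */
};

/* Worker: receives a pointer into the creating thread's stack. */
static void *sum_worker(void *arg)
{
    struct sum_job *job = arg;
    size_t i;

    job->total = 0;
    for (i = 0; i < job->count; i++)
        job->total += job->values[i];
    return NULL;
}

/* Hypothetical helper: sums an array on a second thread. */
long sum_on_other_thread(const int *values, size_t count)
{
    struct sum_job job;   /* "auto" variable on this thread's stack */
    pthread_t worker;
    int rc;

    job.values = values;
    job.count = count;
    job.total = 0;

    rc = pthread_create(&worker, NULL, sum_worker, &job);
    assert(rc == 0);

    /* Do not return before the worker is done with &job. */
    rc = pthread_join(worker, NULL);
    assert(rc == 0);

    return job.total;
}
```
</PRE>
Calling <CODE>sum_on_other_thread()</CODE> from any thread is fine here
only because the join keeps the stack frame alive; without it, the
worker could write into a frame that no longer exists.
<P>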
However, you need to make
absolutely sure that the function doing this will not return as long
as other threads need to access this address.  It's the usual mistake
of returning the address of an "auto" variable, only made much worse
because of concurrency.  It's much, much safer to systematically
heap-allocate all shared data structures. <P>

<HR>
<P>

<H2><A NAME="I">I.  X-Windows and other libraries</A></H2>

<H4><A NAME="I.1">I.1: My program uses both Xlib and LinuxThreads.
It stops very early with an "Xlib: unknown 0 error" message.  What
does this mean? </A></H4>

That's a prime example of the errno problem described in question <A
HREF="#H.2">H.2</A>.  The binaries for Xlib you're using have not been
compiled with <CODE>-D_REENTRANT</CODE>.  It so happens that Xlib contains a
piece of code very much like the one in question <A
HREF="#H.2">H.2</A>.  So, your Xlib fetches the error code from the
wrong errno location and concludes that an error it cannot handle
occurred.<P>

<H4><A NAME="I.2">I.2: So, what can I do to build a multithreaded X
Windows client? </A></H4>

The best solution is to use X libraries that have been compiled with
multithreading options set.  Linux distributions that come with glibc
2 as the main C library generally provide thread-safe X libraries.
At least, that seems to be the case for RedHat 5 and later.<P>

You can try to recompile the X libraries yourself with multithreading
options set.  They contain optional support for multithreading; it's
just that the binaries provided by your Linux distribution were built
without this support.  See the file <code>README.Xfree3.3</code> in
the LinuxThreads distribution for patches and info on how to compile
thread-safe X libraries from the Xfree3.3 distribution.  The Xfree3.3
sources are readily available in most Linux distributions, e.g. as a
source RPM for RedHat.
Be warned, however, that X Windows is a huge
system, and recompiling even just the libraries takes a lot of time
and disk space.<P>

Another, less involved solution is to call X functions only from the
main thread of your program.  Although every thread has its own errno
location, the main thread uses the global errno variable for its errno
location.  Thus, code not compiled with <code>-D_REENTRANT</code>
still "sees" the right error values if it executes in the main thread
only. <P>

<H4><A NAME="I.2">This is a lot of work. Don't you have precompiled
thread-safe X libraries that you could distribute?</A></H4>

No, I don't.  Sorry.  But consider installing a Linux distribution
that comes with thread-safe X libraries, such as RedHat 6.<P>

<H4><A NAME="I.3">I.3: Can I use library FOO in a multithreaded
program?</A></H4>

Most libraries cannot be used "as is" in a multithreaded program.
For one thing, they are not necessarily thread-safe: calling
two functions of the library simultaneously from two threads might not
work, due to internal use of global variables and the like.  Second,
the libraries must have been compiled with <CODE>-D_REENTRANT</CODE> to avoid
the errno problems explained in question <A HREF="#H.2">H.2</A>.<P>

<H4><A NAME="I.4">I.4: What if I make sure that only one thread calls
functions in these libraries?</A></H4>

This avoids problems with the library not being thread-safe.  But
you're still vulnerable to errno problems.  At the very least, a
recompile of the library with <CODE>-D_REENTRANT</CODE> is needed.<P>

<H4><A NAME="I.5">I.5: What if I make sure that only the main thread
calls functions in these libraries?</A></H4>

That might actually work.  As explained in question <A HREF="#I.1">I.1</A>,
the main thread uses the global errno variable, and can therefore
execute code not compiled with <CODE>-D_REENTRANT</CODE>.<P>

<H4><A NAME="I.6">I.6: SVGAlib doesn't work with LinuxThreads.
Why?</A></H4>

Because both LinuxThreads and SVGAlib use the signals
<code>SIGUSR1</code> and <code>SIGUSR2</code>.  See question <A
HREF="#H.4">H.4</A>.<P>

<HR>
<P>

<H2><A NAME="J">J.  Signals and threads</A></H2>

<H4><A NAME="J.1">J.1: When it comes to signals, what is shared
between threads and what isn't?</A></H4>

Signal handlers are shared between all threads: when a thread calls
<CODE>sigaction()</CODE>, it sets how the signal is handled not only
for itself, but for all other threads in the program as well.<P>

On the other hand, signal masks are per-thread: each thread chooses
which signals it blocks independently of others.  At thread creation
time, the newly created thread inherits the signal mask of the thread
calling <CODE>pthread_create()</CODE>.  But afterwards, the new thread
can modify its signal mask independently of its creator thread.<P>

<H4><A NAME="J.2">J.2: When I send a <CODE>SIGKILL</CODE> to a
particular thread using <CODE>pthread_kill</CODE>, all my threads are
killed!</A></H4>

That's how it should be.  The POSIX standard mandates that all threads
should terminate when the process (i.e. the collection of all threads
running the program) receives a signal whose effect is to
terminate the process (such as <CODE>SIGKILL</CODE> or <CODE>SIGINT</CODE>
when no handler is installed on that signal).  This behavior makes a
lot of sense: when you type "ctrl-C" at the keyboard, or when a thread
crashes on a division by zero or a segmentation fault, you really want
all threads to stop immediately, not just the one that caused the
segmentation violation or that got the <CODE>SIGINT</CODE> signal.
(This assumes default behavior for those signals; see question
<A HREF="#J.3">J.3</A> if you install handlers for those signals.)<P>

If you're trying to terminate a thread without bringing the whole
process down, use <code>pthread_cancel()</code>.<P>

<H4><A NAME="J.3">J.3: I've installed a handler on a signal.
Which
thread executes the handler when the signal is received?</A></H4>

If the signal is generated by a thread during its execution (e.g. a
thread executes a division by zero and thus generates a
<CODE>SIGFPE</CODE> signal), then the handler is executed by that
thread.  This also applies to signals generated by
<CODE>raise()</CODE>.<P>

If the signal is sent to a particular thread using
<CODE>pthread_kill()</CODE>, then that thread executes the handler.<P>

If the signal is sent via <CODE>kill()</CODE> or the tty interface
(e.g. by pressing ctrl-C), then the POSIX specs say that the handler
is executed by any thread in the process that does not currently block
the signal.  In other words, POSIX considers that the signal is sent
to the process (the collection of all threads) as a whole, and any
thread that is not blocking this signal can then handle it.<P>

The latter case is where LinuxThreads departs from the POSIX specs.
In LinuxThreads, there is no real notion of ``the process as a whole'':
in the kernel, each thread is really a distinct process with a
distinct PID, and signals sent to the PID of a thread can only be
handled by that thread.  As long as no thread is blocking the signal,
the behavior conforms to the standard: one (unspecified) thread of the
program handles the signal.  But if the thread to whose PID the signal
is sent blocks the signal, and some other thread does not block the
signal, then LinuxThreads will simply queue the signal in
that thread and execute the handler only when that thread unblocks
the signal, instead of executing the handler immediately in the other
thread that does not block the signal.<P>

This is to be viewed as a LinuxThreads bug, but I currently don't see
any way to implement the POSIX behavior without kernel support.<P>

<H4><A NAME="J.3">J.3: How shall I go about mixing signals and threads
in my program? </A></H4>

The less you mix them, the better.
Notice that the
<CODE>pthread_*</CODE> functions are not async-signal safe, meaning
that you should not call them from signal handlers.  This
recommendation is not to be taken lightly: your program can deadlock
if you call a <CODE>pthread_*</CODE> function from a signal handler!<P>

The only sensible things you can do from a signal handler are to set a
global flag or to call <CODE>sem_post()</CODE> on a semaphore, to record
the delivery of the signal.  The remainder of the program can then
either poll the global flag, or use <CODE>sem_wait()</CODE> and
<CODE>sem_trywait()</CODE> on the semaphore.<P>

Another option is to do nothing in the signal handler, and dedicate
one thread (preferably the initial thread) to wait synchronously for
signals, using <CODE>sigwait()</CODE>, and send messages to the other
threads accordingly.

<H4><A NAME="J.4">J.4: When one thread is blocked in
<CODE>sigwait()</CODE>, other threads no longer receive the signals
<CODE>sigwait()</CODE> is waiting for!  What happens? </A></H4>

It's an unfortunate consequence of how LinuxThreads implements
<CODE>sigwait()</CODE>.  Basically, it installs signal handlers on all
signals waited for, in order to record which signal was received.
Since signal handlers are shared with the other threads, this
temporarily deactivates any signal handlers you might have previously
installed on these signals.<P>

Though surprising, this behavior actually seems to conform to the
POSIX standard.  According to POSIX, <CODE>sigwait()</CODE> is
guaranteed to work as expected only if all other threads in the
program block the signals waited for (otherwise, the signals could be
delivered to other threads than the one doing <CODE>sigwait()</CODE>,
which would make <CODE>sigwait()</CODE> useless).
In this particular
case, the problem described in this question does not appear.<P>

One day, <CODE>sigwait()</CODE> will be implemented in the kernel,
along with other POSIX 1003.1b extensions, and <CODE>sigwait()</CODE>
will have a more natural behavior (as well as better performance).<P>

<HR>
<P>

<H2><A NAME="K">K.  Internals of LinuxThreads</A></H2>

<H4><A NAME="K.1">K.1: What is the implementation model for
LinuxThreads?</A></H4>

LinuxThreads follows the so-called "one-to-one" model: each thread is
actually a separate process in the kernel.  The kernel scheduler takes
care of scheduling the threads, just like it schedules regular
processes.  The threads are created with the Linux
<code>clone()</code> system call, which is a generalization of
<code>fork()</code> allowing the new process to share the memory
space, file descriptors, and signal handlers of the parent.<P>

Advantages of the "one-to-one" model include:
<UL>
<LI> minimal overhead on CPU-intensive multiprocessing (with
about one thread per processor);
<LI> minimal overhead on I/O operations;
<LI> a simple and robust implementation (the kernel scheduler does
most of the hard work for us).
</UL>
The main disadvantage is more expensive context switches on mutex and
condition operations, which must go through the kernel.  This is
mitigated by the fact that context switches in the Linux kernel are
pretty efficient.<P>

<H4><A NAME="K.2">K.2: Have you considered other implementation
models?</A></H4>

There are basically two other models.  The "many-to-one" model
relies on a user-level scheduler that context-switches between the
threads entirely in user code; viewed from the kernel, there is only
one process running.  This model is completely out of the question for
me, since it does not take advantage of multiprocessors, and requires
unholy magic to handle blocking I/O operations properly.
There are
several user-level thread libraries available for Linux, but I found
all of them deficient in functionality, performance, and/or robustness.<P>

The "many-to-many" model combines both kernel-level and user-level
scheduling: several kernel-level threads run concurrently, each
executing a user-level scheduler that selects between user threads.
Most commercial Unix systems (Solaris, Digital Unix, IRIX) implement
POSIX threads this way.  This model combines the advantages of both
the "many-to-one" and the "one-to-one" model, and is attractive
because it avoids the worst-case behaviors of both models --
especially on kernels where context switches are expensive, such as
Digital Unix.  Unfortunately, it is pretty complex to implement, and
requires kernel support which Linux does not provide.  Linus Torvalds
and other Linux kernel developers have always been pushing the
"one-to-one" model in the name of overall simplicity, and are doing a
pretty good job of making kernel-level context switches between
threads efficient.  LinuxThreads is just following the general
direction they set.<P>

<HR>
<ADDRESS>Xavier.Leroy@inria.fr</ADDRESS>
</BODY>
</HTML>
 |