  1. <HTML>
  2. <HEAD>
  3. <TITLE>LinuxThreads Frequently Asked Questions</TITLE>
  4. </HEAD>
  5. <BODY>
  6. <H1 ALIGN=center>LinuxThreads Frequently Asked Questions <BR>
  7. (with answers)</H1>
  8. <H2 ALIGN=center>[For LinuxThreads version 0.8]</H2>
  9. <HR><P>
  10. <A HREF="#A">A. The big picture</A><BR>
  11. <A HREF="#B">B. Getting more information</A><BR>
  12. <A HREF="#C">C. Issues related to the C library</A><BR>
  13. <A HREF="#D">D. Problems, weird behaviors, potential bugs</A><BR>
  14. <A HREF="#E">E. Missing functions, wrong types, etc</A><BR>
  15. <A HREF="#F">F. C++ issues</A><BR>
  16. <A HREF="#G">G. Debugging LinuxThreads programs</A><BR>
  17. <A HREF="#H">H. Compiling multithreaded code; errno madness</A><BR>
  18. <A HREF="#I">I. X-Windows and other libraries</A><BR>
  19. <A HREF="#J">J. Signals and threads</A><BR>
  20. <A HREF="#K">K. Internals of LinuxThreads</A><P>
  21. <HR>
  22. <P>
  23. <H2><A NAME="A">A. The big picture</A></H2>
  24. <H4><A NAME="A.1">A.1: What is LinuxThreads?</A></H4>
  25. LinuxThreads is a Linux library for multi-threaded programming.
  26. It implements the POSIX 1003.1c API (Application Programming
  27. Interface) for threads. It runs on any Linux system with kernel 2.0.0
  28. or more recent, and a suitable C library (see section <A HREF="#C">C</A>).
  29. <P>
  30. <H4><A NAME="A.2">A.2: What are threads?</A></H4>
  31. A thread is a sequential flow of control through a program.
  32. Multi-threaded programming is, thus, a form of parallel programming
  33. where several threads of control are executing concurrently in the
  34. program. All threads execute in the same memory space, and can
  35. therefore work concurrently on shared data.<P>
  36. Multi-threaded programming differs from Unix-style multi-processing in
  37. that all threads share the same memory space (and a few other system
  38. resources, such as file descriptors), instead of running in their own
  39. memory space as is the case with Unix processes.<P>
  40. Threads are useful for two reasons. First, they allow a program to
  41. exploit multi-processor machines: the threads can run in parallel on
  42. several processors, allowing a single program to divide its work
  43. between several processors, thus running faster than a single-threaded
  44. program, which runs on only one processor at a time. Second, some
  45. programs are best expressed as several threads of control that
  46. communicate together, rather than as one big monolithic sequential
  47. program. Examples include server programs, overlapping asynchronous
  48. I/O, and graphical user interfaces.<P>
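To make the shared-memory point concrete, here is a minimal sketch (an
illustration only, not code taken from LinuxThreads) in which several
threads update one counter that lives in their common memory space.
The names <code>run_shared_counter</code>, <code>NUM_THREADS</code> and
<code>INCREMENTS</code> are made up for this example; the mutex is
there because unsynchronized updates to shared data would race.<P>

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4        /* illustrative values, not from the FAQ */
#define INCREMENTS  10000

static long counter = 0;     /* one variable, visible to every thread */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: add INCREMENTS to the shared counter, one step at a time. */
static void *add_to_counter(void *arg)
{
    int i;

    (void) arg;
    for (i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Start NUM_THREADS threads, wait for all of them, return the final count. */
long run_shared_counter(void)
{
    pthread_t threads[NUM_THREADS];
    int i, rc;

    counter = 0;
    for (i = 0; i < NUM_THREADS; i++) {
        rc = pthread_create(&threads[i], NULL, add_to_counter, NULL);
        assert(rc == 0);
    }
    for (i = 0; i < NUM_THREADS; i++) {
        rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
    }
    return counter;
}
```

Called from <code>main()</code>, <code>run_shared_counter()</code>
should return <code>NUM_THREADS * INCREMENTS</code>. With separate
Unix processes instead of threads, each process would increment its
own private copy of <code>counter</code>.<P>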
  49. <H4><A NAME="A.3">A.3: What is POSIX 1003.1c?</A></H4>
  50. It's an API for multi-threaded programming standardized by IEEE as
  51. part of the POSIX standards. Most Unix vendors have endorsed the
  52. POSIX 1003.1c standard. Implementations of the 1003.1c API are
  53. already available under Sun Solaris 2.5, Digital Unix 4.0,
  54. Silicon Graphics IRIX 6, and should soon be available from other
  55. vendors such as IBM and HP. More generally, the 1003.1c API is
  56. relatively quickly replacing the proprietary threads libraries that were
  57. developed previously under Unix, such as Mach cthreads, Solaris
  58. threads, and IRIX sprocs. Thus, multithreaded programs using the
  59. 1003.1c API are likely to run unchanged on a wide variety of Unix
  60. platforms.<P>
  61. <H4><A NAME="A.4">A.4: What is the status of LinuxThreads?</A></H4>
  62. LinuxThreads implements almost all of POSIX 1003.1c, as well as a few
  63. extensions. The only part of LinuxThreads that does not conform yet
  64. to POSIX is signal handling (see section <A HREF="#J">J</A>). Apart
  65. from the signal stuff, all the POSIX 1003.1c base functionality,
  66. as well as a number of optional extensions, is provided and conforms
  67. to the standard (to the best of my knowledge).
  68. The signal stuff is hard to get right, at least without special kernel
  69. support, and while I'm definitely looking at ways to implement the
  70. POSIX behavior for signals, it might take a long time before this is
  71. completed.<P>
  72. <H4><A NAME="A.5">A.5: How stable is LinuxThreads?</A></H4>
  73. The basic functionality (thread creation and termination, mutexes,
  74. conditions, semaphores) is very stable. Several industrial-strength
  75. programs, such as the AOL multithreaded Web server, use LinuxThreads
  76. and seem quite happy about it. There used to be some rough edges in
  77. the LinuxThreads / C library interface with libc 5, but glibc 2
  78. fixes all of those problems and is now the standard C library on major
  79. Linux distributions (see section <A HREF="#C">C</A>). <P>
  80. <HR>
  81. <P>
  82. <H2><A NAME="B">B. Getting more information</A></H2>
  83. <H4><A NAME="B.1">B.1: What are good books and other sources of
  84. information on POSIX threads?</A></H4>
  85. The FAQ for comp.programming.threads lists several books:
  86. <A HREF="http://www.serpentine.com/~bos/threads-faq/">http://www.serpentine.com/~bos/threads-faq/</A>.<P>
  87. There are also some online tutorials. Follow the links from the
  88. LinuxThreads web page:
  89. <A HREF="http://pauillac.inria.fr/~xleroy/linuxthreads">http://pauillac.inria.fr/~xleroy/linuxthreads</A>.<P>
  90. <H4><A NAME="B.2">B.2: I'd like to be informed of future developments on
  91. LinuxThreads. Is there a mailing list for this purpose?</A></H4>
  92. I post LinuxThreads-related announcements on the newsgroup
  93. <A HREF="news:comp.os.linux.announce">comp.os.linux.announce</A>,
  94. and also on the mailing list
  95. <code>linux-threads@magenet.com</code>.
  96. You can subscribe to the latter by writing
  97. <A HREF="mailto:majordomo@magenet.com">majordomo@magenet.com</A>.<P>
  98. <H4><A NAME="B.3">B.3: What are good places for discussing
  99. LinuxThreads?</A></H4>
  100. For questions about programming with POSIX threads in general, use
  101. the newsgroup
  102. <A HREF="news:comp.programming.threads">comp.programming.threads</A>.
  103. Be sure you read the
  104. <A HREF="http://www.serpentine.com/~bos/threads-faq/">FAQ</A>
  105. for this group before you post.<P>
  106. For Linux-specific questions, use
  107. <A
  108. HREF="news:comp.os.linux.development.apps">comp.os.linux.development.apps</A>
  109. and <A
  110. HREF="news:comp.os.linux.development.kernel">comp.os.linux.development.kernel</A>.
  111. The latter is especially appropriate for questions related to the
  112. interface between the kernel and LinuxThreads.<P>
  113. <H4><A NAME="B.4">B.4: How should I report a possible bug in
  114. LinuxThreads?</A></H4>
  115. If you're using glibc 2, the best way by far is to use the
  116. <code>glibcbug</code> script to mail a bug report to the glibc
  117. maintainers. <P>
  118. If you're using an older libc, or don't have the <code>glibcbug</code>
  119. script on your machine, then e-mail me directly
  120. (<code>Xavier.Leroy@inria.fr</code>). <P>
  121. In both cases, before sending the bug report, make sure that it is not
  122. addressed already in this FAQ. Also, try to send a short program that
  123. reproduces the weird behavior you observed. <P>
  124. <H4><A NAME="B.5">B.5: I'd like to read the POSIX 1003.1c standard. Is
  125. it available online?</A></H4>
  126. Unfortunately, no. POSIX standards are copyrighted by IEEE, and
  127. IEEE does not distribute them freely. You can buy paper copies from
  128. IEEE, but the price is fairly high ($120 or so). If you disagree with
  129. this policy and you're an IEEE member, be sure to let them know.<P>
  130. On the other hand, you probably don't want to read the standard. It's
  131. very hard to read, written in standard-ese, and targeted to
  132. implementors who already know threads inside-out. A good book on
  133. POSIX threads provides the same information in a much more readable form.
  134. I can personally recommend Dave Butenhof's book, <CITE>Programming
  135. with POSIX threads</CITE> (Addison-Wesley). Butenhof was part of the
  136. POSIX committee and also designed the Digital Unix implementations of
  137. POSIX threads, and it shows.<P>
  138. Another good source of information is the X/Open Group Single Unix
  139. specification which is available both
  140. <A HREF="http://www.rdg.opengroup.org/onlinepubs/7908799/index.html">on-line</A>
  141. and as a
  142. <A HREF="http://www.UNIX-systems.org/gosolo2/">book and CD/ROM</A>.
  143. That specification includes pretty much all the POSIX standards,
  144. including 1003.1c, with some extensions and clarifications.<P>
  145. <HR>
  146. <P>
  147. <H2><A NAME="C">C. Issues related to the C library</A></H2>
  148. <H4><A NAME="C.1">C.1: Which version of the C library should I use
  149. with LinuxThreads?</A></H4>
  150. The best choice by far is glibc 2, a.k.a. libc 6. It offers very good
  151. support for multi-threading, and LinuxThreads has been closely
  152. integrated with glibc 2. The glibc 2 distribution contains the
  153. sources of a specially adapted version of LinuxThreads.<P>
  154. glibc 2 comes preinstalled as the default C library on several Linux
  155. distributions, such as RedHat 5 and up, and Debian 2.
  156. Those distributions include the version of LinuxThreads matching
  157. glibc 2.<P>
  158. <H4><A NAME="C.2">C.2: My system has libc 5 preinstalled, not glibc
  159. 2. Can I still use LinuxThreads?</A></H4>
  160. Yes, but you're likely to run into some problems, as libc 5 only
  161. offers minimal support for threads and contains some bugs that affect
  162. multithreaded programs. <P>
  163. The versions of libc 5 that work best with LinuxThreads are
  164. libc 5.2.18 on the one hand, and libc 5.4.12 or later on the other hand.
  165. Avoid 5.3.12 and 5.4.7: these have problems with the per-thread errno
  166. variable. <P>
  167. <H4><A NAME="C.3">C.3: So, should I switch to glibc 2, or stay with a
  168. recent libc 5?</A></H4>
  169. I'd recommend you switch to glibc 2. Even for single-threaded
  170. programs, glibc 2 is more solid and more standard-conformant than libc
  171. 5. And the shortcomings of libc 5 almost preclude any serious
  172. multi-threaded programming.<P>
  173. Switching an already installed
  174. system from libc 5 to glibc 2 is not completely straightforward.
  175. See the <A HREF="http://sunsite.unc.edu/LDP/HOWTO/Glibc2-HOWTO.html">Glibc2
  176. HOWTO</A> for more information. Much easier is (re-)installing a
  177. Linux distribution based on glibc 2, such as RedHat 6.<P>
  178. <H4><A NAME="C.4">C.4: Where can I find glibc 2 and the version of
  179. LinuxThreads that goes with it?</A></H4>
  180. On <code>prep.ai.mit.edu</code> and its many, many mirrors around the world.
  181. See <A
  182. HREF="http://www.gnu.org/order/ftp.html">http://www.gnu.org/order/ftp.html</A>
  183. for a list of mirrors.<P>
  184. <H4><A NAME="C.5">C.5: Where can I find libc 5 and the version of
  185. LinuxThreads that goes with it?</A></H4>
  186. For libc 5, see <A HREF="ftp://sunsite.unc.edu/pub/Linux/devel/GCC/"><code>ftp://sunsite.unc.edu/pub/Linux/devel/GCC/</code></A>.<P>
  187. For the libc 5 version of LinuxThreads, see
  188. <A HREF="ftp://ftp.inria.fr/INRIA/Projects/cristal/Xavier.Leroy/linuxthreads/">ftp://ftp.inria.fr/INRIA/Projects/cristal/Xavier.Leroy/linuxthreads/</A>.<P>
  189. <H4><A NAME="C.6">C.6: How can I recompile the glibc 2 version of the
  190. LinuxThreads sources?</A></H4>
  191. You must transfer the whole glibc sources, then drop the LinuxThreads
  192. sources in the <code>linuxthreads/</code> subdirectory, then recompile
  193. glibc as a whole. There are now too many inter-dependencies between
  194. LinuxThreads and glibc 2 to allow separate re-compilation of LinuxThreads.
  195. <P>
  196. <H4><A NAME="C.7">C.7: What is the correspondence between LinuxThreads
  197. version numbers, libc version numbers, and RedHat version
  198. numbers?</A></H4>
  199. Here is a summary. (Information on Linux distributions other than
  200. RedHat is welcome.)<P>
  201. <TABLE>
  202. <TR><TD>LinuxThreads </TD> <TD>C library</TD> <TD>RedHat</TD></TR>
  203. <TR><TD>0.7, 0.71 (for libc 5)</TD> <TD>libc 5.x</TD> <TD>RH 4.2</TD></TR>
  204. <TR><TD>0.7, 0.71 (for glibc 2)</TD> <TD>glibc 2.0.x</TD> <TD>RH 5.x</TD></TR>
  205. <TR><TD>0.8</TD> <TD>glibc 2.1.1</TD> <TD>RH 6.0</TD></TR>
  206. <TR><TD>0.8</TD> <TD>glibc 2.1.2</TD> <TD>not yet released</TD></TR>
  207. </TABLE>
  208. <P>
  209. <HR>
  210. <P>
  211. <H2><A NAME="D">D. Problems, weird behaviors, potential bugs</A></H2>
  212. <H4><A NAME="D.1">D.1: When I compile LinuxThreads, I run into problems in
  213. file <code>libc_r/dirent.c</code></A></H4>
  214. You probably mean:
  215. <PRE>
  216. libc_r/dirent.c:94: structure has no member named `dd_lock'
  217. </PRE>
  218. I haven't actually seen this problem, but several users reported it.
  219. My understanding is that something is wrong in the include files of
  220. your Linux installation (<code>/usr/include/*</code>). Make sure
  221. you're using a supported version of the libc 5 library. (See question <A
  222. HREF="#C.2">C.2</A>).<P>
  223. <H4><A NAME="D.2">D.2: When I compile LinuxThreads, I run into problems with
  224. <CODE>/usr/include/sched.h</CODE>: there are several occurrences of
  225. <CODE>_p</CODE> that the C compiler does not understand</A></H4>
  226. Yes, <CODE>/usr/include/sched.h</CODE> that comes with libc 5.3.12 is broken.
  227. Replace it with the <code>sched.h</code> file contained in the
  228. LinuxThreads distribution. But really you should not be using libc
  229. 5.3.12 with LinuxThreads! (See question <A HREF="#C.2">C.2</A>.)<P>
  230. <H4><A NAME="D.3">D.3: My program does <CODE>fdopen()</CODE> on a file
  231. descriptor opened on a pipe. When I link it with LinuxThreads,
  232. <CODE>fdopen()</CODE> always returns NULL!</A></H4>
  233. You're using one of the buggy versions of libc (5.3.12, 5.4.7, etc.).
  234. See question <A HREF="#C.2">C.2</A> above.<P>
  235. <H4><A NAME="D.4">D.4: My program creates a lot of threads, and after
  236. a while <CODE>pthread_create()</CODE> no longer returns!</A></H4>
  237. This is a known bug in the version of LinuxThreads that comes with glibc
  238. 2.1.1. An upgrade to 2.1.2 is recommended. <P>
  239. <H4><A NAME="D.5">D.5: When I'm running a program that creates N
  240. threads, <code>top</code> or <code>ps</code>
  241. display N+2 processes that are running my program. What do all these
  242. processes correspond to?</A></H4>
  243. Due to the general "one process per thread" model, there's one process
  244. for the initial thread and N processes for the threads it created
  245. using <CODE>pthread_create</CODE>. That leaves one process
  246. unaccounted for. That extra process corresponds to the "thread
  247. manager" thread, a thread created internally by LinuxThreads to handle
  248. thread creation and thread termination. This extra thread is asleep
  249. most of the time.
  250. <H4><A NAME="D.6">D.6: Scheduling seems to be very unfair when there
  251. is strong contention on a mutex: instead of giving the mutex to each
  252. thread in turn, it seems that it's almost always the same thread that
  253. gets the mutex. Isn't this completely broken behavior?</A></H4>
  254. That behavior has mostly disappeared in recent releases of
  255. LinuxThreads (version 0.8 and up). It was fairly common in older
  256. releases, though.
  257. What happens in LinuxThreads 0.7 and before is the following: when a
  258. thread unlocks a mutex, all other threads that were waiting on the
  259. mutex are sent a signal which makes them runnable. However, the
  260. kernel scheduler may or may not restart them immediately. If the
  261. thread that unlocked the mutex tries to lock it again immediately
  262. afterwards, it is likely that it will succeed, because the threads
  263. haven't yet restarted. This results in an apparently very unfair
  264. behavior, when the same thread repeatedly locks and unlocks the mutex,
  265. while other threads can't lock the mutex.<P>
  266. In LinuxThreads 0.8 and up, <code>pthread_mutex_unlock</code> restarts only
  267. one waiting thread, and pre-assigns the mutex to that thread. Hence,
  268. if the thread that unlocked the mutex tries to lock it again
  269. immediately, it will block until other waiting threads have had a
  270. chance to lock and unlock the mutex. This results in much fairer
  271. scheduling.<P>
  272. Notice however that even the old "unfair" behavior is perfectly
  273. acceptable with respect to the POSIX standard: for the default
  274. scheduling policy, POSIX makes no guarantees of fairness, such as "the
  275. thread waiting for the mutex for the longest time always acquires it
  276. first". Properly written multithreaded code avoids that kind of heavy
  277. contention on mutexes, and does not run into fairness problems. If
  278. you need scheduling guarantees, you should consider using the
  279. real-time scheduling policies <code>SCHED_RR</code> and
  280. <code>SCHED_FIFO</code>, which have precisely defined scheduling
  281. behaviors. <P>
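For readers who do want one of the real-time policies, the following
is a rough sketch of how a thread attribute object could be prepared
for <code>SCHED_RR</code>. The function name
<code>init_round_robin_attr</code> is invented for this example, and
picking the lowest real-time priority is only an assumption; choose
whatever priority suits your application.<P>

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Prepare *attr so that a thread created with it would run under SCHED_RR
   at the lowest real-time priority.  Returns 0 on success, or the error
   code of the call that failed (in which case *attr is destroyed again). */
int init_round_robin_attr(pthread_attr_t *attr)
{
    struct sched_param param;
    int rc;

    assert(attr != NULL);
    rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;

    /* Without PTHREAD_EXPLICIT_SCHED the new thread would inherit the
       creator's scheduling and ignore the policy set below. */
    rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(attr, SCHED_RR);
    if (rc == 0) {
        memset(&param, 0, sizeof param);
        param.sched_priority = sched_get_priority_min(SCHED_RR);
        rc = pthread_attr_setschedparam(attr, &param);
    }
    if (rc != 0)
        pthread_attr_destroy(attr);
    return rc;
}
```

The resulting attribute object is then passed by address as the second
argument of <code>pthread_create</code>. Keep in mind that on Linux,
creating a real-time thread normally requires special privileges, so
<code>pthread_create</code> may well fail for ordinary users even
though the attribute setup succeeds.<P>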
  282. <H4><A NAME="D.7">D.7: I have a simple test program with two threads
  283. that do nothing but <CODE>printf()</CODE> in tight loops, and from the
  284. printout it seems that only one thread is running, the other doesn't
  285. print anything!</A></H4>
  286. Again, this behavior is characteristic of old releases of LinuxThreads
  287. (0.7 and before); more recent versions (0.8 and up) should not exhibit
  288. this behavior.<P>
  289. The reason for this behavior is explained in
  290. question <A HREF="#D.6">D.6</A> above: <CODE>printf()</CODE> performs
  291. locking on <CODE>stdout</CODE>, and thus your two threads contend very
  292. heavily for the mutex associated with <CODE>stdout</CODE>. But if you
  293. do some real work between two calls to <CODE>printf()</CODE>, you'll
  294. see that scheduling becomes much smoother.<P>
  295. <H4><A NAME="D.8">D.8: I've looked at <code>&lt;pthread.h&gt;</code>
  296. and there seems to be a gross error in the <code>pthread_cleanup_push</code>
  297. macro: it opens a block with <code>{</code> but does not close it!
  298. Surely you forgot a <code>}</code> at the end of the macro, right?
  299. </A></H4>
  300. Nope. That's the way it should be. The closing brace is provided by
  301. the <code>pthread_cleanup_pop</code> macro. The POSIX standard
  302. requires <code>pthread_cleanup_push</code> and
  303. <code>pthread_cleanup_pop</code> to be used in matching pairs, at the
  304. same level of brace nesting. This allows
  305. <code>pthread_cleanup_push</code> to open a block in order to
  306. stack-allocate some data structure, and
  307. <code>pthread_cleanup_pop</code> to close that block. It's ugly, but
  308. it's the standard way of implementing cleanup handlers.<P>
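Here is a small sketch of the matching-pair usage, assuming a
hypothetical shared variable protected by a mutex. The names
<code>data_lock</code>, <code>shared_value</code>,
<code>unlock_mutex</code> and <code>update_shared_value</code> are
made up for the illustration.<P>

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_value = 0;

/* Cleanup handler: release the mutex whose address is passed in arg. */
static void unlock_mutex(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *) arg);
}

/* Replace shared_value under data_lock and return the previous value.
   push and pop sit in the same function, at the same brace nesting. */
int update_shared_value(int new_value)
{
    int old_value, rc;

    rc = pthread_mutex_lock(&data_lock);
    assert(rc == 0);
    pthread_cleanup_push(unlock_mutex, &data_lock);  /* may open a block */

    /* This toy critical section contains no cancellation point; the
       handler matters when real code here can be cancelled. */
    old_value = shared_value;
    shared_value = new_value;

    pthread_cleanup_pop(1);   /* closes the block and runs unlock_mutex */
    return old_value;
}
```

Note that <code>old_value</code> is declared before the
<code>pthread_cleanup_push</code>: a variable declared between the two
macros may be local to the block they open and would then not be
visible after <code>pthread_cleanup_pop</code>.<P>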
  309. <H4><A NAME="D.9">D.9: I tried to use real-time threads and my program
  310. loops like crazy and freezes the whole machine!</A></H4>
  311. Versions of LinuxThreads prior to 0.8 are susceptible to "livelocks"
  312. (one thread loops, consuming 100% of the CPU time) in conjunction with
  313. real-time scheduling. Since real-time threads and processes have
  314. higher priority than normal Linux processes, all other processes on
  315. the machine, including the shell, the X server, etc, cannot run and
  316. the machine appears frozen.<P>
  317. The problem is fixed in LinuxThreads 0.8.<P>
  318. <H4><A NAME="D.10">D.10: My application needs to create thousands of
  319. threads, or maybe even more. Can I do this with
  320. LinuxThreads?</A></H4>
  321. No. You're going to run into several hard limits:
  322. <UL>
  323. <LI>Each thread, from the kernel's standpoint, is one process. Stock
  324. Linux kernels are limited to at most 512 processes for the super-user,
  325. and half this number for regular users. This can be changed by
  326. changing <code>NR_TASKS</code> in <code>include/linux/tasks.h</code>
  327. and recompiling the kernel. On the x86 processors at least,
  328. architectural constraints seem to limit <code>NR_TASKS</code> to 4090
  329. at most.
  330. <LI>LinuxThreads contains a table of all active threads. This table
  331. has room for 1024 threads at most. To increase this limit, you must
  332. change <code>PTHREAD_THREADS_MAX</code> in the LinuxThreads sources
  333. and recompile.
  334. <LI>By default, each thread reserves 2M of virtual memory space for
  335. its stack. This space is just reserved; actual memory is allocated
  336. for the stack on demand. But still, on a 32-bit processor, the total
  337. virtual memory space available for the stacks is on the order of 1G,
  338. meaning that more than 500 threads will have a hard time fitting in.
  339. You can overcome this limitation by moving to a 64-bit platform, or by
  340. allocating smaller stacks yourself using the <code>setstackaddr</code>
  341. attribute.
  342. <LI>Finally, the Linux kernel contains many algorithms that run in
  343. time proportional to the number of process table entries. Increasing
  344. this number drastically will slow down the kernel operations
  345. noticeably.
  346. </UL>
  347. (Other POSIX threads libraries have similar limitations, by the way.)
  348. For all those reasons, you'd better restructure your application so
  349. that it doesn't need more than, say, 100 threads. For instance,
  350. in the case of a multithreaded server, instead of creating a new
  351. thread for each connection, maintain a fixed-size pool of worker
  352. threads that pick incoming connection requests from a queue.<P>
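The worker-pool structure suggested above could look roughly like the
sketch below. It is an illustration under simplifying assumptions:
requests are plain integers, "handling" a request just adds it to a
running total, and all names (<code>run_pool</code>,
<code>submit_request</code>, <code>POOL_SIZE</code>,
<code>QUEUE_SIZE</code>, and so on) are invented for the example.<P>

```c
#include <assert.h>
#include <pthread.h>

#define POOL_SIZE   4     /* number of worker threads (illustrative) */
#define QUEUE_SIZE  16    /* capacity of the request queue (illustrative) */

static int queue[QUEUE_SIZE];        /* circular buffer of pending requests */
static int q_head = 0, q_count = 0;
static int shutting_down = 0;
static long handled_sum = 0;         /* stands in for real request handling */
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t q_not_full = PTHREAD_COND_INITIALIZER;

/* Called by the accepting thread: queue one request, blocking while full. */
static void submit_request(int request)
{
    pthread_mutex_lock(&q_lock);
    while (q_count == QUEUE_SIZE)
        pthread_cond_wait(&q_not_full, &q_lock);
    queue[(q_head + q_count) % QUEUE_SIZE] = request;
    q_count++;
    pthread_cond_signal(&q_not_empty);
    pthread_mutex_unlock(&q_lock);
}

/* A real server would talk to its client here, outside the queue lock. */
static void handle_request(int request)
{
    pthread_mutex_lock(&q_lock);
    handled_sum += request;
    pthread_mutex_unlock(&q_lock);
}

/* Worker body: serve requests until the queue is drained and shut down. */
static void *worker(void *arg)
{
    (void) arg;
    for (;;) {
        int request;

        pthread_mutex_lock(&q_lock);
        while (q_count == 0 && !shutting_down)
            pthread_cond_wait(&q_not_empty, &q_lock);
        if (q_count == 0) {              /* shutting down, nothing left */
            pthread_mutex_unlock(&q_lock);
            return NULL;
        }
        request = queue[q_head];
        q_head = (q_head + 1) % QUEUE_SIZE;
        q_count--;
        pthread_cond_signal(&q_not_full);
        pthread_mutex_unlock(&q_lock);

        handle_request(request);
    }
}

/* Feed requests 1..n through the pool; return the sum of handled requests. */
long run_pool(int n)
{
    pthread_t workers[POOL_SIZE];
    int i, rc;

    q_head = q_count = 0;
    shutting_down = 0;
    handled_sum = 0;
    for (i = 0; i < POOL_SIZE; i++) {
        rc = pthread_create(&workers[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (i = 1; i <= n; i++)
        submit_request(i);

    pthread_mutex_lock(&q_lock);
    shutting_down = 1;                   /* wake idle workers so they exit */
    pthread_cond_broadcast(&q_not_empty);
    pthread_mutex_unlock(&q_lock);

    for (i = 0; i < POOL_SIZE; i++) {
        rc = pthread_join(workers[i], NULL);
        assert(rc == 0);
    }
    return handled_sum;
}
```

However many requests arrive, the number of threads (and thus of
kernel processes) stays at <code>POOL_SIZE</code>, and the bounded
queue makes the accepting thread wait when the workers fall behind.<P>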
  353. <HR>
  354. <P>
  355. <H2><A NAME="E">E. Missing functions, wrong types, etc</A></H2>
  356. <H4><A NAME="E.1">E.1: Where is <CODE>pthread_yield()</CODE>? How
  357. come LinuxThreads does not implement it?</A></H4>
  358. Because it's not part of the (final) POSIX 1003.1c standard.
  359. Several drafts of the standard contained <CODE>pthread_yield()</CODE>,
  360. but then the POSIX guys discovered it was redundant with
  361. <CODE>sched_yield()</CODE> and dropped it. So, just use
  362. <CODE>sched_yield()</CODE> instead.
  363. <H4><A NAME="E.2">E.2: I've found some type errors in
  364. <code>&lt;pthread.h&gt;</code>.
  365. For instance, the second argument to <CODE>pthread_create()</CODE>
  366. should be a <CODE>pthread_attr_t</CODE>, not a
  367. <CODE>pthread_attr_t *</CODE>. Also, didn't you forget to declare
  368. <CODE>pthread_attr_default</CODE>?</A></H4>
  369. No, I didn't. What you're describing is draft 4 of the POSIX
  370. standard, which is used in OSF DCE threads. LinuxThreads conforms to the
  371. final standard. Even though the functions have the same names as in
  372. draft 4 and DCE, their calling conventions are slightly different. In
  373. particular, attributes are passed by reference, not by value, and
  374. default attributes are denoted by the NULL pointer. Since draft 4/DCE
  375. will eventually disappear, you'd better port your program to use the
  376. standard interface.<P>
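As a quick illustration of the final-standard calling conventions,
the sketch below creates one thread with default attributes and one
with an explicit attribute object. The function names
<code>store_answer</code> and <code>create_two_threads</code> are
invented for the example.<P>

```c
#include <assert.h>
#include <pthread.h>

/* Thread body: store 42 through the pointer received as argument. */
static void *store_answer(void *arg)
{
    *(int *) arg = 42;
    return NULL;
}

/* Create one thread with default attributes and one with an explicit
   attribute object; return the sum of the values they stored. */
int create_two_threads(void)
{
    pthread_t t1, t2;
    pthread_attr_t attr;
    int r1 = 0, r2 = 0, rc;

    /* Default attributes: pass NULL (there is no pthread_attr_default). */
    rc = pthread_create(&t1, NULL, store_answer, &r1);
    assert(rc == 0);

    /* Explicit attributes: initialize, adjust, then pass the ADDRESS. */
    rc = pthread_attr_init(&attr);
    assert(rc == 0);
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    assert(rc == 0);
    rc = pthread_create(&t2, &attr, store_answer, &r2);
    assert(rc == 0);
    pthread_attr_destroy(&attr);   /* no longer needed once t2 exists */

    rc = pthread_join(t1, NULL);
    assert(rc == 0);
    rc = pthread_join(t2, NULL);
    assert(rc == 0);
    return r1 + r2;
}
```

Code written for draft 4 or DCE typically needs just these two
changes at each <code>pthread_create</code> call site: an
<code>&amp;</code> in front of the attribute object, or
<code>NULL</code> in place of <code>pthread_attr_default</code>.<P>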
  377. <H4><A NAME="E.3">E.3: I'm porting an application from Solaris and I
  378. have to rename all thread functions from <code>thr_blah</code> to
  379. <CODE>pthread_blah</CODE>. This is very annoying. Why did you change
  380. all the function names?</A></H4>
  381. POSIX did it. The <code>thr_*</code> functions correspond to Solaris
  382. threads, an older thread interface that you'll find only under
  383. Solaris. The <CODE>pthread_*</CODE> functions correspond to POSIX
  384. threads, an international standard available for many, many platforms.
  385. Even Solaris 2.5 and later support the POSIX threads interface. So,
  386. do yourself a favor and rewrite your code to use POSIX threads: this
  387. way, it will run unchanged under Linux, Solaris, and quite a lot of
  388. other platforms.<P>
  389. <H4><A NAME="E.4">E.4: How can I suspend and resume a thread from
  390. another thread? Solaris has the <CODE>thr_suspend()</CODE> and
  391. <CODE>thr_resume()</CODE> functions to do that; why don't you?</A></H4>
  392. The POSIX standard provides <B>no</B> mechanism by which a thread A can
  393. suspend the execution of another thread B, without cooperation from B.
  394. The only way to implement a suspend/restart mechanism is to have B
  395. check periodically some global variable for a suspend request
  396. and then suspend itself on a condition variable, which another thread
  397. can signal later to restart B.<P>
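One possible shape for such a cooperative scheme is sketched below.
This is only an illustration, meant to be run once per program: the
names (<code>suspend_b</code>, <code>resume_b</code>,
<code>run_suspend_demo</code> and the flag variables) are made up, and
thread B's "work" is just a counter. B checks the request flag once
per iteration, at a point where it holds no other mutex.<P>

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

static pthread_mutex_t ctl_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ctl_changed = PTHREAD_COND_INITIALIZER;
static int suspend_requested = 0;  /* set by A to ask B to pause */
static int is_suspended = 0;       /* set by B while it is paused */
static int stop_requested = 0;     /* set by A to make B terminate */
static long work_done = 0;         /* B's progress counter */

/* Thread B: one unit of work per iteration, pausing when asked to. */
static void *thread_b(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&ctl_lock);
    while (!stop_requested) {
        if (suspend_requested) {
            is_suspended = 1;
            pthread_cond_broadcast(&ctl_changed);   /* tell A we paused */
            while (suspend_requested && !stop_requested)
                pthread_cond_wait(&ctl_changed, &ctl_lock);
            is_suspended = 0;
            continue;
        }
        work_done++;
        /* Let go of the lock briefly so that A can get in. */
        pthread_mutex_unlock(&ctl_lock);
        sched_yield();
        pthread_mutex_lock(&ctl_lock);
    }
    pthread_mutex_unlock(&ctl_lock);
    return NULL;
}

/* Thread A: ask B to pause and wait until it has actually done so. */
static void suspend_b(void)
{
    pthread_mutex_lock(&ctl_lock);
    suspend_requested = 1;
    while (!is_suspended)
        pthread_cond_wait(&ctl_changed, &ctl_lock);
    pthread_mutex_unlock(&ctl_lock);
}

/* Thread A: let B continue. */
static void resume_b(void)
{
    pthread_mutex_lock(&ctl_lock);
    suspend_requested = 0;
    pthread_cond_broadcast(&ctl_changed);
    pthread_mutex_unlock(&ctl_lock);
}

/* Returns 1 if B made no progress while it was suspended, 0 otherwise. */
int run_suspend_demo(void)
{
    pthread_t b;
    long before, after;
    int i, rc;

    rc = pthread_create(&b, NULL, thread_b, NULL);
    assert(rc == 0);

    suspend_b();
    pthread_mutex_lock(&ctl_lock);
    before = work_done;
    pthread_mutex_unlock(&ctl_lock);
    for (i = 0; i < 100; i++)
        sched_yield();                 /* give B a chance to (not) run */
    pthread_mutex_lock(&ctl_lock);
    after = work_done;
    pthread_mutex_unlock(&ctl_lock);
    resume_b();

    pthread_mutex_lock(&ctl_lock);
    stop_requested = 1;
    pthread_cond_broadcast(&ctl_changed);
    pthread_mutex_unlock(&ctl_lock);
    rc = pthread_join(b, NULL);
    assert(rc == 0);
    return before == after;
}
```

Because B only pauses at a place of its own choosing, it can never be
stopped in the middle of a critical section, and
<code>suspend_b()</code> returns only once B really is waiting.<P>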
  398. Notice that <CODE>thr_suspend()</CODE> is inherently dangerous and
  399. prone to race conditions. For one thing, there is no control on where
  400. the target thread stops: it can very well be stopped in the middle of
  401. a critical section, while holding mutexes. Also, there is no
  402. guarantee on when the target thread will actually stop. For these
  403. reasons, you'd be much better off using mutexes and conditions
  404. instead. The only situations that really require the ability to
  405. suspend a thread are debuggers and some kinds of garbage collectors.<P>
  406. If you really must suspend a thread in LinuxThreads, you can send it a
  407. <CODE>SIGSTOP</CODE> signal with <CODE>pthread_kill</CODE>. Send
  408. <CODE>SIGCONT</CODE> for restarting it.
  409. Beware, this is specific to LinuxThreads and entirely non-portable.
  410. Indeed, a truly conforming POSIX threads implementation will stop all
  411. threads when one thread receives the <CODE>SIGSTOP</CODE> signal!
  412. One day, LinuxThreads will implement that behavior, and the
  413. non-portable hack with <CODE>SIGSTOP</CODE> won't work anymore.<P>
  414. <H4><A NAME="E.5">E.5: Does LinuxThreads implement
  415. <CODE>pthread_attr_setstacksize()</CODE> and
  416. <CODE>pthread_attr_setstackaddr()</CODE>?</A></H4>
  417. These optional functions are provided in recent versions of
  418. LinuxThreads (0.8 and up). Earlier releases did not provide these
  419. optional components of the POSIX standard.<P>
  420. Even if <CODE>pthread_attr_setstacksize()</CODE> and
  421. <CODE>pthread_attr_setstackaddr()</CODE> are now provided, we still
  422. recommend that you do not use them unless you really have strong
  423. reasons for doing so. The default stack allocation strategy for
  424. LinuxThreads is nearly optimal: stacks start small (4k) and
  425. automatically grow on demand to a fairly large limit (2M).
  426. Moreover, there is no portable way to estimate the stack requirements
  427. of a thread, so setting the stack size yourself makes your program
  428. less reliable and non-portable.<P>
  429. <H4><A NAME="E.6">E.6: LinuxThreads does not support the
  430. <CODE>PTHREAD_SCOPE_PROCESS</CODE> value of the "contentionscope"
  431. attribute. Why? </A></H4>
  432. With a "one-to-one" model, as in LinuxThreads (one kernel execution
  433. context per thread), there is only one scheduler for all processes and
  434. all threads on the system. So, there is no way to obtain the behavior of
  435. <CODE>PTHREAD_SCOPE_PROCESS</CODE>.
  436. <H4><A NAME="E.7">E.7: LinuxThreads does not implement process-shared
  437. mutexes, conditions, and semaphores. Why?</A></H4>
  438. This is another optional component of the POSIX standard. Portable
  439. applications should test <CODE>_POSIX_THREAD_PROCESS_SHARED</CODE>
  440. before using this facility.
  441. <P>
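A portable test along these lines might look like the sketch below.
The function name <code>process_shared_mutexes_available</code> is
invented for the example, and combining the compile-time macro with a
run-time <code>sysconf()</code> query is one reasonable reading of the
POSIX feature-test rules, not the only one.<P>

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Returns 1 if a mutex attribute object can be marked process-shared on
   this system, 0 if the option is missing or refused. */
int process_shared_mutexes_available(void)
{
#if defined(_POSIX_THREAD_PROCESS_SHARED) && _POSIX_THREAD_PROCESS_SHARED >= 0
    pthread_mutexattr_t attr;
    int rc, ok;

    /* The headers know about the option; ask the running system too. */
    if (sysconf(_SC_THREAD_PROCESS_SHARED) <= 0)
        return 0;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    ok = (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) == 0);
    pthread_mutexattr_destroy(&attr);
    return ok;
#else
    /* Option not offered by this library: fall back to other IPC. */
    return 0;
#endif
}
```

A program can call this once at startup and choose between
process-shared mutexes and a traditional interprocess mechanism
accordingly.<P>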
  442. The goal of this extension is to allow different processes (with
  443. different address spaces) to synchronize through mutexes, conditions
  444. or semaphores allocated in shared memory (either SVR4 shared memory
  445. segments or <CODE>mmap()</CODE>ed files).
  446. <P>
  447. The reason why this does not work in LinuxThreads is that mutexes,
  448. conditions, and semaphores are not self-contained: their waiting
  449. queues contain pointers to linked lists of thread descriptors, and
  450. these pointers are meaningful only in one address space.
  451. <P>
  452. Matt Messier and I spent a significant amount of time trying to design a
  453. suitable mechanism for sharing waiting queues between processes. We
  454. came up with several solutions that combined two of the following
  455. three desirable features, but none that combines all three:
  456. <UL>
  457. <LI>allow sharing between processes having different UIDs
  458. <LI>supports cancellation
  459. <LI>supports <CODE>pthread_cond_timedwait</CODE>
  460. </UL>
  461. We concluded that kernel support is required to share mutexes,
  462. conditions and semaphores between processes. That's one place where
  463. Linus Torvalds's intuition that "all we need in the kernel is
  464. <CODE>clone()</CODE>" fails.
  465. <P>
  466. Until suitable kernel support is available, you'd better use
  467. traditional interprocess communications to synchronize different
  468. processes: System V semaphores and message queues, or pipes, or sockets.
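<P>
As an illustration of the pipe approach, here is a sketch in which a parent process blocks until its child signals that it is ready. The function name <CODE>wait_for_child_via_pipe()</CODE> and the one-byte "token" protocol are assumptions made for this example, not something LinuxThreads provides.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The parent blocks until the child signals readiness through a pipe.
 * Returns 0 on success, -1 on failure.
 * (wait_for_child_via_pipe is a hypothetical name.) */
int wait_for_child_via_pipe(void)
{
    int fds[2];
    char token = 0;
    pid_t pid;
    ssize_t n;

    if (pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: do some work, then wake the parent by writing one byte. */
        close(fds[0]);
        token = 'x';
        if (write(fds[1], &token, 1) != 1)
            _exit(1);
        _exit(0);
    }

    /* Parent: read() blocks until the child writes or exits. */
    close(fds[1]);
    n = read(fds[0], &token, 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (n == 1 && token == 'x') ? 0 : -1;
}
```

The same pattern works in the other direction with a second pipe, which gives you a simple two-way handshake between processes.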
  469. <P>
  470. <HR>
  471. <P>
  472. <H2><A NAME="F">F. C++ issues</A></H2>
  473. <H4><A NAME="F.1">F.1: Are there C++ wrappers for LinuxThreads?</A></H4>
  474. Douglas Schmidt's ACE library contains, among a lot of other
  475. things, C++ wrappers for LinuxThreads and quite a number of other
  476. thread libraries. Check out
  477. <A HREF="http://www.cs.wustl.edu/~schmidt/ACE.html">http://www.cs.wustl.edu/~schmidt/ACE.html</A><P>
  478. <H4><A NAME="F.2">F.2: I'm trying to use LinuxThreads from a C++
  479. program, and the compiler complains about the third argument to
  480. <CODE>pthread_create()</CODE> !</A></H4>
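You're probably trying to pass a non-static class member function or some
other C++ construct as the third argument to <CODE>pthread_create()</CODE>.
Recall that <CODE>pthread_create()</CODE> is a C function, and it must
be passed a C function as its third argument, that is, a plain function
taking a <CODE>void *</CODE> and returning a <CODE>void *</CODE>. If the
thread needs an object to work on, pass a pointer to that object as the
fourth argument.<P>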
  485. <H4><A NAME="F.3">F.3: I'm trying to use LinuxThreads in conjunction
  486. with libg++, and I'm having all sorts of trouble.</A></H4>
From what I understand, thread support in libg++ is completely broken,
  488. especially with respect to locking of iostreams. H.J.Lu wrote:
  489. <BLOCKQUOTE>
  490. If you want to use thread, I can only suggest egcs and glibc. You
  491. can find egcs at
  492. <A HREF="http://www.cygnus.com/egcs">http://www.cygnus.com/egcs</A>.
egcs has libstdc++, which is MT safe under glibc 2. If you really
  494. want to use the libg++, I have a libg++ add-on for egcs.
  495. </BLOCKQUOTE>
  496. <HR>
  497. <P>
  498. <H2><A NAME="G">G. Debugging LinuxThreads programs</A></H2>
<H4><A NAME="G.1">G.1: Can I debug a LinuxThreads program using gdb?</A></H4>
  500. Yes, but not with the stock gdb 4.17. You need a specially patched
version of gdb 4.17 developed by Eric Paire and colleagues at The Open
  502. Group, Grenoble. The patches against gdb 4.17 are available at
  503. <A HREF="http://www.gr.opengroup.org/java/jdk/linux/debug.htm"><code>http://www.gr.opengroup.org/java/jdk/linux/debug.htm</code></A>.
  504. Precompiled binaries of the patched gdb are available in RedHat's RPM
  505. format at <A
  506. HREF="http://odin.appliedtheory.com/"><code>http://odin.appliedtheory.com/</code></A>.<P>
  507. Some Linux distributions provide an already-patched version of gdb;
  508. others don't. For instance, the gdb in RedHat 5.2 is thread-aware,
but apparently not the one in RedHat 6.0. Just ask (politely) the
makers of your Linux distribution to please make sure that they apply
the correct patches to gdb.<P>
  512. <H4><A NAME="G.2">G.2: Does it work with post-mortem debugging?</A></H4>
  513. Not very well. Generally, the core file does not correspond to the
  514. thread that crashed. The reason is that the kernel will not dump core
  515. for a process that shares its memory with other processes, such as the
  516. other threads of your program. So, the thread that crashes silently
  517. disappears without generating a core file. Then, all other threads of
  518. your program die on the same signal that killed the crashing thread.
  519. (This is required behavior according to the POSIX standard.) The last
  520. one that dies is no longer sharing its memory with anyone else, so the
  521. kernel generates a core file for that thread. Unfortunately, that's
  522. not the thread you are interested in.
  523. <H4><A NAME="G.3">G.3: Any other ways to debug multithreaded programs, then?</A></H4>
  524. Assertions and <CODE>printf()</CODE> are your best friends. Try to debug
  525. sequential parts in a single-threaded program first. Then, put
  526. <CODE>printf()</CODE> statements all over the place to get execution traces.
  527. Also, check invariants often with the <CODE>assert()</CODE> macro. In truth,
  528. there is no other effective way (save for a full formal proof of your
  529. program) to track down concurrency bugs. Debuggers are not really
  530. effective for subtle concurrency problems, because they disrupt
  531. program execution too much.<P>
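Here is a sketch of what this looks like in practice: a few threads increment a shared counter under a mutex, trace their progress on stderr, and check an invariant with <CODE>assert()</CODE>. All names (<CODE>worker()</CODE>, <CODE>run_traced_workers()</CODE>, the constants) are invented for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4
#define INCREMENTS 1000

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

/* Each worker traces its start and end, and checks an invariant
 * every time it touches the shared counter. */
static void *worker(void *arg)
{
    long id = (long)arg;
    int i;

    fprintf(stderr, "[thread %ld] starting\n", id);
    for (i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&lock);
        counter++;
        /* Invariant: the counter never exceeds the total amount of work. */
        assert(counter <= (long)NUM_THREADS * INCREMENTS);
        pthread_mutex_unlock(&lock);
    }
    fprintf(stderr, "[thread %ld] done\n", id);
    return NULL;
}

/* Runs the workers to completion and returns the final counter value.
 * (run_traced_workers is a hypothetical name.) */
long run_traced_workers(void)
{
    pthread_t threads[NUM_THREADS];
    long i;

    counter = 0;
    for (i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, (void *)i);
        assert(rc == 0);
    }
    for (i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return counter;
}
```

Traces go to stderr because it is unbuffered by default, so the messages appear in the order they were written. Remember that compiling with <CODE>-DNDEBUG</CODE> turns all <CODE>assert()</CODE> checks off.<P>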
  532. <HR>
  533. <P>
  534. <H2><A NAME="H">H. Compiling multithreaded code; errno madness</A></H2>
  535. <H4><A NAME="H.1">H.1: You say all multithreaded code must be compiled
  536. with <CODE>_REENTRANT</CODE> defined. What difference does it make?</A></H4>
  537. It affects include files in three ways:
  538. <UL>
  539. <LI> The include files define prototypes for the reentrant variants of
  540. some of the standard library functions,
  541. e.g. <CODE>gethostbyname_r()</CODE> as a reentrant equivalent to
  542. <CODE>gethostbyname()</CODE>.<P>
  543. <LI> If <CODE>_REENTRANT</CODE> is defined, some
  544. <code>&lt;stdio.h&gt;</code> functions are no longer defined as macros,
  545. e.g. <CODE>getc()</CODE> and <CODE>putc()</CODE>. In a multithreaded
  546. program, stdio functions require additional locking, which the macros
  547. don't perform, so we must call functions instead.<P>
  548. <LI> More importantly, <code>&lt;errno.h&gt;</code> redefines errno when
  549. <CODE>_REENTRANT</CODE> is
  550. defined, so that errno refers to the thread-specific errno location
  551. rather than the global errno variable. This is achieved by the
  552. following <code>#define</code> in <code>&lt;errno.h&gt;</code>:
<PRE>
#define errno (*(__errno_location()))
</PRE>
  556. which causes each reference to errno to call the
  557. <CODE>__errno_location()</CODE> function for obtaining the location
  558. where error codes are stored. libc provides a default definition of
  559. <CODE>__errno_location()</CODE> that always returns
  560. <code>&errno</code> (the address of the global errno variable). Thus,
  561. for programs not linked with LinuxThreads, defining
  562. <CODE>_REENTRANT</CODE> makes no difference w.r.t. errno processing.
  563. But LinuxThreads redefines <CODE>__errno_location()</CODE> to return a
  564. location in the thread descriptor reserved for holding the current
  565. value of errno for the calling thread. Thus, each thread operates on
  566. a different errno location.
  567. </UL>
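<P>
To convince yourself that each thread gets its own errno location, you can compare the addresses that <CODE>errno</CODE> expands to in two threads. This is a sketch under the assumption that your thread library provides per-thread errno (as LinuxThreads does when the code is compiled with <CODE>_REENTRANT</CODE>). The function names are invented for the example.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* The new thread records the address that its own "errno" refers to. */
static void *record_errno_address(void *arg)
{
    *(uintptr_t *)arg = (uintptr_t)&errno;
    return NULL;
}

/* Returns 1 if a second thread uses a different errno location than
 * the caller, 0 if both use the same one, -1 on error.
 * (errno_is_per_thread is a hypothetical name.) */
int errno_is_per_thread(void)
{
    pthread_t tid;
    uintptr_t other = 0;
    uintptr_t mine = (uintptr_t)&errno;

    if (pthread_create(&tid, NULL, record_errno_address, &other) != 0)
        return -1;
    pthread_join(tid, NULL);
    return other != 0 && other != mine;
}
```

The addresses are converted to integers inside each thread so that the comparison never dereferences the location of a thread that has already exited.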
  568. <P>
  569. <H4><A NAME="H.2">H.2: Why is it so important that each thread has its
  570. own errno variable? </A></H4>
  571. If all threads were to store error codes in the same, global errno
  572. variable, then the value of errno after a system call or library
  573. function returns would be unpredictable: between the time a system
  574. call stores its error code in the global errno and your code inspects
  575. errno to see which error occurred, another thread might have stored
  576. another error code in the same errno location. <P>
  577. <H4><A NAME="H.3">H.3: What happens if I link LinuxThreads with code
  578. not compiled with <CODE>-D_REENTRANT</CODE>?</A></H4>
  579. Lots of trouble. If the code uses <CODE>getc()</CODE> or
  580. <CODE>putc()</CODE>, it will perform I/O without proper interlocking
  581. of the stdio buffers; this can cause lost output, duplicate output, or
  582. just crash other stdio functions. If the code consults errno, it will
  583. get back the wrong error code. The following code fragment is a
  584. typical example:
<PRE>
do {
    r = read(fd, buf, n);
    if (r == -1) {
        if (errno == EINTR)   /* an error we can handle */
            continue;
        else {                /* other errors are fatal */
            perror("read failed");
            exit(100);
        }
    }
} while (...);
</PRE>
  598. Assume this code is not compiled with <CODE>-D_REENTRANT</CODE>, and
  599. linked with LinuxThreads. At run-time, <CODE>read()</CODE> is
  600. interrupted. Since the C library was compiled with
  601. <CODE>-D_REENTRANT</CODE>, <CODE>read()</CODE> stores its error code
  602. in the location pointed to by <CODE>__errno_location()</CODE>, which
  603. is the thread-local errno variable. Then, the code above sees that
  604. <CODE>read()</CODE> returns -1 and looks up errno. Since
  605. <CODE>_REENTRANT</CODE> is not defined, the reference to errno
  606. accesses the global errno variable, which is most likely 0. Hence the
  607. code concludes that it cannot handle the error and stops.<P>
  608. <H4><A NAME="H.4">H.4: With LinuxThreads, I can no longer use the signals
  609. <code>SIGUSR1</code> and <code>SIGUSR2</code> in my programs! Why? </A></H4>
  610. The short answer is: because the Linux kernel you're using does not
  611. support realtime signals. <P>
  612. LinuxThreads needs two signals for its internal operation.
  613. One is used to suspend and restart threads blocked on mutex, condition
  614. or semaphore operations. The other is used for thread
  615. cancellation.<P>
  616. On ``old'' kernels (2.0 and early 2.1 kernels), there are only 32
  617. signals available and the kernel reserves all of them but two:
<code>SIGUSR1</code> and <code>SIGUSR2</code>. So, LinuxThreads has
no choice but to use those two signals.<P>
  620. On recent kernels (2.2 and up), more than 32 signals are provided in
  621. the form of realtime signals. When run on one of those kernels,
  622. LinuxThreads uses two reserved realtime signals for its internal
  623. operation, thus leaving <code>SIGUSR1</code> and <code>SIGUSR2</code>
  624. free for user code. (This works only with glibc, not with libc 5.) <P>
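A rough way to see whether your headers and C library know about realtime signals is to look at <CODE>SIGRTMIN</CODE> and <CODE>SIGRTMAX</CODE>. This sketch only checks the C library side, not the running kernel, and the function name <CODE>realtime_signal_count()</CODE> is made up for the example.

```c
#include <signal.h>

/* Returns the number of realtime signals the C library offers to
 * applications, or 0 if the headers do not define any.
 * (realtime_signal_count is a hypothetical name.) */
int realtime_signal_count(void)
{
#if defined(SIGRTMIN) && defined(SIGRTMAX)
    /* With glibc these are typically not constants: they are evaluated
     * at run time, so do not use them in #if or as case labels. */
    return SIGRTMAX - SIGRTMIN + 1;
#else
    return 0;
#endif
}
```

A result of 0 suggests you are in the "old" situation described above, where <CODE>SIGUSR1</CODE> and <CODE>SIGUSR2</CODE> are taken.<P>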
  625. <H4><A NAME="H.5">H.5: Is the stack of one thread visible from the
  626. other threads? Can I pass a pointer into my stack to other threads?
  627. </A></H4>
  628. Yes, you can -- if you're very careful. The stacks are indeed visible
  629. from all threads in the system. Some non-POSIX thread libraries seem
  630. to map the stacks for all threads at the same virtual addresses and
  631. change the memory mapping when they switch from one thread to
  632. another. But this is not the case for LinuxThreads, as it would make
  633. context switching between threads more expensive, and at any rate
  634. might not conform to the POSIX standard.<P>
  635. So, you can take the address of an "auto" variable and pass it to
  636. other threads via shared data structures. However, you need to make
  637. absolutely sure that the function doing this will not return as long
  638. as other threads need to access this address. It's the usual mistake
  639. of returning the address of an "auto" variable, only made much worse
  640. because of concurrency. It's much, much safer to systematically
  641. heap-allocate all shared data structures. <P>
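The heap-allocation approach can be sketched as follows. The function that starts the thread returns immediately, so an "auto" variable would be gone by the time the thread runs, while a <CODE>malloc()</CODE>ed block survives. The names <CODE>struct job</CODE>, <CODE>start_job()</CODE> and <CODE>finish_job()</CODE> are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

/* Data shared between the creator and the new thread. */
struct job {
    int input;
    int output;
};

static void *double_it(void *arg)
{
    struct job *job = arg;

    job->output = 2 * job->input;
    return job;             /* hand the block back to whoever joins us */
}

/* Starts a thread working on heap-allocated data and returns at once.
 * Returns 0 on success, -1 on failure. */
int start_job(int input, pthread_t *tid)
{
    struct job *job = malloc(sizeof *job);

    if (job == NULL)
        return -1;
    job->input = input;
    job->output = 0;
    if (pthread_create(tid, NULL, double_it, job) != 0) {
        free(job);
        return -1;
    }
    return 0;               /* our stack frame dies here; the job does not */
}

/* Waits for the thread, collects its result, and frees the shared block. */
int finish_job(pthread_t tid)
{
    void *ret;
    int output;

    if (pthread_join(tid, &ret) != 0)
        return -1;
    output = ((struct job *)ret)->output;
    free(ret);
    return output;
}
```

Ownership is simple here: whoever joins the thread frees the block. With detached threads, the thread itself would have to free it.<P>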
  642. <HR>
  643. <P>
  644. <H2><A NAME="I">I. X-Windows and other libraries</A></H2>
  645. <H4><A NAME="I.1">I.1: My program uses both Xlib and LinuxThreads.
  646. It stops very early with an "Xlib: unknown 0 error" message. What
  647. does this mean? </A></H4>
  648. That's a prime example of the errno problem described in question <A
  649. HREF="#H.2">H.2</A>. The binaries for Xlib you're using have not been
compiled with <CODE>-D_REENTRANT</CODE>. It happens that Xlib contains a
piece of code very much like the one in question <A
HREF="#H.3">H.3</A>. So, your Xlib fetches the error code from the
  653. wrong errno location and concludes that an error it cannot handle
  654. occurred.<P>
  655. <H4><A NAME="I.2">I.2: So, what can I do to build a multithreaded X
  656. Windows client? </A></H4>
  657. The best solution is to use X libraries that have been compiled with
  658. multithreading options set. Linux distributions that come with glibc
  659. 2 as the main C library generally provide thread-safe X libraries.
  660. At least, that seems to be the case for RedHat 5 and later.<P>
You can also try to recompile the X libraries yourself with multithreading
options set. They contain optional support for multithreading; it's
  663. just that the binaries provided by your Linux distribution were built
  664. without this support. See the file <code>README.Xfree3.3</code> in
  665. the LinuxThreads distribution for patches and info on how to compile
  666. thread-safe X libraries from the Xfree3.3 distribution. The Xfree3.3
  667. sources are readily available in most Linux distributions, e.g. as a
  668. source RPM for RedHat. Be warned, however, that X Windows is a huge
  669. system, and recompiling even just the libraries takes a lot of time
  670. and disk space.<P>
Another, less involved solution is to call X functions only from the
  672. main thread of your program. Even if all threads have their own errno
  673. location, the main thread uses the global errno variable for its errno
  674. location. Thus, code not compiled with <code>-D_REENTRANT</code>
  675. still "sees" the right error values if it executes in the main thread
  676. only. <P>
  677. <H4><A NAME="I.2">This is a lot of work. Don't you have precompiled
  678. thread-safe X libraries that you could distribute?</A></H4>
  679. No, I don't. Sorry. But consider installing a Linux distribution
  680. that comes with thread-safe X libraries, such as RedHat 6.<P>
  681. <H4><A NAME="I.3">I.3: Can I use library FOO in a multithreaded
  682. program?</A></H4>
  683. Most libraries cannot be used "as is" in a multithreaded program.
For one thing, they are not necessarily thread-safe: calling
two functions of the library simultaneously from two threads might not
work, due to internal use of global variables and the like. Second,
  687. the libraries must have been compiled with <CODE>-D_REENTRANT</CODE> to avoid
  688. the errno problems explained in question <A HREF="#H.2">H.2</A>.
  689. <P>
  690. <H4><A NAME="I.4">I.4: What if I make sure that only one thread calls
  691. functions in these libraries?</A></H4>
  692. This avoids problems with the library not being thread-safe. But
  693. you're still vulnerable to errno problems. At the very least, a
  694. recompile of the library with <CODE>-D_REENTRANT</CODE> is needed.
  695. <P>
  696. <H4><A NAME="I.5">I.5: What if I make sure that only the main thread
  697. calls functions in these libraries?</A></H4>
That might actually work. As explained in question <A HREF="#I.2">I.2</A>,
  699. the main thread uses the global errno variable, and can therefore
  700. execute code not compiled with <CODE>-D_REENTRANT</CODE>.<P>
  701. <H4><A NAME="I.6">I.6: SVGAlib doesn't work with LinuxThreads. Why?
  702. </A></H4>
  703. Because both LinuxThreads and SVGAlib use the signals
  704. <code>SIGUSR1</code> and <code>SIGUSR2</code>. See question <A
  705. HREF="#H.4">H.4</A>.
  706. <P>
  707. <HR>
  708. <P>
  709. <H2><A NAME="J">J. Signals and threads</A></H2>
  710. <H4><A NAME="J.1">J.1: When it comes to signals, what is shared
  711. between threads and what isn't?</A></H4>
  712. Signal handlers are shared between all threads: when a thread calls
  713. <CODE>sigaction()</CODE>, it sets how the signal is handled not only
  714. for itself, but for all other threads in the program as well.<P>
  715. On the other hand, signal masks are per-thread: each thread chooses
  716. which signals it blocks independently of others. At thread creation
  717. time, the newly created thread inherits the signal mask of the thread
  718. calling <CODE>pthread_create()</CODE>. But afterwards, the new thread
  719. can modify its signal mask independently of its creator thread.<P>
  720. <H4><A NAME="J.2">J.2: When I send a <CODE>SIGKILL</CODE> to a
  721. particular thread using <CODE>pthread_kill</CODE>, all my threads are
  722. killed!</A></H4>
  723. That's how it should be. The POSIX standard mandates that all threads
  724. should terminate when the process (i.e. the collection of all threads
  725. running the program) receives a signal whose effect is to
  726. terminate the process (such as <CODE>SIGKILL</CODE> or <CODE>SIGINT</CODE>
  727. when no handler is installed on that signal). This behavior makes a
  728. lot of sense: when you type "ctrl-C" at the keyboard, or when a thread
  729. crashes on a division by zero or a segmentation fault, you really want
  730. all threads to stop immediately, not just the one that caused the
  731. segmentation violation or that got the <CODE>SIGINT</CODE> signal.
  732. (This assumes default behavior for those signals; see question
  733. <A HREF="#J.3">J.3</A> if you install handlers for those signals.)<P>
  734. If you're trying to terminate a thread without bringing the whole
  735. process down, use <code>pthread_cancel()</code>.<P>
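A minimal sketch of cancelling one thread while the rest of the program keeps running. It assumes the default cancellation settings (enabled, deferred), and the names <CODE>sleeper()</CODE> and <CODE>cancel_one_thread()</CODE> are hypothetical.

```c
#include <pthread.h>
#include <unistd.h>

/* A thread that would run forever if nobody cancelled it. */
static void *sleeper(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);           /* sleep() is a cancellation point */
}

/* Returns 1 if the thread was cancelled and the caller is still alive
 * to notice it, 0 if the thread ended some other way, -1 on error. */
int cancel_one_thread(void)
{
    pthread_t tid;
    void *result;

    if (pthread_create(&tid, NULL, sleeper, NULL) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;
    return result == PTHREAD_CANCELED;
}
```

With deferred cancellation, the request only takes effect when the target thread reaches a cancellation point, so a thread stuck in a pure computation loop will not stop until it calls one (or <CODE>pthread_testcancel()</CODE>).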
  736. <H4><A NAME="J.3">J.3: I've installed a handler on a signal. Which
  737. thread executes the handler when the signal is received?</A></H4>
  738. If the signal is generated by a thread during its execution (e.g. a
  739. thread executes a division by zero and thus generates a
  740. <CODE>SIGFPE</CODE> signal), then the handler is executed by that
  741. thread. This also applies to signals generated by
  742. <CODE>raise()</CODE>.<P>
  743. If the signal is sent to a particular thread using
  744. <CODE>pthread_kill()</CODE>, then that thread executes the handler.<P>
  745. If the signal is sent via <CODE>kill()</CODE> or the tty interface
  746. (e.g. by pressing ctrl-C), then the POSIX specs say that the handler
  747. is executed by any thread in the process that does not currently block
the signal. In other words, POSIX considers that the signal is sent
  749. to the process (the collection of all threads) as a whole, and any
  750. thread that is not blocking this signal can then handle it.<P>
  751. The latter case is where LinuxThreads departs from the POSIX specs.
  752. In LinuxThreads, there is no real notion of ``the process as a whole'':
  753. in the kernel, each thread is really a distinct process with a
  754. distinct PID, and signals sent to the PID of a thread can only be
  755. handled by that thread. As long as no thread is blocking the signal,
  756. the behavior conforms to the standard: one (unspecified) thread of the
program handles the signal. But if the thread to whose PID the signal
is sent blocks the signal, and some other thread does not block the
signal, then LinuxThreads will simply queue the signal in
that thread and execute the handler only when that thread unblocks
  761. the signal, instead of executing the handler immediately in the other
  762. thread that does not block the signal.<P>
  763. This is to be viewed as a LinuxThreads bug, but I currently don't see
  764. any way to implement the POSIX behavior without kernel support.<P>
  765. <H4><A NAME="J.3">J.3: How shall I go about mixing signals and threads
  766. in my program? </A></H4>
The less you mix them, the better. Note that none of the
<CODE>pthread_*</CODE> functions is async-signal safe, meaning
that you should not call them from signal handlers. This
  770. recommendation is not to be taken lightly: your program can deadlock
  771. if you call a <CODE>pthread_*</CODE> function from a signal handler!
  772. <P>
The only sensible things you can do from a signal handler are to set a
global flag, or call <CODE>sem_post()</CODE> on a semaphore, to record
  775. the delivery of the signal. The remainder of the program can then
  776. either poll the global flag, or use <CODE>sem_wait()</CODE> and
  777. <CODE>sem_trywait()</CODE> on the semaphore.<P>
  778. Another option is to do nothing in the signal handler, and dedicate
  779. one thread (preferably the initial thread) to wait synchronously for
  780. signals, using <CODE>sigwait()</CODE>, and send messages to the other
  781. threads accordingly.
  782. <H4><A NAME="J.4">J.4: When one thread is blocked in
  783. <CODE>sigwait()</CODE>, other threads no longer receive the signals
  784. <CODE>sigwait()</CODE> is waiting for! What happens? </A></H4>
  785. It's an unfortunate consequence of how LinuxThreads implements
  786. <CODE>sigwait()</CODE>. Basically, it installs signal handlers on all
  787. signals waited for, in order to record which signal was received.
  788. Since signal handlers are shared with the other threads, this
  789. temporarily deactivates any signal handlers you might have previously
  790. installed on these signals.<P>
  791. Though surprising, this behavior actually seems to conform to the
  792. POSIX standard. According to POSIX, <CODE>sigwait()</CODE> is
  793. guaranteed to work as expected only if all other threads in the
  794. program block the signals waited for (otherwise, the signals could be
  795. delivered to other threads than the one doing <CODE>sigwait()</CODE>,
  796. which would make <CODE>sigwait()</CODE> useless). In this particular
  797. case, the problem described in this question does not appear.<P>
  798. One day, <CODE>sigwait()</CODE> will be implemented in the kernel,
along with other POSIX 1003.1b extensions, and <CODE>sigwait()</CODE>
will have a more natural behavior (as well as better performance).<P>
  801. <HR>
  802. <P>
  803. <H2><A NAME="K">K. Internals of LinuxThreads</A></H2>
  804. <H4><A NAME="K.1">K.1: What is the implementation model for
  805. LinuxThreads?</A></H4>
  806. LinuxThreads follows the so-called "one-to-one" model: each thread is
  807. actually a separate process in the kernel. The kernel scheduler takes
  808. care of scheduling the threads, just like it schedules regular
  809. processes. The threads are created with the Linux
  810. <code>clone()</code> system call, which is a generalization of
  811. <code>fork()</code> allowing the new process to share the memory
  812. space, file descriptors, and signal handlers of the parent.<P>
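One way to observe the one-to-one model from a program is to ask the kernel for the ID of the calling thread. The sketch below uses the Linux-specific <CODE>gettid</CODE> system call through <CODE>syscall()</CODE>, which is an assumption about your system: it exists on current Linux kernels, whereas under LinuxThreads proper you would see the same effect with <CODE>getpid()</CODE> returning different values in different threads. The function names are made up.

```c
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* The new thread records the ID under which the kernel schedules it. */
static void *report_tid(void *arg)
{
    *(long *)arg = syscall(SYS_gettid);
    return NULL;
}

/* Returns 1 if a new thread has its own kernel-level ID, distinct from
 * the caller's, 0 otherwise, -1 on error.
 * (thread_has_own_kernel_id is a hypothetical name.) */
int thread_has_own_kernel_id(void)
{
    pthread_t tid;
    long mine = syscall(SYS_gettid);
    long other = -1;

    if (pthread_create(&tid, NULL, report_tid, &other) != 0)
        return -1;
    pthread_join(tid, NULL);
    return other > 0 && other != mine;
}
```

Each distinct ID is a separate entity for the kernel scheduler, which is exactly what "one kernel execution context per thread" means.<P>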
  813. Advantages of the "one-to-one" model include:
  814. <UL>
  815. <LI> minimal overhead on CPU-intensive multiprocessing (with
  816. about one thread per processor);
  817. <LI> minimal overhead on I/O operations;
  818. <LI> a simple and robust implementation (the kernel scheduler does
  819. most of the hard work for us).
  820. </UL>
  821. The main disadvantage is more expensive context switches on mutex and
  822. condition operations, which must go through the kernel. This is
  823. mitigated by the fact that context switches in the Linux kernel are
  824. pretty efficient.<P>
  825. <H4><A NAME="K.2">K.2: Have you considered other implementation
  826. models?</A></H4>
  827. There are basically two other models. The "many-to-one" model
  828. relies on a user-level scheduler that context-switches between the
  829. threads entirely in user code; viewed from the kernel, there is only
  830. one process running. This model is completely out of the question for
me, since it does not take advantage of multiprocessors, and requires
  832. unholy magic to handle blocking I/O operations properly. There are
  833. several user-level thread libraries available for Linux, but I found
  834. all of them deficient in functionality, performance, and/or robustness.
  835. <P>
  836. The "many-to-many" model combines both kernel-level and user-level
  837. scheduling: several kernel-level threads run concurrently, each
  838. executing a user-level scheduler that selects between user threads.
  839. Most commercial Unix systems (Solaris, Digital Unix, IRIX) implement
  840. POSIX threads this way. This model combines the advantages of both
the "many-to-one" and the "one-to-one" models, and is attractive
  842. because it avoids the worst-case behaviors of both models --
  843. especially on kernels where context switches are expensive, such as
  844. Digital Unix. Unfortunately, it is pretty complex to implement, and
  845. requires kernel support which Linux does not provide. Linus Torvalds
  846. and other Linux kernel developers have always been pushing the
  847. "one-to-one" model in the name of overall simplicity, and are doing a
  848. pretty good job of making kernel-level context switches between
  849. threads efficient. LinuxThreads is just following the general
  850. direction they set.<P>
  851. <HR>
  852. <ADDRESS>Xavier.Leroy@inria.fr</ADDRESS>
  853. </BODY>
  854. </HTML>